GNU Anubis Manual: Appendix B Multi-Part Message Processing

Appendix B Multi-Part Message Processing

PREFACE

In its current state (as of Anubis version 4.3) Anubis has proven to be a useful tool for processing plain text outgoing messages. However, its use with MIME messages creates several problems despite the flexible ruleset supported by the program.

This RFC proposes a new mode of operation that should make processing of MIME messages more convenient.

INTRODUCTION

In general, Anubis processes a message using a set of user-defined rules, called the user program, consisting of conditional statements and actions. Both may operate on the message body as well as on its headers. This mode of operation suits plain text messages very well; however, it has drawbacks when processing multi-part messages.

To begin with, only the first part of a multi-part message is processed; the rest of the message is usually passed to the MTA verbatim. Even then, this part can be processed by the user program only if it is in plain text: parts encoded using quoted-printable or, worse yet, base64 cannot be processed this way. The only way for the user to process non-plaintext multi-part messages is by using extension procedures (usually external scripts).

A special configuration setting, read-entire-body (see section Basic Settings), forces Anubis to process the entire body of a multi-part message (among other effects, this means passing the entire body to the external scripts as well). However, it does not solve the problem: no attempt is made to decode parts of the message, so users are left on their own when processing such messages.

The solution proposed by this memo boils down to the following: process each part of the multi-part message as a message in its own right, allowing the user to define different RULE sections for processing different MIME types. The following sections describe the approach in more detail.

MULTI-PART MESSAGE PROCESSING

When processing a multi-part message, Anubis first determines its MIME type. A user is allowed to define several RULE sections(9) that handle different MIME types. Anubis keeps a type <-> section association table (a dispatcher table), which is used to determine the entry point for processing each particular part. If the dispatcher table does not contain an entry for the given MIME type, the contents of the part are passed verbatim. Otherwise, Anubis decodes the part body and passes it to the RULE section for further processing. When this section is invoked, the MIME headers act as the message headers and the MIME body acts as the message body. After the section finishes processing the message part, the part is encoded again(10) and then passed to the output.
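The decode, process, re-encode cycle for a single part can be sketched in Python as follows. This is only an illustration of the idea, not Anubis code: the function names, the use of a plain dictionary keyed by exact MIME type in place of the dispatcher table, and the choice to re-encode with the part's original transfer encoding are all assumptions made for the example.

```python
import base64
import quopri


def decode_body(body: bytes, encoding: str) -> bytes:
    """Undo the Content-Transfer-Encoding of a part body."""
    enc = encoding.lower()
    if enc == "base64":
        return base64.b64decode(body)
    if enc == "quoted-printable":
        return quopri.decodestring(body)
    return body  # 7bit, 8bit, binary: nothing to undo


def encode_body(body: bytes, encoding: str) -> bytes:
    """Re-apply the transfer encoding (assumed: same as the original)."""
    enc = encoding.lower()
    if enc == "base64":
        return base64.encodebytes(body)
    if enc == "quoted-printable":
        return quopri.encodestring(body)
    return body


def process_part(headers: dict, body: bytes, handlers: dict) -> bytes:
    """Process one MIME part; `handlers` is a hypothetical stand-in
    for the dispatcher table, keyed by exact MIME type.
    Header names are looked up case-sensitively for brevity."""
    content_type = headers.get("Content-Type", "text/plain")
    mime_type = content_type.split(";")[0].strip().lower()
    handler = handlers.get(mime_type)
    if handler is None:
        return body  # no entry for this type: pass the part verbatim
    encoding = headers.get("Content-Transfer-Encoding", "7bit")
    decoded = decode_body(body, encoding)
    processed = handler(headers, decoded)  # MIME headers act as message headers
    return encode_body(processed, encoding)
```

A handler here is any callable taking the part headers and the decoded body and returning the new body, which mirrors the way a RULE section sees the part as a message of its own.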

RECURSIVE NATURE

MIME standards allow multi-part messages to be nested to arbitrary depth, therefore the process described above is inherently recursive. This has the following implications:

The dispatcher table must contain several built-in entries that handle recursive descent into messages of a given MIME type. At least messages having multipart/* and message/rfc822 contents must be handled. These entries must be configurable, thus giving the end user the possibility to disable some of them. Preferably there should be a way of specifying new recursive types as well.
A configuration parameter must be provided that limits the maximum recursion depth for such messages.
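The recursive descent with a depth limit can be sketched using Python's standard email package. The names RECURSIVE_TYPES and walk are hypothetical, and the behaviour at the limit (passing the container verbatim instead of descending) is an assumption, since this memo does not say what should happen when the limit is reached.

```python
from email import message_from_string
from email.message import Message
from fnmatch import fnmatchcase

# Hypothetical built-in recursive types, written as shell-style patterns.
RECURSIVE_TYPES = ["multipart/*", "message/rfc822"]


def is_recursive(content_type: str, patterns=RECURSIVE_TYPES) -> bool:
    return any(fnmatchcase(content_type, p) for p in patterns)


def walk(part: Message, max_depth: int, depth: int = 0, out=None) -> list:
    """Return a list of (depth, content type, action) for every part visited."""
    if out is None:
        out = []
    ctype = part.get_content_type()  # always lower case
    if is_recursive(ctype):
        if depth >= max_depth:
            # Assumption: at the limit, the container is passed verbatim.
            out.append((depth, ctype, "verbatim"))
        else:
            out.append((depth, ctype, "recurse"))
            # For both multipart/* and message/rfc822 the parsed payload
            # is a list of Message objects.
            children = part.get_payload() if part.is_multipart() else []
            for child in children:
                walk(child, max_depth, depth + 1, out)
    else:
        out.append((depth, ctype, "dispatch"))
    return out
```

Calling walk(message_from_string(raw), max_depth=1) on a message whose second part is itself a multipart container would descend into the top level only and leave the nested container untouched.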

MIME DISPATCHER TABLE

The structure of the MIME dispatcher table should allow for a flexible search of user program entries depending on the MIME type of the part being processed. It is also important that it allows for a default entry, i.e. an entry that will be used for processing a part whose type is not explicitly mentioned in the table. The absence of such a default entry should be taken as an indication that the part must be transferred verbatim.

Thus, each entry of the dispatcher table must contain at least the following members.

type: Specifies regular expressions describing the MIME type this entry handles. For the sake of clarity this memo uses shell-style regular expressions (see glob(7) or fnmatch(3)). However, an Anubis implementation can use any other regular expression style it deems appropriate.
entry point: Specifies an entry point to the code section that handles MIME parts of the given type. The entry point is either nil, meaning default processing (thus the default entry can be represented as ("*" . nil) at the end of the table), or one of the predefined entry points used for recursive processing of message parts, or, finally, a code index of a user-defined rule section.

The dispatcher table can contain several entries matching a given MIME type. In this case, the entry point of each of them must be invoked in turn. For example, consider this dispatcher table:

<text/plain> ⇒ plaintext
<text/x-patch> ⇒ patchfile
<text/*> ⇒ anytext

When processing a part of type text/plain using this dispatcher table, first the section named plaintext is called, then its output is gathered and used as input for the section named anytext. Such an approach allows for building flexibly structured user programs.
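The chained lookup described above can be sketched in Python with the standard fnmatch module, which implements the same shell-style patterns this memo uses. The table mirrors the three entries of the example; matching_sections, run_chain and the representation of a section as a callable are hypothetical names chosen for the illustration, and the handling of the nil default entry is left out.

```python
from fnmatch import fnmatchcase

# (type pattern, section name) pairs, in table order.
DISPATCH_TABLE = [
    ("text/plain", "plaintext"),
    ("text/x-patch", "patchfile"),
    ("text/*", "anytext"),
]


def matching_sections(mime_type: str, table=DISPATCH_TABLE) -> list:
    """Return the names of all sections whose pattern matches, in table order."""
    mime_type = mime_type.lower()
    return [section for pattern, section in table
            if fnmatchcase(mime_type, pattern)]


def run_chain(mime_type: str, body: str, sections: dict,
              table=DISPATCH_TABLE) -> str:
    """Invoke every matching section in turn, feeding each one the
    output of the previous one. An empty match list leaves the body as is."""
    for name in matching_sections(mime_type, table):
        body = sections[name](body)
    return body
```

With this table, a text/x-patch part would pass through patchfile and then anytext, while a part of a type that matches no pattern would be returned unchanged.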

CONFIGURATION ENTITIES

This memo proposes the addition of the following configuration entities to the CONTROL section of the Anubis configuration file. These entries may be used in both system-wide and user-specific configuration files, the order of their priority being determined as usual by the rule-priority statement (see section Security Settings).

Option: clear-dispatch-table: This option discards from the dispatcher table all entries gathered so far.

Option: dispatch-mime-type section-id regexp-list

This option adds or modifies entries in the MIME dispatcher table. Section-id specifies the section identifier, i.e. either the name of a user-defined rule section, or one of the keywords none and recurse. In the former case, Anubis must make sure the named section is actually defined in the configuration file and issue an error message otherwise.

Regexp-list is a whitespace-separated list of regular expressions specifying MIME types that are to be handled by section-id.

The effect of this option is that for each regular expression re from the list regexp-list, the dispatcher table is searched for an entry whose type field is exactly the same as re(11). If such an entry is found, its entry point field is replaced with section-id. Otherwise, a new entry is constructed:

(re . section-id)

and appended to the end of the list.
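The update rule can be sketched in Python as follows. The function name and the representation of the table as a list of (type, section-id) pairs are assumptions made for the illustration; the removal and restoration of the default entry is not modelled here.

```python
def dispatch_mime_type(table: list, section_id: str, regexp_list: list) -> list:
    """Apply one dispatch-mime-type statement to `table` in place.

    `table` is a list of (type pattern, section id) pairs. Patterns are
    compared as plain strings, not matched against each other.
    """
    for re_ in regexp_list:
        for i, (pattern, _old) in enumerate(table):
            if pattern == re_:
                # Exact same type field: replace the entry point only.
                table[i] = (re_, section_id)
                break
        else:
            # No entry with this type field: append a new one.
            table.append((re_, section_id))
    return table
```

Note that because the comparison is textual, "text/*" and "text/plain" are distinct entries even though the first pattern also matches the second type.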

For example:

dispatch-mime-type recurse "multipart/*" "message/rfc822"
dispatch-mime-type Text "text/*"
dispatch-mime-type none "*"

This example specifies that messages (or parts) with types matching multipart/* and message/rfc822 must be recursed into, those of type text/* must be processed by the user-defined section Text, and the remaining parts must be transferred verbatim. The section Text must be declared somewhere in the configuration file as

BEGIN Text
…
END

Notice that the very first dispatch-mime-type specifies a built-in entry. This memo does not specify whether such a built-in entry must be present by default, or whether it should be explicitly declared as in the example above. The explicit declaration seems to have the advantage of preserving backward compatibility with versions 4.0 and earlier of Anubis (see COMPATIBILITY CONSIDERATIONS).

Notice also that when encountering the very first dispatch-mime-type (or dispatch-mime-type-prepend, see below) statement in the user configuration file, Anubis must remove the default entry (if any) from the existing dispatcher table. Such an entry should be added back after processing the user's CONTROL section, unless clear-dispatch-table has been used.

Option: dispatch-mime-type-prepend section-id regexp-list: Has the same effect as dispatch-mime-type except that the entries are prepended to the dispatcher table.

Option: recursion-depth number: This option limits the maximum recursion depth when processing multi-part messages to number.

TEXT vs BINARY MIME PARTS

This memo does not determine how exactly Anubis is supposed to distinguish between text and binary messages. The simplest way is by using the Content-Type header: if it contains charset= then it describes a text part. Otherwise it describes a binary part. More sophisticated methods should probably be implemented.
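The simple Content-Type test can be sketched with Python's standard email package. The function name is_text_part is hypothetical; Message.get_param returns None when the header or the parameter is absent.

```python
from email import message_from_string
from email.message import Message


def is_text_part(part: Message) -> bool:
    """Treat a part as text only if its Content-Type carries a charset parameter."""
    return part.get_param("charset") is not None
```

Under this rule a part declared as plain text/plain with no charset parameter counts as binary, which is one reason a more sophisticated method may be desirable.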

To avoid dependency on any particular charset, text parts must be decoded to UTF-8. Correspondingly, any literals used in Anubis configuration files must represent valid UTF-8 strings. However, this memo does not specify whether an Anubis implementation should enforce UTF-8 strings in its configuration files.

It is possible to specify processing rules for binary MIME parts. However, Anubis does not provide any mechanism for binary processing, nor is it supposed to provide any. This memo maintains that the existing external-body-processor and guile-process statements are quite sufficient for processing any binary message parts.
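Recoding a text part to UTF-8 can be sketched in Python as follows. The function names are hypothetical, and converting the result back to the original charset on output is an assumption: this memo only states that text parts are decoded to UTF-8 for processing.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Recode part bytes from the declared charset into UTF-8."""
    return raw.decode(charset).encode("utf-8")


def from_utf8(data: bytes, charset: str) -> bytes:
    """Recode processed UTF-8 bytes back into the part's charset (assumed step)."""
    return data.decode("utf-8").encode(charset)
```

Both functions raise an exception on bytes that are invalid in the source charset or on characters that the target charset cannot represent, so a real implementation would need a policy for such parts.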

SAMPLE CONFIGURATION FILE

BEGIN CONTROL
  dispatch-mime-type recurse "multipart/*" "message/rfc822"
  dispatch-mime-type plaintext "text/plain"
  dispatch-mime-type image "image/*"
END CONTROL

SECTION plaintext
  modify body ["now"] "then"
END

SECTION image
  external-body-processor resize-message
END

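This example configuration shows the idea of using the external-body-processor statement for binary part processing. The following version of the resize-message script uses the convert program to reduce the image size to 120x120 pixels:

#! /bin/sh
# Read a JPEG image from stdin, shrink it to fit 120x120, write it to stdout.
TMP=$HOME/tmp/$$.jpg
OUT=$HOME/tmp/out-$$.jpg
# Save the incoming part body so that convert can read it from a file.
cat - > "$TMP"
convert -size 120x120 "$TMP" -resize 120x120 +profile '*' "$OUT"
rm "$TMP"
cat "$OUT"
rm "$OUT"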

COMPATIBILITY CONSIDERATIONS

In the absence of any dispatch-mime-type statements, Anubis should behave exactly as version 4.0 did. Specifying

clear-dispatch-table

in the user configuration file should produce the same effect. This can be useful if the system-wide configuration file contains some dispatch-mime-type statements.

SECURITY CONSIDERATIONS

This specification is believed to not introduce any special security considerations.

This document was generated on January 6, 2024 using texi2html 5.0.