$Cambridge: exim/doc/doc-txt/README.SIEVE,v 1.4 2005/05/03 10:02:27 ph10 Exp $

Notes on the Sieve implementation for Exim

Exim Filter Versus Sieve Filter

Exim supports two incompatible filters: the traditional Exim filter and
the Sieve filter. Since Sieve is an extensible language, it is important
to understand "Sieve" in this context as "the specific implementation
of Sieve for Exim".

The Exim filter offers more features, such as variable expansion, and
better integration with the host environment, like external processes.

Sieve is a standard for interoperable filters, defined in RFC 3028,
with multiple implementations available. If interoperability is
important, Sieve is the only option.

The Exim Sieve implementation offers the core as defined by RFC 3028, the
"envelope" (RFC 3028), "fileinto" (RFC 3028), "copy" (RFC 3894) and
"vacation" (draft-ietf-sieve-vacation-01.txt) extensions, and the
"i;ascii-numeric" comparator, but not the "reject" extension. Exim does
not support MDNs, so adding them just for the Sieve filter makes little
sense.

The Sieve filter is integrated in Exim and works very similarly to the
Exim filter: Sieve scripts are recognized by the first line containing
"# sieve filter". When "keep" or "fileinto" saves a mail into a
folder, the resulting string is available as the variable $address_file
in the transport that stores it. A suitable transport could be:

  file = ${if eq{$address_file}{inbox} \
              {/var/mail/$local_part} \
              {${if eq{${substr_0_1:$address_file}}{/} \
                    {$address_file} \
                    {$home/$address_file} \
              }} \
         }

Absolute file names are stored where specified, relative file names are
stored relative to $home, and "inbox" goes to the standard mailbox location.

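The selection rule described above can be mirrored outside Exim, for
example to check where a folder name would end up. The following Python
sketch is only an illustration and not part of Exim; the function name
resolve_folder is hypothetical, and the /var/mail prefix is assumed from
the example transport.

```python
def resolve_folder(address_file: str, home: str, local_part: str) -> str:
    """Map a Sieve folder name to a path, like the example transport."""
    if address_file == "inbox":
        # "inbox" goes to the standard mailbox location (assumed /var/mail)
        return "/var/mail/" + local_part
    if address_file.startswith("/"):
        # absolute names are stored where specified
        return address_file
    # relative names are stored relative to the home directory
    return home + "/" + address_file


print(resolve_folder("inbox", "/home/jane", "jane"))
print(resolve_folder("lists/exim", "/home/jane", "jane"))
print(resolve_folder("/srv/mail/archive", "/home/jane", "jane"))
```
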
To enable "vacation", set sieve_vacation_directory for the router to
the directory where vacation databases are held (don't put anything
else in that directory) and point reply_transport to an autoreply
transport.

Exim requires the first line to be "# sieve filter". Of course the RFC
does not enforce that line. Don't expect examples to work without adding
it.

RFC 3028 requires using CRLF to terminate a line.
The rationale was that CRLF is universally used in network protocols
to mark the end of the line. This implementation does not embed Sieve
in a network protocol, but uses Sieve scripts as part of the Exim MTA.
Since all parts of Exim use \n as newline character, this implementation
does, too. You can change this by defining the macro RFC_EOL at compile
time to enforce the use of CRLF.

Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so
this implementation repeats this violation to stay consistent with Exim.
This is in preparation for UTF-8 data.

Sieve scripts cannot contain NUL characters in strings, but mail
headers could contain MIME encoded NUL characters, which could never
be matched by Sieve scripts using exact comparisons. For that reason,
this implementation extends the Sieve quoted string syntax with \0
to describe a NUL character, which violates RFC 3028, where \0 is the
same as 0. Even without using \0, the following tests are all true in
this implementation. Implementations that use C-style strings will only
evaluate the first test as true.

  Subject: =?iso-8859-1?q?abc=00def?=

  header :contains "Subject" ["abc"]
  header :contains "Subject" ["def"]
  header :matches "Subject" ["abc?def"]

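The difference between the two kinds of implementation can be reproduced
with a short Python sketch. It is not Exim code: Python's decode_header
and fnmatchcase merely stand in for MIME decoding and the ":matches"
test, and cutting the string at the first NUL is an assumed model of a
C-style string.

```python
from email.header import decode_header
from fnmatch import fnmatchcase

# Decode the MIME word; the result holds "abc", a NUL, then "def".
raw, charset = decode_header("=?iso-8859-1?q?abc=00def?=")[0]
subject = raw.decode(charset)

# Length-counted strings: the NUL is ordinary data.
full = (
    "abc" in subject,
    "def" in subject,
    fnmatchcase(subject, "abc?def"),  # "?" matches the NUL
)

# C-style strings: everything after the first NUL is lost.
truncated = subject.split("\0", 1)[0]
c_style = (
    "abc" in truncated,
    "def" in truncated,
    fnmatchcase(truncated, "abc?def"),
)

print(full)
print(c_style)
```
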
Note that by considering Sieve to be a MUA, RFC 2047 can be interpreted
in a way that allows Sieve implementations to truncate strings at NUL
characters, although this is not recommended. It is further allowed to
use encoded NUL characters in headers, but that's not recommended either.
The above example shows why. Good code should still be able to deal
with them.

RFC 3028 states that if an implementation fails to convert a character
set to UTF-8, two strings cannot be equal if one contains octets greater
than 127. Assuming that all unknown character sets are one-byte character
sets with the lower 128 octets being US-ASCII is not sound, so this
implementation violates RFC 3028 and treats such MIME words literally.
That way at least something could be matched.

The folder specified by "fileinto" must not contain the character
sequence ".." to avoid security problems. RFC 3028 does not specify the
syntax of folders apart from keep being equivalent to fileinto "INBOX".
This implementation uses "inbox" instead.

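As an illustration of the folder restriction, the check amounts to a
plain substring test. The Python sketch below is not Exim's code and the
function name is hypothetical; note that a name such as "a..b" is
refused as well, because the rule speaks of the character sequence and
not of path components.

```python
def folder_is_acceptable(folder: str) -> bool:
    """Refuse any folder name that contains the sequence ".."."""
    return ".." not in folder


for name in ("lists/exim", "../../etc/passwd", "a..b"):
    print(name, folder_is_acceptable(name))
```
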
Sieve script errors currently cause messages to be silently filed into
"inbox". RFC 3028 requires that the user is notified of that condition.
This may be implemented in future by adding a header line to mails that
are filed into "inbox" due to an error in the filter.


Strings Containing Header Names Or Envelope Elements

RFC 3028 does not specify what happens if a string denoting a header
field or envelope element does not contain a valid name, e.g. it
contains a colon for a header or it is not "from" or "to" for envelopes.
This implementation generates an error instead of ignoring the header
field in order to ease script debugging, which fits in the common picture

Header Test With Invalid MIME Encoding In Header

Some MUAs process invalid base64 encoded data, generating junk.
Others ignore junk after seeing an equal sign in base64 encoded data.
RFC 2047 does not specify how to react in this case, other than stating
that a client must not refuse to process a message for that reason.
RFC 2045 specifies that invalid data should be ignored (apparently
looking at end of line characters). It also specifies that invalid data
may lead to rejecting messages containing them (and there it appears to
talk about true encoding violations), which is a clear contradiction to

RFC 3028 does not specify how to process incorrect MIME words.
This implementation treats them literally, as it does if the word is
correct, but its character set cannot be converted to UTF-8.


Address Test For Multiple Addresses Per Header

A header may contain multiple addresses. RFC 3028 does not explicitly
specify how to deal with them, but since the "address" test checks if
anything matches anything else, matching one address suffices to
satisfy the condition. That makes it impossible to test if a header
contains a certain set of addresses and no more, but it is more logical
than letting the test fail if the header contains an additional address
besides the one the test checks for.

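The "one match suffices" rule can be sketched in Python as follows.
This is an illustration only, not the Exim implementation: the function
name is hypothetical, Python's getaddresses stands in for the address
parser, and lower-casing both sides is an assumed approximation of the
default comparator.

```python
from email.utils import getaddresses


def address_is(header_value: str, key: str) -> bool:
    """True if any address in the header equals key, ignoring case."""
    addresses = [addr for _name, addr in getaddresses([header_value])]
    return any(addr.lower() == key.lower() for addr in addresses)


to_header = "alice@example.org, Bob <bob@example.net>"
print(address_is(to_header, "BOB@example.net"))
print(address_is(to_header, "carol@example.com"))
```
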

The keep command is equivalent to fileinto "inbox": It saves the
message and resets the implicit keep flag. It does not set the
implicit keep flag; there is no command to set it once it has
been reset.


Semantics of Fileinto

RFC 3028 does not specify if "fileinto" tries to create a mail folder,
in case it does not exist. This implementation allows that aspect to be
configured using the appendfile transport options "create_directory",
"create_file" and "file_must_exist". See the appendfile transport in
the Exim specification for details.


Semantics of Redirect

Sieve scripts are supposed to be interoperable between servers, so this
implementation does not allow redirecting mail to unqualified addresses,
because the domain would depend on the system in use, and on systems with
virtual mail domains it is probably not what the user expects it to be.


There has been confusion about whether the string arguments to "require"
are to be matched case-sensitively or not. This implementation matches
them with the match type ":is" (default, see section 2.7.1) and the
comparator "i;ascii-casemap" (default, see section 2.7.3). The RFC defines
the command defaults clearly, so any different implementations violate RFC
3028. The same holds for comparator names, which are also specified as
strings.


There is a mistake in RFC 3028: The suffix G denotes gibi-, not tebibyte.
The mistake is obvious, because RFC 3028 specifies G to denote 2^30
(which is gibi, not tebi), and that's what this implementation uses as
scaling factor for the suffix G.

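A minimal Python sketch of the scaling, for illustration only: the
function name and error handling are hypothetical, and the factors for
K and M (2^10 and 2^20) are taken from RFC 3028 rather than from the
text above.

```python
SCALE = {"K": 2**10, "M": 2**20, "G": 2**30}  # G is gibi, not tebi


def parse_number(text: str) -> int:
    """Convert a Sieve number such as "100K" or "1G" to an integer."""
    suffix = text[-1:].upper()
    digits = text[:-1] if suffix in SCALE else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("not a Sieve number: %r" % text)
    return int(digits) * SCALE.get(suffix, 1)


print(parse_number("1G"))
print(parse_number("100K"))
```
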

Sieve Syntax and Semantics

RFC 3028 confuses syntax and semantics sometimes. It uses a generic
grammar as syntax for actions and tests and performs many checks during
semantic analysis. Syntax is specified by grammar rules and semantics
in natural language, despite the latter often talking about syntax.
The intention was to provide a framework for the syntax that describes
current commands as well as future extensions, and to describe commands
by semantics. Since the semantic analysis is not specified by formal
rules, it is easy to get that phase wrong, as demonstrated by the mistake
in RFC 3028 to forbid "elsif" being followed by "elsif" (which is allowed
in Sieve, it's just not specified correctly).

RFC 3028 does not define if semantic checks are strict (always treat
unknown extensions as errors) or lazy (treat unknown extensions as errors
only if they are executed), and since it employs a very generic grammar,
it is not unreasonable for an implementation using a parser for the
generic grammar to indeed process scripts that contain unknown commands
in dead code. It is just required to treat disabled but known extensions
the same as unknown extensions.

The following suggestion for section 8.2 gives two grammars, one for
the framework, and one for specific commands, thus removing most of the
semantic analysis. Since the parser cannot parse unsupported extensions,
the result is strict error checking. As required in section 2.10.5, known
but not enabled extensions must behave the same as unknown extensions,
so those also result strictly in errors (though at the thin semantic
layer), even if they can be parsed fine.

The atoms of the grammar are lexical tokens. White space or comments may
appear anywhere between lexical tokens; they are not part of the grammar.
The grammar is specified in ABNF with two extensions to describe tagged
arguments that can be reordered and grammar extensions: { } denotes a
sequence of symbols that may appear in any order. Example:

  start = ( a b c ) / ( a c b ) / ( b a c ) / ( b c a ) / ( c a b ) / ( c b a )

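The expansion of { } into plain ABNF alternatives is mechanical. As an
illustration, the following Python sketch (not part of Exim; the function
name is hypothetical) generates every ordering of a list of symbols and
reproduces the rule above.

```python
from itertools import permutations


def expand_any_order(symbols):
    """Write { symbols } as an ABNF alternation of all orderings."""
    groups = ("( " + " ".join(order) + " )" for order in permutations(symbols))
    return " / ".join(groups)


print("start =", expand_any_order(["a", "b", "c"]))
```
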
The symbol =) is used to append to a rule:

All Sieve commands, including extensions, MUST be words of the following
generic grammar with the start symbol "start". They SHOULD be specified
using a specific grammar, though.

  argument     = string-list / number / tag
  arguments    = *argument [test / test-list]
  block        = "{" commands "}"
  string       = quoted-string / multi-line
  string-list  = "[" string *("," string) "]" / string
  test         = identifier arguments
  test-list    = "(" test *("," test) ")"
  command      = identifier arguments ( ";" / block )

The basic Sieve commands are specified using the following grammar, whose
language is a subset of the generic grammar above. The start symbol is
"start".

  address-part     =  ":localpart" / ":domain" / ":all"
  comparator       =  ":comparator" string
  match-type       =  ":is" / ":contains" / ":matches"
  string           =  quoted-string / multi-line
  string-list      =  "[" string *("," string) "]" / string
  address-test     =  "address" { [address-part] [comparator] [match-type] }
                      string-list string-list
  test-list        =  "(" test *("," test) ")"
  allof-test       =  "allof" test-list
  anyof-test       =  "anyof" test-list
  exists-test      =  "exists" string-list
  header-test      =  "header" { [comparator] [match-type] }
                      string-list string-list
  not-test         =  "not" test
  relop            =  ":over" / ":under"
  size-test        =  "size" relop number
  block            =  "{" commands "}"
  if-command       =  "if" test block *( "elsif" test block ) [ "else" block ]
  stop-command     =  "stop" { stop-options } ";"
  keep-command     =  "keep" { keep-options } ";"
  discard-command  =  "discard" { discard-options } ";"
  redirect-command =  "redirect" { redirect-options } string ";"
  require-command  =  "require" { require-options } string-list ";"
  test             =  address-test / allof-test / anyof-test / exists-test
                      / false-test / true-test / header-test / not-test
  command          =  if-command / stop-command / keep-command
                      / discard-command / redirect-command
  start            =  *require-command commands

The extensions "envelope" and "fileinto" are specified using the following
grammar:

  envelope-test    =  "envelope" { [comparator] [address-part] [match-type] }
                      string-list string-list
  test             =/ envelope-test
  fileinto-command =  "fileinto" { fileinto-options } string ";"
  command          =/ fileinto-command

The extension "copy" is specified as:

  fileinto-options =) ":copy"
  redirect-options =) ":copy"


The i;ascii-numeric Comparator

RFC 2244 describes this comparator and specifies that non-numeric strings
are considered equal with an ordinal value higher than any numeric string.
Although not stated explicitly, this includes the empty string. A range
of at least 2^31 is required. This implementation does not limit the
range, because it does not convert numbers to binary representation
before comparing them.

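Comparing numbers without converting them to binary can be done on the
digit strings themselves. The following Python sketch shows one way to
do it and is not taken from the Exim source; the function names are
hypothetical, and treating a string as numeric when it starts with at
least one ASCII digit is an assumption about how RFC 2244 is read.

```python
DIGITS = "0123456789"


def leading_digits(text: str) -> str:
    """Return the run of ASCII digits at the start of text."""
    end = 0
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return text[:end]


def ascii_numeric_cmp(a: str, b: str) -> int:
    """Return -1, 0 or 1 without building a binary integer."""
    da, db = leading_digits(a), leading_digits(b)
    if not da or not db:
        # Non-numeric strings (including "") are equal to each other
        # and rank above every numeric string.
        return (not da) - (not db)
    da, db = da.lstrip("0"), db.lstrip("0")
    if len(da) != len(db):
        # With leading zeros removed, more digits means a larger number.
        return -1 if len(da) < len(db) else 1
    # Same length: character order equals numeric order.
    return (da > db) - (da < db)


print(ascii_numeric_cmp("10", "9"))
print(ascii_numeric_cmp("007", "7"))
print(ascii_numeric_cmp("", "99999999999999999999999"))
```

Because only lengths and characters are compared, the range of numbers
is limited by nothing but the string length.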

The vacation extension

The extension "vacation" is specified using the following grammar:

  vacation-command =  "vacation" { vacation-options } <reason: string>
  vacation-options =  [":days" number]
                      [":addresses" string-list]
  command          =/ vacation-command

The draft does not specify how strings using MIME entities are used
to compose messages. As a result, different implementations generate
different mails. The Exim Sieve implementation splits the reason into
header and body. It adds the header to the mail header and uses the body
as mail body. Be aware that other implementations compose a multipart
structure with the reason as only part. Both conform to the specification.


Semantics Of Not Using ":mime"

Sieve scripts are written in UTF-8, and so is the reason string in this
case. This implementation adds MIME headers to indicate that. This
is not required by the vacation draft, which does not specify how
the UTF-8 reason is processed to compose the resulting message.


The draft specifies that the default message subject is "Re: "
plus the old subject, stripped of any leading "Re: " strings.
This string is to be taken literally, unlike some software which
matches a regular expression like "[rR][eE]: *". Using this
subject is dangerous, because many mailing lists verify addresses
by sending a secret key in the subject of a message, asking the
recipient to reply to the message for confirmation. Using the default
vacation subject confirms any subscription request of this kind, which
allows an attacker to subscribe a third party to any mailing list,
either to annoy the user or to declare spam as legitimate mail by
claiming proof of opt-in. The draft specifies "Re: " in front of the
subject, but this implementation uses "Auto: ", as suggested in
RFC 3834, section 3.1.5.


Rate Limiting Responses

In absence of a handle, this implementation hashes the reason,
":subject" option, ":mime" option and ":from" option and uses the hex
string representation as filename within the "sieve_vacation_directory"
to store the recipient addresses for this vacation parameter set.

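The idea of deriving the database name from the parameter set can be
sketched in Python. This is an illustration only: the function name is
hypothetical, and both the hash function (SHA-256 here) and the way the
four values are fed into it are assumptions, since the text above names
neither.

```python
import hashlib


def vacation_db_name(reason, subject=None, mime=False, from_addr=None):
    """Return a hex file name identifying one vacation parameter set."""
    digest = hashlib.sha256()
    parts = (reason, subject or "", "1" if mime else "0", from_addr or "")
    for part in parts:
        data = part.encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(str(len(data)).encode("ascii") + b":" + data)
    return digest.hexdigest()


print(vacation_db_name("I am away until Monday."))
```

Identical parameters always yield the same name, so repeated runs of an
unchanged script share one database, while editing the reason starts a
new one.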
The draft specifies that sites may define a minimum ":days" value greater
than 1. This implementation uses 1. The maximum value MUST be greater
than 7, and SHOULD be greater than 30. This implementation uses a
maximum of 31.

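Expressed in Python, applying the two limits could look like the sketch
below. The constant and function names are hypothetical, and it is an
assumption that out-of-range values are brought into range rather than
treated as script errors; the text above does not say which happens.

```python
MIN_DAYS = 1   # minimum stated above
MAX_DAYS = 31  # maximum stated above


def effective_days(requested: int) -> int:
    """Bring a requested ":days" value into the supported range."""
    return max(MIN_DAYS, min(MAX_DAYS, requested))


for value in (0, 7, 365):
    print(value, effective_days(value))
```
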
Vacation recipient address databases older than 31 days are automatically
removed. Users do not have to remove them manually when modifying their
scripts. Don't put anything but vacation databases in that directory
or you risk it being removed, too!


Global Reply Address Blacklist

The draft requires that each implementation offers a global black list
of addresses that will never be replied to. Exim offers this as option
"never_mail" in the autoreply transport.


Interaction With Other Sieve Elements

The draft describes the interaction with vacation, discard, keep,
fileinto and redirect. It MUST describe compatibility with other
actions, but doesn't. In this implementation, vacation is compatible
with any other action.