'\" t
.\"<!-- Copyright 2001-2007 Double Precision, Inc. See COPYING for -->
.\"<!-- distribution information. -->
.\" Title: rfc822
.\" Author: Sam Varshavchik
.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
.\" Date: 08/25/2013
.\" Manual: Double Precision, Inc.
.\" Source: Courier Mail Server
.\" Language: English
.\"
.TH "RFC822" "3" "08/25/2013" "Courier Mail Server" "Double Precision, Inc\&."
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
rfc822 \- RFC 822 parsing library
.SH "SYNOPSIS"
.sp
.nf
#include <rfc822\&.h>

#include <rfc2047\&.h>

cc \&.\&.\&. \-lrfc822
.fi
.SH "DESCRIPTION"
.PP
The rfc822 library provides functions for parsing E\-mail headers in the RFC 822 format\&. This library also includes some functions to help with encoding and decoding 8\-bit text, as defined by RFC 2047\&.
.PP
The format used by E\-mail headers to encode sender and recipient information is defined by
\m[blue]\fBRFC 822\fR\m[]\&\s-2\u[1]\d\s+2
(and its successor,
\m[blue]\fBRFC 2822\fR\m[]\&\s-2\u[2]\d\s+2)\&. The format allows the actual E\-mail address and the sender/recipient name to be expressed together, for example:
John Smith <jsmith@example\&.com>
.PP
The main purposes of the rfc822 library are to:
.PP
1) Parse a text string containing a list of RFC 822\-formatted addresses into its logical components: names and E\-mail addresses\&.
.PP
2) Access those individual components\&.
.PP
3) Allow some limited modifications of the parsed structure, and then convert it back into a text string\&.
.SS "Tokenizing an E\-mail header"
.sp
.if n \{\
.RS 4
.\}
.nf
struct rfc822t *tokens=rfc822t_alloc_new(const char *header,
void (*err_func)(const char *, int, void *),
void *func_arg);

void rfc822t_free(tokens);
.fi
.if n \{\
.RE
.\}
.PP
The
\fBrfc822t_alloc_new\fR() function (which supersedes
\fBrfc822t_alloc\fR(), now obsolete) accepts an E\-mail
\fIheader\fR, and parses it into individual tokens\&. This function allocates and returns a pointer to an
rfc822t
structure, which is later used by
\fBrfc822a_alloc\fR() to extract individual addresses from these tokens\&.
.PP
The
\fIerr_func\fR
argument, if not NULL, is a pointer to a callback function\&. The function is called in the event that the E\-mail header is corrupted to the point that it cannot even be parsed\&. This is a rare instance \-\- most forms of corruption are still valid at least on the lexical level\&. The only time this error is reported is in the event of mismatched parentheses, angle brackets, or quotes\&. The callback function receives the
\fIheader\fR
pointer, an index to the syntax error in the header string, and the
\fIfunc_arg\fR
argument\&.
.PP
The semantics of
\fIerr_func\fR
are subject to change\&. It is recommended to leave this argument as NULL in the current version of the library\&.
.PP
\fBrfc822t_alloc_new\fR() returns a pointer to a dynamically\-allocated
rfc822t
structure\&. A NULL pointer is returned if there\*(Aqs insufficient memory to allocate this structure\&. The
\fBrfc822t_free\fR() function destroys the
rfc822t
structure and frees all dynamically allocated memory\&.
.if n \{\
.sp
.\}
.RS 4
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBNote\fR
.ps -1
.br
.PP
Until
\fBrfc822t_free\fR() is called, the contents of
\fIheader\fR
MUST NOT be destroyed or altered in any way\&. The contents of
\fIheader\fR
are not modified by
\fBrfc822t_alloc_new\fR(); however, the
rfc822t
structure contains pointers to portions of the supplied
\fIheader\fR, and they must remain valid\&.
.sp .5v
.RE
.SS "Extracting E\-mail addresses"
.sp
.if n \{\
.RS 4
.\}
.nf
struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens);

void rfc822a_free(addrs);
.fi
.if n \{\
.RE
.\}
.PP
The
\fBrfc822a_alloc\fR() function returns a dynamically\-allocated
rfc822a
structure that contains the individual addresses that were logically parsed from an
rfc822t
structure\&. The
\fBrfc822a_alloc\fR() function returns NULL if there was insufficient memory to allocate the
rfc822a
structure\&. The
\fBrfc822a_free\fR() function destroys the
rfc822a
structure, and frees all associated dynamically\-allocated memory\&. The
rfc822t
structure passed to
\fBrfc822a_alloc\fR() must not be destroyed before
\fBrfc822a_free\fR() destroys the
rfc822a
structure\&.
.PP
The
rfc822a
structure has the following fields:
.sp
.if n \{\
.RS 4
.\}
.nf
struct rfc822a {
        struct rfc822addr *addrs;
        int naddrs;
} ;
.fi
.if n \{\
.RE
.\}
.PP
The
\fInaddrs\fR
field gives the number of
rfc822addr
structures that are pointed to by
\fIaddrs\fR, which is an array\&. Each
rfc822addr
structure represents either an address found in the original E\-mail header,
\fIor the contents of some legacy "syntactical sugar"\fR\&. For example, the following is a valid E\-mail header:
.sp
.if n \{\
.RS 4
.\}
.nf
To: recipient\-list: tom@example\&.com, john@example\&.com;
.fi
.if n \{\
.RE
.\}
.PP
Typically, all of this, except for "To:", is tokenized by
\fBrfc822t_alloc_new\fR(), then parsed by
\fBrfc822a_alloc\fR()\&. "recipient\-list:" and the trailing semicolon are a legacy mailing list specification that is no longer in widespread use, but must still be accounted for\&. The resulting
rfc822a
structure will have four
rfc822addr
structures: one for "recipient\-list:"; one for each address; and one for the trailing semicolon\&. Each
rfc822addr
structure has the following fields:
.sp
.if n \{\
.RS 4
.\}
.nf
struct rfc822addr {
        struct rfc822token *tokens;
        struct rfc822token *name;
} ;
.fi
.if n \{\
.RE
.\}
.PP
If
\fItokens\fR
is a null pointer, this structure represents some non\-address portion of the original header, such as "recipient\-list:" or a semicolon\&. Otherwise it points to a structure that represents the E\-mail address in tokenized form\&.
.PP
\fIname\fR
either points to the tokenized form of a non\-address portion of the original header, or to a tokenized form of the recipient\*(Aqs name\&.
\fIname\fR
will be NULL if the recipient name was not provided\&. For the following address:
Tom Jones <tjones@example\&.com>
\- the
\fItokens\fR
field points to the tokenized form of "tjones@example\&.com", and
\fIname\fR
points to the tokenized form of "Tom Jones"\&.
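.PP
As a minimal sketch of how the array described above might be walked, the following fragment counts only the entries that carry an actual address and skips group names and semicolons\&. The structure declarations are copied from this page so that the fragment stands alone (a real program would include rfc822\&.h instead), and count_real_addresses() is a hypothetical helper, not part of the library\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stddef.h>

/* Declarations copied from this page; normally supplied by <rfc822.h>. */
struct rfc822token {
	struct rfc822token *next;
	int token;
	const char *ptr;
	int len;
};

struct rfc822addr {
	struct rfc822token *tokens;
	struct rfc822token *name;
};

struct rfc822a {
	struct rfc822addr *addrs;
	int naddrs;
};

/* Hypothetical helper: count the entries that hold an E-mail address.
   Entries whose tokens field is NULL are non-address portions of the
   header, such as a group name or the closing semicolon. */
int count_real_addresses(const struct rfc822a *a)
{
	int i, n = 0;

	for (i = 0; i < a->naddrs; i++)
		if (a->addrs[i].tokens != NULL)
			n++;
	return n;
}
```
.fi
.if n \{\
.RE
.\}
.PP
For the "recipient\-list:" header shown earlier, which yields four entries, this helper would be expected to return 2\&.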
.PP
Each
rfc822token
structure contains the following fields:
.sp
.if n \{\
.RS 4
.\}
.nf
struct rfc822token {
        struct rfc822token *next;
        int token;
        const char *ptr;
        int len;
} ;
.fi
.if n \{\
.RE
.\}
.PP
The
\fInext\fR
pointer builds a linked list of all tokens in this name or address\&. The possible values for the
\fItoken\fR
field are:
.PP
0x00
.RS 4
This is a simple atom \- a sequence of non\-special characters that is delimited by whitespace or special characters (see below)\&.
.RE
.PP
0x22
.RS 4
The value of the ASCII double quote character \- this is a quoted string\&.
.RE
.PP
Open parenthesis: \*(Aq(\*(Aq
.RS 4
This is an old style comment\&. A deprecated form of E\-mail addressing uses \- for example \- "john@example\&.com (John Smith)" instead of "John Smith <john@example\&.com>"\&. This old\-style notation defined parenthesized content as arbitrary comments\&. The
rfc822token
with
\fItoken\fR
set to \*(Aq(\*(Aq is created for the contents of the entire comment\&.
.RE
.PP
Symbols: \*(Aq<\*(Aq, \*(Aq>\*(Aq, \*(Aq@\*(Aq, and many others
.RS 4
The remaining possible values of
\fItoken\fR
include all the characters in RFC 822 headers that have special significance\&.
.RE
.PP
When a
rfc822token
structure does not represent a special character, the
\fIptr\fR
field points to a text string giving its contents\&. The contents are NOT null\-terminated; the
\fIlen\fR
field contains the number of characters included\&. The macro rfc822_is_atom(token) indicates whether
\fIptr\fR
and
\fIlen\fR
are used for the given
\fItoken\fR\&. Currently
\fBrfc822_is_atom\fR() returns true if
\fItoken\fR
is a zero byte, \*(Aq"\*(Aq, or \*(Aq(\*(Aq\&.
.PP
Note that it\*(Aqs possible that
\fIlen\fR
might be zero\&. This happens with null addresses used as return addresses for delivery status notifications\&.
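.PP
Because the text of a token is not null\-terminated, it has to be copied before it can be handed to ordinary string functions\&. The following is a minimal sketch of such a copy\&. The structure declaration is repeated from this page so that the fragment stands alone; token_has_text() is a local stand\-in that mirrors the behaviour described for rfc822_is_atom(), and token_text_dup() is a hypothetical helper, not a library function\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stdlib.h>
#include <string.h>

/* Declaration copied from this page; normally supplied by <rfc822.h>. */
struct rfc822token {
	struct rfc822token *next;
	int token;
	const char *ptr;
	int len;
};

/* Local stand-in for rfc822_is_atom(), following the description in the
   text: ptr and len are used for atoms, quoted strings and comments. */
int token_has_text(int token)
{
	return token == 0 || token == '"' || token == '(';
}

/* Hypothetical helper: return a malloc()ed, null-terminated copy of the
   token text. Returns NULL if the token is a special character or if
   memory cannot be allocated. The caller frees the result. */
char *token_text_dup(const struct rfc822token *t)
{
	char *copy;

	if (!token_has_text(t->token) || t->len < 0)
		return NULL;
	copy = malloc((size_t)t->len + 1);
	if (copy == NULL)
		return NULL;
	if (t->len > 0)
		memcpy(copy, t->ptr, (size_t)t->len);
	copy[t->len] = 0;	/* add the terminator the token lacks */
	return copy;
}
```
.fi
.if n \{\
.RE
.\}
.PP
A token with a zero
\fIlen\fR
yields an empty string rather than NULL, which keeps the null\-address case distinguishable from a special character\&.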
.SS "Working with E\-mail addresses"
.sp
.if n \{\
.RS 4
.\}
.nf
void rfc822_deladdr(struct rfc822a *addrs, int index);

void rfc822tok_print(const struct rfc822token *list,
        void (*func)(char, void *), void *func_arg);

void rfc822_print(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *callback_arg);

void rfc822_addrlist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_namelist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_praddr(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_prname(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_prname_orlist(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

char *rfc822_gettok(const struct rfc822token *list);
char *rfc822_getaddrs(const struct rfc822a *addrs);
char *rfc822_getaddr(const struct rfc822a *addrs, int index);
char *rfc822_getname(const struct rfc822a *addrs, int index);
char *rfc822_getname_orlist(const struct rfc822a *addrs, int index);

char *rfc822_getaddrs_wrap(const struct rfc822a *, int);
.fi
.if n \{\
.RE
.\}
.PP
These functions are used to work with individual addresses that are parsed by
\fBrfc822a_alloc\fR()\&.
.PP
\fBrfc822_deladdr\fR() removes a single
rfc822addr
structure, whose
\fIindex\fR
is given, from the address array in the
rfc822a
structure\&.
\fInaddrs\fR
is decremented by one\&.
.PP
\fBrfc822tok_print\fR() converts a tokenized
\fIlist\fR
of
rfc822token
objects into a text string\&. The callback function,
\fIfunc\fR, is called one character at a time, for every character in the tokenized objects\&. An arbitrary pointer,
\fIfunc_arg\fR, is passed unchanged as the additional argument to the callback function\&.
\fBrfc822tok_print\fR() is not usually the most convenient and efficient function, but it has its uses\&.
.PP
\fBrfc822_print\fR() takes an entire
rfc822a
structure, and uses the callback functions to print the contained addresses, in their original form, separated by commas\&. The function pointed to by
\fIprint_func\fR
is used to print each individual address, one character at a time\&. Between the addresses, the
\fIprint_separator\fR
function is called to print the address separator, usually the string ", "\&. The
\fIcallback_arg\fR
argument is passed along unchanged, as an additional argument to these functions\&.
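.PP
The two callback shapes described above can be served by one small growable buffer\&. The following is a sketch only: struct charbuf, charbuf_putc() and charbuf_puts() are hypothetical names introduced for illustration, and the growth policy is an arbitrary choice\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stdlib.h>

/* Hypothetical growable buffer, passed as the callback argument. */
struct charbuf {
	char *data;	/* null-terminated once anything is stored */
	size_t len;	/* characters stored, excluding the terminator */
	size_t cap;	/* allocated size of data */
	int failed;	/* set if an allocation failed */
};

/* Has the void (*)(char, void *) shape used for print_func and func. */
void charbuf_putc(char c, void *arg)
{
	struct charbuf *b = arg;

	if (b->failed)
		return;
	if (b->len + 1 >= b->cap) {
		size_t ncap = b->cap ? b->cap * 2 : 64;
		char *p = realloc(b->data, ncap);

		if (p == NULL) {
			b->failed = 1;	/* the callback cannot report errors */
			return;
		}
		b->data = p;
		b->cap = ncap;
	}
	b->data[b->len++] = c;
	b->data[b->len] = 0;
}

/* Has the void (*)(const char *, void *) shape used for print_separator. */
void charbuf_puts(const char *s, void *arg)
{
	while (*s)
		charbuf_putc(*s++, arg);
}
```
.fi
.if n \{\
.RE
.\}
.PP
With a zero\-initialized struct charbuf named buf, a call along the lines of rfc822_print(addrs, charbuf_putc, charbuf_puts, &buf) would collect the whole list; since the callbacks return void, the failed flag is how this sketch records an allocation error\&.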
.PP
The functions
\fBrfc822_addrlist\fR() and
\fBrfc822_namelist\fR() also print the contents of the entire
rfc822a
structure, but in a different way\&.
\fBrfc822_addrlist\fR() prints just the actual E\-mail addresses, not the recipient names or comments\&. Each E\-mail address is followed by a newline character\&.
\fBrfc822_namelist\fR() prints just the names or comments, followed by newlines\&.
.PP
The functions
\fBrfc822_praddr\fR() and
\fBrfc822_prname\fR() are just like
\fBrfc822_addrlist\fR() and
\fBrfc822_namelist\fR(), except that they print a single name or address in the
rfc822a
structure, given its
\fIindex\fR\&. The functions
\fBrfc822_gettok\fR(),
\fBrfc822_getaddrs\fR(),
\fBrfc822_getaddr\fR(), and
\fBrfc822_getname\fR() are equivalent to
\fBrfc822tok_print\fR(),
\fBrfc822_print\fR(),
\fBrfc822_praddr\fR() and
\fBrfc822_prname\fR(), but, instead of using a callback function pointer, these functions write the output into a dynamically allocated buffer\&. That buffer must be destroyed by
\fBfree\fR(3) after use\&. These functions will return a null pointer in the event of a failure to allocate memory for the buffer\&.
.PP
\fBrfc822_prname_orlist\fR() is similar to
\fBrfc822_prname\fR(), except that it will also print the legacy RFC 822 group list syntax (which is also parsed by
\fBrfc822a_alloc\fR())\&.
\fBrfc822_praddr\fR() will print an empty string for an index that corresponds to a group list name (or the terminating semicolon)\&.
\fBrfc822_prname\fR() will also print an empty string\&.
\fBrfc822_prname_orlist\fR() will instead print either the name of the group list, or a single string ";"\&.
\fBrfc822_getname_orlist\fR() will instead save it into a dynamically allocated buffer\&.
.PP
The function
\fBrfc822_getaddrs_wrap\fR() is similar to
\fBrfc822_getaddrs\fR(), except that the generated text is wrapped on or about the 73rd column, using newline characters\&.
.SS "Working with dates"
.sp
.if n \{\
.RS 4
.\}
.nf
time_t timestamp=rfc822_parsedt(const char *datestr);
const char *datestr=rfc822_mkdate(time_t timestamp);
void rfc822_mkdate_buf(time_t timestamp, char *buffer);
.fi
.if n \{\
.RE
.\}
.PP
These functions convert between timestamps and dates expressed in the
Date:
E\-mail header format\&.
.PP
\fBrfc822_parsedt\fR() returns the timestamp corresponding to the given date string (0 if there was a syntax error)\&.
.PP
\fBrfc822_mkdate\fR() returns a date string corresponding to the given timestamp\&.
\fBrfc822_mkdate_buf\fR() writes the date string into the given buffer instead, which must be big enough to accommodate it\&.
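.PP
For readers unfamiliar with the
Date:
header format, the following sketch produces a string of that general shape using only the standard C library\&. It is not how the rfc822 library works internally and makes no claim about the exact output of
\fBrfc822_mkdate\fR(); format_date_utc() is a hypothetical helper that always reports UTC and assumes the default "C" locale for day and month names\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of the rfc822 library: format a
   timestamp as a Date:-style string in UTC. Returns 0 on success, or
   -1 if the conversion fails or the buffer is too small. */
int format_date_utc(time_t timestamp, char *buffer, size_t size)
{
	struct tm *tm = gmtime(&timestamp);

	if (tm == NULL)
		return -1;
	/* strftime() returns 0 when the result does not fit. */
	if (strftime(buffer, size, "%a, %d %b %Y %H:%M:%S +0000", tm) == 0)
		return -1;
	return 0;
}
```
.fi
.if n \{\
.RE
.\}
.PP
Unlike
\fBrfc822_mkdate_buf\fR(), this sketch takes the buffer size as a parameter, so an undersized buffer is reported instead of being overrun\&.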
.SS "Working with 8\-bit MIME\-encoded headers"
.sp
.if n \{\
.RS 4
.\}
.nf
int error=rfc2047_decode(const char *text,
        int (*callback_func)(const char *, int, const char *, void *),
        void *callback_arg);

extern char *str=rfc2047_decode_simple(const char *text);

extern char *str=rfc2047_decode_enhanced(const char *text,
        const char *charset);

void rfc2047_print(const struct rfc822a *a,
        const char *charset,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *);

char *buffer=rfc2047_encode_str(const char *string,
        const char *charset);

int error=rfc2047_encode_callback(const char *string,
        const char *charset,
        int (*func)(const char *, size_t, void *),
        void *callback_arg);

char *buffer=rfc2047_encode_header(const struct rfc822a *a,
        const char *charset);
.fi
.if n \{\
.RE
.\}
.PP
These functions provide additional logic to encode or decode 8\-bit content in 7\-bit RFC 822 headers, as specified in RFC 2047\&.
.PP
\fBrfc2047_decode\fR() is a basic RFC 2047 decoding function\&. It receives a pointer to some 7\-bit RFC 2047\-encoded text, and a callback function\&. The callback function is repeatedly called\&. Each time it\*(Aqs called it receives a piece of decoded text\&. The arguments are: a pointer to a text fragment, the number of bytes in the text fragment, followed by a pointer to the character set of the text fragment\&. The character set pointer is NULL for portions of the original text that are not RFC 2047\-encoded\&.
.PP
The callback function also receives
\fIcallback_arg\fR, as its last argument\&. If the callback function returns a non\-zero value,
\fBrfc2047_decode\fR() terminates, returning that value\&. Otherwise,
\fBrfc2047_decode\fR() returns 0 after a successful decoding\&.
\fBrfc2047_decode\fR() returns \-1 if it was unable to allocate sufficient memory\&.
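.PP
A callback of the shape described above might simply append each decoded fragment to a buffer\&. The following is a sketch under that assumption: struct textbuf and collect_fragment() are hypothetical names, and the character set argument is ignored here, which a real application would probably not want\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, passed as callback_arg. */
struct textbuf {
	char *data;	/* null-terminated once a fragment is stored */
	size_t len;	/* bytes stored, excluding the terminator */
};

/* Has the int (*)(const char *, int, const char *, void *) shape.
   Returns 0 to continue, non-zero to make the decoder stop. */
int collect_fragment(const char *txt, int len, const char *chset, void *arg)
{
	struct textbuf *b = arg;
	char *p;

	(void)chset;	/* NULL for text that was not RFC 2047-encoded */
	if (len < 0)
		return -1;
	p = realloc(b->data, b->len + (size_t)len + 1);
	if (p == NULL)
		return -1;	/* existing data stays valid and owned by b */
	if (len > 0)
		memcpy(p + b->len, txt, (size_t)len);
	b->len += (size_t)len;
	p[b->len] = 0;
	b->data = p;
	return 0;
}
```
.fi
.if n \{\
.RE
.\}
.PP
With a zero\-initialized struct textbuf named buf, the call would look like rfc2047_decode(text, collect_fragment, &buf), and a non\-zero result would indicate that either the library or the callback ran out of memory\&.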
.PP
\fBrfc2047_decode_simple\fR() and
\fBrfc2047_decode_enhanced\fR() are alternatives to
\fBrfc2047_decode\fR() which forgo a callback function, and return the decoded text in a dynamically\-allocated memory buffer\&. The buffer must be
\fBfree\fR(3)\-ed after use\&.
\fBrfc2047_decode_simple\fR() discards all character set specifications, and merely decodes any 8\-bit text\&.
\fBrfc2047_decode_enhanced\fR() is a compromise that avoids discarding all character set information\&. The local character set being used is specified as the second argument to
\fBrfc2047_decode_enhanced\fR()\&. Any RFC 2047\-encoded text in a different character set will be prefixed by the name of the character set, in brackets, in the resulting output\&.
.PP
\fBrfc2047_decode_simple\fR() and
\fBrfc2047_decode_enhanced\fR() return a null pointer if they are unable to allocate sufficient memory\&.
.PP
The
\fBrfc2047_print\fR() function is equivalent to
\fBrfc822_print\fR(), followed by
\fBrfc2047_decode_enhanced\fR() on the result\&. The callback functions are used in an identical fashion, except that they receive text that\*(Aqs already decoded\&.
.PP
The function
\fBrfc2047_encode_str\fR() takes a
\fIstring\fR
and
\fIcharset\fR
being the name of the local character set, then encodes any 8\-bit portions of
\fIstring\fR
using RFC 2047 encoding\&.
\fBrfc2047_encode_str\fR() returns a dynamically\-allocated buffer with the result, which must be
\fBfree\fR(3)\-ed after use, or NULL if there was insufficient memory to allocate the buffer\&.
.PP
The function
\fBrfc2047_encode_callback\fR() is similar to
\fBrfc2047_encode_str\fR() except that the callback function is repeatedly called to receive the encoded string\&. Each invocation of the callback function receives a pointer to a portion of the encoded text, the number of characters in this portion, and
\fIcallback_arg\fR\&.
.PP
The function
\fBrfc2047_encode_header\fR() is basically equivalent to
\fBrfc822_getaddrs\fR(), followed by
\fBrfc2047_encode_str\fR()\&.
.SS "Working with subjects"
.sp
.if n \{\
.RS 4
.\}
.nf
char *basesubj=rfc822_coresubj(const char *subj);

char *basesubj=rfc822_coresubj_nouc(const char *subj);
.fi
.if n \{\
.RE
.\}
.PP
This function takes the contents of the subject header, and returns the "core" subject header that\*(Aqs used in the specification of the IMAP THREAD function\&. This function is designed to strip all subject line artifacts that might\*(Aqve been added in the process of forwarding or replying to a message\&. Currently,
\fBrfc822_coresubj\fR() performs the following transformations:
.PP
Whitespace
.RS 4
Leading and trailing whitespace is removed\&. Consecutive whitespace characters are collapsed into a single whitespace character\&. All whitespace characters are replaced by a space\&.
.RE
.PP
Re:, (fwd) [foo]
.RS 4
These artifacts (and several others) are removed from the subject line\&.
.RE
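.PP
The whitespace transformation listed above can be illustrated in isolation\&. The following sketch covers only that step and is not the library\*(Aqs implementation; normalize_whitespace() is a hypothetical helper, and the removal of "Re:" and similar artifacts is deliberately left out\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed copy of subj with leading
   and trailing whitespace removed and every run of whitespace replaced
   by a single space. Returns NULL if memory cannot be allocated. */
char *normalize_whitespace(const char *subj)
{
	char *out = malloc(strlen(subj) + 1);	/* result is never longer */
	char *q = out;
	int pending_space = 0;

	if (out == NULL)
		return NULL;
	for (; *subj; subj++) {
		if (isspace((unsigned char)*subj)) {
			/* Remember the gap, but only after some output. */
			pending_space = (q != out);
			continue;
		}
		if (pending_space) {
			*q++ = ' ';
			pending_space = 0;
		}
		*q++ = *subj;
	}
	*q = 0;	/* a trailing gap is never written out */
	return out;
}
```
.fi
.if n \{\
.RE
.\}
.PP
Deferring the space until the next non\-whitespace character is what drops trailing whitespace without a second pass over the string\&.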
.PP
Note that this function does NOT do MIME decoding\&. In order to implement IMAP THREAD, it is necessary to call something like
\fBrfc2047_decode\fR() before calling
\fBrfc822_coresubj\fR()\&.
.PP
This function returns a pointer to a dynamically\-allocated buffer, which must be
\fBfree\fR(3)\-ed after use\&.
.PP
\fBrfc822_coresubj_nouc\fR() is like
\fBrfc822_coresubj\fR(), except that the subject is not converted to uppercase\&.
.SH "SEE ALSO"
.PP
\m[blue]\fB\fBrfc2045\fR(3)\fR\m[]\&\s-2\u[3]\d\s+2,
\m[blue]\fB\fBreformail\fR(1)\fR\m[]\&\s-2\u[4]\d\s+2,
\m[blue]\fB\fBreformime\fR(1)\fR\m[]\&\s-2\u[5]\d\s+2\&.
.SH "AUTHOR"
.PP
\fBSam Varshavchik\fR
.RS 4
Author
.RE
.SH "NOTES"
.IP " 1." 4
RFC 822
.RS 4
\%http://www.rfc-editor.org/rfc/rfc822.txt
.RE
.IP " 2." 4
RFC 2822
.RS 4
\%http://www.rfc-editor.org/rfc/rfc2822.txt
.RE
.IP " 3." 4
\fBrfc2045\fR(3)
.RS 4
\%[set $man.base.url.for.relative.links]/rfc2045.html
.RE
.IP " 4." 4
\fBreformail\fR(1)
.RS 4
\%[set $man.base.url.for.relative.links]/reformail.html
.RE
.IP " 5." 4
\fBreformime\fR(1)
.RS 4
\%[set $man.base.url.for.relative.links]/reformime.html
.RE