.\"<!-- Copyright 2001-2007 Double Precision, Inc. See COPYING for -->
.\"<!-- distribution information. -->
.\" Author: Sam Varshavchik
.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
.\" Manual: Double Precision, Inc.
.\" Source: Courier Mail Server
.TH "RFC822" "3" "06/20/2015" "Courier Mail Server" "Double Precision, Inc\&."
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
rfc822 \- RFC 822 parsing library
.SH "SYNOPSIS"
.nf
#include <rfc822\&.h>
.sp
#include <rfc2047\&.h>
.sp
cc \&.\&.\&. \-lrfc822
.fi
.SH "DESCRIPTION"
.PP
The rfc822 library provides functions for parsing E\-mail headers in the RFC 822 format\&. This library also includes some functions to help with encoding and decoding 8\-bit text, as defined by RFC 2047\&.
.PP
The format used by E\-mail headers to encode sender and recipient information is defined by
\m[blue]\fBRFC 822\fR\m[]\&\s-2\u[1]\d\s+2
(see also
\m[blue]\fBRFC 2822\fR\m[]\&\s-2\u[2]\d\s+2)\&. The format allows the actual E\-mail address and the sender/recipient name to be expressed together, for example:
.nf
John Smith <jsmith@example\&.com>
.fi
.PP
The main purposes of the rfc822 library are to:
.PP
1) Parse a text string containing a list of RFC 822\-formatted addresses into its logical components: names and E\-mail addresses\&.
.PP
2) Access those individual components\&.
.PP
3) Allow some limited modifications of the parsed structure, and then convert it back into a text string\&.
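.PP
As a rough illustration of the name and address pairing described above, the following C sketch splits a single "Name <address>" string at its angle brackets\&. It is a naive stand\-in and not the parser used by the library:
\fBdemo_split_address\fR() is a hypothetical helper, and it ignores quoting, comments, groups and address lists\&.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the rfc822 library.
 * Splits "Name <address>" into two caller-supplied buffers.
 * Returns 0 on success, -1 if there is no <...> pair or a buffer
 * is too small.
 */
int demo_split_address(const char *input,
                       char *name, size_t name_size,
                       char *addr, size_t addr_size)
{
    const char *open, *close;
    size_t name_len, addr_len;

    assert(input != NULL && name != NULL && addr != NULL);

    open = strchr(input, '<');
    if (open == NULL)
        return -1;
    close = strchr(open, '>');
    if (close == NULL)
        return -1;

    /* The display name is everything before '<', minus trailing spaces. */
    name_len = (size_t)(open - input);
    while (name_len > 0 && input[name_len - 1] == ' ')
        --name_len;

    /* The address is the text strictly between '<' and '>'. */
    addr_len = (size_t)(close - open - 1);

    if (name_len >= name_size || addr_len >= addr_size)
        return -1;

    memcpy(name, input, name_len);
    name[name_len] = '\0';
    memcpy(addr, open + 1, addr_len);
    addr[addr_len] = '\0';
    return 0;
}
```

Real headers need the tokenizing and parsing steps described in the following sections; this sketch only shows which part of the string is the name and which is the address\&.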
.SS "Tokenizing an E\-mail header"
.nf
struct rfc822t *tokens=rfc822t_alloc_new(const char *header,
                void (*err_func)(const char *, int, void *),
                void *);
.sp
void rfc822t_free(tokens);
.fi
.PP
The
\fBrfc822t_alloc_new\fR() function (supersedes
\fBrfc822t_alloc\fR(), which is now obsolete) accepts an E\-mail
\fIheader\fR, and parses it into individual tokens\&. This function allocates and returns a pointer to an
\fBrfc822t\fR
structure, which is later used by
\fBrfc822a_alloc\fR() to extract individual addresses from these tokens\&.
.PP
The
\fIerr_func\fR
argument, if not NULL, is a pointer to a callback function\&. The function is called in the event that the E\-mail header is corrupted to the point that it cannot even be parsed\&. This is a rare instance; most forms of corruption are still valid, at least on the lexical level\&. The only time this error is reported is in the event of mismatched parentheses, angle brackets, or quotes\&. The callback function receives the
\fIheader\fR
pointer, an index to the syntax error in the header string, and the caller\-supplied pointer argument\&.
.PP
The semantics of
\fIerr_func\fR
are subject to change\&. It is recommended to leave this argument as NULL in the current version of the library\&.
.PP
\fBrfc822t_alloc\fR() returns a pointer to a dynamically\-allocated
\fBrfc822t\fR
structure\&. A NULL pointer is returned if there\*(Aqs insufficient memory to allocate this structure\&. The
\fBrfc822t_free\fR() function destroys the
\fBrfc822t\fR
structure and frees all dynamically allocated memory\&.
.PP
\fBNote\fR
.RS 4
Until
\fBrfc822t_free\fR() is called, the contents of
\fIheader\fR
MUST NOT be destroyed or altered in any way\&. The contents of
\fIheader\fR
are not modified by
\fBrfc822t_alloc\fR(); however, the
\fBrfc822t\fR
structure contains pointers to portions of the supplied
\fIheader\fR, and they must remain valid\&.
.RE
.SS "Extracting E\-mail addresses"
.nf
struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens);
.sp
void rfc822a_free(addrs);
.fi
.PP
The
\fBrfc822a_alloc\fR() function returns a dynamically\-allocated
\fBrfc822a\fR
structure that contains individual addresses that were logically parsed from a
\fBrfc822t\fR
structure\&. The
\fBrfc822a_alloc\fR() function returns NULL if there was insufficient memory to allocate the
\fBrfc822a\fR
structure\&. The
\fBrfc822a_free\fR() function destroys the
\fBrfc822a\fR
structure, and frees all associated dynamically\-allocated memory\&. The
\fBrfc822t\fR
structure passed to
\fBrfc822a_alloc\fR() must not be destroyed before
\fBrfc822a_free\fR() destroys the
\fBrfc822a\fR
structure\&.
.PP
The
\fBrfc822a\fR
structure has the following fields:
.nf
struct rfc822a {
        struct rfc822addr *addrs;
        int     naddrs;
};
.fi
.PP
The
\fInaddrs\fR
field gives the number of
\fBrfc822addr\fR
structures that are pointed to by
\fIaddrs\fR, which is an array\&. Each
\fBrfc822addr\fR
structure represents either an address found in the original E\-mail header,
\fIor the contents of some legacy "syntactical sugar"\fR\&. For example, the following is a valid E\-mail header:
.nf
To: recipient\-list: tom@example\&.com, john@example\&.com;
.fi
.PP
Typically, all of this, except for "To:", is tokenized by
\fBrfc822t_alloc\fR(), then parsed by
\fBrfc822a_alloc\fR()\&. "recipient\-list:" and the trailing semicolon are a legacy mailing list specification that is no longer in widespread use, but must still be accounted for\&. The resulting
\fBrfc822a\fR
structure will have four
\fBrfc822addr\fR
structures: one for "recipient\-list:"; one for each address; and one for the trailing semicolon\&. Each
\fBrfc822addr\fR
structure has the following fields:
.nf
struct rfc822addr {
        struct rfc822token *tokens;
        struct rfc822token *name;
};
.fi
.PP
If
\fItokens\fR
is a null pointer, this structure represents some non\-address portion of the original header, such as "recipient\-list:" or a semicolon\&. Otherwise it points to a structure that represents the E\-mail address in tokenized form\&.
.PP
\fIname\fR
either points to the tokenized form of a non\-address portion of the original header, or to a tokenized form of the recipient\*(Aqs name\&.
\fIname\fR
will be NULL if the recipient name was not provided\&. For the following address:
Tom Jones <tjones@example\&.com>, the
\fItokens\fR
field points to the tokenized form of "tjones@example\&.com", and
\fIname\fR
points to the tokenized form of "Tom Jones"\&.
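.PP
The rule that a null
\fItokens\fR
pointer marks a non\-address entry can be sketched as follows\&. The types below are simplified stand\-ins that reuse only the two field names given above; they are assumptions made for illustration, not the declarations from the library header, and
\fBdemo_count_addresses\fR() is a hypothetical helper\&.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a token list; illustration only. */
struct demo_token {
    const char *text;
};

/* Stand-in mirroring the two fields described in the text. */
struct demo_addr {
    const struct demo_token *tokens; /* NULL for non-address entries */
    const struct demo_token *name;   /* name, or the non-address text */
};

/* Counts the entries that carry an actual E-mail address. */
size_t demo_count_addresses(const struct demo_addr *list, size_t count)
{
    size_t i, n = 0;

    assert(list != NULL || count == 0);

    for (i = 0; i < count; ++i)
        if (list[i].tokens != NULL)
            ++n;
    return n;
}
```

For the "recipient\-list:" example above, a four\-entry array built this way holds two entries with a non\-null
\fItokens\fR
pointer, so the sketch reports two addresses\&.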
.PP
The
\fBrfc822token\fR
structure contains the following fields:
.nf
struct rfc822token {
        struct rfc822token *next;
        int token;
        const char *ptr;
        int len;
};
.fi
.PP
The
\fInext\fR
pointer builds a linked list of all tokens in this name or address\&. The possible values for the
\fItoken\fR
field are:
.PP
0x00
.RS 4
This is a simple atom \- a sequence of non\-special characters that is delimited by whitespace or special characters (see below)\&.
.RE
.PP
0x22
.RS 4
The value of the ASCII quote \- this is a quoted string\&.
.RE
.PP
Open parenthesis: \*(Aq(\*(Aq
.RS 4
This is an old style comment\&. A deprecated form of E\-mail addressing uses, for example, "john@example\&.com (John Smith)" instead of "John Smith <john@example\&.com>"\&. This old\-style notation defined parenthesized content as arbitrary comments\&. The
\fBrfc822token\fR
structure with
\fItoken\fR
set to \*(Aq(\*(Aq is created for the contents of the entire comment\&.
.RE
.PP
Symbols: \*(Aq<\*(Aq, \*(Aq>\*(Aq, \*(Aq@\*(Aq, and many others
.RS 4
The remaining possible values of
\fItoken\fR
include all the characters in RFC 822 headers that have special significance\&.
.RE
.PP
When a
\fBrfc822token\fR
structure does not represent a special character, the
\fIptr\fR
field points to a text string giving its contents\&. The contents are NOT null\-terminated; the
\fIlen\fR
field contains the number of characters included\&. The macro rfc822_is_atom(token) indicates whether
\fIptr\fR
and
\fIlen\fR
are used for the given
\fItoken\fR\&. Currently
\fBrfc822_is_atom\fR() returns true if
\fItoken\fR
is a zero byte, \*(Aq"\*(Aq, or \*(Aq(\*(Aq\&.
.PP
Note that it\*(Aqs possible that
\fIlen\fR
might be zero\&. This happens with null addresses used as return addresses for delivery status notifications\&.
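.PP
Because token contents are a pointer plus a length rather than a C string, they have to be copied before being handed to functions that expect null termination\&. The sketch below shows one way to do that\&.
\fBstruct demo_span\fR
and
\fBdemo_span_dup\fR() are hypothetical names used only for this illustration\&.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer-plus-length view of token contents. */
struct demo_span {
    const char *ptr; /* points into the original header, NOT terminated */
    int len;         /* number of characters; may be zero */
};

/*
 * Returns a malloc()ed, null-terminated copy of the span, or NULL if
 * the length is negative or memory cannot be allocated.
 * The caller must free() the result.
 */
char *demo_span_dup(const struct demo_span *s)
{
    char *copy;

    assert(s != NULL);

    if (s->len < 0)
        return NULL;

    copy = malloc((size_t)s->len + 1);
    if (copy == NULL)
        return NULL;

    if (s->len > 0)
        memcpy(copy, s->ptr, (size_t)s->len);
    copy[s->len] = '\0';
    return copy;
}
```

A zero\-length span yields an empty string rather than an error, which matches the null\-address case mentioned above\&.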
.SS "Working with E\-mail addresses"
.nf
void rfc822_deladdr(struct rfc822a *addrs, int index);
.sp
void rfc822tok_print(const struct rfc822token *list,
        void (*func)(char, void *), void *func_arg);
.sp
void rfc822_print(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *callback_arg);
.sp
void rfc822_addrlist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);
.sp
void rfc822_namelist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);
.sp
void rfc822_praddr(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);
.sp
void rfc822_prname(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);
.sp
void rfc822_prname_orlist(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);
.sp
char *rfc822_gettok(const struct rfc822token *list);
char *rfc822_getaddrs(const struct rfc822a *addrs);
char *rfc822_getaddr(const struct rfc822a *addrs, int index);
char *rfc822_getname(const struct rfc822a *addrs, int index);
char *rfc822_getname_orlist(const struct rfc822a *addrs, int index);
.sp
char *rfc822_getaddrs_wrap(const struct rfc822a *, int);
.fi
.PP
These functions are used to work with individual addresses that are parsed by
\fBrfc822a_alloc\fR()\&.
.PP
\fBrfc822_deladdr\fR() removes a single
\fBrfc822addr\fR
structure, whose
\fIindex\fR
is given, from the address array in
\fIaddrs\fR\&.
\fInaddrs\fR
is decremented by one\&.
.PP
\fBrfc822tok_print\fR() converts a tokenized
\fIlist\fR
of
\fBrfc822token\fR
objects into a text string\&. The callback function,
\fIfunc\fR, is called one character at a time, for every character in the tokenized objects\&. An arbitrary pointer,
\fIfunc_arg\fR, is passed unchanged as the additional argument to the callback function\&.
\fBrfc822tok_print\fR() is not usually the most convenient and efficient function, but it has its uses\&.
.PP
\fBrfc822_print\fR() takes an entire
\fBrfc822a\fR
structure, and uses the callback functions to print the contained addresses, in their original form, separated by commas\&. The function pointed to by
\fIprint_func\fR
is used to print each individual address, one character at a time\&. Between the addresses, the
\fIprint_separator\fR
function is called to print the address separator, usually the string ", "\&. The
\fIcallback_arg\fR
argument is passed along unchanged, as an additional argument to these functions\&.
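.PP
The callback style described above can be sketched with a small accumulator that collects characters into a buffer\&. Only the two callback shapes are taken from the synopsis;
\fBstruct demo_buf\fR,
\fBdemo_collect_char\fR(),
\fBdemo_collect_sep\fR() and the driver
\fBdemo_print_list\fR() are hypothetical, and the driver merely imitates the calling pattern without using the library\&.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator handed through the void * argument. */
struct demo_buf {
    char data[128];
    size_t used;
};

/* Same shape as the void (*)(char, void *) callbacks in the synopsis. */
void demo_collect_char(char c, void *arg)
{
    struct demo_buf *b = arg;

    assert(b != NULL);

    /* Keep one byte free for the terminator; silently drop overflow. */
    if (b->used + 1 < sizeof b->data) {
        b->data[b->used++] = c;
        b->data[b->used] = '\0';
    }
}

/* Same shape as the void (*)(const char *, void *) separator callback. */
void demo_collect_sep(const char *sep, void *arg)
{
    size_t i, n = strlen(sep);

    for (i = 0; i < n; ++i)
        demo_collect_char(sep[i], arg);
}

/*
 * Stand-in driver: emits each item one character at a time and calls
 * the separator callback between items.
 */
void demo_print_list(const char *const *items, size_t count,
                     void (*print_func)(char, void *),
                     void (*print_separator)(const char *, void *),
                     void *callback_arg)
{
    size_t i;
    const char *p;

    for (i = 0; i < count; ++i) {
        if (i > 0)
            print_separator(", ", callback_arg);
        for (p = items[i]; *p != '\0'; ++p)
            print_func(*p, callback_arg);
    }
}
```

Passing the accumulator through the untyped pointer keeps the callbacks free of global state, which is the usual reason for this kind of extra argument\&.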
.PP
\fBrfc822_addrlist\fR() and
\fBrfc822_namelist\fR() also print the contents of the entire
\fBrfc822a\fR
structure, but in a different way\&.
\fBrfc822_addrlist\fR() prints just the actual E\-mail addresses, not the recipient names or comments\&. Each E\-mail address is followed by a newline character\&.
\fBrfc822_namelist\fR() prints just the names or comments, followed by newlines\&.
.PP
\fBrfc822_praddr\fR() and
\fBrfc822_prname\fR() are just like
\fBrfc822_addrlist\fR() and
\fBrfc822_namelist\fR(), except that they print a single name or address in the
\fBrfc822a\fR
structure, selected by
\fIindex\fR\&. The functions
\fBrfc822_gettok\fR(),
\fBrfc822_getaddrs\fR(),
\fBrfc822_getaddr\fR(), and
\fBrfc822_getname\fR() are equivalent to
\fBrfc822tok_print\fR(),
\fBrfc822_print\fR(),
\fBrfc822_praddr\fR() and
\fBrfc822_prname\fR(), but, instead of using a callback function pointer, these functions write the output into a dynamically allocated buffer\&. That buffer must be destroyed by
\fBfree\fR(3) after use\&. These functions will return a null pointer in the event of a failure to allocate memory for the buffer\&.
.PP
\fBrfc822_prname_orlist\fR() is similar to
\fBrfc822_prname\fR(), except that it will also print the legacy RFC 822 group list syntax (which is also parsed by
\fBrfc822a_alloc\fR())\&.
\fBrfc822_praddr\fR() will print an empty string for an index that corresponds to a group list name (or the terminating semicolon)\&.
\fBrfc822_prname\fR() will also print an empty string\&.
\fBrfc822_prname_orlist\fR() will instead print either the name of the group list, or a single string ";"\&.
\fBrfc822_getname_orlist\fR() will instead save it into a dynamically allocated buffer\&.
.PP
\fBrfc822_getaddrs_wrap\fR() is similar to
\fBrfc822_getaddrs\fR(), except that the generated text is wrapped on or about the 73rd column, using newline characters\&.
.SS "Working with dates"
.nf
time_t timestamp=rfc822_parsedt(const char *datestr);
const char *datestr=rfc822_mkdate(time_t timestamp);
void rfc822_mkdate_buf(time_t timestamp, char *buffer);
.fi
.PP
These functions convert between timestamps and dates expressed in the
E\-mail header format\&.
.PP
\fBrfc822_parsedt\fR() returns the timestamp corresponding to the given date string (0 if there was a syntax error)\&.
.PP
\fBrfc822_mkdate\fR() returns a date string corresponding to the given timestamp\&.
\fBrfc822_mkdate_buf\fR() writes the date string into the given buffer instead, which must be big enough to accommodate it\&.
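.PP
To make the "big enough buffer" requirement concrete, the sketch below formats a timestamp in the usual RFC 822 style with the standard C library\&. It is not the implementation behind
\fBrfc822_mkdate_buf\fR():
\fBdemo_format_date\fR() is a hypothetical helper, it always formats in UTC, and the exact layout and time zone produced by the library are assumptions not stated in this page\&.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical stand-in: formats the timestamp in UTC as, for example,
 * "Thu, 01 Jan 1970 00:00:00 +0000".
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the buffer is too small or the conversion fails.
 */
size_t demo_format_date(time_t timestamp, char *buffer, size_t size)
{
    const struct tm *utc;

    assert(buffer != NULL);

    utc = gmtime(&timestamp);
    if (utc == NULL)
        return 0;

    /* Day and month names come from the default "C" locale. */
    return strftime(buffer, size, "%a, %d %b %Y %H:%M:%S +0000", utc);
}
```

A date in this layout is 31 characters long, so a buffer of at least 32 bytes is needed for this particular sketch\&.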
.SS "Working with 8\-bit MIME\-encoded headers"
.nf
int error=rfc2047_decode(const char *text,
        int (*callback_func)(const char *, int, const char *, void *),
        void *callback_arg);
.sp
extern char *str=rfc2047_decode_simple(const char *text);
.sp
extern char *str=rfc2047_decode_enhanced(const char *text,
        const char *charset);
.sp
void rfc2047_print(const struct rfc822a *a,
        const char *charset,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *);
.sp
char *buffer=rfc2047_encode_str(const char *string,
        const char *charset);
.sp
int error=rfc2047_encode_callback(const char *string,
        const char *charset,
        int (*func)(const char *, size_t, void *),
        void *callback_arg);
.sp
char *buffer=rfc2047_encode_header(const struct rfc822a *a,
        const char *charset);
.fi
.PP
These functions provide additional logic to encode or decode 8\-bit content in 7\-bit RFC 822 headers, as specified in RFC 2047\&.
.PP
\fBrfc2047_decode\fR() is a basic RFC 2047 decoding function\&. It receives a pointer to some 7\-bit RFC 2047\-encoded text, and a callback function\&. The callback function is repeatedly called\&. Each time it\*(Aqs called it receives a piece of decoded text\&. The arguments are: a pointer to a text fragment, the number of bytes in the text fragment, followed by a pointer to the character set of the text fragment\&. The character set pointer is NULL for portions of the original text that are not RFC 2047\-encoded\&.
.PP
The callback function also receives
\fIcallback_arg\fR, as its last argument\&. If the callback function returns a non\-zero value,
\fBrfc2047_decode\fR() terminates, returning that value\&. Otherwise,
\fBrfc2047_decode\fR() returns 0 after a successful decoding\&.
\fBrfc2047_decode\fR() returns \-1 if it was unable to allocate sufficient memory\&.
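.PP
The callback contract described above (fragment, length, character set, user argument, non\-zero return to stop) can be sketched as follows\&. The callback has the shape given in the synopsis; everything else, including
\fBstruct demo_out\fR,
\fBstruct demo_fragment\fR
and the driver
\fBdemo_feed\fR(), is hypothetical and only imitates how a decoder might deliver fragments\&.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical collector for decoded fragments. */
struct demo_out {
    char text[128];
    size_t used;
    int encoded_fragments; /* fragments that carried a character set */
};

/*
 * Same shape as the decode callback in the synopsis.
 * Returns 0 to continue, non-zero to make the caller stop.
 */
int demo_collect_fragment(const char *frag, int len,
                          const char *charset, void *arg)
{
    struct demo_out *o = arg;

    assert(o != NULL && frag != NULL && len >= 0);

    if (o->used + (size_t)len >= sizeof o->text)
        return 1; /* out of room: ask the decoder to stop */

    /* Fragments are not null-terminated; copy exactly len bytes. */
    memcpy(o->text + o->used, frag, (size_t)len);
    o->used += (size_t)len;
    o->text[o->used] = '\0';

    /* A NULL charset marks text that was not RFC 2047-encoded. */
    if (charset != NULL)
        ++o->encoded_fragments;
    return 0;
}

/* Hypothetical pre-decoded fragment used by the stand-in driver. */
struct demo_fragment {
    const char *text;
    const char *charset;
};

/* Stand-in driver: stops at, and returns, the first non-zero result. */
int demo_feed(const struct demo_fragment *frags, size_t count,
              int (*callback_func)(const char *, int, const char *, void *),
              void *callback_arg)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        int rc = callback_func(frags[i].text, (int)strlen(frags[i].text),
                               frags[i].charset, callback_arg);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Returning a non\-zero value from the callback is the only way it can report a problem, so the collector uses it to signal a full buffer\&.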
.PP
\fBrfc2047_decode_simple\fR() and
\fBrfc2047_decode_enhanced\fR() are alternatives to
\fBrfc2047_decode\fR() which forgo a callback function, and return the decoded text in a dynamically\-allocated memory buffer\&. The buffer must be
\fBfree\fR(3)\-ed after use\&.
\fBrfc2047_decode_simple\fR() discards all character set specifications, and merely decodes any 8\-bit text\&.
\fBrfc2047_decode_enhanced\fR() is a compromise that avoids discarding all character set information\&. The local character set being used is specified as the second argument to
\fBrfc2047_decode_enhanced\fR()\&. Any RFC 2047\-encoded text in a different character set will be prefixed by the name of the character set, in brackets, in the resulting output\&.
.PP
\fBrfc2047_decode_simple\fR() and
\fBrfc2047_decode_enhanced\fR() return a null pointer if they are unable to allocate sufficient memory\&.
.PP
The
\fBrfc2047_print\fR() function is equivalent to
\fBrfc822_print\fR(), followed by
\fBrfc2047_decode_enhanced\fR() on the result\&. The callback functions are used in an identical fashion, except that they receive text that\*(Aqs already decoded\&.
.PP
\fBrfc2047_encode_str\fR() takes a
\fIstring\fR
and
\fIcharset\fR
being the name of the local character set, then encodes any 8\-bit portions of
\fIstring\fR
using RFC 2047 encoding\&.
\fBrfc2047_encode_str\fR() returns a dynamically\-allocated buffer with the result, which must be
\fBfree\fR(3)\-ed after use, or NULL if there was insufficient memory to allocate the buffer\&.
.PP
\fBrfc2047_encode_callback\fR() is similar to
\fBrfc2047_encode_str\fR() except that the callback function is repeatedly called to receive the encoded string\&. Each invocation of the callback function receives a pointer to a portion of the encoded text, the number of characters in this portion, and
\fIcallback_arg\fR\&.
.PP
\fBrfc2047_encode_header\fR() is basically equivalent to
\fBrfc822_getaddrs\fR(), followed by
\fBrfc2047_encode_str\fR()\&.
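.PP
For readers unfamiliar with RFC 2047, the sketch below produces a single "Q"\-encoded word of the form =?charset?Q?text?=\&. It is a simplified illustration and not the algorithm used by
\fBrfc2047_encode_str\fR():
\fBdemo_q_encode\fR() is a hypothetical helper that encodes the whole string as one word, escapes every byte that is not an ASCII letter or digit, and ignores the length limit that RFC 2047 places on encoded words\&.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc()ed "=?charset?Q?...?=" word,
 * or NULL if memory cannot be allocated. The caller must free() it.
 */
char *demo_q_encode(const char *text, const char *charset)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    char *out, *w;
    size_t size;

    assert(text != NULL && charset != NULL);

    /* Worst case: 3 output bytes per input byte, plus "=?", "?Q?",
     * "?=" and the terminator (2 + 3 + 2 + 1 = 8). */
    size = strlen(charset) + 3 * strlen(text) + 8;
    out = malloc(size);
    if (out == NULL)
        return NULL;

    w = out + sprintf(out, "=?%s?Q?", charset);

    for (p = (const unsigned char *)text; *p != '\0'; ++p) {
        if (*p < 0x80 && isalnum(*p)) {
            *w++ = (char)*p;        /* safe character, copied as-is */
        } else if (*p == ' ') {
            *w++ = '_';             /* Q encoding writes space as '_' */
        } else {
            *w++ = '=';             /* everything else becomes =XX */
            *w++ = hex[*p >> 4];
            *w++ = hex[*p & 15];
        }
    }
    strcpy(w, "?=");
    return out;
}
```

A production encoder would leave pure 7\-bit text unencoded and split long input into several encoded words; this sketch shows only the byte\-level escaping\&.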
.SS "Working with subjects"
.nf
char *basesubj=rfc822_coresubj(const char *subj);
.sp
char *basesubj=rfc822_coresubj_nouc(const char *subj);
.fi
.PP
This function takes the contents of the subject header, and returns the "core" subject header that\*(Aqs used in the specification of the IMAP THREAD function\&. This function is designed to strip all subject line artifacts that might\*(Aqve been added in the process of forwarding or replying to a message\&. Currently,
\fBrfc822_coresubj\fR() performs the following transformations:
.PP
Whitespace
.RS 4
Leading and trailing whitespace is removed\&. Consecutive whitespace characters are collapsed into a single whitespace character\&. All whitespace characters are replaced by a space\&.
.RE
.PP
Reply and forwarding artifacts
.RS 4
These artifacts (and several others) are removed from the subject line\&.
.RE
.PP
Note that this function does NOT do MIME decoding\&. In order to implement IMAP THREAD, it is necessary to call something like
\fBrfc2047_decode\fR() before calling
\fBrfc822_coresubj\fR()\&.
.PP
This function returns a pointer to a dynamically\-allocated buffer, which must be
\fBfree\fR(3)\-ed after use\&.
.PP
\fBrfc822_coresubj_nouc\fR() is like
\fBrfc822_coresubj\fR(), except that the subject is not converted to uppercase\&.
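.PP
The whitespace transformation listed above can be sketched in a few lines of C\&. This covers only the whitespace step, not the removal of reply or forwarding artifacts and not the uppercase conversion, and
\fBdemo_normalize_space\fR() is a hypothetical helper rather than part of the library\&.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc()ed copy of subj with leading
 * and trailing whitespace removed and every run of whitespace replaced
 * by a single space. Returns NULL if memory cannot be allocated.
 * The caller must free() the result.
 */
char *demo_normalize_space(const char *subj)
{
    char *out, *w;
    int pending = 0; /* a space is owed before the next visible char */

    assert(subj != NULL);

    out = malloc(strlen(subj) + 1); /* output is never longer than input */
    if (out == NULL)
        return NULL;

    w = out;
    for (; *subj != '\0'; ++subj) {
        if (isspace((unsigned char)*subj)) {
            /* Only owe a space once something has been written, which
             * drops leading whitespace. */
            pending = (w != out);
            continue;
        }
        if (pending) {
            *w++ = ' ';
            pending = 0;
        }
        *w++ = *subj;
    }
    /* A space still owed at the end is trailing whitespace: dropped. */
    *w = '\0';
    return out;
}
```

Deferring the space until the next visible character is what lets one pass handle leading, trailing and repeated whitespace together\&.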
.SH "SEE ALSO"
.PP
\m[blue]\fB\fBrfc2045\fR(3)\fR\m[]\&\s-2\u[3]\d\s+2,
\m[blue]\fB\fBreformail\fR(1)\fR\m[]\&\s-2\u[4]\d\s+2,
\m[blue]\fB\fBreformime\fR(1)\fR\m[]\&\s-2\u[5]\d\s+2\&.
.SH "AUTHOR"
.PP
\fBSam Varshavchik\fR
.SH "NOTES"
.IP " 1." 4
RFC 822
.RS 4
\%http://www.rfc-editor.org/rfc/rfc822.txt
.RE
.IP " 2." 4
RFC 2822
.RS 4
\%http://www.rfc-editor.org/rfc/rfc2822.txt
.RE
.IP " 3." 4
\fBrfc2045\fR(3)
.RS 4
\%[set $man.base.url.for.relative.links]/rfc2045.html
.RE
.IP " 4." 4
\fBreformail\fR(1)
.RS 4
\%[set $man.base.url.for.relative.links]/reformail.html
.RE
.IP " 5." 4
\fBreformime\fR(1)
.RS 4
\%[set $man.base.url.for.relative.links]/reformime.html
.RE