'\" t .\" .\" .\" Title: rfc822 .\" Author: Sam Varshavchik .\" Generator: DocBook XSL Stylesheets v1.78.1 .\" Date: 06/20/2015 .\" Manual: Double Precision, Inc. .\" Source: Courier Mail Server .\" Language: English .\" .TH "RFC822" "3" "06/20/2015" "Courier Mail Server" "Double Precision, Inc\&." .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" rfc822 \- RFC 822 parsing library .SH "SYNOPSIS" .sp .nf #include #include cc \&.\&.\&. \-lrfc822 .fi .SH "DESCRIPTION" .PP The rfc822 library provides functions for parsing E\-mail headers in the RFC 822 format\&. This library also includes some functions to help with encoding and decoding 8\-bit text, as defined by RFC 2047\&. .PP The format used by E\-mail headers to encode sender and recipient information is defined by \m[blue]\fBRFC 822\fR\m[]\&\s-2\u[1]\d\s+2 (and its successor, \m[blue]\fBRFC 2822\fR\m[]\&\s-2\u[2]\d\s+2)\&. The format allows the actual E\-mail address and the sender/recipient name to be expressed together, for example: John Smith .PP The main purposes of the rfc822 library is to: .PP 1) Parse a text string containing a list of RFC 822\-formatted address into its logical components: names and E\-mail addresses\&. .PP 2) Access those individual components\&. .PP 3) Allow some limited modifications of the parsed structure, and then convert it back into a text string\&. .SS "Tokenizing an E\-mail header" .sp .if n \{\ .RS 4 .\} .nf struct rfc822t *tokens=rfc822t_alloc_new(const char *header, void (*err_func)(const char *, int, void *), void *func_arg); void rfc822t_free(tokens); .fi .if n \{\ .RE .\} .PP The \fBrfc822t_alloc_new\fR() function (superceeds \fBrfc822t_alloc\fR(), which is now obsolete) accepts an E\-mail \fIheader\fR, and parses it into individual tokens\&. This function allocates and returns a pointer to an rfc822t structure, which is later used by \fBrfc822a_alloc\fR() to extract individual addresses from these tokens\&. .PP If \fIerr_func\fR argument, if not NULL, is a pointer to a callback function\&. The function is called in the event that the E\-mail header is corrupted to the point that it cannot even be parsed\&. This is a rare instance \-\- most forms of corruption are still valid at least on the lexical level\&. The only time this error is reported is in the event of mismatched parenthesis, angle brackets, or quotes\&. The callback function receives the \fIheader\fR pointer, an index to the syntax error in the header string, and the \fIfunc_arg\fR argument\&. .PP The semantics of \fIerr_func\fR are subject to change\&. It is recommended to leave this argument as NULL in the current version of the library\&. .PP \fBrfc822t_alloc\fR() returns a pointer to a dynamically\-allocated rfc822t structure\&. A NULL pointer is returned if there\*(Aqs insufficient memory to allocate this structure\&. The \fBrfc822t_free\fR() function destroys rfc822t structure and frees all dynamically allocated memory\&. .if n \{\ .sp .\} .RS 4 .it 1 an-trap .nr an-no-space-flag 1 .nr an-break-flag 1 .br .ps +1 \fBNote\fR .ps -1 .br .PP Until \fBrfc822t_free\fR() is called, the contents of \fIheader\fR MUST NOT be destroyed or altered in any way\&. The contents of \fIheader\fR are not modified by \fBrfc822t_alloc\fR(), however the rfc822t structure contains pointers to portions of the supplied \fIheader\fR, and they must remain valid\&. .sp .5v .RE .SS "Extracting E\-mail addresses" .sp .if n \{\ .RS 4 .\} .nf struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens); void rfc822a_free(addrs); .fi .if n \{\ .RE .\} .PP The \fBrfc822a_alloc\fR() function returns a dynamically\-allocated rfc822a structure, that contains individual addresses that were logically parsed from a rfc822t structure\&. The \fBrfc822a_alloc\fR() function returns NULL if there was insufficient memory to allocate the rfc822a structure\&. The \fBrfc822a_free\fR() function destroys the rfc822a function, and frees all associated dynamically\-allocated memory\&. The rfc822t structure passed to \fBrfc822a_alloc\fR() must not be destroyed before \fBrfc822a_free\fR() destroys the rfc822a structure\&. .PP The rfc822a structure has the following fields: .sp .if n \{\ .RS 4 .\} .nf struct rfc822a { struct rfc822addr *addrs; int naddrs; } ; .fi .if n \{\ .RE .\} .PP The \fInaddrs\fR field gives the number of rfc822addr structures that are pointed to by \fIaddrs\fR, which is an array\&. Each rfc822addr structure represents either an address found in the original E\-mail header, \fIor the contents of some legacy "syntactical sugar"\fR\&. For example, the following is a valid E\-mail header: .sp .if n \{\ .RS 4 .\} .nf To: recipient\-list: tom@example\&.com, john@example\&.com; .fi .if n \{\ .RE .\} .PP Typically, all of this, except for "To:", is tokenized by \fBrfc822t_alloc\fR(), then parsed by \fBrfc822a_alloc\fR()\&. "recipient\-list:" and the trailing semicolon is a legacy mailing list specification that is no longer in widespread use, but must still must be accounted for\&. The resulting rfc822a structure will have four rfc822addr structures: one for "recipient\-list:"; one for each address; and one for the trailing semicolon\&. Each rfc822a structure has the following fields: .sp .if n \{\ .RS 4 .\} .nf struct rfc822addr { struct rfc822token *tokens; struct rfc822token *name; } ; .fi .if n \{\ .RE .\} .PP If \fItokens\fR is a null pointer, this structure represents some non\-address portion of the original header, such as "recipient\-list:" or a semicolon\&. Otherwise it points to a structure that represents the E\-mail address in tokenized form\&. .PP \fIname\fR either points to the tokenized form of a non\-address portion of the original header, or to a tokenized form of the recipient\*(Aqs name\&. \fIname\fR will be NULL if the recipient name was not provided\&. For the following address: Tom Jones \- the \fItokens\fR field points to the tokenized form of "tjones@example\&.com", and \fIname\fR points to the tokenized form of "Tom Jones"\&. .PP Each rfc822token structure contains the following fields: .sp .if n \{\ .RS 4 .\} .nf struct rfc822token { struct rfc822token *next; int token; const char *ptr; int len; } ; .fi .if n \{\ .RE .\} .PP The \fInext\fR pointer builds a linked list of all tokens in this name or address\&. The possible values for the \fItoken\fR field are: .PP 0x00 .RS 4 This is a simple atom \- a sequence of non\-special characters that is delimited by whitespace or special characters (see below)\&. .RE .PP 0x22 .RS 4 The value of the ascii quote \- this is a quoted string\&. .RE .PP Open parenthesis: \*(Aq(\*(Aq .RS 4 This is an old style comment\&. A deprecated form of E\-mail addressing uses \- for example \- "john@example\&.com (John Smith)" instead of "John Smith "\&. This old\-style notation defined parenthesized content as arbitrary comments\&. The rfc822token with \fItoken\fR set to \*(Aq(\*(Aq is created for the contents of the entire comment\&. .RE .PP Symbols: \*(Aq<\*(Aq, \*(Aq>\*(Aq, \*(Aq@\*(Aq, and many others .RS 4 The remaining possible values of \fItoken\fR include all the characters in RFC 822 headers that have special significance\&. .RE .PP When a rfc822token structure does not represent a special character, the \fIptr\fR field points to a text string giving its contents\&. The contents are NOT null\-terminated, the \fIlen\fR field contains the number of characters included\&. The macro rfc822_is_atom(token) indicates whether \fIptr\fR and \fIlen\fR are used for the given \fItoken\fR\&. Currently \fBrfc822_is_atom\fR() returns true if \fItoken\fR is a zero byte, \*(Aq"\*(Aq, or \*(Aq(\*(Aq\&. .PP Note that it\*(Aqs possible that \fIlen\fR might be zero\&. This happens with null addresses used as return addresses for delivery status notifications\&. .SS "Working with E\-mail addresses" .sp .if n \{\ .RS 4 .\} .nf void rfc822_deladdr(struct rfc822a *addrs, int index); void rfc822tok_print(const struct rfc822token *list, void (*func)(char, void *), void *func_arg); void rfc822_print(const struct rfc822a *addrs, void (*print_func)(char, void *), void (*print_separator)(const char *, void *), void *callback_arg); void rfc822_addrlist(const struct rfc822a *addrs, void (*print_func)(char, void *), void *callback_arg); void rfc822_namelist(const struct rfc822a *addrs, void (*print_func)(char, void *), void *callback_arg); void rfc822_praddr(const struct rfc822a *addrs, int index, void (*print_func)(char, void *), void *callback_arg); void rfc822_prname(const struct rfc822a *addrs, int index, void (*print_func)(char, void *), void *callback_arg); void rfc822_prname_orlist(const struct rfc822a *addrs, int index, void (*print_func)(char, void *), void *callback_arg); char *rfc822_gettok(const struct rfc822token *list); char *rfc822_getaddrs(const struct rfc822a *addrs); char *rfc822_getaddr(const struct rfc822a *addrs, int index); char *rfc822_getname(const struct rfc822a *addrs, int index); char *rfc822_getname_orlist(const struct rfc822a *addrs, int index); char *rfc822_getaddrs_wrap(const struct rfc822a *, int); .fi .if n \{\ .RE .\} .PP These functions are used to work with individual addresses that are parsed by \fBrfc822a_alloc\fR()\&. .PP \fBrfc822_deladdr\fR() removes a single rfc822addr structure, whose \fIindex\fR is given, from the address array in rfc822addr\&. \fInaddrs\fR is decremented by one\&. .PP \fBrfc822tok_print\fR() converts a tokenized \fIlist\fR of rfc822token objects into a text string\&. The callback function, \fIfunc\fR, is called one character at a time, for every character in the tokenized objects\&. An arbitrary pointer, \fIfunc_arg\fR, is passed unchanged as the additional argument to the callback function\&. \fBrfc822tok_print\fR() is not usually the most convenient and efficient function, but it has its uses\&. .PP \fBrfc822_print\fR() takes an entire rfc822a structure, and uses the callback functions to print the contained addresses, in their original form, separated by commas\&. The function pointed to by \fIprint_func\fR is used to print each individual address, one character at a time\&. Between the addresses, the \fIprint_separator\fR function is called to print the address separator, usually the string ", "\&. The \fIcallback_arg\fR argument is passed along unchanged, as an additional argument to these functions\&. .PP The functions \fBrfc822_addrlist\fR() and \fBrfc822_namelist\fR() also print the contents of the entire rfc822a structure, but in a different way\&. \fBrfc822_addrlist\fR() prints just the actual E\-mail addresses, not the recipient names or comments\&. Each E\-mail address is followed by a newline character\&. \fBrfc822_namelist\fR() prints just the names or comments, followed by newlines\&. .PP The functions \fBrfc822_praddr\fR() and \fBrfc822_prname\fR() are just like \fBrfc822_addrlist\fR() and \fBrfc822_namelist\fR(), except that they print a single name or address in the rfc822a structure, given its \fIindex\fR\&. The functions \fBrfc822_gettok\fR(), \fBrfc822_getaddrs\fR(), \fBrfc822_getaddr\fR(), and \fBrfc822_getname\fR() are equivalent to \fBrfc822tok_print\fR(), \fBrfc822_print\fR(), \fBrfc822_praddr\fR() and \fBrfc822_prname\fR(), but, instead of using a callback function pointer, these functions write the output into a dynamically allocated buffer\&. That buffer must be destroyed by \fBfree\fR(3) after use\&. These functions will return a null pointer in the event of a failure to allocate memory for the buffer\&. .PP \fBrfc822_prname_orlist\fR() is similar to \fBrfc822_prname\fR(), except that it will also print the legacy RFC822 group list syntax (which are also parsed by \fBrfc822a_alloc\fR())\&. \fBrfc822_praddr\fR() will print an empty string for an index that corresponds to a group list name (or terminated semicolon)\&. \fBrfc822_prname\fR() will also print an empty string\&. \fBrfc822_prname_orlist\fR() will instead print either the name of the group list, or a single string ";"\&. \fBrfc822_getname_orlist\fR() will instead save it into a dynamically allocated buffer\&. .PP The function \fBrfc822_getaddrs_wrap\fR() is similar to \fBrfc822_getaddrs\fR(), except that the generated text is wrapped on or about the 73rd column, using newline characters\&. .SS "Working with dates" .sp .if n \{\ .RS 4 .\} .nf time_t timestamp=rfc822_parsedt(const char *datestr) const char *datestr=rfc822_mkdate(time_t timestamp); void rfc822_mkdate_buf(time_t timestamp, char *buffer); .fi .if n \{\ .RE .\} .PP These functions convert between timestamps and dates expressed in the Date: E\-mail header format\&. .PP \fBrfc822_parsedt\fR() returns the timestamp corresponding to the given date string (0 if there was a syntax error)\&. .PP \fBrfc822_mkdate\fR() returns a date string corresponding to the given timestamp\&. \fBrfc822_mkdate_buf\fR() writes the date string into the given buffer instead, which must be big enough to accommodate it\&. .SS "Working with 8\-bit MIME\-encoded headers" .sp .if n \{\ .RS 4 .\} .nf int error=rfc2047_decode(const char *text, int (*callback_func)(const char *, int, const char *, void *), void *callback_arg); extern char *str=rfc2047_decode_simple(const char *text); extern char *str=rfc2047_decode_enhanced(const char *text, const char *charset); void rfc2047_print(const struct rfc822a *a, const char *charset, void (*print_func)(char, void *), void (*print_separator)(const char *, void *), void *); char *buffer=rfc2047_encode_str(const char *string, const char *charset); int error=rfc2047_encode_callback(const char *string, const char *charset, int (*func)(const char *, size_t, void *), void *callback_arg); char *buffer=rfc2047_encode_header(const struct rfc822a *a, const char *charset); .fi .if n \{\ .RE .\} .PP These functions provide additional logic to encode or decode 8\-bit content in 7\-bit RFC 822 headers, as specified in RFC 2047\&. .PP \fBrfc2047_decode\fR() is a basic RFC 2047 decoding function\&. It receives a pointer to some 7bit RFC 2047\-encoded text, and a callback function\&. The callback function is repeatedly called\&. Each time it\*(Aqs called it receives a piece of decoded text\&. The arguments are: a pointer to a text fragment, number of bytes in the text fragment, followed by a pointer to the character set of the text fragment\&. The character set pointer is NULL for portions of the original text that are not RFC 2047\-encoded\&. .PP The callback function also receives \fIcallback_arg\fR, as its last argument\&. If the callback function returns a non\-zero value, \fBrfc2047_decode\fR() terminates, returning that value\&. Otherwise, \fBrfc2047_decode\fR() returns 0 after a successful decoding\&. \fBrfc2047_decode\fR() returns \-1 if it was unable to allocate sufficient memory\&. .PP \fBrfc2047_decode_simple\fR() and \fBrfc2047_decode_enhanced\fR() are alternatives to \fBrfc2047_decode\fR() which forego a callback function, and return the decoded text in a dynamically\-allocated memory buffer\&. The buffer must be \fBfree\fR(3)\-ed after use\&. \fBrfc2047_decode_simple\fR() discards all character set specifications, and merely decodes any 8\-bit text\&. \fBrfc2047_decode_enhanced\fR() is a compromise to discarding all character set information\&. The local character set being used is specified as the second argument to \fBrfc2047_decode_enhanced\fR()\&. Any RFC 2047\-encoded text in a different character set will be prefixed by the name of the character set, in brackets, in the resulting output\&. .PP \fBrfc2047_decode_simple\fR() and \fBrfc2047_decode_enhanced\fR() return a null pointer if they are unable to allocate sufficient memory\&. .PP The \fBrfc2047_print\fR() function is equivalent to \fBrfc822_print\fR(), followed by \fBrfc2047_decode_enhanced\fR() on the result\&. The callback functions are used in an identical fashion, except that they receive text that\*(Aqs already decoded\&. .PP The function \fBrfc2047_encode_str\fR() takes a \fIstring\fR and \fIcharset\fR being the name of the local character set, then encodes any 8\-bit portions of \fIstring\fR using RFC 2047 encoding\&. \fBrfc2047_encode_str\fR() returns a dynamically\-allocated buffer with the result, which must be \fBfree\fR(3)\-ed after use, or NULL if there was insufficient memory to allocate the buffer\&. .PP The function \fBrfc2047_encode_callback\fR() is similar to \fBrfc2047_encode_str\fR() except that the callback function is repeatedly called to received the encoding string\&. Each invocation of the callback function receives a pointer to a portion of the encoded text, the number of characters in this portion, and \fIcallback_arg\fR\&. .PP The function \fBrfc2047_encode_header\fR() is basically equivalent to \fBrfc822_getaddrs\fR(), followed by \fBrfc2047_encode_str\fR(); .SS "Working with subjects" .sp .if n \{\ .RS 4 .\} .nf char *basesubj=rfc822_coresubj(const char *subj); char *basesubj=rfc822_coresubj_nouc(const char *subj); .fi .if n \{\ .RE .\} .PP This function takes the contents of the subject header, and returns the "core" subject header that\*(Aqs used in the specification of the IMAP THREAD function\&. This function is designed to strip all subject line artifacts that might\*(Aqve been added in the process of forwarding or replying to a message\&. Currently, \fBrfc822_coresubj\fR() performs the following transformations: .PP Whitespace .RS 4 Leading and trailing whitespace is removed\&. Consecutive whitespace characters are collapsed into a single whitespace character\&. All whitespace characters are replaced by a space\&. .RE .PP Re:, (fwd) [foo] .RS 4 These artifacts (and several others) are removed from the subject line\&. .RE .PP Note that this function does NOT do MIME decoding\&. In order to implement IMAP THREAD, it is necessary to call something like \fBrfc2047_decode\fR() before calling \fBrfc822_coresubj\fR()\&. .PP This function returns a pointer to a dynamically\-allocated buffer, which must be \fBfree\fR(3)\-ed after use\&. .PP \fBrfc822_coresubj_nouc\fR() is like \fBrfc822_coresubj\fR(), except that the subject is not converted to uppercase\&. .SH "SEE ALSO" .PP \m[blue]\fB\fBrfc2045\fR(3)\fR\m[]\&\s-2\u[3]\d\s+2, \m[blue]\fB\fBreformail\fR(1)\fR\m[]\&\s-2\u[4]\d\s+2, \m[blue]\fB\fBreformime\fR(1)\fR\m[]\&\s-2\u[5]\d\s+2\&. .SH "AUTHOR" .PP \fBSam Varshavchik\fR .RS 4 Author .RE .SH "NOTES" .IP " 1." 4 RFC 822 .RS 4 \%http://www.rfc-editor.org/rfc/rfc822.txt .RE .IP " 2." 4 RFC 2822 .RS 4 \%http://www.rfc-editor.org/rfc/rfc2822.txt .RE .IP " 3." 4 \fBrfc2045\fR(3) .RS 4 \%[set $man.base.url.for.relative.links]/rfc2045.html .RE .IP " 4." 4 \fBreformail\fR(1) .RS 4 \%[set $man.base.url.for.relative.links]/reformail.html .RE .IP " 5." 4 \fBreformime\fR(1) .RS 4 \%[set $man.base.url.for.relative.links]/reformime.html .RE