Imported Upstream version 0.63.0
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><title>rfc822</title><link rel="stylesheet" href="style.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"/><link rel="start" href="#rfc822" title="rfc822"/><link xmlns="" rel="stylesheet" type="text/css" href="manpage.css"/><meta xmlns="" name="MSSmartTagsPreventParsing" content="TRUE"/><link xmlns="" rel="icon" href="icon.gif" type="image/gif"/><!--

Copyright 1998 - 2007 Double Precision, Inc. See COPYING for distribution
information.

--></head><body><div class="refentry" lang="en" xml:lang="en"><a id="rfc822" shape="rect"> </a><div class="titlepage"/><div class="refnamediv"><h2>Name</h2><p>rfc822 - RFC 822 parsing library</p></div><div class="refsynopsisdiv"><h2>Synopsis</h2><div class="informalexample"><pre class="programlisting" xml:space="preserve">
#include &lt;rfc822.h&gt;

#include &lt;rfc2047.h&gt;

cc ... -lrfc822
</pre></div></div><div class="refsect1" lang="en" xml:lang="en"><a id="id413017" shape="rect"> </a><h2>DESCRIPTION</h2><p>
The rfc822 library provides functions for parsing E-mail headers in the RFC
822 format. This library also includes some functions to help with encoding
and decoding 8-bit text, as defined by RFC 2047.</p><p>
The format used by E-mail headers to encode sender and recipient
information is defined by
<a class="ulink" href="http://www.rfc-editor.org/rfc/rfc822.txt" target="_top" shape="rect">RFC 822</a>
(and its successor,
<a class="ulink" href="http://www.rfc-editor.org/rfc/rfc2822.txt" target="_top" shape="rect">RFC 2822</a>).
The format allows the actual E-mail
address and the sender/recipient name to be expressed together, for example:
<code class="literal">John Smith &lt;jsmith@example.com&gt;</code></p><p>
The main purposes of the rfc822 library are to:</p><p>
1) Parse a text string containing a list of RFC 822-formatted addresses into
its logical components: names and E-mail addresses.</p><p>
2) Access those individual components.</p><p>
3) Allow some limited modifications of the parsed structure, and then
convert it back into a text string.</p><div class="refsect2" lang="en" xml:lang="en"><a id="id418693" shape="rect"> </a><h3>Tokenizing an E-mail header</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc822t *tokens=rfc822t_alloc_new(const char *header,
                void (*err_func)(const char *, int, void *),
                void *func_arg);

void rfc822t_free(tokens);
</pre></div><p>
The <code class="function">rfc822t_alloc_new</code>() function (which supersedes
<code class="function">rfc822t_alloc</code>(), now
obsolete) accepts an E-mail <em class="parameter"><code>header</code></em>, and parses it into
individual tokens. This function allocates and returns a pointer to an
<span class="structname">rfc822t</span>
structure, which is later used by
<code class="function">rfc822a_alloc</code>() to extract
individual addresses from these tokens.</p><p>
The <em class="parameter"><code>err_func</code></em> argument, if not NULL, is a pointer
to a callback
function. The function is called in the event that the E-mail header is
corrupted to the point that it cannot even be parsed. This is a rare instance:
most forms of corruption are still valid at least on the lexical level.
The only time this error is reported is in the event of mismatched
parentheses, angle brackets, or quotes. The callback function receives the
<em class="parameter"><code>header</code></em> pointer, an index to the syntax error in the
header string, and the <em class="parameter"><code>func_arg</code></em> argument.</p><p>
The semantics of <em class="parameter"><code>err_func</code></em> are subject to change. It is recommended
to leave this argument as NULL in the current version of the library.</p><p>
<code class="function">rfc822t_alloc</code>() returns a pointer to a
dynamically-allocated <span class="structname">rfc822t</span>
structure. A NULL pointer is returned if there's insufficient memory to
allocate this structure. The <code class="function">rfc822t_free</code>() function
destroys the
<span class="structname">rfc822t</span> structure and frees all
dynamically allocated memory.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
Until <code class="function">rfc822t_free</code>() is called, the contents of
<em class="parameter"><code>header</code></em> MUST
NOT be destroyed or altered in any way. The contents of
<em class="parameter"><code>header</code></em> are not
modified by <code class="function">rfc822t_alloc</code>(); however, the
<span class="structname">rfc822t</span> structure contains
pointers to portions of the supplied <em class="parameter"><code>header</code></em>,
and they must remain valid.</p></div></div><div class="refsect2" lang="en" xml:lang="en"><a id="id384057" shape="rect"> </a><h3>Extracting E-mail addresses</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens);

void rfc822a_free(addrs);
</pre></div><p>
The <code class="function">rfc822a_alloc</code>() function returns a
dynamically-allocated <span class="structname">rfc822a</span>
structure that contains the individual addresses that were logically parsed
from an <span class="structname">rfc822t</span> structure. The
<code class="function">rfc822a_alloc</code>() function returns NULL if
there was insufficient memory to allocate the <span class="structname">rfc822a</span> structure. The
<code class="function">rfc822a_free</code>() function destroys the <span class="structname">rfc822a</span> structure, and frees all
associated dynamically-allocated memory. The <span class="structname">rfc822t</span> structure passed
to <code class="function">rfc822a_alloc</code>() must not be destroyed before <code class="function">rfc822a_free</code>() destroys the
<span class="structname">rfc822a</span> structure.</p><p>
The <span class="structname">rfc822a</span> structure has the following fields:</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc822a {
        struct rfc822addr *addrs;
        int naddrs;
} ;
</pre></div><p>
The <em class="structfield"><code>naddrs</code></em> field gives the number of
<span class="structname">rfc822addr</span> structures
that are pointed to by <em class="structfield"><code>addrs</code></em>, which is an array.
Each <span class="structname">rfc822addr</span>
structure represents either an address found in the original E-mail header,
<span class="emphasis"><em>or the contents of some legacy "syntactical sugar"</em></span>.
For example, the
following is a valid E-mail header:</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
To: recipient-list: tom@example.com, john@example.com;
</pre></div><p>Typically, all of this, except for "<code class="literal">To:</code>",
is tokenized by <code class="function">rfc822t_alloc</code>(), then parsed by
<code class="function">rfc822a_alloc</code>().
"<code class="literal">recipient-list:</code>" and
the trailing semicolon are a legacy mailing list specification that is no
longer in widespread use, but must still be accounted for. The resulting
<span class="structname">rfc822a</span> structure will have four
<span class="structname">rfc822addr</span> structures: one for
"<code class="literal">recipient-list:</code>";
one for each address; and one for the trailing semicolon.
Each <span class="structname">rfc822addr</span> structure has the following
fields:</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc822addr {
        struct rfc822token *tokens;
        struct rfc822token *name;
} ;
</pre></div><p>
If <em class="structfield"><code>tokens</code></em> is a null pointer, this structure
represents some
non-address portion of the original header, such as
"<code class="literal">recipient-list:</code>" or a
semicolon. Otherwise it points to a structure that represents the E-mail
address in tokenized form.</p><p>
<em class="structfield"><code>name</code></em> either points to the tokenized form of a
non-address portion of
the original header, or to a tokenized form of the recipient's name.
<em class="structfield"><code>name</code></em> will be NULL if the recipient name was not provided. For the
following address:
<code class="literal">Tom Jones &lt;tjones@example.com&gt;</code>, the
<em class="structfield"><code>tokens</code></em> field points to the tokenized form of
"<code class="literal">tjones@example.com</code>",
and <em class="structfield"><code>name</code></em> points to the tokenized form of
"<code class="literal">Tom Jones</code>".</p><p>
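The distinction between address entries and legacy group syntax can be illustrated with a minimal sketch. It declares local copies of the structures shown in this section so that it compiles on its own; real code would include rfc822.h instead. The helper name <code class="function">count_real_addresses</code>() is hypothetical and is not part of the library.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Local mirrors of the documented structures, so that this sketch is
   self-contained. Real code would include <rfc822.h> instead. */
struct rfc822token {
    struct rfc822token *next;
    int token;
    const char *ptr;
    int len;
};

struct rfc822addr {
    struct rfc822token *tokens;  /* NULL: group name or ";" entry */
    struct rfc822token *name;    /* NULL: no recipient name given */
};

struct rfc822a {
    struct rfc822addr *addrs;
    int naddrs;
};

/* Hypothetical helper: count the entries that carry an actual E-mail
   address, skipping "recipient-list:" and ";" style entries. */
int count_real_addresses(const struct rfc822a *a)
{
    int i, n = 0;

    assert(a != NULL);
    for (i = 0; i < a->naddrs; i++)
        if (a->addrs[i].tokens != NULL)
            n++;
    return n;
}
```
]]></pre></div><p>
For the "<code class="literal">recipient-list:</code>" header shown earlier, which yields four entries, this sketch would report two addresses.</p><p>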
Each <span class="structname">rfc822token</span> structure contains the following
fields:</p><div class="informalexample"><pre class="programlisting" xml:space="preserve">
struct rfc822token {
        struct rfc822token *next;
        int token;
        const char *ptr;
        int len;
} ;
</pre></div><p>
The <em class="structfield"><code>next</code></em> pointer builds a linked list of all
tokens in this name or
address. The possible values for the <em class="structfield"><code>token</code></em> field
are:</p><div class="variablelist"><dl><dt><span class="term">0x00</span></dt><dd><p>
This is a simple atom: a sequence of non-special characters that
is delimited by whitespace or special characters (see below).</p></dd><dt><span class="term">0x22</span></dt><dd><p>
The value of the ASCII double quote character: this is a quoted string.</p></dd><dt><span class="term">Open parenthesis: '('</span></dt><dd><p>
This is an old style comment. A deprecated form of E-mail
addressing uses, for example,
"<code class="literal">john@example.com (John Smith)</code>" instead of
"<code class="literal">John Smith &lt;john@example.com&gt;</code>".
This old-style notation defined
parenthesized content as arbitrary comments.
The <span class="structname">rfc822token</span> with
<em class="structfield"><code>token</code></em> set to '(' is created for the contents of
the entire comment.</p></dd><dt><span class="term">Symbols: '&lt;', '&gt;', '@', and many others</span></dt><dd><p>
The remaining possible values of <em class="structfield"><code>token</code></em> include all
the characters in RFC 822 headers that have special significance.</p></dd></dl></div><p>
When an <span class="structname">rfc822token</span> structure does not represent a
special character, the <em class="structfield"><code>ptr</code></em> field points to a text
string giving its contents.
The contents are NOT null-terminated; the <em class="structfield"><code>len</code></em>
field contains the number of characters included.
The macro rfc822_is_atom(token) indicates whether
<em class="structfield"><code>ptr</code></em> and <em class="structfield"><code>len</code></em> are used for
the given <em class="structfield"><code>token</code></em>.
Currently <code class="function">rfc822_is_atom</code>() returns true if
<em class="structfield"><code>token</code></em> is a zero byte, '<code class="literal">"</code>', or
'<code class="literal">(</code>'.</p><p>
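Because <em class="structfield"><code>ptr</code></em> is not null-terminated, code that walks a token list has to copy exactly <em class="structfield"><code>len</code></em> bytes. The following is a minimal sketch of such a walk, under stated assumptions: the structure is redeclared locally so the example compiles on its own, <code class="literal">SKETCH_IS_ATOM</code> restates the test documented for rfc822_is_atom() under a different name, and <code class="function">tokens_to_string</code>() is a hypothetical helper. Unlike the library's own printing functions, it does not restore quoting; it simply emits atom text and the special characters.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of the documented structure; real code would include
   <rfc822.h> instead. */
struct rfc822token {
    struct rfc822token *next;
    int token;
    const char *ptr;
    int len;
};

/* Restates the documented rfc822_is_atom() test under another name,
   so that it cannot clash with the real macro. */
#define SKETCH_IS_ATOM(t) ((t) == 0 || (t) == '"' || (t) == '(')

/* Hypothetical helper: copy a token list into buf as a null-terminated
   string. Returns the number of characters written, not counting the
   terminator, or -1 if buf is too small. */
int tokens_to_string(const struct rfc822token *list, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL);
    for (; list != NULL; list = list->next) {
        if (SKETCH_IS_ATOM(list->token)) {
            size_t n = (size_t)list->len;   /* may be zero */

            if (used + n >= size)
                return -1;
            memcpy(buf + used, list->ptr, n);
            used += n;
        } else {
            /* Special character: the token value is the character. */
            if (used + 1 >= size)
                return -1;
            buf[used++] = (char)list->token;
        }
    }
    if (used >= size)
        return -1;
    buf[used] = '\0';
    return (int)used;
}
```
]]></pre></div><p>
The bounds checks leave room for the terminator on every step, so a zero-length token is handled without a special case.</p><p>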
Note that <em class="structfield"><code>len</code></em> may be zero.
This happens with null addresses used as return addresses for delivery status
notifications.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id428138" shape="rect"> </a><h3>Working with E-mail addresses</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
void rfc822_deladdr(struct rfc822a *addrs, int index);

void rfc822tok_print(const struct rfc822token *list,
        void (*func)(char, void *), void *func_arg);

void rfc822_print(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *callback_arg);

void rfc822_addrlist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_namelist(const struct rfc822a *addrs,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_praddr(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_prname(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

void rfc822_prname_orlist(const struct rfc822a *addrs,
        int index,
        void (*print_func)(char, void *),
        void *callback_arg);

char *rfc822_gettok(const struct rfc822token *list);
char *rfc822_getaddrs(const struct rfc822a *addrs);
char *rfc822_getaddr(const struct rfc822a *addrs, int index);
char *rfc822_getname(const struct rfc822a *addrs, int index);
char *rfc822_getname_orlist(const struct rfc822a *addrs, int index);

char *rfc822_getaddrs_wrap(const struct rfc822a *, int);
</pre></div><p>
These functions are used to work with individual addresses that are parsed
by <code class="function">rfc822a_alloc</code>().</p><p>
<code class="function">rfc822_deladdr</code>() removes a single
<span class="structname">rfc822addr</span> structure, whose
<em class="parameter"><code>index</code></em> is given, from the address array in
<span class="structname">rfc822a</span>.
<em class="structfield"><code>naddrs</code></em> is decremented by one.</p><p>
<code class="function">rfc822tok_print</code>() converts a tokenized
<em class="parameter"><code>list</code></em> of <span class="structname">rfc822token</span>
objects into a text string. The callback function,
<em class="parameter"><code>func</code></em>, is called once per
character, for every character in the tokenized objects. An
arbitrary pointer, <em class="parameter"><code>func_arg</code></em>, is passed unchanged as
the additional argument to the callback function.
<code class="function">rfc822tok_print</code>() is not usually the most
convenient and efficient function, but it has its uses.</p><p>
<code class="function">rfc822_print</code>() takes an entire
<span class="structname">rfc822a</span> structure, and uses the
callback functions to print the contained addresses, in their original form,
separated by commas. The function pointed to by
<em class="parameter"><code>print_func</code></em> is used to
print each individual address, one character at a time. Between the
addresses, the <em class="parameter"><code>print_separator</code></em> function is called to
print the address separator, usually the string ", ".
The <em class="parameter"><code>callback_arg</code></em> argument is passed
along unchanged, as an additional argument to these functions.</p><p>
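A pair of callbacks with the two signatures described above might look like the following minimal sketch. The <span class="structname">outbuf</span> structure and the three function names are hypothetical, chosen for illustration; only the callback signatures come from the prototypes in this section.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size accumulator, passed as callback_arg. */
struct outbuf {
    char text[256];
    size_t used;
    int truncated;   /* set once output no longer fits */
};

void outbuf_init(struct outbuf *o)
{
    assert(o != NULL);
    memset(o, 0, sizeof *o);
}

/* Has the shape of print_func: void (*)(char, void *). */
void outbuf_putc(char c, void *arg)
{
    struct outbuf *o = arg;

    assert(o != NULL);
    if (o->used + 1 < sizeof o->text) {
        o->text[o->used++] = c;
        o->text[o->used] = '\0';
    } else {
        o->truncated = 1;
    }
}

/* Has the shape of print_separator: void (*)(const char *, void *). */
void outbuf_puts(const char *s, void *arg)
{
    assert(s != NULL);
    while (*s)
        outbuf_putc(*s++, arg);
}
```
]]></pre></div><p>
With the library available, these would be passed as the <em class="parameter"><code>print_func</code></em> and <em class="parameter"><code>print_separator</code></em> arguments of <code class="function">rfc822_print</code>(), with a pointer to the <span class="structname">outbuf</span> as <em class="parameter"><code>callback_arg</code></em>.</p><p>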
The functions <code class="function">rfc822_addrlist</code>() and
<code class="function">rfc822_namelist</code>() also print the
contents of the entire <span class="structname">rfc822a</span> structure, but in a
different way.
<code class="function">rfc822_addrlist</code>() prints just the actual E-mail
addresses, not the recipient
names or comments. Each E-mail address is followed by a newline character.
<code class="function">rfc822_namelist</code>() prints just the names or comments,
followed by newlines.</p><p>
The functions <code class="function">rfc822_praddr</code>() and
<code class="function">rfc822_prname</code>() are just like
<code class="function">rfc822_addrlist</code>() and
<code class="function">rfc822_namelist</code>(), except that they print a single name
or address in the <span class="structname">rfc822a</span> structure, given its
<em class="parameter"><code>index</code></em>. The
functions <code class="function">rfc822_gettok</code>(),
<code class="function">rfc822_getaddrs</code>(), <code class="function">rfc822_getaddr</code>(),
and <code class="function">rfc822_getname</code>() are equivalent to
<code class="function">rfc822tok_print</code>(), <code class="function">rfc822_print</code>(),
<code class="function">rfc822_praddr</code>() and <code class="function">rfc822_prname</code>(),
but, instead of using a callback function
pointer, these functions write the output into a dynamically allocated buffer.
That buffer must be destroyed by <code class="function">free</code>(3) after use.
These functions will
return a null pointer in the event of a failure to allocate memory for the
buffer.</p><p>
<code class="function">rfc822_prname_orlist</code>() is similar to
<code class="function">rfc822_prname</code>(), except that it will
also print the legacy RFC 822 group list syntax (which is also parsed by
<code class="function">rfc822a_alloc</code>()). <code class="function">rfc822_praddr</code>()
will print an empty string for an index
that corresponds to a group list name (or terminating semicolon).
<code class="function">rfc822_prname</code>() will also print an empty string.
<code class="function">rfc822_prname_orlist</code>() will
instead print either the name of the group list, or a single string ";".
<code class="function">rfc822_getname_orlist</code>() will instead save it into a
dynamically allocated buffer.</p><p>
The function <code class="function">rfc822_getaddrs_wrap</code>() is similar to
<code class="function">rfc822_getaddrs</code>(), except
that the generated text is wrapped on or about the 73rd column, using
newline characters.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id428445" shape="rect"> </a><h3>Working with dates</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
time_t timestamp=rfc822_parsedt(const char *datestr);
const char *datestr=rfc822_mkdate(time_t timestamp);
void rfc822_mkdate_buf(time_t timestamp, char *buffer);
</pre></div><p>
These functions convert between timestamps and dates expressed in the
<code class="literal">Date:</code> E-mail header format.</p><p>
<code class="function">rfc822_parsedt</code>() returns the timestamp corresponding to
the given date string (0 if there was a syntax error).</p><p>
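For orientation, the general shape of a <code class="literal">Date:</code> header value can be produced with standard C alone, as in the following minimal sketch. It is not the library's implementation: <code class="function">format_date_utc</code>() is a hypothetical helper, it always reports the time in UTC with a fixed <code class="literal">+0000</code> offset, and it relies on the default "C" locale for the English day and month names.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of the rfc822 library: write ts as a
   Date:-style string in UTC. Returns the number of characters written,
   or 0 if the buffer is too small or the conversion fails.
   gmtime() uses static storage, so this is not thread-safe. */
size_t format_date_utc(time_t ts, char *buf, size_t size)
{
    struct tm *tm;

    assert(buf != NULL);
    tm = gmtime(&ts);
    if (tm == NULL)
        return 0;
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S +0000", tm);
}
```
]]></pre></div><p>
A string of this shape is 31 characters long, so a caller-supplied buffer needs at least 32 bytes for this particular layout. The size that the library's own functions require is not stated here.</p><p>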
<code class="function">rfc822_mkdate</code>() returns a date string corresponding to
the given timestamp.
<code class="function">rfc822_mkdate_buf</code>() writes the date string into the
given buffer instead,
which must be big enough to accommodate it.</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id428494" shape="rect"> </a><h3>Working with 8-bit MIME-encoded headers</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
int error=rfc2047_decode(const char *text,
        int (*callback_func)(const char *, int, const char *, void *),
        void *callback_arg);

extern char *str=rfc2047_decode_simple(const char *text);

extern char *str=rfc2047_decode_enhanced(const char *text,
        const char *charset);

void rfc2047_print(const struct rfc822a *a,
        const char *charset,
        void (*print_func)(char, void *),
        void (*print_separator)(const char *, void *), void *);


char *buffer=rfc2047_encode_str(const char *string,
        const char *charset);

int error=rfc2047_encode_callback(const char *string,
        const char *charset,
        int (*func)(const char *, size_t, void *),
        void *callback_arg);

char *buffer=rfc2047_encode_header(const struct rfc822a *a,
        const char *charset);
</pre></div><p>
These functions provide additional logic to encode or decode 8-bit content
in 7-bit RFC 822 headers, as specified in RFC 2047.</p><p>
<code class="function">rfc2047_decode</code>() is a basic RFC 2047 decoding function.
It receives a
pointer to some 7-bit RFC 2047-encoded text, and a callback function. The
callback function is called repeatedly. Each time it's called it receives a
piece of decoded text. The arguments are: a pointer to a text fragment, the number
of bytes in the text fragment, followed by a pointer to the character set of
the text fragment. The character set pointer is NULL for portions of the
original text that are not RFC 2047-encoded.</p><p>
The callback function also receives <em class="parameter"><code>callback_arg</code></em>, as
its last
argument. If the callback function returns a non-zero value,
<code class="function">rfc2047_decode</code>()
terminates, returning that value. Otherwise,
<code class="function">rfc2047_decode</code>() returns 0 after
a successful decoding. <code class="function">rfc2047_decode</code>() returns -1 if it
was unable to allocate sufficient memory.</p><p>
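A decoding callback that follows the contract described above could be sketched as below. The <span class="structname">decoded</span> structure and <code class="function">collect_fragment</code>() are hypothetical names; the sketch assumes only the documented argument order (fragment, length, character set, <em class="parameter"><code>callback_arg</code></em>) and the rule that a non-zero return value stops the decoding.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator, passed as callback_arg. */
struct decoded {
    char text[128];
    size_t used;
    int encoded_fragments;   /* fragments that carried a charset */
};

/* Has the shape of callback_func:
   int (*)(const char *, int, const char *, void *). */
int collect_fragment(const char *txt, int len, const char *charset,
                     void *arg)
{
    struct decoded *d = arg;

    assert(d != NULL && txt != NULL && len >= 0);
    if (d->used + (size_t)len >= sizeof d->text)
        return 1;   /* non-zero: decoding stops with this value */
    memcpy(d->text + d->used, txt, (size_t)len);
    d->used += (size_t)len;
    d->text[d->used] = '\0';
    if (charset != NULL)   /* NULL marks text that was not encoded */
        d->encoded_fragments++;
    return 0;
}
```
]]></pre></div><p>
Returning a positive value on overflow keeps it distinct from the -1 that <code class="function">rfc2047_decode</code>() itself uses for an allocation failure.</p><p>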
<code class="function">rfc2047_decode_simple</code>() and
<code class="function">rfc2047_decode_enhanced</code>() are alternatives to
<code class="function">rfc2047_decode</code>() which forgo a callback function, and
return the decoded text
in a dynamically-allocated memory buffer. The buffer must be
<code class="function">free</code>(3)-ed after
use. <code class="function">rfc2047_decode_simple</code>() discards all character set
specifications, and
merely decodes any 8-bit text. <code class="function">rfc2047_decode_enhanced</code>()
is a compromise that avoids
discarding all character set information. The local character set being used
is specified as the second argument to
<code class="function">rfc2047_decode_enhanced</code>(). Any RFC
2047-encoded text in a different character set will be prefixed by the name of
the character set, in brackets, in the resulting output.</p><p>
<code class="function">rfc2047_decode_simple</code>() and
<code class="function">rfc2047_decode_enhanced</code>() return a null pointer
if they are unable to allocate sufficient memory.</p><p>
The <code class="function">rfc2047_print</code>() function is equivalent to
<code class="function">rfc822_print</code>(), followed by
<code class="function">rfc2047_decode_enhanced</code>() on the result. The callback
functions are used in
an identical fashion, except that they receive text that's already
decoded.</p><p>
The function <code class="function">rfc2047_encode_str</code>() takes a
<em class="parameter"><code>string</code></em> and a <em class="parameter"><code>charset</code></em>,
the latter being the name of the local character set, then encodes any 8-bit portions of
<em class="parameter"><code>string</code></em> using RFC 2047 encoding.
<code class="function">rfc2047_encode_str</code>() returns a
dynamically-allocated buffer with the result, which must be
<code class="function">free</code>(3)-ed after
use, or NULL if there was insufficient memory to allocate the buffer.</p><p>
The function <code class="function">rfc2047_encode_callback</code>() is similar to
<code class="function">rfc2047_encode_str</code>()
except that the callback function is called repeatedly to receive the
encoded string. Each invocation of the callback function receives a pointer
to a portion of the encoded text, the number of characters in this portion,
and <em class="parameter"><code>callback_arg</code></em>.</p><p>
The function <code class="function">rfc2047_encode_header</code>() is basically
equivalent to <code class="function">rfc822_getaddrs</code>(), followed by
<code class="function">rfc2047_encode_str</code>().</p></div><div class="refsect2" lang="en" xml:lang="en"><a id="id428714" shape="rect"> </a><h3>Working with subjects</h3><div class="informalexample"><pre class="programlisting" xml:space="preserve">
char *basesubj=rfc822_coresubj(const char *subj);

char *basesubj=rfc822_coresubj_nouc(const char *subj);
</pre></div><p>
This function takes the contents of the subject header, and returns the
"core" subject header that's used in the specification of the IMAP THREAD
function. This function is designed to strip all subject line artifacts that
might have been added in the process of forwarding or replying to a message.
Currently, <code class="function">rfc822_coresubj</code>() performs the following transformations:</p><div class="variablelist"><dl><dt><span class="term">Whitespace</span></dt><dd><p>Leading and trailing whitespace is removed. Consecutive
whitespace characters are collapsed into a single whitespace character.
All whitespace characters are replaced by a space.</p></dd><dt><span class="term">Re:, (fwd) [foo]</span></dt><dd><p>
These artifacts (and several others) are removed from
the subject line.</p></dd></dl></div><p>Note that this function does NOT do MIME decoding. In order to
implement IMAP THREAD, it is necessary to call something like
<code class="function">rfc2047_decode</code>() before
calling <code class="function">rfc822_coresubj</code>().</p><p>
This function returns a pointer to a dynamically-allocated buffer, which
must be <code class="function">free</code>(3)-ed after use.</p><p>
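The whitespace rule listed above can be sketched on its own as follows. This is a minimal illustration, not the library's implementation: <code class="function">collapse_whitespace</code>() is a hypothetical helper that handles only whitespace, leaving "Re:" and similar artifacts in place and leaving the case of the text unchanged.</p><div class="informalexample"><pre class="programlisting" xml:space="preserve"><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the rfc822 library: trim leading
   and trailing whitespace and collapse each inner run of whitespace
   into one space. Returns a malloc()ed string, or NULL if memory
   cannot be allocated. The caller must free() the result. */
char *collapse_whitespace(const char *subj)
{
    char *out, *p;
    int pending_space = 0;

    assert(subj != NULL);
    out = malloc(strlen(subj) + 1);   /* output is never longer */
    if (out == NULL)
        return NULL;
    p = out;
    for (; *subj; subj++) {
        if (isspace((unsigned char)*subj)) {
            /* Remember the gap, unless nothing was written yet. */
            pending_space = (p != out);
            continue;
        }
        if (pending_space) {
            *p++ = ' ';
            pending_space = 0;
        }
        *p++ = *subj;
    }
    *p = '\0';   /* a trailing gap is never written out */
    return out;
}
```
]]></pre></div><p>
Deferring the space until the next non-whitespace character is what removes trailing whitespace without a second pass over the string.</p><p>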
<code class="function">rfc822_coresubj_nouc</code>() is like
<code class="function">rfc822_coresubj</code>(), except that the subject
is not converted to uppercase.</p></div></div><div class="refsect1" lang="en" xml:lang="en"><a id="id428812" shape="rect"> </a><h2>SEE ALSO</h2><p>
<a class="ulink" href="rfc2045.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">rfc2045</span>(3)</span></a>,
<a class="ulink" href="reformail.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">reformail</span>(1)</span></a>,
<a class="ulink" href="reformime.html" target="_top" shape="rect"><span class="citerefentry"><span class="refentrytitle">reformime</span>(1)</span></a>.</p></div></div></body></html>