;;; utf7.el --- UTF-7 encoding/decoding for Emacs -*-coding: iso-8859-1;-*-
;; Copyright (C) 1999, 2000, 2003 Free Software Foundation, Inc.

;; Author: Jon K Hellan <hellan@acm.org>
;; Maintainer: bugs@gnus.org
;; Keywords: mail

;; This file is part of GNU Emacs.

;; GNU Emacs is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING. If not, write to
;; the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.

;;; Commentary:

;; UTF-7 - A Mail-Safe Transformation Format of Unicode - RFC 2152
;; This is a transformation format of Unicode that contains only 7-bit
;; ASCII octets and is intended to be readable by humans in the limiting
;; case that the document consists of characters from the US-ASCII
;; repertoire.
;; In short, runs of characters outside US-ASCII are converted to UTF-16
;; and encoded as base64 inside delimiters.
;; A variation of UTF-7 is specified in IMAP 4rev1 (RFC 2060) as the way
;; to represent characters outside US-ASCII in mailbox names in IMAP.
;; This library supports both variants, but the IMAP variation was the
;; reason I wrote it.
;; The routines convert UTF-7 -> UTF-16 (16-bit encoding of Unicode)
;; -> current character set, and vice versa.
;; However, until Emacs supports Unicode, the only Emacs character set
;; supported here is ISO-8859-1, which can trivially be converted to/from
;; Unicode.
;; When decoding results in a character outside the Emacs character set,
;; an error is signaled. It is up to the application to recover.
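
;; A worked example, computed by hand from RFC 2152 and RFC 2060 rather
;; than produced by this library: take the string "Hi Mom -X-!", where
;; X is U+263A WHITE SMILING FACE. Its UTF-16 form is the two octets
;; #x26 #x3A, whose base64 encoding is "Jjo" once the "=" padding is
;; dropped, so RFC 2152 UTF-7 gives "Hi Mom -+Jjo--!". Here the first
;; "-" closes the base64 run and the second is the literal hyphen.
;; The IMAP variant instead opens a run with "&", writes "," where
;; base64 uses "/", and represents a literal "&" as "&-". Thus the
;; mailbox name "~peter/mail/" followed by U+53F0 U+5317 (UTF-16 octets
;; #x53 #xF0 #x53 #x17, base64 "U/BTFw") becomes "~peter/mail/&U,BTFw-".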
45 | ||
46 | ;; UTF-7 should be done by providing a coding system. Mule-UCS does | |
47 | ;; already, but I don't know if it does the IMAP version and it's not | |
48 | ;; clear whether that should really be a coding system. The UTF-16 | |
49 | ;; part of the conversion can be done with coding systems available | |
50 | ;; with Mule-UCS or some versions of Emacs. Unfortunately these were | |
51 | ;; done wrongly (regarding handling of byte-order marks and how the | |
52 | ;; variants were named), so we don't have a consistent name for the | |
53 | ;; necessary coding system. The code below doesn't seem to DTRT | |
54 | ;; generally. E.g.: | |
55 | ;; | |