;;; utf7.el --- UTF-7 encoding/decoding for Emacs -*-coding: iso-8859-1;-*-

;; Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
;; 2008, 2009, 2010 Free Software Foundation, Inc.

;; Author: Jon K Hellan <hellan@acm.org>
;; Maintainer: bugs@gnus.org
;; Keywords: mail

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; UTF-7 - A Mail-Safe Transformation Format of Unicode - RFC 2152
;; This is a transformation format of Unicode that contains only 7-bit
;; ASCII octets and is intended to be readable by humans in the limiting
;; case that the document consists of characters from the US-ASCII
;; repertoire.
;; In short, runs of characters outside US-ASCII are encoded as base64
;; inside delimiters.
;; A variation of UTF-7 is specified in IMAP 4rev1 (RFC 2060) as the way
;; to represent characters outside US-ASCII in mailbox names in IMAP.
;; This library supports both variants, but the IMAP variation was the
;; reason I wrote it.
;; The routines convert UTF-7 -> UTF-16 (16-bit encoding of Unicode)
;; -> current character set, and vice versa.
;; However, until Emacs supports Unicode, the only Emacs character set
;; supported here is ISO-8859-1, which can trivially be converted to/from
;; Unicode.
;; When decoding results in a character outside the Emacs character set,
;; an error is signaled. It is up to the application to recover.
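;;
;; As an illustration of the base64-inside-delimiters scheme described
;; above, here is a minimal sketch. It is not this library's
;; implementation: the name `my-utf7-sketch-encode-run' is hypothetical,
;; and the sketch assumes an Emacs in which the `utf-16be' coding system
;; emits big-endian UTF-16 without a byte-order mark. It encodes a
;; single run of non-ASCII characters only; it does not handle plain
;; ASCII text or a literal shift character.
;;
;;   (defun my-utf7-sketch-encode-run (run &optional for-imap)
;;     "Encode RUN, a string of non-ASCII characters, as one shifted run."
;;     ;; UTF-16BE octets -> base64 without line breaks or "=" padding.
;;     (let ((b64 (replace-regexp-in-string
;;                 "=+\\'" ""
;;                 (base64-encode-string
;;                  (encode-coding-string run 'utf-16be) t))))
;;       (if for-imap
;;           ;; IMAP variant: "&" is the shift character and "," takes
;;           ;; the place of "/" in the base64 alphabet.
;;           (concat "&" (replace-regexp-in-string "/" "," b64) "-")
;;         (concat "+" b64 "-"))))
;;
;;   (my-utf7-sketch-encode-run (string #xa3))
;;     => "+AKM-"
;;   (my-utf7-sketch-encode-run (string #xa3) t)
;;     => "&AKM-"
;;
;; Here #xa3 is the pound sign (U+00A3): its UTF-16BE octets are 00 A3,
;; whose base64 form is "AKM=" before the padding is stripped.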

;; UTF-7 should be done by providing a coding system. Mule-UCS already
;; provides one, but I don't know if it does the IMAP version and it's
;; not clear whether that should really be a coding system. The UTF-16
;; part of the conversion can be done with coding systems available
;; with Mule-UCS or some versions of Emacs. Unfortunately these were
;; done wrongly (regarding handling of byte-order marks and how the
;; variants were named), so we don't have a consistent name for the
;; necessary coding system. The code below doesn't seem to do the
;; right thing in general. E.g.:
;;