* notes/unicode: Improve notes about Emacs source file encoding.

author Paul Eggert <eggert@cs.ucla.edu>

Mon, 11 Mar 2013 22:32:07 +0000 (15:32 -0700)

committer Paul Eggert <eggert@cs.ucla.edu>

Mon, 11 Mar 2013 22:32:07 +0000 (15:32 -0700)
diff --git a/admin/ChangeLog b/admin/ChangeLog
index 419336f..a0fd90e 100644
--- a/admin/ChangeLog
+++ b/admin/ChangeLog
@@ -1,3 +1,7 @@
+2013-03-11  Paul Eggert  <eggert@cs.ucla.edu>
+
+       * notes/unicode: Improve notes about Emacs source file encoding.
+
  2013-03-11  Glenn Morris  <rgm@gnu.org>
  
         * admin.el (make-manuals): Add emacs-lisp-intro and some more
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 0654036..68a6a67 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -104,12 +104,15 @@ Source file encoding
  
  Most Emacs source files are encoded in UTF-8 (or in ASCII, which is a
  subset), but there are a few exceptions, listed below.  Perhaps
-someday these files will be converted to UTF-8, for convenience when
-using tools like 'grep -r', but this might need nontrivial changes to
-the build process.
+someday many of these files will be converted to UTF-8, for
+convenience when using tools like 'grep -r', but this might need
+nontrivial changes to the build process.
  
   * chinese-big5
  
+     These are verbatim copies of files taken from external sources.
+     They haven't been converted to UTF-8.
+
         leim/CXTERM-DIC/4Corner.tit
         leim/CXTERM-DIC/ARRAY30.tit
         leim/CXTERM-DIC/ECDICT.tit
@@ -123,6 +126,9 @@ the build process.
  
   * chinese-iso-8bit
  
+     These are verbatim copies of files taken from external sources.
+     They haven't been converted to UTF-8.
+
         leim/CXTERM-DIC/CCDOSPY.tit
         leim/CXTERM-DIC/Punct.tit
         leim/CXTERM-DIC/QJ.tit
@@ -132,28 +138,73 @@ the build process.
         leim/MISC-DIC/CTLau.html
         leim/MISC-DIC/ziranma.cin
  
+ * cp850
+
+     This file contains non-ASCII characters in unibyte strings.  When
+     editing a keyboard layout it's more convenient to see 'é' than
+     '\202', and the MS-DOS compiler requires the single byte if a
+     backslash escape is not being used.
+
+       src/msdos.c
+
+ * iso-2022-cn-ext
+
+     This file is externally generated from leim/MISC-DIC/cangjie-table.b5
+     by a Big5->CNS converter.  It hasn't been converted to UTF-8.
+
+       leim/MISC-DIC/cangjie-table.cns
+
   * iso-latin-2
  
+     These files are processed by csplain, a program that requires
+     Latin-2 input.  In 2012 the csplain maintainers started
+     recommending UTF-8, but these files haven't been converted yet.
+
+       etc/refcards/cs-dired-ref.tex
         etc/refcards/cs-refcard.tex
-       etc/refcards/sk-survival.tex
         etc/refcards/cs-survival.tex
-       etc/refcards/cs-dired-ref.tex
         etc/refcards/sk-dired-ref.tex
         etc/refcards/sk-refcard.tex
+       etc/refcards/sk-survival.tex
  
   * japanese-iso-8bit
  
+     SKK-JISYO.L is a verbatim copy of a file taken from an external source.
+     ja-dic.el is generated automatically by skkdic-convert; this process
+     hasn't been converted to use UTF-8.
+
         leim/SKK-DIC/SKK-JISYO.L
         leim/ja-dic/ja-dic.el
  
   * japanese-shift-jis
  
+     This is a verbatim copy of a file taken from an external source.
+     It hasn't been converted to UTF-8.
+
         admin/charsets/mapfiles/cns2ucsdkw.txt
  
   * no-conversion
  
+     This file purposely contains arbitrary bytes interspersed within text,
+     to test whether the Emacs distribution is corrupted.
+
         lib-src/testfile
  
+ * iso-2022-7bit
+
+     These files contain characters that cannot be encoded in UTF-8.
+
+       leim/quail/tibetan.el
+       leim/quail/ethiopic.el
+       lisp/international/titdic-cnv.el
+       lisp/language/tibetan.el
+       lisp/language/tibet-util.el
+       lisp/language/ind-util.el
+
+     Converting this file to UTF-8 loses non-character information.
+
+       leim/quail/hanja3.el
+
  \f
  This file is part of GNU Emacs.
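The notes above list the non-UTF-8 exceptions by hand and mention the inconvenience for tools like 'grep -r'. A minimal sketch of how one might find such files mechanically follows. It is not part of Emacs. The helper names `is_valid_utf8` and `non_utf8_files` are hypothetical, and the script assumes Python 3. It uses a strict byte-level UTF-8 decode, so it will typically flag the 8-bit encodings listed (cp850, Latin-2, Big5, Shift-JIS, EUC). It will not flag the 7-bit ISO-2022 files, because their bytes are plain ASCII with escape sequences. The check also confirms the cp850 detail: 'é' is the single byte 0x82 (octal \202), which is not valid UTF-8 on its own.

```python
import os
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (ASCII is a subset)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(root: str):
    """Yield regular files under root whose bytes are not valid UTF-8.

    Hypothetical helper: 7-bit ISO-2022 files pass this check because
    every byte is ASCII, so they must still be tracked by hand.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip VCS metadata
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # e.g. broken symlinks
                continue
            with open(path, "rb") as f:
                if not is_valid_utf8(f.read()):
                    yield path


if __name__ == "__main__":
    # In CP850, 'é' is the single byte 0x82 (octal \202).  A lone 0x82 is a
    # UTF-8 continuation byte, so a file such as src/msdos.c fails the check.
    assert "é".encode("cp850") == b"\x82" == b"\202"
    assert not is_valid_utf8("é".encode("cp850"))
    assert is_valid_utf8("é".encode("utf-8"))
    if len(sys.argv) > 1:
        for p in non_utf8_files(sys.argv[1]):
            print(p)
```

Run it as `python3 find_non_utf8.py path/to/emacs` (the script name is arbitrary). Decoding the whole file strictly is simple but reads every file fully; for a source tree of Emacs's size that is usually acceptable.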