Commit | Line | Data |
---|---|---|
c38e0c97 | 1 | -*-mode: text; coding: utf-8;-*- |
e88a2ed3 | 2 | |
ab422c4d | 3 | Copyright (C) 2002-2013 Free Software Foundation, Inc. |
e88a2ed3 GM |
4 | See the end of the file for license conditions. |
5 | ||
6 | Problems, fixmes and other Unicode-related issues | |
7 | ------------------------------------------------------------- | |
8 | ||
9 | Notes by fx to record various things of variable importance. handa | |
10 | needs to check them -- don't take them too seriously, especially with | |
11 | regard to completeness. | |
12 | ||
13 | * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has | |
14 | undesirable effects. E.g.: | |
c38e0c97 PE |
15 | (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil |
16 | (multibyte-string-p (concat [?£])) => nil | |
17 | (text-char-description ?£) => "M-#" | |
e88a2ed3 GM |
18 | |
19 | These examples are all fixed by the change of 2002-10-14, but | |
20 | questionable uses of SINGLE_BYTE_CHAR_P remain in the | |
21 | code (keymap.c and print.c). | |
22 | ||
23 | * Rationalize character syntax and its relationship to the Unicode | |
24 | database. (Applies mainly to symbol and punctuation syntax.) | |
25 | ||
26 | * Fontset handling and customization need work. We want to relate | |
27 | fonts to scripts, probably based on the Unicode blocks. The | |
28 | presence of small-repertoire 10646-encoded fonts in XFree 4 is a | |
29 | pain, not currently worked round. | |
30 | ||
31 | With the change on 2002-07-26, multiple fonts can be | |
32 | specified in a fontset for a specific range of characters. | |
33 | Each range can also be specified by script. Before using | |
34 | ISO10646 fonts, Emacs checks their repertoires to avoid | |
35 | fonts that lack a glyph for a specific character. | |
36 | ||
37 | fx has worked on fontset customization, but was stymied by | |
38 | basic problems with the way the default face is dealt with | |
39 | (and something else, I think). This needs revisiting. | |
40 | ||
41 | * Work is also needed on charset and coding system priorities. | |
42 | ||
43 | * The relevant bits of latin1-disp.el need porting (and probably | |
44 | re-naming/updating). See also cyril-util.el. | |
45 | ||
46 | * Quail files need more work now that the encoding is largely irrelevant. | |
47 | ||
48 | * What to do with the old coding categories stuff? | |
49 | ||
50 | * The preferred-coding-system property of charsets should probably be | |
51 | junked unless it can be made more useful now. | |
52 | ||
53 | * find-multibyte-characters needs looking at. | |
54 | ||
55 | * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing | |
56 | charsets. | |
57 | ||
58 | * Lazy-load tables for unify-charset somehow? | |
59 | ||
60 | Actually, Emacs clears out all charset maps and unify-map just | |
61 | before dumping, and they are loaded again on demand by the | |
62 | dumped emacs. But those maps (char tables) generated while | |
63 | temacs is running can't be removed from the dumped emacs. | |
64 | ||
e88a2ed3 GM |
65 | * iso-2022 charsets get unified on i/o. |
66 | ||
67 | With the change on 2003-01-06, decoding routines put a `charset' | |
68 | property on decoded text, and the iso-2022 encoder pays attention | |
69 | to it. Thus, for instance, reading and writing with | |
70 | iso-2022-7bit preserve the original designation sequences. | |
71 | Would the property name `preferred-charset' be better? | |
72 | ||
73 | We may have to use this property to choose a font. | |
74 | ||
75 | * Revisit locale processing: look at treating the language and | |
76 | charset parts separately. (Language should affect things like | |
77 | spelling and calendar, but that's not a Unicode issue.) | |
78 | ||
79 | * Handle Unicode combining characters usefully, e.g. diacritics, and | |
c38e0c97 | 80 | handle more scripts specifically (à la Devanagari). There are |
e88a2ed3 GM |
81 | issues with canonicalization. |
82 | ||
e88a2ed3 GM |
83 | * We need tabular input methods, e.g. for maths symbols. (Not |
84 | specific to Unicode.) | |
85 | ||
86 | * Need multibyte text in menus, e.g. for the above. (Not specific to | |
87 | Unicode -- see Emacs etc/TODO, but now mostly works with gtk.) | |
88 | ||
89 | * There's currently no support for Unicode normalization. | |
90 | ||
91 | * Populate char-width-table correctly for Unicode characters and | |
92 | worry about what happens when double-width charsets covering | |
93 | non-CJK characters are unified. | |
94 | ||
e88a2ed3 GM |
95 | * There are type errors lurking, e.g. in |
96 | Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them. | |
97 | ||
e88a2ed3 GM |
98 | * Old auto-save files, and similar files, such as Gnus drafts, |
99 | containing non-ASCII characters probably won't be re-read correctly. | |
100 | ||
d37e4893 PE |
101 | |
102 | Source file encoding | |
103 | -------------------- | |
104 | ||
105 | Most Emacs source files are encoded in UTF-8 (or in ASCII, which is a | |
106 | subset), but there are a few exceptions, listed below. Perhaps | |
107 | someday these files will be converted to UTF-8, for convenience when | |
108 | using tools like 'grep -r', but this might need nontrivial changes to | |
109 | the build process. | |
110 | ||
111 | * chinese-big5 | |
112 | ||
113 | leim/CXTERM-DIC/4Corner.tit | |
114 | leim/CXTERM-DIC/ARRAY30.tit | |
115 | leim/CXTERM-DIC/ECDICT.tit | |
116 | leim/CXTERM-DIC/ETZY.tit | |
117 | leim/CXTERM-DIC/PY-b5.tit | |
118 | leim/CXTERM-DIC/Punct-b5.tit | |
119 | leim/CXTERM-DIC/QJ-b5.tit | |
120 | leim/CXTERM-DIC/ZOZY.tit | |
121 | leim/MISC-DIC/CTLau-b5.html | |
122 | leim/MISC-DIC/cangjie-table.b5 | |
123 | ||
124 | * chinese-iso-8bit | |
125 | ||
126 | leim/CXTERM-DIC/CCDOSPY.tit | |
127 | leim/CXTERM-DIC/Punct.tit | |
128 | leim/CXTERM-DIC/QJ.tit | |
129 | leim/CXTERM-DIC/SW.tit | |
130 | leim/CXTERM-DIC/TONEPY.tit | |
131 | leim/MISC-DIC/pinyin.map | |
132 | leim/MISC-DIC/CTLau.html | |
133 | leim/MISC-DIC/ziranma.cin | |
134 | ||
135 | * iso-latin-2 | |
136 | ||
137 | etc/refcards/cs-refcard.tex | |
138 | etc/refcards/sk-survival.tex | |
139 | etc/refcards/cs-survival.tex | |
140 | etc/refcards/cs-dired-ref.tex | |
141 | etc/refcards/sk-dired-ref.tex | |
142 | etc/refcards/sk-refcard.tex | |
143 | ||
144 | * japanese-iso-8bit | |
145 | ||
146 | leim/SKK-DIC/SKK-JISYO.L | |
147 | leim/ja-dic/ja-dic.el | |
148 | ||
149 | * japanese-shift-jis | |
150 | ||
151 | admin/charsets/mapfiles/cns2ucsdkw.txt | |
152 | ||
153 | * no-conversion | |
154 | ||
155 | lib-src/testfile | |
156 | ||
e88a2ed3 GM |
157 | \f |
158 | This file is part of GNU Emacs. | |
159 | ||
9ad5de0c | 160 | GNU Emacs is free software: you can redistribute it and/or modify |
e88a2ed3 | 161 | it under the terms of the GNU General Public License as published by |
9ad5de0c GM |
162 | the Free Software Foundation, either version 3 of the License, or |
163 | (at your option) any later version. | |
e88a2ed3 GM |
164 | |
165 | GNU Emacs is distributed in the hope that it will be useful, | |
166 | but WITHOUT ANY WARRANTY; without even the implied warranty of | |
167 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
168 | GNU General Public License for more details. | |
169 | ||
170 | You should have received a copy of the GNU General Public License | |
9ad5de0c | 171 | along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. |