Commit | Line | Data |
---|---|---|
c38e0c97 | 1 | -*-mode: text; coding: utf-8;-*- |
e88a2ed3 | 2 | |
ab422c4d | 3 | Copyright (C) 2002-2013 Free Software Foundation, Inc. |
e88a2ed3 | 4 | See the end of the file for license conditions. |
5 | ||
6 | Problems, fixmes and other unicode-related issues | |
7 | ------------------------------------------------------------- | |
8 | ||
9 | Notes by fx to record various things of variable importance. handa | |
10 | needs to check them -- don't take too seriously, especially with | |
11 | regard to completeness. | |
12 | ||
13 | * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has | |
14 | undesirable effects. E.g.: | |
c38e0c97 | 15 | (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil |
16 | (multibyte-string-p (concat [?£])) => nil | |
17 | (text-char-description ?£) => "M-#" | |
e88a2ed3 | 18 | |
19 | These examples are all fixed by the change of 2002-10-14, but | |
20 | questionable uses of SINGLE_BYTE_CHAR_P remain in the | |
21 | code (keymap.c and print.c). | |
22 | ||
23 | * Rationalize character syntax and its relationship to the Unicode | |
24 | database. (Applies mainly to symbol and punctuation syntax.) | |
25 | ||
26 | * Fontset handling and customization need work. We want to relate | |
27 | fonts to scripts, probably based on the Unicode blocks. The | |
28 | presence of small-repertoire 10646-encoded fonts in XFree 4 is a | |
29 | pain, not currently worked round. | |
30 | ||
31 | With the change on 2002-07-26, multiple fonts can be | |
32 | specified in a fontset for a specific range of characters. | |
33 | Each range can also be specified by script. Before using | |
34 | ISO10646 fonts, Emacs checks their repertoires to avoid | |
35 | fonts that don't have a glyph for a specific character. | |
36 | ||
37 | fx has worked on fontset customization, but was stymied by | |
38 | basic problems with the way the default face is dealt with | |
39 | (and something else, I think). This needs revisiting. | |
40 | ||
41 | * Work is also needed on charset and coding system priorities. | |
42 | ||
43 | * The relevant bits of latin1-disp.el need porting (and probably | |
44 | re-naming/updating). See also cyril-util.el. | |
45 | ||
46 | * Quail files need more work now that the encoding is largely irrelevant. | |
47 | ||
48 | * What to do with the old coding categories stuff? | |
49 | ||
50 | * The preferred-coding-system property of charsets should probably be | |
51 | junked unless it can be made more useful now. | |
52 | ||
53 | * find-multibyte-characters needs looking at. | |
54 | ||
55 | * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing | |
56 | charsets. | |
57 | ||
58 | * Lazy-load tables for unify-charset somehow? | |
59 | ||
60 | Actually, Emacs clears out all charset maps and unify-map just | |
61 | before dumping, and the dumped Emacs loads them again on demand. | |
62 | However, maps (char tables) generated while temacs is running | |
63 | can't be removed from the dumped Emacs. | |
64 | ||
e88a2ed3 | 65 | * iso-2022 charsets get unified on i/o. |
66 | ||
67 | With the change on 2003-01-06, the decoding routines put a `charset' | |
68 | property on decoded text, and the iso-2022 encoder pays attention | |
69 | to it. Thus, for instance, reading and writing with | |
70 | iso-2022-7bit preserves the original designation sequences. | |
71 | Would the property name `preferred-charset' be better? | |
72 | ||
73 | We may have to use this property to choose a font. | |
74 | ||
75 | * Revisit locale processing: look at treating the language and | |
76 | charset parts separately. (Language should affect things like | |
77 | spelling and calendar, but that's not a Unicode issue.) | |
78 | ||
79 | * Handle Unicode combining characters usefully, e.g. diacritics, and | |
c38e0c97 | 80 | handle more scripts specifically (à la Devanagari). There are |
e88a2ed3 | 81 | issues with canonicalization. |
82 | ||
e88a2ed3 | 83 | * We need tabular input methods, e.g. for maths symbols. (Not |
84 | specific to Unicode.) | |
85 | ||
86 | * Need multibyte text in menus, e.g. for the above. (Not specific to | |
87 | Unicode -- see Emacs etc/TODO, but now mostly works with gtk.) | |
88 | ||
89 | * There's currently no support for Unicode normalization. | |
90 | ||
91 | * Populate char-width-table correctly for Unicode characters and | |
92 | worry about what happens when double-width charsets covering | |
93 | non-CJK characters are unified. | |
94 | ||
e88a2ed3 | 95 | * There are type errors lurking, e.g. in |
96 | Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them. | |
97 | ||
e88a2ed3 | 98 | * Old auto-save files, and similar files, such as Gnus drafts, |
99 | containing non-ASCII characters probably won't be re-read correctly. | |
100 | ||
101 | \f | |
102 | This file is part of GNU Emacs. | |
103 | ||
9ad5de0c | 104 | GNU Emacs is free software: you can redistribute it and/or modify |
e88a2ed3 | 105 | it under the terms of the GNU General Public License as published by |
9ad5de0c | 106 | the Free Software Foundation, either version 3 of the License, or |
107 | (at your option) any later version. | |
e88a2ed3 | 108 | |
109 | GNU Emacs is distributed in the hope that it will be useful, | |
110 | but WITHOUT ANY WARRANTY; without even the implied warranty of | |
111 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
112 | GNU General Public License for more details. | |
113 | ||
114 | You should have received a copy of the GNU General Public License | |
9ad5de0c | 115 | along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. |