Commit | Line | Data |
---|---|---|
8d138742 CE |
1 | |
2 | HOW TO ADD A NEW CHARACTER SET MAPPING. | |
3 | ||
4 | * Create a struct unicode_info structure. This structure defines the | |
5 | official character set name, as well as pointers to conversion functions. | |
6 | ||
7 | * Add the name of the character set, and the name of your structure to | |
8 | unicode/charsetlist.txt. Multiple entries in unicode/charsetlist.txt can | |
9 | be used to define aliases for the same character set. For example, | |
10 | "IBM869" and "CP869" both specify the same character set: both point to | |
11 | the unicode_IBM_869 object, which is defined in ibm869.c. | |
12 | ||
13 | A script automatically generates the source file charsetlist.c from | |
14 | charsetlist.txt. That's how character sets end up being linked into | |
15 | the code, and how individual character sets can be selectively included | |
16 | or excluded. | |
17 | ||
18 | The struct unicode_info structure contains pointers to the following | |
19 | functions: | |
20 | ||
21 | + Convert text in this character set to unicode. | |
22 | ||
23 | + Convert unicode to text in this character set. | |
24 | ||
25 | + Convert text in this character set to uppercase. | |
26 | ||
27 | + Convert text in this character set to lowercase. | |
28 | ||
29 | + Convert text in this character set to titlecase. | |
30 | ||
31 | If the character set allows for convenient conversion to | |
32 | upper/lower/titlecase, the conversion should be implemented directly. | |
33 | Otherwise, the library has a set of convenience functions that work from | |
34 | the unicode master table. Text in any character set can be | |
35 | upper/lower/titlecased by converting it to unicode, running it through | |
36 | unicode_uc/unicode_lc/unicode_tc, then converting unicode back to the | |
37 | original character set. See utf8_chset.c for an example. | |
38 | ||
39 | Note that unicode_uc/unicode_lc/unicode_tc carry a heavy penalty and | |
40 | should be avoided where practical: they add about 26Kb of data tables. | |
41 | ||
42 | Finally, all this code has to be added to libunicode.a. It can simply be | |
43 | added to libunicode_a_SOURCES. | |
44 | ||
45 | After doing all that, run make to build libunicode.a and the | |
46 | unicode-info program, then run unicode-info. If the character set is | |
47 | listed by unicode-info, you should be all set, provided that the | |
48 | conversion functions actually work as advertised. | |
49 |
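
The steps above can be sketched in C roughly as follows. This is an
illustration only, not the library's actual declarations: the real layout
of struct unicode_info and the format of charsetlist.txt are not shown in
this file, so every name here (sketch_unichar, struct charset_sketch,
charset_list, find_charset) is hypothetical. The sketch shows a structure
holding the five conversion functions for a plain ASCII character set,
with case conversion coded directly rather than through the unicode
tables, and a name list in which two aliases point to the same object.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code point type; the library's own type may differ. */
typedef unsigned long sketch_unichar;

/* Hypothetical stand-in for struct unicode_info: the official name plus
 * pointers to the five conversion functions. Every function returns a
 * malloc'd, zero-terminated buffer that the caller must free. */
struct charset_sketch {
    const char *name;
    sketch_unichar *(*to_unicode)(const char *text);
    char *(*from_unicode)(const sketch_unichar *text);
    char *(*to_upper)(const char *text);
    char *(*to_lower)(const char *text);
    char *(*to_title)(const char *text);
};

static sketch_unichar *ascii_to_unicode(const char *text)
{
    size_t n = strlen(text), i;
    sketch_unichar *out = malloc((n + 1) * sizeof *out);
    if (!out) return NULL;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)text[i];
        out[i] = c < 128 ? c : 0xFFFD; /* non-ASCII byte: replacement char */
    }
    out[n] = 0;
    return out;
}

static char *ascii_from_unicode(const sketch_unichar *text)
{
    size_t n = 0, i;
    char *out;
    while (text[n]) n++;
    out = malloc(n + 1);
    if (!out) return NULL;
    for (i = 0; i < n; i++)
        out[i] = text[i] < 128 ? (char)text[i] : '?'; /* unmappable */
    out[n] = '\0';
    return out;
}

/* Apply fn to the first `limit` bytes and copy the rest unchanged. */
static char *ascii_map(const char *text, int (*fn)(int), size_t limit)
{
    size_t n = strlen(text), i;
    char *out = malloc(n + 1);
    if (!out) return NULL;
    for (i = 0; i < n; i++)
        out[i] = i < limit ? (char)fn((unsigned char)text[i]) : text[i];
    out[n] = '\0';
    return out;
}

/* Case conversion coded directly, without the unicode case tables.
 * Titlecase here just uppercases the first byte: an assumption. */
static char *ascii_upper(const char *t) { return ascii_map(t, toupper, (size_t)-1); }
static char *ascii_lower(const char *t) { return ascii_map(t, tolower, (size_t)-1); }
static char *ascii_title(const char *t) { return ascii_map(t, toupper, 1); }

static const struct charset_sketch ascii_sketch = {
    "US-ASCII", ascii_to_unicode, ascii_from_unicode,
    ascii_upper, ascii_lower, ascii_title
};

/* Hypothetical equivalent of a generated charsetlist.c: each entry pairs
 * a name with a structure, and aliases repeat the same structure. */
static const struct { const char *name; const struct charset_sketch *info; }
charset_list[] = {
    { "US-ASCII", &ascii_sketch },
    { "ASCII",    &ascii_sketch },  /* alias for the same object */
    { NULL, NULL }
};

/* Exact-match lookup by name; returns NULL for an unknown character set. */
const struct charset_sketch *find_charset(const char *name)
{
    size_t i;
    for (i = 0; charset_list[i].name; i++)
        if (strcmp(charset_list[i].name, name) == 0)
            return charset_list[i].info;
    return NULL;
}
```

In this sketch, find_charset("ASCII") and find_charset("US-ASCII") return
the same pointer, which mirrors how "IBM869" and "CP869" share one object.
A character set without a convenient direct case mapping would instead
convert to unicode, apply the library's case functions, and convert back,
at the cost of linking in the case tables.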