HOW TO ADD A NEW CHARACTER SET MAPPING.

* Create a struct unicode_info structure. This structure defines the
  official character set name, as well as pointers to conversion functions.

* Add the name of the character set, and the name of your structure, to
  unicode/charsetlist.txt. Multiple entries in unicode/charsetlist.txt can
  be used to define aliases for the same character set. For example,
  "IBM869" and "CP869" name the same character set; both point to the
  unicode_IBM_869 object, which is defined in ibm869.c.

There's an automatically generated source file, charsetlist.c, which is
generated by a script from charsetlist.txt. That's how character sets end up
being linked into the code, and how individual character sets can be
selectively included or excluded.

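To make the aliasing concrete, here is a minimal sketch of the kind of
lookup table a generated file such as charsetlist.c could contain. Neither
the generated code nor the format of charsetlist.txt is shown in this
README, so every name below is an assumption made for illustration:
struct sketch_info, struct sketch_entry, sketch_charsets, sketch_find, and
sketch_IBM_869 (a stand-in for the real unicode_IBM_869 object). Matching
names case-insensitively is also an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct unicode_info. */
struct sketch_info {
    const char *chset;   /* official character set name */
};

/* Stand-in for the object that ibm869.c would define. */
static const struct sketch_info sketch_IBM_869 = { "IBM869" };

/* One table row per name; aliases share the same structure pointer. */
struct sketch_entry {
    const char *name;
    const struct sketch_info *info;
};

static const struct sketch_entry sketch_charsets[] = {
    { "IBM869", &sketch_IBM_869 },
    { "CP869",  &sketch_IBM_869 },
    { NULL, NULL }   /* end-of-table marker */
};

/* Compare two names, ignoring case (an assumption of this sketch). */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the structure registered under name, or NULL if none is. */
const struct sketch_info *sketch_find(const char *name)
{
    const struct sketch_entry *e;

    for (e = sketch_charsets; e->name; e++)
        if (same_name(e->name, name))
            return e->info;
    return NULL;
}
```

With a table like this, sketch_find("CP869") and sketch_find("IBM869")
return the same pointer, and leaving a row out of the table is what keeps a
character set from being referenced and linked in.
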
The struct unicode_info structure contains pointers to the following
functions:

  + Convert text in this character set to unicode.

  + Convert unicode to text in this character set.

  + Convert text in this character set to uppercase.

  + Convert text in this character set to lowercase.

  + Convert text in this character set to titlecase.

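As a rough illustration of the shape such a structure might take, here is
a minimal sketch for a 7-bit ASCII character set. The type and member names
(sketch_char, struct sketch_info, c2u, u2c, toupper_func, and so on) and
the function signatures are hypothetical; they are not the library's actual
declarations, which should be taken from the library's own header.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long sketch_char;   /* hypothetical: one unicode code point */

/* Hypothetical stand-in for struct unicode_info. Every function returns a
   malloc()ed, zero-terminated result, or NULL on failure. */
struct sketch_info {
    const char *chset;                        /* official character set name */
    sketch_char *(*c2u)(const char *text);    /* charset -> unicode */
    char *(*u2c)(const sketch_char *uc);      /* unicode -> charset */
    char *(*toupper_func)(const char *text);
    char *(*tolower_func)(const char *text);
    char *(*totitle_func)(const char *text);
};

static sketch_char *ascii_c2u(const char *text)
{
    size_t i, n = strlen(text);
    sketch_char *uc = malloc((n + 1) * sizeof(*uc));

    if (!uc)
        return NULL;
    for (i = 0; i < n; i++)
        uc[i] = (unsigned char)text[i];   /* ASCII maps 1:1 to unicode */
    uc[n] = 0;
    return uc;
}

static char *ascii_u2c(const sketch_char *uc)
{
    size_t i, n = 0;
    char *text;

    while (uc[n])
        n++;
    text = malloc(n + 1);
    if (!text)
        return NULL;
    for (i = 0; i < n; i++) {
        if (uc[i] > 127) {   /* not representable in this character set */
            free(text);
            return NULL;
        }
        text[i] = (char)uc[i];
    }
    text[n] = 0;
    return text;
}

/* ASCII case mapping is simple enough to code directly. */
static char *ascii_case(const char *text, int to_upper)
{
    size_t i, n = strlen(text);
    char *out = malloc(n + 1);

    if (!out)
        return NULL;
    for (i = 0; i <= n; i++) {   /* <= n also copies the terminating zero */
        char c = text[i];

        if (to_upper && c >= 'a' && c <= 'z')
            c = (char)(c - 'a' + 'A');
        else if (!to_upper && c >= 'A' && c <= 'Z')
            c = (char)(c - 'A' + 'a');
        out[i] = c;
    }
    return out;
}

static char *ascii_upper(const char *text) { return ascii_case(text, 1); }
static char *ascii_lower(const char *text) { return ascii_case(text, 0); }

const struct sketch_info sketch_ASCII = {
    "US-ASCII",
    ascii_c2u,
    ascii_u2c,
    ascii_upper,
    ascii_lower,
    ascii_upper   /* for ASCII letters, titlecase equals uppercase */
};
```

A caller would go through the pointers, for example
sketch_ASCII.toupper_func("abc"), and free() the returned buffer.
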
If the character set allows for convenient conversion to
upper/lower/titlecase, the conversion should be coded directly.
Otherwise, the library has a set of convenience functions that work from
the unicode master table. Text in any character set can be
upper/lower/titlecased by converting it to unicode, running it through
unicode_uc/unicode_lc/unicode_tc, then converting the unicode back to the
original character set. See utf8_chset.c for an example.

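The round trip described above can be sketched as follows for an
ISO-8859-1 style character set. This is not the code in utf8_chset.c. The
names sketch_char, latin1_c2u, latin1_u2c and latin1_toupper are
hypothetical, and sketch_uc is a tiny stand-in for unicode_uc that covers
only a-z and U+00E0 through U+00FE, whereas the real function works from
the full master table.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long sketch_char;   /* hypothetical: one unicode code point */

/* Stand-in for unicode_uc(): handles only a-z and U+00E0..U+00FE. */
static sketch_char sketch_uc(sketch_char c)
{
    if (c >= 0x61 && c <= 0x7A)
        return c - 0x20;
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)   /* U+00F7 is the division sign */
        return c - 0x20;
    return c;
}

/* ISO-8859-1 bytes are numerically equal to their unicode code points. */
static sketch_char *latin1_c2u(const char *text)
{
    size_t i, n = strlen(text);
    sketch_char *uc = malloc((n + 1) * sizeof(*uc));

    if (!uc)
        return NULL;
    for (i = 0; i < n; i++)
        uc[i] = (unsigned char)text[i];
    uc[n] = 0;
    return uc;
}

static char *latin1_u2c(const sketch_char *uc)
{
    size_t i, n = 0;
    char *text;

    while (uc[n])
        n++;
    text = malloc(n + 1);
    if (!text)
        return NULL;
    for (i = 0; i < n; i++) {
        if (uc[i] > 0xFF) {   /* not representable in this character set */
            free(text);
            return NULL;
        }
        text[i] = (char)(unsigned char)uc[i];
    }
    text[n] = 0;
    return text;
}

/* Generic uppercasing: charset -> unicode -> case map -> charset. */
char *latin1_toupper(const char *text)
{
    size_t i;
    char *out;
    sketch_char *uc = latin1_c2u(text);

    if (!uc)
        return NULL;
    for (i = 0; uc[i]; i++)
        uc[i] = sketch_uc(uc[i]);
    out = latin1_u2c(uc);
    free(uc);
    return out;
}
```

The same three steps serve lowercase and titlecase by swapping the mapping
function. The price is two buffer allocations per call plus the mapping
tables, which is why coding the conversion directly is preferable when the
character set makes that easy.
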
Note that unicode_uc/unicode_lc/unicode_tc carry a heavy size penalty, and
should be avoided when the conversion can be coded directly:
unicode_[ult]c() adds about 26Kb of data tables.

Finally, all this code has to be added to libunicode.a. It can simply be
added to libunicode_a_SOURCES.

After doing all that, run make to build libunicode.a and the
unicode-info program. Run unicode-info. If the character set is listed by
unicode-info, you should be all set, provided that the conversion functions
actually work as advertised.