

HOW TO ADD A NEW CHARACTER SET MAPPING.

 * Create a struct unicode_info for the new character set.  This structure
 defines the official character set name, as well as pointers to the
 conversion functions.

 * Add the name of the character set and the name of your structure to
 unicode/charsetlist.txt.  Multiple entries in unicode/charsetlist.txt can
 be used to define aliases for the same character set.  For example,
 "IBM869" and "CP869" specify the same character set: both point to the
 unicode_IBM_869 object, which is defined in ibm869.c.

 The source file charsetlist.c is generated by a script from
 charsetlist.txt.  That's how character sets end up being linked into the
 code, and how individual character sets can be selectively included or
 excluded.
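 To make the alias mechanism concrete, here is a minimal sketch of the kind
 of lookup table a generated charsetlist.c could boil down to.  The table
 layout, the chset_entry type and find_chset() are hypothetical
 illustrations, not the library's actual generated code.  The one-field
 struct unicode_info and the unicode_IBM_869 object below are stand-ins for
 the real ones.  The sketch shows that an alias is just another row pointing
 at the same object.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for struct unicode_info: only the official name is modelled. */
struct unicode_info {
	const char *chset;
};

/* Stand-ins for character set objects that would normally live in their
   own source files (the real unicode_IBM_869 is defined in ibm869.c). */
static const struct unicode_info unicode_IBM_869 = { "IBM869" };
static const struct unicode_info unicode_ISO8859_1 = { "ISO-8859-1" };

/* Hypothetical generated table: one row per name listed in
   charsetlist.txt.  Aliases are extra rows pointing at the same object. */
struct chset_entry {
	const char *name;
	const struct unicode_info *info;
};

static const struct chset_entry chset_table[] = {
	{ "IBM869",     &unicode_IBM_869 },
	{ "CP869",      &unicode_IBM_869 },	/* alias of IBM869 */
	{ "ISO-8859-1", &unicode_ISO8859_1 },
	{ NULL, NULL }
};

/* Case-insensitive name comparison (an assumption of this sketch);
   returns 1 when the names match. */
static int name_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;	/* both must end at the same time */
}

/* Look up a character set by official name or alias; NULL if unknown. */
static const struct unicode_info *find_chset(const char *name)
{
	const struct chset_entry *e;

	assert(name != NULL);
	for (e = chset_table; e->name; e++)
		if (name_eq(e->name, name))
			return e->info;
	return NULL;
}
```

 Leaving a row out of the table is all it takes to exclude a character set.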

 The struct unicode_info structure contains pointers to the following
 functions:

 + Convert text in this character set to unicode.

 + Convert unicode to text in this character set.

 + Convert text in this character set to uppercase.

 + Convert text in this character set to lowercase.

 + Convert text in this character set to titlecase.
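 The following is a minimal sketch of how such a structure might be laid out
 and filled in.  It uses US-ASCII because its conversions are trivial.  The
 following are all assumptions made for illustration:

 + the field names;
 + the function signatures (NUL-terminated input, malloc()ed result, NULL on
   failure);
 + the unicode_char typedef;
 + the object name unicode_US_ASCII_example.

 Check the library's own header for the real declaration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: one code point per unicode_char, 0-terminated arrays. */
typedef unsigned int unicode_char;

/* Hypothetical layout; the real struct unicode_info may use different
   field names and signatures. */
struct unicode_info {
	const char *chset;	/* official character set name */
	unicode_char *(*c2u)(const struct unicode_info *, const char *);
	char *(*u2c)(const struct unicode_info *, const unicode_char *);
	char *(*toupper_func)(const struct unicode_info *, const char *);
	char *(*tolower_func)(const struct unicode_info *, const char *);
	char *(*totitle_func)(const struct unicode_info *, const char *);
};

/* US-ASCII -> unicode: each byte below 0x80 is its own code point. */
static unicode_char *ascii_c2u(const struct unicode_info *info, const char *s)
{
	size_t i, n = strlen(s);
	unicode_char *out = malloc((n + 1) * sizeof(*out));

	(void)info;
	if (!out)
		return NULL;
	for (i = 0; i <= n; i++) {	/* <= n also copies the terminator */
		unsigned char c = (unsigned char)s[i];

		if (c >= 0x80) {	/* not a US-ASCII byte */
			free(out);
			return NULL;
		}
		out[i] = c;
	}
	return out;
}

/* unicode -> US-ASCII: fails on any code point above 0x7F. */
static char *ascii_u2c(const struct unicode_info *info, const unicode_char *u)
{
	size_t i, n = 0;
	char *out;

	(void)info;
	while (u[n])
		n++;
	if (!(out = malloc(n + 1)))
		return NULL;
	for (i = 0; i <= n; i++) {
		if (u[i] >= 0x80) {
			free(out);
			return NULL;
		}
		out[i] = (char)u[i];
	}
	return out;
}

/* Direct case mapping: only 'a'-'z' and 'A'-'Z' are cased in ASCII. */
static char *ascii_map(const char *s, int upper)
{
	size_t i, n;
	char *out;

	assert(s != NULL);
	n = strlen(s);
	if (!(out = malloc(n + 1)))
		return NULL;
	for (i = 0; i <= n; i++) {
		char c = s[i];

		if (upper && c >= 'a' && c <= 'z')
			c = (char)(c - 'a' + 'A');
		else if (!upper && c >= 'A' && c <= 'Z')
			c = (char)(c - 'A' + 'a');
		out[i] = c;
	}
	return out;
}

static char *ascii_uc(const struct unicode_info *info, const char *s)
{
	(void)info;
	return ascii_map(s, 1);
}

static char *ascii_lc(const struct unicode_info *info, const char *s)
{
	(void)info;
	return ascii_map(s, 0);
}

/* Per-character titlecase equals uppercase for ASCII letters. */
const struct unicode_info unicode_US_ASCII_example = {
	"US-ASCII", ascii_c2u, ascii_u2c, ascii_uc, ascii_lc, ascii_uc
};
```

 The case functions are coded directly here, because ASCII case mapping is a
 fixed offset between 'a'-'z' and 'A'-'Z'.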

 If the character set allows for convenient conversion to
 upper/lower/titlecase, the conversion should be coded directly.
 Otherwise, the library has a set of convenience functions that work from
 the unicode master table.  Text in any character set can be
 upper/lower/titlecased by converting it to unicode, running it through
 unicode_uc/unicode_lc/unicode_tc, then converting unicode back to the
 original character set.  See utf8_chset.c for an example.
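 The round trip described above can be sketched generically.  The sketch
 assumes per-code-point case functions in the style of unicode_uc().
 toy_upper() is a hypothetical stand-in that only knows Basic Latin and
 Latin-1 letters, whereas the real functions use the full master table.
 The function-pointer typedefs and the Latin-1 helpers are likewise
 illustrative, not the library's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int unicode_char;	/* assumption: one code point each */

/* Hypothetical hook types: charset text <-> 0-terminated unicode array
   (malloc()ed, NULL on failure), plus a per-code-point case mapping. */
typedef unicode_char *(*c2u_fn)(const char *);
typedef char *(*u2c_fn)(const unicode_char *);
typedef unicode_char (*case_fn)(unicode_char);

/* Generic case conversion: charset -> unicode, map each code point,
   unicode -> charset.  Works for any charset that has c2u and u2c. */
static char *case_via_unicode(const char *s, c2u_fn c2u, u2c_fn u2c,
			      case_fn map)
{
	unicode_char *u, *p;
	char *out;

	assert(s && c2u && u2c && map);
	if (!(u = c2u(s)))
		return NULL;
	for (p = u; *p; p++)
		*p = map(*p);
	out = u2c(u);	/* NULL if a mapped character has no equivalent */
	free(u);
	return out;
}

/* ISO-8859-1 helpers: byte value and code point coincide. */
static unicode_char *latin1_c2u(const char *s)
{
	size_t i, n = strlen(s);
	unicode_char *u = malloc((n + 1) * sizeof(*u));

	if (!u)
		return NULL;
	for (i = 0; i <= n; i++)	/* includes the terminator */
		u[i] = (unsigned char)s[i];
	return u;
}

static char *latin1_u2c(const unicode_char *u)
{
	size_t i, n = 0;
	char *s;

	while (u[n])
		n++;
	if (!(s = malloc(n + 1)))
		return NULL;
	for (i = 0; i <= n; i++) {
		if (u[i] > 0xFF) {	/* not representable in ISO-8859-1 */
			free(s);
			return NULL;
		}
		s[i] = (char)u[i];
	}
	return s;
}

/* Toy stand-in for unicode_uc(): knows only Basic Latin and Latin-1
   lowercase letters.  The real function uses the full master table. */
static unicode_char toy_upper(unicode_char c)
{
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 'A';
	if (c >= 0xE0 && c <= 0xFE && c != 0xF7)	/* skip division sign */
		return c - 0x20;
	if (c == 0xFF)		/* y with diaeresis -> U+0178 */
		return 0x178;
	return c;
}
```

 In this sketch, a character whose mapped form does not exist in the target
 character set makes the whole conversion fail.  For example, ÿ uppercases to
 U+0178, which ISO-8859-1 lacks.  Keeping the unmapped character instead
 would be an equally reasonable design choice.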

 Note that unicode_uc/unicode_lc/unicode_tc carry a heavy size penalty and
 should be avoided where possible: unicode_[ult]c() adds about 26Kb of data
 tables.
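 For ISO-8859-1, for instance, uppercasing fits in two range checks.  Coding
 it directly therefore avoids the tables entirely.  This is a minimal sketch
 that assumes in-place conversion of a NUL-terminated string;
 latin1_upper_inplace() is a hypothetical name.  Sharp s (0xDF) and ÿ (0xFF)
 have no single uppercase counterpart inside ISO-8859-1, so the sketch leaves
 them unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Uppercase an ISO-8859-1 string in place using only arithmetic,
   no unicode tables.  Returns the number of bytes changed. */
static size_t latin1_upper_inplace(char *s)
{
	size_t changed = 0;

	assert(s != NULL);
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		/* a-z and 0xE0-0xFE (except 0xF7, the division sign) sit
		   exactly 0x20 above their uppercase forms. */
		if ((c >= 'a' && c <= 'z') ||
		    (c >= 0xE0 && c <= 0xFE && c != 0xF7)) {
			*s = (char)(c - 0x20);
			changed++;
		}
		/* 0xDF (sharp s) and 0xFF (y with diaeresis) are kept as-is. */
	}
	return changed;
}
```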

 Finally, all this code has to be added to libunicode.a.  To do so, simply
 add the new source file to libunicode_a_SOURCES.

 After doing all that, run make to build libunicode.a and the unicode-info
 program, then run unicode-info.  If the character set is listed by
 unicode-info, you should be all set, provided that the conversion functions
 actually work as advertised.
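 Being listed by unicode-info does not prove that the conversions are right.
 A round-trip check is a cheap way to test them.  This is a minimal sketch
 for a single-byte character set.  It assumes hypothetical per-byte hooks,
 to_u() and from_u(), rather than the real conversion function signatures.
 Every byte except NUL should survive a trip to unicode and back.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int unicode_char;	/* assumption: one code point each */

/* Hypothetical per-character hooks for a single-byte character set:
   to_u() returns the code point for a byte; from_u() returns the byte
   for a code point, or -1 if the character set has no such character. */
typedef unicode_char (*to_u_fn)(unsigned char);
typedef int (*from_u_fn)(unicode_char);

/* Round-trip check: every byte must map to unicode and back to itself.
   Prints each failing byte to stderr and returns the failure count. */
static int roundtrip_failures(const char *name, to_u_fn to_u, from_u_fn from_u)
{
	int b, failures = 0;

	assert(to_u && from_u);
	for (b = 1; b <= 0xFF; b++) {	/* skip NUL, the string terminator */
		unicode_char u = to_u((unsigned char)b);
		int back = from_u(u);

		if (back != b) {
			fprintf(stderr, "%s: 0x%02X -> U+%04X -> %d\n",
				name, b, u, back);
			failures++;
		}
	}
	return failures;
}

/* ISO-8859-1 hooks: byte value and code point coincide. */
static unicode_char latin1_to_u(unsigned char c)
{
	return c;
}

static int latin1_from_u(unicode_char u)
{
	return u <= 0xFF ? (int)u : -1;
}

/* Deliberately broken hook: claims ISO-8859-1 but only handles ASCII. */
static int buggy_from_u(unicode_char u)
{
	return u < 0x80 ? (int)u : -1;
}
```

 The check reports zero failures for the Latin-1 hooks.  It reports all 128
 bytes from 0x80 to 0xFF when buggy_from_u() is substituted, which is exactly
 the kind of error unicode-info alone would not reveal.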