Imported Upstream version 0.63.0

diff --git a/unicode/README b/unicode/README

new file mode 100644

index 0000000..3112508
--- /dev/null
+++ b/unicode/README
@@ -0,0 +1,49 @@
+
+HOW TO ADD A NEW CHARACTER SET MAPPING.
+
+ * Create a struct unicode_info structure.  This structure defines the
+ official character set name, as well as pointers to conversion functions.
+
+ * Add the name of the character set, and the name of your structure to
+ unicode/charsetlist.txt.  Multiple entries in unicode/charsetlist.txt can
+ be used to define aliases for the same character set.  For example,
+ "IBM869" and "CP869" both specify the same character set; both point to
+ the unicode_IBM_869 object, which is defined in ibm869.c.
+
+ The source file charsetlist.c is generated automatically by a script from
+ charsetlist.txt.  That's how character sets end up being linked into the
+ code, and how individual character sets can be selectively included or
+ excluded.
+
+ The struct unicode_info structure contains pointers to the following
+ functions:
+
+ + Convert text in this character set to unicode.
+
+ + Convert unicode to text in this character set.
+
+ + Convert text in this character set to uppercase.
+
+ + Convert text in this character set to lowercase.
+
+ + Convert text in this character set to titlecase.
+
+ If the character set allows for convenient conversion to
+ upper/lower/titlecase, the conversion should be coded directly.
+ Otherwise, the library has a set of convenience functions that use the
+ unicode master table.  Text in any character set can be
+ upper/lower/titlecased by converting it to unicode, running it through
+ unicode_uc/unicode_lc/unicode_tc, then converting the result back to the
+ original character set.  See utf8_chset.c for an example.
+
+ Note that unicode_uc/unicode_lc/unicode_tc carry a heavy size penalty and
+ should be avoided if possible: they add about 26Kb of data tables.
+
+ Finally, all this code has to be added to libunicode.a.  It can simply be
+ added to libunicode_a_SOURCES.
+
+ After doing all that, run make to build libunicode.a and the
+ unicode-info program.  Run unicode-info.  If the character set is listed by
+ unicode-info, you should be all set, provided that the conversion functions
+ actually work as advertised.
+
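
The structure and the round-trip case conversion that the README describes can be sketched in C as follows. This is only an illustration under stated assumptions: the README does not show the real layout of struct unicode_info or the signatures of unicode_uc/unicode_lc/unicode_tc, so every name below (demo_unichar, struct demo_charset, demo_uc, demo_lc, the latin1_* functions) is hypothetical. The case helpers cover ASCII letters only, where the library would consult its unicode master table, and the titlecase function capitalizes just the first character of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long demo_unichar;  /* hypothetical code point type */

/* Hypothetical stand-in for struct unicode_info; the real layout may differ. */
struct demo_charset {
    const char *name;                            /* official charset name */
    demo_unichar *(*to_unicode)(const char *);   /* 0-terminated result */
    char *(*from_unicode)(const demo_unichar *);
    char *(*to_upper)(const char *);
    char *(*to_lower)(const char *);
    char *(*to_title)(const char *);
};

/* Tiny stand-ins for unicode_uc/unicode_lc: ASCII only, no master table. */
static demo_unichar demo_uc(demo_unichar c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

static demo_unichar demo_lc(demo_unichar c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* ISO-8859-1: each byte value is its own code point. */
static demo_unichar *latin1_to_unicode(const char *text)
{
    size_t i, n;
    demo_unichar *u;

    assert(text != NULL);
    n = strlen(text);
    u = malloc((n + 1) * sizeof(*u));
    if (!u)
        return NULL;
    for (i = 0; i < n; i++)
        u[i] = (unsigned char)text[i];
    u[n] = 0;
    return u;
}

static char *latin1_from_unicode(const demo_unichar *u)
{
    size_t i, n = 0;
    char *s;

    assert(u != NULL);
    while (u[n])
        n++;
    s = malloc(n + 1);
    if (!s)
        return NULL;
    for (i = 0; i < n; i++)  /* code points outside Latin-1 become '?' */
        s[i] = u[i] < 256 ? (char)(unsigned char)u[i] : '?';
    s[n] = 0;
    return s;
}

/* Round trip: charset -> unicode -> per-character mapping -> charset.
   The first character is mapped with `first`, all others with `rest`. */
static char *round_trip(const char *text,
                        demo_unichar (*first)(demo_unichar),
                        demo_unichar (*rest)(demo_unichar))
{
    size_t i;
    char *out;
    demo_unichar *u = latin1_to_unicode(text);

    if (!u)
        return NULL;
    for (i = 0; u[i]; i++)
        u[i] = (i == 0 ? first : rest)(u[i]);
    out = latin1_from_unicode(u);
    free(u);
    return out;
}

static char *latin1_upper(const char *t) { return round_trip(t, demo_uc, demo_uc); }
static char *latin1_lower(const char *t) { return round_trip(t, demo_lc, demo_lc); }
static char *latin1_title(const char *t) { return round_trip(t, demo_uc, demo_lc); }

/* One object per character set, analogous to unicode_IBM_869 in ibm869.c. */
static const struct demo_charset demo_latin1 = {
    "ISO-8859-1",
    latin1_to_unicode, latin1_from_unicode,
    latin1_upper, latin1_lower, latin1_title
};
```

A caller would go through the function pointers, for example demo_latin1.to_upper("abc"), and free() the returned string. For a single-byte character set like this one, coding the case conversion directly on the bytes would avoid the round trip altogether, which is the option the README recommends when it is convenient.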