Unicode
=======

== Support in The Definition of Standard ML ==

There is no real support for Unicode in the
<:DefinitionOfStandardML:Definition>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."

== Support in The Standard ML Basis Library ==

Neither is there real support for Unicode in the <:BasisLibrary:Basis
Library>. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the `WideChar` and `WideString`
structures are insufficient for the purposes of Unicode. There is no
`LargeChar` structure, which in itself is a deficiency, since a
programmer cannot program against the largest supported character
size.

== Current Support in MLton ==

MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants. This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text. See the
<:SuccessorML#ExtendedTextConsts:extended text constants feature of
Successor ML> for more details.
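
As a minimal sketch of this "UTF-8 convenience", assume that the source
file is saved as UTF-8 and that the `allowExtendedTextConsts` annotation
is in effect (an assumption here; the Successor ML page describes how the
feature is enabled). Under those assumptions, such a constant is an
ordinary `string` of bytes rather than a sequence of code points:

[source,sml]
----
(* Assumption: this file is UTF-8 encoded and extended text constants
 * are enabled.  The character λ is U+03BB, which UTF-8 encodes as the
 * two bytes 0xCE 0xBB. *)
val s = "λx"

(* String.size counts 8-bit chars (bytes), not code points, so under
 * the assumptions above this is expected to report 3, not 2. *)
val () = print (Int.toString (String.size s) ^ "\n")

(* List the individual byte values of the constant. *)
val () =
   List.app (fn c => print (Int.toString (Char.ord c) ^ " "))
            (String.explode s)
val () = print "\n"

(* Printing writes the bytes back unchanged; a UTF-8 terminal will
 * typically render them as the original text. *)
val () = print (s ^ "\n")
----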

MLton, also as a minor extension over the Definition, supports
`\Uxxxxxxxx` numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.
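
A `\U` escape can denote a code point that does not fit in an 8-bit
`char`, so it is most useful at a wider character type. The sketch below
assumes that a text constant may be given the type `WideString.string`
through a type annotation and that `allowExtendedTextConsts` is in
effect; both are assumptions of this example, not guarantees. Only
functions from the Basis Library `STRING` and `CHAR` signatures are used:

[source,sml]
----
(* Assumption: the constant is elaborated at type WideString.string
 * because of the annotation.  \U000003BB is the code point of λ. *)
val w : WideString.string = "\U000003BBx"

(* Each element is one wide character, so the expected size is 2. *)
val () = print (Int.toString (WideString.size w) ^ "\n")

(* The first element should carry the code point itself
 * (0x3BB = 955), not a UTF-8 byte. *)
val c : WideChar.char = WideString.sub (w, 0)
val () = print (Int.toString (WideChar.ord c) ^ "\n")
----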

MLton provides `WideChar` and `WideString` structures, corresponding
to 32-bit characters and strings, respectively.

== Questions and Discussions ==

There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:

* http://www.mlton.org/pipermail/mlton/2004-December/026396.html

There is a good summary of points at:

* http://www.mlton.org/pipermail/mlton/2004-December/026440.html

In November 2005, there was a follow-up discussion and the beginning of
some coding:

* http://www.mlton.org/pipermail/mlton/2005-November/028300.html

== Also see ==

The <:fxp:> XML parser has some support for dealing with Unicode
documents.