Import Upstream version 20180207
Unicode
=======

== Support in The Definition of Standard ML ==

There is no real support for Unicode in the
<:DefinitionOfStandardML:Definition>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."

== Support in The Standard ML Basis Library ==

Neither is there real support for Unicode in the <:BasisLibrary:Basis
Library>. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the `WideChar` and `WideString`
structures are insufficient for the purposes of Unicode. There is no
`LargeChar` structure, which in itself is a deficiency, since a
programmer cannot program against the largest supported character
size.

== Current Support in MLton ==

MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants. This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text. See the
<:SuccessorML#ExtendedTextConsts:extended text constants feature of
Successor ML> for more details.
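
A string holding UTF-8 text is still an ordinary `string` of 8-bit
characters, so `size` counts bytes rather than code points. The
following is a minimal sketch of that distinction, not something MLton
provides; the names `lambda`, `isContinuation`, and `codePoints` are
hypothetical. It spells the two UTF-8 bytes of U+03BB with standard
decimal escapes so that it does not rely on the extension, and it
assumes well-formed input.

[source,sml]
----
(* UTF-8 encoding of U+03BB (Greek small letter lambda): bytes 0xCE 0xBB,
 * written here as the decimal escapes 206 and 187. *)
val lambda = "\206\187"

(* A UTF-8 continuation byte has the bit pattern 10xxxxxx (0x80 to 0xBF). *)
fun isContinuation c =
   let
      val b = Char.ord c
   in
      b >= 0x80 andalso b <= 0xBF
   end

(* Count code points by counting the bytes that are not continuation bytes.
 * Assumes well-formed UTF-8; no validation is performed. *)
fun codePoints s =
   CharVector.foldl (fn (c, n) => if isContinuation c then n else n + 1) 0 s

val () = print (Int.toString (size lambda) ^ " bytes, "
                ^ Int.toString (codePoints lambda) ^ " code point(s)\n")
----

Counting non-continuation bytes is only the simplest possible approach;
it says nothing about grapheme clusters or normalization, which is part
of why this is "convenience" rather than comprehensive Unicode support.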

MLton, also as a minor extension over the Definition, supports
`\Uxxxxxxxx` numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.

MLton provides `WideChar` and `WideString` structures, corresponding
to 32-bit characters and strings, respectively.
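
As a rough illustration of how the wide structures might be used, the
sketch below binds a `WideString.string` with the standard `\uxxxx`
escape and inspects it through the Basis `WideString` and `WideChar`
interfaces. It assumes that MLton accepts a string constant at type
`WideString.string`; this has not been checked against a particular
MLton release, and the names `w`, `n`, and `cp` are hypothetical.

[source,sml]
----
(* Assumption: a string constant is accepted at type WideString.string. *)
val w : WideString.string = "\u03BB"

(* Number of wide characters, and the ordinal of the first one. *)
val n = WideString.size w
val cp = WideChar.ord (WideString.sub (w, 0))

val () = print (Int.toString n ^ " wide character(s), first ordinal "
                ^ Int.toString cp ^ "\n")
----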

== Questions and Discussions ==

There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:

 * http://www.mlton.org/pipermail/mlton/2004-December/026396.html

There is a good summary of points at:

 * http://www.mlton.org/pipermail/mlton/2004-December/026440.html

In November 2005, there was a followup discussion and the beginning of
some coding.

 * http://www.mlton.org/pipermail/mlton/2005-November/028300.html

== Also see ==

The <:fxp:> XML parser has some support for dealing with Unicode
documents.