Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | Unicode |
2 | ======= | |
3 | ||
4 | == Support in The Definition of Standard ML == | |
5 | ||
6 | There is no real support for Unicode in the | |
7 | <:DefinitionOfStandardML:Definition>; there are only a few throw-away | |
8 | sentences along the lines of "the characters with numbers 0 to 127 | |
9 | coincide with the ASCII character set." | |
10 | ||
11 | == Support in The Standard ML Basis Library == | |
12 | ||
13 | Neither is there real support for Unicode in the <:BasisLibrary:Basis | |
14 | Library>. The general consensus (which includes the opinions of the | |
15 | editors of the Basis Library) is that the `WideChar` and `WideString` | |
16 | structures are insufficient for the purposes of Unicode. There is no | |
17 | `LargeChar` structure, which in itself is a deficiency, since a | |
18 | programmer cannot program against the largest supported character | |
19 | size. | |
20 | ||
21 | == Current Support in MLton == | |
22 | ||
23 | MLton, as a minor extension over the Definition, supports UTF-8 byte | |
24 | sequences in text constants. This feature enables "UTF-8 convenience" | |
25 | (but not comprehensive Unicode support); in particular, it allows one | |
26 | to copy text from a browser and paste it into a string constant in an | |
27 | editor and, furthermore, if the string is printed to a terminal, then | |
28 | it will (typically) appear as the original text. See the | |
29 | <:SuccessorML#ExtendedTextConsts:extended text constants feature of | |
30 | Successor ML> for more details. | |
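To make the distinction between "UTF-8 convenience" and comprehensive Unicode support concrete, the following is a minimal sketch of what happens at the byte level. It is written in Python purely for illustration and is not MLton or SML code; the sample text and all variable names are assumptions chosen for the example. It assumes, as the paragraph above suggests, that a pasted string constant is held as its raw UTF-8 bytes in a string of 8-bit characters.

```python
# Illustration only (Python, not SML): the byte-level view of a pasted string.
text = "λx"                     # sample text: two Unicode code points
utf8 = text.encode("utf-8")     # the bytes a UTF-8 encoded source file would contain

code_points = len(text)         # number of Unicode code points
byte_count = len(utf8)          # number of 8-bit units; "λ" takes 2 bytes, "x" takes 1

# Indexing the byte string yields individual bytes, not characters:
# the first unit is only the lead byte of "λ".
first_unit = utf8[0]

# Passing the bytes through unchanged and decoding them as UTF-8
# (as a UTF-8 terminal would) reproduces the original text.
round_trip = utf8.decode("utf-8")
```

Under that assumption, operations that count or index characters would work on bytes rather than on code points, which is why copying, pasting, and printing behave as expected while character-level processing of non-ASCII text does not.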
31 | ||
32 | MLton, also as a minor extension over the Definition, supports | |
33 | `\Uxxxxxxxx` numeric escapes in text constants and has preliminary | |
34 | internal support for 16- and 32-bit characters and strings. | |
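As a rough illustration of the escape form mentioned above, the sketch below spells a single character as a backslash, a `U`, and eight hexadecimal digits giving its code point. It is written in Python for illustration only and is not part of MLton; the helper name `u_escape` is hypothetical, and the use of uppercase hexadecimal digits is an assumption of the example rather than a documented requirement.

```python
# Illustration only (Python, not SML): spelling a code point in the \Uxxxxxxxx form.
def u_escape(ch: str) -> str:
    """Return a backslash, 'U', and the code point of ch as eight hex digits."""
    return "\\U{:08X}".format(ord(ch))

escaped = u_escape("λ")   # the code point of "λ" is U+03BB
```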
35 | ||
36 | MLton provides `WideChar` and `WideString` structures, corresponding | |
37 | to 32-bit characters and strings, respectively. | |
38 | ||
39 | == Questions and Discussions == | |
40 | ||
41 | There are periodic flurries of questions and discussion about Unicode | |
42 | in MLton/SML. In December 2004, there was a discussion that led to | |
43 | some seemingly sound design decisions. The discussion started at: | |
44 | ||
45 | * http://www.mlton.org/pipermail/mlton/2004-December/026396.html | |
46 | ||
47 | There is a good summary of points at: | |
48 | ||
49 | * http://www.mlton.org/pipermail/mlton/2004-December/026440.html | |
50 | ||
51 | In November 2005, there was a follow-up discussion and the beginning of | |
52 | some coding. | |
53 | ||
54 | * http://www.mlton.org/pipermail/mlton/2005-November/028300.html | |
55 | ||
56 | == Also see == | |
57 | ||
58 | The <:fxp:> XML parser has some support for dealing with Unicode | |
59 | documents. |