<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph"><p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph"><p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
structures are insufficient for the purposes of Unicode. There is no
<span class="monospaced">LargeChar</span> structure, which in itself is a deficiency, since a
programmer cannot write code against the largest supported character
size.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants. This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, one can copy
text from a browser and paste it into a string constant in an editor,
and if the string is printed to a terminal, it will (typically) appear
as the original text. See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p></div>
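<div class="paragraph"><p>The following sketch makes the difference between "UTF-8 convenience" and comprehensive Unicode support concrete. It is written in Python rather than Standard ML, purely as a language-neutral illustration of what happens to pasted text; it assumes the editor saves the source file as UTF-8, and the variable names are illustrative only.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>
```python
# Language-neutral sketch (Python, not SML) of "UTF-8 convenience":
# text pasted into a string constant is stored as its UTF-8 bytes.
pasted = "h\u00e9llo \u2713"          # the text "héllo ✓" as it would be pasted
stored = pasted.encode("utf-8")       # the bytes an 8-bit string would hold

# An 8-bit string type sees one element per byte, so size-like
# operations count bytes rather than characters.
byte_count = len(stored)              # 10 bytes
char_count = len(pasted)              # 7 code points

# Printing the bytes to a UTF-8 terminal decodes them back into the
# original text, which is why the copy-and-paste round trip works.
shown = stored.decode("utf-8")

# Byte-oriented operations know nothing about character boundaries:
# taking the first 2 bytes splits the two-byte encoding of "é".
prefix = stored[:2]
prefix_shown = prefix.decode("utf-8", errors="replace")   # "h" + U+FFFD

print(byte_count, char_count, shown == pasted, repr(prefix_shown))
```
</pre>
</div></div>
<div class="paragraph"><p>Similarly, when an SML <span class="monospaced">String.string</span> holds UTF-8 bytes, <span class="monospaced">String.size</span> and <span class="monospaced">String.substring</span> operate on bytes, not characters, so they behave like the byte-level operations in the sketch.</p></div>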
<div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
<span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.</p></div>
<div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
to 32-bit characters and strings, respectively.</p></div>
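<div class="paragraph"><p>To see why a 32-bit character width matters for <span class="monospaced">\Uxxxxxxxx</span> escapes, the sketch below counts how many code units a single code point above U+FFFF needs at each width. It uses Python, whose string literals happen to accept the same eight-hex-digit <span class="monospaced">\U</span> escape; it illustrates only the arithmetic of the encodings, not MLton's implementation, and the variable names are hypothetical.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>
```python
# Language-neutral sketch (Python, not SML): a single code point written
# with an eight-hex-digit escape, in the style of \Uxxxxxxxx.
grin = "\U0001F600"                   # U+1F600, above the 16-bit range

code_point = ord(grin)                # 0x1F600

# Number of code units needed to hold this one code point at each width.
utf8_units = len(grin.encode("utf-8"))              # 8-bit units
utf16_units = len(grin.encode("utf-16-le")) // 2    # 16-bit units
utf32_units = len(grin.encode("utf-32-le")) // 4    # 32-bit units

# Only the 32-bit representation stores every code point in one unit;
# with 16-bit units this code point takes a UTF-16 surrogate pair.
print(hex(code_point), utf8_units, utf16_units, utf32_units)
```
</pre>
</div></div>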
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>