1 <!DOCTYPE html>
2 <html lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta name="generator" content="AsciiDoc 8.6.9">
6 <title>Unicode</title>
7 <link rel="stylesheet" href="./asciidoc.css" type="text/css">
8 <link rel="stylesheet" href="./pygments.css" type="text/css">
9
10
11 <script type="text/javascript" src="./asciidoc.js"></script>
12 <script type="text/javascript">
13 /*<![CDATA[*/
14 asciidoc.install();
15 /*]]>*/
16 </script>
17 <link rel="stylesheet" href="./mlton.css" type="text/css">
18 </head>
19 <body class="article">
20 <div id="banner">
21 <div id="banner-home">
22 <a href="./Home">MLton 20180207</a>
23 </div>
24 </div>
25 <div id="header">
26 <h1>Unicode</h1>
27 </div>
28 <div id="content">
29 <div class="sect1">
30 <h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
31 <div class="sectionbody">
32 <div class="paragraph"><p>There is no real support for Unicode in the
33 <a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
34 sentences along the lines of "the characters with numbers 0 to 127
35 coincide with the ASCII character set."</p></div>
36 </div>
37 </div>
38 <div class="sect1">
39 <h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
40 <div class="sectionbody">
41 <div class="paragraph"><p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
42 Library</a>. The general consensus (which includes the opinions of the
43 editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
44 structures are insufficient for the purposes of Unicode. There is no
45 <span class="monospaced">LargeChar</span> structure, which in itself is a deficiency, since a
46 programmer cannot program against the largest supported character
47 size.</p></div>
48 </div>
49 </div>
50 <div class="sect1">
51 <h2 id="_current_support_in_mlton">Current Support in MLton</h2>
52 <div class="sectionbody">
53 <div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
54 sequences in text constants. This feature enables "UTF-8 convenience"
55 (but not comprehensive Unicode support); in particular, it allows one
56 to copy text from a browser and paste it into a string constant in an
57 editor and, furthermore, if the string is printed to a terminal, then
58 it will (typically) appear as the original text. See the
59 <a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
60 Successor ML</a> for more details.</p></div>
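<div class="paragraph"><p>As a rough illustration of why this is "convenience" rather than
comprehensive support, the sketch below builds the UTF-8 bytes by hand with
standard decimal escapes, so it does not depend on any extension. A string
constant pasted as UTF-8 text should hold the same bytes. The names
<span class="monospaced">naive</span>, <span class="monospaced">isContinuation</span>, and <span class="monospaced">codePoints</span> are made up for this example and
are not part of the Basis Library.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>(* "naïve": the character ï (U+00EF) is the two UTF-8 bytes 195 175. *)
val naive = "na\195\175ve"

(* String.size counts 8-bit chars, that is, bytes, not code points. *)
val byteCount = size naive

(* Hypothetical helper: UTF-8 continuation bytes lie in the range 128..191. *)
fun isContinuation c =
   let val b = Char.ord c
   in b &gt;= 128 andalso b &lt; 192
   end

(* Hypothetical helper: count code points by skipping continuation bytes.
   Assumes the string is well-formed UTF-8; no validation is performed. *)
fun codePoints s =
   CharVector.foldl (fn (c, n) =&gt; if isContinuation c then n else n + 1) 0 s

val () = print (naive ^ "\n")   (* a UTF-8 terminal typically shows the text *)
val () = print (Int.toString byteCount ^ " bytes, "
                ^ Int.toString (codePoints naive) ^ " code points\n")</code></pre>
</div></div>
<div class="paragraph"><p>Here the string occupies six bytes but only five code points, which is why
functions such as <span class="monospaced">String.sub</span> and <span class="monospaced">String.size</span> operate on bytes of the
encoding rather than on characters.</p></div>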
61 <div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
62 <span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
63 internal support for 16- and 32-bit characters and strings.</p></div>
64 <div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
65 to 32-bit characters and strings, respectively.</p></div>
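<div class="paragraph"><p>A minimal sketch of how these pieces might be combined follows. It assumes
that a text constant can be given the type <span class="monospaced">WideString.string</span> by a type
annotation and that the <span class="monospaced">\Uxxxxxxxx</span> escape is accepted by the compiler in
use; the escape may need to be enabled as described for the extended text
constants feature of Successor ML. Since the internal support is described
above as preliminary, treat the results noted in the comments as expectations
rather than guarantees.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>(* Assumption: the constant is elaborated as a wide string because of the
   type annotation. U+1D11E (musical symbol G clef) lies outside the range
   of a 16-bit character. *)
val clef : WideString.string = "\U0001D11E"

(* With 32-bit wide characters, this is expected to be a single element. *)
val len = WideString.size clef

(* The numeric code of the first (and only) wide character. *)
val code = WideChar.ord (WideString.sub (clef, 0))

val () = print ("length = " ^ Int.toString len
                ^ ", code = 0x" ^ Int.fmt StringCvt.HEX code ^ "\n")</code></pre>
</div></div>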
66 </div>
67 </div>
68 <div class="sect1">
69 <h2 id="_questions_and_discussions">Questions and Discussions</h2>
70 <div class="sectionbody">
71 <div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
72 in MLton/SML. In December 2004, there was a discussion that led to
73 some seemingly sound design decisions. The discussion started at:</p></div>
74 <div class="ulist"><ul>
75 <li>
76 <p>
77 <a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
78 </p>
79 </li>
80 </ul></div>
81 <div class="paragraph"><p>There is a good summary of points at:</p></div>
82 <div class="ulist"><ul>
83 <li>
84 <p>
85 <a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
86 </p>
87 </li>
88 </ul></div>
89 <div class="paragraph"><p>In November 2005, there was a followup discussion and the beginning of
90 some coding.</p></div>
91 <div class="ulist"><ul>
92 <li>
93 <p>
94 <a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
95 </p>
96 </li>
97 </ul></div>
98 </div>
99 </div>
100 <div class="sect1">
101 <h2 id="_also_see">Also see</h2>
102 <div class="sectionbody">
103 <div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
104 documents.</p></div>
105 </div>
106 </div>
107 </div>
108 <div id="footnotes"><hr></div>
109 <div id="footer">
110 <div id="footer-text">
111 </div>
112 <div id="footer-badges">
113 </div>
114 </div>
115 </body>
116 </html>