<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph"><p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph"><p>Nor is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
structures are insufficient for the purposes of Unicode. There is no
<span class="monospaced">LargeChar</span> structure, which is itself a deficiency, since a
programmer cannot write code against the largest supported character
size.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants. This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text. See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p></div>
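<div class="paragraph"><p>The round trip described above can be sketched outside of SML. The
following is a Python analogy only, not MLton code: it assumes that a string of
8-bit characters simply holds the UTF-8 bytes of the pasted text, and that the
terminal decodes its input as UTF-8. The sample text and all variable names are
illustrative.</p></div>

```python
# Python analogy, not MLton code. Assumption: an 8-bit-character string
# holds the raw UTF-8 bytes of whatever text was pasted into the source.

# Sample text: lambda, "x", space, rightwards arrow, space, "x".
text = "\u03bbx \u2192 x"

# The byte sequence that would sit inside the string constant.
raw = text.encode("utf-8")

# An 8-bit view counts bytes, not characters.
char_count = len(text)
byte_count = len(raw)

# Emitting the bytes unchanged and decoding them as UTF-8 (the job of a
# UTF-8 terminal) recovers the original text.
round_trip = raw.decode("utf-8")

print(char_count, byte_count, round_trip == text)
```

<div class="paragraph"><p>The byte count exceeds the character count because the two non-ASCII
characters occupy two and three bytes, respectively. This is why the feature
amounts to convenience rather than comprehensive Unicode support: per-character
operations on such a string see bytes.</p></div>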
<div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
<span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.</p></div>
<div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
to 32-bit characters and strings, respectively.</p></div>
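<div class="paragraph"><p>To make the contrast between 32-bit characters and UTF-8 bytes concrete,
here is a second Python analogy (again not MLton code). Python happens to offer
an eight-hex-digit <span class="monospaced">\Uxxxxxxxx</span> escape of the same shape; the sketch assumes
nothing about which escapes MLton accepts for a given character type, and the
variable names are illustrative.</p></div>

```python
# Python analogy, not MLton code: one eight-hex-digit escape names one
# code point, and a 32-bit encoding stores each code point in 4 bytes.

s = "A\U0001F600"  # "A" followed by the code point U+1F600

# One integer per code point.
code_points = [ord(c) for c in s]

# Fixed-width 32-bit units (little-endian, no byte-order mark).
wide = s.encode("utf-32-le")

# Variable-width UTF-8 for comparison: 1 byte for "A", 4 for U+1F600.
narrow = s.encode("utf-8")

print(code_points, len(wide), len(narrow))
```

<div class="paragraph"><p>With fixed-width 32-bit units, the length in units equals the number of
code points, which is the property a wide character type offers over a string
of UTF-8 bytes.</p></div>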
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>