4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta name="generator" content="AsciiDoc 8.6.9">
7 <link rel="stylesheet" href="./asciidoc.css" type="text/css">
8 <link rel="stylesheet" href="./pygments.css" type="text/css">
11 <script type="text/javascript" src="./asciidoc.js"></script>
12 <script type="text/javascript">
17 <link rel="stylesheet" href="./mlton.css" type="text/css">
19 <body class="article">
21 <div id="banner-home">
22 <a href="./Home">MLton 20180207</a>
30 <h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
31 <div class="sectionbody">
32 <div class="paragraph"><p>There is no real support for Unicode in the
33 <a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
34 sentences along the lines of "the characters with numbers 0 to 127
35 coincide with the ASCII character set."</p></div>
39 <h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
40 <div class="sectionbody">
41 <div class="paragraph"><p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
42 Library</a>. The general consensus (which includes the opinions of the
43 editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
44 structures are insufficient for the purposes of Unicode. There is no
45 <span class="monospaced">LargeChar</span> structure, which in itself is a deficiency, since a
46 programmer cannot program against the largest supported character
51 <h2 id="_current_support_in_mlton">Current Support in MLton</h2>
52 <div class="sectionbody">
53 <div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
54 sequences in text constants. This feature enables "UTF-8 convenience"
55 (but not comprehensive Unicode support); in particular, it allows one
56 to copy text from a browser and paste it into a string constant in an
57 editor and, furthermore, if the string is printed to a terminal, then
58 it will (typically) appear as the original text. See the
59 <a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
60 Successor ML</a> for more details.</p></div>
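<div class="paragraph"><p>As a minimal sketch of what this convenience means in practice (an illustration, not taken from the MLton sources): assuming the source file is saved as UTF-8 and extended text constants are enabled (depending on the MLton version, this may require an annotation such as <span class="monospaced">allowExtendedTextConsts true</span>; see the linked page), a pasted character is stored as its UTF-8 bytes in an ordinary 8-bit <span class="monospaced">string</span>. The names <span class="monospaced">escaped</span> and <span class="monospaced">pasted</span> are hypothetical.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>(* The two UTF-8 bytes of U+03BB (Greek small letter lambda),
   written with the decimal escapes of the Definition. *)
val escaped : string = "\206\187"

(* The same character pasted directly into the constant.
   Assumption: the file is UTF-8 and extended text constants are enabled. *)
val pasted : string = "λ"

(* Printing passes the bytes through; a UTF-8 terminal typically shows the character. *)
val () = print (pasted ^ "\n")

(* String.size counts 8-bit chars, i.e. bytes, not Unicode characters. *)
val () = print ("size: " ^ Int.toString (String.size pasted) ^ "\n")

(* Both constants should denote the same byte sequence. *)
val () = print ("same bytes: " ^ Bool.toString (escaped = pasted) ^ "\n")</code></pre>
</div></div>
<div class="paragraph"><p>Under these assumptions the string has size 2 although it displays as a single character, which is why the feature is a convenience and not comprehensive Unicode support.</p></div>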
61 <div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
62 <span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
63 internal support for 16- and 32-bit characters and strings.</p></div>
64 <div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
65 to 32-bit characters and strings, respectively.</p></div>
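<div class="paragraph"><p>The following sketch shows how the <span class="monospaced">\Uxxxxxxxx</span> escapes and the wide structures might be combined. It assumes that text constants can be given the types <span class="monospaced">WideChar.char</span> and <span class="monospaced">WideString.string</span> through a type annotation, which is untested here and, given the preliminary state of the support, may vary between MLton versions. The names <span class="monospaced">ace</span> and <span class="monospaced">word</span> are hypothetical.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>(* Assumption: the type annotation selects the 32-bit character type
   for an otherwise ordinary character constant. *)
val ace : WideChar.char = #"\U0001F0A1"

(* A wide string of two code points: U+03BB followed by U+1F0A1. *)
val word : WideString.string = "\U000003BB\U0001F0A1"

(* WideChar.ord returns the numeric code of the character. *)
val () = print ("ord: " ^ Int.toString (WideChar.ord ace) ^ "\n")

(* WideString.size counts 32-bit characters, not bytes. *)
val () = print ("size: " ^ Int.toString (WideString.size word) ^ "\n")</code></pre>
</div></div>
<div class="paragraph"><p>Neither value is printed directly: <span class="monospaced">print</span> expects an ordinary <span class="monospaced">string</span>, so a wide string would first have to be encoded (for example, as UTF-8) by user code.</p></div>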
69 <h2 id="_questions_and_discussions">Questions and Discussions</h2>
70 <div class="sectionbody">
71 <div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
72 in MLton/SML. In December 2004, there was a discussion that led to
73 some seemingly sound design decisions. The discussion started at:</p></div>
74 <div class="ulist"><ul>
77 <a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
81 <div class="paragraph"><p>There is a good summary of points at:</p></div>
82 <div class="ulist"><ul>
85 <a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
89 <div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
90 some coding.</p></div>
91 <div class="ulist"><ul>
94 <a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
101 <h2 id="_also_see">Also see</h2>
102 <div class="sectionbody">
103 <div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
108 <div id="footnotes"><hr></div>
110 <div id="footer-text">
112 <div id="footer-badges">