<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph"><p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few passing
remarks along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph"><p>Nor is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
structures are insufficient for Unicode. There is no
<span class="monospaced">LargeChar</span> structure, which is itself a deficiency, since a
programmer cannot write code against the largest supported character
size.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants. This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor; furthermore, if the string is printed to a terminal, then it
will (typically) appear as the original text. See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p></div>
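<div class="paragraph"><p>The byte-level effect of this feature can be illustrated with a short
sketch. It is written in Python purely for illustration (it is not MLton
code) and encodes code points as UTF-8 by hand, showing which bytes a
pasted string contributes to a text constant. Because a
<span class="monospaced">String.string</span> holds 8-bit characters, such a constant has one
character per byte, so <span class="monospaced">String.size</span> counts bytes rather than
Unicode characters. The helper names <span class="monospaced">utf8_encode</span> and
<span class="monospaced">text_constant_bytes</span> are hypothetical.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre><code class="language-python"># Illustration only (Python, not MLton code): hand-written UTF-8 encoding.
# Note: "x % 64" extracts the low six bits of x (same as a 0x3F mask).

def utf8_encode(cp):
    """Return the UTF-8 bytes for one Unicode scalar value."""
    if cp not in range(0x110000) or cp in range(0xD800, 0xE000):
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if 0x80 > cp:                  # 1 byte:  0xxxxxxx
        return bytes([cp])
    if 0x800 > cp:                 # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | cp % 64])
    if 0x10000 > cp:               # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | (cp >> 6) % 64,
                      0x80 | cp % 64])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | (cp >> 12) % 64,
                  0x80 | (cp >> 6) % 64,
                  0x80 | cp % 64])


def text_constant_bytes(s):
    """Concatenate the UTF-8 bytes of every character in s."""
    return b"".join(utf8_encode(ord(c)) for c in s)


pasted = "h\u00e9llo"                      # "héllo" with a precomposed é
print(len(pasted))                       # 5 (characters)
print(len(text_constant_bytes(pasted)))  # 6 (bytes): é, U+00E9, needs two
</code></pre>
</div></div>
<div class="paragraph"><p>Writing the bit patterns out by hand, rather than calling Python's
built-in encoder, is meant to make the one- to four-byte layouts visible.</p></div>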
<div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
<span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.</p></div>
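<div class="paragraph"><p>The escape syntax can be illustrated with a small sketch. The following
Python helper is hypothetical and is not MLton's lexer:
<span class="monospaced">expand_U_escapes</span> takes the body of a text constant and replaces
each <span class="monospaced">\Uxxxxxxxx</span> escape, assumed here to be a backslash, an
uppercase <span class="monospaced">U</span>, and exactly eight hexadecimal digits, with the
character it denotes. An escaped backslash is kept as-is, other SML escapes
are left untouched, and rejecting values above <span class="monospaced">0x10FFFF</span> is a
choice made for this sketch; MLton's own checks may differ.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre><code class="language-python">import re

# Hypothetical helper, for illustration only (not MLton's lexer).
# Match a backslash followed by either another backslash or by U and eight
# hex digits; matching "\\" as a unit keeps an escaped backslash from being
# mistaken for the start of a \U escape.
ESCAPE = re.compile(r"\\(\\|U([0-9A-Fa-f]{8}))")


def expand_U_escapes(body):
    """Replace each \\Uxxxxxxxx escape in body with the character it denotes."""
    def replace(m):
        if m.group(2) is None:        # an escaped backslash: keep it unchanged
            return m.group(0)
        cp = int(m.group(2), 16)
        if cp > 0x10FFFF:             # range check chosen for this sketch
            raise ValueError("escape out of Unicode range: \\U" + m.group(2))
        return chr(cp)
    # Other escapes such as \n or \u00e9 are deliberately left as they are.
    return ESCAPE.sub(replace, body)


# The raw source text is 10 characters; the expanded constant is 1 character.
print(len(r"\U0001F600"), len(expand_U_escapes(r"\U0001F600")))  # 10 1
</code></pre>
</div></div>
<div class="paragraph"><p>For example, <span class="monospaced">\U0001F600</span> denotes the single code point
U+1F600, which fits in one 32-bit character but occupies four bytes in UTF-8.</p></div>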
<div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
to 32-bit characters and strings, respectively.</p></div>
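<div class="paragraph"><p>Converting the UTF-8 bytes held by a <span class="monospaced">String.string</span> into one
code point per 32-bit character, the kind of sequence a
<span class="monospaced">WideString</span> could hold if each character stores one code point,
requires decoding. The Python sketch below is illustrative only and is not
MLton code; <span class="monospaced">utf8_decode</span> is a hypothetical name, and it rejects
truncated sequences, overlong forms, surrogates, and values above U+10FFFF.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre><code class="language-python"># Illustration only (Python, not MLton code): decoding UTF-8 bytes into one
# code point per 32-bit character.

# Smallest value that may be encoded with 0, 1, 2 or 3 continuation bytes;
# anything smaller is an (invalid) overlong encoding.
MIN_FOR_LENGTH = [0, 0x80, 0x800, 0x10000]


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code points; raise ValueError if malformed."""
    points, i = [], 0
    while len(data) > i:
        b = data[i]
        if 0x80 > b:                  # 0xxxxxxx
            cp, n = b, 0
        elif b >> 5 == 0b110:         # 110xxxxx + 1 continuation byte
            cp, n = b % 32, 1
        elif b >> 4 == 0b1110:        # 1110xxxx + 2 continuation bytes
            cp, n = b % 16, 2
        elif b >> 3 == 0b11110:       # 11110xxx + 3 continuation bytes
            cp, n = b % 8, 3
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        cont = data[i + 1:i + 1 + n]
        if len(cont) != n or any(c >> 6 != 0b10 for c in cont):
            raise ValueError("truncated or bad sequence at offset %d" % i)
        for c in cont:                # append six payload bits per byte
            cp = cp * 64 + c % 64
        if MIN_FOR_LENGTH[n] > cp or cp > 0x10FFFF or cp in range(0xD800, 0xE000):
            raise ValueError("overlong, surrogate, or out-of-range value at offset %d" % i)
        points.append(cp)
        i += n + 1
    return points


data = "h\u00e9llo \U0001F600".encode("utf-8")
points = utf8_decode(data)
print(len(data), len(points))   # 11 7
</code></pre>
</div></div>
<div class="paragraph"><p>Here 11 bytes of UTF-8 decode to 7 code points; the difference comes from
the two-byte é and the four-byte U+1F600.</p></div>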
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding.</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>