<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph"><p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph"><p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>.  The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
structures are insufficient for the purposes of Unicode.  There is no
<span class="monospaced">LargeChar</span> structure, which in itself is a deficiency, since a
programmer cannot program against the largest supported character
size.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants.  This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text.  See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p></div>
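<div class="paragraph"><p>As a minimal sketch of this "UTF-8 convenience", assume the source file is saved
as UTF-8 and that extended text constants are enabled (depending on the MLton version,
this may require the <span class="monospaced">allowExtendedTextConsts</span> annotation).  The name
<span class="monospaced">greeting</span> is illustrative.  Because <span class="monospaced">String</span> holds 8-bit characters,
<span class="monospaced">String.size</span> counts the bytes of the UTF-8 encoding rather than Unicode
code points:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(* Illustrative only: the literal is pasted text, stored as its UTF-8 bytes. *)
val greeting = "héllo, wörld"

(* Printing writes the same bytes back, so a UTF-8 terminal
 * would typically show the original text. *)
val () = print (greeting ^ "\n")

(* Counts 8-bit characters (bytes), not code points. *)
val () = print (Int.toString (String.size greeting) ^ "\n")</pre>
</div></div>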
<div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
<span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.</p></div>
<div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
to 32-bit characters and strings, respectively.</p></div>
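<div class="paragraph"><p>The following sketch combines a <span class="monospaced">\Uxxxxxxxx</span> escape with the
<span class="monospaced">WideString</span> structure.  It assumes that text constants are overloaded, so
that a type annotation selects the wide string type; this is an assumption and
should be checked against the MLton version in use.  The names <span class="monospaced">w</span>, <span class="monospaced">n</span>,
and <span class="monospaced">code</span> are illustrative.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(* Assumption: the annotation makes this constant a WideString.string. *)
val w : WideString.string = "\U0001D11E"

(* Number of wide characters in the string; under the assumption above,
 * the escape denotes a single 32-bit character. *)
val n = WideString.size w

(* Integer code of the first wide character. *)
val code = WideChar.ord (WideString.sub (w, 0))

val () = print (Int.toString n ^ " " ^ Int.toString code ^ "\n")</pre>
</div></div>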
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML.  In December 2004, there was a discussion that led to
some seemingly sound design decisions.  The discussion started at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>