SuccessorML
===========

The purpose of http://sml-family.org/successor-ml/[successor ML], or
sML for short, is to provide a vehicle for the continued evolution of
ML, using Standard ML as a starting point. The intention is for
successor ML to be a living, evolving dialect of ML that is responsive
to community needs and advances in language design, implementation,
and semantics.

== SuccessorML Features in MLton ==

The following SuccessorML features have been implemented in MLton.
The features are disabled by default, and may be enabled using the
feature's corresponding <:MLBasisAnnotations:ML Basis annotation>,
which is listed directly after the feature name. In addition, the
+allowSuccessorML {false|true}+ annotation can be used to
simultaneously enable all of the features.

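
As a minimal sketch of how the annotations might be applied, assume a
project with an ML Basis file +program.mlb+ and source files
+main.sml+ and +util.sml+ (all three names are hypothetical and not
part of this guide). An `ann ... in ... end` declaration scopes the
annotations to the files it encloses. MLton's +-default-ann+
command-line option (for example, +mlton -default-ann
\'allowSuccessorML true' program.mlb+) is another way to set an
annotation for a whole compilation.

----
(* program.mlb: hypothetical file names, for illustration only *)
$(SML_LIB)/basis/basis.mlb

(* Enable every SuccessorML feature for main.sml. *)
ann "allowSuccessorML true" in
   main.sml
end

(* Enable only selected features for util.sml. *)
ann "allowDoDecls true" "allowLineComments true" in
   util.sml
end
----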
* <!Anchor(DoDecls)>
`do` Declarations: +allowDoDecls {false|true}+
+
Allow a +do _exp_+ declaration form, which evaluates _exp_ for its
side effects. The following example uses a `do` declaration:
+
[source,sml]
----
do print "Hello world.\n"
----
+
and is equivalent to:
+
[source,sml]
----
val () = print "Hello world.\n"
----

* <!Anchor(ExtendedConsts)>
Extended Constants: +allowExtendedConsts {false|true}+
+
--
Allow or disallow all of the extended constants features. This is a
proxy for all of the following annotations.

** <!Anchor(ExtendedNumConsts)>
Extended Numeric Constants: +allowExtendedNumConsts {false|true}+
+
Allow underscores as a separator in numeric constants and allow binary
integer and word constants.
+
Underscores in a numeric constant must occur between digits and
consecutive underscores are allowed.
+
Binary integer constants use the prefix +0b+ and binary word constants
use the prefix +0wb+.
+
The following example uses extended numeric constants (although it may
be incorrectly syntax highlighted):
+
[source,sml]
----
val pb = 0b10101
val nb = ~0b10_10_10
val wb = 0wb1010
val i = 4__327__829
val r = 6.022_140_9e23
----

** <!Anchor(ExtendedTextConsts)> Extended Text Constants: +allowExtendedTextConsts {false|true}+
+
Allow characters with integer codes &ge; 128 and &le; 247 that
correspond to syntactically well-formed UTF-8 byte sequences in text
constants.
+
////
and allow `\Uxxxxxxxx` numeric escapes in text constants.
////
+
Any 1, 2, 3, or 4 byte sequence that can be properly decoded to a
binary number according to the UTF-8 encoding/decoding scheme is
allowed in a text constant (but invalid sequences are not explicitly
rejected) and denotes the corresponding sequence of characters with
integer codes &ge; 128 and &le; 247. This feature enables "UTF-8
convenience" (but not comprehensive Unicode support); in particular,
it allows one to copy text from a browser and paste it into a string
constant in an editor and, furthermore, if the string is printed to a
terminal, then it will (typically) appear as the original text. The
following example uses UTF-8 byte sequences:
+
[source,sml]
----
val s1 : String.string = "\240\159\130\161"
val s2 : String.string = "🂡"
val _ = print ("s1 --> " ^ s1 ^ "\n")
val _ = print ("s2 --> " ^ s2 ^ "\n")
val _ = print ("String.size s1 --> " ^ Int.toString (String.size s1) ^ "\n")
val _ = print ("String.size s2 --> " ^ Int.toString (String.size s2) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
s1 --> 🂡
s2 --> 🂡
String.size s1 --> 4
String.size s2 --> 4
s1 = s2 --> true
----
+
Note that the `String.string` type corresponds to any sequence of
8-bit values, including invalid UTF-8 sequences; hence the string
constant `"\192"` (a UTF-8 leading byte with no UTF-8 continuation
byte) is valid. Similarly, the `Char.char` type corresponds to a
single 8-bit value; hence the char constant `#"α"` is not valid, as
the text constant `"α"` denotes a sequence of two 8-bit values.
+
////
A `\Uxxxxxxxx` numeric escape denotes a single character with the
hexadecimal integer code `xxxxxxxx`. Such numeric escapes are not
necessary for the `String.string` and `Char.char` types, since
characters in such text constants must have integer codes &le; 255 and
the `\ddd` and `\uxxxx` numeric escapes suffice. However, the
`\Uxxxxxxxx` numeric escapes are useful for the `WideString.string`
and `WideChar.char` types, since characters in such text constants may
have integer codes &le; 2^32^-1. The following uses a `\Uxxxxxxxx`
numeric escape (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
----
+
Note that the `WideString.string` type corresponds to any sequence of
32-bit values, including invalid Unicode code points; hence, the
string constants `"\U001F0000"` and `"\U40000000"` are valid (but the
corresponding integer codes are not valid Unicode code points).
Similarly, the `WideChar.char` type corresponds to a single 32-bit
value.
+
Finally, note that a UTF-8 byte sequence in a `WideString.string` or
`WideChar.char` text constant does not denote a single 32-bit value,
but rather a sequence of 32-bit values &ge; 128 and &le; 247. The
following example uses both UTF-8 byte sequences and `\Uxxxxxxxx`
numeric escapes (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val s2 : WideString.string = "🂡"
val s3 : WideString.string = "\U000000F0\U0000009F\U00000082\U000000A1"
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
val _ = print ("WideString.size s2 --> " ^ Int.toString (WideString.size s2) ^ "\n")
val _ = print ("WideString.size s3 --> " ^ Int.toString (WideString.size s3) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
val _ = print ("s2 = s3 --> " ^ Bool.toString (s2 = s3) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
WideString.size s2 --> 4
WideString.size s3 --> 4
s1 = s2 --> false
s2 = s3 --> true
----
////
--

* <!Anchor(LineComments)>
Line Comments: +allowLineComments {false|true}+
+
Allow line comments beginning with the token ++(*)++. The following
example uses a line comment:
+
[source,sml]
----
(*) This is a line comment
----
+
Line comments properly nest within block comments. The following
example uses line comments nested within block comments:
+
[source,sml]
----
(*
val x = 4 (*) This is a line comment
*)

(*
val y = 5 (*) This is a line comment *)
*)
----

* <!Anchor(OptBar)>
Optional Pattern Bars: +allowOptBar {false|true}+
+
Allow a bar to appear before the first match rule of a `case`, `fn`,
or `handle` expression, allow a bar to appear before the first
function-value binding of a `fun` declaration, and allow a bar to
appear before the first constructor binding or description of a
`datatype` declaration or specification. The following example uses
leading bars in a `datatype` declaration, a `fun` declaration, and a
`case` expression:
+
[source,sml]
----
datatype t =
   | C
   | B
   | A

fun
   | f NONE = 0
   | f (SOME t) =
        (case t of
            | A => 1
            | B => 2
            | C => 3)
----
+
By eliminating the special case of the first element, this feature
allows for simpler refactoring (e.g., sorting the lines of the
`datatype` declaration's constructor bindings to put the constructors
in alphabetical order).
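+
The example above does not cover `fn` and `handle` expressions. The
following is a minimal sketch of leading bars in those two forms,
assuming +allowOptBar true+ is in effect; the names `isZero` and
`safeDiv` are hypothetical and chosen only for illustration:
+
[source,sml]
----
(* A leading bar before the first match rule of an fn expression. *)
val isZero =
   fn
    | 0 => true
    | _ => false

(* A leading bar before the first match rule of a handle expression.
 * Both handlers return 0 as an arbitrary fallback value. *)
fun safeDiv (x, y) =
   (x div y)
   handle
    | Div => 0
    | Overflow => 0
----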

* <!Anchor(OptSemicolon)>
Optional Semicolons: +allowOptSemicolon {false|true}+
+
Allow a semicolon to appear after the last expression in a sequence or
`let`-body expression. The following example uses a trailing
semicolon in the body of a `let` expression:
+
[source,sml]
----
fun h z =
   let
      val x = 3 * z
   in
      f x ;
      g x ;
   end
----
+
By eliminating the special case of the last element, this feature
allows for simpler refactoring.
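+
The example above shows a `let` body; a parenthesized sequence
expression should accept a trailing semicolon in the same way. The
following is a minimal sketch, assuming +allowOptSemicolon true+ is in
effect; the names `counter`, `bump`, and `total` are hypothetical:
+
[source,sml]
----
val counter = ref 0
fun bump () = counter := !counter + 1

(* The value of the sequence is still that of its last expression;
 * the trailing semicolon does not add an extra unit value. *)
val total =
   (bump ();
    bump ();
    !counter;)
----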

* <!Anchor(OrPats)>
Disjunctive (Or) Patterns: +allowOrPats {false|true}+
+
Allow disjunctive (a.k.a., "or") patterns of the form +_pat~1~_ |
_pat~2~_+, which matches a value that matches either +_pat~1~_+ or
+_pat~2~_+. Disjunctive patterns have lower precedence than `as`
patterns and constraint patterns, much as `orelse` expressions have
lower precedence than `andalso` expressions and constraint
expressions. Both sub-patterns of a disjunctive pattern must bind the
same variables with the same types. The following example uses
disjunctive patterns:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

fun f t =
   case t of
      A x | B x | C x => x + 1
    | D (x, _) | E (_, x) => x * 2
----

* <!Anchor(RecordPunExps)>
Record Punning Expressions: +allowRecordPunExps {false|true}+
+
Allow record punning expressions, whereby an identifier +_vid_+ as an
expression row in a record expression denotes the expression row
+_vid_ = _vid_+ (i.e., treating a label as a variable). The following
example uses record punning expressions (and also record punning
patterns):
+
[source,sml]
----
fun incB r =
   case r of {a, b, c} => {a, b = b + 1, c}
----
+
and is equivalent to:
+
[source,sml]
----
fun incB r =
   case r of {a = a, b = b, c = c} => {a = a, b = b + 1, c = c}
----

* <!Anchor(SigWithtype)>
`withtype` in Signatures: +allowSigWithtype {false|true}+
+
Allow `withtype` to modify a `datatype` specification in a signature.
The following example uses `withtype` in a signature (and also
`withtype` in a declaration):
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
----
+
and is equivalent to:
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
----

* <!Anchor(VectorExpsAndPats)>
Vector Expressions and Patterns: +allowVectorExpsAndPats {false|true}+
+
--
Allow or disallow vector expressions and vector patterns. This is a
proxy for all of the following annotations.

** <!Anchor(VectorExps)>
Vector Expressions: +allowVectorExps {false|true}+
+
Allow vector expressions of the form +#[_exp~0~_, _exp~1~_, ..., _exp~n-1~_]+ (where _n ≥ 0_). The expression has type +_τ_ vector+ when each expression _exp~i~_ has type +_τ_+.

** <!Anchor(VectorPats)>
Vector Patterns: +allowVectorPats {false|true}+
+
Allow vector patterns of the form +#[_pat~0~_, _pat~1~_, ..., _pat~n-1~_]+ (where _n ≥ 0_). The pattern matches values of type +_τ_ vector+ when each pattern _pat~i~_ matches values of type +_τ_+.
--
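+
The following is a minimal sketch that combines the two vector forms
described above, assuming +allowVectorExpsAndPats true+ is in effect;
the names `v` and `describe` are hypothetical and chosen only for
illustration:
+
[source,sml]
----
(* A vector expression with n = 3 elements of type int. *)
val v : int vector = #[1, 2, 3]

(* Vector patterns match on the exact number of elements; the final
 * wildcard rule covers every other length. *)
fun describe (v : int vector) : string =
   case v of
      #[] => "empty"
    | #[x] => "singleton " ^ Int.toString x
    | #[x, y] => "pair summing to " ^ Int.toString (x + y)
    | _ => "length " ^ Int.toString (Vector.length v)

val () = print (describe v ^ "\n")
----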