SuccessorML
===========

The purpose of http://sml-family.org/successor-ml/[successor ML], or
sML for short, is to provide a vehicle for the continued evolution of
ML, using Standard ML as a starting point. The intention is for
successor ML to be a living, evolving dialect of ML that is responsive
to community needs and advances in language design, implementation,
and semantics.

== SuccessorML Features in MLton ==

The following SuccessorML features have been implemented in MLton.
The features are disabled by default and can be enabled with each
feature's corresponding <:MLBasisAnnotations:ML Basis annotation>,
which is listed directly after the feature name. In addition, the
+allowSuccessorML {false|true}+ annotation can be used to
simultaneously enable all of the features.
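
As an illustration of how such an annotation might be applied, here is a minimal sketch of an ML Basis file. The file names `program.mlb` and `program.sml` are hypothetical, and the sketch assumes the usual `ann ... in ... end` form of ML Basis annotations:

----
(* program.mlb (hypothetical file name) *)
$(SML_LIB)/basis/basis.mlb

ann "allowSuccessorML true" in
   program.sml
end
----

Replacing `allowSuccessorML` with a single feature's annotation, such as `allowOrPats`, should enable only that feature for the enclosed files. MLton's `-default-ann` command-line option is another way to supply an annotation without editing the ML Basis file.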

* <!Anchor(DoDecls)>
`do` Declarations: +allowDoDecls {false|true}+
+
Allow a +do _exp_+ declaration form, which evaluates _exp_ for its
side effects. The following example uses a `do` declaration:
+
[source,sml]
----
do print "Hello world.\n"
----
+
and is equivalent to:
+
[source,sml]
----
val () = print "Hello world.\n"
----

* <!Anchor(ExtendedConsts)>
Extended Constants: +allowExtendedConsts {false|true}+
+
--
Allow or disallow all of the extended constants features. This is a
proxy for all of the following annotations.

** <!Anchor(ExtendedNumConsts)>
Extended Numeric Constants: +allowExtendedNumConsts {false|true}+
+
Allow underscores as a separator in numeric constants and allow binary
integer and word constants.
+
Underscores in a numeric constant must occur between digits;
consecutive underscores are allowed.
+
Binary integer constants use the prefix +0b+ and binary word constants
use the prefix +0wb+.
+
The following example uses extended numeric constants (although it may
be incorrectly syntax highlighted):
+
[source,sml]
----
val pb = 0b10101
val nb = ~0b10_10_10
val wb = 0wb1010
val i = 4__327__829
val r = 6.022_140_9e23
----
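+
As a worked check of the notation, the following sketch compares extended constants with their ordinary decimal spellings. It assumes that +allowExtendedNumConsts true+ is in effect; the value names are illustrative only:
+
[source,sml]
----
(* Each comparison evaluates to true. *)
val sameBinary   = (0b10101 = 21)           (* 16 + 4 + 1 *)
val sameNegative = (~0b10_10_10 = ~42)      (* ~(32 + 8 + 2); underscores only separate digits *)
val sameWord     = (0wb1010 = 0w10)         (* binary word constant: 8 + 2 *)
val sameGrouped  = (4__327__829 = 4327829)  (* consecutive underscores are allowed *)
----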

** <!Anchor(ExtendedTextConsts)> Extended Text Constants: +allowExtendedTextConsts {false|true}+
+
Allow characters with integer codes &ge; 128 and &le; 247 that
correspond to syntactically well-formed UTF-8 byte sequences in text
constants.
+
////
and allow `\Uxxxxxxxx` numeric escapes in text constants.
////
+
Any 1, 2, 3, or 4 byte sequence that can be properly decoded to a
binary number according to the UTF-8 encoding/decoding scheme is
allowed in a text constant (but invalid sequences are not explicitly
rejected) and denotes the corresponding sequence of characters with
integer codes &ge; 128 and &le; 247. This feature enables "UTF-8
convenience" (but not comprehensive Unicode support); in particular,
it allows one to copy text from a browser and paste it into a string
constant in an editor and, furthermore, if the string is printed to a
terminal, then it will (typically) appear as the original text. The
following example uses UTF-8 byte sequences:
+
[source,sml]
----
val s1 : String.string = "\240\159\130\161"
val s2 : String.string = "🂡"
val _ = print ("s1 --> " ^ s1 ^ "\n")
val _ = print ("s2 --> " ^ s2 ^ "\n")
val _ = print ("String.size s1 --> " ^ Int.toString (String.size s1) ^ "\n")
val _ = print ("String.size s2 --> " ^ Int.toString (String.size s2) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
s1 --> 🂡
s2 --> 🂡
String.size s1 --> 4
String.size s2 --> 4
s1 = s2 --> true
----
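+
The decimal escapes in `s1` are the four UTF-8 bytes of the code point U+1F0A1. The following sketch shows where those numbers come from. It is plain Standard ML and needs no annotation; the helper name `utf8Bytes4` is hypothetical, and it handles only code points in the four-byte range (0x10000 to 0x10FFFF) without range checking:
+
[source,sml]
----
(* Encode a code point in the range 0x10000..0x10FFFF as four UTF-8 bytes.
 * Hypothetical helper for illustration; performs no range checking. *)
fun utf8Bytes4 (cp : int) : int list =
   [0xF0 + cp div 0x40000,            (* leading byte: top 3 bits *)
    0x80 + (cp div 0x1000) mod 0x40,  (* continuation byte: next 6 bits *)
    0x80 + (cp div 0x40) mod 0x40,    (* continuation byte: next 6 bits *)
    0x80 + cp mod 0x40]               (* continuation byte: low 6 bits *)

(* U+1F0A1 yields [240, 159, 130, 161], the escapes used in s1. *)
val bytes = utf8Bytes4 0x1F0A1
----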
+
Note that the `String.string` type corresponds to any sequence of
8-bit values, including invalid UTF-8 sequences; hence the string
constant `"\192"` (a UTF-8 leading byte with no UTF-8 continuation
byte) is valid. Similarly, the `Char.char` type corresponds to a
single 8-bit value; hence the char constant `#"α"` is not valid, as
the text constant `"α"` denotes a sequence of two 8-bit values.
+
////
A `\Uxxxxxxxx` numeric escape denotes a single character with the
hexadecimal integer code `xxxxxxxx`. Such numeric escapes are not
necessary for the `String.string` and `Char.char` types, since
characters in such text constants must have integer codes &le; 255 and
the `\ddd` and `\uxxxx` numeric escapes suffice. However, the
`\Uxxxxxxxx` numeric escapes are useful for the `WideString.string`
and `WideChar.char` types, since characters in such text constants may
have integer codes &le; 2^32^-1. The following uses a `\Uxxxxxxxx`
numeric escape (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
----
+
Note that the `WideString.string` type corresponds to any sequence of
32-bit values, including invalid Unicode code points; hence, the
string constants `"\U001F0000"` and `"\U40000000"` are valid (but the
corresponding integer codes are not valid Unicode code points).
Similarly, the `WideChar.char` type corresponds to a single 32-bit
value.
+
Finally, note that a UTF-8 byte sequence in a `WideString.string` or
`WideChar.char` text constant does not denote a single 32-bit value,
but rather a sequence of 32-bit values &ge; 128 and &le; 247. The
following example uses both UTF-8 byte sequences and `\Uxxxxxxxx`
numeric escapes (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val s2 : WideString.string = "🂡"
val s3 : WideString.string = "\U000000F0\U0000009F\U00000082\U000000A1"
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
val _ = print ("WideString.size s2 --> " ^ Int.toString (WideString.size s2) ^ "\n")
val _ = print ("WideString.size s3 --> " ^ Int.toString (WideString.size s3) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
val _ = print ("s2 = s3 --> " ^ Bool.toString (s2 = s3) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
WideString.size s2 --> 4
WideString.size s3 --> 4
s1 = s2 --> false
s2 = s3 --> true
----
////
--

* <!Anchor(LineComments)>
Line Comments: +allowLineComments {false|true}+
+
Allow line comments beginning with the token ++(*)++. The following
example uses a line comment:
+
[source,sml]
----
(*) This is a line comment
----
+
Line comments properly nest within block comments: a `*)` that follows
++(*)++ on the same line is part of the line comment and does not close
the enclosing block comment. The following example uses line comments
nested within block comments:
+
[source,sml]
----
(*
val x = 4 (*) This is a line comment
*)

(*
val y = 5 (*) This is a line comment *)
*)
----

* <!Anchor(OptBar)>
Optional Pattern Bars: +allowOptBar {false|true}+
+
Allow a bar to appear before the first match rule of a `case`, `fn`,
or `handle` expression, allow a bar to appear before the first
function-value binding of a `fun` declaration, and allow a bar to
appear before the first constructor binding or description of a
`datatype` declaration or specification. The following example uses
leading bars in a `datatype` declaration, a `fun` declaration, and a
`case` expression:
+
[source,sml]
----
datatype t =
  | C
  | B
  | A

fun
  | f NONE = 0
  | f (SOME t) =
     (case t of
       | A => 1
       | B => 2
       | C => 3)
----
+
By eliminating the special case of the first element, this feature
allows for simpler refactoring (e.g., sorting the lines of the
`datatype` declaration's constructor bindings to put the constructors
in alphabetical order).
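+
The example above does not cover the `fn` and `handle` forms mentioned in the description. The following sketch, which assumes +allowOptBar true+ and uses the hypothetical names `isZero` and `safeDiv`, shows a leading bar in each:
+
[source,sml]
----
(* Leading bar before the first match rule of a fn expression. *)
val isZero = fn
   | 0 => true
   | _ => false

(* Leading bar before the first match rule of a handle expression. *)
fun safeDiv (x, y) =
   x div y
   handle
    | Div => 0
    | Overflow => 0
----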

* <!Anchor(OptSemicolon)>
Optional Semicolons: +allowOptSemicolon {false|true}+
+
Allow a semicolon to appear after the last expression in a sequence or
`let`-body expression. The following example uses a trailing
semicolon in the body of a `let` expression:
+
[source,sml]
----
fun h z =
   let
      val x = 3 * z
   in
      f x ;
      g x ;
   end
----
+
By eliminating the special case of the last element, this feature
allows for simpler refactoring.
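+
The description also covers parenthesized sequence expressions. A minimal sketch, assuming +allowOptSemicolon true+ and a hypothetical function `greet`:
+
[source,sml]
----
(* Trailing semicolon after the last expression of a sequence expression.
 * The sequence still has the type of its last expression, here unit. *)
fun greet (name : string) : unit =
   (print "Hello, ";
    print name;
    print "\n";)
----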

* <!Anchor(OrPats)>
Disjunctive (Or) Patterns: +allowOrPats {false|true}+
+
Allow disjunctive (a.k.a., "or") patterns of the form +_pat~1~_ |
_pat~2~_+, which matches a value that matches either +_pat~1~_+ or
+_pat~2~_+. Disjunctive patterns have lower precedence than `as`
patterns and constraint patterns, much as `orelse` expressions have
lower precedence than `andalso` expressions and constraint
expressions. Both sub-patterns of a disjunctive pattern must bind the
same variables with the same types. The following example uses
disjunctive patterns:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

fun f t =
   case t of
      A x | B x | C x => x + 1
    | D (x, _) | E (_, x) => x * 2
----
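+
Because a disjunctive pattern binds loosely, it generally needs parentheses when it appears inside another pattern or as a `fun` argument. The following sketch assumes +allowOrPats true+; the function names are hypothetical:
+
[source,sml]
----
(* Parenthesized disjunctive pattern as a function argument. *)
fun isVowel (#"a" | #"e" | #"i" | #"o" | #"u") = true
  | isVowel _ = false

(* Disjunctive pattern nested inside a constructor pattern. *)
fun isWeekend (SOME ("Sat" | "Sun")) = true
  | isWeekend _ = false
----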

* <!Anchor(RecordPunExps)>
Record Punning Expressions: +allowRecordPunExps {false|true}+
+
Allow record punning expressions, whereby an identifier +_vid_+ as an
expression row in a record expression denotes the expression row
+_vid_ = _vid_+ (i.e., treating a label as a variable). The following
example uses record punning expressions (and also record punning
patterns):
+
[source,sml]
----
fun incB r =
   case r of {a, b, c} => {a, b = b + 1, c}
----
+
and is equivalent to:
+
[source,sml]
----
fun incB r =
   case r of {a = a, b = b, c = c} => {a = a, b = b + 1, c = c}
----

* <!Anchor(SigWithtype)>
`withtype` in Signatures: +allowSigWithtype {false|true}+
+
Allow `withtype` to modify a `datatype` specification in a signature.
The following example uses `withtype` in a signature (and also
`withtype` in a declaration):
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
----
+
and is equivalent to:
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
----

* <!Anchor(VectorExpsAndPats)>
Vector Expressions and Patterns: +allowVectorExpsAndPats {false|true}+
+
--
Allow or disallow vector expressions and vector patterns. This is a
proxy for all of the following annotations.

** <!Anchor(VectorExps)>
Vector Expressions: +allowVectorExps {false|true}+
+
Allow vector expressions of the form +#[_exp~0~_, _exp~1~_, ..., _exp~n-1~_]+ (where _n ≥ 0_). The expression has type +_τ_ vector+ when each expression _exp~i~_ has type +_τ_+.
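+
A minimal sketch of vector expressions, assuming +allowVectorExps true+; the value names are illustrative only:
+
[source,sml]
----
val v : int vector = #[1, 2, 3]
val empty : string vector = #[]       (* the n = 0 case *)
val nested = #[#[1], #[2, 3]]         (* has type int vector vector *)
val total = Vector.foldl (op +) 0 v   (* 1 + 2 + 3 = 6 *)
----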

** <!Anchor(VectorPats)>
Vector Patterns: +allowVectorPats {false|true}+
+
Allow vector patterns of the form +#[_pat~0~_, _pat~1~_, ..., _pat~n-1~_]+ (where _n ≥ 0_). The pattern matches values of type +_τ_ vector+ when each pattern _pat~i~_ matches values of type +_τ_+.
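+
A minimal sketch of vector patterns, assuming +allowVectorPats true+; the function name `describe` is hypothetical. Each vector pattern matches only vectors of exactly that length, so a wildcard rule covers the remaining lengths:
+
[source,sml]
----
fun describe (v : int vector) : string =
   case v of
      #[] => "empty"
    | #[x] => "singleton " ^ Int.toString x
    | #[x, _] => "pair starting with " ^ Int.toString x
    | _ => "three or more elements"
----
+
For example, `describe (Vector.fromList [7])` evaluates to `"singleton 7"`.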
--