SuccessorML
===========

The purpose of http://sml-family.org/successor-ml/[successor ML], or
sML for short, is to provide a vehicle for the continued evolution of
ML, using Standard ML as a starting point. The intention is for
successor ML to be a living, evolving dialect of ML that is responsive
to community needs and advances in language design, implementation,
and semantics.

== SuccessorML Features in MLton ==

The following SuccessorML features have been implemented in MLton.
The features are disabled by default and can be enabled with the
feature's corresponding <:MLBasisAnnotations:ML Basis annotation>,
which is listed directly after the feature name. In addition, the
+allowSuccessorML {false|true}+ annotation can be used to
simultaneously enable all of the features.
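
As a rough sketch of how such annotations might be applied, the
following ML Basis file enables every feature for one source file and
only two selected features for another. The file names `all.sml` and
`some.sml` are placeholders, not files from the MLton distribution.
The same annotation strings can also be supplied on the command line
through MLton's `-default-ann` option (for example,
`mlton -default-ann 'allowSuccessorML true' all.sml`).

----
$(SML_LIB)/basis/basis.mlb

ann "allowSuccessorML true" in
   all.sml
end

ann "allowDoDecls true" "allowOrPats true" in
   some.sml
end
----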

* <!Anchor(DoDecls)>
`do` Declarations: +allowDoDecls {false|true}+
+
Allow a +do _exp_+ declaration form, which evaluates _exp_ for its
side effects. The following example uses a `do` declaration:
+
[source,sml]
----
do print "Hello world.\n"
----
+
and is equivalent to:
+
[source,sml]
----
val () = print "Hello world.\n"
----

* <!Anchor(ExtendedConsts)>
Extended Constants: +allowExtendedConsts {false|true}+
+
--
Allow or disallow all of the extended constants features. This is a
proxy for all of the following annotations.

** <!Anchor(ExtendedNumConsts)>
Extended Numeric Constants: +allowExtendedNumConsts {false|true}+
+
Allow underscores as a separator in numeric constants and allow binary
integer and word constants.
+
Underscores in a numeric constant must occur between digits;
consecutive underscores are allowed.
+
Binary integer constants use the prefix +0b+ and binary word constants
use the prefix +0wb+.
+
The following example uses extended numeric constants (although it may
be incorrectly syntax highlighted):
+
[source,sml]
----
val pb = 0b10101
val nb = ~0b10_10_10
val wb = 0wb1010
val i = 4__327__829
val r = 6.022_140_9e23
----
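+
As a small sketch of how these constants are read (assuming
+allowExtendedNumConsts true+ is in effect; the binding names are only
illustrative), each extended constant can be compared with its
conventional spelling. Underscores do not change the value, and
+0b+/+0wb+ constants are read in base 2, so every comparison is
expected to hold:
+
[source,sml]
----
val sameInt  = 4__327__829 = 4327829   (* underscores are only separators *)
val samePos  = 0b10101 = 21            (* 16 + 4 + 1 *)
val sameNeg  = ~0b10_10_10 = ~42       (* ~(32 + 8 + 2) *)
val sameWord = 0wb1010 = 0w10          (* 8 + 2, as a word constant *)

val _ =
   print (Bool.toString (sameInt andalso samePos
                         andalso sameNeg andalso sameWord) ^ "\n")
----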

** <!Anchor(ExtendedTextConsts)>
Extended Text Constants: +allowExtendedTextConsts {false|true}+
+
Allow characters with integer codes ≥ 128 and ≤ 247 that
correspond to syntactically well-formed UTF-8 byte sequences in text
constants.
+
////
and allow `\Uxxxxxxxx` numeric escapes in text constants.
////
+
Any 1, 2, 3, or 4 byte sequence that can be properly decoded to a
binary number according to the UTF-8 encoding/decoding scheme is
allowed in a text constant (but invalid sequences are not explicitly
rejected) and denotes the corresponding sequence of characters with
integer codes ≥ 128 and ≤ 247. This feature enables "UTF-8
convenience" (but not comprehensive Unicode support); in particular,
it allows one to copy text from a browser and paste it into a string
constant in an editor and, furthermore, if the string is printed to a
terminal, then it will (typically) appear as the original text. The
following example uses UTF-8 byte sequences:
+
[source,sml]
----
val s1 : String.string = "\240\159\130\161"
val s2 : String.string = "🂡"
val _ = print ("s1 --> " ^ s1 ^ "\n")
val _ = print ("s2 --> " ^ s2 ^ "\n")
val _ = print ("String.size s1 --> " ^ Int.toString (String.size s1) ^ "\n")
val _ = print ("String.size s2 --> " ^ Int.toString (String.size s2) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
s1 --> 🂡
s2 --> 🂡
String.size s1 --> 4
String.size s2 --> 4
s1 = s2 --> true
----
+
Note that the `String.string` type corresponds to any sequence of
8-bit values, including invalid UTF-8 sequences; hence the string
constant `"\192"` (a UTF-8 leading byte with no UTF-8 continuation
byte) is valid. Similarly, the `Char.char` type corresponds to a
single 8-bit value; hence the char constant `#"α"` is not valid, as
the text constant `"α"` denotes a sequence of two 8-bit values.
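+
To make this byte-oriented view concrete, here is a small sketch
(assuming +allowExtendedTextConsts true+ is in effect and the source
file is saved as UTF-8; the binding names are illustrative). Because
`"α"` (U+03B1) is encoded in UTF-8 as the two bytes 206 and 177, it
should behave exactly like the string written with decimal escapes,
while a lone leading byte remains an ordinary one-character string:
+
[source,sml]
----
val alpha : String.string = "α"
val bytes : String.string = "\206\177"   (* UTF-8 encoding of U+03B1 *)
val lone  : String.string = "\192"       (* leading byte with no continuation *)

val _ = print ("String.size alpha --> " ^ Int.toString (String.size alpha) ^ "\n")
val _ = print ("alpha = bytes --> " ^ Bool.toString (alpha = bytes) ^ "\n")
val _ = print ("String.size lone --> " ^ Int.toString (String.size lone) ^ "\n")
----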
+
////
A `\Uxxxxxxxx` numeric escape denotes a single character with the
hexadecimal integer code `xxxxxxxx`. Such numeric escapes are not
necessary for the `String.string` and `Char.char` types, since
characters in such text constants must have integer codes ≤ 255 and
the `\ddd` and `\uxxxx` numeric escapes suffice. However, the
`\Uxxxxxxxx` numeric escapes are useful for the `WideString.string`
and `WideChar.char` types, since characters in such text constants may
have integer codes ≤ 2^32^-1. The following uses a `\Uxxxxxxxx`
numeric escape (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
----
+
Note that the `WideString.string` type corresponds to any sequence of
32-bit values, including invalid Unicode code points; hence, the
string constants `"\U001F0000"` and `"\U40000000"` are valid (but the
corresponding integer codes are not valid Unicode code points).
Similarly, the `WideChar.char` type corresponds to a single 32-bit
value.
+
Finally, note that a UTF-8 byte sequence in a `WideString.string` or
`WideChar.char` text constant does not denote a single 32-bit value,
but rather a sequence of 32-bit values ≥ 128 and ≤ 247. The
following example uses both UTF-8 byte sequences and `\Uxxxxxxxx`
numeric escapes (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val s2 : WideString.string = "🂡"
val s3 : WideString.string = "\U000000F0\U0000009F\U00000082\U000000A1"
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
val _ = print ("WideString.size s2 --> " ^ Int.toString (WideString.size s2) ^ "\n")
val _ = print ("WideString.size s3 --> " ^ Int.toString (WideString.size s3) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
val _ = print ("s2 = s3 --> " ^ Bool.toString (s2 = s3) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
WideString.size s2 --> 4
WideString.size s3 --> 4
s1 = s2 --> false
s2 = s3 --> true
----
////
--

* <!Anchor(LineComments)>
Line Comments: +allowLineComments {false|true}+
+
Allow line comments beginning with the token ++(*)++. The following
example uses a line comment:
+
[source,sml]
----
(*) This is a line comment
----
+
Line comments properly nest within block comments. The following
example uses line comments nested within block comments:
+
[source,sml]
----
(*
val x = 4 (*) This is a line comment
*)

(*
val y = 5 (*) This is a line comment *)
*)
----

* <!Anchor(OptBar)>
Optional Pattern Bars: +allowOptBar {false|true}+
+
Allow a bar to appear before the first match rule of a `case`, `fn`,
or `handle` expression, allow a bar to appear before the first
function-value binding of a `fun` declaration, and allow a bar to
appear before the first constructor binding or description of a
`datatype` declaration or specification. The following example uses
leading bars in a `datatype` declaration, a `fun` declaration, and a
`case` expression:
+
[source,sml]
----
datatype t =
   | C
   | B
   | A

fun
   | f NONE = 0
   | f (SOME t) =
        (case t of
            | A => 1
            | B => 2
            | C => 3)
----
+
By eliminating the special case of the first element, this feature
allows for simpler refactoring (e.g., sorting the lines of the
`datatype` declaration's constructor bindings to put the constructors
in alphabetical order).

* <!Anchor(OptSemicolon)>
Optional Semicolons: +allowOptSemicolon {false|true}+
+
Allow a semicolon to appear after the last expression in a sequence or
`let`-body expression. The following example uses a trailing
semicolon in the body of a `let` expression:
+
[source,sml]
----
fun h z =
   let
      val x = 3 * z
   in
      f x ;
      g x ;
   end
----
+
By eliminating the special case of the last element, this feature
allows for simpler refactoring.

* <!Anchor(OrPats)>
Disjunctive (Or) Patterns: +allowOrPats {false|true}+
+
Allow disjunctive (a.k.a., "or") patterns of the form +_pat~1~_ |
_pat~2~_+, which matches a value that matches either +_pat~1~_+ or
+_pat~2~_+. Disjunctive patterns have lower precedence than `as`
patterns and constraint patterns, much as `orelse` expressions have
lower precedence than `andalso` expressions and constraint
expressions. Both sub-patterns of a disjunctive pattern must bind the
same variables with the same types. The following example uses
disjunctive patterns:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

fun f t =
   case t of
      A x | B x | C x => x + 1
    | D (x, _) | E (_, x) => x * 2
----
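+
For comparison, the same function can be sketched without disjunctive
patterns by repeating the right-hand side once per alternative. The
name `f'` is used here only to keep the two versions apart, and the
`datatype` declaration is repeated so that the fragment stands on its
own:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

(* Same behavior as f above, with each alternative spelled out. *)
fun f' t =
   case t of
      A x => x + 1
    | B x => x + 1
    | C x => x + 1
    | D (x, _) => x * 2
    | E (_, x) => x * 2
----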

* <!Anchor(RecordPunExps)>
Record Punning Expressions: +allowRecordPunExps {false|true}+
+
Allow record punning expressions, whereby an identifier +_vid_+ as an
expression row in a record expression denotes the expression row
+_vid_ = _vid_+ (i.e., treating a label as a variable). The following
example uses record punning expressions (and also record punning
patterns):
+
[source,sml]
----
fun incB r =
   case r of {a, b, c} => {a, b = b + 1, c}
----
+
and is equivalent to:
+
[source,sml]
----
fun incB r =
   case r of {a = a, b = b, c = c} => {a = a, b = b + 1, c = c}
----

* <!Anchor(SigWithtype)>
`withtype` in Signatures: +allowSigWithtype {false|true}+
+
Allow `withtype` to modify a `datatype` specification in a signature.
The following example uses `withtype` in a signature (and also
`withtype` in a declaration):
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * 'a t
      withtype 'a t = unit -> 'a u
   end
----
+
and is equivalent to:
+
[source,sml]
----
signature STREAM =
   sig
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
structure Stream : STREAM =
   struct
      datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
      type 'a t = unit -> 'a u
   end
----

* <!Anchor(VectorExpsAndPats)>
Vector Expressions and Patterns: +allowVectorExpsAndPats {false|true}+
+
--
Allow or disallow vector expressions and vector patterns. This is a
proxy for all of the following annotations.

** <!Anchor(VectorExps)>
Vector Expressions: +allowVectorExps {false|true}+
+
Allow vector expressions of the form +#[_exp~0~_, _exp~1~_, ..., _exp~n-1~_]+ (where _n ≥ 0_). The expression has type +_τ_ vector+ when each expression _exp~i~_ has type +_τ_+.

** <!Anchor(VectorPats)>
Vector Patterns: +allowVectorPats {false|true}+
+
Allow vector patterns of the form +#[_pat~0~_, _pat~1~_, ..., _pat~n-1~_]+ (where _n ≥ 0_). The pattern matches values of type +_τ_ vector+ when each pattern _pat~i~_ matches values of type +_τ_+.
--
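+
As an illustrative sketch (not taken from the MLton sources), the
following example combines a vector expression with vector patterns,
assuming +allowVectorExpsAndPats true+ is in effect. The names `v` and
`describe` are hypothetical. Each vector pattern here lists a fixed
number of sub-patterns, so the final wildcard rule covers vectors of
any other length:
+
[source,sml]
----
val v : int vector = #[1, 2, 3]

fun describe (w : int vector) : string =
   case w of
      #[] => "empty"
    | #[x] => "one element: " ^ Int.toString x
    | #[x, y, z] => "sum of three: " ^ Int.toString (x + y + z)
    | _ => "some other length"

val _ = print (describe v ^ "\n")
----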