SuccessorML
===========

The purpose of http://sml-family.org/successor-ml/[successor ML], or
sML for short, is to provide a vehicle for the continued evolution of
ML, using Standard ML as a starting point. The intention is for
successor ML to be a living, evolving dialect of ML that is responsive
to community needs and advances in language design, implementation,
and semantics.

== SuccessorML Features in MLton ==

The following SuccessorML features have been implemented in MLton.
The features are disabled by default; each may be enabled using its
corresponding <:MLBasisAnnotations:ML Basis annotation>, which is
listed directly after the feature name. In addition, the
+allowSuccessorML {false|true}+ annotation can be used to
simultaneously enable all of the features.
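
As a minimal sketch of how such an annotation is applied, an ML Basis
file can wrap the affected sources in an `ann` declaration. The file
name `successor.sml` is hypothetical:

----
$(SML_LIB)/basis/basis.mlb
ann "allowSuccessorML true" in
  successor.sml
end
----

Alternatively, assuming the same hypothetical file is compiled
directly, the annotation can be supplied as a default on the command
line:

----
mlton -default-ann 'allowSuccessorML true' successor.sml
----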

* <!Anchor(DoDecls)>
`do` Declarations: +allowDoDecls {false|true}+
+
Allow a +do _exp_+ declaration form, which evaluates _exp_ for its
side effects. The following example uses a `do` declaration:
+
[source,sml]
----
do print "Hello world.\n"
----
+
and is equivalent to:
+
[source,sml]
----
val () = print "Hello world.\n"
----

* <!Anchor(ExtendedConsts)>
Extended Constants: +allowExtendedConsts {false|true}+
+
--
Allow or disallow all of the extended constants features. This is a
proxy for all of the following annotations.

** <!Anchor(ExtendedNumConsts)>
Extended Numeric Constants: +allowExtendedNumConsts {false|true}+
+
Allow underscores as a separator in numeric constants and allow binary
integer and word constants.
+
Underscores in a numeric constant must occur between digits;
consecutive underscores are allowed.
+
Binary integer constants use the prefix +0b+ and binary word constants
use the prefix +0wb+.
+
The following example uses extended numeric constants (although it may
be incorrectly syntax highlighted):
+
[source,sml]
----
val pb = 0b10101
val nb = ~0b10_10_10
val wb = 0wb1010
val i = 4__327__829
val r = 6.022_140_9e23
----
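+
As a small sketch of what these constants denote, the following
compares each extended integer or word constant with its plain
Standard ML spelling. The name `check` is illustrative only, and the
code assumes +allowExtendedNumConsts true+ is in effect:
+
[source,sml]
----
(* Underscores and binary prefixes change only how a number is written. *)
val check =
   0b10101 = 21
   andalso ~0b10_10_10 = ~42
   andalso 0wb1010 = 0w10
   andalso 4__327__829 = 4327829
----
+
Under those assumptions, `check` should evaluate to `true`. The real
constant is left out of the comparison because `real` is not an
equality type.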

** <!Anchor(ExtendedTextConsts)>
Extended Text Constants: +allowExtendedTextConsts {false|true}+
+
Allow characters with integer codes ≥ 128 and ≤ 247 that
correspond to syntactically well-formed UTF-8 byte sequences in text
constants.
+
////
and allow `\Uxxxxxxxx` numeric escapes in text constants.
////
+
Any 1, 2, 3, or 4 byte sequence that can be properly decoded to a
binary number according to the UTF-8 encoding/decoding scheme is
allowed in a text constant (but invalid sequences are not explicitly
rejected) and denotes the corresponding sequence of characters with
integer codes ≥ 128 and ≤ 247. This feature enables "UTF-8
convenience" (but not comprehensive Unicode support); in particular,
it allows one to copy text from a browser and paste it into a string
constant in an editor and, furthermore, if the string is printed to a
terminal, then it will (typically) appear as the original text. The
following example uses UTF-8 byte sequences:
+
[source,sml]
----
val s1 : String.string = "\240\159\130\161"
val s2 : String.string = "🂡"
val _ = print ("s1 --> " ^ s1 ^ "\n")
val _ = print ("s2 --> " ^ s2 ^ "\n")
val _ = print ("String.size s1 --> " ^ Int.toString (String.size s1) ^ "\n")
val _ = print ("String.size s2 --> " ^ Int.toString (String.size s2) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
s1 --> 🂡
s2 --> 🂡
String.size s1 --> 4
String.size s2 --> 4
s1 = s2 --> true
----
+
Note that the `String.string` type corresponds to any sequence of
8-bit values, including invalid UTF-8 sequences; hence the string
constant `"\192"` (a UTF-8 leading byte with no UTF-8 continuation
byte) is valid. Similarly, the `Char.char` type corresponds to a
single 8-bit value; hence the char constant `#"α"` is not valid, as
the text constant `"α"` denotes a sequence of two 8-bit values.
+
////
A `\Uxxxxxxxx` numeric escape denotes a single character with the
hexadecimal integer code `xxxxxxxx`. Such numeric escapes are not
necessary for the `String.string` and `Char.char` types, since
characters in such text constants must have integer codes ≤ 255 and
the `\ddd` and `\uxxxx` numeric escapes suffice. However, the
`\Uxxxxxxxx` numeric escapes are useful for the `WideString.string`
and `WideChar.char` types, since characters in such text constants may
have integer codes ≤ 2^32^-1. The following uses a `\Uxxxxxxxx`
numeric escape (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
----
+
Note that the `WideString.string` type corresponds to any sequence of
32-bit values, including invalid Unicode code points; hence, the
string constants `"\U001F0000"` and `"\U40000000"` are valid (but the
corresponding integer codes are not valid Unicode code points).
Similarly, the `WideChar.char` type corresponds to a single 32-bit
value.
+
Finally, note that a UTF-8 byte sequence in a `WideString.string` or
`WideChar.char` text constant does not denote a single 32-bit value,
but rather a sequence of 32-bit values ≥ 128 and ≤ 247. The
following example uses both UTF-8 byte sequences and `\Uxxxxxxxx`
numeric escapes (although it may be incorrectly syntax highlighted):
+
[source,sml]
----
val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
val s2 : WideString.string = "🂡"
val s3 : WideString.string = "\U000000F0\U0000009F\U00000082\U000000A1"
val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
val _ = print ("WideString.size s2 --> " ^ Int.toString (WideString.size s2) ^ "\n")
val _ = print ("WideString.size s3 --> " ^ Int.toString (WideString.size s3) ^ "\n")
val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
val _ = print ("s2 = s3 --> " ^ Bool.toString (s2 = s3) ^ "\n")
----
+
and, when compiled and executed, will display:
+
----
WideString.size s1 --> 1
WideString.size s2 --> 4
WideString.size s3 --> 4
s1 = s2 --> false
s2 = s3 --> true
----
////
--

* <!Anchor(LineComments)>
Line Comments: +allowLineComments {false|true}+
+
Allow line comments beginning with the token ++(*)++. The following
example uses a line comment:
+
[source,sml]
----
(*) This is a line comment
----
+
Line comments properly nest within block comments. The following
example uses line comments nested within block comments:
+
[source,sml]
----
(*
val x = 4 (*) This is a line comment
*)

(*
val y = 5 (*) This is a line comment *)
*)
----

* <!Anchor(OptBar)>
Optional Pattern Bars: +allowOptBar {false|true}+
+
Allow a bar to appear before the first match rule of a `case`, `fn`,
or `handle` expression, allow a bar to appear before the first
function-value binding of a `fun` declaration, and allow a bar to
appear before the first constructor binding or description of a
`datatype` declaration or specification. The following example uses
leading bars in a `datatype` declaration, a `fun` declaration, and a
`case` expression:
+
[source,sml]
----
datatype t =
  | C
  | B
  | A

fun
  | f NONE = 0
  | f (SOME t) =
     (case t of
        | A => 1
        | B => 2
        | C => 3)
----
+
By eliminating the special case of the first element, this feature
allows for simpler refactoring (e.g., sorting the lines of the
`datatype` declaration's constructor bindings to put the constructors
in alphabetical order).
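+
The example above does not exercise `fn` or `handle`. As a sketch of
those two forms, assuming +allowOptBar true+ is in effect (the names
`isZero` and `safeDiv` are illustrative only):
+
[source,sml]
----
(* Leading bar before the first match rule of an `fn` expression. *)
val isZero =
   fn
    | 0 => true
    | _ => false

(* Leading bar before the first match rule of a `handle` expression. *)
fun safeDiv (x, y) =
   (x div y)
   handle
    | Div => 0
    | Overflow => 0
----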

* <!Anchor(OptSemicolon)>
Optional Semicolons: +allowOptSemicolon {false|true}+
+
Allow a semicolon to appear after the last expression in a sequence or
`let`-body expression. The following example uses a trailing
semicolon in the body of a `let` expression:
+
[source,sml]
----
fun h z =
  let
    val x = 3 * z
  in
    f x ;
    g x ;
  end
----
+
By eliminating the special case of the last element, this feature
allows for simpler refactoring.
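+
The same applies to a parenthesized sequence expression. A minimal
sketch, assuming +allowOptSemicolon true+ is in effect (the function
`greet` is illustrative only):
+
[source,sml]
----
(* Every expression in the sequence, including the last, ends with `;`. *)
fun greet name =
   (print "Hello, " ;
    print name ;
    print "\n" ;)
----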

* <!Anchor(OrPats)>
Disjunctive (Or) Patterns: +allowOrPats {false|true}+
+
Allow disjunctive (a.k.a., "or") patterns of the form +_pat~1~_ |
_pat~2~_+, which matches a value that matches either +_pat~1~_+ or
+_pat~2~_+. Disjunctive patterns have lower precedence than `as`
patterns and constraint patterns, much as `orelse` expressions have
lower precedence than `andalso` expressions and constraint
expressions. Both sub-patterns of a disjunctive pattern must bind the
same variables with the same types. The following example uses
disjunctive patterns:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

fun f t =
  case t of
     A x | B x | C x => x + 1
   | D (x, _) | E (_, x) => x * 2
----
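+
As a further sketch of the precedence rule, assuming +allowOrPats
true+ is in effect, the following parenthesizes a disjunctive pattern
so that it can sit underneath an `as` pattern (the names `isVowel` and
`nonZeroPart` are illustrative only):
+
[source,sml]
----
(* A disjunction of constant patterns; neither side binds a variable. *)
fun isVowel c =
   case c of
      (#"a" | #"e" | #"i" | #"o" | #"u") => true
    | _ => false

(* Because `as` binds tighter than `|`, the parentheses are needed so
 * that `whole` names the value matched by either alternative.  Both
 * alternatives bind `x` at type int.
 *)
fun nonZeroPart p =
   case p of
      whole as ((x, 0) | (0, x)) => SOME (x, whole)
    | _ => NONE
----
+
Without the inner parentheses, the pattern would read as
`(whole as (x, 0)) | (0, x)`, whose two sides bind different
variables.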

* <!Anchor(RecordPunExps)>
Record Punning Expressions: +allowRecordPunExps {false|true}+
+
Allow record punning expressions, whereby an identifier +_vid_+ as an
expression row in a record expression denotes the expression row
+_vid_ = _vid_+ (i.e., treating a label as a variable). The following
example uses record punning expressions (and also record punning
patterns):
+
[source,sml]
----
fun incB r =
  case r of {a, b, c} => {a, b = b + 1, c}
----
+
and is equivalent to:
+
[source,sml]
----
fun incB r =
  case r of {a = a, b = b, c = c} => {a = a, b = b + 1, c = c}
----

* <!Anchor(SigWithtype)>
`withtype` in Signatures: +allowSigWithtype {false|true}+
+
Allow `withtype` to modify a `datatype` specification in a signature.
The following example uses `withtype` in a signature (and also
`withtype` in a declaration):
+
[source,sml]
----
signature STREAM =
  sig
    datatype 'a u = Nil | Cons of 'a * 'a t
    withtype 'a t = unit -> 'a u
  end
structure Stream : STREAM =
  struct
    datatype 'a u = Nil | Cons of 'a * 'a t
    withtype 'a t = unit -> 'a u
  end
----
+
and is equivalent to:
+
[source,sml]
----
signature STREAM =
  sig
    datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
    type 'a t = unit -> 'a u
  end
structure Stream : STREAM =
  struct
    datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
    type 'a t = unit -> 'a u
  end
----

* <!Anchor(VectorExpsAndPats)>
Vector Expressions and Patterns: +allowVectorExpsAndPats {false|true}+
+
--
Allow or disallow vector expressions and vector patterns. This is a
proxy for all of the following annotations.

** <!Anchor(VectorExps)>
Vector Expressions: +allowVectorExps {false|true}+
+
Allow vector expressions of the form
+#[_exp~0~_, _exp~1~_, ..., _exp~n-1~_]+ (where _n ≥ 0_). The
expression has type +_τ_ vector+ when each expression _exp~i~_ has
type +_τ_+.
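+
A minimal sketch of vector expressions, assuming +allowVectorExps
true+ is in effect (the value names are illustrative only):
+
[source,sml]
----
(* The empty vector needs a type annotation to fix its element type. *)
val empty : int vector = #[]

(* Each element has type int, so the expression has type int vector. *)
val primes = #[2, 3, 5, 7]

val n = Vector.length primes       (* number of elements *)
val third = Vector.sub (primes, 2) (* indexing is zero-based *)
----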

** <!Anchor(VectorPats)>
Vector Patterns: +allowVectorPats {false|true}+
+
Allow vector patterns of the form
+#[_pat~0~_, _pat~1~_, ..., _pat~n-1~_]+ (where _n ≥ 0_). The pattern
matches values of type +_τ_ vector+ when each pattern _pat~i~_ matches
values of type +_τ_+.
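+
A minimal sketch of vector patterns, assuming +allowVectorPats true+
is in effect and that a pattern with _n_ sub-patterns matches only
vectors of length _n_ (the function `describe` is illustrative only):
+
[source,sml]
----
fun describe (v : int vector) =
   case v of
      #[] => "empty"
    | #[x] => "one element: " ^ Int.toString x
    | #[x, y] => "two elements summing to " ^ Int.toString (x + y)
    | _ => "three or more elements"
----
+
The vectors passed to `describe` can be built with `Vector.fromList`,
so the sketch does not also depend on vector expressions.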
--