doc/guide/src/mlton-guide.adoc

   1 MLton Guide ({mlton-version})
   2 =============================
   3 :toc:
   4 :mlton-guide-page!:
   5
   6 [abstract]
   7 --
   8 This is the guide for MLton, an open-source, whole-program, optimizing Standard ML compiler.
   9
  10 This guide was generated automatically from the MLton website, available online at http://mlton.org. It is up to date for MLton {mlton-version}.
  11 --
  12
  13
  14 :leveloffset: 1
  15
  16 :mlton-guide-page: Home
  17 [[Home]]
  18 MLton
  19 =====
  20
  21 == What is MLton? ==
  22
  23 MLton is an open-source, whole-program, optimizing
  24 <:StandardML:Standard ML> compiler.
  25
  26 == What's new? ==
  27
  28 * 20180207: Please try out our latest release, <:Release20180207:MLton 20180207>.
  29
  30 * 20140730: http://www.cs.rit.edu/%7emtf[Matthew Fluet] and
  31   http://www.cse.buffalo.edu/%7elziarek[Lukasz Ziarek] have been
  32   awarded an http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=12810[NSF
  33   CISE Research Infrastructure (CRI)] grant titled "Positioning MLton
   34   for Next-Generation Programming Languages Research"; read the award
  35   abstracts
  36   (http://www.nsf.gov/awardsearch/showAward?AWD_ID=1405770[Award{nbsp}#1405770]
  37   and
  38   http://www.nsf.gov/awardsearch/showAward?AWD_ID=1405614[Award{nbsp}#1405614])
  39   for more details.
  40
  41 == Next steps ==
  42
  43 * Read about MLton's <:Features:>.
  44 * Look at <:Documentation:>.
  45 * See some <:Users:> of MLton.
  46 * https://sourceforge.net/projects/mlton/files/mlton/20180207[Download] MLton.
  47 * Meet the MLton <:Developers:>.
  48 * Get involved with MLton <:Development:>.
  49 * User-maintained <:FAQ:>.
  50 * <:Contact:> us.
  51
  52 <<<
  53
  54 :mlton-guide-page: AdamGoode
  55 [[AdamGoode]]
  56 AdamGoode
  57 =========
  58
  59  * I maintain the Fedora package of MLton, in https://admin.fedoraproject.org/pkgdb/packages/name/mlton[Fedora].
  60  * I have contributed some patches for Makefiles and PDF documentation building.
  61
  62 <<<
  63
  64 :mlton-guide-page: AdmitsEquality
  65 [[AdmitsEquality]]
  66 AdmitsEquality
  67 ==============
  68
  69 A <:TypeConstructor:> admits equality if whenever it is applied to
  70 equality types, the result is an <:EqualityType:>.  This notion enables
  71 one to determine whether a type constructor application yields an
  72 equality type solely from the application, without looking at the
  73 definition of the type constructor.  It helps to ensure that
  74 <:PolymorphicEquality:> is only applied to sensible values.
  75
  76 The definition of admits equality depends on whether the type
  77 constructor was declared by a `type` definition or a
  78 `datatype` declaration.
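
     As a small illustration of why this matters (a sketch written for this page, not taken from the MLton sources; the names `p`, `q`, `same`, and `close` are made up), <:PolymorphicEquality:> can be used at `bool * int`, while the commented-out comparison at `real` would be rejected by the type checker.

     [source,sml]
     ----
     val p: bool * int = (true, 1)
     val q: bool * int = (true, 1)

     (* Accepted: bool * int is an equality type. *)
     val same = (p = q)

     (* Rejected if uncommented: real is not an equality type.
      * val bad = (1.0 = 1.0)
      *)

     (* Reals are compared with Real.== from the Basis Library instead. *)
     val close = Real.== (1.0, 1.0)
     ----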
  79
  80
  81 == Type definitions ==
  82
   83 For a type definition
  84
  85 [source,sml]
  86 ----
  87 type ('a1, ..., 'an) t = ...
  88 ----
  89
  90 type constructor `t` admits equality if the right-hand side of the
  91 definition is an equality type after replacing `'a1`, ...,
  92 `'an` by equality types (it doesn't matter which equality types
  93 are chosen).
  94
  95 For a nullary type definition, this amounts to the right-hand side
  96 being an equality type.  For example, after the definition
  97
  98 [source,sml]
  99 ----
 100 type t = bool * int
 101 ----
 102
 103 type constructor `t` admits equality because `bool * int` is
 104 an equality type.   On the other hand, after the definition
 105
 106 [source,sml]
 107 ----
 108 type t = bool * int * real
 109 ----
 110
 111 type constructor `t` does not admit equality, because `real`
 112 is not an equality type.
 113
 114 For another example, after the definition
 115
 116 [source,sml]
 117 ----
 118 type 'a t = bool * 'a
 119 ----
 120
 121 type constructor `t` admits equality because `bool * int`
 122 is an equality type (we could have chosen any equality type other than
 123 `int`).
 124
 125 On the other hand, after the definition
 126
 127 [source,sml]
 128 ----
 129 type 'a t = real * 'a
 130 ----
 131
 132 type constructor `t` does not admit equality because
  133 `real * int` is not an equality type.
 134
 135 We can check that a type constructor admits equality using an
 136 `eqtype` specification.
 137
 138 [source,sml]
 139 ----
 140 structure Ok: sig eqtype 'a t end =
 141    struct
 142       type 'a t = bool * 'a
 143    end
 144 ----
 145
 146 [source,sml]
 147 ----
 148 structure Bad: sig eqtype 'a t end =
 149    struct
 150       type 'a t = real * int * 'a
 151    end
 152 ----
 153
 154 On `structure Bad`, MLton reports the following error.
 155 ----
 156 Error: z.sml 1.16-1.34.
 157   Type in structure disagrees with signature (admits equality): t.
 158     structure: type 'a t = [real] * _ * _
 159     defn at: z.sml 3.15-3.15
 160     signature: [eqtype] 'a t
 161     spec at: z.sml 1.30-1.30
 162 ----
 163
 164 The `structure:` section provides an explanation of why the type
 165 did not admit equality, highlighting the problematic component
 166 (`real`).
 167
 168
 169 == Datatype declarations ==
 170
 171 For a type constructor declared by a datatype declaration to admit
 172 equality, every <:Variant:variant> of the datatype must admit equality.  For
 173 example, the following datatype admits equality because `bool` and
 174 `char * int` are equality types.
 175
 176 [source,sml]
 177 ----
 178 datatype t = A of bool | B of char * int
 179 ----
 180
  181 Nullary constructors trivially admit equality, so the following
  182 datatype admits equality.
 183
 184 [source,sml]
 185 ----
 186 datatype t = A | B | C
 187 ----
 188
 189 For a parameterized datatype constructor to admit equality, we
 190 consider each <:Variant:variant> as a type definition, and require that the
 191 definition admit equality.  For example, for the datatype
 192
 193 [source,sml]
 194 ----
 195 datatype 'a t = A of bool * 'a | B of 'a
 196 ----
 197
 198 the type definitions
 199
 200 [source,sml]
 201 ----
 202 type 'a tA = bool * 'a
 203 type 'a tB = 'a
 204 ----
 205
 206 both admit equality.  Thus, type constructor `t` admits equality.
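
     A short sketch of what this means in practice (the value names `same` and `differ` are hypothetical): equality on `'a t` is available whenever `'a` is instantiated with an equality type, and unavailable otherwise.

     [source,sml]
     ----
     datatype 'a t = A of bool * 'a | B of 'a

     (* 'a is int, an equality type, so int t is an equality type. *)
     val same = (A (true, 1) = A (true, 1))

     (* 'a is string; values built with different constructors are unequal. *)
     val differ = (B "x" <> A (false, "x"))

     (* Rejected if uncommented: real t is not an equality type.
      * val bad = (B 1.0 = B 1.0)
      *)
     ----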
 207
 208 On the other hand, the following datatype does not admit equality.
 209
 210 [source,sml]
 211 ----
 212 datatype 'a t = A of bool * 'a | B of real * 'a
 213 ----
 214
 215 As with type definitions, we can check using an `eqtype`
 216 specification.
 217
 218 [source,sml]
 219 ----
 220 structure Bad: sig eqtype 'a t end =
 221    struct
 222       datatype 'a t = A of bool * 'a | B of real * 'a
 223    end
 224 ----
 225
 226 MLton reports the following error.
 227
 228 ----
 229 Error: z.sml 1.16-1.34.
 230   Type in structure disagrees with signature (admits equality): t.
 231     structure: datatype 'a t = B of [real] * _ | ...
 232     defn at: z.sml 3.19-3.19
 233     signature: [eqtype] 'a t
 234     spec at: z.sml 1.30-1.30
 235 ----
 236
 237 MLton indicates the problematic constructor (`B`), as well as
 238 the problematic component of the constructor's argument.
 239
 240
 241 === Recursive datatypes ===
 242
 243 A recursive datatype like
 244
 245 [source,sml]
 246 ----
 247 datatype t = A | B of int * t
 248 ----
 249
 250 introduces a new problem, since in order to decide whether `t`
 251 admits equality, we need to know for the `B` <:Variant:variant> whether
 252 `t` admits equality.  The <:DefinitionOfStandardML:Definition>
 253 answers this question by requiring a type constructor to admit
 254 equality if it is consistent to do so.  So, in our above example, if
 255 we assume that `t` admits equality, then the <:Variant:variant>
 256 `B of int * t` admits equality.  Then, since the `A` <:Variant:variant>
 257 trivially admits equality, so does the type constructor `t`.
 258 Thus, it was consistent to assume that `t` admits equality, and
 259 so, `t` does admit equality.
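
     Assuming the declaration above, a minimal sketch of the consequence is that values of `t` can be compared structurally with `=` (the names `x`, `y`, `equal`, and `different` are made up for this example).

     [source,sml]
     ----
     datatype t = A | B of int * t

     val x = B (1, B (2, A))
     val y = B (1, B (2, A))

     (* Accepted because t admits equality; the comparison is structural. *)
     val equal = (x = y)
     val different = (x <> B (1, A))
     ----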
 260
 261 On the other hand, in the following declaration
 262
 263 [source,sml]
 264 ----
 265 datatype t = A | B of real * t
 266 ----
 267
  268 if we assume that `t` admits equality, then the `B` <:Variant:variant>
  269 still does not admit equality, because `real` is not an equality type.
  270 Hence, our assumption was inconsistent, and the type constructor `t`
  271 does not admit equality.
 272
 273 The same kind of reasoning applies to mutually recursive datatypes as
 274 well.  For example, the following defines both `t` and `u` to
 275 admit equality.
 276
 277 [source,sml]
 278 ----
 279 datatype t = A | B of u
 280 and u = C | D of t
 281 ----
 282
 283 But the following defines neither `t` nor `u` to admit
 284 equality.
 285
 286 [source,sml]
 287 ----
 288 datatype t = A | B of u * real
 289 and u = C | D of t
 290 ----
 291
 292 As always, we can check whether a type admits equality using an
 293 `eqtype` specification.
 294
 295 [source,sml]
 296 ----
 297 structure Bad: sig eqtype t eqtype u end =
 298    struct
 299       datatype t = A | B of u * real
 300       and u = C | D of t
 301    end
 302 ----
 303
 304 MLton reports the following error.
 305
 306 ----
 307 Error: z.sml 1.16-1.40.
 308   Type in structure disagrees with signature (admits equality): t.
 309     structure: datatype t = B of [_str.u] * [real] | ...
 310     defn at: z.sml 3.16-3.16
 311     signature: [eqtype] t
 312     spec at: z.sml 1.27-1.27
 313 Error: z.sml 1.16-1.40.
 314   Type in structure disagrees with signature (admits equality): u.
 315     structure: datatype u = D of [_str.t] | ...
 316     defn at: z.sml 4.11-4.11
 317     signature: [eqtype] u
 318     spec at: z.sml 1.36-1.36
 319 ----
 320
 321 <<<
 322
 323 :mlton-guide-page: Alice
 324 [[Alice]]
 325 Alice
 326 =====
 327
 328 http://www.ps.uni-saarland.de/alice[Alice ML] is an extension of SML with
 329 concurrency, dynamic typing, components, distribution, and constraint
 330 solving.
 331
 332 <<<
 333
 334 :mlton-guide-page: AllocateRegisters
 335 [[AllocateRegisters]]
 336 AllocateRegisters
 337 =================
 338
 339 <:AllocateRegisters:> is an analysis pass for the <:RSSA:>
 340 <:IntermediateLanguage:>, invoked from <:ToMachine:>.
 341
 342 == Description ==
 343
 344 Computes an allocation of <:RSSA:> variables as <:Machine:> register
 345 or stack operands.
 346
 347 == Implementation ==
 348
 349 * <!ViewGitFile(mlton,master,mlton/backend/allocate-registers.sig)>
 350 * <!ViewGitFile(mlton,master,mlton/backend/allocate-registers.fun)>
 351
 352 == Details and Notes ==
 353
 354 {empty}
 355
 356 <<<
 357
 358 :mlton-guide-page: AndreiFormiga
 359 [[AndreiFormiga]]
 360 AndreiFormiga
 361 =============
 362
 363 I'm a graduate student just back in academia. I study concurrent and parallel systems, with a great deal of interest in programming languages (theory, design, implementation). I happen to like functional languages.
 364
 365 I use the nickname tautologico on #sml and my email is andrei DOT formiga AT gmail DOT com.
 366
 367 <<<
 368
 369 :mlton-guide-page: ArrayLiteral
 370 [[ArrayLiteral]]
 371 ArrayLiteral
 372 ============
 373
 374 <:StandardML:Standard ML> does not have a syntax for array literals or
 375 vector literals.  The only way to write down an array is like
 376 [source,sml]
 377 ----
 378 Array.fromList [w, x, y, z]
 379 ----
 380
 381 No SML compiler produces efficient code for the above expression.  The
 382 generated code allocates a list and then converts it to an array.  To
 383 alleviate this, one could write down the same array using
 384 `Array.tabulate`, or even using `Array.array` and `Array.update`, but
 385 that is syntactically unwieldy.
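
     For comparison, here is a sketch of the three spellings mentioned above, using only Basis Library functions. The element values bound to `w`, `x`, `y`, and `z` and the names `a1`, `a2`, and `a3` are placeholders chosen for illustration.

     [source,sml]
     ----
     val (w, x, y, z) = (10, 20, 30, 40)

     (* Via an intermediate list. *)
     val a1 = Array.fromList [w, x, y, z]

     (* Via Array.tabulate: no list, but the index dispatch is clumsy. *)
     val a2 = Array.tabulate (4, fn 0 => w | 1 => x | 2 => y | _ => z)

     (* Via Array.array and Array.update: allocate, then fill in the rest. *)
     val a3 = Array.array (4, w)
     val () = Array.update (a3, 1, x)
     val () = Array.update (a3, 2, y)
     val () = Array.update (a3, 3, z)
     ----

     All three arrays hold the same elements; only the first goes through a list.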
 386
  387 Fortunately, using <:Fold:>, it is possible to define constants `A`
  388 and +&grave;+ so that one can write down an array like:
 389 [source,sml]
 390 ----
 391 A `w `x `y `z $
 392 ----
 393 This is as syntactically concise as the `fromList` expression.
  394 Furthermore, MLton, at least, will generate code as efficient as if
  395 one had written down a use of `Array.array` followed by four uses of
  396 `Array.update`.
 397
 398 Along with `A` and +&grave;+, one can define a constant `V` that makes
 399 it possible to define vector literals with the same syntax, e.g.,
 400 [source,sml]
 401 ----
 402 V `w `x `y `z $
 403 ----
 404
 405 Note that the same element indicator, +&grave;+, serves for both array
 406 and vector literals.  Of course, the `$` is the end-of-arguments
 407 marker always used with <:Fold:>.  The only difference between an
 408 array literal and vector literal is the `A` or `V` at the beginning.
 409
 410 Here is the implementation of `A`, `V`, and +&grave;+.  We place them
 411 in a structure and use signature abstraction to hide the type of the
 412 accumulator.  See <:Fold:> for more on this technique.
 413 [source,sml]
 414 ----
 415 structure Literal:>
 416    sig
 417       type 'a z
 418       val A: ('a z, 'a z, 'a array, 'd) Fold.t
 419       val V: ('a z, 'a z, 'a vector, 'd) Fold.t
 420       val ` : ('a, 'a z, 'a z, 'b, 'c, 'd) Fold.step1
 421    end =
 422    struct
 423       type 'a z = int * 'a option * ('a array -> unit)
 424
 425       val A =
 426          fn z =>
 427          Fold.fold
 428          ((0, NONE, ignore),
 429           fn (n, opt, fill) =>
 430           case opt of
 431              NONE =>
 432                 Array.tabulate (0, fn _ => raise Fail "array0")
 433            | SOME x =>
 434                 let
 435                    val a = Array.array (n, x)
 436                    val () = fill a
 437                 in
 438                    a
 439                 end)
 440          z
 441
 442       val V = fn z => Fold.post (A, Array.vector) z
 443
 444       val ` =
 445          fn z =>
 446          Fold.step1
 447          (fn (x, (i, opt, fill)) =>
 448           (i + 1,
 449            SOME x,
 450            fn a => (Array.update (a, i, x); fill a)))
 451          z
 452    end
 453 ----
 454
 455 The idea of the code is for the fold to accumulate a count of the
 456 number of elements, a sample element, and a function that fills in all
 457 the elements.  When the fold is complete, the finishing function
 458 allocates the array, applies the fill function, and returns the array.
 459 The only difference between `A` and `V` is at the very end; `A` just
 460 returns the array, while `V` converts it to a vector using
 461 post-composition, which is further described on the <:Fold:> page.
 462
 463 <<<
 464
 465 :mlton-guide-page: AST
 466 [[AST]]
 467 AST
 468 ===
 469
 470 <:AST:> is the <:IntermediateLanguage:> produced by the <:FrontEnd:>
 471 and translated by <:Elaborate:> to <:CoreML:>.
 472
 473 == Description ==
 474
 475 The abstract syntax tree produced by the <:FrontEnd:>.
 476
 477 == Implementation ==
 478
 479 * <!ViewGitFile(mlton,master,mlton/ast/ast-programs.sig)>
 480 * <!ViewGitFile(mlton,master,mlton/ast/ast-programs.fun)>
 481 * <!ViewGitFile(mlton,master,mlton/ast/ast-modules.sig)>
 482 * <!ViewGitFile(mlton,master,mlton/ast/ast-modules.fun)>
 483 * <!ViewGitFile(mlton,master,mlton/ast/ast-core.sig)>
 484 * <!ViewGitFile(mlton,master,mlton/ast/ast-core.fun)>
 485 * <!ViewGitDir(mlton,master,mlton/ast)>
 486
 487 == Type Checking ==
 488
 489 The <:AST:> <:IntermediateLanguage:> has no independent type
 490 checker. Type inference is performed on an AST program as part of
 491 <:Elaborate:>.
 492
 493 == Details and Notes ==
 494
 495 === Source locations ===
 496
 497 MLton makes use of a relatively clean method for annotating the
 498 abstract syntax tree with source location information.  Every source
 499 program phrase is "wrapped" with the `WRAPPED` interface:
 500
 501 [source,sml]
 502 ----
 503 sys::[./bin/InclGitFile.py mlton master mlton/control/wrapped.sig 8:19]
 504 ----
 505
 506 The key idea is that `node'` is the type of an unannotated syntax
 507 phrase and `obj` is the type of its annotated counterpart. In the
 508 implementation, every `node'` is annotated with a `Region.t`
 509 (<!ViewGitFile(mlton,master,mlton/control/region.sig)>,
 510 <!ViewGitFile(mlton,master,mlton/control/region.sml)>), which describes the
 511 syntax phrase's left source position and right source position, where
 512 `SourcePos.t` (<!ViewGitFile(mlton,master,mlton/control/source-pos.sig)>,
 513 <!ViewGitFile(mlton,master,mlton/control/source-pos.sml)>) denotes a
 514 particular file, line, and column.  A typical use of the `WRAPPED`
 515 interface is illustrated by the following code:
 516
 517 [source,sml]
 518 ----
 519 sys::[./bin/InclGitFile.py mlton master mlton/ast/ast-core.sig 46:65]
 520 ----
 521
 522 Thus, AST nodes are cleanly separated from source locations.  By way
 523 of contrast, consider the approach taken by <:SMLNJ:SML/NJ> (and also
 524 by the <:CKitLibrary:CKit Library>).  Each datatype denoting a syntax
 525 phrase dedicates a special constructor for annotating source
 526 locations:
 527 [source,sml]
  528 ----
 529 datatype pat = WildPat                             (* empty pattern *)
 530              | AppPat of {constr:pat,argument:pat} (* application *)
 531              | MarkPat of pat * region             (* mark a pattern *)
 532 ----
 533
 534 The main drawback of this approach is that static type checking is not
 535 sufficient to guarantee that the AST emitted from the front-end is
 536 properly annotated.
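
     To make the contrast concrete, here is a simplified, self-contained sketch. It is not MLton's actual `WRAPPED` signature, nor SML/NJ's actual AST: the type `region`, the constructor `Wrapped`, the functions `node` and `regionOf`, and the datatypes `pat'` and `mpat` are hypothetical stand-ins chosen for illustration.

     [source,sml]
     ----
     (* Hypothetical stand-in for a source region: left and right offsets. *)
     type region = {left: int, right: int}

     (* Wrapping style: one generic annotated type; nodes stay location-free. *)
     datatype 'node wrapped = Wrapped of 'node * region

     datatype pat' =
        Wild
      | App of {constr: pat, argument: pat}
     withtype pat = pat' wrapped

     fun node (Wrapped (n, _)) = n
     fun regionOf (Wrapped (_, r)) = r

     (* Every pat carries a region by construction. *)
     val wild: pat = Wrapped (Wild, {left = 0, right = 1})

     (* Marker-constructor style: nothing forces a mark around each node. *)
     datatype mpat =
        MWild
      | MApp of {constr: mpat, argument: mpat}
      | MMark of mpat * region

     (* Type checks even though no subphrase carries a region. *)
     val unmarked = MApp {constr = MWild, argument = MWild}
     ----

     In the first style, a `pat` without a region cannot be constructed; in the second, `unmarked` is accepted by the type checker.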
 537
 538 <<<
 539
 540 :mlton-guide-page: BasisLibrary
 541 [[BasisLibrary]]
 542 BasisLibrary
 543 ============
 544
 545 The <:StandardML:Standard ML> Basis Library is a collection of modules
 546 dealing with basic types, input/output, OS interfaces, and simple
 547 datatypes.  It is intended as a portable library usable across all
 548 implementations of SML.  For the official online version of the Basis
 549 Library specification, see http://www.standardml.org/Basis.
 550 <!Cite(GansnerReppy04, The Standard ML Basis Library)> is a book
 551 version that includes all of the online version and more.  For a
 552 reverse chronological list of changes to the specification, see
 553 http://www.standardml.org/Basis/history.html.
 554
 555 MLton implements all of the required portions of the Basis Library.
 556 MLton also implements many of the optional structures.  You can obtain
 557 a complete and current list of what's available using
 558 `mlton -show-basis` (see <:ShowBasis:>).  By default, MLton makes the
 559 Basis Library available to user programs.  You can also
 560 <:MLBasisAvailableLibraries:access the Basis Library> from
 561 <:MLBasis: ML Basis> files.
 562
 563 Below is a complete list of what MLton implements.
 564
 565 == Top-level types and constructors ==
 566
 567 `eqtype 'a array`
 568
 569 `datatype bool = false | true`
 570
 571 `eqtype char`
 572
 573 `type exn`
 574
 575 `eqtype int`
 576
 577 ++datatype 'a list = nil | {two-colons} of ('a * 'a list)++
 578
 579 `datatype 'a option = NONE | SOME of 'a`
 580
 581 `datatype order = EQUAL | GREATER | LESS`
 582
 583 `type real`
 584
 585 `datatype 'a ref = ref of 'a`
 586
 587 `eqtype string`
 588
 589 `type substring`
 590
 591 `eqtype unit`
 592
 593 `eqtype 'a vector`
 594
 595 `eqtype word`
 596
 597 == Top-level exception constructors ==
 598
 599 `Bind`
 600
 601 `Chr`
 602
 603 `Div`
 604
 605 `Domain`
 606
 607 `Empty`
 608
 609 `Fail of string`
 610
 611 `Match`
 612
 613 `Option`
 614
 615 `Overflow`
 616
 617 `Size`
 618
 619 `Span`
 620
 621 `Subscript`
 622
 623 == Top-level values ==
 624
 625 MLton does not implement the optional top-level value
 626 `use: string -> unit`, which conflicts with whole-program
 627 compilation because it allows new code to be loaded dynamically.
 628
 629 MLton implements all other top-level values:
 630
 631 `!`,
 632 `:=`,
 633 `<>`,
 634 `=`,
 635 `@`,
 636 `^`,
 637 `app`,
 638 `before`,
 639 `ceil`,
 640 `chr`,
 641 `concat`,
 642 `exnMessage`,
 643 `exnName`,
 644 `explode`,
 645 `floor`,
 646 `foldl`,
 647 `foldr`,
 648 `getOpt`,
 649 `hd`,
 650 `ignore`,
 651 `implode`,
 652 `isSome`,
 653 `length`,
 654 `map`,
 655 `not`,
 656 `null`,
 657 `o`,
 658 `ord`,
 659 `print`,
 660 `real`,
 661 `rev`,
 662 `round`,
 663 `size`,
 664 `str`,
 665 `substring`,
 666 `tl`,
 667 `trunc`,
 668 `valOf`,
 669 `vector`
 670
 671 == Overloaded identifiers ==
 672
 673 `*`,
 674 `+`,
 675 `-`,
 676 `/`,
 677 `<`,
 678 `<=`,
 679 `>`,
 680 `>=`,
 681 `~`,
 682 `abs`,
 683 `div`,
 684 `mod`
 685
 686 == Top-level signatures ==
 687
 688 `ARRAY`
 689
 690 `ARRAY2`
 691
 692 `ARRAY_SLICE`
 693
 694 `BIN_IO`
 695
 696 `BIT_FLAGS`
 697
 698 `BOOL`
 699
 700 `BYTE`
 701
 702 `CHAR`
 703
 704 `COMMAND_LINE`
 705
 706 `DATE`
 707
 708 `GENERAL`
 709
 710 `GENERIC_SOCK`
 711
 712 `IEEE_REAL`
 713
 714 `IMPERATIVE_IO`
 715
 716 `INET_SOCK`
 717
 718 `INTEGER`
 719
 720 `INT_INF`
 721
 722 `IO`
 723
 724 `LIST`
 725
 726 `LIST_PAIR`
 727
 728 `MATH`
 729
 730 `MONO_ARRAY`
 731
 732 `MONO_ARRAY2`
 733
 734 `MONO_ARRAY_SLICE`
 735
 736 `MONO_VECTOR`
 737
 738 `MONO_VECTOR_SLICE`
 739
 740 `NET_HOST_DB`
 741
 742 `NET_PROT_DB`
 743
 744 `NET_SERV_DB`
 745
 746 `OPTION`
 747
 748 `OS`
 749
 750 `OS_FILE_SYS`
 751
 752 `OS_IO`
 753
 754 `OS_PATH`
 755
 756 `OS_PROCESS`
 757
 758 `PACK_REAL`
 759
 760 `PACK_WORD`
 761
 762 `POSIX`
 763
 764 `POSIX_ERROR`
 765
 766 `POSIX_FILE_SYS`
 767
 768 `POSIX_IO`
 769
 770 `POSIX_PROCESS`
 771
 772 `POSIX_PROC_ENV`
 773
 774 `POSIX_SIGNAL`
 775
 776 `POSIX_SYS_DB`
 777
 778 `POSIX_TTY`
 779
 780 `PRIM_IO`
 781
 782 `REAL`
 783
 784 `SOCKET`
 785
 786 `STREAM_IO`
 787
 788 `STRING`
 789
 790 `STRING_CVT`
 791
 792 `SUBSTRING`
 793
 794 `TEXT`
 795
 796 `TEXT_IO`
 797
 798 `TEXT_STREAM_IO`
 799
 800 `TIME`
 801
 802 `TIMER`
 803
 804 `UNIX`
 805
 806 `UNIX_SOCK`
 807
 808 `VECTOR`
 809
 810 `VECTOR_SLICE`
 811
 812 `WORD`
 813
 814 == Top-level structures ==
 815
 816 `structure Array: ARRAY`
 817
 818 `structure Array2: ARRAY2`
 819
 820 `structure ArraySlice: ARRAY_SLICE`
 821
 822 `structure BinIO: BIN_IO`
 823
 824 `structure BinPrimIO: PRIM_IO`
 825
 826 `structure Bool: BOOL`
 827
 828 `structure BoolArray: MONO_ARRAY`
 829
 830 `structure BoolArray2: MONO_ARRAY2`
 831
 832 `structure BoolArraySlice: MONO_ARRAY_SLICE`
 833
 834 `structure BoolVector: MONO_VECTOR`
 835
 836 `structure BoolVectorSlice: MONO_VECTOR_SLICE`
 837
 838 `structure Byte: BYTE`
 839
 840 `structure Char: CHAR`
 841
 842 * `Char` characters correspond to ISO-8859-1.  The `Char` functions do not depend on locale.
 843
 844 `structure CharArray: MONO_ARRAY`
 845
 846 `structure CharArray2: MONO_ARRAY2`
 847
 848 `structure CharArraySlice: MONO_ARRAY_SLICE`
 849
 850 `structure CharVector: MONO_VECTOR`
 851
 852 `structure CharVectorSlice: MONO_VECTOR_SLICE`
 853
 854 `structure CommandLine: COMMAND_LINE`
 855
 856 `structure Date: DATE`
 857
 858 * `Date.fromString` and `Date.scan` accept a space in addition to a zero for the first character of the day of the month.  The Basis Library specification only allows a zero.
 859
 860 `structure FixedInt: INTEGER`
 861
 862 `structure General: GENERAL`
 863
 864 `structure GenericSock: GENERIC_SOCK`
 865
 866 `structure IEEEReal: IEEE_REAL`
 867
 868 `structure INetSock: INET_SOCK`
 869
 870 `structure IO: IO`
 871
 872 `structure Int: INTEGER`
 873
 874 `structure Int1: INTEGER`
 875
 876 `structure Int2: INTEGER`
 877
 878 `structure Int3: INTEGER`
 879
 880 `structure Int4: INTEGER`
 881
 882 ...
 883
 884 `structure Int31: INTEGER`
 885
 886 `structure Int32: INTEGER`
 887
 888 `structure Int64: INTEGER`
 889
 890 `structure IntArray: MONO_ARRAY`
 891
 892 `structure IntArray2: MONO_ARRAY2`
 893
 894 `structure IntArraySlice: MONO_ARRAY_SLICE`
 895
 896 `structure IntVector: MONO_VECTOR`
 897
 898 `structure IntVectorSlice: MONO_VECTOR_SLICE`
 899
 900 `structure Int8: INTEGER`
 901
 902 `structure Int8Array: MONO_ARRAY`
 903
 904 `structure Int8Array2: MONO_ARRAY2`
 905
 906 `structure Int8ArraySlice: MONO_ARRAY_SLICE`
 907
 908 `structure Int8Vector: MONO_VECTOR`
 909
 910 `structure Int8VectorSlice: MONO_VECTOR_SLICE`
 911
 912 `structure Int16: INTEGER`
 913
 914 `structure Int16Array: MONO_ARRAY`
 915
 916 `structure Int16Array2: MONO_ARRAY2`
 917
 918 `structure Int16ArraySlice: MONO_ARRAY_SLICE`
 919
 920 `structure Int16Vector: MONO_VECTOR`
 921
 922 `structure Int16VectorSlice: MONO_VECTOR_SLICE`
 923
 924 `structure Int32: INTEGER`
 925
 926 `structure Int32Array: MONO_ARRAY`
 927
 928 `structure Int32Array2: MONO_ARRAY2`
 929
 930 `structure Int32ArraySlice: MONO_ARRAY_SLICE`
 931
 932 `structure Int32Vector: MONO_VECTOR`
 933
 934 `structure Int32VectorSlice: MONO_VECTOR_SLICE`
 935
 936 `structure Int64Array: MONO_ARRAY`
 937
 938 `structure Int64Array2: MONO_ARRAY2`
 939
 940 `structure Int64ArraySlice: MONO_ARRAY_SLICE`
 941
 942 `structure Int64Vector: MONO_VECTOR`
 943
 944 `structure Int64VectorSlice: MONO_VECTOR_SLICE`
 945
 946 `structure IntInf: INT_INF`
 947
 948 `structure LargeInt: INTEGER`
 949
 950 `structure LargeIntArray: MONO_ARRAY`
 951
 952 `structure LargeIntArray2: MONO_ARRAY2`
 953
 954 `structure LargeIntArraySlice: MONO_ARRAY_SLICE`
 955
 956 `structure LargeIntVector: MONO_VECTOR`
 957
 958 `structure LargeIntVectorSlice: MONO_VECTOR_SLICE`
 959
 960 `structure LargeReal: REAL`
 961
 962 `structure LargeRealArray: MONO_ARRAY`
 963
 964 `structure LargeRealArray2: MONO_ARRAY2`
 965
 966 `structure LargeRealArraySlice: MONO_ARRAY_SLICE`
 967
 968 `structure LargeRealVector: MONO_VECTOR`
 969
 970 `structure LargeRealVectorSlice: MONO_VECTOR_SLICE`
 971
 972 `structure LargeWord: WORD`
 973
 974 `structure LargeWordArray: MONO_ARRAY`
 975
 976 `structure LargeWordArray2: MONO_ARRAY2`
 977
 978 `structure LargeWordArraySlice: MONO_ARRAY_SLICE`
 979
 980 `structure LargeWordVector: MONO_VECTOR`
 981
 982 `structure LargeWordVectorSlice: MONO_VECTOR_SLICE`
 983
 984 `structure List: LIST`
 985
 986 `structure ListPair: LIST_PAIR`
 987
 988 `structure Math: MATH`
 989
 990 `structure NetHostDB: NET_HOST_DB`
 991
 992 `structure NetProtDB: NET_PROT_DB`
 993
 994 `structure NetServDB: NET_SERV_DB`
 995
 996 `structure OS: OS`
 997
 998 `structure Option: OPTION`
 999
1000 `structure PackReal32Big: PACK_REAL`
1001
1002 `structure PackReal32Little: PACK_REAL`
1003
1004 `structure PackReal64Big: PACK_REAL`
1005
1006 `structure PackReal64Little: PACK_REAL`
1007
1008 `structure PackRealBig: PACK_REAL`
1009
1010 `structure PackRealLittle: PACK_REAL`
1011
1012 `structure PackWord16Big: PACK_WORD`
1013
1014 `structure PackWord16Little: PACK_WORD`
1015
1016 `structure PackWord32Big: PACK_WORD`
1017
1018 `structure PackWord32Little: PACK_WORD`
1019
1020 `structure PackWord64Big: PACK_WORD`
1021
1022 `structure PackWord64Little: PACK_WORD`
1023
1024 `structure Position: INTEGER`
1025
1026 `structure Posix: POSIX`
1027
1028 `structure Real: REAL`
1029
1030 `structure RealArray: MONO_ARRAY`
1031
1032 `structure RealArray2: MONO_ARRAY2`
1033
1034 `structure RealArraySlice: MONO_ARRAY_SLICE`
1035
1036 `structure RealVector: MONO_VECTOR`
1037
1038 `structure RealVectorSlice: MONO_VECTOR_SLICE`
1039
1040 `structure Real32: REAL`
1041
1042 `structure Real32Array: MONO_ARRAY`
1043
1044 `structure Real32Array2: MONO_ARRAY2`
1045
1046 `structure Real32ArraySlice: MONO_ARRAY_SLICE`
1047
1048 `structure Real32Vector: MONO_VECTOR`
1049
1050 `structure Real32VectorSlice: MONO_VECTOR_SLICE`
1051
1052 `structure Real64: REAL`
1053
1054 `structure Real64Array: MONO_ARRAY`
1055
1056 `structure Real64Array2: MONO_ARRAY2`
1057
1058 `structure Real64ArraySlice: MONO_ARRAY_SLICE`
1059
1060 `structure Real64Vector: MONO_VECTOR`
1061
1062 `structure Real64VectorSlice: MONO_VECTOR_SLICE`
1063
1064 `structure Socket: SOCKET`
1065
1066 * The Basis Library specification requires functions like
1067 `Socket.sendVec` to raise an exception if they fail.  However, on some
1068 platforms, sending to a socket that hasn't yet been connected causes a
1069 `SIGPIPE` signal, which invokes the default signal handler for
1070 `SIGPIPE` and causes the program to terminate.  If you want the
1071 exception to be raised, you can ignore `SIGPIPE` by adding the
1072 following to your program.
1073 +
1074 [source,sml]
1075 ----
1076 let
1077    open MLton.Signal
1078 in
1079    setHandler (Posix.Signal.pipe, Handler.ignore)
1080 end
1081 ----
1082
1083 `structure String: STRING`
1084
1085 * The `String` functions do not depend on locale.
1086
1087 `structure StringCvt: STRING_CVT`
1088
1089 `structure Substring: SUBSTRING`
1090
1091 `structure SysWord: WORD`
1092
1093 `structure Text: TEXT`
1094
1095 `structure TextIO: TEXT_IO`
1096
1097 `structure TextPrimIO: PRIM_IO`
1098
1099 `structure Time: TIME`
1100
1101 `structure Timer: TIMER`
1102
1103 `structure Unix: UNIX`
1104
1105 `structure UnixSock: UNIX_SOCK`
1106
1107 `structure Vector: VECTOR`
1108
1109 `structure VectorSlice: VECTOR_SLICE`
1110
1111 `structure Word: WORD`
1112
1113 `structure Word1: WORD`
1114
1115 `structure Word2: WORD`
1116
1117 `structure Word3: WORD`
1118
1119 `structure Word4: WORD`
1120
1121 ...
1122
1123 `structure Word31: WORD`
1124
1125 `structure Word32: WORD`
1126
1127 `structure Word64: WORD`
1128
1129 `structure WordArray: MONO_ARRAY`
1130
1131 `structure WordArray2: MONO_ARRAY2`
1132
1133 `structure WordArraySlice: MONO_ARRAY_SLICE`
1134
1135 `structure WordVectorSlice: MONO_VECTOR_SLICE`
1136
1137 `structure WordVector: MONO_VECTOR`
1138
1139 `structure Word8Array: MONO_ARRAY`
1140
1141 `structure Word8Array2: MONO_ARRAY2`
1142
1143 `structure Word8ArraySlice: MONO_ARRAY_SLICE`
1144
1145 `structure Word8Vector: MONO_VECTOR`
1146
1147 `structure Word8VectorSlice: MONO_VECTOR_SLICE`
1148
1149 `structure Word16Array: MONO_ARRAY`
1150
1151 `structure Word16Array2: MONO_ARRAY2`
1152
1153 `structure Word16ArraySlice: MONO_ARRAY_SLICE`
1154
1155 `structure Word16Vector: MONO_VECTOR`
1156
1157 `structure Word16VectorSlice: MONO_VECTOR_SLICE`
1158
1159 `structure Word32Array: MONO_ARRAY`
1160
1161 `structure Word32Array2: MONO_ARRAY2`
1162
1163 `structure Word32ArraySlice: MONO_ARRAY_SLICE`
1164
1165 `structure Word32Vector: MONO_VECTOR`
1166
1167 `structure Word32VectorSlice: MONO_VECTOR_SLICE`
1168
1169 `structure Word64Array: MONO_ARRAY`
1170
1171 `structure Word64Array2: MONO_ARRAY2`
1172
1173 `structure Word64ArraySlice: MONO_ARRAY_SLICE`
1174
1175 `structure Word64Vector: MONO_VECTOR`
1176
1177 `structure Word64VectorSlice: MONO_VECTOR_SLICE`
1178
1179 == Top-level functors ==
1180
1181 `ImperativeIO`
1182
1183 `PrimIO`
1184
1185 `StreamIO`
1186
1187 * MLton's `StreamIO` functor takes structures `ArraySlice` and
1188 `VectorSlice` in addition to the arguments specified in the Basis
1189 Library specification.
1190
1191 == Type equivalences ==
1192
1193 The following types are equivalent.
1194 ----
 1195 FixedInt.int = Int64.int
 1196 LargeInt.int = IntInf.int
 1197 LargeReal.real = Real64.real
 1198 LargeWord.word = Word64.word
1199 ----
1200
1201 The default `int`, `real`, and `word` types may be set by the
1202 ++-default-type __type__++ <:CompileTimeOptions: compile-time option>.
1203 By default, the following types are equivalent:
1204 ----
1205 int = Int.int = Int32.int
1206 real = Real.real = Real64.real
1207 word = Word.word = Word32.word
1208 ----
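
     These equivalences can be observed directly in a program. The following sketch assumes MLton with its default settings (no `-default-type` option); with other defaults, or with another compiler, some of these bindings may be rejected. The value names are made up for this example.

     [source,sml]
     ----
     val i: int = 42

     (* Accepted because int = Int32.int by default. *)
     val j: Int32.int = i

     (* Accepted because real = Real64.real and word = Word32.word by default. *)
     val r: Real64.real = (1.5: real)
     val w: Word32.word = (0wx2A: word)

     (* Accepted because LargeInt.int = IntInf.int and FixedInt.int = Int64.int. *)
     val big: LargeInt.int = IntInf.fromInt i
     val fixed: FixedInt.int = Int64.fromInt i
     ----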
1209
1210 == Real and Math functions ==
1211
1212 The `Real`, `Real32`, and `Real64` modules are implemented
1213 using the `C` math library, so the SML functions will reflect the
1214 behavior of the underlying library function.  We have made some effort
1215 to unify the differences between the math libraries on different
1216 platforms, and in particular to handle exceptional cases according to
1217 the Basis Library specification.  However, there will be differences
1218 due to different numerical algorithms and cases we may have missed.
1219 Please submit a <:Bug:bug report> if you encounter an error in
1220 the handling of an exceptional case.
1221
1222 On x86, real arithmetic is implemented internally using 80 bits of
1223 precision.  Using higher precision for intermediate results in
1224 computations can lead to different results than if all the computation
1225 is done at 32 or 64 bits.  If you require strict IEEE compliance, you
1226 can compile with `-ieee-fp true`, which will cause intermediate
1227 results to be stored after each operation.  This may cause a
1228 substantial performance penalty.
1229
1230 <<<
1231
1232 :mlton-guide-page: Bug
1233 [[Bug]]
1234 Bug
1235 ===
1236
1237 To report a bug, please send mail to
1238 mailto:mlton-devel@mlton.org[`mlton-devel@mlton.org`].  Please include
1239 the complete SML program that caused the problem and a log of a
1240 compile of the program with `-verbose 2`.  For large programs (over
1241 256K), please send an email containing the discussion text and a link
1242 to any large files.
1243
1244 There are some <:UnresolvedBugs:> that we don't plan to fix.
1245
1246 We also maintain a list of bugs found with each release.
1247
1248 * <:Bugs20130715:>
1249 * <:Bugs20100608:>
1250 * <:Bugs20070826:>
1251 * <:Bugs20051202:>
1252 * <:Bugs20041109:>
1253
1254 <<<
1255
1256 :mlton-guide-page: Bugs20041109
1257 [[Bugs20041109]]
1258 Bugs20041109
1259 ============
1260
1261 Here are the known bugs in <:Release20041109:MLton 20041109>, listed
1262 in reverse chronological order of date reported.
1263
1264 * <!Anchor(bug17)>
1265  `MLton.Finalizable.touch` doesn't necessarily keep values alive
1266  long enough.  Our SVN has a patch to the compiler.  You must rebuild
1267  the compiler in order for the patch to take effect.
1268 +
1269 Thanks to Florian Weimer for reporting this bug.
1270
1271 * <!Anchor(bug16)>
1272  A bug in an optimization pass may incorrectly transform a program
1273  to flatten ref cells into their containing data structure, yielding a
1274  type-error in the transformed program.  Our CVS has a
1275  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/mlton/ssa/ref-flatten.fun.diff?r1=1.35&r2=1.37[patch]
1276  to the compiler.  You must rebuild the compiler in order for the
1277  patch to take effect.
1278 +
1279 Thanks to <:VesaKarvonen:> for reporting this bug.
1280
1281 * <!Anchor(bug15)>
1282  A bug in the front end mistakenly allows unary constructors to be
1283  used without an argument in patterns.  For example, the following
1284  program is accepted, and triggers a large internal error.
1285 +
1286 [source,sml]
1287 ----
1288 fun f x = case x of SOME => true | _ => false
1289 ----
1290 +
1291 We have fixed the problem in our CVS.
1292 +
1293 Thanks to William Lovas for reporting this bug.
1294
1295 * <!Anchor(bug14)>
1296  A bug in `Posix.IO.{getlk,setlk,setlkw}` causes a link-time error:
1297  `undefined reference to Posix_IO_FLock_typ`
1298  Our CVS has a
1299  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/basis-library/posix/primitive.sml.diff?r1=1.34&r2=1.35[patch]
1300  to the Basis Library implementation.
1301 +
1302 Thanks to Adam Chlipala for reporting this bug.
1303
1304 * <!Anchor(bug13)>
1305  A bug can cause programs compiled with `-profile alloc` to
1306  segfault.  Our CVS has a
1307  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/mlton/backend/ssa-to-rssa.fun.diff?r1=1.106&r2=1.107[patch]
1308  to the compiler.  You must rebuild the compiler in order for the
1309  patch to take effect.
1310 +
1311 Thanks to John Reppy for reporting this bug.
1312
1313 * <!Anchor(bug12)>
1314  A bug in an optimization pass may incorrectly flatten ref cells
1315  into their containing data structure, breaking the sharing between
1316  the cells.  Our CVS has a
1317  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/mlton/ssa/ref-flatten.fun.diff?r1=1.32&r2=1.33[patch]
1318  to the compiler.  You must rebuild the compiler in order for the
1319  patch to take effect.
1320 +
1321 Thanks to Paul Govereau for reporting this bug.
1322
1323 * <!Anchor(bug11)>
1324  Some arrays or vectors, such as `(char * char) vector`, are
1325  incorrectly implemented, and will conflate the first and second
1326  components of each element.  Our CVS has a
1327  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/mlton/backend/packed-representation.fun.diff?r1=1.32&r2=1.33[patch]
1328  to the compiler.  You must rebuild the compiler in order for the
1329  patch to take effect.
1330 +
1331 Thanks to Scott Cruzen for reporting this bug.
1332
1333 * <!Anchor(bug10)>
1334  `Socket.Ctl.getLINGER` and `Socket.Ctl.setLINGER`
1335  mistakenly raise `Subscript`.
1336  Our CVS has a
1337  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/basis-library/net/socket.sml.diff?r1=1.14&r2=1.15[patch]
1338  to the Basis Library implementation.
1339 +
1340 Thanks to Ray Racine for reporting the bug.
1341
1342 * <!Anchor(bug09)>
1343  <:ConcurrentML: CML> `Mailbox.send` makes a call in the wrong atomic context.
1344  Our CVS has a http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/lib/cml/core-cml/mailbox.sml.diff?r1=1.3&r2=1.4[patch]
1345  to the CML implementation.
1346
1347 * <!Anchor(bug08)>
1348  `OS.Path.joinDirFile` and `OS.Path.toString` did not
1349  raise `InvalidArc` when they were supposed to.  They now do.
1350  Our CVS has a http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/basis-library/system/path.sml.diff?r1=1.8&r2=1.11[patch]
1351  to the Basis Library implementation.
1352 +
1353 Thanks to Andreas Rossberg for reporting the bug.
1354
1355 * <!Anchor(bug07)>
1356  The front end incorrectly disallows sequences of expressions
1357  (separated by semicolons) after a topdec has already been processed.
1358  For example, the following is incorrectly rejected.
1359 +
1360 [source,sml]
1361 ----
1362 val x = 0;
1363 ignore x;
1364 ignore x;
1365 ----
1366 +
1367 We have fixed the problem in our CVS.
1368 +
1369 Thanks to Andreas Rossberg for reporting the bug.
1370
1371 * <!Anchor(bug06)>
1372  The front end incorrectly disallows expansive `val`
1373  declarations that bind a type variable that doesn't occur in the
1374  type of the value being bound.   For example, the following is
1375  incorrectly rejected.
1376 +
1377 [source,sml]
1378 ----
1379 val 'a x = let exception E of 'a in () end
1380 ----
1381 +
1382 We have fixed the problem in our CVS.
1383 +
1384 Thanks to Andreas Rossberg for reporting this bug.
1385
1386 * <!Anchor(bug05)>
1387  The x86 codegen fails to account for the possibility that a 64-bit
1388  move could interfere with itself (as simulated by 32-bit moves).  We
1389  have fixed the problem in our CVS.
1390 +
1391 Thanks to Scott Cruzen for reporting this bug.
1392
1393 * <!Anchor(bug04)>
1394  `NetHostDB.scan` and `NetHostDB.fromString` incorrectly
1395  raise an exception on internet addresses whose last component is a
 zero, e.g., `0.0.0.0`.  Our CVS has a
1397  http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/basis-library/net/net-host-db.sml.diff?r1=1.12&r2=1.13[patch] to the Basis Library implementation.
1398 +
1399 Thanks to Scott Cruzen for reporting this bug.
1400
1401 * <!Anchor(bug03)>
1402  `StreamIO.inputLine` has an off-by-one error causing it to drop
1403  the first character after a newline in some situations.  Our CVS has a
 http://mlton.org/cgi-bin/viewcvs.cgi/mlton/mlton/basis-library/io/stream-io.fun.diff?r1=text&tr1=1.29&r2=text&tr2=1.30&diff_format=h[patch]
 to the Basis Library implementation.
1406 +
1407 Thanks to Scott Cruzen for reporting this bug.
1408
1409 * <!Anchor(bug02)>
1410  `BinIO.getInstream` and `TextIO.getInstream` are
1411  implemented incorrectly.  This also impacts the behavior of
1412  `BinIO.scanStream` and `TextIO.scanStream`.  If you (directly
1413  or indirectly) realize a `TextIO.StreamIO.instream` and do not
1414  (directly or indirectly) call `TextIO.setInstream` with a derived
1415  stream, you may lose input data.  We have fixed the problem in our
1416  CVS.
1417 +
1418 Thanks to <:WesleyTerpstra:> for reporting this bug.
1419
1420 * <!Anchor(bug01)>
1421  `Posix.ProcEnv.setpgid` doesn't work.  If you compile a program
 that uses it, you will get a link-time error:
1423 +
1424 ----
1425 undefined reference to `Posix_ProcEnv_setpgid'
1426 ----
1427 +
1428 The bug is due to `Posix_ProcEnv_setpgid` being omitted from the
1429  MLton runtime.  We fixed the problem in our CVS by adding the
1430  following definition to `runtime/Posix/ProcEnv/ProcEnv.c`
1431 +
1432 [source,c]
1433 ----
1434 Int Posix_ProcEnv_setpgid (Pid p, Gid g) {
1435         return setpgid (p, g);
1436 }
1437 ----
1438 +
1439 Thanks to Tom Murphy for reporting this bug.
1440
1441 <<<
1442
1443 :mlton-guide-page: Bugs20051202
1444 [[Bugs20051202]]
1445 Bugs20051202
1446 ============
1447
1448 Here are the known bugs in <:Release20051202:MLton 20051202>, listed
1449 in reverse chronological order of date reported.
1450
1451 * <!Anchor(bug16)>
1452 Bug in the http://www.standardml.org/Basis/real.html#SIG:REAL.fmt:VAL[++Real__<N>__.fmt++], http://www.standardml.org/Basis/real.html#SIG:REAL.fromString:VAL[++Real__<N>__.fromString++], http://www.standardml.org/Basis/real.html#SIG:REAL.scan:VAL[++Real__<N>__.scan++], and http://www.standardml.org/Basis/real.html#SIG:REAL.toString:VAL[++Real__<N>__.toString++] functions of the <:BasisLibrary:Basis Library> implementation.  These functions were using `TO_NEAREST` semantics, but should obey the current rounding mode.  (Only ++Real__<N>__.fmt StringCvt.EXACT++, ++Real__<N>__.fromDecimal++, and ++Real__<N>__.toDecimal++ are specified to override the current rounding mode with `TO_NEAREST` semantics.)
1453 +
1454 Thanks to Sean McLaughlin for the bug report.
1455 +
1456 Fixed by revision <!ViewSVNRev(5827)>.
1457
1458 * <!Anchor(bug15)>
1459 Bug in the treatment of floating-point operations.  Floating-point operations depend on the current rounding mode, but were being treated as pure.
1460 +
1461 Thanks to Sean McLaughlin for the bug report.
1462 +
1463 Fixed by revision <!ViewSVNRev(5794)>.
1464
1465 * <!Anchor(bug14)>
Bug in the http://www.standardml.org/Basis/real.html#SIG:REAL.toInt:VAL[++Real32.toInt++] function of the <:BasisLibrary:Basis Library> implementation could lead to incorrect results when applied to a `Real32.real` value numerically close to `valOf(Int.maxInt)`.
1467 +
1468 Fixed by revision <!ViewSVNRev(5764)>.
1469
1470 * <!Anchor(bug13)>
1471 The http://www.standardml.org/Basis/socket.html[++Socket++] structure of the <:BasisLibrary:Basis Library> implementation used `andb` rather than `orb` to unmarshal socket options (for ++Socket.Ctl.get__<OPT>__++ functions).
1472 +
1473 Thanks to Anders Petersson for the bug report and patch.
1474 +
1475 Fixed by revision <!ViewSVNRev(5735)>.
1476
1477 * <!Anchor(bug12)>
1478 Bug in the http://www.standardml.org/Basis/date.html[++Date++] structure of the <:BasisLibrary:Basis Library> implementation yielded some functions that would erroneously raise `Date` when applied to a year before 1900.
1479 +
1480 Thanks to Joe Hurd for the bug report.
1481 +
1482 Fixed by revision <!ViewSVNRev(5732)>.
1483
1484 * <!Anchor(bug11)>
1485 Bug in monomorphisation pass could exhibit the error `Type error: type mismatch`.
1486 +
1487 Thanks to Vesa Karvonen for the bug report.
1488 +
1489 Fixed by revision <!ViewSVNRev(5731)>.
1490
1491 * <!Anchor(bug10)>
1492 The http://www.standardml.org/Basis/pack-float.html#SIG:PACK_REAL.toBytes:VAL[++PackReal__<N>__.toBytes++] function in the <:BasisLibrary:Basis Library> implementation incorrectly shared (and mutated) the result vector.
1493 +
1494 Thanks to Eric McCorkle for the bug report and patch.
1495 +
1496 Fixed by revision <!ViewSVNRev(5281)>.
1497
1498 * <!Anchor(bug09)>
Bug in elaboration of FFI forms.  Using unary FFI types (e.g., `array`, `ref`, `vector`) in places where `MLton.Pointer.t` was required would lead to an internal error `TypeError`.
1500 +
1501 Fixed by revision <!ViewSVNRev(4890)>.
1502
1503 * <!Anchor(bug08)>
1504 The http://www.standardml.org/Basis/mono-vector.html[++MONO_VECTOR++] signature of the <:BasisLibrary:Basis Library> implementation incorrectly omits the specification of `find`.
1505 +
1506 Fixed by revision <!ViewSVNRev(4707)>.
1507
1508 * <!Anchor(bug07)>
1509 The optimizer reports an internal error (`TypeError`) when an imported C function is called but not used.
1510 +
1511 Thanks to "jq" for the bug report.
1512 +
1513 Fixed by revision <!ViewSVNRev(4690)>.
1514
1515 * <!Anchor(bug06)>
1516 Bug in pass to flatten data structures.
1517 +
1518 Thanks to Joe Hurd for the bug report.
1519 +
1520 Fixed by revision <!ViewSVNRev(4662)>.
1521
1522 * <!Anchor(bug05)>
The native codegen's implementation of the C-calling convention failed to widen 16-bit arguments to 32 bits.
1524 +
1525 Fixed by revision <!ViewSVNRev(4631)>.
1526
1527 * <!Anchor(bug04)>
1528 The http://www.standardml.org/Basis/pack-float.html[++PACK_REAL++] structures of the <:BasisLibrary:Basis Library> implementation used byte, rather than element, indexing.
1529 +
1530 Fixed by revision <!ViewSVNRev(4411)>.
1531
1532 * <!Anchor(bug03)>
1533 `MLton.share` could cause a segmentation fault.
1534 +
1535 Fixed by revision <!ViewSVNRev(4400)>.
1536
1537 * <!Anchor(bug02)>
1538 The SSA simplifier could eliminate an irredundant test.
1539 +
1540 Fixed by revision <!ViewSVNRev(4370)>.
1541
1542 * <!Anchor(bug01)>
1543 A program with a very large number of functors could exhibit the error `ElaborateEnv.functorClosure: firstTycons`.
1544 +
1545 Fixed by revision <!ViewSVNRev(4344)>.
1546
1547 <<<
1548
1549 :mlton-guide-page: Bugs20070826
1550 [[Bugs20070826]]
1551 Bugs20070826
1552 ============
1553
1554 Here are the known bugs in <:Release20070826:MLton 20070826>, listed
1555 in reverse chronological order of date reported.
1556
1557 * <!Anchor(bug25)>
1558 Bug in the mark-compact garbage collector where the C library's `memcpy` was used to move objects during the compaction phase; this could lead to heap corruption and segmentation faults with newer versions of gcc and/or glibc, which assume that src and dst in a `memcpy` do not overlap.
1559 +
1560 Fixed by revision <!ViewSVNRev(7461)>.
1561
1562 * <!Anchor(bug24)>
1563 Bug in elaboration of `datatype` declarations with `withtype` bindings.
1564 +
1565 Fixed by revision <!ViewSVNRev(7434)>.
1566
1567 * <!Anchor(bug23)>
1568 Performance bug in <:RefFlatten:> optimization pass.
1569 +
1570 Thanks to Reactive Systems for the bug report.
1571 +
1572 Fixed by revision <!ViewSVNRev(7379)>.
1573
1574 * <!Anchor(bug22)>
1575 Performance bug in <:SimplifyTypes:> optimization pass.
1576 +
1577 Thanks to Reactive Systems for the bug report.
1578 +
1579 Fixed by revisions <!ViewSVNRev(7377)> and <!ViewSVNRev(7378)>.
1580
1581 * <!Anchor(bug21)>
1582 Bug in amd64 codegen register allocation of indirect C calls.
1583 +
1584 Thanks to David Hansel for the bug report.
1585 +
1586 Fixed by revision <!ViewSVNRev(7368)>.
1587
1588 * <!Anchor(bug20)>
1589 Bug in `IntInf.scan` and `IntInf.fromString` where leading spaces were only accepted if the stream had an explicit sign character.
1590 +
1591 Thanks to David Hansel for the bug report.
1592 +
1593 Fixed by revisions <!ViewSVNRev(7227)> and <!ViewSVNRev(7230)>.
1594
1595 * <!Anchor(bug19)>
1596 Bug in `IntInf.~>>` that could cause a `glibc` assertion.
1597 +
1598 Fixed by revisions <!ViewSVNRev(7083)>, <!ViewSVNRev(7084)>, and <!ViewSVNRev(7085)>.
1599
1600 * <!Anchor(bug18)>
1601 Bug in the return type of `MLton.Process.reap`.
1602 +
1603 Thanks to Risto Saarelma for the bug report.
1604 +
1605 Fixed by revision <!ViewSVNRev(7029)>.
1606
1607 * <!Anchor(bug17)>
1608 Bug in `MLton.size` and `MLton.share` when tracing the current stack.
1609 +
1610 Fixed by revisions <!ViewSVNRev(6978)>, <!ViewSVNRev(6981)>, <!ViewSVNRev(6988)>, <!ViewSVNRev(6989)>, and <!ViewSVNRev(6990)>.
1611
1612 * <!Anchor(bug16)>
1613 Bug in nested `_export`/`_import` functions.
1614 +
1615 Fixed by revision <!ViewSVNRev(6919)>.
1616
1617 * <!Anchor(bug15)>
1618 Bug in the name mangling of `_import`-ed functions with the `stdcall` convention.
1619 +
1620 Thanks to Lars Bergstrom for the bug report.
1621 +
1622 Fixed by revision <!ViewSVNRev(6672)>.
1623
1624 * <!Anchor(bug14)>
1625 Bug in Windows code to page the heap to disk when unable to grow the heap to a desired size.
1626 +
1627 Thanks to Sami Evangelista for the bug report.
1628 +
1629 Fixed by revisions <!ViewSVNRev(6600)> and <!ViewSVNRev(6624)>.
1630
1631 * <!Anchor(bug13)>
1632 Bug in \*NIX code to page the heap to disk when unable to grow the heap to a desired size.
1633 +
1634 Thanks to Nicolas Bertolotti for the bug report and patch.
1635 +
1636 Fixed by revisions <!ViewSVNRev(6596)> and <!ViewSVNRev(6600)>.
1637
1638 * <!Anchor(bug12)>
1639 Space-safety bug in pass to <:RefFlatten: flatten refs> into containing data structure.
1640 +
1641 Thanks to Daniel Spoonhower for the bug report and initial diagnosis and patch.
1642 +
1643 Fixed by revision <!ViewSVNRev(6395)>.
1644
1645 * <!Anchor(bug11)>
1646 Bug in the frontend that rejected `op longvid` patterns and expressions.
1647 +
1648 Thanks to Florian Weimer for the bug report.
1649 +
1650 Fixed by revision <!ViewSVNRev(6347)>.
1651
1652 * <!Anchor(bug10)>
1653 Bug in the http://www.standardml.org/Basis/imperative-io.html#SIG:IMPERATIVE_IO.canInput:VAL[`IMPERATIVE_IO.canInput`] function of the <:BasisLibrary:Basis Library> implementation.
1654 +
1655 Thanks to Ville Laurikari for the bug report.
1656 +
1657 Fixed by revision <!ViewSVNRev(6261)>.
1658
1659 * <!Anchor(bug09)>
1660 Bug in algebraic simplification of real primitives.  http://www.standardml.org/Basis/real.html#SIG:REAL.\|@LTE\|:VAL[++REAL__<N>__.\<=(x, x)++] is `false` when `x` is NaN.
1661 +
1662 Fixed by revision <!ViewSVNRev(6242)>.
1663
1664 * <!Anchor(bug08)>
Bug in the FFI visible representation of `Int16.int ref` (and references of other primitive types smaller than 32 bits) on big-endian platforms.
1666 +
1667 Thanks to Dave Herman for the bug report.
1668 +
1669 Fixed by revision <!ViewSVNRev(6267)>.
1670
1671 * <!Anchor(bug07)>
1672 Bug in type inference of flexible records.  This would later cause the compiler to raise the `TypeError` exception.
1673 +
1674 Thanks to Wesley Terpstra for the bug report.
1675 +
1676 Fixed by revision <!ViewSVNRev(6229)>.
1677
1678 * <!Anchor(bug06)>
1679 Bug in cross-compilation of `gdtoa` library.
1680 +
1681 Thanks to Wesley Terpstra for the bug report and patch.
1682 +
1683 Fixed by revision <!ViewSVNRev(6620)>.
1684
1685 * <!Anchor(bug05)>
1686 Bug in pass to <:RefFlatten: flatten refs> into containing data structure.
1687 +
1688 Thanks to Ruy Ley-Wild for the bug report.
1689 +
1690 Fixed by revision <!ViewSVNRev(6191)>.
1691
1692 * <!Anchor(bug04)>
1693 Bug in the handling of weak pointers by the mark-compact garbage collector.
1694 +
1695 Thanks to Sean McLaughlin for the bug report and Florian Weimer for the initial diagnosis.
1696 +
1697 Fixed by revision <!ViewSVNRev(6183)>.
1698
1699 * <!Anchor(bug03)>
1700 Bug in the elaboration of structures with signature constraints.  This would later cause the compiler to raise the `TypeError` exception.
1701 +
1702 Thanks to Vesa Karvonen for the bug report.
1703 +
1704 Fixed by revision <!ViewSVNRev(6046)>.
1705
1706 * <!Anchor(bug02)>
1707 Bug in the interaction of `_export`-ed functions and signal handlers.
1708 +
1709 Thanks to Sean McLaughlin for the bug report.
1710 +
1711 Fixed by revision <!ViewSVNRev(6013)>.
1712
1713 * <!Anchor(bug01)>
1714 Bug in the implementation of `_export`-ed functions using the `char` type, leading to a linker error.
1715 +
1716 Thanks to Katsuhiro Ueno for the bug report.
1717 +
1718 Fixed by revision <!ViewSVNRev(5999)>.
1719
1720 <<<
1721
1722 :mlton-guide-page: Bugs20100608
1723 [[Bugs20100608]]
1724 Bugs20100608
1725 ============
1726
1727 Here are the known bugs in <:Release20100608:MLton 20100608>, listed
1728 in reverse chronological order of date reported.
1729
1730 * <!Anchor(bug11)>
1731 Bugs in `REAL.signBit`, `REAL.copySign`, and `REAL.toDecimal`/`REAL.fromDecimal`.
1732 +
1733 Thanks to Phil Clayton for the bug report and examples.
1734 +
1735 Fixed by revisions <!ViewSVNRev(7571)>, <!ViewSVNRev(7572)>, and <!ViewSVNRev(7573)>.
1736
1737 * <!Anchor(bug10)>
1738 Bug in elaboration of type variables with and without equality status.
1739 +
1740 Thanks to Rob Simmons for the bug report and examples.
1741 +
1742 Fixed by revision <!ViewSVNRev(7565)>.
1743
1744 * <!Anchor(bug09)>
1745 Bug in <:Redundant:redundant> <:SSA:> optimization.
1746 +
1747 Thanks to Lars Magnusson for the bug report and example.
1748 +
1749 Fixed by revision <!ViewSVNRev(7561)>.
1750
1751 * <!Anchor(bug08)>
1752 Bug in <:SSA:>/<:SSA2:> <:Shrink:shrinker> that could erroneously turn a non-tail function call with a `Bug` transfer as its continuation into a tail function call.
1753 +
1754 Thanks to Lars Bergstrom for the bug report.
1755 +
1756 Fixed by revision <!ViewSVNRev(7546)>.
1757
1758 * <!Anchor(bug07)>
1759 Bug in translation from <:SSA2:> to <:RSSA:> with `case` expressions over non-primitive-sized words.
1760 +
1761 Fixed by revision <!ViewSVNRev(7544)>.
1762
1763 * <!Anchor(bug06)>
1764 Bug with <:SSA:>/<:SSA2:> type checking of case expressions over words.
1765 +
1766 Fixed by revision <!ViewSVNRev(7542)>.
1767
1768 * <!Anchor(bug05)>
1769 Bug with treatment of `as`-patterns, which should not allow the redefinition of constructor status.
1770 +
1771 Thanks to Michael Norrish for the bug report.
1772 +
1773 Fixed by revision <!ViewSVNRev(7530)>.
1774
1775 * <!Anchor(bug04)>
1776 Bug with treatment of `nan` in <:CommonSubexp:common subexpression elimination> <:SSA:> optimization.
1777 +
1778 Thanks to Alexandre Hamez for the bug report.
1779 +
1780 Fixed by revision <!ViewSVNRev(7503)>.
1781
1782 * <!Anchor(bug03)>
1783 Bug in translation from <:SSA2:> to <:RSSA:> with weak pointers.
1784 +
1785 Thanks to Alexandre Hamez for the bug report.
1786 +
1787 Fixed by revision <!ViewSVNRev(7502)>.
1788
1789 * <!Anchor(bug02)>
1790 Bug in amd64 codegen calling convention for varargs C calls.
1791 +
1792 Thanks to <:HenryCejtin:> for the bug report and <:WesleyTerpstra:> for the initial diagnosis.
1793 +
1794 Fixed by revision <!ViewSVNRev(7501)>.
1795
1796 * <!Anchor(bug01)>
1797 Bug in comment-handling in lexer for <:MLYacc:>'s input language.
1798 +
1799 Thanks to Michael Norrish for the bug report and patch.
1800 +
1801 Fixed by revision <!ViewSVNRev(7500)>.
1802
1803 * <!Anchor(bug00)>
1804 Bug in elaboration of function clauses with different numbers of arguments that would raise an uncaught `Subscript` exception.
1805 +
1806 Fixed by revision <!ViewSVNRev(75497)>.
1807
1808 <<<
1809
1810 :mlton-guide-page: Bugs20130715
1811 [[Bugs20130715]]
1812 Bugs20130715
1813 ============
1814
1815 Here are the known bugs in <:Release20130715:MLton 20130715>, listed
1816 in reverse chronological order of date reported.
1817
1818 * <!Anchor(bug06)>
1819 Bug with simultaneous `sharing` of multiple structures.
1820 +
1821 Fixed by commit <!ViewGitCommit(mlton,9cb5164f6)>.
1822
1823 * <!Anchor(bug05)>
1824 Minor bug with exception replication.
1825 +
1826 Fixed by commit <!ViewGitCommit(mlton,1c89c42f6)>.
1827
1828 * <!Anchor(bug04)>
1829 Minor bug erroneously accepting symbolic identifiers for strid, sigid, and fctid
1830 and erroneously accepting symbolic identifiers before `.` in long identifiers.
1831 +
1832 Fixed by commit <!ViewGitCommit(mlton,9a56be647)>.
1833
1834 * <!Anchor(bug03)>
1835 Minor bug in precedence parsing of function clauses.
1836 +
1837 Fixed by commit <!ViewGitCommit(mlton,1a6d25ec9)>.
1838
1839 * <!Anchor(bug02)>
1840 Performance bug in creation of worker threads to service calls of `_export`-ed
1841 functions.
1842 +
1843 Thanks to Bernard Berthomieu for the bug report.
1844 +
1845 Fixed by commit <!ViewGitCommit(mlton,97c2bdf1d)>.
1846
1847 * <!Anchor(bug01)>
1848 Bug in `MLton.IntInf.fromRep` that could yield values that violate the `IntInf`
1849 representation invariants.
1850 +
1851 Thanks to Rob Simmons for the bug report.
1852 +
1853 Fixed by commit <!ViewGitCommit(mlton,3add91eda)>.
1854
1855 * <!Anchor(bug00)>
1856 Bug in equality status of some arrays, vectors, and slices in Basis Library
1857 implementation.
1858 +
1859 Fixed by commit <!ViewGitCommit(mlton,a7ed9cbf1)>.
1860
1861 <<<
1862
1863 :mlton-guide-page: Bugs20180207
1864 [[Bugs20180207]]
1865 Bugs20180207
1866 ============
1867
1868 Here are the known bugs in <:Release20180207:MLton 20180207>, listed
1869 in reverse chronological order of date reported.
1870
1871 <<<
1872
1873 :mlton-guide-page: CallGraph
1874 [[CallGraph]]
1875 CallGraph
1876 =========
1877
1878 For easier visualization of <:Profiling:profiling> data, `mlprof` can
1879 create a call graph of the program in dot format, from which you can
1880 use the http://www.research.att.com/sw/tools/graphviz/[graphviz]
1881 software package to create a PostScript or PNG graph.  For example,
1882 ----
1883 mlprof -call-graph foo.dot foo mlmon.out
1884 ----
1885 will create `foo.dot` with a complete call graph.  For each source
1886 function, there will be one node in the graph that contains the
1887 function name (and source position with `-show-line true`), as
1888 well as the percentage of ticks.  If you want to create a call graph
1889 for your program without any profiling data, you can simply call
1890 `mlprof` without any `mlmon.out` files, as in
1891 ----
1892 mlprof -call-graph foo.dot foo
1893 ----
1894
Because SML has higher-order functions, the call graph is dependent
1896 on MLton's analysis of which functions call each other.  This analysis
1897 depends on many implementation details and might display spurious
1898 edges that a human could conclude are impossible.  However, in
1899 practice, the call graphs tend to be very accurate.
1900
1901 Because call graphs can get big, `mlprof` provides the `-keep` option
1902 to specify the nodes that you would like to see.  This option also
1903 controls which functions appear in the table that `mlprof` prints.
1904 The argument to `-keep` is an expression describing a set of source
functions (i.e., graph nodes).  The expression _e_ should be of the
1906 following form.
1907
1908 * ++all++
1909 * ++"__s__"++
1910 * ++(and __e ...__)++
1911 * ++(from __e__)++
1912 * ++(not __e__)++
1913 * ++(or __e__)++
1914 * ++(pred __e__)++
1915 * ++(succ __e__)++
1916 * ++(thresh __x__)++
1917 * ++(thresh-gc __x__)++
1918 * ++(thresh-stack __x__)++
1919 * ++(to __e__)++
1920
1921 In the grammar, ++all++ denotes the set of all nodes.  ++"__s__"++ is
1922 a regular expression denoting the set of functions whose name
1923 (followed by a space and the source position) has a prefix matching
1924 the regexp.  The `and`, `not`, and `or` expressions denote
1925 intersection, complement, and union, respectively.  The `pred` and
1926 `succ` expressions add the set of immediate predecessors or successors
1927 to their argument, respectively.  The `from` and `to` expressions
1928 denote the set of nodes that have paths from or to the set of nodes
1929 denoted by their arguments, respectively.  Finally, `thresh`,
1930 `thresh-gc`, and `thresh-stack` denote the set of nodes whose
1931 percentage of ticks, gc ticks, or stack ticks, respectively, is
1932 greater than or equal to the real number _x_.
1933
1934 For example, if you want to see the entire call graph for a program,
1935 you can use `-keep all` (this is the default).  If you want to see
1936 all nodes reachable from function `foo` in your program, you would
1937 use `-keep '(from "foo")'`.  Or, if you want to see all the
1938 functions defined in subdirectory `bar` of your project that used
1939 at least 1% of the ticks, you would use
1940 ----
1941 -keep '(and ".*/bar/" (thresh 1.0))'
1942 ----
1943 To see all functions with ticks above a threshold, you can also use
1944 `-thresh x`, which is an abbreviation for `-keep '(thresh x)'`.  You
cannot use multiple `-keep` arguments or both `-keep` and `-thresh`.
1946 When you use `-keep` to display a subset of the functions, `mlprof`
1947 will add dashed edges to the call graph to indicate a path in the
1948 original call graph from one function to another.
1949
1950 When compiling with `-profile-stack true`, you can use `mlprof -gray
1951 true` to make the nodes darker or lighter depending on whether their
1952 stack percentage is higher or lower.
1953
1954 MLton's optimizer may duplicate source functions for any of a number
1955 of reasons (functor duplication, monomorphisation, polyvariance,
1956 inlining).  By default, all duplicates of a function are treated as
1957 one.  If you would like to treat the duplicates separately, you can
1958 use ++mlprof -split __regexp__++, which will cause all duplicates of
1959 functions whose name has a prefix matching the regular expression to
1960 be treated separately.  This can be especially useful for higher-order
1961 utility functions like `General.o`.
1962
1963 == Caveats ==
1964
1965 Technically speaking, `mlprof` produces a call-stack graph rather than
1966 a call graph, because it describes the set of possible call stacks.
The difference is in how tail calls are displayed.  For example, if `f`
1968 nontail calls `g` and `g` tail calls `h`, then the call-stack graph
1969 has edges from `f` to `g` and `f` to `h`, while the call graph has
1970 edges from `f` to `g` and `g` to `h`.  That is, a tail call from `g`
1971 to `h` removes `g` from the call stack and replaces it with `h`.
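
To make the distinction concrete, here is a minimal sketch with the
same calling shape as the `f`, `g`, `h` example; the function bodies
are arbitrary and chosen only for illustration.

[source,sml]
----
(* h does the final work. *)
fun h (x : int) : int = x + 1

(* g tail calls h: nothing remains to be done in g after h returns,
 * so h replaces g on the call stack. *)
fun g (x : int) : int = h (x * 2)

(* f nontail calls g: f must still add 1 to the result,
 * so f stays on the call stack while g (and then h) runs. *)
fun f (x : int) : int = 1 + g x

(* f 3 = 1 + g 3 = 1 + h 6 = 1 + 7, so this prints 8. *)
val () = print (Int.toString (f 3) ^ "\n")
----

While `h` runs, the call stack holds `f` and `h` but not `g`, which is
why the call-stack graph records an edge from `f` to `h` rather than
from `g` to `h`.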
1972
1973 <<<
1974
1975 :mlton-guide-page: CallingFromCToSML
1976 [[CallingFromCToSML]]
1977 CallingFromCToSML
1978 =================
1979
1980 MLton's <:ForeignFunctionInterface:> allows programs to _export_ SML
functions to be called from C.  Suppose you would like to export from SML
1982 a function of type `real * char -> int` as the C function `foo`.
1983 MLton extends the syntax of SML to allow expressions like the
1984 following:
1985 ----
1986 _export "foo": (real * char -> int) -> unit;
1987 ----
1988 The above expression exports a C function named `foo`, with
1989 prototype
1990 [source,c]
1991 ----
1992 Int32 foo (Real64 x0, Char x1);
1993 ----
1994 The `_export` expression denotes a function of type
1995 `(real * char -> int) -> unit` that when called with a function
1996 `f`, arranges for the exported `foo` function to call `f`
1997 when `foo` is called.  So, for example, the following exports and
1998 defines `foo`.
1999 [source,sml]
2000 ----
2001 val e = _export "foo": (real * char -> int) -> unit;
2002 val _ = e (fn (x, c) => 13 + Real.floor x + Char.ord c)
2003 ----
2004
2005 The general form of an `_export` expression is
2006 ----
2007 _export "C function name" attr... : cFuncTy -> unit;
2008 ----
2009 The type and the semicolon are not optional.  As with `_import`, a
2010 sequence of attributes may follow the function name.
2011
2012 MLton's `-export-header` option generates a C header file with
2013 prototypes for all of the functions exported from SML.  Include this
2014 header file in your C files to type check calls to functions exported
2015 from SML.  This header file includes ++typedef++s for the
2016 <:ForeignFunctionInterfaceTypes: types that can be passed between SML and C>.
2017
2018
2019 == Example ==
2020
2021 Suppose that `export.sml` is
2022
2023 [source,sml]
2024 ----
2025 sys::[./bin/InclGitFile.py mlton master doc/examples/ffi/export.sml]
2026 ----
2027
Note that the `reentrant` attribute is used for `_import`-ing the
2029 C functions that will call the `_export`-ed SML functions.
2030
2031 Create the header file with `-export-header`.
2032 ----
2033 % mlton -default-ann 'allowFFI true'    \
2034         -export-header export.h         \
2035         -stop tc                        \
2036         export.sml
2037 ----
2038
2039 `export.h` now contains the following C prototypes.
2040 ----
2041 Int8 f (Int32 x0, Real64 x1, Int8 x2);
2042 Pointer f2 (Word8 x0);
2043 void f3 ();
2044 void f4 (Int32 x0);
2045 extern Int32 zzz;
2046 ----
2047
2048 Use `export.h` in a C program, `ffi-export.c`, as follows.
2049
2050 [source,c]
2051 ----
2052 sys::[./bin/InclGitFile.py mlton master doc/examples/ffi/ffi-export.c]
2053 ----
2054
2055 Compile `ffi-export.c` and `export.sml`.
2056 ----
2057 % gcc -c ffi-export.c
2058 % mlton -default-ann 'allowFFI true' \
2059          export.sml ffi-export.o
2060 ----
2061
2062 Finally, run `export`.
2063 ----
2064 % ./export
2065 g starting
2066 ...
2067 g4 (0)
2068 success
2069 ----
2070
2071
2072 == Download ==
2073 * <!RawGitFile(mlton,master,doc/examples/ffi/export.sml)>
2074 * <!RawGitFile(mlton,master,doc/examples/ffi/ffi-export.c)>
2075
2076 <<<
2077
2078 :mlton-guide-page: CallingFromSMLToC
2079 [[CallingFromSMLToC]]
2080 CallingFromSMLToC
2081 =================
2082
2083 MLton's <:ForeignFunctionInterface:> allows an SML program to _import_
2084 C functions.  Suppose you would like to import from C a function with
2085 the following prototype:
2086 [source,c]
2087 ----
2088 int foo (double d, char c);
2089 ----
2090 MLton extends the syntax of SML to allow expressions like the following:
2091 ----
2092 _import "foo": real * char -> int;
2093 ----
2094 This expression denotes a function of type `real * char -> int` whose
2095 behavior is implemented by calling the C function whose name is `foo`.
2096 Thinking in terms of C, imagine that there are C variables `d` of type
2097 `double`, `c` of type `unsigned char`, and `i` of type `int`.  Then,
2098 the C statement `i = foo (d, c)` is executed and `i` is returned.
2099
2100 The general form of an `_import` expression is:
2101 ----
2102 _import "C function name" attr... : cFuncTy;
2103 ----
2104 The type and the semicolon are not optional.
2105
2106 The function name is followed by a (possibly empty) sequence of
2107 attributes, analogous to C `__attribute__` specifiers.
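
As a minimal sketch, assuming a C function `foo` with the prototype
above is linked into the program, the imported function can be bound
and called like any other SML function.  The `public` attribute is
included only to show where an attribute is placed.

[source,sml]
----
(* Assumes `int foo (double d, char c);` is defined in a linked C file. *)
val foo = _import "foo" public: real * char -> int;

(* The result of the _import expression is an ordinary SML function. *)
val n = foo (2.5, #"a")
val () = print (Int.toString n ^ "\n")
----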
2108
2109
2110 == Example ==
2111
2112 `import.sml` imports the C function `ffi` and the C variable `FFI_INT`
2113 as follows.
2114
2115 [source,sml]
2116 ----
2117 sys::[./bin/InclGitFile.py mlton master doc/examples/ffi/import.sml]
2118 ----
2119
2120 `ffi-import.c` is
2121
2122 [source,c]
2123 ----
2124 sys::[./bin/InclGitFile.py mlton master doc/examples/ffi/ffi-import.c]
2125 ----
2126
2127 Compile and run the program.
2128 ----
2129 % mlton -default-ann 'allowFFI true' -export-header export.h  import.sml ffi-import.c
2130 % ./import
2131 13
2132 success
2133 ----
2134
2135
2136 == Download ==
2137 * <!RawGitFile(mlton,master,doc/examples/ffi/import.sml)>
2138 * <!RawGitFile(mlton,master,doc/examples/ffi/ffi-import.c)>
2139
2140
2141 == Next Steps ==
2142
2143 * <:CallingFromSMLToCFunctionPointer:>
2144
2145 <<<
2146
2147 :mlton-guide-page: CallingFromSMLToCFunctionPointer
2148 [[CallingFromSMLToCFunctionPointer]]
2149 CallingFromSMLToCFunctionPointer
2150 ================================
2151
2152 Just as MLton can <:CallingFromSMLToC:directly call C functions>, it
2153 is possible to make indirect function calls; that is, function calls
2154 through a function pointer.  MLton extends the syntax of SML to allow
2155 expressions like the following:
2156 ----
2157 _import * : MLton.Pointer.t -> real * char -> int;
2158 ----
2159 This expression denotes a function of type
2160 [source,sml]
2161 ----
2162 MLton.Pointer.t -> real * char -> int
2163 ----
2164 whose behavior is implemented by calling the C function at the address
2165 denoted by the `MLton.Pointer.t` argument, and supplying the C
function two arguments, a `double` and a `char`.  The C function
2167 pointer may be obtained, for example, by the dynamic linking loader
2168 (`dlopen`, `dlsym`, ...).
2169
2170 The general form of an indirect `_import` expression is:
2171 ----
2172 _import * attr... : cPtrTy -> cFuncTy;
2173 ----
2174 The type and the semicolon are not optional.
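
As a minimal sketch, an indirect import can be wrapped so that it is
never called through a null pointer.  The names `callRealFn` and
`applyRealFn` are hypothetical; the pointer is assumed to refer to a C
function of type `double (*)(double)`.

[source,sml]
----
(* Indirect call through a C function pointer taking and returning double. *)
val callRealFn = _import * : MLton.Pointer.t -> real -> real;

(* Hypothetical wrapper: refuse to call through a null pointer. *)
fun applyRealFn (p: MLton.Pointer.t, x: real): real =
   if p = MLton.Pointer.null
      then raise Fail "null C function pointer"
   else callRealFn p x
----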
2175
2176
2177 == Example ==
2178
2179 This example uses `dlopen` and friends (imported using normal
2180 `_import`) to dynamically load the math library (`libm`) and call the
2181 `cos` function. Suppose `iimport.sml` contains the following.
2182
2183 [source,sml]
2184 ----
2185 sys::[./bin/InclGitFile.py mlton master doc/examples/ffi/iimport.sml]
2186 ----
2187
2188 Compile and run `iimport.sml`.
2189 ----
2190 % mlton -default-ann 'allowFFI true'    \
2191         -target-link-opt linux -ldl     \
2192         -target-link-opt solaris -ldl   \
2193          iimport.sml
% ./iimport
2195     Math.cos(2.0) = ~0.416146836547
2196 libm.so::cos(2.0) = ~0.416146836547
2197 ----
2198
This example also shows the `-target-link-opt` option, which passes
the given switch to the linker only when compiling for the specified
platform.  Compile with `-verbose 1` to see in more detail what's
being passed to `gcc`.
2202
2203
2204 == Download ==
2205 * <!RawGitFile(mlton,master,doc/examples/ffi/iimport.sml)>
2206
2207 <<<
2208
2209 :mlton-guide-page: CCodegen
2210 [[CCodegen]]
2211 CCodegen
2212 ========
2213
2214 The <:CCodegen:> is a <:Codegen:code generator> that translates the
2215 <:Machine:> <:IntermediateLanguage:> to C, which is further optimized
2216 and compiled to native object code by `gcc` (or another C compiler).
2217
2218 == Implementation ==
2219
2220 * <!ViewGitFile(mlton,master,mlton/codegen/c-codegen/c-codegen.sig)>
2221 * <!ViewGitFile(mlton,master,mlton/codegen/c-codegen/c-codegen.fun)>
2222
2223 == Details and Notes ==
2224
2225 The <:CCodegen:> is the original <:Codegen:code generator> for MLton.
2226
2227 <<<
2228
2229 :mlton-guide-page: Changelog
2230 [[Changelog]]
2231 Changelog
2232 =========
2233
2234 * <!ViewGitFile(mlton,master,CHANGELOG.adoc)>
2235
2236 ----
2237 sys::[./bin/InclGitFile.py mlton master CHANGELOG.adoc]
2238 ----
2239
2240 <<<
2241
2242 :mlton-guide-page: ChrisClearwater
2243 [[ChrisClearwater]]
2244 ChrisClearwater
2245 ===============
2246
2247 {empty}
2248
2249 <<<
2250
2251 :mlton-guide-page: Chunkify
2252 [[Chunkify]]
2253 Chunkify
2254 ========
2255
2256 <:Chunkify:> is an analysis pass for the <:RSSA:>
2257 <:IntermediateLanguage:>, invoked from <:ToMachine:>.
2258
2259 == Description ==
2260
2261 It partitions all the labels (function and block) in an <:RSSA:>
2262 program into disjoint sets, referred to as chunks.
2263
2264 == Implementation ==
2265
2266 * <!ViewGitFile(mlton,master,mlton/backend/chunkify.sig)>
2267 * <!ViewGitFile(mlton,master,mlton/backend/chunkify.fun)>
2268
2269 == Details and Notes ==
2270
2271 Breaking large <:RSSA:> functions into chunks is necessary for
2272 reasonable compile times with the <:CCodegen:> and the <:LLVMCodegen:>.
2273
2274 <<<
2275
2276 :mlton-guide-page: CKitLibrary
2277 [[CKitLibrary]]
2278 CKitLibrary
2279 ===========
2280
2281 The http://www.smlnj.org/doc/ckit[ckit Library] is a C front end
2282 written in SML that translates C source code (after preprocessing)
2283 into abstract syntax represented as a set of SML datatypes.  The ckit
2284 Library is distributed with SML/NJ.  Due to differences between SML/NJ
and MLton, this library will not work out of the box with MLton.
2286
2287 As of 20180119, MLton includes a port of the ckit Library synchronized
2288 with SML/NJ version 110.82.
2289
2290 == Usage ==
2291
2292 * You can import the ckit Library into an MLB file with:
2293 +
2294 [options="header"]
2295 |=====
2296 |MLB file|Description
2297 |`$(SML_LIB)/ckit-lib/ckit-lib.mlb`|
2298 |=====
2299
2300 * If you are porting a project from SML/NJ's <:CompilationManager:> to
2301 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
2302 following map is included by default:
2303 +
2304 ----
2305 # ckit Library
2306 $ckit-lib.cm                            $(SML_LIB)/ckit-lib
2307 $ckit-lib.cm/ckit-lib.cm                $(SML_LIB)/ckit-lib/ckit-lib.mlb
2308 ----
2309 +
2310 This will automatically convert a `$/ckit-lib.cm` import in an input
2311 `.cm` file into a `$(SML_LIB)/ckit-lib/ckit-lib.mlb` import in the
2312 output `.mlb` file.
2313
2314 == Details ==
2315
2316 The following changes were made to the ckit Library, in addition to
2317 deriving the `.mlb` file from the `.cm` file:
2318
2319 * `ast/pp/pp-ast-adornment-sig.sml` (modified): Rewrote use of `signature` in `local`.
2320 * `ast/pp/pp-ast-ext-sig.sml` (modified): Rewrote use of `signature` in `local`.
2321 * `ast/type-util-sig.sml` (modified): Rewrote use of `signature` in `local`.
2322 * `parser/parse-tree-sig.sml` (modified): Rewrote use of (sequential) `withtype` in signature.
2323 * `parser/parse-tree.sml` (modified): Rewrote use of (sequential) `withtype`.
2324
2325 == Patch ==
2326
2327 * <!ViewGitFile(mlton,master,lib/ckit-lib/ckit.patch)>
2328
2329 <<<
2330
2331 :mlton-guide-page: Closure
2332 [[Closure]]
2333 Closure
2334 =======
2335
2336 A closure is a data structure that is the run-time representation of a
2337 function.
2338
2339
2340 == Typical Implementation ==
2341
2342 In a typical implementation, a closure consists of a _code pointer_
2343 (indicating what the function does) and an _environment_ containing
2344 the values of the free variables of the function.  For example, in the
2345 expression
2346
2347 [source,sml]
2348 ----
2349 let
2350    val x = 5
2351 in
2352    fn y => x + y
2353 end
2354 ----
2355
2356 the closure for `fn y => x + y` contains a pointer to a piece of code
2357 that knows to take its argument and add the value of `x` to it, plus
2358 the environment recording the value of `x` as `5`.
2359
2360 To call a function, the code pointer is extracted and jumped to,
2361 passing in some agreed upon location the environment and the argument.
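
The typical representation can be modeled directly in SML.  This is
only an illustrative sketch: the types `env` and `closure` and the
function `call` are hypothetical names, and a real implementation
works at the level of machine code rather than SML records.

[source,sml]
----
(* Environment: the values of the free variables (here just x). *)
type env = {x: int}

(* Closure: code that takes the environment explicitly, plus the environment. *)
type closure = {code: env * int -> int, env: env}

(* The closure for `fn y => x + y` where x is 5. *)
val addX: closure = {code = fn ({x}, y) => x + y, env = {x = 5}}

(* Calling: extract the code, then pass it the environment and the argument. *)
fun call ({code, env}: closure, arg: int): int = code (env, arg)

val eight = call (addX, 3)   (* 5 + 3 *)
----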
2362
2363
2364 == MLton's Implementation ==
2365
2366 MLton does not implement closures traditionally.  Instead, based on
2367 whole-program higher-order control-flow analysis, MLton represents a
2368 function as an element of a sum type, where the variant indicates
2369 which function it is and carries the free variables as arguments.  See
2370 <:ClosureConvert:> and <!Cite(CejtinEtAl00)> for details.
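
The idea can be sketched by hand for a tiny program with two
functions.  The datatype `func` and the dispatcher `apply` are
hypothetical names for illustration; MLton derives the actual
representation from its control-flow analysis.

[source,sml]
----
(* One variant per function in the program; each carries its free variables. *)
datatype func =
   AddX of int      (* fn y => x + y, with free variable x *)
 | Double           (* fn y => 2 * y, with no free variables *)

(* A call dispatches on the variant instead of jumping through a code pointer. *)
fun apply (f: func, y: int): int =
   case f of
      AddX x => x + y
    | Double => 2 * y

val eight = apply (AddX 5, 3)
val six = apply (Double, 3)
----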
2371
2372 <<<
2373
2374 :mlton-guide-page: ClosureConvert
2375 [[ClosureConvert]]
2376 ClosureConvert
2377 ==============
2378
2379 <:ClosureConvert:> is a translation pass from the <:SXML:>
2380 <:IntermediateLanguage:> to the <:SSA:> <:IntermediateLanguage:>.
2381
2382 == Description ==
2383
2384 It converts an <:SXML:> program into an <:SSA:> program.
2385
2386 <:Defunctionalization:> is the technique used to eliminate
2387 <:Closure:>s (see <!Cite(CejtinEtAl00)>).
2388
2389 Uses <:Globalize:> and <:LambdaFree:> analyses.
2390
2391 == Implementation ==
2392
2393 * <!ViewGitFile(mlton,master,mlton/closure-convert/closure-convert.sig)>
2394 * <!ViewGitFile(mlton,master,mlton/closure-convert/closure-convert.fun)>
2395
2396 == Details and Notes ==
2397
2398 {empty}
2399
2400 <<<
2401
2402 :mlton-guide-page: CMinusMinus
2403 [[CMinusMinus]]
2404 CMinusMinus
2405 ===========
2406
2407 http://cminusminus.org[C--] is a portable assembly language intended
2408 to make it easy for compilers for different high-level languages to
2409 share the same backend.  An experimental version of MLton has been
2410 made to generate C--.
2411
2412 * http://www.mlton.org/pipermail/mlton/2005-March/026850.html
2413
2414 == Also see ==
2415
2416  * <:LLVM:>
2417
2418 <<<
2419
2420 :mlton-guide-page: Codegen
2421 [[Codegen]]
2422 Codegen
2423 =======
2424
2425 <:Codegen:> is a translation pass from the <:Machine:>
2426 <:IntermediateLanguage:> to one or more compilation units that can be
2427 compiled to native object code by an external tool.
2428
2429 == Implementation ==
2430
2431 * <!ViewGitDir(mlton,master,mlton/codegen)>
2432
2433 == Details and Notes ==
2434
2435 The following <:Codegen:codegens> are implemented:
2436
2437 * <:AMD64Codegen:>
2438 * <:CCodegen:>
2439 * <:LLVMCodegen:>
2440 * <:X86Codegen:>
2441
2442 <<<
2443
2444 :mlton-guide-page: CombineConversions
2445 [[CombineConversions]]
2446 CombineConversions
2447 ==================
2448
2449 <:CombineConversions:> is an optimization pass for the <:SSA:>
2450 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
2451
2452 == Description ==
2453
2454 This pass looks for and simplifies nested calls to (signed)
2455 extension/truncation.
2456
2457 == Implementation ==
2458
2459 * <!ViewGitFile(mlton,master,mlton/ssa/combine-conversions.fun)>
2460
2461 == Details and Notes ==
2462
It processes each block in DFS order (visiting definitions before uses):
2464
2465 * If the statement is not a `PrimApp` with `Word_extdToWord`, skip it.
2466 * After processing a conversion, it tags the `Var` for subsequent use.
2467 * When inspecting a conversion, check if the `Var` operand is also the
2468 result of a conversion. If it is, try to combine the two operations.
2469 Repeatedly simplify until hitting either a non-conversion `Var` or a
2470 case where the conversion cannot be simplified.
2471
2472 The optimization rules are very simple:
2473 ----
2474 x1 = ...
2475 x2 = Word_extdToWord (W1, W2, {signed=s1}) x1
2476 x3 = Word_extdToWord (W2, W3, {signed=s2}) x2
2477 ----
2478
* If `W1 = W2`, then there are no conversions before `x1`.
2480 +
2481 This is guaranteed because `W2 = W3` will always trigger optimization.
2482
2483 * Case `W1 <= W3 <= W2`:
2484 +
2485 ----
2486 x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
2487 ----
2488
2489 * Case `W1 <  W2 <  W3  AND  ((NOT s1) OR s2)`:
2490 +
2491 ----
2492 x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
2493 ----
2494
2495 * Case `W1 =  W2 <  W3`:
2496 +
2497 unoptimized, because there are no conversions past `W1` and `x2 = x1`
2498
2499 * Case `W3 <= W2 <= W1  OR  W3 <= W1 <= W2`:
2500 +
2501 ----
x3 = Word_extdToWord (W1, W3, {signed=_}) x1
2503 ----
2504 +
2505 because `W3 <= W1 && W3 <= W2`, just clip `x1`
2506
2507 * Case `W2 < W1 <= W3  OR  W2 < W3 <= W1`:
2508 +
2509 unoptimized, because `W2 < W1 && W2 < W3`, has truncation effect
2510
2511 * Case `W1 < W2 < W3  AND  (s1 AND (NOT s2))`:
2512 +
2513 unoptimized, because each conversion affects the result separately
2514
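The case analysis above can be sketched as a decision function.  This
is an illustration only, not the code in `combine-conversions.fun`:
the names `combine`, `Combine`, and `Keep` are hypothetical, widths
are plain integers (in bits), and `signed = NONE` stands for the
`{signed=_}` case in which the flag is irrelevant.

[source,sml]
----
datatype outcome =
   Combine of {src: int, dst: int, signed: bool option}  (* one conversion W1 -> W3 *)
 | Keep                                                  (* leave both conversions *)

fun combine {w1: int, w2: int, w3: int, s1: bool, s2: bool}: outcome =
   if w1 <= w3 andalso w3 <= w2
      then Combine {src = w1, dst = w3, signed = SOME s1}
   else if w1 < w2 andalso w2 < w3
      then if not s1 orelse s2
              then Combine {src = w1, dst = w3, signed = SOME s1}
           else Keep    (* sign-extend then zero-extend: both matter *)
   else if w1 = w2 andalso w2 < w3
      then Keep         (* x2 = x1, nothing to combine *)
   else if (w3 <= w2 andalso w2 <= w1) orelse (w3 <= w1 andalso w1 <= w2)
      then Combine {src = w1, dst = w3, signed = NONE}   (* just clip x1 *)
   else Keep            (* w2 is the smallest width: truncation effect *)

(* Zero-extend 8 -> 16, then sign-extend 16 -> 32: combined. *)
val r1 = combine {w1 = 8, w2 = 16, w3 = 32, s1 = false, s2 = true}
(* Sign-extend 8 -> 16, then zero-extend 16 -> 32: kept. *)
val r2 = combine {w1 = 8, w2 = 16, w3 = 32, s1 = true, s2 = false}
----
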
2515 <<<
2516
2517 :mlton-guide-page: CommonArg
2518 [[CommonArg]]
2519 CommonArg
2520 =========
2521
2522 <:CommonArg:> is an optimization pass for the <:SSA:>
2523 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
2524
2525 == Description ==
2526
2527 It optimizes instances of `Goto` transfers that pass the same
2528 arguments to the same label; e.g.
2529 ----
2530 L_1 ()
2531   ...
2532   z1 = ?
2533   ...
2534   L_3 (x, y, z1)
2535 L_2 ()
2536   ...
2537   z2 = ?
2538   ...
2539   L_3 (x, y, z2)
2540 L_3 (a, b, c)
2541   ...
2542 ----
2543
2544 This code can be simplified to:
2545 ----
2546 L_1 ()
2547   ...
2548   z1 = ?
2549   ...
2550   L_3 (z1)
2551 L_2 ()
2552   ...
2553   z2 = ?
2554   ...
2555   L_3 (z2)
2556 L_3 (c)
2557   a = x
2558   b = y
2559 ----
2560 which saves a number of resources: time of setting up the arguments
2561 for the jump to `L_3`, space (either stack or pseudo-registers) for
2562 the arguments of `L_3`, etc.  It may also expose some other
2563 optimizations, if more information is known about `x` or `y`.
2564
2565 == Implementation ==
2566
2567 * <!ViewGitFile(mlton,master,mlton/ssa/common-arg.fun)>
2568
2569 == Details and Notes ==
2570
2571 Three analyses were originally proposed to drive the optimization
2572 transformation.  Only the _Dominator Analysis_ is currently
2573 implemented.  (Implementations of the other analyses are available in
2574 the <:Sources:repository history>.)
2575
2576 === Syntactic Analysis ===
2577
2578 The simplest analysis I could think of maintains
2579 ----
2580 varInfo: Var.t -> Var.t option list ref
2581 ----
2582 initialized to `[]`.
2583
2584 * For each variable `v` bound in a `Statement.t` or in the
2585 `Function.t` args, then `List.push(varInfo v, NONE)`.
2586 * For each `L (x1, ..., xn)` transfer where `(a1, ..., an)` are the
2587 formals of `L`, then `List.push(varInfo ai, SOME xi)`.
* For each block argument `a` used in an unknown context (e.g.,
2589 arguments of blocks used as continuations, handlers, arith success,
2590 runtime return, or case switch labels), then
2591 `List.push(varInfo a, NONE)`.
2592
2593 Now, any block argument `a` such that `varInfo a = xs`, where all of
2594 the elements of `xs` are equal to `SOME x`, can be optimized by
2595 setting `a = x` at the beginning of the block and dropping the
2596 argument from `Goto` transfers.
2597
2598 That takes care of the example above.  We can clearly do slightly
2599 better, by changing the transformation criteria to the following: any
block argument `a` such that `varInfo a = xs`, where all of the elements
2601 of `xs` are equal to `SOME x` _or_ are equal to `SOME a`, can be
2602 optimized by setting `a = x` at the beginning of the block and
2603 dropping the argument from `Goto` transfers.  This optimizes a case
2604 like:
2605 ----
2606 L_1 ()
2607   ... z1 = ? ...
2608   L_3 (x, y, z1)
2609 L_2 ()
2610   ... z2 = ? ...
2611   L_3(x, y, z2)
2612 L_3 (a, b, c)
2613   ... w = ? ...
2614   case w of
2615     true => L_4 | false => L_5
2616 L_4 ()
2617    ...
2618    L_3 (a, b, w)
2619 L_5 ()
2620    ...
2621 ----
2622 where a common argument is passed to a loop (and is invariant through
2623 the loop).  Of course, the <:LoopInvariant:> optimization pass would
2624 normally introduce a local loop and essentially reduce this to the
2625 first example, but I have seen this in practice, which suggests that
2626 some optimizations after <:LoopInvariant:> do enough simplifications
2627 to introduce (new) loop invariant arguments.
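
The refined criterion can be sketched as a small function over the
list recorded for a block argument.  This is a hedged illustration,
not the implementation: variables are modeled as strings, and the name
`commonActual` is hypothetical.

[source,sml]
----
type var = string

(* Given block argument `a` and the contents of `varInfo a`, return SOME x
 * when every entry is SOME x or SOME a, so that `a` can be replaced by x. *)
fun commonActual (a: var, xs: var option list): var option =
   let
      fun loop (xs, candidate) =
         case xs of
            [] => candidate
          | NONE :: _ => NONE                 (* unknown context *)
          | SOME y :: rest =>
               if y = a
                  then loop (rest, candidate) (* passed to itself around a loop *)
               else (case candidate of
                        NONE => loop (rest, SOME y)
                      | SOME x => if x = y then loop (rest, candidate) else NONE)
   in
      loop (xs, NONE)
   end

(* In the loop example: `a` receives x, x, and a; `c` receives z1, z2, and w. *)
val forA = commonActual ("a", [SOME "x", SOME "x", SOME "a"])   (* SOME "x" *)
val forC = commonActual ("c", [SOME "z1", SOME "z2", SOME "w"]) (* NONE *)
----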
2628
2629 === Fixpoint Analysis ===
2630
2631 However, the above analysis and transformation doesn't cover the cases
2632 where eliminating one common argument exposes the opportunity to
2633 eliminate other common arguments.  For example:
2634 ----
2635 L_1 ()
2636   ...
2637   L_3 (x)
2638 L_2 ()
2639   ...
2640   L_3 (x)
2641 L_3 (a)
2642   ...
2643   L_5 (a)
2644 L_4 ()
2645   ...
2646   L_5 (x)
2647 L_5 (b)
2648   ...
2649 ----
2650
2651 One pass of analysis and transformation would eliminate the argument
2652 to `L_3` and rewrite the `L_5(a)` transfer to `L_5 (x)`, thereby
2653 exposing the opportunity to eliminate the common argument to `L_5`.
2654
The interdependency between the arguments to `L_3` and `L_5` suggests
2656 performing some sort of fixed-point analysis.  This analysis is
2657 relatively simple; maintain
2658 ----
2659 varInfo: Var.t -> VarLattice.t
2660 ----
2661 {empty}where
2662 ----
2663 VarLattice.t ~=~ Bot | Point of Var.t | Top
2664 ----
2665 (but is implemented by the <:FlatLattice:> functor with a `lessThan`
2666 list and `value ref` under the hood), initialized to `Bot`.
2667
2668 * For each variable `v` bound in a `Statement.t` or in the
`Function.t` args, then `VarLattice.<= (Point v, varInfo v)`.
* For each `L (x1, ..., xn)` transfer where `(a1, ..., an)` are the
formals of `L`, then `VarLattice.<= (varInfo xi, varInfo ai)`.
* For each block argument `a` used in an unknown context, then
2673 `VarLattice.<= (Point a, varInfo a)`.
2674
Now, any block argument `a` such that `varInfo a = Point x` can be
2676 optimized by setting `a = x` at the beginning of the block and
2677 dropping the argument from `Goto` transfers.
2678
2679 Now, with the last example, we introduce the ordering constraints:
2680 ----
2681 varInfo x <= varInfo a
2682 varInfo a <= varInfo b
2683 varInfo x <= varInfo b
2684 ----
2685
2686 Assuming that `varInfo x = Point x`, then we get `varInfo a = Point x`
2687 and `varInfo b = Point x`, and we optimize the example as desired.
2688
2689 But, that is a rather weak assumption.  It's quite possible for
2690 `varInfo x = Top`.  For example, consider:
2691 ----
2692 G_1 ()
2693   ... n = 1 ...
2694   L_0 (n)
2695 G_2 ()
2696   ... m = 2 ...
2697   L_0 (m)
2698 L_0 (x)
2699   ...
2700 L_1 ()
2701   ...
2702   L_3 (x)
2703 L_2 ()
2704   ...
2705   L_3 (x)
2706 L_3 (a)
2707   ...
2708   L_5(a)
2709 L_4 ()
2710   ...
2711   L_5(x)
2712 L_5 (b)
2713    ...
2714 ----
2715
2716 Now `varInfo x = varInfo a = varInfo b = Top`.  What went wrong here?
2717 When `varInfo x` went to `Top`, it got propagated all the way through
2718 to `a` and `b`, and prevented the elimination of any common arguments.
2719 What we'd like to do instead is when `varInfo x` goes to `Top`,
2720 propagate on `Point x` -- we have no hope of eliminating `x`, but if
2721 we hold `x` constant, then we have a chance of eliminating arguments
2722 for which `x` is passed as an actual.
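
The behavior of the flat lattice in both examples can be sketched with
an explicit join.  This is a simplified stand-in for the
<:FlatLattice:> functor, with variables modeled as strings; `join` is
a hypothetical helper, not part of MLton's interface.

[source,sml]
----
type var = string
datatype lattice = Bot | Point of var | Top

(* Least upper bound: two different points collapse to Top. *)
fun join (Bot, v) = v
  | join (v, Bot) = v
  | join (Point x, Point y) = if x = y then Point x else Top
  | join _ = Top

(* First example: varInfo x = Point x flows into a and then b. *)
val infoA = join (Point "x", Point "x")   (* Point "x" *)
val infoB = join (infoA, Point "x")       (* Point "x" *)

(* Second example: x receives both n and m, so it goes to Top ... *)
val infoX' = join (Point "n", Point "m")  (* Top *)
(* ... and Top then propagates to a and b. *)
val infoA' = join (Bot, infoX')           (* Top *)
val infoB' = join (infoA', infoX')        (* Top *)
----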
2723
2724 === Dominator Analysis ===
2725
2726 Does anyone see where this is going yet?  Pausing for a little
2727 thought, <:MatthewFluet:> realized that he had once before tried
2728 proposing this kind of "fix" to a fixed-point analysis -- when we were
2729 first investigating the <:Contify:> optimization in light of John
2730 Reppy's CWS paper.  Of course, that "fix" failed because it defined a
2731 non-monotonic function and one couldn't take the fixed point.  But,
2732 <:StephenWeeks:> suggested a dominator based approach, and we were
2733 able to show that, indeed, the dominator analysis subsumed both the
2734 previous call based analysis and the cont based analysis.  And, a
2735 moment's reflection reveals further parallels: when
2736 `varInfo: Var.t -> Var.t option list ref`, we have something analogous
2737 to the call analysis, and when `varInfo: Var.t -> VarLattice.t`, we
2738 have something analogous to the cont analysis.  Maybe there is
2739 something analogous to the dominator approach (and therefore superior
2740 to the previous analyses).
2741
2742 And this turns out to be the case.  Construct the graph `G` as follows:
2743 ----
2744 nodes(G) = {Root} U Var.t
2745 edges(G) = {Root -> v | v bound in a Statement.t or
2746                                 in the Function.t args} U
2747            {xi -> ai | L(x1, ..., xn) transfer where (a1, ..., an)
2748                                       are the formals of L} U
2749            {Root -> a | a is a block argument used in an unknown context}
2750 ----
2751
2752 Let `idom(x)` be the immediate dominator of `x` in `G` with root
`Root`.  Now, any block argument `a` such that `idom(a) = x <> Root` can
2754 be optimized by setting `a = x` at the beginning of the block and
2755 dropping the argument from `Goto` transfers.
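
As a hedged sketch, the graph for the last Fixpoint Analysis example
can be built by hand and its immediate dominators computed with the
textbook iterative dominator-set algorithm.  This is not MLton's
implementation; variables are modeled as strings and all function
names are hypothetical.

[source,sml]
----
type node = string

val nodes: node list = ["Root", "n", "m", "x", "a", "b"]
val edges: (node * node) list =
   [("Root", "n"), ("Root", "m"),   (* n and m are bound in statements *)
    ("n", "x"), ("m", "x"),         (* L_0 (n), L_0 (m) *)
    ("x", "a"),                     (* L_3 (x) *)
    ("a", "b"), ("x", "b")]         (* L_5 (a), L_5 (x) *)

fun mem (v: node, vs) = List.exists (fn u => u = v) vs
fun preds v = List.map #1 (List.filter (fn (_, t) => t = v) edges)
fun lookup (doms, v: node) =
   case List.find (fn (u, _) => u = v) doms of
      SOME (_, ds) => ds
    | NONE => []

(* One round of dom(v) = {v} U (intersection of dom(p) over predecessors p). *)
fun step doms =
   List.map
   (fn v =>
    if v = "Root" then (v, ["Root"])
    else
       let
          fun inAllPreds d =
             List.all (fn p => mem (d, lookup (doms, p))) (preds v)
       in
          (v, List.filter (fn d => d = v orelse inAllPreds d) nodes)
       end)
   nodes

(* Iterate to a fixed point, starting from "everything dominates everything". *)
fun fix doms =
   let val doms' = step doms
   in if doms' = doms then doms else fix doms'
   end

val doms =
   fix (List.map (fn v => (v, if v = "Root" then ["Root"] else nodes)) nodes)

(* Strict dominators form a chain; the immediate one has the most dominators. *)
fun idom v =
   let
      val strict = List.filter (fn d => d <> v) (lookup (doms, v))
      fun size d = length (lookup (doms, d))
   in
      List.foldl (fn (d, best) => if size d > size best then d else best)
                 "Root" strict
   end

val idomX = idom "x"   (* "Root": x cannot be eliminated *)
val idomA = idom "a"   (* "x": set a = x *)
val idomB = idom "b"   (* "x": set b = x *)
----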
2756
2757 Furthermore, experimental evidence suggests (and we are confident that
2758 a formal presentation could prove) that the dominator analysis
2759 subsumes the "syntactic" and "fixpoint" based analyses in this context
2760 as well and that the dominator analysis gets "everything" in one go.
2761
2762 === Final Thoughts ===
2763
2764 I must admit, I was rather surprised at this progression and final
2765 result.  At the outset, I never would have thought of a connection
2766 between <:Contify:> and <:CommonArg:> optimizations.  They would seem
2767 to be two completely different optimizations.  Although, this may not
2768 really be the case.  As one of the reviewers of the ICFP paper said:
2769 ____
2770 I understand that such a form of CPS might be convenient in some
2771 cases, but when we're talking about analyzing code to detect that some
2772 continuation is constant, I think it makes a lot more sense to make
2773 all the continuation arguments completely explicit.
2774
2775 I believe that making all the continuation arguments explicit will
2776 show that the optimization can be generalized to eliminating constant
2777 arguments, whether continuations or not.
2778 ____
2779
2780 What I think the common argument optimization shows is that the
2781 dominator analysis does slightly better than the reviewer puts it: we
2782 find more than just constant continuations, we find common
2783 continuations.  And I think this is further justified by the fact that
2784 I have observed common argument eliminate some `env_X` arguments which
2785 would appear to correspond to determining that while the closure being
2786 executed isn't constant it is at least the same as the closure being
2787 passed elsewhere.
2788
2789 At first, I was curious whether or not we had missed a bigger picture
2790 with the dominator analysis.  When we wrote the contification paper, I
2791 assumed that the dominator analysis was a specialized solution to a
2792 specialized problem; we never suggested that it was a technique suited
2793 to a larger class of analyses.  After initially finding a connection
2794 between <:Contify:> and <:CommonArg:> (and thinking that the only
2795 connection was the technique), I wondered if the dominator technique
2796 really was applicable to a larger class of analyses.  That is still a
2797 question, but after writing up the above, I'm suspecting that the
2798 "real story" is that the dominator analysis is a solution to the
2799 common argument optimization, and that the <:Contify:> optimization is
2800 specializing <:CommonArg:> to the case of continuation arguments (with
2801 a different transformation at the end).  (Note, a whole-program,
2802 inter-procedural common argument analysis doesn't really make sense
2803 (in our <:SSA:> <:IntermediateLanguage:>), because the only way of
2804 passing values between functions is as arguments.  (Unless of course
2805 in the case that the common argument is also a constant argument, in
2806 which case <:ConstantPropagation:> could lift it to a global.)  The
2807 inter-procedural <:Contify:> optimization works out because there we
2808 move the function to the argument.)
2809
2810 Anyways, it's still unclear to me whether or not the dominator based
2811 approach solves other kinds of problems.
2812
2813 === Phase Ordering ===
2814
2815 On the downside, the optimization doesn't have a huge impact on
runtime, although it does predictably save some code size.  I stuck
2817 it in the optimization sequence after <:Flatten:> and (the third round
2818 of) <:LocalFlatten:>, since it seems to me that we could have cases
2819 where some components of a tuple used as an argument are common, but
2820 the whole tuple isn't.  I think it makes sense to add it after
2821 <:IntroduceLoops:> and <:LoopInvariant:> (even though <:CommonArg:>
gets some things that <:LoopInvariant:> gets, it doesn't get all of
2823 them).  I also think that it makes sense to add it before
2824 <:CommonSubexp:>, since identifying variables could expose more common
2825 subexpressions.  I would think a similar thought applies to
2826 <:RedundantTests:>.
2827
2828 <<<
2829
2830 :mlton-guide-page: CommonBlock
2831 [[CommonBlock]]
2832 CommonBlock
2833 ===========
2834
2835 <:CommonBlock:> is an optimization pass for the <:SSA:>
2836 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
2837
2838 == Description ==
2839
It eliminates equivalent blocks in an <:SSA:> function.  The
equivalence criterion requires blocks to have no arguments or
2842 statements and transfer via `Raise`, `Return`, or `Goto` of a single
2843 global variable.
2844
2845 == Implementation ==
2846
2847 * <!ViewGitFile(mlton,master,mlton/ssa/common-block.fun)>
2848
2849 == Details and Notes ==
2850
2851 * Rewrites
2852 +
2853 ----
2854 L_X ()
2855   raise (global_Y)
2856 ----
2857 +
2858 to
2859 +
2860 ----
2861 L_X ()
2862   L_Y' ()
2863 ----
2864 +
2865 and adds
2866 +
2867 ----
2868 L_Y' ()
2869   raise (global_Y)
2870 ----
2871 +
2872 to the <:SSA:> function.
2873
2874 * Rewrites
2875 +
2876 ----
2877 L_X ()
2878   return (global_Y)
2879 ----
2880 +
2881 to
2882 +
2883 ----
2884 L_X ()
2885   L_Y' ()
2886 ----
2887 +
2888 and adds
2889 +
2890 ----
2891 L_Y' ()
2892   return (global_Y)
2893 ----
2894 +
2895 to the <:SSA:> function.
2896
2897 * Rewrites
2898 +
2899 ----
2900 L_X ()
2901   L_Z (global_Y)
2902 ----
2903 +
2904 to
2905 +
2906 ----
2907 L_X ()
2908   L_Y' ()
2909 ----
2910 +
2911 and adds
2912 +
2913 ----
2914 L_Y' ()
2915   L_Z (global_Y)
2916 ----
2917 +
2918 to the <:SSA:> function.
2919
2920 The <:Shrink:> pass rewrites all uses of `L_X` to `L_Y'` and drops `L_X`.
2921
For example, all uncaught `Overflow` exceptions in an <:SSA:> function
2923 share the same raising block.
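
The sharing step can be sketched as a lookup keyed on the transfer
itself.  This is a simplified model, not the code in
`common-block.fun`: the datatype `transfer`, the function `share`, and
the generated label names are all hypothetical.

[source,sml]
----
datatype transfer =
   Raise of string            (* raise (global_Y) *)
 | Return of string           (* return (global_Y) *)
 | Goto of string * string    (* L_Z (global_Y) *)

(* Return the shared label for `t`, allocating a fresh one on first use. *)
fun share (t: transfer, shared: (transfer * string) list) =
   case List.find (fn (t', _) => t' = t) shared of
      SOME (_, label) => (label, shared)
    | NONE =>
         let
            val label = "L_shared_" ^ Int.toString (length shared)
         in
            (label, (t, label) :: shared)
         end

(* Two blocks raising the same global end up with the same label. *)
val (l1, s1) = share (Raise "global_Overflow", [])
val (l2, s2) = share (Raise "global_Overflow", s1)
val (l3, _) = share (Return "global_unit", s2)
val sameLabel = (l1 = l2)   (* true *)
----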
2924
2925 <<<
2926
2927 :mlton-guide-page: CommonSubexp
2928 [[CommonSubexp]]
2929 CommonSubexp
2930 ============
2931
2932 <:CommonSubexp:> is an optimization pass for the <:SSA:>
2933 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
2934
2935 == Description ==
2936
2937 It eliminates instances of common subexpressions.
2938
2939 == Implementation ==
2940
2941 * <!ViewGitFile(mlton,master,mlton/ssa/common-subexp.fun)>
2942
2943 == Details and Notes ==
2944
2945 In addition to getting the usual sorts of things like
2946
2947 * {empty}
2948 +
2949 ----
2950 (w + 0wx1) + (w + 0wx1)
2951 ----
2952 +
2953 rewritten to
2954 +
2955 ----
2956 let val w' = w + 0wx1 in w' + w' end
2957 ----
2958
2959 it also gets things like
2960
2961 * {empty}
2962 +
2963 ----
2964 val a = Array_uninit n
2965 val b = Array_length a
2966 ----
2967 +
2968 rewritten to
2969 +
2970 ----
2971 val a = Array_uninit n
2972 val b = n
2973 ----
2974
2975 `Arith` transfers are handled specially.  The _result_ of an `Arith`
2976 transfer can be used in _common_ `Arith` transfers that it dominates:
2977
2978 * {empty}
2979 +
2980 ----
2981 val l = (n + m) + (n + m)
2982
2983 val k = (l + n) + ((l + m) handle Overflow => ((l + m)
2984                                                handle Overflow => l + n))
2985 ----
2986 +
2987 is rewritten so that `(n + m)` is computed exactly once, as are
2988 `(l + n)` and `(l + m)`.
2989
2990 <<<
2991
2992 :mlton-guide-page: CompilationManager
2993 [[CompilationManager]]
2994 CompilationManager
2995 ==================
2996
2997 The http://www.smlnj.org/doc/CM/index.html[Compilation Manager] (CM) is SML/NJ's mechanism for supporting programming-in-the-very-large.
2998
2999 == Porting SML/NJ CM files to MLton ==
3000
3001 To help in porting CM files to MLton, the MLton source distribution
3002 includes the sources for a utility, `cm2mlb`, that will print an
3003 <:MLBasis: ML Basis> file with essentially the same semantics as the
3004 CM file -- handling the full syntax of CM supported by your installed
3005 SML/NJ version and correctly handling export filters.  When `cm2mlb`
3006 encounters a `.cm` import, it attempts to convert it to a
3007 corresponding `.mlb` import.  CM anchored paths are translated to
3008 paths according to a default configuration file
3009 (<!ViewGitFile(mlton,master,util/cm2mlb/cm2mlb-map)>). For example,
3010 the default configuration includes
3011 ----
3012 # Standard ML Basis Library
3013 $SMLNJ-BASIS                            $(SML_LIB)/basis
3014 $basis.cm                               $(SML_LIB)/basis
3015 $basis.cm/basis.cm                      $(SML_LIB)/basis/basis.mlb
3016 ----
3017 to ensure that a `$/basis.cm` import is translated to a
3018 `$(SML_LIB)/basis/basis.mlb` import.  See `util/cm2mlb` for details.
3019 Building `cm2mlb` requires that you have already installed a recent
3020 version of SML/NJ.
3021
3022 <<<
3023
3024 :mlton-guide-page: CompilerOverview
3025 [[CompilerOverview]]
3026 CompilerOverview
3027 ================
3028
3029 The following table shows the overall structure of the compiler.
3030 <:IntermediateLanguage:>s are shown in the center column.  The names
3031 of compiler passes are listed in the left and right columns.
3032
[align="center",width="50%",cols="^,^,^"]
3034 |====
3035 3+^| *Compiler Overview*
3036 | _Translation Passes_ | _<:IntermediateLanguage:>_ | _Optimization Passes_
3037 |                      | Source                     |
3038 | <:FrontEnd:>         |                            |
3039 |                      | <:AST:>                    |
3040 | <:Elaborate:>        |                            |
3041 |                      | <:CoreML:>                 | <:CoreMLSimplify:>
3042 | <:Defunctorize:>     |                            |
3043 |                      | <:XML:>                    | <:XMLSimplify:>
3044 | <:Monomorphise:>     |                            |
3045 |                      | <:SXML:>                   | <:SXMLSimplify:>
3046 | <:ClosureConvert:>   |                            |
3047 |                      | <:SSA:>                    | <:SSASimplify:>
3048 | <:ToSSA2:>           |                            |
3049 |                      | <:SSA2:>                   | <:SSA2Simplify:>
3050 | <:ToRSSA:>           |                            |
3051 |                      | <:RSSA:>                   | <:RSSASimplify:>
3052 | <:ToMachine:>        |                            |
3053 |                      | <:Machine:>                |
3054 | <:Codegen:>          |                            |
3055 |====
3056
3057 The `Compile` functor (<!ViewGitFile(mlton,master,mlton/main/compile.sig)>,
3058 <!ViewGitFile(mlton,master,mlton/main/compile.fun)>) controls the
3059 high-level view of the compiler passes, from <:FrontEnd:> to code
3060 generation.
3061
3062 <<<
3063
3064 :mlton-guide-page: CompilerPassTemplate
3065 [[CompilerPassTemplate]]
3066 CompilerPassTemplate
3067 ====================
3068
3069 An analysis pass for the <:ZZZ:> <:IntermediateLanguage:>, invoked from <:ZZZOtherPass:>.
3070 An implementation pass for the <:ZZZ:> <:IntermediateLanguage:>, invoked from <:ZZZSimplify:>.
3071 An optimization pass for the <:ZZZ:> <:IntermediateLanguage:>, invoked from <:ZZZSimplify:>.
3072 A rewrite pass for the <:ZZZ:> <:IntermediateLanguage:>, invoked from <:ZZZOtherPass:>.
3073 A translation pass from the <:ZZA:> <:IntermediateLanguage:> to the <:ZZB:> <:IntermediateLanguage:>.
3074
3075 == Description ==
3076
3077 A short description of the pass.
3078
3079 == Implementation ==
3080
3081 * <!ViewGitFile(mlton,master,mlton/ZZZ.fun)>
3082
3083 == Details and Notes ==
3084
3085 Relevant details and notes.
3086
3087 <<<
3088
3089 :mlton-guide-page: CompileTimeOptions
3090 [[CompileTimeOptions]]
3091 CompileTimeOptions
3092 ==================
3093
3094 MLton's compile-time options control the name of the output file, the
3095 verbosity of compile-time messages, and whether or not certain
3096 optimizations are performed.  They also can specify which intermediate
3097 files are saved and can stop the compilation process early, at some
3098 intermediate pass, in which case compilation can be resumed by passing
3099 the generated files to MLton.  MLton uses the input file suffix to
3100 determine the type of input program.  The possibilities are `.c`,
3101 `.mlb`, `.o`, `.s`, and `.sml`.
3102
3103 With no arguments, MLton prints the version number and exits.  For a
3104 usage message, run MLton with an invalid switch, e.g.  `mlton -z`.  In
3105 the explanation below and in the usage message, for flags that take a
3106 number of choices (e.g. `{true|false}`), the first value listed is the
3107 default.
3108
3109
3110 == Options ==
3111
3112 * ++-align __n__++
3113 +
3114 Aligns objects in memory by the specified alignment (+4+ or +8+).
3115 The default varies depending on architecture.
3116
3117 * ++-as-opt __option__++
3118 +
3119 Pass _option_ to `gcc` when compiling assembler code.  If you wish to
3120 pass an option to the assembler, you must use `gcc`'s `-Wa,` syntax.
3121
3122 * ++-cc-opt __option__++
3123 +
3124 Pass _option_ to `gcc` when compiling C code.
3125
3126 * ++-codegen {native|amd64|c|llvm|x86}++
3127 +
3128 Generate native object code via amd64 assembly, C code, LLVM code, or
3129 x86 assembly.  With `-codegen native` (`-codegen amd64` or
3130 `-codegen x86`), MLton typically compiles more quickly and generates
3131 better code.
3132
3133 * ++-const __name__ __value__++
3134 +
3135 Set the value of a compile-time constant.  Here is a list of
3136 available constants, their default values, and what they control.
3137 +
3138 ** ++Exn.keepHistory {false|true}++
3139 +
3140 Enable `MLton.Exn.history`.  See <:MLtonExn:> for details.  There is a
3141 performance cost to setting this to `true`, both in memory usage of
3142 exceptions and in run time, because of additional work that must be
3143 performed at each exception construction, raise, and handle.
3144
3145 * ++-default-ann __ann__++
3146 +
3147 Specify default <:MLBasisAnnotations:ML Basis annotations>.  For
3148 example, `-default-ann 'warnUnused true'` causes unused variable
3149 warnings to be enabled by default.  A default is overridden by the
3150 corresponding annotation in an ML Basis file.
3151
3152 * ++-default-type __type__++
3153 +
3154 Specify the default binding for a primitive type.  For example,
3155 `-default-type word64` causes the top-level type `word` and the
3156 top-level structure `Word` in the <:BasisLibrary:Basis Library> to be
3157 equal to `Word64.word` and `Word64:WORD`, respectively.  Similarly,
3158 `-default-type intinf` causes the top-level type `int` and the
3159 top-level structure `Int` in the <:BasisLibrary:Basis Library> to be
3160 equal to `IntInf.int` and `IntInf:INTEGER`, respectively.
3161
3162 * ++-disable-ann __ann__++
3163 +
3164 Ignore the specified <:MLBasisAnnotations:ML Basis annotation> in
3165 every ML Basis file.  For example, to see _all_ match and unused
3166 warnings, compile with
3167 +
3168 ----
3169 -default-ann 'warnUnused true'
3170 -disable-ann forceUsed
3171 -disable-ann nonexhaustiveMatch
3172 -disable-ann redundantMatch
3173 -disable-ann warnUnused
3174 ----
3175
3176 * ++-export-header __file__++
3177 +
3178 Write C prototypes to _file_ for all of the functions in the program
3179 <:CallingFromCToSML:exported from SML to C>.
3180
3181 * ++-ieee-fp {false|true}++
3182 +
3183 Cause the x86 native code generator to be pedantic about following the
3184 IEEE floating point standard.  By default, it is not, because of the
3185 performance cost.  This only has an effect with `-codegen x86`.
3186
3187 * ++-inline __n__++
3188 +
3189 Set the inlining threshold used in the optimizer.  The threshold is an
3190 approximate measure of code size of a procedure.  The default is
3191 `320`.
3192
3193 * ++-keep {g|o}++
3194 +
3195 Save intermediate files.  If no `-keep` argument is given, then only
3196 the output file is saved.
3197 +
3198 [cols="^25%,<75%"]
3199 |====
3200 | `g` | generated `.c` and `.s` files passed to `gcc` and generated `.ll` files passed to `llvm-as`
3201 | `o` | object (`.o`) files
3202 |====
3203
3204 * ++-link-opt __option__++
3205 +
3206 Pass _option_ to `gcc` when linking.  You can use this to specify
3207 library search paths, e.g. `-link-opt -Lpath`, and libraries to link
3208 with, e.g., `-link-opt -lfoo`, or even both at the same time,
3209 e.g. `-link-opt '-Lpath -lfoo'`.  If you wish to pass an option to the
3210 linker, you must use `gcc`'s `-Wl,` syntax, e.g.,
3211 `-link-opt '-Wl,--export-dynamic'`.
3212
3213 * ++-llvm-as-opt __option__++
3214 +
3215 Pass _option_ to `llvm-as` when assembling (`.ll` to `.bc`) LLVM code.
3216
3217 * ++-llvm-llc-opt __option__++
3218 +
3219 Pass _option_ to `llc` when compiling (`.bc` to `.o`) LLVM code.
3220
3221 * ++-llvm-opt-opt __option__++
3222 +
3223 Pass _option_ to `opt` when optimizing (`.bc` to `.bc`) LLVM code.
3224
3225 * ++-mlb-path-map __file__++
3226 +
3227 Use _file_ as an <:MLBasisPathMap:ML Basis path map> to define
3228 additional MLB path variables.  Multiple uses of `-mlb-path-map` and
3229 `-mlb-path-var` are allowed, with variable definitions in later path
3230 maps taking precedence over earlier ones.
3231
3232 * ++-mlb-path-var __name__ __value__++
3233 +
3234 Define an additional MLB path variable.  Multiple uses of
3235 `-mlb-path-map` and `-mlb-path-var` are allowed, with variable
3236 definitions in later path maps taking precedence over earlier ones.
3237
3238 * ++-output __file__++
3239 +
3240 Specify the name of the final output file. The default name is the
3241 input file name with its suffix removed and an appropriate, possibly
3242 empty, suffix added.
3243
3244 * ++-profile {no|alloc|count|time}++
3245 +
3246 Produce an executable that gathers <:Profiling: profiling> data.  When
3247 such an executable is run, it produces an `mlmon.out` file.
3248
3249 * ++-profile-branch {false|true}++
3250 +
3251 If `true`, the profiler will separately gather profiling data for each
3252 branch of a function definition, `case` expression, and `if`
3253 expression.
3254
3255 * ++-profile-stack {false|true}++
3256 +
3257 If `true`, the executable will gather profiling data for all functions
3258 on the stack, not just the currently executing function.  See
3259 <:ProfilingTheStack:>.
3260
3261 * ++-profile-val {false|true}++
3262 +
3263 If `true`, the profiler will separately gather profiling data for each
3264 (expansive) `val` declaration.
3265
3266 * ++-runtime __arg__++
3267 +
3268 Pass argument to the runtime system via `@MLton`.  See
3269 <:RunTimeOptions:>.  The argument will be processed before other
3270 `@MLton` command line switches.  Multiple uses of `-runtime` are
3271 allowed, and will pass all the arguments in order.  If the same
3272 runtime switch occurs more than once, then the last setting will take
3273 effect.  There is no need to supply the leading `@MLton` or the
3274 trailing `--`; these will be supplied automatically.
3275 +
3276 An argument to `-runtime` may contain spaces, which will cause the
3277 argument to be treated as a sequence of words by the runtime.  For
3278 example the command line:
3279 +
3280 ----
3281 mlton -runtime 'ram-slop 0.4' foo.sml
3282 ----
3283 +
3284 will cause `foo` to run as if it had been called like:
3285 +
3286 ----
3287 foo @MLton ram-slop 0.4 --
3288 ----
3289 +
3290 An executable created with `-runtime stop` doesn't process any
3291 `@MLton` arguments.  This is useful to create an executable, e.g.,
3292 `echo`, that must treat `@MLton` like any other command-line argument.
3293 +
3294 ----
3295 % mlton -runtime stop echo.sml
3296 % echo @MLton --
3297 @MLton --
3298 ----
3299
3300 * ++-show-basis __file__++
3301 +
3302 Pretty print to _file_ the basis defined by the input program.  See
3303 <:ShowBasis:>.
3304
3305 * ++-show-def-use __file__++
3306 +
3307 Output def-use information to _file_.  Each identifier that is defined
3308 appears on a line, followed on subsequent lines by the position of
3309 each use.
3310
3311 * ++-stop {f|g|o|tc}++
3312 +
3313 Specify when to stop.
3314 +
3315 [cols="^25%,<75%"]
3316 |====
3317 | `f` | list of files on stdout (only makes sense when input is `foo.mlb`)
3318 | `g` | generated `.c` and `.s` files
3319 | `o` | object (`.o`) files
3320 | `tc` | after type checking
3321 |====
3322 +
3323 If you compile with `-stop g` or `-stop o`, you can resume compilation
3324 by running MLton on the generated `.c` and `.s` or `.o` files.
3325
3326 * ++-target {self|__...__}++
3327 +
3328 Generate an executable that runs on the specified platform.  The
3329 default is `self`, which means to compile for the machine that MLton
3330 is running on.  To use any other target, you must first install a
3331 <:CrossCompiling: cross compiler>.
3332
3333 * ++-target-as-opt __target__ __option__++
3334 +
3335 Like `-as-opt`, this passes _option_ to `gcc` when compiling
3336 assembler code, except it only passes _option_ when the target
3337 architecture, operating system, or arch-os pair is _target_.
3338
3339 * ++-target-cc-opt __target__ __option__++
3340 +
3341 Like `-cc-opt`, this passes _option_ to `gcc` when compiling C code,
3342 except it only passes _option_ when the target architecture, operating
3343 system, or arch-os pair is _target_.
3344
3345 * ++-target-link-opt __target__ __option__++
3346 +
3347 Like `-link-opt`, this passes _option_ to `gcc` when linking, except
3348 it only passes _option_ when the target architecture, operating
3349 system, or arch-os pair is _target_.
3350
3351 * ++-verbose {0|1|2|3}++
3352 +
3353 How verbose to be about what passes are running.  The default is `0`.
3354 +
3355 [cols="^25%,<75%"]
3356 |====
3357 | `0` | silent
3358 | `1` | calls to compiler, assembler, and linker
3359 | `2` | 1, plus intermediate compiler passes
3360 | `3` | 2, plus some data structure sizes
3361 |====
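
As a small illustration of how compile-time options change program behavior, here is a sketch of a program (the file name `options-demo.sml` is only an assumption for this example) whose output depends on `-default-type` and on the `Exn.keepHistory` constant described above.

[source,sml]
----
(* options-demo.sml: output depends on the compile-time options used. *)

(* Word.wordSize reflects the binding chosen by -default-type
 * (for example, it is 64 under -default-type word64). *)
val () = print ("Word.wordSize = " ^ Int.toString Word.wordSize ^ "\n")

fun fail (): unit = raise Fail "boom"

(* MLton.Exn.history is only populated when the program is compiled
 * with the Exn.keepHistory constant set to true; otherwise the list
 * is empty and only the "caught" line is printed. *)
val () =
   fail ()
   handle e =>
      (print ("caught: " ^ exnMessage e ^ "\n")
       ; List.app (fn s => print ("  " ^ s ^ "\n")) (MLton.Exn.history e))
----

One way to try it, assuming a working MLton installation, is `mlton -default-type word64 -const 'Exn.keepHistory true' options-demo.sml`, and then to compare the output with that of a plain `mlton options-demo.sml` build.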
3362
3363 <<<
3364
3365 :mlton-guide-page: CompilingWithSMLNJ
3366 [[CompilingWithSMLNJ]]
3367 CompilingWithSMLNJ
3368 ==================
3369
3370 You can compile MLton with <:SMLNJ:SML/NJ>; however, the resulting
3371 compiler will run much more slowly than MLton compiled by itself.  We
3372 don't recommend using SML/NJ as a means of
3373 <:PortingMLton:porting MLton> to a new platform or bootstrapping on a
3374 new platform.
3375
3376 If you do want to build MLton with SML/NJ, it is best to have a binary
3377 MLton package installed.  If you don't, here are some issues you may
3378 encounter when you run `make smlnj-mlton`.
3379
3380 You will get (many copies of) the error messages:
3381
3382 ----
3383 /bin/sh: mlton: command not found
3384 ----
3385
3386 and
3387
3388 ----
3389 make[2]: mlton: Command not found
3390 ----
3391
3392 The `Makefile` calls `mlton` to determine dependencies, and can
3393 proceed in spite of this error.
3394
3395 If you don't have an `mllex` executable, you will get the error
3396 message:
3397
3398 ----
3399 mllex: Command not found
3400 ----
3401
3402 Building MLton requires `mllex` and `mlyacc` executables, which are
3403 distributed with a binary package of MLton.  The easiest solution is
3404 to copy the front-end lexer/parser files from a different machine
3405 (`ml.grm.sml`, `ml.grm.sig`, `ml.lex.sml`, `mlb.grm.sig`,
3406 `mlb.grm.sml`).
3407
3408 <<<
3409
3410 :mlton-guide-page: ConcurrentML
3411 [[ConcurrentML]]
3412 ConcurrentML
3413 ============
3414
3415 http://cml.cs.uchicago.edu/[Concurrent ML] is an SML concurrency
3416 library based on synchronous message passing.  MLton has an initial
3417 port of CML from SML/NJ, but is missing a thread-safe wrapper around
3418 the Basis Library and event-based equivalents to `IO` and `OS`
3419 functions.
3420
3421 All of the core CML functionality is present.
3422
3423 [source,sml]
3424 ----
3425 structure CML: CML
3426 structure SyncVar: SYNC_VAR
3427 structure Mailbox: MAILBOX
3428 structure Multicast: MULTICAST
3429 structure SimpleRPC: SIMPLE_RPC
3430 structure RunCML: RUN_CML
3431 ----
3432
3433 The `RUN_CML` signature is minimal.
3434
3435 [source,sml]
3436 ----
3437 signature RUN_CML =
3438    sig
3439       val isRunning: unit -> bool
3440       val doit: (unit -> unit) * Time.time option -> OS.Process.status
3441       val shutdown: OS.Process.status -> 'a
3442    end
3443 ----
3444
3445 MLton's `RunCML` structure does not include all of the cleanup and
3446 logging operations of SML/NJ's `RunCML` structure.  However, the
3447 implementation does include the `CML.timeOutEvt` and `CML.atTimeEvt`
3448 functions, and a preemptive scheduler that knows to sleep when there
3449 are no ready threads and some threads blocked on time events.
3450
3451 Because MLton does not wrap the Basis Library for CML, the "right" way
3452 to call a Basis Library function that is stateful is to wrap the call
3453 with `MLton.Thread.atomically`.
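
A minimal sketch of such a wrapper follows.  The helper name `atomicPrint` is hypothetical, and the code assumes that the `MLton` structure is in scope (for example via `$(SML_LIB)/basis/mlton.mlb`).

[source,sml]
----
(* Hypothetical helper: perform a stateful Basis Library call inside
 * a critical section, so that no thread switch can occur part-way
 * through the call. *)
fun atomicPrint (s: string): unit =
   MLton.Thread.atomically (fn () => TextIO.print s)
----

The same pattern applies to any other stateful Basis Library call: pass the call as a thunk to `MLton.Thread.atomically`.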
3454
3455 == Usage ==
3456
3457 * You can import the CML Library into an MLB file with:
3458 +
3459 [options="header"]
3460 |====
3461 |MLB file|Description
3462 |`$(SML_LIB)/cml/cml.mlb`|
3463 |====
3464
3465 * If you are porting a project from SML/NJ's <:CompilationManager:> to
3466 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
3467 following map is included by default:
3468 +
3469 ----
3470 # CML Library
3471 $cml                                    $(SML_LIB)/cml
3472 $cml/cml.cm                             $(SML_LIB)/cml/cml.mlb
3473 ----
3474 +
3475 This will automatically convert a `$cml/cml.cm` import in an input `.cm` file into a `$(SML_LIB)/cml/cml.mlb` import in the output `.mlb` file.
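
As a rough sketch of how the pieces fit together, the following small program passes one message over a channel.  The file name and the message are made up for this example; it is assumed to be listed in an MLB file after `$(SML_LIB)/basis/basis.mlb` and `$(SML_LIB)/cml/cml.mlb`.

[source,sml]
----
(* hello-cml.sml: one thread sends a string, the initial thread prints it. *)
fun main (): unit =
   let
      val ch: string CML.chan = CML.channel ()
      (* CML.send blocks until the matching CML.recv below. *)
      val _ = CML.spawn (fn () => CML.send (ch, "hello from a CML thread\n"))
   in
      print (CML.recv ch)
      (* Shut down explicitly so that RunCML.doit returns this status. *)
      ; RunCML.shutdown OS.Process.success
   end

(* NONE leaves the choice of scheduling time quantum to the library. *)
val _ = RunCML.doit (main, NONE)
----

All CML operations must run inside the function passed to `RunCML.doit`.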
3476
3477 == Also see ==
3478
3479 * <:ConcurrentMLImplementation:>
3480 * <:eXene:>
3481
3482 <<<
3483
3484 :mlton-guide-page: ConcurrentMLImplementation
3485 [[ConcurrentMLImplementation]]
3486 ConcurrentMLImplementation
3487 ==========================
3488
3489 Here are some notes on MLton's implementation of <:ConcurrentML:>.
3490
3491 Concurrent ML was originally implemented for SML/NJ.  It was ported to
3492 MLton in the summer of 2004.  The main difference between the
3493 implementations is that SML/NJ uses continuations to implement CML
3494 threads, while MLton uses its underlying <:MLtonThread:thread>
3495 package.  Presently, MLton's threads are a little more heavyweight
3496 than SML/NJ's continuations, but it's pretty clear that there is some
3497 fat there that could be trimmed.
3498
3499 The implementation of CML in SML/NJ is built upon the first-class
3500 continuations of the `SMLofNJ.Cont` module.
3501 [source,sml]
3502 ----
3503 type 'a cont
3504 val callcc: ('a cont -> 'a) -> 'a
3505 val isolate: ('a -> unit) -> 'a cont
3506 val throw: 'a cont -> 'a -> 'b
3507 ----
3508
3509 The implementation of CML in MLton is built upon the first-class
3510 threads of the <:MLtonThread:> module.
3511 [source,sml]
3512 ----
3513 type 'a t
3514 val new: ('a -> unit) -> 'a t
3515 val prepare: 'a t * 'a -> Runnable.t
3516 val switch: ('a t -> Runnable.t) -> 'a
3517 ----
3518
3519 The port is relatively straightforward, because CML always throws to a
3520 continuation at most once.  Hence, an "abstract" implementation of
3521 CML could be built upon first-class one-shot continuations, which map
3522 equally well to SML/NJ's continuations and MLton's threads.
3523
3524 The "essence" of the port is to transform:
3525 ----
3526 callcc (fn k => ... throw k' v')
3527 ----
3528 {empty}to
3529 ----
3530 switch (fn t => ... prepare (t', v'))
3531 ----
3532 which suffices for the vast majority of the CML implementation.
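
To make the idea of an "abstract" one-shot interface concrete, here is a sketch of what it might look like when instantiated with MLton's threads.  The names `ONE_SHOT`, `capture`, and `resume` are hypothetical; they do not appear in either CML implementation.

[source,sml]
----
(* Hypothetical interface for one-shot continuations. *)
signature ONE_SHOT =
   sig
      type 'a t
      type runnable
      (* Capture the current context, then run whatever the argument
       * returns. *)
      val capture: ('a t -> runnable) -> 'a
      (* Pair a captured context with the value it should receive. *)
      val resume: 'a t * 'a -> runnable
   end

(* An instance built directly on the MLton.Thread operations shown above. *)
structure ThreadOneShot: ONE_SHOT =
   struct
      type 'a t = 'a MLton.Thread.t
      type runnable = MLton.Thread.Runnable.t
      val capture = MLton.Thread.switch
      val resume = MLton.Thread.prepare
   end
----

Under this reading, `capture` plays the role of `callcc` and `resume` the role of `throw`, with the restriction that each captured context is resumed at most once.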
3533
3534 There was only one complicated transformation: blocking multiple base
3535 events.  In SML/NJ CML, the representation of base events is given by:
3536 [source,sml]
3537 ----
3538 datatype 'a event_status
3539   = ENABLED of {prio: int, doFn: unit -> 'a}
3540   | BLOCKED of {
3541         transId: trans_id ref,
3542         cleanUp: unit -> unit,
3543         next: unit -> unit
3544       } -> 'a
3545 type 'a base_evt = unit -> 'a event_status
3546 ----
3547
3548 When synchronizing on a set of base events, which are all blocked, we
3549 must invoke each `BLOCKED` function with the same `transId` and
3550 `cleanUp` (the `transId` is (checked and) set to `CANCEL` by the
3551 `cleanUp` function, which is invoked by the first enabled event; this
3552 "fizzles" every other event in the synchronization group that later
3553 becomes enabled).  However, each `BLOCKED` function is implemented by
3554 a callcc, so that when the event is enabled, it throws back to the
3555 point of synchronization.  Hence, the `next` function (which doesn't
3556 return) is invoked by the `BLOCKED` function to escape the callcc and
3557 continue in the thread performing the synchronization.  In SML/NJ this
3558 is implemented as follows:
3559 [source,sml]
3560 ----
3561 fun ext ([], blockFns) = callcc (fn k => let
3562       val throw = throw k
3563       val (transId, setFlg) = mkFlg()
3564       fun log [] = S.atomicDispatch ()
3565         | log (blockFn:: r) =
3566             throw (blockFn {
3567                 transId = transId,
3568                 cleanUp = setFlg,
3569                 next = fn () => log r
3570               })
3571       in
3572         log blockFns; error "[log]"
3573       end)
3574 ----
3575 (Note that `S.atomicDispatch` invokes the continuation of the next
3576 thread on the ready queue.)  This doesn't map well to the MLton
3577 thread model.  Although it follows the
3578 ----
3579 callcc (fn k => ... throw k v)
3580 ----
3581 model, the fact that `blockFn` will also attempt to do
3582 ----
3583 callcc (fn k' => ... next ())
3584 ----
3585 means that the naive transformation will result in nested `switch`-es.
3586
3587 We need to think a little more about what this code is trying to do.
3588 Essentially, each `blockFn` wants to capture this continuation, hold
3589 on to it until the event is enabled, and continue with `next`; when the
3590 event is enabled, before invoking the continuation and returning to
3591 the synchronization point, the `cleanUp` and other event-specific
3592 operations are performed.
3593
3594 To accomplish the same effect in the MLton thread implementation, we
3595 have the following:
3596 [source,sml]
3597 ----
3598 datatype 'a status =
3599    ENABLED of {prio: int, doitFn: unit -> 'a}
3600  | BLOCKED of {transId: trans_id,
3601                cleanUp: unit -> unit,
3602                next: unit -> rdy_thread} -> 'a
3603
3604 type 'a base = unit -> 'a status
3605
3606 fun ext ([], blockFns): 'a =
3607      S.atomicSwitch
3608      (fn (t: 'a S.thread) =>
3609       let
3610          val (transId, cleanUp) = TransID.mkFlg ()
3611          fun log blockFns: S.rdy_thread =
3612             case blockFns of
3613                [] => S.next ()
3614              | blockFn::blockFns =>
3615                   (S.prep o S.new)
3616                   (fn _ => fn () =>
3617                    let
3618                       val () = S.atomicBegin ()
3619                       val x = blockFn {transId = transId,
3620                                        cleanUp = cleanUp,
3621                                        next = fn () => log blockFns}
3622                    in S.switch(fn _ => S.prepVal (t, x))
3623                    end)
3624       in
3625          log blockFns
3626       end)
3627 ----
3628
3629 To avoid the nested `switch`-es, I run the `blockFn` in its own
3630 thread, whose only purpose is to return to the synchronization point.
3631 This corresponds to the `throw (blockFn {...})` in the SML/NJ
3632 implementation.  I'm worried that this implementation might be a
3633 little expensive, starting a new thread for each blocked event (though
3634 only when there are multiple blocked events in a synchronization group).
3635 But, I don't see another way of implementing this behavior in the
3636 MLton thread model.
3637
3638 Note that another way of thinking about what is going on is to
3639 consider each `blockFn` as prepending a different set of actions to
3640 the thread `t`.  It might be possible to give a
3641 `MLton.Thread.unsafePrepend`.
3642 [source,sml]
3643 ----
3644 fun unsafePrepend (T r: 'a t, f: 'b -> 'a): 'b t =
3645    let
3646       val t =
3647          case !r of
3648             Dead => raise Fail "prepend to a Dead thread"
3649           | New g => New (g o f)
3650           | Paused (g, t) => Paused (fn h => g (f o h), t)
3651    in (* r := Dead; *)
3652       T (ref t)
3653    end
3654 ----
3655 I have commented out the `r := Dead`, which would allow multiple
3656 prepends to the same thread (i.e., not destroying the original thread
3657 in the process).  Of course, only one of the threads could be run: if
3658 the original thread were in the `Paused` state, then multiple threads
3659 would share the underlying runtime/primitive thread.  Now, this
3660 matches the "one-shot" nature of CML continuations/threads, but I'm
3661 not comfortable with extending `MLton.Thread` with such an unsafe
3662 operation.
3663
3664 Other than this complication with blocking multiple base events, the
3665 port was quite routine.  (As a very pleasant surprise, the CML
3666 implementation in SML/NJ doesn't use any SML/NJ-isms.)  There is a
3667 slight difference in the way in which critical sections are handled in
3668 SML/NJ and MLton; since `MLton.Thread.switch` _always_ leaves a
3669 critical section, it is sometimes necessary to add additional
3670 `atomicBegin`-s/`atomicEnd`-s to ensure that we remain in a critical
3671 section after a thread switch.
3672
3673 While looking at virtually every file in the core CML implementation,
3674 I took the liberty of simplifying things where it seemed possible; in
3675 terms of style, the implementation is about half-way between Reppy's
3676 original and MLton's.
3677
3678 Some changes of note:
3679
3680 * `util/` contains all pertinent data-structures: (functional and
3681 imperative) queues, (functional) priority queues.  Hence, it should be
3682 easier to switch in more efficient or real-time implementations.
3683
3684 * `core-cml/scheduler.sml`: in both implementations, this is where
3685 most of the interesting action takes place.  I've made the connection
3686 between `MLton.Thread.t`-s and `ThreadId.thread_id`-s more abstract
3687 than it is in the SML/NJ implementation, and encapsulated all of the
3688 `MLton.Thread` operations in this module.
3689
3690 * eliminated all of the "by hand" inlining
3691
3692
3693 == Future Extensions ==
3694
3695 The CML documentation says the following:
3696 ____
3697
3698 ----
3699 CML.joinEvt: thread_id -> unit event
3700 ----
3701
3702 * `joinEvt tid`
3703 +
3704 creates an event value for synchronizing on the termination of the
3705 thread with the ID tid.  There are three ways that a thread may
3706 terminate: the function that was passed to spawn (or spawnc) may
3707 return; it may call the exit function, or it may have an uncaught
3708 exception.  Note that `joinEvt` does not distinguish between these
3709 cases; it also does not become enabled if the named thread deadlocks
3710 (even if it is garbage collected).
3711 ____
3712
3713 I believe that the `MLton.Finalizable` might be able to relax that
3714 last restriction.  Upon the creation of a `'a Scheduler.thread`, we
3715 could attach a finalizer to the underlying `'a MLton.Thread.t` that
3716 enables the `joinEvt` (in the associated `ThreadID.thread_id`) when
3717 the `'a MLton.Thread.t` becomes unreachable.
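
For reference, the documented behavior of `joinEvt` is typically used as in the following sketch.  The helper name `runAndWait` is hypothetical, and the function must be called from within `RunCML.doit`.

[source,sml]
----
(* Hypothetical helper: spawn a worker thread and block the caller
 * until that thread terminates (by returning, by exiting, or with an
 * uncaught exception). *)
fun runAndWait (work: unit -> unit): unit =
   let
      val tid = CML.spawn work
   in
      CML.sync (CML.joinEvt tid)
   end
----

If the worker deadlocks, `runAndWait` never returns; this is the restriction that the finalizer idea above is meant to relax.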
3718
3719 I don't know why CML doesn't have
3720 ----
3721 CML.kill: thread_id -> unit
3722 ----
3723 which has a fairly simple implementation -- setting a kill flag in the
3724 `thread_id` and adjusting the scheduler to discard any killed threads
3725 that it takes off the ready queue.  The fairness of the scheduler
3726 ensures that a killed thread will eventually be discarded.  The
3727 semantics are a little murky for blocked threads that are killed,
3728 though.  For example, consider a thread blocked on `SyncVar.mTake mv`
3729 and a thread blocked on `SyncVar.mGet mv`.  If the first thread is
3730 killed while blocked, and a third thread does `SyncVar.mPut (mv, x)`,
3731 then we might expect that we'll enable the second thread, and never
3732 the first.  But, when only the ready queue is able to discard killed
3733 threads, then the `SyncVar.mPut` could enable the first thread
3734 (putting it on the ready queue, from which it will be discarded) and
3735 leave the second thread blocked.  We could solve this by adjusting the
3736 `TransID.trans_id` types and the "cleaner" functions to look for both
3737 canceled transactions and transactions on killed threads.
3738
3739 John Reppy says that <!Cite(MarlowEtAl01)> and <!Cite(FlattFindler04)>
3740 explain why `CML.kill` would be a bad idea.
3741
3742 Between `CML.timeOutEvt` and `CML.kill`, one could give an efficient
3743 solution to the recent `comp.lang.ml` post about terminating a
3744 function that doesn't complete in a given time.
3745 [source,sml]
3746 ----
3747   fun timeOut (f: unit -> 'a, t: Time.time): 'a option =
3748     let
3749        val iv = SyncVar.iVar ()
3750        val tid = CML.spawn (fn () => SyncVar.iPut (iv, f ()))
3751     in
3752        CML.select
3753        [CML.wrap (CML.timeOutEvt t, fn () => (CML.kill tid; NONE)),
3754         CML.wrap (SyncVar.iGetEvt iv, fn x => SOME x)]
3755     end
3756 ----
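
Since `CML.kill` does not exist, the code above does not compile as written.  A weaker variant that uses only existing CML operations is sketched below; the name `timeOutNoKill` is hypothetical.  It returns `NONE` when the time limit expires, but the spawned thread is not terminated and keeps running until `f` returns.

[source,sml]
----
(* Variant of timeOut without CML.kill: the caller stops waiting after
 * time t, but the worker thread is left to finish on its own. *)
fun timeOutNoKill (f: unit -> 'a, t: Time.time): 'a option =
   let
      val iv = SyncVar.iVar ()
      val _ = CML.spawn (fn () => SyncVar.iPut (iv, f ()))
   in
      CML.select
      [CML.wrap (CML.timeOutEvt t, fn () => NONE),
       CML.wrap (SyncVar.iGetEvt iv, fn x => SOME x)]
   end
----

Like every CML operation, `timeOutNoKill` must be called from within `RunCML.doit`.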
3757
3758
3759 == Space Safety ==
3760
3761 There are some CML-related posts on the MLton mailing list:
3762
3763 * http://www.mlton.org/pipermail/mlton/2004-May/
3764
3765 that discuss concerns that SML/NJ's implementation is not space
3766 efficient, because multi-shot continuations can be held indefinitely
3767 on event queues.  MLton is better off because of the one-shot nature
3768 -- when an event enables a thread, all other copies of the thread
3769 waiting in other event queues get turned into dead threads (of zero
3770 size).
3771
3772 <<<
3773
3774 :mlton-guide-page: ConstantPropagation
3775 [[ConstantPropagation]]
3776 ConstantPropagation
3777 ===================
3778
3779 <:ConstantPropagation:> is an optimization pass for the <:SSA:>
3780 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
3781
3782 == Description ==
3783
3784 This is whole-program constant propagation, even through data
3785 structures.  It also performs globalization of (small) values computed
3786 once.
3787
3788 Uses <:Multi:>.
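
As a source-level illustration only (the pass itself works on <:SSA:>, not on SML source), consider the following made-up program.  Every `Square` value that is ever constructed carries the constant `4`, so an analysis of this kind could conclude that `side` is always `4` inside `area`, even though the constant travels through a data structure.

[source,sml]
----
datatype shape = Square of int

(* side is only ever bound to 4 in this whole program. *)
fun area (Square side): int = side * side

val a = area (Square 4)
val b = area (Square 4)

val () = print (Int.toString (a + b) ^ "\n")
----

Whether the optimizer actually rewrites this particular program in this way depends on the other passes that run before and after it.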
3789
3790 == Implementation ==
3791
3792 * <!ViewGitFile(mlton,master,mlton/ssa/constant-propagation.fun)>
3793
3794 == Details and Notes ==
3795
3796 {empty}
3797
3798 <<<
3799
3800 :mlton-guide-page: Contact
3801 [[Contact]]
3802 Contact
3803 =======
3804
3805 == Mailing lists ==
3806
3807 There are three mailing lists available.
3808
3809 * mailto:MLton-user@mlton.org[`MLton-user@mlton.org`]
3810 +
3811 MLton user community discussion
3812 +
3813 --
3814 * https://lists.sourceforge.net/lists/listinfo/mlton-user[subscribe]
3815 * https://sourceforge.net/mailarchive/forum.php?forum_name=mlton-user[archive (SourceForge; current)],
3816 http://www.mlton.org/pipermail/mlton-user/[archive (PiperMail; through 201110)]
3817 --
3818
3819 * mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`]
3820 +
3821 MLton developer community discussion
3822 +
3823 --
3824 * https://lists.sourceforge.net/lists/listinfo/mlton-devel[subscribe]
3825 * https://sourceforge.net/mailarchive/forum.php?forum_name=mlton-devel[archive (SourceForge; current)],
3826 http://www.mlton.org/pipermail/mlton-devel/[archive (PiperMail; through 201110)]
3827 --
3828
3829 * mailto:MLton-commit@mlton.org[`MLton-commit@mlton.org`]
3830 +
3831 MLton code commits
3832 +
3833 --
3834 * https://lists.sourceforge.net/lists/listinfo/mlton-commit[subscribe]
3835 * https://sourceforge.net/mailarchive/forum.php?forum_name=mlton-commit[archive (SourceForge; current)],
3836 http://www.mlton.org/pipermail/mlton-commit/[archive (PiperMail; through 201110)]
3837 --
3838
3839
3840 === Mailing list policies ===
3841
3842 * The mailing lists are unmoderated.  However, the mailing lists are
3843 configured to discard all spam, to hold all non-subscriber posts
3844 for moderation, to accept all subscriber posts, and to admin approve
3845 subscription requests.  Please contact
3846 mailto:matthew.fluet@gmail.com[Matthew Fluet] if it appears that your
3847 messages are being discarded as spam.
3848
3849 * Large messages (over 256K) should not be sent.  Rather, please send
3850 an email containing the discussion text and a link to any large files.
3851
3852 /////
3853 * Very active mailto:MLton-devel@mlton.org[`MLton@mlton.org`] list
3854 members who might otherwise be expected to provide a fast response
3855 should send a message when they will be offline for more than a few
3856 days.  The convention is to put
3857 "++__userid__ offline until __date__++" in the subject line to make it
3858 easy to scan.
3859 /////
3860
3861 * Discussions started on the mailing lists should stay on the mailing
3862 lists.  Private replies may be bounced to the mailing list for the
3863 benefit of those following the discussion.
3864
3865 * Discussions started on
3866 mailto:MLton-user@mlton.org[`MLton-user@mlton.org`] may be migrated to
3867 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`], particularly
3868 when the discussion shifts from how to use MLton to how to modify
3869 MLton (e.g., to fix a bug identified by the initial discussion).
3870
3871 == IRC ==
3872
3873 * Some MLton developers and users are in channel `#sml` on http://freenode.net.
3874
3875 <<<
3876
3877 :mlton-guide-page: Contify
3878 [[Contify]]
3879 Contify
3880 =======
3881
3882 <:Contify:> is an optimization pass for the <:SSA:>
3883 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
3884
3885 == Description ==
3886
3887 Contification is a compiler optimization that turns a function that
3888 always returns to the same place into a continuation.  This exposes
3889 control-flow information that is required by many optimizations,
3890 including traditional loop optimizations.
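
As a rough source-level illustration (a sketch, not SSA output produced by MLton), the local function `loop` below is a typical candidate.  Apart from its own recursive tail calls, it is called from exactly one place in `sum`, so it always returns to the same point.  The names `sum` and `loop` are chosen for this example only, and whether the pass actually contifies a given function depends on what earlier passes have done to the program.

[source,sml]
----
(* sum adds up the elements of a list using an accumulating loop. *)
fun sum (xs: int list): int =
   let
      (* loop is only ever entered from the single call below and
       * from its own tail calls, so every return goes to one place. *)
      fun loop (ys, acc) =
         case ys of
            [] => acc
          | y :: ys' => loop (ys', acc + y)
   in
      loop (xs, 0)
   end

val total = sum [1, 2, 3]
----

Once such a function is treated as a continuation, its recursive calls appear as a loop inside the enclosing function, which is the kind of control-flow information that loop optimizations rely on.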
3891
3892 == Implementation ==
3893
3894 * <!ViewGitFile(mlton,master,mlton/ssa/contify.fun)>
3895
3896 == Details and Notes ==
3897
3898 See <!Cite(FluetWeeks01, Contification Using Dominators)>.  The
3899 intermediate language described in that paper has since evolved to the
3900 <:SSA:> <:IntermediateLanguage:>; hence, the complication described in
3901 Section 6.1 is no longer relevant.
3902
3903 <<<
3904
3905 :mlton-guide-page: CoreML
3906 [[CoreML]]
3907 CoreML
3908 ======
3909
3910 <:CoreML:Core ML> is an <:IntermediateLanguage:>, translated from
3911 <:AST:> by <:Elaborate:>, optimized by <:CoreMLSimplify:>, and
3912 translated by <:Defunctorize:> to <:XML:>.
3913
3914 == Description ==
3915
3916 <:CoreML:> is polymorphic, higher-order, and has nested patterns.
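
The three properties can be seen in ordinary source code.  The following is a sketch in Standard ML source syntax, not in Core ML's internal representation; the function `mapSome` is hypothetical and exists only for this illustration.

[source,sml]
----
(* Polymorphic: works for any types 'a, 'b, and 'c.
 * Higher-order: takes the function f as an argument.
 * Nested patterns: (x, SOME y) :: rest matches a tuple and an
 * option inside a list cell in a single pattern. *)
fun mapSome (f: 'a * 'b -> 'c) (xs: ('a * 'b option) list): 'c list =
   case xs of
      [] => []
    | (x, SOME y) :: rest => f (x, y) :: mapSome f rest
    | (_, NONE) :: rest => mapSome f rest

val sums = mapSome (op +) [(1, SOME 2), (3, NONE), (4, SOME 5)]
----

All three features are still present at this stage of compilation; the nested patterns, for instance, are handled later by <:MatchCompile:> during <:Defunctorize:>.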
3917
3918 == Implementation ==
3919
3920 * <!ViewGitFile(mlton,master,mlton/core-ml/core-ml.sig)>
3921 * <!ViewGitFile(mlton,master,mlton/core-ml/core-ml.fun)>
3922
3923 == Type Checking ==
3924
3925 The <:CoreML:> <:IntermediateLanguage:> has no independent type
3926 checker.
3927
3928 == Details and Notes ==
3929
3930 {empty}
3931
3932 <<<
3933
3934 :mlton-guide-page: CoreMLSimplify
3935 [[CoreMLSimplify]]
3936 CoreMLSimplify
3937 ==============
3938
3939 The single optimization pass for the <:CoreML:>
3940 <:IntermediateLanguage:> is controlled by the `Compile` functor
3941 (<!ViewGitFile(mlton,master,mlton/main/compile.fun)>).
3942
3943 The following optimization pass is implemented:
3944
3945 * <:DeadCode:>
3946
3947 <<<
3948
3949 :mlton-guide-page: Credits
3950 [[Credits]]
3951 Credits
3952 =======
3953
MLton was designed and implemented by <:HenryCejtin:>,
<:MatthewFluet:>, <:SureshJagannathan:>, and <:StephenWeeks:>.
3956
3957  * <:HenryCejtin:> wrote the `IntInf` implementation, the original
3958  profiler, the original man pages, the `.spec` files for the RPMs,
3959  and lots of little hacks to speed stuff up.
3960
3961  * <:MatthewFluet:> implemented the X86 and AMD64 native code generators,
3962  ported `mlprof` to work with the native code generator, did a lot
3963  of work on the SSA optimizer, both adding new optimizations and
3964  improving or porting existing optimizations, updated the
3965  <:BasisLibrary:Basis Library> implementation, ported
3966  <:ConcurrentML:> and <:MLNLFFI:ML-NLFFI> to MLton, implemented the
3967  <:MLBasis: ML Basis system>, ported MLton to 64-bit platforms,
3968  and currently leads the project.
3969
3970  * <:SureshJagannathan:> implemented some early inlining and uncurrying
3971  optimizations.
3972
3973  * <:StephenWeeks:> implemented most of the original version of MLton, and
3974  continues to keep his fingers in most every part.
3975
3976 Many people have helped us over the years.  Here is an alphabetical
3977 list.
3978
3979  * <:JesperLouisAndersen:> sent several patches to improve the runtime on
3980  FreeBSD and ported MLton to run on NetBSD and OpenBSD.
3981
3982  * <:JohnnyAndersen:> implemented `BinIO`, modified MLton so it could
3983  cross compile to MinGW, and provided useful discussion about
3984  cross-compilation.
3985
3986  * Alexander Abushkevich extended support for OpenBSD.
3987
3988  * Ross Bayer added the `-keep ast` compile-time option and experimented with
3989  porting the build system to CMake.
3990
3991  * Kevin Bradley added initial support for <:SuccessorML:> features.
3992
 * Bryan Camp added `-disable-pass _regex_` and `-enable-pass _regex_` compile
3994  options to generalize `-drop-pass _regex_` and added `Array_copyArray` and
3995  `Array_copyVector` primitives.
3996
3997  * Jason Carr added a parser combinator library and a parser for the <:SXML:>
3998  IR, extended compilation to start with a `.sxml` file, and experimented with
3999  alternate control-flow analyses for <:ClosureConvert: closure conversion>.
4000
4001  * Christopher Cramer contributed support for additional
4002  `Posix.ProcEnv.sysconf` variables, performance improvements for
4003  `String.concatWith`, and Debian packaging.
4004
4005  * Alain Deutsch and
4006  http://www.polyspace.com/[PolySpace Technologies] provided many bug
4007  fixes and runtime system improvements, code to help the Sparc/Solaris
4008  port, and funded a number of improvements to MLton.
4009
4010  * Armando Doval updated `mlnlffigen` to warn and skip functions with
4011  `struct`/`union` arguments.
4012
4013  * Martin Elsman provided helpful discussions in the development of
4014  the <:MLBasis:ML Basis system>.
4015
4016  * Brent Fulgham ported MLton most of the way to MinGW.
4017
4018  * <:AdamGoode:> provided a script to build the PDF MLton Guide and
4019  maintains the
4020  https://admin.fedoraproject.org/pkgdb/acls/name/mlton[Fedora]
4021  packages.
4022
4023  * Simon Helsen provided bug reports, suggestions, and helpful
4024  discussions.
4025
4026  * Joe Hurd provided useful discussion and feedback on source-level
4027  profiling.
4028
4029  * <:VesaKarvonen:> contributed `esml-mode.el` and `esml-mlb-mode.el` (see <:Emacs:>),
4030  contributed patches for improving match warnings,
4031  contributed `esml-du-mlton.el` and extended def-use output to include types of variable definitions (see <:EmacsDefUseMode:>), and
4032  improved constant folding of floating-point operations.
4033
4034  * Richard Kelsey provided helpful discussions.
4035
4036  * Ville Laurikari ported MLton to IA64/HPUX, HPPA/HPUX, PowerPC/AIX, PowerPC64/AIX.
4037
4038  * Brian Leibig implemented the <:LLVMCodegen:>.
4039
4040  * Geoffrey Mainland helped with FreeBSD packaging.
4041
4042  * Eric McCorkle ported MLton to Intel Mac.
4043
4044  * <:TomMurphy:> wrote the original version of `MLton.Syslog` as part
4045  of his `mlftpd` project, and has sent many useful bug reports and
4046  suggestions.
4047
4048  * Michael Neumann helped to patch the runtime to compile under
4049  FreeBSD.
4050
4051  * Barak Pearlmutter built the original
4052  http://packages.debian.org/mlton[Debian package] for MLton, and
4053  helped us to take over the process.
4054
4055  * Filip Pizlo ported MLton to (PowerPC) Darwin.
4056
4057  * Vedant Raiththa extended the <:ForeignFunctionInterface:> with support for
4058  `pure` and `impure` attributes to `_import`.
4059
4060  * Krishna Ravikumar added initial support for vector expressions and the
4061  `Vector_vector` primitive.
4062
4063  * John Reppy assisted in porting MLton to Intel Mac.
4064
4065  * Sam Rushing ported MLton to FreeBSD.
4066
4067  * Rob Simmons refactored the array and vector implementation in the
 <:BasisLibrary:Basis Library> into a primitive implementation (using
4069  `SeqInt.int` for indexing) and a wrapper implementation (using the default
4070  `Int.int` for indexing).
4071
4072  * Jeffrey Mark Siskind provided helpful discussions and inspiration
4073  with his Stalin Scheme compiler.
4074
4075  * Matthew Surawski added <:LoopUnroll:> and <:LoopUnswitch:> SSA optimizations.
4076
4077  * <:WesleyTerpstra:> added support for `MLton.Process.create`, made
4078  a number of contributions to the <:ForeignFunctionInterface:>,
4079  contributed a number of runtime system patches,
4080  added support for compiling to a <:LibrarySupport:C library>,
4081  ported MLton to http://mingw.org[MinGW] and all http://packages.debian.org/search?keywords=mlton&searchon=names&suite=all&section=all[Debian] supported architectures with <:CrossCompiling:cross-compiling> support,
4082  and maintains the http://packages.debian.org/search?keywords=mlton&searchon=names&suite=all&section=all[Debian] and http://mingw.org[MinGW] packages.
4083
4084  * Maksim Yegorov added rudimentary support for `./configure` and other
4085  improvements to the build system and implemented the <:ShareZeroVec:> SSA
4086  optimization.
4087
4088  * Luke Ziarek assisted in porting MLton to (PowerPC) Darwin.
4089
4090 We have also benefited from other software development tools and
4091 used code from other sources.
4092
4093  * MLton was developed using
4094  <:SMLNJ:Standard ML of New Jersey> and the
 <:CompilationManager:Compilation Manager (CM)>.
4096
4097  * MLton's lexer (`mlton/frontend/ml.lex`), parser
4098  (`mlton/frontend/ml.grm`), and precedence-parser
4099  (`mlton/elaborate/precedence-parse.fun`) are modified versions of
4100  code from SML/NJ.
4101
4102  * The MLton <:BasisLibrary:Basis Library> implementation of
4103  conversions between binary and decimal representations of reals uses
4104  David Gay's http://www.netlib.org/fp/[gdtoa] library.
4105
4106  * The MLton <:BasisLibrary:Basis Library> implementation uses
 modified versions of portions of the SML/NJ Basis Library
4108  implementation modules `OS.IO`, `Posix.IO`, `Process`,
4109  and `Unix`.
4110
4111  * The MLton <:BasisLibrary:Basis Library> implementation uses
4112  modified versions of portions of the <:MLKit:ML Kit> Version 4.1.4
4113  Basis Library implementation modules `Path`, `Time`, and
4114  `Date`.
4115
4116  * Many of the benchmarks come from the SML/NJ benchmark suite.
4117
4118  * Many of the regression tests come from the ML Kit Version 4.1.4
4119  distribution, which borrowed them from the
4120  http://www.dina.kvl.dk/%7Esestoft/mosml.html[Moscow ML] distribution.
4121
4122  * MLton uses the http://www.gnu.org/software/gmp/gmp.html[GNU multiprecision library] for its implementation of `IntInf`.
4123
4124  * MLton's implementation of <:MLLex: mllex>, <:MLYacc: mlyacc>,
4125  the <:CKitLibrary:ckit Library>,
4126  the <:MLLPTLibrary:ML-LPT Library>,
4127  the <:MLRISCLibrary:MLRISC Library>,
4128  the <:SMLNJLibrary:SML/NJ Library>,
4129  <:ConcurrentML:Concurrent ML>,
4130  mlnlffigen and <:MLNLFFI:ML-NLFFI>
4131  are modified versions of code from SML/NJ.
4132
4133 <<<
4134
4135 :mlton-guide-page: CrossCompiling
4136 [[CrossCompiling]]
4137 CrossCompiling
4138 ==============
4139
4140 MLton's `-target` flag directs MLton to cross compile an application
4141 for another platform.  By default, MLton is only able to compile for
4142 the machine it is running on.  In order to use MLton as a cross
4143 compiler, you need to do two things.
4144
4145 1. Install the GCC cross-compiler tools on the host so that GCC can
4146 compile to the target.
4147
4148 2. Cross compile the MLton runtime system to build the runtime
4149 libraries for the target.
4150
4151 To make the terminology clear, we refer to the _host_ as the machine
4152 MLton is running on and the _target_ as the machine that MLton is
4153 compiling for.
4154
4155 To build a GCC cross-compiler toolset on the host, you can use the
4156 script `bin/build-cross-gcc`, available in the MLton sources, as a
4157 template.  The value of the `target` variable in that script is
4158 important, since that is what you will pass to MLton's `-target` flag.
4159 Once you have the toolset built, you should be able to test it by
4160 cross compiling a simple hello world program on your host machine.
4161 ----
4162 % gcc -b i386-pc-cygwin -o hello-world hello-world.c
4163 ----
4164
4165 You should now be able to run `hello-world` on the target machine, in
4166 this case, a Cygwin machine.
4167
4168 Next, you must cross compile the MLton runtime system and inform MLton
4169 of the availability of the new target.  The script `bin/add-cross`
4170 from the MLton sources will help you do this.  Please read the
4171 comments at the top of the script.  Here is a sample run adding a
4172 Solaris cross compiler.
4173 ----
4174 % add-cross sparc-sun-solaris sun blade
4175 Making runtime.
4176 Building print-constants executable.
4177 Running print-constants on blade.
4178 ----
4179
4180 Running `add-cross` uses `ssh` to compile the runtime on the target
4181 machine and to create `print-constants`, which prints out all of the
4182 constants that MLton needs in order to implement the
4183 <:BasisLibrary:Basis Library>.  The script runs `print-constants` on
4184 the target machine (`blade` in this case), and saves the output.
4185
4186 Once you have done all this, you should be able to cross compile SML
4187 applications.  For example,
4188 ----
4189 mlton -target i386-pc-cygwin hello-world.sml
4190 ----
4191 will create `hello-world`, which you should be able to run from a
4192 Cygwin shell on your Windows machine.
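
The contents of `hello-world.sml` are not shown above.  A minimal version, given here only as an assumption about what such a file might contain, is:

[source,sml]
----
(* hello-world.sml: prints a greeting and exits. *)
val greeting = "Hello, world!\n"
val () = print greeting
----

A program this small is a convenient first test of a new target, since it exercises the cross-compiled runtime and Basis Library without depending on anything else.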
4193
4194
4195 == Cross-compiling alternatives ==
4196
Building and maintaining cross-compiling `gcc` toolchains is complex.  You may
4198 find it simpler to use `mlton -keep g` to generate the files on the
4199 host, then copy the files to the target, and then use `gcc` or `mlton`
4200 on the target to compile the files.
4201
4202 <<<
4203
4204 :mlton-guide-page: CVS
4205 [[CVS]]
4206 CVS
4207 ===
4208
4209 http://www.gnu.org/software/cvs/[CVS] (Concurrent Versions System) is
4210 a version control system. The MLton project used CVS to maintain its
4211 <:Sources:source code>, but switched to <:Subversion:> on 20050730.
4212
4213 Here are some online CVS resources.
4214
4215 * http://cvsbook.red-bean.com/[Open Source Development with CVS]
4216
4217 <<<
4218
4219 :mlton-guide-page: DeadCode
4220 [[DeadCode]]
4221 DeadCode
4222 ========
4223
4224 <:DeadCode:> is an optimization pass for the <:CoreML:>
4225 <:IntermediateLanguage:>, invoked from <:CoreMLSimplify:>.
4226
4227 == Description ==
4228
4229 This pass eliminates declarations from the
4230 <:BasisLibrary:Basis Library> not needed by the user program.
4231
4232 == Implementation ==
4233
4234 * <!ViewGitFile(mlton,master,mlton/core-ml/dead-code.sig)>
4235 * <!ViewGitFile(mlton,master,mlton/core-ml/dead-code.fun)>
4236
4237 == Details and Notes ==
4238
In order to compile small programs rapidly, a pass of dead code
elimination is run to eliminate as much of the Basis Library
as possible.  The dead code elimination algorithm used is not safe in
4242 general, and only works because the Basis Library implementation has
4243 special properties:
4244
4245 * it terminates
4246 * it performs no I/O
4247
4248 The dead code elimination includes the minimal set of
4249 declarations from the Basis Library so that there are no free
4250 variables in the user program (or remaining Basis Library
4251 implementation).  It has a special hack to include all
4252 bindings of the form:
4253 [source,sml]
4254 ----
4255  val _ = ...
4256 ----
4257
4258 There is an <:MLBasisAnnotations:ML Basis annotation>,
4259 `deadCode true`, that governs which code is subject to this unsafe
4260 dead-code elimination.
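
The following sketch suggests why bindings of the form `val _ = ...` need special treatment.  It is a hypothetical library fragment, not code from the MLton Basis Library implementation, and the explanation in the comments is one plausible reading of the special case rather than a statement of the pass's exact rules.

[source,sml]
----
(* Hypothetical library fragment, for illustration only. *)
val registry: (string * int) list ref = ref []

fun register (name: string, n: int): unit =
   registry := (name, n) :: !registry

(* Bound to a wildcard, so no later code can refer to this binding.
 * A pass that keeps only the declarations needed to close the
 * program's free variables would drop it, and the update to
 * registry along with it.  Keeping every val _ = ... binding
 * avoids that. *)
val _ = register ("zero", 0)

(* An ordinary declaration: a candidate for removal if nothing
 * in the program refers to it. *)
fun unused (x: int): int = x + 1
----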
4261
4262 <<<
4263
4264 :mlton-guide-page: DeepFlatten
4265 [[DeepFlatten]]
4266 DeepFlatten
4267 ===========
4268
4269 <:DeepFlatten:> is an optimization pass for the <:SSA2:>
4270 <:IntermediateLanguage:>, invoked from <:SSA2Simplify:>.
4271
4272 == Description ==
4273
4274 This pass flattens into mutable fields of objects and into vectors.
4275
For example, an `(int * int) ref` is represented by a 2-word
4277 object, and an `(int * int) array` contains pairs of `int`-s,
4278 rather than pointers to pairs of `int`-s.
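
The fragment below is a small sketch of source code that builds the two kinds of values mentioned above.  The variable names are chosen for this example, and the comments describe the representation that the description above implies, not output observed from the compiler.

[source,sml]
----
(* A mutable cell holding a pair: with flattening, the two ints can
 * live directly in the ref object instead of behind a pointer. *)
val r: (int * int) ref = ref (1, 2)
val () = r := (3, 4)
val (p, q) = !r

(* An array of pairs: with flattening, the elements are the pairs
 * themselves rather than pointers to separately allocated pairs. *)
val a: (int * int) array = Array.tabulate (4, fn i => (i, i * i))
val (x, y) = Array.sub (a, 2)
----

The source program is unchanged by the pass; only the layout of the data differs.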
4279
4280 == Implementation ==
4281
4282 * <!ViewGitFile(mlton,master,mlton/ssa/deep-flatten.fun)>
4283
4284 == Details and Notes ==
4285
4286 There are some performance issues with the deep flatten pass, where it
4287 consumes an excessive amount of memory.
4288
4289 * http://www.mlton.org/pipermail/mlton/2005-April/026990.html
4290 * http://www.mlton.org/pipermail/mlton-user/2010-June/001626.html
4291 * http://www.mlton.org/pipermail/mlton/2010-December/030876.html
4292
4293 A number of applications require compilation with
4294 `-disable-pass deepFlatten` to avoid exceeding available memory.  It is
4295 often asked whether the deep flatten pass usually has a significant
4296 impact on performance.  The standard benchmark suite was run with and
4297 without the deep flatten pass enabled when the pass was first
4298 introduced:
4299
4300 * http://www.mlton.org/pipermail/mlton/2004-August/025760.html
4301
4302 The conclusion is that it does not have a significant impact.
4303 However, these are micro benchmarks; other applications may derive
4304 greater benefit from the pass.
4305
4306 <<<
4307
4308 :mlton-guide-page: DefineTypeBeforeUse
4309 [[DefineTypeBeforeUse]]
4310 DefineTypeBeforeUse
4311 ===================
4312
4313 <:StandardML:Standard ML> requires types to be defined before they are
4314 used.  Because of type inference, the use of a type can be implicit;
4315 hence, this requirement is more subtle than it might appear.  For
4316 example, the following program is not type correct, because the type
4317 of `r` is `t option ref`, but `t` is defined after `r`.
4318
4319 [source,sml]
4320 ----
4321 val r = ref NONE
4322 datatype t = A | B
4323 val () = r := SOME A
4324 ----
4325
4326 MLton reports the following error, indicating that the type defined on
4327 line 2 is used on line 1.
4328
4329 ----
4330 Error: z.sml 3.10-3.20.
4331   Function applied to incorrect argument.
4332     expects: _ * [???] option
4333     but got: _ * [t] option
4334     in: := (r, SOME A)
4335     note: type would escape its scope: t
4336     escape from: z.sml 2.10-2.10
4337     escape to: z.sml 1.1-1.16
4338 Warning: z.sml 1.5-1.5.
4339   Type of variable was not inferred and could not be generalized: r.
4340     type: ??? option ref
4341     in: val r = ref NONE
4342 ----
4343
4344 While the above example is benign, the following example shows how to
4345 cast an integer to a function by (implicitly) using a type before it
4346 is defined.  In the example, the ref cell `r` is of type
4347 `t option ref`, where `t` is defined _after_ `r`, as a parameter to
4348 functor `F`.
4349
4350 [source,sml]
4351 ----
4352 val r = ref NONE
4353 functor F (type t
4354            val x: t) =
4355    struct
4356       val () = r := SOME x
4357       fun get () = valOf (!r)
4358    end
4359 structure S1 = F (type t = unit -> unit
4360                   val x = fn () => ())
4361 structure S2 = F (type t = int
4362                   val x = 13)
4363 val () = S1.get () ()
4364 ----
4365
4366 MLton reports the following error.
4367
4368 ----
4369 Warning: z.sml 1.5-1.5.
4370   Type of variable was not inferred and could not be generalized: r.
4371     type: ??? option ref
4372     in: val r = ref NONE
4373 Error: z.sml 5.16-5.26.
4374   Function applied to incorrect argument.
4375     expects: _ * [???] option
4376     but got: _ * [t] option
4377     in: := (r, SOME x)
4378     note: type would escape its scope: t
4379     escape from: z.sml 2.17-2.17
4380     escape to: z.sml 1.1-1.16
4381 Warning: z.sml 6.11-6.13.
4382   Type of variable was not inferred and could not be generalized: get.
4383     type: unit -> ???
4384     in: fun get () = (valOf (! r))
4385 Error: z.sml 12.10-12.18.
4386   Function not of arrow type.
4387     function: [unit]
4388     in: (S1.get ()) ()
4389 ----
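
Both programs become type correct once each type is defined before the ref cell that mentions it.  The following is a sketch of one way to restructure them; moving the ref cell into the functor body is a choice made for this example, and it changes the program so that each functor application gets its own cell.

[source,sml]
----
(* First example: declare the datatype before the ref cell. *)
datatype t = A | B
val r: t option ref = ref NONE
val () = r := SOME A

(* Second example: allocate the ref cell inside the functor, where
 * the parameter type t is in scope.  S1 and S2 no longer share a
 * cell, so an int can never be read back as a function. *)
functor F (type t
           val x: t) =
   struct
      val r: t option ref = ref NONE
      val () = r := SOME x
      fun get () = valOf (!r)
   end
structure S1 = F (type t = unit -> unit
                  val x = fn () => ())
structure S2 = F (type t = int
                  val x = 13)
val () = S1.get () ()
val n: int = S2.get ()
----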
4390
4391 <<<
4392
4393 :mlton-guide-page: DefinitionOfStandardML
4394 [[DefinitionOfStandardML]]
4395 DefinitionOfStandardML
4396 ======================
4397
4398 <!Cite(MilnerEtAl97, The Definition of Standard ML (Revised))> is a
4399 terse and formal specification of <:StandardML:Standard ML>'s syntax
4400 and semantics.  The language specified by this book is often referred
4401 to as SML 97. You can check its syntax
4402 http://www.mpi-sws.org/~rossberg/sml.html[grammar] online (thanks to
4403 Andreas Rossberg).
4404
4405 <!Cite(MilnerEtAl90, The Definition of Standard ML)> is an older
4406 version of the definition, published in 1990. The accompanying
4407 <!Cite(MilnerTofte91, Commentary)> introduces and explains the notation
4408 and approach. The same notation is used in the SML 97 definition, so it
4409 is worth keeping the older definition and its commentary at hand if you
4410 intend a close study of the definition.
4411
4412 <<<
4413
4414 :mlton-guide-page: Defunctorize
4415 [[Defunctorize]]
4416 Defunctorize
4417 ============
4418
4419 <:Defunctorize:> is a translation pass from the <:CoreML:>
4420 <:IntermediateLanguage:> to the <:XML:> <:IntermediateLanguage:>.
4421
4422 == Description ==
4423
4424 This pass converts a <:CoreML:> program to an <:XML:> program by
4425 performing:
4426
4427 * linearization
4428 * <:MatchCompile:>
4429 * polymorphic `val` dec expansion
4430 * `datatype` lifting (to the top-level)
4431
4432 == Implementation ==
4433
4434 * <!ViewGitFile(mlton,master,mlton/defunctorize/defunctorize.sig)>
4435 * <!ViewGitFile(mlton,master,mlton/defunctorize/defunctorize.fun)>
4436
4437 == Details and Notes ==
4438
4439 This pass is grossly misnamed and does not perform defunctorization.
4440
4441 === Datatype Lifting ===
4442
4443 This pass moves all `datatype` declarations to the top level.
4444
4445 <:StandardML:Standard ML> `datatype` declarations can contain type
4446 variables that are not bound in the declaration itself.  For example,
4447 the following program is valid.
4448 [source,sml]
4449 ----
4450 fun 'a f (x: 'a) =
4451    let
4452       datatype 'b t = T of 'a * 'b
4453       val y: int t = T (x, 1)
4454    in
4455       13
4456    end
4457 ----
4458
Unfortunately, the `datatype` declaration cannot be immediately moved
4460 to the top level, because that would leave `'a` free.
4461 [source,sml]
4462 ----
4463 datatype 'b t = T of 'a * 'b
4464 fun 'a f (x: 'a) =
4465    let
4466       val y: int t = T (x, 1)
4467    in
4468       13
4469    end
4470 ----
4471
4472 In order to safely move `datatype`s, this pass must close them, as
4473 well as add any free type variables as extra arguments to the type
4474 constructor.  For example, the above program would be translated to
4475 the following.
4476 [source,sml]
4477 ----
4478 datatype ('a, 'b) t = T of 'a * 'b
4479 fun 'a f (x: 'a) =
4480    let
      val y: ('a, int) t = T (x, 1)
4482    in
4483       13
4484    end
4485 ----
4486
4487 == Historical Notes ==
4488
4489 The <:Defunctorize:> pass originally eliminated
4490 <:StandardML:Standard ML> functors by duplicating their body at each
4491 application.  These duties have been adopted by the <:Elaborate:>
4492 pass.
4493
4494 <<<
4495
4496 :mlton-guide-page: Developers
4497 [[Developers]]
4498 Developers
4499 ==========
4500
4501 Here is a picture of the MLton team at a meeting in Chicago in August
4502 2003.  From left to right we have:
4503
4504 [align="center",frame="none",cols="^"]
4505 |=====
4506 |<:StephenWeeks:> -- <:MatthewFluet:> -- <:HenryCejtin:> -- <:SureshJagannathan:>
4507 |=====
4508
4509 image::Developers.attachments/team.jpg[align="center"]
4510
4511 Also see the <:Credits:> for a list of specific contributions.
4512
4513
4514 == Developers list ==
4515
4516 A number of people read the developers mailing list,
4517 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`], and make
4518 contributions there.  Here's a list of those who have a page here.
4519
4520 * <:AndreiFormiga:>
4521 * <:JesperLouisAndersen:>
4522 * <:JohnnyAndersen:>
4523 * <:MichaelNorrish:>
4524 * <:MikeThomas:>
4525 * <:RayRacine:>
4526 * <:WesleyTerpstra:>
4527 * <:VesaKarvonen:>
4528
4529 <<<
4530
4531 :mlton-guide-page: Development
4532 [[Development]]
4533 Development
4534 ===========
4535
4536 This page is the central point for MLton development.
4537
4538 * Access the <:Sources:>.
4539 * Check the current <!ViewGitFile(mlton,master,CHANGELOG.adoc)> or recent https://github.com/MLton/mlton/commits/master[commits].
4540 * Open https://github.com/MLton/mlton/issues[Issues].
4541 * Ideas for <:Projects:> to improve MLton.
4542 * <:Developers:> that are or have been involved in the project.
4543 // * Help maintain and improve the <:WebSite:>.
4544
4545 == Notes ==
4546
4547 * <:CompilerOverview:>
4548 * <:CompilingWithSMLNJ:>
4549 * <:CrossCompiling:>
4550 * <:License:>
4551 * <:NeedsReview:>
4552 * <:PortingMLton:>
4553 * <:ReleaseChecklist:>
4554 * <:SelfCompiling:>
4555
4556 <<<
4557
4558 :mlton-guide-page: Documentation
4559 [[Documentation]]
4560 Documentation
4561 =============
4562
4563 Documentation is available on the following topics.
4564
4565 * <:StandardML:Standard ML>
4566 ** <:BasisLibrary:Basis Library>
4567 ** <:Libraries: Additional libraries>
4568 * <:Installation:Installing MLton>
4569 * Using MLton
4570 ** <:ForeignFunctionInterface: Foreign function interface (FFI)>
** <:ManualPage: Manual page> (<:CompileTimeOptions:compile-time options>, <:RunTimeOptions:run-time options>)
4572 ** <:MLBasis: ML Basis system>
4573 ** <:MLtonStructure: MLton structure>
4574 ** <:PlatformSpecificNotes: Platform-specific notes>
4575 ** <:Profiling: Profiling>
4576 ** <:TypeChecking: Type checking>
4577 ** Help for porting from <:SMLNJ:SML/NJ> to MLton.
4578 * About MLton
4579 ** <:Credits:>
4580 ** <:Drawbacks:>
4581 ** <:Features:>
4582 ** <:History:>
4583 ** <:License:>
4584 ** <:Talk:>
4585 ** <:WishList:>
4586 * Tools
4587 ** <:MLLex:> (<!Attachment(Documentation,mllex.pdf)>)
4588 ** <:MLYacc:> (<!Attachment(Documentation,mlyacc.pdf)>)
** <:MLNLFFIGen:> (<!Attachment(Documentation,mlnlffigen.pdf)>)
4590 * <:References:>
4591
4592 <<<
4593
4594 :mlton-guide-page: Drawbacks
4595 [[Drawbacks]]
4596 Drawbacks
4597 =========
4598
4599 MLton has several drawbacks due to its use of whole-program
4600 compilation.
4601
4602 * Large compile-time memory requirement.
4603 +
4604 Because MLton performs whole-program analysis and optimization,
4605 compilation requires a large amount of memory.  For example, compiling
4606 MLton (over 140K lines) requires at least 512M RAM.
4607
4608 * Long compile times.
4609 +
4610 Whole-program compilation can take a long time.  For example,
4611 compiling MLton (over 140K lines) on a 1.6GHz machine takes five to
4612 ten minutes.
4613
4614 * No interactive top level.
4615 +
4616 Because of whole-program compilation, MLton does not provide an
4617 interactive top level.  In particular, it does not implement the
4618 optional <:BasisLibrary:Basis Library> function `use`.
4619
4620 <<<
4621
4622 :mlton-guide-page: Eclipse
4623 [[Eclipse]]
4624 Eclipse
4625 =======
4626
4627 http://eclipse.org/[Eclipse] is an open, extensible IDE.
4628
4629 http://www.cse.iitd.ernet.in/%7Ecsu02132/mldev/[ML-Dev] is a plug-in
4630 for Eclipse, based on <:SMLNJ:SML/NJ>.
4631
4632 There has been some talk on the MLton mailing list about adding
4633 support to Eclipse for MLton/SML, and in particular, using
4634 http://eclipsefp.sourceforge.net/.  We are unaware of any progress
4635 along those lines.
4636
4637 <<<
4638
4639 :mlton-guide-page: Elaborate
4640 [[Elaborate]]
4641 Elaborate
4642 =========
4643
4644 <:Elaborate:> is a translation pass from the <:AST:>
4645 <:IntermediateLanguage:> to the <:CoreML:> <:IntermediateLanguage:>.
4646
4647 == Description ==
4648
4649 This pass performs type inference and type checking according to the
4650 <:DefinitionOfStandardML:Definition>.  It also defunctorizes the
4651 program, eliminating all module-level constructs.
4652
4653 == Implementation ==
4654
4655 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate.sig)>
4656 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate.fun)>
4657 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-env.sig)>
4658 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-env.fun)>
4659 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-modules.sig)>
4660 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-modules.fun)>
4661 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-core.sig)>
4662 * <!ViewGitFile(mlton,master,mlton/elaborate/elaborate-core.fun)>
4663 * <!ViewGitDir(mlton,master,mlton/elaborate)>
4664
4665 == Details and Notes ==
4666
4667 At the modules level, the <:Elaborate:> pass:
4668
4669 * elaborates signatures with interfaces (see
4670 <!ViewGitFile(mlton,master,mlton/elaborate/interface.sig)> and
4671 <!ViewGitFile(mlton,master,mlton/elaborate/interface.fun)>)
4672 +
4673 The main trick is to use disjoint sets to efficiently handle sharing
4674 of tycons and of structures and then to copy signatures as dags rather
4675 than as trees.
4676
4677 * checks functors at the point of definition, using functor summaries
4678 to speed up checking of functor applications.
4679 +
4680 When a functor is first type checked, we keep track of the dummy
4681 argument structure and the dummy result structure, as well as all the
4682 tycons that were created while elaborating the body.  Then, if we
later need to type check an application of the functor (as opposed to
defunctorizing an application), we pair up tycons in the dummy argument
4685 structure with the actual argument structure and then replace the
4686 dummy tycons with the actual tycons in the dummy result structure,
4687 yielding the actual result structure.  We also generate new tycons for
4688 all the tycons that we created while originally elaborating the body.
4689
4690 * handles opaque signature constraints.
4691 +
4692 This is implemented by building a dummy structure realized from the
4693 signature, just as we would for a functor argument when type checking
4694 a functor.  The dummy structure contains exactly the type information
4695 that is in the signature, which is what opacity requires.  We then
4696 replace the variables (and constructors) in the dummy structure with
4697 the corresponding variables (and constructors) from the actual
4698 structure so that the translation to <:CoreML:> uses the right stuff.
4699 For each tycon in the dummy structure, we keep track of the
4700 corresponding type structure in the actual structure.  This is used
4701 when producing the <:CoreML:> types (see `expandOpaque` in
4702 <!ViewGitFile(mlton,master,mlton/elaborate/type-env.sig)> and
4703 <!ViewGitFile(mlton,master,mlton/elaborate/type-env.fun)>).
4704 +
4705 Then, within each `structure` or `functor` body, for each declaration
4706 (`<dec>` in the <:StandardML:Standard ML> grammar), the <:Elaborate:>
4707 pass does three steps:
4708 +
4709 --
4710 1. <:ScopeInference:>
4711 2. {empty}
4712 ** <:PrecedenceParse:>
4713 ** `_{ex,im}port` expansion
4714 ** profiling insertion
4715 ** unification
4716 3. Overloaded {constant, function, record pattern} resolution
4717 --
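
As an illustration of what an opaque signature constraint means for a user program, consider the following sketch.  The signature `COUNTER` and structure `Counter` are hypothetical names introduced for this example; they do not come from the MLton sources.

[source,sml]
----
signature COUNTER =
   sig
      type t
      val zero: t
      val next: t -> t
      val toInt: t -> int
   end

(* :> makes the constraint opaque: outside the structure, Counter.t
 * carries only the type information that appears in COUNTER. *)
structure Counter :> COUNTER =
   struct
      type t = int
      val zero = 0
      fun next n = n + 1
      fun toInt n = n
   end

val two = Counter.toInt (Counter.next (Counter.next Counter.zero))

(* Rejected by the type checker, because Counter.t is not known to
 * be int outside the structure:
 *
 *    val bad = Counter.zero + 1
 *)
----

Type checking sees `Counter.t` as abstract, while the translation to <:CoreML:> still uses the actual variables and the underlying type, as described above.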
4718
4719 === Defunctorization ===
4720
4721 The <:Elaborate:> pass performs a number of duties historically
4722 assigned to the <:Defunctorize:> pass.
4723
4724 As part of the <:Elaborate:> pass, all module level constructs
4725 (`open`, `signature`, `structure`, `functor`, long identifiers) are
4726 removed.  This works because the <:Elaborate:> pass assigns a unique
4727 name to every type and variable in the program.  This also allows the
4728 <:Elaborate:> pass to eliminate `local` declarations, which are purely
4729 for namespace management.
4730
4731
4732 == Examples ==
4733
4734 Here are a number of examples of elaboration.
4735
4736 * All variables bound in `val` declarations are renamed.
4737 +
4738 [source,sml]
4739 ----
4740 val x = 13
4741 val y = x
4742 ----
4743 +
4744 ----
4745 val x_0 = 13
4746 val y_0 = x_0
4747 ----
4748
4749 * All variables in `fun` declarations are renamed.
4750 +
4751 [source,sml]
4752 ----
4753 fun f x = g x
4754 and g y = f y
4755 ----
4756 +
4757 ----
4758 fun f_0 x_0 = g_0 x_0
4759 and g_0 y_0 = f_0 y_0
4760 ----
4761
4762 * Type abbreviations are removed, and the abbreviation is expanded
4763 wherever it is used.
4764 +
4765 [source,sml]
4766 ----
4767 type 'a u = int * 'a
4768 type 'b t = 'b u * real
4769 fun f (x : bool t) = x
4770 ----
4771 +
4772 ----
4773 fun f_0 (x_0 : (int * bool) * real) = x_0
4774 ----
4775
4776 * Exception declarations create a new constructor and rename the type.
4777 +
4778 [source,sml]
4779 ----
4780 type t = int
4781 exception E of t * real
4782 ----
4783 +
4784 ----
4785 exception E_0 of int * real
4786 ----
4787
4788 * The type and value constructors in datatype declarations are renamed.
4789 +
4790 [source,sml]
4791 ----
4792 datatype t = A of int | B of real * t
4793 ----
4794 +
4795 ----
4796 datatype t_0 = A_0 of int | B_0 of real * t_0
4797 ----
4798
* Local declarations are moved to the top level.  The environment
4800 keeps track of the variables in scope.
4801 +
4802 [source,sml]
4803 ----
4804 val x = 13
4805 local val x = 14
4806 in val y = x
4807 end
4808 val z = x
4809 ----
4810 +
4811 ----
4812 val x_0 = 13
4813 val x_1 = 14
4814 val y_0 = x_1
4815 val z_0 = x_0
4816 ----
4817
4818 * Structure declarations are eliminated, with all declarations moved
4819 to the top level.  Long identifiers are renamed.
4820 +
4821 [source,sml]
4822 ----
4823 structure S =
4824    struct
4825       type t = int
4826       val x : t = 13
4827    end
4828 val y : S.t = S.x
4829 ----
4830 +
4831 ----
4832 val x_0 : int = 13
4833 val y_0 : int = x_0
4834 ----
4835
4836 * Open declarations are eliminated.
4837 +
4838 [source,sml]
4839 ----
4840 val x = 13
4841 val y = 14
4842 structure S =
4843    struct
4844      val x = 15
4845    end
4846 open S
4847 val z = x + y
4848 ----
4849 +
4850 ----
4851 val x_0 = 13
4852 val y_0 = 14
4853 val x_1 = 15
4854 val z_0 = x_1 + y_0
4855 ----
4856
4857 * Functor declarations are eliminated, and the body of a functor is
4858 duplicated wherever the functor is applied.
4859 +
4860 [source,sml]
4861 ----
4862 functor F(val x : int) =
4863    struct
4864      val y = x
4865    end
4866 structure F1 = F(val x = 13)
4867 structure F2 = F(val x = 14)
4868 val z = F1.y + F2.y
4869 ----
4870 +
4871 ----
4872 val x_0 = 13
4873 val y_0 = x_0
4874 val x_1 = 14
4875 val y_1 = x_1
4876 val z_0 = y_0 + y_1
4877 ----
4878
4879 * Signature constraints are eliminated.  Note that signatures do
4880 affect how subsequent variables are renamed.
4881 +
4882 [source,sml]
4883 ----
4884 val y = 13
4885 structure S : sig
4886                  val x : int
4887               end =
4888    struct
4889       val x = 14
4890       val y = x
4891    end
4892 open S
4893 val z = x + y
4894 ----
4895 +
4896 ----
4897 val y_0 = 13
4898 val x_0 = 14
4899 val y_1 = x_0
4900 val z_0 = x_0 + y_0
4901 ----
4902
4903 <<<
4904
4905 :mlton-guide-page: Emacs
4906 [[Emacs]]
4907 Emacs
4908 =====
4909
4910 == SML modes ==
4911
4912 There are a few Emacs modes for SML.
4913
4914 * `sml-mode`
4915 ** http://www.xemacs.org/Documentation/packages/html/sml-mode_3.html
4916 ** http://www.smlnj.org/doc/Emacs/sml-mode.html
4917 ** http://www.iro.umontreal.ca/%7Emonnier/elisp/
4918
4919 * <!ViewGitFile(mlton,master,ide/emacs/mlton.el)> contains the Emacs lisp that <:StephenWeeks:> uses to interact with MLton (in addition to using `sml-mode`).
4920
4921 * http://primate.net/%7Eitz/mindent.tar, developed by Ian Zimmerman, who writes:
4922 +
4923 _____
4924 Unlike the widespread `sml-mode.el` it doesn't try to indent code
4925 based on ML syntax.  I gradually got skeptical about this approach
4926 after writing the initial indentation support for caml mode and
4927 watching it bloat insanely as the language added new features.  Also,
4928 any such attempts that I know of impose a particular coding style, or
4929 at best a choice among a limited set of styles, which I now oppose.
4930 Instead my mode is based on a generic package which provides manual
4931 bindable commands for common indentation operations (example: indent
4932 the current line under the n-th occurrence of a particular character
4933 in the previous non-blank line).
4934 _____
4935
4936 == MLB modes ==
4937
4938 There is a mode for editing <:MLBasis: ML Basis> files.
4939
4940 * <!ViewGitFile(mlton,master,ide/emacs/esml-mlb-mode.el)> (plus other files)
4941
4942 == Definitions and uses ==
4943
4944 There is a mode that supports the precise def-use information that
4945 MLton can output.  It highlights definitions and uses and provides
4946 commands for navigation (e.g., `jump-to-def`, `jump-to-next`,
4947 `list-all-refs`).  It can be handy, for example, for navigating in the
4948 MLton compiler source code.  See <:EmacsDefUseMode:> for further
4949 information.
4950
== Building in the background ==
4952
4953 Tired of manually starting/stopping/restarting builds after editing
4954 files?  Now you don't have to.  See <:EmacsBgBuildMode:> for further
4955 information.
4956
4957 == Error messages ==
4958
4959 MLton's error messages are not among those that the Emacs `next-error`
4960 parser natively understands.  The easiest way to fix this is to add
4961 the following to your `.emacs` to teach Emacs to recognize MLton's
4962 error messages.
4963
4964 [source,cl]
4965 ----
4966 (require 'compile)
4967 (add-to-list 'compilation-error-regexp-alist 'mlton)
4968 (add-to-list 'compilation-error-regexp-alist-alist
4969              '(mlton
4970                "^[[:space:]]*\\(\\(?:\\(Error\\)\\|\\(Warning\\)\\|\\(\\(?:\\(?:defn\\|spec\\) at\\)\\|\\(?:escape \\(?:from\\|to\\)\\)\\|\\(?:scoped at\\)\\)\\): \\(.+\\) \\([0-9]+\\)\\.\\([0-9]+\\)\\(?:-\\([0-9]+\\)\\.\\([0-9]+\\)\\)?\\.?\\)$"
4971                5 (6 . 8) (7 . 9) (3 . 4) 1))
4972 ----
4973
4974 <<<
4975
4976 :mlton-guide-page: EmacsBgBuildMode
4977 [[EmacsBgBuildMode]]
4978 EmacsBgBuildMode
4979 ================
4980
Do you really want to think about starting a build of your project?
What if you had a personal assistant that would restart a build of your
project whenever you save any file belonging to that project?  The
bg-build mode does just that.  When you save a file, a compile is
started (silently!) and you can continue working without even thinking
about starting a build.  If there are errors, you are notified
(with a message) and can then jump to them.
4988
4989 This mode is not specific to MLton per se, but is particularly useful
4990 for working with MLton due to the longer compile times.  By the time
4991 you start wondering about possible errors, the build is already on the
4992 way.
4993
4994 == Functionality and Features ==
4995
4996 * Each time a file is saved, and after a user configurable delay
4997 period has been exhausted, a build is started silently in the
4998 background.
4999 * When the build is finished, a status indicator (message) is
5000 displayed non-intrusively.
5001 * At any time, you can switch to a build process buffer where all the
5002 messages from the build are shown.
5003 * Optionally highlights (error/warning) message locations in (source
5004 code) buffers after a finished build.
5005 * After a build has finished, you can jump to locations of warnings
5006 and errors from the build process buffer or by using the `first-error`
5007 and `next-error` commands.
5008 * When a build fails, bg-build mode can optionally execute a user
5009 specified command.  By default, bg-build mode executes `first-error`.
5010 * When starting a build of a particular project, a possible previous
5011 live build of the same project is interrupted first.
5012 * A project configuration file specifies the commands required to
5013 build a project.
5014 * Multiple projects can be loaded into bg-build mode and bg-build mode
5015 can build a given maximum number of projects concurrently.
5016 * Supports both http://www.gnu.org/software/emacs/[Gnu Emacs] and
5017 http://www.xemacs.org[XEmacs].
5018
5019
5020 == Download ==
5021
5022 There is no package for the mode at the moment.  To install the mode you
5023 need to fetch the Emacs Lisp, `*.el`, files from the MLton repository:
5024 <!ViewGitDir(mlton,master,ide/emacs)>.
5025
5026
5027 == Setup ==
5028
5029 The easiest way to load the mode is to first tell Emacs where to find the
5030 files.  For example, add
5031
5032 [source,cl]
5033 ----
5034 (add-to-list 'load-path (file-truename "path-to-the-el-files"))
5035 ----
5036
5037 to your `~/.emacs` or `~/.xemacs/init.el`.  You'll probably also want
5038 to start the mode automatically by adding
5039
5040 [source,cl]
5041 ----
5042 (require 'bg-build-mode)
5043 (bg-build-mode)
5044 ----
5045
5046 to your Emacs init file.  Once the mode is activated, you should see
5047 the `BGB` indicator on the mode line.
5048
5049
5050 === MLton and Compilation-Mode ===
5051
At the time of writing, neither Gnu Emacs nor XEmacs contains an error
regexp that matches MLton's messages.
5054
5055 If you use Gnu Emacs, insert the following code into your `.emacs` file:
5056
5057 [source,cl]
5058 ----
5059 (require 'compile)
5060 (add-to-list
5061  'compilation-error-regexp-alist
5062  '("^\\(Warning\\|Error\\): \\(.+\\) \\([0-9]+\\)\\.\\([0-9]+\\)\\.$"
5063    2 3 4))
5064 ----
5065
5066 If you use XEmacs, insert the following code into your `init.el` file:
5067
5068 [source,cl]
5069 ----
5070 (require 'compile)
5071 (add-to-list
5072  'compilation-error-regexp-alist-alist
5073  '(mlton
5074    ("^\\(Warning\\|Error\\): \\(.+\\) \\([0-9]+\\)\\.\\([0-9]+\\)\\.$"
5075     2 3 4)))
5076 (compilation-build-compilation-error-regexp-alist)
5077 ----
5078
5079 == Usage ==
5080
5081 Typically projects are built (or compiled) using a tool like http://www.gnu.org/software/make/[`make`],
5082 but the details vary.  The bg-build mode needs a project configuration file to
5083 know how to build your project.  A project configuration file basically contains
5084 an Emacs Lisp expression calling a function named `bg-build` that returns a
5085 project object.  A simple example of a project configuration file would be the
5086 (<!ViewGitFile(mltonlib,master,com/ssh/async/unstable/example/smlbot/Build.bgb)>)
5087 file used with smlbot:
5088
5089 [source,cl]
5090 ----
5091 sys::[./bin/InclGitFile.py mltonlib master com/ssh/async/unstable/example/smlbot/Build.bgb 5:]
5092 ----
5093
5094 The `bg-build` function takes a number of keyword arguments:
5095
5096 * `:name` specifies the name of the project.  This can be any
5097 expression that evaluates to a string or to a nullary function that
5098 returns a string.
5099
5100 * `:shell` specifies a shell command to execute.  This can be any
5101 expression that evaluates to a string, a list of strings, or to a
5102 nullary function returning a list of strings.
5103
5104 * `:build?` specifies a predicate to determine whether the project
5105 should be built after some files have been modified.  The predicate is
5106 given a list of filenames and should return a non-nil value when the
5107 project should be built and nil otherwise.
5108
5109 All of the keyword arguments, except `:shell`, are optional and can be left out.
5110
Note the use of the `nice` command above.  It means that the background
build process is given a lower priority by the system process
scheduler.  Assuming your machine has enough memory, using `nice`
5114 ensures that your computer remains responsive.  (You probably won't
5115 even notice when a build is started.)
5116
Once you have written a project file for bg-build mode, use the
`bg-build-add-project` command to load it.  The bg-build mode can
also optionally load recent project files automatically at startup.
5121
5122 After the project file has been loaded and bg-build mode activated,
5123 each time you save a file in Emacs, the bg-build mode tries to build
5124 your project.
5125
5126 The `bg-build-status` command creates a buffer that displays some
5127 status information on builds and allows you to manage projects (start
5128 builds explicitly, remove a project from bg-build, ...) as well as
5129 visit buffers created by bg-build.  Notice the count of started
5130 builds.  At the end of the day it can be in the hundreds or thousands.
5131 Imagine the number of times you've been relieved of starting a build
5132 explicitly!
5133
5134 <<<
5135
5136 :mlton-guide-page: EmacsDefUseMode
5137 [[EmacsDefUseMode]]
5138 EmacsDefUseMode
5139 ===============
5140
5141 MLton provides an <:CompileTimeOptions:option>,
5142 ++-show-def-use __file__++, to output precise (giving exact source
5143 locations) and accurate (including all uses and no false data)
5144 whole-program def-use information to a file.  Unlike typical tags
5145 facilities, the information includes local variables and distinguishes
5146 between different definitions even when they have the same name.  The
5147 def-use Emacs mode uses the information to provide navigation support,
5148 which can be particularly useful while reading SML programs compiled
5149 with MLton (such as the MLton compiler itself).
5150
5151
5152 == Screen Capture ==
5153
5154 Note the highlighting and the type displayed in the minibuffer.
5155
5156 image::EmacsDefUseMode.attachments/def-use-capture.png[align="center"]
5157
5158
5159 == Features ==
5160
5161 * Highlights definitions and uses.  Different colors for definitions, unused definitions, and uses.
5162 * Shows types (with highlighting) of variable definitions in the minibuffer.
5163 * Navigation: `jump-to-def`, `jump-to-next`, and `jump-to-prev`.  These work precisely (no searching involved).
5164 * Can list, visit and mark all references to a definition (within a program).
5165 * Automatically reloads updated def-use files.
5166 * Automatically loads previously used def-use files at startup.
5167 * Supports both http://www.gnu.org/software/emacs/[Gnu Emacs] and http://www.xemacs.org[XEmacs].
5168
5169
5170 == Download ==
5171
5172 There is no separate package for the def-use mode although the mode
5173 has been relatively stable for some time already.  To install the mode
5174 you need to get the Emacs Lisp, `*.el`, files from MLton's repository:
5175 <!ViewGitDir(mlton,master,ide/emacs)>.  The easiest way to get the files
5176 is to use <:Git:> to access MLton's <:Sources:sources>.
5177
5178 /////
5179 If you only want the Emacs lisp files, you can use the following
5180 command:
5181 ----
5182 svn co svn://mlton.org/mlton/trunk/ide/emacs mlton-emacs-ide
5183 ----
5184 /////
5185
5186 == Setup ==
5187
5188 The easiest way to load def-use mode is to first tell Emacs where to
5189 find the files.  For example, add
5190
5191 [source,cl]
5192 ----
5193 (add-to-list 'load-path (file-truename "path-to-the-el-files"))
5194 ----
5195
5196 to your `~/.emacs` or `~/.xemacs/init.el`.  You'll probably
5197 also want to start `def-use-mode` automatically by adding
5198
5199 [source,cl]
5200 ----
5201 (require 'esml-du-mlton)
5202 (def-use-mode)
5203 ----
5204
5205 to your Emacs init file.  Once the def-use mode is activated, you
5206 should see the `DU` indicator on the mode line.
5207
5208 == Usage ==
5209
5210 To use def-use mode one typically first sets up the program's makefile
5211 or build script so that the def-use information is saved each time the
5212 program is compiled.  In addition to the ++-show-def-use __file__++
5213 option, the ++-prefer-abs-paths true++ expert option is required.
5214 Note that the time it takes to save the information is small (compared
5215 to type-checking), so it is recommended to simply add the options to
5216 the MLton invocation that compiles the program.  However, it is only
5217 necessary to type check the program (or library), so one can specify
the ++-stop tc++ option.  For example, if you have a program
5219 defined by an MLB file named `my-prg.mlb`, you can save the def-use
5220 information to the file `my-prg.du` by invoking MLton as:
5221
5222 ----
5223 mlton -prefer-abs-paths true -show-def-use my-prg.du -stop tc my-prg.mlb
5224 ----
5225
5226 Finally, one needs to tell the mode where to find the def-use
5227 information.  This is done with the `esml-du-mlton` command.  For
5228 example, to load the `my-prg.du` file, one would type:
5229
5230 ----
5231 M-x esml-du-mlton my-prg.du
5232 ----
5233
5234 After doing all of the above, find an SML file covered by the
5235 previously saved and loaded def-use information, and place the cursor
5236 at some variable (definition or use, it doesn't matter).  You should
5237 see the variable being highlighted.  (Note that specifications in
5238 signatures do not define variables.)
5239
You might also want to set up and use the
5241 <:EmacsBgBuildMode:Bg-Build mode> to start builds automatically.
5242
5243
5244 == Types ==
5245
5246 `-show-def-use` output was extended to include types of variable
5247 definitions in revision <!ViewSVNRev(6333)>.  To get good type names, the
5248 types must be in scope at the end of the program.  If you are using the
5249 <:MLBasis:ML Basis> system, this means that the root MLB-file for your
5250 application should not wrap the libraries used in the application inside
5251 `local ... in ... end`, because that would remove them from the scope before
5252 the end of the program.
5253
5254 <<<
5255
5256 :mlton-guide-page: Enscript
5257 [[Enscript]]
5258 Enscript
5259 ========
5260
5261 http://www.gnu.org/s/enscript/[GNU Enscript] converts ASCII files to
5262 PostScript, HTML, and other output languages, applying language
5263 sensitive highlighting (similar to <:Emacs:>'s font lock mode).  Here
5264 are a few _states_ files for highlighting <:StandardML: Standard ML>.
5265
5266 * <!ViewGitFile(mlton,master,ide/enscript/sml_simple.st)> -- Provides highlighting of keywords, string and character constants, and (nested) comments.
5267 /////
5268 +
5269 [source,sml]
5270 ----
5271 (* Comments (* can be nested *) *)
5272 structure S = struct
5273   val x = (1, 2, "three")
5274 end
5275 ----
5276 /////
5277
5278 * <!ViewGitFile(mlton,master,ide/enscript/sml_verbose.st)> -- Supersedes
5279 the above, adding highlighting of numeric constants.  Due to the
5280 limited parsing available, numeric record labels are highlighted as
5281 numeric constants, in all contexts.  Likewise, a binding precedence
5282 separated from `infix` or `infixr` by a newline is highlighted as a
5283 numeric constant and a numeric record label selector separated from
5284 `#` by a newline is highlighted as a numeric constant.
5285 /////
5286 +
5287 [source,sml]
5288 ----
5289 structure S = struct
5290   (* These look good *)
5291   val x = (1, 2, "three")
5292   val z = #2 x
5293
5294   (* Although these look bad (not all the numbers are constants),       *
5295    * they never occur in practice, as they are equivalent to the above. *)
5296   val x = {1 = 1, 3 = "three", 2 = 2}
5297   val z = #
5298             2 x
5299 end
5300 ----
5301 /////
5302
5303 * <!ViewGitFile(mlton,master,ide/enscript/sml_fancy.st)> -- Supersedes the
5304 above, adding highlighting of type and constructor bindings,
5305 highlighting of explicit binding of type variables at `val` and `fun`
5306 declarations, and separate highlighting of core and modules level
5307 keywords.  Due to the limited parsing available, it is assumed that
5308 the input is a syntactically correct, top-level declaration.
5309 /////
5310 +
5311 [source,sml]
5312 ----
5313 structure S = struct
5314   val x = (1, 2, "three")
5315   datatype 'a t = T of 'a
5316        and u = U of v * v
5317   withtype v = {left: int t, right: int t}
5318   exception E1 of int and E2
5319   fun 'a id (x: 'a) : 'a = x
5320
5321   (* Although this looks bad (the explicitly bound type variable 'a is *
5322    * not highlighted), it is unlikely to occur in practice.            *)
5323   val
5324       'a id = fn (x : 'a) => x
5325 end
5326 ----
5327 /////
5328
5329 * <!ViewGitFile(mlton,master,ide/enscript/sml_gaudy.st)> -- Supersedes the
5330 above, adding highlighting of type annotations, in both expressions
5331 and signatures.  Due to the limited parsing available, it is assumed
5332 that the input is a syntactically correct, top-level declaration.
5333 /////
5334 +
5335 [source,sml]
5336 ----
5337 signature S = sig
5338   type t
5339   val x : t
5340   val f : t * int -> int
5341 end
5342 structure S : S = struct
5343   datatype t = T of int
5344   val x : t = T 0
  fun f (T x, i : int) : int = x + i
5346   fun 'a id (x: 'a) : 'a = x
5347 end
5348 ----
5349 /////
5350
5351 == Install and use ==
5352
5353 * Version 1.6.3 of http://people.ssh.com/mtr/genscript[GNU Enscript]
5354 ** Copy all files to `/usr/share/enscript/hl/` or `.enscript/` in your home directory.
5355 ** Invoke `enscript` with `--highlight=sml_simple` (or `--highlight=sml_verbose` or `--highlight=sml_fancy` or `--highlight=sml_gaudy`).
5356
5357 * Version 1.6.1 of http://people.ssh.com/mtr/genscript[GNU Enscript]
5358 ** Append <!ViewGitFile(mlton,master,ide/enscript/sml_all.st)> to `/usr/share/enscript/enscript.st`
5359 ** Invoke `enscript` with `--pretty-print=sml_simple` (or `--pretty-print=sml_verbose` or `--pretty-print=sml_fancy` or `--pretty-print=sml_gaudy`).
5360
5361 == Feedback ==
5362
5363 Comments and suggestions should be directed to <:MatthewFluet:>.
5364
5365 <<<
5366
5367 :mlton-guide-page: EqualityType
5368 [[EqualityType]]
5369 EqualityType
5370 ============
5371
5372 An equality type is a type to which <:PolymorphicEquality:> can be
5373 applied.  The <:DefinitionOfStandardML:Definition> and the
5374 <:BasisLibrary:Basis Library> precisely spell out which types are
5375 equality types.
5376
5377 * `bool`, `char`, `IntInf.int`, ++Int__<N>__.int++, `string`, and ++Word__<N>__.word++ are equality types.
5378
* For any `t`, both `t array` and `t ref` are equality types.

* If `t` is an equality type, then `t list` and `t vector` are equality types.

* If `t1`, ..., `tn` are equality types, then `t1 * ... * tn` and `{l1: t1, ..., ln: tn}` are equality types.

* If `t1`, ..., `tn` are equality types and `t` <:AdmitsEquality:>, then `(t1, ..., tn) t` is an equality type.
5386
To check that a type `t` is an equality type, use the following idiom.

5388 [source,sml]
5389 ----
5390 structure S: sig eqtype t end =
5391    struct
5392       type t = ...
5393    end
5394 ----
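
As a concrete instance of the idiom, here is a minimal sketch with the
`...` filled in.  The structure names `CheckOk` and `CheckBad` are
hypothetical; the second check is left in a comment because it is
expected to be rejected by the type checker.

[source,sml]
----
(* Accepted: int, string, and lists of an equality type are equality types. *)
structure CheckOk: sig eqtype t end =
   struct
      type t = int * string list
   end

(* Rejected if uncommented, because real is not an equality type.
structure CheckBad: sig eqtype t end =
   struct
      type t = int * real
   end
*)
----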
5395
5396 Notably, `exn` and `real` are not equality types.  Neither is `t1 -> t2`, for any `t1` and `t2`.
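
When values of these types need to be compared, an explicit comparison
has to be chosen instead.  The following is a small sketch, using
`Real.==` from the Basis Library for reals and pattern matching for
exceptions; the names `sameReal` and `isDiv` are illustrative only.

[source,sml]
----
(* IEEE equality on reals; `0.5 + 0.5 = 1.0` would not type check. *)
val sameReal = Real.== (0.5 + 0.5, 1.0)

(* Exceptions can be distinguished by pattern matching. *)
fun isDiv (e: exn): bool =
   case e of
      Div => true
    | _ => false
----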
5397
5398 Equality on arrays and ref cells is by identity, not structure.
5399 For example, `ref 13 = ref 13` is `false`.
5400 On the other hand, equality for lists, strings, and vectors is by
5401 structure, not identity.  For example, the following equalities hold.
5402
5403 [source,sml]
5404 ----
5405 val _ = [1, 2, 3] = 1 :: [2, 3]
5406 val _ = "foo" = concat ["f", "o", "o"]
5407 val _ = Vector.fromList [1, 2, 3] = Vector.tabulate (3, fn i => i + 1)
5408 ----
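
For contrast, the identity-based equality on ref cells and arrays can
be sketched as follows.  The variable names are illustrative;
`Array.vector` is the Basis Library function that copies an array into
a vector.

[source,sml]
----
val r = ref 13
val sameCell = r = r                (* true: the same cell *)
val otherCell = r = ref 13          (* false: equal contents, distinct cells *)

val a = Array.fromList [1, 2, 3]
val b = Array.fromList [1, 2, 3]
val sameArray = a = b               (* false: distinct arrays *)
val sameContents =                  (* true: vectors compare by structure *)
   Array.vector a = Array.vector b

(* `real array` is an equality type even though `real` is not. *)
val realArray = Array.fromList [1.0, 2.0]
val sameRealArray = realArray = realArray   (* true *)
----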
5409
5410 <<<
5411
5412 :mlton-guide-page: EqualityTypeVariable
5413 [[EqualityTypeVariable]]
5414 EqualityTypeVariable
5415 ====================
5416
5417 An equality type variable is a type variable that starts with two or
5418 more primes, as in `''a` or `''b`.  The canonical use of equality type
5419 variables is in specifying the type of the <:PolymorphicEquality:>
5420 function, which is `''a * ''a -> bool`.  Equality type variables
5421 ensure that polymorphic equality is only used on
5422 <:EqualityType:equality types>, by requiring that at every use of a
5423 polymorphic value, equality type variables are instantiated by
5424 equality types.
5425
5426 For example, the following program is type correct because polymorphic
5427 equality is applied to variables of type `''a`.
5428
5429 [source,sml]
5430 ----
5431 fun f (x: ''a, y: ''a): bool = x = y
5432 ----
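
A slightly larger sketch of the same idea is a list membership test.
The function `member` is a hypothetical example, not part of the Basis
Library; at each use, `''a` is instantiated by an equality type.

[source,sml]
----
(* member (x, ys) is true when some element of ys equals x. *)
fun member (x: ''a, ys: ''a list): bool =
   List.exists (fn y => x = y) ys

val hasTwo = member (2, [1, 2, 3])         (* ''a instantiated to int *)
val hasD = member ("d", ["a", "b", "c"])   (* ''a instantiated to string *)
----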
5433
5434 On the other hand, the following program is not type correct, because
5435 polymorphic equality is applied to variables of type `'a`, which is
5436 not an equality type.
5437
5438 [source,sml]
5439 ----
5440 fun f (x: 'a, y: 'a): bool = x = y
5441 ----
5442
5443 MLton reports the following error, indicating that polymorphic
5444 equality expects equality types, but didn't get them.
5445
5446 ----
5447 Error: z.sml 1.30-1.34.
5448   Function applied to incorrect argument.
5449     expects: [<equality>] * [<equality>]
5450     but got: ['a] * ['a]
5451     in: = (x, y)
5452 ----
5453
5454 As an example of using such a function that requires equality types,
5455 suppose that `f` has polymorphic type `''a -> unit`.  Then, `f 13` is
5456 type correct because `int` is an equality type.  On the other hand,
5457 `f 13.0` and `f (fn x => x)` are not type correct, because `real` and
5458 arrow types are not equality types.  We can test these facts with the
5459 following short programs.  First, we verify that such an `f` can be
5460 applied to integers.
5461
5462 [source,sml]
5463 ----
5464 functor Ok (val f: ''a -> unit): sig end =
5465    struct
5466       val () = f 13
5467       val () = f 14
5468    end
5469 ----
5470
5471 We can do better, and verify that such an `f` can be applied to
5472 any integer.
5473
5474 [source,sml]
5475 ----
5476 functor Ok (val f: ''a -> unit): sig end =
5477    struct
5478       fun g (x: int) = f x
5479    end
5480 ----
5481
5482 Even better, we don't need to introduce a dummy function name; we can
5483 use a type constraint.
5484
5485 [source,sml]
5486 ----
5487 functor Ok (val f: ''a -> unit): sig end =
5488    struct
5489       val _ = f: int -> unit
5490    end
5491 ----
5492
5493 Even better, we can use a signature constraint.
5494
5495 [source,sml]
5496 ----
5497 functor Ok (S: sig val f: ''a -> unit end):
5498    sig val f: int -> unit end = S
5499 ----
5500
5501 This functor concisely verifies that a function of polymorphic type
5502 `''a -> unit` can be safely used as a function of type `int -> unit`.
5503
As above, we can verify that such an `f` cannot be used at
5505 non-equality types.
5506
5507 [source,sml]
5508 ----
5509 functor Bad (S: sig val f: ''a -> unit end):
5510    sig val f: real -> unit end = S
5511
5512 functor Bad (S: sig val f: ''a -> unit end):
5513    sig val f: ('a -> 'a) -> unit end = S
5514 ----
5515
5516 MLton reports the following errors.
5517
5518 ----
5519 Error: z.sml 2.4-2.30.
5520   Variable in structure disagrees with signature (type): f.
5521     structure: val f: [<equality>] -> _
5522     defn at: z.sml 1.25-1.25
5523     signature: val f: [real] -> _
5524     spec at: z.sml 2.12-2.12
5525 Error: z.sml 5.4-5.36.
5526   Variable in structure disagrees with signature (type): f.
5527     structure: val f: [<equality>] -> _
5528     defn at: z.sml 4.25-4.25
5529     signature: val f: [_ -> _] -> _
5530     spec at: z.sml 5.12-5.12
5531 ----
5532
5533
5534 == Equality type variables in type and datatype declarations ==
5535
5536 Equality type variables can be used in type and datatype declarations;
5537 however they play no special role.  For example,
5538
5539 [source,sml]
5540 ----
5541 type 'a t = 'a * int
5542 ----
5543
5544 is completely identical to
5545
5546 [source,sml]
5547 ----
5548 type ''a t = ''a * int
5549 ----
5550
5551 In particular, such a definition does _not_ require that `t` only be
5552 applied to equality types.
5553
5554 Similarly,
5555
5556 [source,sml]
5557 ----
5558 datatype 'a t = A | B of 'a
5559 ----
5560
5561 is completely identical to
5562
5563 [source,sml]
5564 ----
5565 datatype ''a t = A | B of ''a
5566 ----
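
To illustrate that the equality type variable imposes no restriction
here, the following sketch applies such a `t` to non-equality types.
The bindings `x` and `y` are illustrative, and the example assumes the
behavior described above.

[source,sml]
----
datatype ''a t = A | B of ''a

(* Both annotations are accepted, although neither real nor int -> int
 * is an equality type. *)
val x: real t = B 1.0
val y: (int -> int) t = A
----

Of course, `real t` is still not an equality type, so `x = x` would be
rejected.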
5567
5568 <<<
5569
5570 :mlton-guide-page: EtaExpansion
5571 [[EtaExpansion]]
5572 EtaExpansion
5573 ============
5574
5575 Eta expansion is a simple syntactic change used to work around the
5576 <:ValueRestriction:> in <:StandardML:Standard ML>.
5577
5578 The eta expansion of an expression `e` is the expression
5579 `fn z => e z`, where `z` does not occur in `e`.  This only
5580 makes sense if `e` denotes a function, i.e. is of arrow type.  Eta
5581 expansion delays the evaluation of `e` until the function is
5582 applied, and will re-evaluate `e` each time the function is
5583 applied.
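
The following sketch illustrates both points: eta expansion turns an
application into a syntactic value, so its type can be generalized, and
it changes when the side effects of `e` happen.  All names (`id`, `f`,
`make`, `g`, `h`, `count`) are hypothetical.

[source,sml]
----
fun id x = x

(* `id id` alone is an application, not a syntactic value, so the value
 * restriction prevents its type from being generalized.  The eta
 * expansion is a syntactic value and gets the polymorphic type 'a -> 'a. *)
val f = fn z => id id z
val one = f 1
val str = f "one"

(* Eta expansion also changes when side effects in `e` happen. *)
val count = ref 0
fun make () = (count := !count + 1; fn x => x + 1)

val g = make ()             (* make runs once, here *)
val h = fn z => make () z   (* make runs at every call of h *)

val _ = g 1
val _ = g 2
val afterG = !count         (* 1 *)
val _ = h 1
val _ = h 2
val afterH = !count         (* 3 *)
----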
5584
5585 The name "eta expansion" comes from the eta-conversion rule of the
5586 <:LambdaCalculus:lambda calculus>.  Expansion refers to the
5587 directionality of the equivalence being used, namely taking `e` to
5588 `fn z => e z` rather than `fn z => e z` to `e` (eta
5589 contraction).
5590
5591 <<<
5592
5593 :mlton-guide-page: eXene
5594 [[eXene]]
5595 eXene
5596 =====
5597
5598 http://people.cs.uchicago.edu/%7Ejhr/eXene/index.html[eXene] is a
5599 multi-threaded X Window System toolkit written in <:ConcurrentML:>.
5600
5601 There is a group at K-State working toward
5602 http://www.cis.ksu.edu/%7Estough/eXene/[eXene 2.0].
5603
5604 <<<
5605
5606 :mlton-guide-page: FAQ
5607 [[FAQ]]
5608 FAQ
5609 ===
5610
5611 Feel free to ask questions and to update answers by editing this page.
5612 Since we try to make as much information as possible available on the
5613 web site and we like to avoid duplication, many of the answers are
5614 simply links to a web page that answers the question.
5615
5616 == How do you pronounce MLton? ==
5617
5618 <:Pronounce:>
5619
5620 == What SML software has been ported to MLton? ==
5621
5622 <:Libraries:>
5623
5624 == What graphical libraries are available for MLton? ==
5625
5626 <:Libraries:>
5627
5628 == How does MLton's performance compare to other SML compilers and to other languages? ==
5629
5630 MLton has <:Performance:excellent performance>.
5631
5632 == Does MLton treat monomorphic arrays and vectors specially? ==
5633
5634 MLton implements monomorphic arrays and vectors (e.g. `BoolArray`,
5635 `Word8Vector`) exactly as instantiations of their polymorphic
5636 counterpart (e.g. `bool array`, `Word8.word vector`).  Thus, there is
5637 no need to use the monomorphic versions except when required to
5638 interface with the <:BasisLibrary:Basis Library> or for portability
5639 with other SML implementations.
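
A minimal sketch of what this means in practice, assuming (as described
above) that MLton treats `Word8Vector.vector` as `Word8.word vector`;
other implementations may keep these types distinct and reject the
annotation.  The names `bytes` and `total` are illustrative.

[source,sml]
----
(* A monomorphic vector used directly at the polymorphic vector type. *)
val bytes: Word8.word vector = Word8Vector.fromList [0w1, 0w2, 0w3]

(* Polymorphic Vector functions then apply to it. *)
val total = Vector.foldl (fn (w, acc) => Word8.toInt w + acc) 0 bytes   (* 6 *)
----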
5640
5641 == Why do I get a Segfault/Bus error in a program that uses `IntInf`/`LargeInt` to calculate numbers with several hundred thousand digits? ==
5642
5643 <:GnuMP:>
5644
5645 == How can I decrease compile-time memory usage? ==
5646
5647 * Compile with `-verbose 3` to find out if the problem is due to an
5648 SSA optimization pass.  If so, compile with ++-disable-pass __pass__++ to
5649 skip that pass.
5650
5651 * Compile with `@MLton hash-cons 0.5 --`, which will instruct the
5652 runtime to hash cons the heap every other GC.
5653
5654 * Compile with `-polyvariance false`, which is an undocumented option
5655 that causes less code duplication.
5656
5657 Also, please <:Contact:> us to let us know the problem to help us
5658 better understand MLton's limitations.
5659
5660 == How portable is SML code across SML compilers? ==
5661
5662 <:StandardMLPortability:>
5663
5664 <<<
5665
5666 :mlton-guide-page: Features
5667 [[Features]]
5668 Features
5669 ========
5670
5671 MLton has the following features.
5672
5673 == Portability ==
5674
5675 * Runs on a variety of platforms.
5676
5677 ** <:RunningOnARM:ARM>:
5678 *** <:RunningOnLinux:Linux> (Debian)
5679
5680 ** <:RunningOnAlpha:Alpha>:
5681 *** <:RunningOnLinux:Linux> (Debian)
5682
5683 ** <:RunningOnAMD64:AMD64>:
5684 *** <:RunningOnDarwin:Darwin> (Mac OS X)
5685 *** <:RunningOnFreeBSD:FreeBSD>
5686 *** <:RunningOnLinux:Linux> (Debian, Fedora, Ubuntu, ...)
5687 *** <:RunningOnOpenBSD:OpenBSD>
5688 *** <:RunningOnSolaris:Solaris> (10 and above)
5689
5690 ** <:RunningOnHPPA:HPPA>:
5691 *** <:RunningOnHPUX:HPUX> (11.11 and above)
5692 *** <:RunningOnLinux:Linux> (Debian)
5693
5694 ** <:RunningOnIA64:IA64>:
5695 *** <:RunningOnHPUX:HPUX> (11.11 and above)
5696 *** <:RunningOnLinux:Linux> (Debian)
5697
5698 ** <:RunningOnPowerPC:PowerPC>:
5699 *** <:RunningOnAIX:AIX> (5.2 and above)
5700 *** <:RunningOnDarwin:Darwin> (Mac OS X)
5701 *** <:RunningOnLinux:Linux> (Debian, Fedora, ...)
5702
5703 ** <:RunningOnPowerPC64:PowerPC64>:
5704 *** <:RunningOnAIX:AIX> (5.2 and above)
5705
5706 ** <:RunningOnS390:S390>
5707 *** <:RunningOnLinux:Linux> (Debian)
5708
5709 ** <:RunningOnSparc:Sparc>
5710 *** <:RunningOnLinux:Linux> (Debian)
5711 *** <:RunningOnSolaris:Solaris> (8 and above)
5712
5713 ** <:RunningOnX86:X86>:
5714 *** <:RunningOnCygwin:Cygwin>/Windows
5715 *** <:RunningOnDarwin:Darwin> (Mac OS X)
5716 *** <:RunningOnFreeBSD:FreeBSD>
5717 *** <:RunningOnLinux:Linux> (Debian, Fedora, Ubuntu, ...)
5718 *** <:RunningOnMinGW:MinGW>/Windows
5719 *** <:RunningOnNetBSD:NetBSD>
5720 *** <:RunningOnOpenBSD:OpenBSD>
5721 *** <:RunningOnSolaris:Solaris> (10 and above)
5722
5723 == Robustness ==
5724
5725 * Supports the full SML 97 language as given in <:DefinitionOfStandardML:The Definition of Standard ML (Revised)>.
5726 +
5727 If there is a program that is valid according to the
5728 <:DefinitionOfStandardML:Definition> that is rejected by MLton, or a
5729 program that is invalid according to the
5730 <:DefinitionOfStandardML:Definition> that is accepted by MLton, it is
5731 a bug.  For a list of known bugs, see <:UnresolvedBugs:>.
5732
5733 * A complete implementation of the <:BasisLibrary:Basis Library>.
5734 +
MLton's implementation matches the latest <:BasisLibrary:Basis Library>
5736 http://www.standardml.org/Basis[specification], and includes a
5737 complete implementation of all the required modules, as well as many
5738 of the optional modules.
5739
5740 * Generates standalone executables.
5741 +
5742 No additional code or libraries are necessary in order to run an
5743 executable, except for the standard shared libraries.  MLton can also
5744 generate statically linked executables.
5745
5746 * Compiles large programs.
5747 +
5748 MLton is sufficiently efficient and robust that it can compile large
5749 programs, including itself (over 190K lines).  The distributed version
5750 of MLton was compiled by MLton.
5751
5752 * Support for large amounts of memory (up to 4G on 32-bit systems; more on 64-bit systems).
5753
5754 * Support for large array lengths (up to 2^31^-1 on 32-bit systems; up to 2^63^-1 on 64-bit systems).
5755
5756 * Support for large files, using 64-bit file positions.
5757
5758 == Performance ==
5759
5760 * Executables have <:Performance:excellent running times>.
5761
5762 * Generates small executables.
5763 +
5764 MLton takes advantage of whole-program compilation to perform very
5765 aggressive dead-code elimination, which often leads to smaller
5766 executables than with other SML compilers.
5767
5768 * Untagged and unboxed native integers, reals, and words.
5769 +
5770 In MLton, integers and words are 8 bits, 16 bits, 32 bits, and 64 bits
5771 and arithmetic does not have any overhead due to tagging or boxing.
5772 Also, reals (32-bit and 64-bit) are stored unboxed, avoiding any
5773 overhead due to boxing.
5774
5775 * Unboxed native arrays.
5776 +
5777 In MLton, an array (or vector) of integers, reals, or words uses the
5778 natural C-like representation.  This is fast and supports easy
5779 exchange of data with C.  Monomorphic arrays (and vectors) use the
5780 same C-like representations as their polymorphic counterparts.
5781
5782 * Multiple <:GarbageCollection:garbage collection> strategies.
5783
5784 * Fast arbitrary precision arithmetic (`IntInf`) based on <:GnuMP:>.
5785 +
5786 For `IntInf` intensive programs, MLton can be an order of magnitude or
5787 more faster than Poly/ML or SML/NJ.
5788
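As a small illustration of `IntInf` arithmetic, the following sketch
computes a factorial that is far too large for a fixed-width integer.
The names `fact` and `f30` are chosen for illustration; only the Basis
Library `IntInf` structure is assumed.

[source,sml]
----
(* fact and f30 are illustrative names; IntInf is from the Basis Library. *)
fun fact (n: IntInf.int): IntInf.int =
   if n <= 0 then 1 else n * fact (n - 1)

(* 30! does not fit in 64 bits, but IntInf represents it exactly. *)
val f30 = fact 30
val () = print (IntInf.toString f30 ^ "\n")
----
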
5789 == Tools ==
5790
5791 * Source-level <:Profiling:> of both time and allocation.
5792 * <:MLLex:> lexer generator
5793 * <:MLYacc:> parser generator
5794 * <:MLNLFFIGen:> foreign-function-interface generator
5795
5796 == Extensions ==
5797
5798 * A simple and fast C <:ForeignFunctionInterface:> that supports calling from SML to C and from C to SML.
5799
5800 * The <:MLBasis:ML Basis system> for programming in the very large, separate delivery of library sources, and more.
5801
5802 * A number of extension libraries that provide useful functionality
5803 that cannot be implemented with the <:BasisLibrary:Basis Library>.
5804 See below for an overview and <:MLtonStructure:> for details.
5805
5806 ** <:MLtonCont:continuations>
5807 +
5808 MLton supports continuations via `callcc` and `throw`.
5809
5810 ** <:MLtonFinalizable:finalization>
5811 +
5812 MLton supports finalizable values of arbitrary type.
5813
5814 ** <:MLtonItimer:interval timers>
5815 +
5816 MLton supports the functionality of the C `setitimer` function.
5817
5818 ** <:MLtonRandom:random numbers>
5819 +
5820 MLton has functions similar to the C `rand` and `srand` functions, as well as support for access to `/dev/random` and `/dev/urandom`.
5821
5822 ** <:MLtonRlimit:resource limits>
5823 +
5824 MLton has functions similar to the C `getrlimit` and `setrlimit` functions.
5825
5826 ** <:MLtonRusage:resource usage>
5827 +
5828 MLton supports a subset of the functionality of the C `getrusage` function.
5829
5830 ** <:MLtonSignal:signal handlers>
5831 +
5832 MLton supports signal handlers written in SML.  Signal handlers run in
5833 a separate MLton thread, and have access to the thread that was
5834 interrupted by the signal.  Signal handlers can be used in conjunction
5835 with threads to implement preemptive multitasking.
5836
5837 ** <:MLtonStructure:size primitive>
5838 +
5839 MLton includes a primitive that returns the size (in bytes) of any
5840 object.  This can be useful in understanding the space behavior of a
5841 program.
5842
5843 ** <:MLtonSyslog:system logging>
5844 +
5845 MLton has a complete interface to the C `syslog` function.
5846
5847 ** <:MLtonThread:threads>
5848 +
5849 MLton has support for its own threads, upon which either preemptive or
5850 non-preemptive multitasking can be implemented.  MLton also has
5851 support for <:ConcurrentML:Concurrent ML> (CML).
5852
5853 ** <:MLtonWeak:weak pointers>
5854 +
5855 MLton supports weak pointers, which allow the garbage collector to
5856 reclaim objects that it would otherwise be forced to keep.  Weak
5857 pointers are also used to provide finalization.
5858
5859 ** <:MLtonWorld:world save and restore>
5860 +
5861 MLton has a facility for saving the entire state of a computation to a
5862 file and restarting it later.  This facility can be used for staging
5863 and for checkpointing computations.  It can even be used from within
signal handlers, allowing interrupt-driven checkpointing.
5865
5866 <<<
5867
5868 :mlton-guide-page: FirstClassPolymorphism
5869 [[FirstClassPolymorphism]]
5870 FirstClassPolymorphism
5871 ======================
5872
5873 First-class polymorphism is the ability to treat polymorphic functions
5874 just like other values: pass them as arguments, store them in data
5875 structures, etc.  Although <:StandardML:Standard ML> does have
5876 polymorphic functions, it does not support first-class polymorphism.
5877
5878 For example, the following declares and uses the polymorphic function
5879 `id`.
5880 [source,sml]
5881 ----
5882 val id = fn x => x
5883 val _ = id 13
5884 val _ = id "foo"
5885 ----
5886
5887 If SML supported first-class polymorphism, we could write the
5888 following.
5889 [source,sml]
5890 ----
5891 fun useId id = (id 13; id "foo")
5892 ----
5893
5894 However, this does not type check.  MLton reports the following error.
5895 ----
5896 Error: z.sml 1.24-1.31.
5897   Function applied to incorrect argument.
5898     expects: [int]
5899     but got: [string]
5900     in: id "foo"
5901 ----
The error arises because MLton infers from `id 13` that `id`
accepts an integer argument, whereas `id "foo"` passes it a string.
5904
5905 Using explicit types sheds some light on the problem.
5906 [source,sml]
5907 ----
5908 fun useId (id: 'a -> 'a) = (id 13; id "foo")
5909 ----
5910
5911 On this, MLton reports the following errors.
5912 ----
5913 Error: z.sml 1.29-1.33.
5914   Function applied to incorrect argument.
5915     expects: ['a]
5916     but got: [int]
5917     in: id 13
5918 Error: z.sml 1.36-1.43.
5919   Function applied to incorrect argument.
5920     expects: ['a]
5921     but got: [string]
5922     in: id "foo"
5923 ----
5924
5925 The errors arise because the argument `id` is _not_ polymorphic;
5926 rather, it is monomorphic, with type `'a -> 'a`.  It is perfectly
valid to apply `id` to a value of type `'a`, as in the following.
5928 [source,sml]
5929 ----
5930 fun useId (id: 'a -> 'a, x: 'a) = id x  (* type correct *)
5931 ----
5932
5933 So, what is the difference between the type specification on `id` in
5934 the following two declarations?
5935 [source,sml]
5936 ----
5937 val id: 'a -> 'a = fn x => x
5938 fun useId (id: 'a -> 'a) = (id 13; id "foo")
5939 ----
5940
5941 While the type specifications on `id` look identical, they mean
5942 different things.  The difference can be made clearer by explicitly
5943 <:TypeVariableScope:scoping the type variables>.
5944 [source,sml]
5945 ----
5946 val 'a id: 'a -> 'a = fn x => x
5947 fun 'a useId (id: 'a -> 'a) = (id 13; id "foo")  (* type error *)
5948 ----
5949
5950 In `val 'a id`, the type variable scoping means that for any `'a`,
5951 `id` has type `'a -> 'a`.  Hence, `id` can be applied to arguments of
5952 type `int`, `real`, etc.  Similarly, in `fun 'a useId`, the scoping
5953 means that `useId` is a polymorphic function that for any `'a` takes a
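function of type `'a -> 'a` and does something.  Thus, `useId` could
be applied to a function of type `int -> int`, `real -> real`, etc.
Within the body of `useId`, however, `'a` stands for a single fixed
type chosen by the caller, so `id` cannot be applied to both an `int`
and a `string`.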
5956
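To make the scoping concrete, here is a small sketch in which each
caller of `useId` picks its own `'a`.  The argument functions and the
names `n` and `s` are illustrative only.

[source,sml]
----
(* 'a is bound at the fun declaration, so each call chooses its own 'a. *)
fun 'a useId (id: 'a -> 'a, x: 'a): 'a = id x

val n = useId (fn i => i + 1, 13)       (* here 'a = int *)
val s = useId (fn t => t ^ "!", "foo")  (* here 'a = string *)
----
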
5957 One could imagine an extension of SML that allowed scoping of type
5958 variables at places other than `fun` or `val` declarations, as in the
5959 following.
5960 ----
5961 fun useId (id: ('a).'a -> 'a) = (id 13; id "foo")  (* not SML *)
5962 ----
5963
5964 Such an extension would need to be thought through very carefully, as
5965 it could cause significant complications with <:TypeInference:>,
possibly even undecidability.
5967
5968 <<<
5969
5970 :mlton-guide-page: Fixpoints
5971 [[Fixpoints]]
5972 Fixpoints
5973 =========
5974
5975 This page discusses a framework that makes it possible to compute
5976 fixpoints over arbitrary products of abstract types.  The code is from
5977 an Extended Basis library
5978 (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>).
5979
First, the signature of the framework
5981 (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/public/generic/tie.sig)>):
5982 [source,sml]
5983 ----
5984 sys::[./bin/InclGitFile.py mltonlib master com/ssh/extended-basis/unstable/public/generic/tie.sig 6:]
5985 ----
5986
5987 `fix` is a <:TypeIndexedValues:type-indexed> function.  The type-index
5988 parameter to `fix` is called a "witness".  To compute fixpoints over
5989 products, one uses the +*&grave;+ operator to combine witnesses.  To provide
5990 a fixpoint combinator for an abstract type, one implements a witness
5991 providing a thunk whose instantiation allocates a fresh, mutable proxy
and a procedure for updating the proxy with the solution.  Naturally,
this means that the framework cannot express every way of computing a
fixpoint of a particular type.  The `pure`
5995 combinator is a generalization of `tier`.  The `iso` combinator is
5996 provided for reusing existing witnesses.
5997
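The proxy idea can be sketched without the library for the special
case of functions.  The following is a simplified illustration and not
the `Tie` implementation; `fixFn` and `fact` are hypothetical names.

[source,sml]
----
(* Simplified sketch, not the Tie library: a fixpoint combinator for
 * functions built from a mutable proxy.  fixFn is a hypothetical name. *)
fun fixFn (f: ('a -> 'b) -> 'a -> 'b): 'a -> 'b =
   let
      (* Allocate a fresh proxy that fails if it is used too early. *)
      val r: ('a -> 'b) ref =
         ref (fn _ => raise Fail "fixFn: proxy used before it was tied")
      (* Give f a function that looks up the proxy at call time. *)
      val result = f (fn x => !r x)
   in
      (* Update the proxy with the solution. *)
      r := result;
      result
   end

val fact = fixFn (fn self => fn n => if n = 0 then 1 else n * self (n - 1))
----
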
5998 Note that instead of using an infix operator, we could alternatively
5999 employ an interface using <:Fold:>.  Also, witnesses are eta-expanded
6000 to work around the <:ValueRestriction:value restriction>, while
6001 maintaining abstraction.
6002
6003 Here is the implementation
6004 (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/detail/generic/tie.sml)>):
6005 [source,sml]
6006 ----
6007 sys::[./bin/InclGitFile.py mltonlib master com/ssh/extended-basis/unstable/detail/generic/tie.sml 6:]
6008 ----
6009
6010 Let's then take a look at a couple of additional examples.
6011
6012 Here is a naive implementation of lazy promises:
6013 [source,sml]
6014 ----
6015 structure Promise :> sig
6016    type 'a t
6017    val lazy : 'a Thunk.t -> 'a t
6018    val force : 'a t -> 'a
6019    val Y : 'a t Tie.t
6020 end = struct
6021    datatype 'a t' =
6022       EXN of exn
6023     | THUNK of 'a Thunk.t
6024     | VALUE of 'a
6025    type 'a t = 'a t' Ref.t
6026    fun lazy f = ref (THUNK f)
6027    fun force t =
6028       case !t
6029        of EXN e   => raise e
6030         | THUNK f => (t := VALUE (f ()) handle e => t := EXN e ; force t)
6031         | VALUE v => v
6032    fun Y ? = Tie.tier (fn () => let
6033                              val r = lazy (raising Fix.Fix)
6034                           in
6035                              (r, r <\ op := o !)
6036                           end) ?
6037 end
6038 ----
6039
6040 An example use of our naive lazy promises is to implement equally naive
6041 lazy streams:
6042 [source,sml]
6043 ----
6044 structure Stream :> sig
6045    type 'a t
6046    val cons : 'a * 'a t -> 'a t
6047    val get : 'a t -> ('a * 'a t) Option.t
6048    val Y : 'a t Tie.t
6049 end = struct
6050    datatype 'a t = IN of ('a * 'a t) Option.t Promise.t
6051    fun cons (x, xs) = IN (Promise.lazy (fn () => SOME (x, xs)))
6052    fun get (IN p) = Promise.force p
6053    fun Y ? = Tie.iso Promise.Y (fn IN p => p, IN) ?
6054 end
6055 ----
6056
6057 Note that above we make use of the `iso` combinator.  Here is a finite
6058 representation of an infinite stream of ones:
6059
6060 [source,sml]
6061 ----
6062 val ones = let
6063    open Tie Stream
6064 in
6065    fix Y (fn ones => cons (1, ones))
6066 end
6067 ----
6068
6069 <<<
6070
6071 :mlton-guide-page: Flatten
6072 [[Flatten]]
6073 Flatten
6074 =======
6075
6076 <:Flatten:> is an optimization pass for the <:SSA:>
6077 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
6078
6079 == Description ==
6080
6081 This pass flattens arguments to <:SSA:> constructors, blocks, and
6082 functions.
6083
6084 If a tuple is explicitly available at all uses of a function
6085 (resp. block), then:
6086
6087 * The formals and call sites are changed so that the components of the
6088 tuple are passed.
6089
6090 * The tuple is reconstructed at the beginning of the body of the
6091 function (resp. block).
6092
6093 Similarly, if a tuple is explicitly available at all uses of a
6094 constructor, then:
6095
6096 * The constructor argument datatype is changed to flatten the tuple
6097 type.
6098
6099 * The tuple is passed flat at each `ConApp`.
6100
6101 * The tuple is reconstructed at each `Case` transfer target.
6102
6103 == Implementation ==
6104
6105 * <!ViewGitFile(mlton,master,mlton/ssa/flatten.fun)>
6106
6107 == Details and Notes ==
6108
6109 {empty}
6110
6111 <<<
6112
6113 :mlton-guide-page: Fold
6114 [[Fold]]
6115 Fold
6116 ====
6117
6118 This page describes a technique that enables convenient syntax for a
6119 number of language features that are not explicitly supported by
6120 <:StandardML:Standard ML>, including: variable number of arguments,
6121 <:OptionalArguments:optional arguments and labeled arguments>,
6122 <:ArrayLiteral:array and vector literals>,
6123 <:FunctionalRecordUpdate:functional record update>,
6124 and (seemingly) dependently typed functions like <:Printf:printf> and scanf.
6125
6126 The key idea to _fold_ is to define functions `fold`, `step0`,
6127 and `$` such that the following equation holds.
6128
6129 [source,sml]
6130 ----
6131 fold (a, f) (step0 h1) (step0 h2) ... (step0 hn) $
6132 = f (hn (... (h2 (h1 a))))
6133 ----
6134
The name `fold` was chosen because this is like a traditional list fold,
6136 where `a` is the _base element_, and each _step function_,
6137 `step0 hi`, corresponds to one element of the list and does one
6138 step of the fold.  The name `$` is chosen to mean "end of
6139 arguments" from its common use in regular-expression syntax.
6140
6141 Unlike the usual list fold in which the same function is used to step
6142 over each element in the list, this fold allows the step functions to
6143 be different from each other, and even to be of different types.  Also
6144 unlike the usual list fold, this fold includes a "finishing
6145 function", `f`, that is applied to the result of the fold.  The
6146 presence of the finishing function may seem odd because there is no
analogue in list fold.  However, the finishing function is essential;
6148 without it, there would be no way for the folder to perform an
6149 arbitrary computation after processing all the arguments.  The
6150 examples below will make this clear.
6151
6152 The functions `fold`, `step0`, and `$` are easy to
6153 define.
6154
6155 [source,sml]
6156 ----
6157 fun $ (a, f) = f a
6158 fun id x = x
6159 structure Fold =
6160    struct
6161       fun fold (a, f) g = g (a, f)
6162       fun step0 h (a, f) = fold (h a, f)
6163    end
6164 ----
6165
6166 We've placed `fold` and `step0` in the `Fold` structure
6167 but left `$` at the toplevel because it is convenient in code to
6168 always have `$` in scope.  We've also defined the identity
6169 function, `id`, at the toplevel since we use it so frequently.
6170
6171 Plugging in the definitions, it is easy to verify the equation from
6172 above.
6173
6174 [source,sml]
6175 ----
6176 fold (a, f) (step0 h1) (step0 h2) ... (step0 hn) $
6177 = step0 h1 (a, f) (step0 h2) ... (step0 hn) $
6178 = fold (h1 a, f) (step0 h2) ... (step0 hn) $
6179 = step0 h2 (h1 a, f) ... (step0 hn) $
6180 = fold (h2 (h1 a), f) ... (step0 hn) $
6181 ...
6182 = fold (hn (... (h2 (h1 a))), f) $
6183 = $ (hn (... (h2 (h1 a))), f)
6184 = f (hn (... (h2 (h1 a))))
6185 ----
6186
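As a concrete instance of the equation, the following self-contained
sketch uses two arithmetic steps.  The particular step functions and
the names `r` and `expected` are chosen only for illustration.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

(* a = 1, h1 adds one, h2 doubles, and the finisher f multiplies by ten. *)
val r =
   Fold.fold (1, fn x => x * 10)
      (Fold.step0 (fn x => x + 1))
      (Fold.step0 (fn x => x * 2)) $

(* By the equation, r = f (h2 (h1 a)). *)
val expected = ((1 + 1) * 2) * 10
----
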
6187
6188 == Example: variable number of arguments ==
6189
6190 The simplest example of fold is accepting a variable number of
6191 (curried) arguments.  We'll define a function `f` and argument
6192 `a` such that all of the following expressions are valid.
6193
6194 [source,sml]
6195 ----
6196 f $
6197 f a $
6198 f a a $
6199 f a a a $
6200 f a a a ... a a a $ (* as many a's as we want *)
6201 ----
6202
6203 Off-hand it may appear impossible that all of the above expressions
6204 are type correct SML -- how can a function `f` accept a variable
6205 number of curried arguments?  What could the type of `f` be?
6206 We'll have more to say later on how type checking works.  For now,
6207 once we have supplied the definitions below, you can check that the
6208 expressions are type correct by feeding them to your favorite SML
6209 implementation.
6210
6211 It is simple to define `f` and `a`.  We define `f` as a
6212 folder whose base element is `()` and whose finish function does
6213 nothing.  We define `a` as the step function that does nothing.
6214 The only trickiness is that we must <:EtaExpansion:eta expand> the
definitions of `f` and `a` to work around the <:ValueRestriction:value restriction>;
6216 we frequently use eta expansion for this purpose without mention.
6217
6218 [source,sml]
6219 ----
6220 val base = ()
6221 fun finish () = ()
6222 fun step () = ()
6223 val f = fn z => Fold.fold (base, finish) z
6224 val a = fn z => Fold.step0 step z
6225 ----
6226
6227 One can easily apply the fold equation to verify by hand that `f`
6228 applied to any number of `a`'s evaluates to `()`.
6229
6230 [source,sml]
6231 ----
6232 f a ... a $
6233 = finish (step (... (step base)))
6234 = finish (step (... ()))
6235 ...
6236 = finish ()
6237 = ()
6238 ----
6239
6240
6241 == Example: variable-argument sum ==
6242
6243 Let's look at an example that computes something: a variable-argument
6244 function `sum` and a stepper `a` such that
6245
6246 [source,sml]
6247 ----
6248 sum (a i1) (a i2) ... (a im) $ = i1 + i2 + ... + im
6249 ----
6250
6251 The idea is simple -- the folder starts with a base accumulator of
6252 `0` and the stepper adds each element to the accumulator, `s`,
6253 which the folder simply returns at the end.
6254
6255 [source,sml]
6256 ----
6257 val sum = fn z => Fold.fold (0, fn s => s) z
6258 fun a i = Fold.step0 (fn s => i + s)
6259 ----
6260
6261 Using the fold equation, one can verify the following.
6262
6263 [source,sml]
6264 ----
6265 sum (a 1) (a 2) (a 3) $ = 6
6266 ----
6267
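The following self-contained sketch repeats the definitions so that it
can be compiled on its own, and evaluates `sum` with zero and with
three arguments.  The names `none` and `six` are illustrative.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

val sum = fn z => Fold.fold (0, fn s => s) z
fun a i = Fold.step0 (fn s => i + s)

val none = sum $                   (* no steps: the base element *)
val six = sum (a 1) (a 2) (a 3) $  (* 3 + (2 + (1 + 0)) *)
----
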
6268
6269 == Step1 ==
6270
6271 It is sometimes syntactically convenient to omit the parentheses
6272 around the steps in a fold.  This is easily done by defining a new
6273 function, `step1`, as follows.
6274
6275 [source,sml]
6276 ----
6277 structure Fold =
6278    struct
6279       open Fold
6280       fun step1 h (a, f) b = fold (h (b, a), f)
6281    end
6282 ----
6283
6284 From the definition of `step1`, we have the following
6285 equivalence.
6286
6287 [source,sml]
6288 ----
6289 fold (a, f) (step1 h) b
6290 = step1 h (a, f) b
6291 = fold (h (b, a), f)
6292 ----
6293
6294 Using the above equivalence, we can compute the following equation for
6295 `step1`.
6296
6297 [source,sml]
6298 ----
6299 fold (a, f) (step1 h1) b1 (step1 h2) b2 ... (step1 hn) bn $
6300 = fold (h1 (b1, a), f) (step1 h2) b2 ... (step1 hn) bn $
6301 = fold (h2 (b2, h1 (b1, a)), f) ... (step1 hn) bn $
6302 = fold (hn (bn, ... (h2 (b2, h1 (b1, a)))), f) $
6303 = f (hn (bn, ... (h2 (b2, h1 (b1, a)))))
6304 ----
6305
6306 Here is an example using `step1` to define a variable-argument
6307 product function, `prod`, with a convenient syntax.
6308
6309 [source,sml]
6310 ----
6311 val prod = fn z => Fold.fold (1, fn p => p) z
6312 val ` = fn z => Fold.step1 (fn (i, p) => i * p) z
6313 ----
6314
6315 The functions `prod` and +&grave;+ satisfy the following equation.
6316 [source,sml]
6317 ----
6318 prod `i1 `i2 ... `im $ = i1 * i2 * ... * im
6319 ----
6320
6321 Note that in SML, +&grave;i1+ is two different tokens, +&grave;+ and
6322 `i1`.  We often use +&grave;+ for an instance of a `step1` function
6323 because of its syntactic unobtrusiveness and because no space is
6324 required to separate it from an alphanumeric token.
6325
Also note that there are no parentheses around the steps.  That is,
6327 the following expression is not the same as the above one (in fact, it
6328 is not type correct).
6329
6330 [source,sml]
6331 ----
6332 prod (`i1) (`i2) ... (`im) $
6333 ----
6334
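A self-contained sketch of `prod` follows, with concrete integers in
place of `i1`, ..., `im`.  The names `p` and `one` are illustrative.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step1 h (a, f) b = fold (h (b, a), f)
   end

val prod = fn z => Fold.fold (1, fn p => p) z
val ` = fn z => Fold.step1 (fn (i, p) => i * p) z

(* Each step consumes the integer that follows it. *)
val p = prod `2 `3 `4 $
(* With no steps, the result is the base element. *)
val one = prod $
----
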
6335
6336 == Example: list literals ==
6337
6338 SML already has a syntax for list literals, e.g. `[w, x, y, z]`.
6339 However, using fold, we can define our own syntax.
6340
6341 [source,sml]
6342 ----
6343 val list = fn z => Fold.fold ([], rev) z
6344 val ` = fn z => Fold.step1 (op ::) z
6345 ----
6346
6347 The idea is that the folder starts out with the empty list, the steps
6348 accumulate the elements into a list, and then the finishing function
6349 reverses the list at the end.
6350
6351 With these definitions one can write a list like:
6352
6353 [source,sml]
6354 ----
6355 list `w `x `y `z $
6356 ----
6357
6358 While the example is not practically useful, it does demonstrate the
6359 need for the finishing function to be incorporated in `fold`.
6360 Without a finishing function, every use of `list` would need to be
6361 wrapped in `rev`, as follows.
6362
6363 [source,sml]
6364 ----
6365 rev (list `w `x `y `z $)
6366 ----
6367
6368 The finishing function allows us to incorporate the reversal into the
definition of `list`, and to treat `list` as a truly
variable-argument function, performing an arbitrary computation after receiving
6371 all of its arguments.
6372
6373 See <:ArrayLiteral:> for a similar use of `fold` that provides a
6374 syntax for array and vector literals, which are not built in to SML.
6375
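To see the effect of the finishing function concretely, the sketch
below compares `list` with a hypothetical variant, `listNoFinish`,
whose finisher is the identity.  The names `xs` and `ys` are
illustrative.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step1 h (a, f) b = fold (h (b, a), f)
   end

val list = fn z => Fold.fold ([], rev) z
(* Hypothetical variant whose finisher does nothing, for comparison. *)
val listNoFinish = fn z => Fold.fold ([], fn l => l) z
val ` = fn z => Fold.step1 (op ::) z

val xs = list `1 `2 `3 $          (* elements in the order written *)
val ys = listNoFinish `1 `2 `3 $  (* elements in reverse order *)
----
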
6376
6377 == Fold right ==
6378
6379 Just as `fold` is analogous to a fold left, in which the functions
6380 are applied to the accumulator left-to-right, we can define a variant
6381 of `fold` that is analogous to a fold right, in which the
6382 functions are applied to the accumulator right-to-left.  That is, we
6383 can define functions `foldr` and `step0` such that the
6384 following equation holds.
6385
6386 [source,sml]
6387 ----
6388 foldr (a, f) (step0 h1) (step0 h2) ... (step0 hn) $
6389 = f (h1 (h2 (... (hn a))))
6390 ----
6391
6392 The implementation of fold right is easy, using fold.  The idea is for
6393 the fold to start with `f` and for each step to precompose the
6394 next `hi`.  Then, the finisher applies the composed function to
6395 the base value, `a`.  Here is the code.
6396
6397 [source,sml]
6398 ----
6399 structure Foldr =
6400    struct
6401       fun foldr (a, f) = Fold.fold (f, fn g => g a)
6402       fun step0 h = Fold.step0 (fn g => g o h)
6403    end
6404 ----
6405
6406 Verifying the fold-right equation is straightforward, using the
6407 fold-left equation.
6408
6409 [source,sml]
6410 ----
6411 foldr (a, f) (Foldr.step0 h1) (Foldr.step0 h2) ... (Foldr.step0 hn) $
6412 = fold (f, fn g => g a)
6413     (Fold.step0 (fn g => g o h1))
6414     (Fold.step0 (fn g => g o h2))
6415     ...
6416     (Fold.step0 (fn g => g o hn)) $
6417 = (fn g => g a)
6418   ((fn g => g o hn) (... ((fn g => g o h2) ((fn g => g o h1) f))))
6419 = (fn g => g a)
6420   ((fn g => g o hn) (... ((fn g => g o h2) (f o h1))))
6421 = (fn g => g a) ((fn g => g o hn) (... (f o h1 o h2)))
6422 = (fn g => g a) (f o h1 o h2 o ... o hn)
6423 = (f o h1 o h2 o ... o hn) a
6424 = f (h1 (h2 (... (hn a))))
6425 ----
6426
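Because the two folds apply the steps in opposite orders, their
results differ whenever the steps do not commute.  The following
sketch shows this with two illustrative step functions, `inc` and
`dbl`; the result names `l` and `r` are also illustrative.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end
structure Foldr =
   struct
      fun foldr (a, f) = Fold.fold (f, fn g => g a)
      fun step0 h = Fold.step0 (fn g => g o h)
   end

val inc = fn x => x + 1
val dbl = fn x => x * 2

(* Fold left computes dbl (inc 1). *)
val l = Fold.fold (1, id) (Fold.step0 inc) (Fold.step0 dbl) $
(* Fold right computes inc (dbl 1). *)
val r = Foldr.foldr (1, id) (Foldr.step0 inc) (Foldr.step0 dbl) $
----
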
6427 One can also define the fold-right analogue of `step1`.
6428
6429 [source,sml]
6430 ----
6431 structure Foldr =
6432    struct
6433       open Foldr
6434       fun step1 h = Fold.step1 (fn (b, g) => g o (fn a => h (b, a)))
6435    end
6436 ----
6437
6438
6439 == Example: list literals via fold right ==
6440
6441 Revisiting the list literal example from earlier, we can use fold
6442 right to define a syntax for list literals that doesn't do a reversal.
6443
6444 [source,sml]
6445 ----
6446 val list = fn z => Foldr.foldr ([], fn l => l) z
6447 val ` = fn z => Foldr.step1 (op ::) z
6448 ----
6449
6450 As before, with these definitions, one can write a list like:
6451
6452 [source,sml]
6453 ----
6454 list `w `x `y `z $
6455 ----
6456
6457 The difference between the fold-left and fold-right approaches is that
6458 the fold-right approach does not have to reverse the list at the end,
6459 since it accumulates the elements in the correct order.  In practice,
6460 MLton will simplify away all of the intermediate function composition,
so the fold-right approach will be more efficient.
6462
6463
6464 == Mixing steppers ==
6465
6466 All of the examples so far have used the same step function throughout
6467 a fold.  This need not be the case.  For example, consider the
6468 following.
6469
6470 [source,sml]
6471 ----
6472 val n = fn z => Fold.fold (0, fn i => i) z
val I = fn z => Fold.step0 (fn i => i * 2 + 1) z
val O = fn z => Fold.step0 (fn i => i * 2) z
6475 ----
6476
6477 Here we have one folder, `n`, that can be used with two different
6478 steppers, `I` and `O`.  By using the fold equation, one can
6479 verify the following equations.
6480
6481 [source,sml]
6482 ----
6483 n O $ = 0
6484 n I $ = 1
6485 n I O $ = 2
6486 n I O I $ = 5
6487 n I I I O $ = 14
6488 ----
6489
6490 That is, we've defined a syntax for writing binary integer constants.
6491
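The equations can be checked mechanically.  The following
self-contained sketch takes `I` to be the one digit and `O` to be the
zero digit, which is the reading the equations require; the name
`checks` is illustrative.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

val n = fn z => Fold.fold (0, fn i => i) z
val I = fn z => Fold.step0 (fn i => i * 2 + 1) z  (* append a one digit *)
val O = fn z => Fold.step0 (fn i => i * 2) z      (* append a zero digit *)

(* One boolean per equation; each should be true. *)
val checks =
   [n O $ = 0, n I $ = 1, n I O $ = 2, n I O I $ = 5, n I I I O $ = 14]
----
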
6492 Not only can one use different instances of `step0` in the same
6493 fold, one can also intermix uses of `step0` and `step1`.  For
6494 example, consider the following.
6495
6496 [source,sml]
6497 ----
6498 val n = fn z => Fold.fold (0, fn i => i) z
val O = fn z => Fold.step0 (fn i => i * 8) z
6500 val ` = fn z => Fold.step1 (fn (i, n) => n * 8 + i) z
6501 ----
6502
6503 Using the straightforward generalization of the fold equation to mixed
6504 steppers, one can verify the following equations.
6505
6506 [source,sml]
6507 ----
n O $ = 0
6509 n `3 O $ = 24
6510 n `1 O `7 $ = 71
6511 ----
6512
6513 That is, we've defined a syntax for writing octal integer constants,
6514 with a special syntax, `O`, for the zero digit (admittedly
6515 contrived, since one could just write +&grave;0+ instead of `O`).
6516
6517 See <:NumericLiteral:> for a practical extension of this approach that
6518 supports numeric constants in any base and of any type.
6519
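A self-contained sketch of the octal example follows.  It assumes that
the zero-digit stepper `O` multiplies the accumulator by eight, as the
equations require; the result names are illustrative.

[source,sml]
----
fun $ (a, f) = f a
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
      fun step1 h (a, f) b = fold (h (b, a), f)
   end

val n = fn z => Fold.fold (0, fn i => i) z
(* step0: append the digit zero. *)
val O = fn z => Fold.step0 (fn acc => acc * 8) z
(* step1: append the digit d that follows the step. *)
val ` = fn z => Fold.step1 (fn (d, acc) => acc * 8 + d) z

val zero = n O $
val twentyFour = n `3 O $
val seventyOne = n `1 O `7 $
----
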
6520
6521 == (Seemingly) dependent types ==
6522
6523 A normal list fold always returns the same type no matter what
6524 elements are in the list or how long the list is.  Variable-argument
6525 fold is more powerful, because the result type can vary based both on
6526 the arguments that are passed and on their number.  This can provide
6527 the illusion of dependent types.
6528
6529 For example, consider the following.
6530
6531 [source,sml]
6532 ----
6533 val f = fn z => Fold.fold ((), id) z
6534 val a = fn z => Fold.step0 (fn () => "hello") z
6535 val b = fn z => Fold.step0 (fn () => 13) z
6536 val c = fn z => Fold.step0 (fn () => (1, 2)) z
6537 ----
6538
6539 Using the fold equation, one can verify the following equations.
6540
6541 [source,sml]
6542 ----
6543 f a $ = "hello": string
6544 f b $ = 13: int
6545 f c $ = (1, 2): int * int
6546 ----
6547
6548 That is, `f` returns a value of a different type depending on
6549 whether it is applied to argument `a`, argument `b`, or
6550 argument `c`.
6551
6552 The following example shows how the type of a fold can depend on the
6553 number of arguments.
6554
6555 [source,sml]
6556 ----
6557 val grow = fn z => Fold.fold ([], fn l => l) z
6558 val a = fn z => Fold.step0 (fn x => [x]) z
6559 ----
6560
6561 Using the fold equation, one can verify the following equations.
6562
6563 [source,sml]
6564 ----
6565 grow $ = []: 'a list
6566 grow a $ = [[]]: 'a list list
6567 grow a a $ = [[[]]]: 'a list list list
6568 ----
6569
Clearly, the result type of a call to the variable-argument `grow`
6571 function depends on the number of arguments that are passed.
6572
6573 As a reminder, this is well-typed SML.  You can check it out in any
6574 implementation.
6575
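For instance, the following self-contained sketch annotates each
result with its type, so a compiler accepts it only if the result
types vary as claimed.  The stepper for `grow` is named `w` here to
avoid clashing with `a`; that name and the result names are
illustrative.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

val f = fn z => Fold.fold ((), id) z
val a = fn z => Fold.step0 (fn () => "hello") z
val b = fn z => Fold.step0 (fn () => 13) z
val c = fn z => Fold.step0 (fn () => (1, 2)) z

(* The result type depends on which step is passed. *)
val s: string = f a $
val i: int = f b $
val p: int * int = f c $

val grow = fn z => Fold.fold ([], fn l => l) z
val w = fn z => Fold.step0 (fn x => [x]) z

(* The result type depends on the number of steps.  The element type is
 * fixed to int because these declarations are expansive, so the value
 * restriction prevents them from being polymorphic. *)
val g0: int list = grow $
val g1: int list list = grow w $
val g2: int list list list = grow w w $
----
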
6576
6577 == (Seemingly) dependently-typed functional results ==
6578
6579 Fold is especially useful when it returns a curried function whose
6580 arity depends on the number of arguments.  For example, consider the
6581 following.
6582
6583 [source,sml]
6584 ----
6585 val makeSum = fn z => Fold.fold (id, fn f => f 0) z
6586 val I = fn z => Fold.step0 (fn f => fn i => fn x => f (x + i)) z
6587 ----
6588
6589 The `makeSum` folder constructs a function whose arity depends on
6590 the number of `I` arguments and that adds together all of its
6591 arguments.  For example,
6592 `makeSum I $` is of type `int -> int` and
6593 `makeSum I I $` is of type `int -> int -> int`.
6594
One can use the fold equation to verify that `makeSum` works
6596 correctly.  For example, one can easily check by hand the following
6597 equations.
6598
6599 [source,sml]
6600 ----
6601 makeSum I $ 1 = 1
6602 makeSum I I $ 1 2 = 3
6603 makeSum I I I $ 1 2 3 = 6
6604 ----
6605
6606 Returning a function becomes especially interesting when there are
6607 steppers of different types.  For example, the following `makeSum`
6608 folder constructs functions that sum integers and reals.
6609
6610 [source,sml]
6611 ----
6612 val makeSum = fn z => Foldr.foldr (id, fn f => f 0.0) z
6613 val I = fn z => Foldr.step0 (fn f => fn x => fn i => f (x + real i)) z
6614 val R = fn z => Foldr.step0 (fn f => fn x: real => fn r => f (x + r)) z
6615 ----
6616
6617 With these definitions, `makeSum I R $` is of type
6618 `int -> real -> real` and `makeSum R I I $` is of type
6619 `real -> int -> int -> real`.  One can use the foldr equation to
6620 check the following equations.
6621
6622 [source,sml]
6623 ----
6624 makeSum I $ 1 = 1.0
6625 makeSum I R $ 1 2.5 = 3.5
6626 makeSum R I I $ 1.5 2 3 = 6.5
6627 ----
6628
6629 We used `foldr` instead of `fold` for this so that the order
6630 in which the specifiers `I` and `R` appear is the same as the
6631 order in which the arguments appear.  Had we used `fold`, things
6632 would have been reversed.
6633
6634 An extension of this idea is sufficient to define <:Printf:>-like
6635 functions in SML.
6636
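The following self-contained sketch collects the `foldr`-based
definitions and evaluates the three equations above.  Since `real` is
not an equality type in SML, the comparisons use `Real.==`; the name
`checks` is illustrative.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x
structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end
structure Foldr =
   struct
      fun foldr (a, f) = Fold.fold (f, fn g => g a)
      fun step0 h = Fold.step0 (fn g => g o h)
   end

val makeSum = fn z => Foldr.foldr (id, fn f => f 0.0) z
val I = fn z => Foldr.step0 (fn f => fn x => fn i => f (x + real i)) z
val R = fn z => Foldr.step0 (fn f => fn x: real => fn r => f (x + r)) z

(* One boolean per equation; each should be true. *)
val checks =
   [Real.== (makeSum I $ 1, 1.0),
    Real.== (makeSum I R $ 1 2.5, 3.5),
    Real.== (makeSum R I I $ 1.5 2 3, 6.5)]
----
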
6637
6638 == An idiom for combining steps ==
6639
6640 It is sometimes useful to combine a number of steps together and name
6641 them as a single step.  As a simple example, suppose that one often
6642 sees an integer follower by a real in the `makeSum` example above.
6643 One can define a new _compound step_ `IR` as follows.
6644
6645 [source,sml]
6646 ----
6647 val IR = fn u => Fold.fold u I R
6648 ----
6649
6650 With this definition in place, one can verify the following.
6651
6652 [source,sml]
6653 ----
6654 makeSum IR IR $ 1 2.2 3 4.4 = 10.6
6655 ----
6656
6657 In general, one can combine steps `s1`, `s2`, ... `sn` as
6658
6659 [source,sml]
6660 ----
6661 fn u => Fold.fold u s1 s2 ... sn
6662 ----
6663
6664 The following calculation shows why a compound step behaves as the
6665 composition of its constituent steps.
6666
6667 [source,sml]
6668 ----
6669 fold u (fn u => fold u s1 s2 ... sn)
6670 = (fn u => fold u s1 s2 ... sn) u
6671 = fold u s1 s2 ... sn
6672 ----
6673
6674
6675 == Post composition ==
6676
6677 Suppose we already have a function defined via fold,
6678 `w = fold (a, f)`, and we would like to construct a new fold
6679 function that is like `w`, but applies `g` to the result
6680 produced by `w`.  This is similar to function composition, but we
6681 can't just do `g o w`, because we don't want to use `g` until
6682 `w` has been applied to all of its arguments and received the
6683 end-of-arguments terminator `$`.
6684
6685 More precisely, we want to define a post-composition function
6686 `post` that satisfies the following equation.
6687
6688 [source,sml]
6689 ----
6690 post (w, g) s1 ... sn $ = g (w s1 ... sn $)
6691 ----
6692
6693 Here is the definition of `post`.
6694
6695 [source,sml]
6696 ----
6697 structure Fold =
6698    struct
6699       open Fold
6700       fun post (w, g) s = w (fn (a, h) => s (a, g o h))
6701    end
6702 ----
6703
6704 The following calculations show that `post` satisfies the desired
6705 equation, where `w = fold (a, f)`.
6706
6707 [source,sml]
6708 ----
6709 post (w, g) s
6710 = w (fn (a, h) => s (a, g o h))
6711 = fold (a, f) (fn (a, h) => s (a, g o h))
6712 = (fn (a, h) => s (a, g o h)) (a, f)
6713 = s (a, g o f)
6714 = fold (a, g o f) s
6715 ----
6716
6717 Now, suppose `si = step0 hi` for `i` from `1` to `n`.
6718
6719 [source,sml]
6720 ----
6721 post (w, g) s1 s2 ... sn $
6722 = fold (a, g o f) s1 s2 ... sn $
6723 = (g o f) (hn (... (h1 a)))
6724 = g (f (hn (... (h1 a))))
6725 = g (fold (a, f) s1 ... sn $)
6726 = g (w s1 ... sn $)
6727 ----
6728
6729 For a practical example of post composition, see <:ArrayLiteral:>.
6730
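For a smaller illustration, the sketch below post-composes
`Int.toString` onto a folder that sums its integer arguments.  The
folder `sum`, the stepper `n`, and the result names are hypothetical
and exist only for this example; `Fold` is restated in reduced form.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x

structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step1 h (a, f) b = fold (h (b, a), f)
      fun post (w, g) s = w (fn (a, h) => s (a, g o h))
   end

(* Hypothetical folder: sums its integer arguments. *)
val sum = fn z => Fold.fold (0, id) z
(* Hypothetical stepper: "n x" adds x to the accumulator. *)
val n = fn z => Fold.step1 (fn (x, acc) => acc + x) z

(* The same folder, with Int.toString applied to the final result. *)
val sumString = fn z => Fold.post (sum, Int.toString) z

val three = sum n 1 n 2 $              (* an int *)
val threeString = sumString n 1 n 2 $  (* a string *)
----

Here `threeString` is computed as `Int.toString (sum n 1 n 2 $)`,
which is the instance of the `post` equation for `w = sum` and
`g = Int.toString`.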
6731
6732 == Lift ==
6733
6734 We now define a peculiar-looking function, `lift0`, that is,
6735 equationally speaking, equivalent to the identity function on a step
6736 function.
6737
6738 [source,sml]
6739 ----
6740 fun lift0 s (a, f) = fold (fold (a, id) s $, f)
6741 ----
6742
6743 Using the definitions, we can prove the following equation.
6744
6745 [source,sml]
6746 ----
6747 fold (a, f) (lift0 (step0 h)) = fold (a, f) (step0 h)
6748 ----
6749
6750 Here is the proof.
6751
6752 [source,sml]
6753 ----
6754 fold (a, f) (lift0 (step0 h))
6755 = lift0 (step0 h) (a, f)
6756 = fold (fold (a, id) (step0 h) $, f)
6757 = fold (step0 h (a, id) $, f)
6758 = fold (fold (h a, id) $, f)
6759 = fold ($ (h a, id), f)
6760 = fold (id (h a), f)
6761 = fold (h a, f)
6762 = step0 h (a, f)
6763 = fold (a, f) (step0 h)
6764 ----
6765
6766 If `lift0` is the identity, then why even define it?  The answer
6767 lies in the typing of fold expressions, which we have, until now, left
6768 unexplained.
6769
6770
6771 == Typing ==
6772
6773 Perhaps the most surprising aspect of fold is that it can be checked
6774 by the SML type system.  The types involved in fold expressions are
6775 complex; fortunately type inference is able to deduce them.
6776 Nevertheless, it is instructive to study the types of fold functions
6777 and steppers.  More importantly, it is essential to understand the
6778 typing aspects of fold in order to write down signatures of functions
6779 defined using fold and step.
6780
6781 Here is the `FOLD` signature, and a recapitulation of the entire
6782 `Fold` structure, with additional type annotations.
6783
6784 [source,sml]
6785 ----
6786 signature FOLD =
6787    sig
6788       type ('a, 'b, 'c, 'd) step = 'a * ('b -> 'c) -> 'd
6789       type ('a, 'b, 'c, 'd) t = ('a, 'b, 'c, 'd) step -> 'd
6790       type ('a1, 'a2, 'b, 'c, 'd) step0 =
6791          ('a1, 'b, 'c, ('a2, 'b, 'c, 'd) t) step
6792       type ('a11, 'a12, 'a2, 'b, 'c, 'd) step1 =
6793          ('a12, 'b, 'c, 'a11 -> ('a2, 'b, 'c, 'd) t) step
6794
6795       val fold: 'a * ('b -> 'c) -> ('a, 'b, 'c, 'd) t
6796       val lift0: ('a1, 'a2, 'a2, 'a2, 'a2) step0
6797                  -> ('a1, 'a2, 'b, 'c, 'd) step0
6798       val post: ('a, 'b, 'c1, 'd) t * ('c1 -> 'c2)
6799                 -> ('a, 'b, 'c2, 'd) t
6800       val step0: ('a1 -> 'a2) -> ('a1, 'a2, 'b, 'c, 'd) step0
6801       val step1: ('a11 * 'a12 -> 'a2)
6802                  -> ('a11, 'a12, 'a2, 'b, 'c, 'd) step1
6803    end
6804
6805 structure Fold:> FOLD =
6806    struct
6807       type ('a, 'b, 'c, 'd) step = 'a * ('b -> 'c) -> 'd
6808
6809       type ('a, 'b, 'c, 'd) t = ('a, 'b, 'c, 'd) step -> 'd
6810
6811       type ('a1, 'a2, 'b, 'c, 'd) step0 =
6812          ('a1, 'b, 'c, ('a2, 'b, 'c, 'd) t) step
6813
6814       type ('a11, 'a12, 'a2, 'b, 'c, 'd) step1 =
6815          ('a12, 'b, 'c, 'a11 -> ('a2, 'b, 'c, 'd) t) step
6816
6817       fun fold (a: 'a, f: 'b -> 'c)
6818                (g: ('a, 'b, 'c, 'd) step): 'd =
6819          g (a, f)
6820
6821       fun step0 (h: 'a1 -> 'a2)
6822                 (a1: 'a1, f: 'b -> 'c): ('a2, 'b, 'c, 'd) t =
6823          fold (h a1, f)
6824
6825       fun step1 (h: 'a11 * 'a12 -> 'a2)
6826                 (a12: 'a12, f: 'b -> 'c)
6827                 (a11: 'a11): ('a2, 'b, 'c, 'd) t =
6828          fold (h (a11, a12), f)
6829
6830       fun lift0 (s: ('a1, 'a2, 'a2, 'a2, 'a2) step0)
6831                 (a: 'a1, f: 'b -> 'c): ('a2, 'b, 'c, 'd) t =
6832          fold (fold (a, id) s $, f)
6833
6834       fun post (w: ('a, 'b, 'c1, 'd) t,
6835                 g: 'c1 -> 'c2)
6836                (s: ('a, 'b, 'c2, 'd) step): 'd =
6837          w (fn (a, h) => s (a, g o h))
6838    end
6839 ----
6840
6841 That's a lot to swallow, so let's walk through it one step at a time.
6842 First, we have the definition of type `Fold.step`.
6843
6844 [source,sml]
6845 ----
6846 type ('a, 'b, 'c, 'd) step = 'a * ('b -> 'c) -> 'd
6847 ----
6848
6849 As a fold proceeds over its arguments, it maintains two things: the
6850 accumulator, of type `'a`, and the finishing function, of type
6851 `'b -> 'c`.  Each step in the fold is a function that takes those
6852 two pieces (i.e. `'a * ('b -> 'c)`) and does something to them
6853 (i.e. produces `'d`).  The result type of the step is completely
6854 left open to be filled in by type inference, as it is an arrow type
6855 that is capable of consuming the rest of the arguments to the fold.
6856
6857 A folder, of type `Fold.t`, is a function that consumes a single
6858 step.
6859
6860 [source,sml]
6861 ----
6862 type ('a, 'b, 'c, 'd) t = ('a, 'b, 'c, 'd) step -> 'd
6863 ----
6864
6865 Expanding out the type, we have:
6866
6867 [source,sml]
6868 ----
6869 type ('a, 'b, 'c, 'd) t = ('a * ('b -> 'c) -> 'd) -> 'd
6870 ----
6871
6872 This shows that the only thing a folder does is to hand its
6873 accumulator (`'a`) and finisher (`'b -> 'c`) to the next step
6874 (`'a * ('b -> 'c) -> 'd`).  If SML had <:FirstClassPolymorphism:first-class polymorphism>,
6875 we would write the fold type as follows.
6876
6877 [source,sml]
6878 ----
6879 type ('a, 'b, 'c) t = Forall 'd . ('a, 'b, 'c, 'd) step -> 'd
6880 ----
6881
6882 This type definition shows that a folder has nothing to do with
6883 the rest of the fold; it only deals with the next step.
6884
6885 We now can understand the type of `fold`, which takes the initial
6886 value of the accumulator and the finishing function, and constructs a
6887 folder, i.e. a function awaiting the next step.
6888
6889 [source,sml]
6890 ----
6891 val fold: 'a * ('b -> 'c) -> ('a, 'b, 'c, 'd) t
6892 fun fold (a: 'a, f: 'b -> 'c)
6893          (g: ('a, 'b, 'c, 'd) step): 'd =
6894    g (a, f)
6895 ----
6896
6897 Continuing on, we have the type of step functions.
6898
6899 [source,sml]
6900 ----
6901 type ('a1, 'a2, 'b, 'c, 'd) step0 =
6902    ('a1, 'b, 'c, ('a2, 'b, 'c, 'd) t) step
6903 ----
6904
6905 Expanding out the type a bit gives:
6906
6907 [source,sml]
6908 ----
6909 type ('a1, 'a2, 'b, 'c, 'd) step0 =
6910    'a1 * ('b -> 'c) -> ('a2, 'b, 'c, 'd) t
6911 ----
6912
6913 So, a step function takes the accumulator (`'a1`) and finishing
6914 function (`'b -> 'c`), which will be passed to it by the previous
6915 folder, and transforms them to a new folder.  This new folder has a
6916 new accumulator (`'a2`) and the same finishing function.
6917
6918 Again, imagining that SML had <:FirstClassPolymorphism:first-class polymorphism> makes the type
6919 clearer.
6920
6921 [source,sml]
6922 ----
6923 type ('a1, 'a2) step0 =
6924    Forall ('b, 'c) . ('a1, 'b, 'c, ('a2, 'b, 'c) t) step
6925 ----
6926
6927 Thus, in essence, a `step0` function is a wrapper around a
6928 function of type `'a1 -> 'a2`, which is exactly what the
6929 definition of `step0` does.
6930
6931 [source,sml]
6932 ----
6933 val step0: ('a1 -> 'a2) -> ('a1, 'a2, 'b, 'c, 'd) step0
6934 fun step0 (h: 'a1 -> 'a2)
6935           (a1: 'a1, f: 'b -> 'c): ('a2, 'b, 'c, 'd) t =
6936    fold (h a1, f)
6937 ----
6938
6939 It is not much beyond `step0` to understand `step1`.
6940
6941 [source,sml]
6942 ----
6943 type ('a11, 'a12, 'a2, 'b, 'c, 'd) step1 =
6944    ('a12, 'b, 'c, 'a11 -> ('a2, 'b, 'c, 'd) t) step
6945 ----
6946
6947 A `step1` function takes the accumulator (`'a12`) and finisher
6948 (`'b -> 'c`) passed to it by the previous folder and transforms
6949 them into a function that consumes the next argument (`'a11`) and
6950 produces a folder that will continue the fold with a new accumulator
6951 (`'a2`) and the same finisher.
6952
6953 [source,sml]
6954 ----
6955 fun step1 (h: 'a11 * 'a12 -> 'a2)
6956           (a12: 'a12, f: 'b -> 'c)
6957           (a11: 'a11): ('a2, 'b, 'c, 'd) t =
6958    fold (h (a11, a12), f)
6959 ----
6960
6961 With <:FirstClassPolymorphism:first-class polymorphism>, a `step1` function is more clearly
6962 seen as a wrapper around a binary function of type
6963 `'a11 * 'a12 -> 'a2`.
6964
6965 [source,sml]
6966 ----
6967 type ('a11, 'a12, 'a2) step1 =
6968    Forall ('b, 'c) . ('a12, 'b, 'c, 'a11 -> ('a2, 'b, 'c) t) step
6969 ----
6970
6971 The type of `post` is clear: it takes a folder with a finishing
6972 function that produces type `'c1`, and a function of type
6973 `'c1 -> 'c2` to postcompose onto the folder.  It returns a new
6974 folder with a finishing function that produces type `'c2`.
6975
6976 [source,sml]
6977 ----
6978 val post: ('a, 'b, 'c1, 'd) t * ('c1 -> 'c2)
6979           -> ('a, 'b, 'c2, 'd) t
6980 fun post (w: ('a, 'b, 'c1, 'd) t,
6981           g: 'c1 -> 'c2)
6982          (s: ('a, 'b, 'c2, 'd) step): 'd =
6983    w (fn (a, h) => s (a, g o h))
6984 ----
6985
6986 We will return to `lift0` after an example.
6987
6988
6989 == An example typing ==
6990
6991 Let's type check our simplest example, a variable-argument fold.
6992 Recall that we have a folder `f` and a stepper `a` defined as
6993 follows.
6994
6995 [source,sml]
6996 ----
6997 val f = fn z => Fold.fold ((), fn () => ()) z
6998 val a = fn z => Fold.step0 (fn () => ()) z
6999 ----
7000
7001 Since the accumulator and finisher are uninteresting, we'll use some
7002 abbreviations to simplify things.
7003
7004 [source,sml]
7005 ----
7006 type 'd step = (unit, unit, unit, 'd) Fold.step
7007 type 'd fold = 'd step -> 'd
7008 ----
7009
7010 With these abbreviations, `f` and `a` have the following polymorphic
7011 types.
7012
7013 [source,sml]
7014 ----
7015 f: 'd fold
7016 a: 'd fold step
7017 ----
7018
7019 Suppose we want to type check
7020
7021 [source,sml]
7022 ----
7023 f a a a $: unit
7024 ----
7025
7026 As a reminder, the fully parenthesized expression is
7027 [source,sml]
7028 ----
7029 ((((f a) a) a) $)
7030 ----
7031
7032 The observation that we will use repeatedly is that for any type
7033 `z`, if `f: z fold` and `s: z step`, then `f s: z`.
7034 So, if we want
7035
7036 [source,sml]
7037 ----
7038 (f a a a) $: unit
7039 ----
7040
7041 then we must have
7042
7043 [source,sml]
7044 ----
7045 f a a a: unit fold
7046 $: unit step
7047 ----
7048
7049 Applying the observation again, we must have
7050
7051 [source,sml]
7052 ----
7053 f a a: unit fold fold
7054 a: unit fold step
7055 ----
7056
7057 Applying the observation two more times leads to the following type
7058 derivation.
7059
7060 [source,sml]
7061 ----
7062 f: unit fold fold fold fold  a: unit fold fold fold step
7063 f a: unit fold fold fold     a: unit fold fold step
7064 f a a: unit fold fold        a: unit fold step
7065 f a a a: unit fold           $: unit step
7066 f a a a $: unit
7067 ----
7068
7069 So, each application is a fold that consumes the next step, producing
7070 a fold whose type has one fewer level of nesting.
7071
7072 One can expand some of the type definitions in `f` to see that it is
7073 indeed a function that takes four curried arguments, each one a step
7074 function.
7075
7076 [source,sml]
7077 ----
7078 f: unit fold fold fold step
7079    -> unit fold fold step
7080    -> unit fold step
7081    -> unit step
7082    -> unit
7083 ----
7084
7085 This example shows why we must eta expand uses of `fold` and `step0`
7086 to work around the value restriction and make folders and steppers
7087 polymorphic.  The type of a fold function like `f` depends on the
7088 number of arguments, and so will vary from use to use.  Similarly,
7089 each occurrence of an argument like `a` has a different type,
7090 depending on the number of remaining arguments.
7091
7092 This example also shows that the type of a folder, when fully
7093 expanded, is exponential in the number of arguments: there are as many
7094 nested occurrences of the `fold` type constructor as there are
7095 arguments, and each occurrence duplicates its type argument.  One can
7096 observe this exponential behavior in a type checker that doesn't share
7097 enough of the representation of types (e.g. one that represents types
7098 as trees rather than directed acyclic graphs).
7099
7100 Generalizing this type derivation to uses of fold where the
7101 accumulator and finisher are more interesting is straightforward.  One
7102 simply includes the type of the accumulator, which may change, for
7103 each step, and the type of the finisher, which doesn't change from
7104 step to step.
7105
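The derivation in this section can be checked mechanically by writing
each row as a type annotation and letting the compiler verify it.  The
following is a sketch: the intermediate names `f0` through `f3` are
introduced only for this purpose, and `Fold` is restated in reduced
form.  The program compiles only if every annotation is correct.

[source,sml]
----
fun $ (a, f) = f a

structure Fold =
   struct
      type ('a, 'b, 'c, 'd) step = 'a * ('b -> 'c) -> 'd
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

type 'd step = (unit, unit, unit, 'd) Fold.step
type 'd fold = 'd step -> 'd

val f = fn z => Fold.fold ((), fn () => ()) z
val a = fn z => Fold.step0 (fn () => ()) z

(* One annotation per row of the derivation. *)
val f0: unit fold fold fold fold = f
val f1: unit fold fold fold = f0 a
val f2: unit fold fold = f1 a
val f3: unit fold = f2 a
val (): unit = f3 $
----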
7106
7107 == Typing lift ==
7108
7109 The lack of <:FirstClassPolymorphism:first-class polymorphism> in SML
7110 causes problems if one wants to use a step in a first-class way.
7111 Consider the following `double` function, which takes a step, `s`, and
7112 produces a composite step that does `s` twice.
7113
7114 [source,sml]
7115 ----
7116 fun double s = fn u => Fold.fold u s s
7117 ----
7118
7119 The definition of `double` is not type correct.  The problem is that
7120 the type of a step depends on the number of remaining arguments, but
7121 the parameter `s` is not polymorphic, and so cannot be used in
7122 two different positions.
7123
7124 Fortunately, we can define a function, `lift0`, that takes a monotyped
7125 step function and _lifts_ it into a polymorphic step function.  This
7126 is apparent in the type of `lift0`.
7127
7128 [source,sml]
7129 ----
7130 val lift0: ('a1, 'a2, 'a2, 'a2, 'a2) step0
7131            -> ('a1, 'a2, 'b, 'c, 'd) step0
7132 fun lift0 (s: ('a1, 'a2, 'a2, 'a2, 'a2) step0)
7133           (a: 'a1, f: 'b -> 'c): ('a2, 'b, 'c, 'd) t =
7134    fold (fold (a, id) s $, f)
7135 ----
7136
7137 The following definition of `double` uses `lift0`, appropriately eta
7138 wrapped, to fix the problem.
7139
7140 [source,sml]
7141 ----
7142 fun double s =
7143    let
7144       val s = fn z => Fold.lift0 s z
7145    in
7146       fn u => Fold.fold u s s
7147    end
7148 ----
7149
7150 With that definition of `double` in place, we can use it as in the
7151 following example.
7152
7153 [source,sml]
7154 ----
7155 val f = fn z => Fold.fold ((), fn () => ()) z
7156 val a = fn z => Fold.step0 (fn () => ()) z
7157 val a2 = fn z => double a z
7158 val () = f a a2 a a2 $
7159 ----
7160
7161 Of course, we must eta wrap the call to `double` in order to use its
7162 result, which is a step function, polymorphically.
7163
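To see that the doubled step really runs twice, one can use an
accumulator that counts steps.  This is a sketch: the folder `count`,
the steppers `tick` and `tick2`, and the result `n` are hypothetical
names, and `Fold` is restated in reduced form.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x

structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
      fun lift0 s (a, f) = fold (fold (a, id) s $, f)
   end

fun double s =
   let
      val s = fn z => Fold.lift0 s z
   in
      fn u => Fold.fold u s s
   end

(* Hypothetical folder and stepper: count how many steps were applied. *)
val count = fn z => Fold.fold (0, id) z
val tick = fn z => Fold.step0 (fn n => n + 1) z
val tick2 = fn z => double tick z

(* tick contributes 1 and tick2 contributes 2 to the count. *)
val n = count tick tick2 tick tick2 $
----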
7164
7165 == Hiding the type of the accumulator ==
7166
7167 For clarity and to avoid mistakes, it can be useful to hide the type
7168 of the accumulator in a fold.  Reworking the simple variable-argument
7169 example to do this leads to the following.
7170
7171 [source,sml]
7172 ----
7173 structure S:>
7174   sig
7175      type ac
7176      val f: (ac, ac, unit, 'd) Fold.t
7177      val s: (ac, ac, 'b, 'c, 'd) Fold.step0
7178   end =
7179   struct
7180      type ac = unit
7181      val f = fn z => Fold.fold ((), fn () => ()) z
7182      val s = fn z => Fold.step0 (fn () => ()) z
7183   end
7184 ----
7185
7186 The idea is to name the accumulator type and use opaque signature
7187 matching to make it abstract.  This can prevent improper manipulation
7188 of the accumulator by client code and ensure invariants that the
7189 folder and stepper would like to maintain.
7190
7191 For a practical example of this technique, see <:ArrayLiteral:>.
7192
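A slightly less trivial sketch of the same idea is a step counter
whose accumulator is an `int` internally but abstract to clients.  The
structure `Counter` and its members are hypothetical names used only
for this illustration; `Fold` is restated in reduced form with just
the types the signature needs.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x

structure Fold =
   struct
      type ('a, 'b, 'c, 'd) step = 'a * ('b -> 'c) -> 'd
      type ('a, 'b, 'c, 'd) t = ('a, 'b, 'c, 'd) step -> 'd
      type ('a1, 'a2, 'b, 'c, 'd) step0 =
         ('a1, 'b, 'c, ('a2, 'b, 'c, 'd) t) step
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

structure Counter:>
   sig
      type ac
      val count: (ac, ac, int, 'd) Fold.t
      val tick: (ac, ac, 'b, 'c, 'd) Fold.step0
   end =
   struct
      type ac = int
      val count = fn z => Fold.fold (0, id) z
      val tick = fn z => Fold.step0 (fn n => n + 1) z
   end

val two = Counter.count Counter.tick Counter.tick $
----

Because `Counter.ac` is abstract, a client-defined step such as
`Fold.step0 (fn n => n + 10)` is rejected by the type checker when
used with `Counter.count`, so the count can only change through
`Counter.tick`.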
7193
7194 == Also see ==
7195
7196 Fold has a number of practical applications.  Here are some of them.
7197
7198 * <:ArrayLiteral:>
7199 * <:Fold01N:>
7200 * <:FunctionalRecordUpdate:>
7201 * <:NumericLiteral:>
7202 * <:OptionalArguments:>
7203 * <:Printf:>
7204 * <:VariableArityPolymorphism:>
7205
7206 There are a number of related techniques.  Here are some of them.
7207
7208 * <:StaticSum:>
7209 * <:TypeIndexedValues:>
7210
7211 <<<
7212
7213 :mlton-guide-page: Fold01N
7214 [[Fold01N]]
7215 Fold01N
7216 =======
7217
7218 A common use pattern of <:Fold:> is to define a variable-arity
7219 function that combines multiple arguments together using a binary
7220 function.  It is slightly tricky to do this directly using fold,
7221 because of the special treatment required for the case of zero or one
7222 argument.  Here is a structure, `Fold01N`, that solves the problem
7223 once and for all, and eases the definition of such functions.
7224
7225 [source,sml]
7226 ----
7227 structure Fold01N =
7228    struct
7229       fun fold {finish, start, zero} =
7230          Fold.fold ((id, finish, fn () => zero, start),
7231                     fn (finish, _, p, _) => finish (p ()))
7232
7233       fun step0 {combine, input} =
7234          Fold.step0 (fn (_, finish, _, f) =>
7235                      (finish,
7236                       finish,
7237                       fn () => f input,
7238                       fn x' => combine (f input, x')))
7239
7240       fun step1 {combine} z input =
7241          step0 {combine = combine, input = input} z
7242    end
7243 ----
7244
7245 If one has a value `zero`, and functions `start`, `c`, and `finish`,
7246 then one can define a variable-arity function `f` and stepper
7247 +&grave;+ as follows.
7248 [source,sml]
7249 ----
7250 val f = fn z => Fold01N.fold {finish = finish, start = start, zero = zero} z
7251 val ` = fn z => Fold01N.step1 {combine = c} z
7252 ----
7253
7254 One can then use the fold equation to prove the following equations.
7255 [source,sml]
7256 ----
7257 f $ = zero
7258 f `a1 $ = finish (start a1)
7259 f `a1 `a2 $ = finish (c (start a1, a2))
7260 f `a1 `a2 `a3 $ = finish (c (c (start a1, a2), a3))
7261 ...
7262 ----
7263
7264 For an example of `Fold01N`, see <:VariableArityPolymorphism:>.
7265
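As a small self-contained illustration, the following sketch uses
`Fold01N` to render its string arguments as a bracketed,
comma-separated list.  The folder `join`, its `finish`, `start`,
`zero`, and `combine` values, and the result names are hypothetical
choices made for this example; `$`, `id`, and `Fold` are restated in
reduced form from the <:Fold:> page.

[source,sml]
----
fun $ (a, f) = f a
fun id x = x

structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
   end

structure Fold01N =
   struct
      fun fold {finish, start, zero} =
         Fold.fold ((id, finish, fn () => zero, start),
                    fn (finish, _, p, _) => finish (p ()))

      fun step0 {combine, input} =
         Fold.step0 (fn (_, finish, _, f) =>
                     (finish,
                      finish,
                      fn () => f input,
                      fn x' => combine (f input, x')))

      fun step1 {combine} z input =
         step0 {combine = combine, input = input} z
   end

(* Hypothetical instance: render the arguments as a bracketed list. *)
val join =
   fn z => Fold01N.fold {finish = fn s => "[" ^ s ^ "]",
                         start = fn s => s,
                         zero = "[]"} z
val ` = fn z => Fold01N.step1 {combine = fn (acc, s) => acc ^ ", " ^ s} z

val none = join $                    (* zero *)
val one = join `"a" $                (* finish (start "a") *)
val three = join `"a" `"b" `"c" $    (* finish of two combines *)
----

The zero-argument case returns `zero` unchanged, without applying
`finish`, which is why `zero` is written here with its own brackets.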
7266
7267 == Typing Fold01N ==
7268
7269 Here is the signature for `Fold01N`.  We use a trick to avoid having
7270 to duplicate the definition of some rather complex types in both the
7271 signature and the structure.  We first define the types in a
7272 structure.  Then, we define them via type re-definitions in the
7273 signature, and via `open` in the full structure.
7274 [source,sml]
7275 ----
7276 structure Fold01N =
7277    struct
7278       type ('input, 'accum1, 'accum2, 'answer, 'zero,
7279             'a, 'b, 'c, 'd, 'e) t =
7280          (('zero -> 'zero)
7281           * ('accum2 -> 'answer)
7282           * (unit -> 'zero)
7283           * ('input -> 'accum1),
7284           ('a -> 'b) * 'c * (unit -> 'a) * 'd,
7285           'b,
7286           'e) Fold.t
7287
7288       type ('input1, 'accum1, 'input2, 'accum2,
7289             'a, 'b, 'c, 'd, 'e, 'f) step0 =
7290          ('a * 'b * 'c * ('input1 -> 'accum1),
7291           'b * 'b * (unit -> 'accum1) * ('input2 -> 'accum2),
7292           'd, 'e, 'f) Fold.step0
7293
7294       type ('accum1, 'input, 'accum2,
7295             'a, 'b, 'c, 'd, 'e, 'f, 'g) step1 =
7296          ('a,
7297           'b * 'c * 'd * ('a -> 'accum1),
7298           'c * 'c * (unit -> 'accum1) * ('input -> 'accum2),
7299           'e, 'f, 'g) Fold.step1
7300    end
7301
7302 signature FOLD_01N =
7303    sig
7304       type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) t =
7305          ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) Fold01N.t
7306       type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) step0 =
7307          ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) Fold01N.step0
7308       type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) step1 =
7309          ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j) Fold01N.step1
7310
7311       val fold:
7312          {finish: 'accum2 -> 'answer,
7313           start: 'input -> 'accum1,
7314           zero: 'zero}
7315          -> ('input, 'accum1, 'accum2, 'answer, 'zero,
7316              'a, 'b, 'c, 'd, 'e) t
7317
7318       val step0:
7319          {combine: 'accum1 * 'input2 -> 'accum2,
7320           input: 'input1}
7321          -> ('input1, 'accum1, 'input2, 'accum2,
7322              'a, 'b, 'c, 'd, 'e, 'f) step0
7323
7324       val step1:
7325          {combine: 'accum1 * 'input -> 'accum2}
7326          -> ('accum1, 'input, 'accum2,
7327              'a, 'b, 'c, 'd, 'e, 'f, 'g) step1
7328    end
7329
7330 structure Fold01N: FOLD_01N =
7331    struct
7332       open Fold01N
7333
7334       fun fold {finish, start, zero} =
7335          Fold.fold ((id, finish, fn () => zero, start),
7336                     fn (finish, _, p, _) => finish (p ()))
7337
7338       fun step0 {combine, input} =
7339          Fold.step0 (fn (_, finish, _, f) =>
7340                      (finish,
7341                       finish,
7342                       fn () => f input,
7343                       fn x' => combine (f input, x')))
7344
7345       fun step1 {combine} z input =
7346          step0 {combine = combine, input = input} z
7347    end
7348 ----
7349
7350 <<<
7351
7352 :mlton-guide-page: ForeignFunctionInterface
7353 [[ForeignFunctionInterface]]
7354 ForeignFunctionInterface
7355 ========================
7356
7357 MLton's foreign function interface (FFI) extends Standard ML and makes
7358 it easy to take the address of C global objects, access C global
7359 variables, call from SML to C, and call from C to SML.  MLton also
7360 provides <:MLNLFFI:ML-NLFFI>, which is a higher-level FFI for calling
7361 C functions and manipulating C data from SML.
7362
7363 == Overview ==
7364 * <:ForeignFunctionInterfaceTypes:Foreign Function Interface Types>
7365 * <:ForeignFunctionInterfaceSyntax:Foreign Function Interface Syntax>
7366
7367 == Importing Code into SML ==
7368 * <:CallingFromSMLToC:Calling From SML To C>
7369 * <:CallingFromSMLToCFunctionPointer:Calling From SML To C Function Pointer>
7370
7371 == Exporting Code from SML ==
7372 * <:CallingFromCToSML:Calling From C To SML>
7373
7374 == Building System Libraries ==
7375 * <:LibrarySupport:Library Support>
7376
7377 <<<
7378
7379 :mlton-guide-page: ForeignFunctionInterfaceSyntax
7380 [[ForeignFunctionInterfaceSyntax]]
7381 ForeignFunctionInterfaceSyntax
7382 ==============================
7383
7384 MLton extends the syntax of SML with expressions that enable a
7385 <:ForeignFunctionInterface:> to C.  The following description of the
7386 syntax uses some abbreviations.
7387
7388 [options="header"]
7389 |====
7390 | C base type | _cBaseTy_ | <:ForeignFunctionInterfaceTypes: Foreign Function Interface types>
7391 | C argument type | _cArgTy_ | _cBaseTy_~1~ `*` ... `*` _cBaseTy_~n~ or `unit`
7392 | C return type | _cRetTy_ | _cBaseTy_ or `unit`
7393 | C function type | _cFuncTy_ | _cArgTy_ `->` _cRetTy_
7394 | C pointer type | _cPtrTy_ | `MLton.Pointer.t`
7395 |====
7396
7397 The type annotation and the semicolon are not optional in the syntax
7398 of <:ForeignFunctionInterface:> expressions.  However, the type is
7399 lexed, parsed, and elaborated as an SML type, so any type (including
7400 type abbreviations) may be used, so long as it elaborates to a type of
7401 the correct form.
7402
7403
7404 == Address ==
7405
7406 ----
7407 _address "CFunctionOrVariableName" attr... : cPtrTy;
7408 ----
7409
7410 Denotes the address of the C function or variable.
7411
7412 `attr...` denotes a (possibly empty) sequence of attributes.  The following attributes are recognized:
7413
7414 * `external` : import with external symbol scope (see <:LibrarySupport:>) (default).
7415 * `private` : import with private symbol scope (see <:LibrarySupport:>).
7416 * `public` : import with public symbol scope (see <:LibrarySupport:>).
7417
7418 See <:MLtonPointer: MLtonPointer> for functions that manipulate C pointers.
7419
7420
7421 == Symbol ==
7422
7423 ----
7424 _symbol "CVariableName" attr... : (unit -> cBaseTy) * (cBaseTy -> unit);
7425 ----
7426
7427 Denotes the _getter_ and _setter_ for a C variable.  The __cBaseTy__s
7428 must be identical.
7429
7430 `attr...` denotes a (possibly empty) sequence of attributes.  The following attributes are recognized:
7431
7432 * `alloc` : allocate storage (and export a symbol) for the C variable.
7433 * `external` : import or export with external symbol scope (see <:LibrarySupport:>) (default if not `alloc`).
7434 * `private` : import or export with private symbol scope (see <:LibrarySupport:>).
7435 * `public` : import or export with public symbol scope (see <:LibrarySupport:>) (default if `alloc`).
7436
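As an illustration of the `alloc` attribute, the following sketch
asks MLton to allocate a C variable and then reads and writes it
through the getter and setter.  The variable name `demo_counter` and
the SML names are hypothetical.  The sketch assumes that FFI syntax
has been enabled, for example by compiling with
`-default-ann 'allowFFI true'`, and that no C code defines a symbol
of the same name.

[source,sml]
----
(* "demo_counter" is a hypothetical name; alloc makes MLton allocate
 * the storage, so no separate C source file is involved. *)
val (getCounter, setCounter) =
   _symbol "demo_counter" alloc: (unit -> int) * (int -> unit);

(* Write before reading, so nothing depends on the initial contents. *)
val () = setCounter 41
val () = setCounter (getCounter () + 1)
val () = print (Int.toString (getCounter ()) ^ "\n")
----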
7437
7438 ----
7439 _symbol * : cPtrTy -> (unit -> cBaseTy) * (cBaseTy -> unit);
7440 ----
7441
7442 Denotes the _getter_ and _setter_ for a C pointer to a variable.
7443 The __cBaseTy__s must be identical.
7444
7445
7446 == Import ==
7447
7448 ----
7449 _import "CFunctionName" attr... : cFuncTy;
7450 ----
7451
7452 Denotes an SML function whose behavior is implemented by calling the C
7453 function.  See <:CallingFromSMLToC: Calling from SML to C> for more
7454 details.
7455
7456 `attr...` denotes a (possibly empty) sequence of attributes.  The following attributes are recognized:
7457
7458 * `cdecl` : call with the `cdecl` calling convention (default).
7459 * `external` : import with external symbol scope (see <:LibrarySupport:>) (default).
7460 * `impure`: assert that the function depends upon state and/or performs side effects (default).
7461 * `private` : import with private symbol scope (see <:LibrarySupport:>).
7462 * `public` : import with public symbol scope (see <:LibrarySupport:>).
7463 * `pure`: assert that the function does not depend upon state or perform any side effects; such functions are subject to various optimizations (e.g., <:CommonSubexp:>, <:RemoveUnused:>).
7464 * `reentrant`: assert that the function (directly or indirectly) calls an `_export`-ed SML function.
7465 * `stdcall` : call with the `stdcall` calling convention (ignored except on Cygwin and MinGW).
7466
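For example, the following sketch imports the C library function
`abs` and calls it from SML.  It assumes a platform where C's `int`
is 32 bits wide, so that `Int32.int` is the matching SML type, and it
assumes that FFI syntax has been enabled (for example with
`-default-ann 'allowFFI true'`).  The SML name `cAbs` is illustrative.

[source,sml]
----
(* abs from the C library: int abs(int).  It neither reads nor writes
 * any state, so it is marked pure. *)
val cAbs = _import "abs" pure: Int32.int -> Int32.int;

val seven = cAbs ~7
val () = print (Int32.toString seven ^ "\n")
----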
7467
7468 ----
7469 _import * attr... : cPtrTy -> cFuncTy;
7470 ----
7471
7472 Denotes an SML function whose behavior is implemented by calling a C
7473 function through a C function pointer.
7474
7475 `attr...` denotes a (possibly empty) sequence of attributes.  The following attributes are recognized:
7476
7477 * `cdecl` : call with the `cdecl` calling convention (default).
7478 * `impure`: assert that the function depends upon state and/or performs side effects (default).
7479 * `pure`: assert that the function does not depend upon state or perform any side effects; such functions are subject to various optimizations (e.g., <:CommonSubexp:>, <:RemoveUnused:>).
7480 * `reentrant`: assert that the function (directly or indirectly) calls an `_export`-ed SML function.
7481 * `stdcall` : call with the `stdcall` calling convention (ignored except on Cygwin and MinGW).
7482
7483 See
7484 <:CallingFromSMLToCFunctionPointer: Calling from SML to C function pointer>
7485 for more details.
7486
7487
7488 == Export ==
7489
7490 ----
7491 _export "CFunctionName" attr... : cFuncTy -> unit;
7492 ----
7493
7494 Exports a C function with the name `CFunctionName` that can be used to
7495 call an SML function of the type _cFuncTy_. When the function denoted
7496 by the export expression is applied to an SML function `f`, subsequent
7497 C calls to `CFunctionName` will call `f`.  It is an error to call
7498 `CFunctionName` before the export has been applied.  The export may be
7499 applied more than once, with each application replacing any previous
7500 definition of `CFunctionName`.
7501
7502 `attr...` denotes a (possibly empty) sequence of attributes.  The following attributes are recognized:
7503
7504 * `cdecl` : call with the `cdecl` calling convention (default).
7505 * `private` : export with private symbol scope (see <:LibrarySupport:>).
7506 * `public` : export with public symbol scope (see <:LibrarySupport:>) (default).
7507 * `stdcall` : call with the `stdcall` calling convention (ignored except on Cygwin and MinGW).
7508
7509 See <:CallingFromCToSML: Calling from C to SML> for more details.
7510
7511 <<<
7512
7513 :mlton-guide-page: ForeignFunctionInterfaceTypes
7514 [[ForeignFunctionInterfaceTypes]]
7515 ForeignFunctionInterfaceTypes
7516 =============================
7517
7518 MLton's <:ForeignFunctionInterface:> only allows values of certain SML
7519 types to be passed between SML and C.  The following types are
7520 allowed: `bool`, `char`, `int`, `real`, `word`.  All of the different
7521 sizes of (fixed-sized) integers, reals, and words are supported as
7522 well: `Int8.int`, `Int16.int`, `Int32.int`, `Int64.int`,
7523 `Real32.real`, `Real64.real`, `Word8.word`, `Word16.word`,
7524 `Word32.word`, `Word64.word`.  There is a special type,
7525 `MLton.Pointer.t`, for passing C pointers -- see <:MLtonPointer:> for
7526 details.
7527
7528 Arrays, refs, and vectors of the above types are also allowed.
7529 Because in MLton monomorphic arrays and vectors are exactly the same
7530 as their polymorphic counterparts, these are also allowed.  Hence,
7531 `string`, `char vector`, and `CharVector.vector` are also allowed.
7532 Strings are not null terminated, unless you manually do so from the
7533 SML side.
7534
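The point about null termination matters whenever a string is handed
to C code that expects a C string.  The following sketch imports the
C library function `atoi` and appends the terminator by hand.  It
assumes that C's `int` is 32 bits wide and that FFI syntax has been
enabled (for example with `-default-ann 'allowFFI true'`); the helper
`cString` is a hypothetical name introduced for this example.

[source,sml]
----
(* atoi from the C library: int atoi(const char *).  It only reads the
 * string, which respects the read-only requirement on strings. *)
val atoi = _import "atoi": string -> Int32.int;

(* SML strings carry no trailing NUL, so append one explicitly. *)
fun cString (s: string): string = s ^ "\000"

val n = atoi (cString "42")
----
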
7535 Unfortunately, passing tuples or datatypes is not allowed because that
7536 would interfere with representation optimizations.
7537
7538 The C header file that `-export-header` generates includes
7539 ++typedef++s for the C types corresponding to the SML types.  Here is
7540 the mapping between SML types and C types.
7541
7542 [options="header"]
7543 |====
7544 | SML type | C typedef | C type | Note
7545 | `array` | `Pointer` | `unsigned char *` |
7546 | `bool` | `Bool` | `int32_t` |
7547 | `char` | `Char8` | `uint8_t` |
7548 | `Int8.int` | `Int8` | `int8_t` |
7549 | `Int16.int` | `Int16` | `int16_t` |
7550 | `Int32.int` | `Int32` | `int32_t` |
7551 | `Int64.int` | `Int64` | `int64_t` |
7552 | `int` | `Int32` | `int32_t` | <:#Default:(default)>
7553 | `MLton.Pointer.t` | `Pointer` | `unsigned char *` |
7554 | `Real32.real` | `Real32` | `float` |
7555 | `Real64.real` | `Real64` | `double` |
7556 | `real` | `Real64` | `double` | <:#Default:(default)>
7557 | `ref` | `Pointer` | `unsigned char *` |
7558 | `string` | `Pointer` | `unsigned char *` | <:#ReadOnly:(read only)>
7559 | `vector` | `Pointer` | `unsigned char *` | <:#ReadOnly:(read only)>
7560 | `Word8.word` | `Word8` | `uint8_t` |
7561 | `Word16.word` | `Word16` | `uint16_t` |
7562 | `Word32.word` | `Word32` | `uint32_t` |
7563 | `Word64.word` | `Word64` | `uint64_t` |
7564 | `word` | `Word32` | `uint32_t` | <:#Default:(default)>
7565 |====
7566
7567 <!Anchor(Default)>Note (default): The default `int`, `real`, and
7568 `word` types may be set by the ++-default-type __type__++
7569 <:CompileTimeOptions: compiler option>.  The given C typedef and C
7570 types correspond to the default behavior.
7571
7572 <!Anchor(ReadOnly)>Note (read only): Because MLton assumes that
7573 vectors and strings are read-only (and will perform optimizations
7574 that, for instance, cause them to share space), you must not modify
7575 the data pointed to by the `unsigned char *` in C code.
7576
7577 Although the C type of an array, ref, or vector is always `Pointer`,
7578 in reality, the object has the natural C representation.  Your C code
7579 should cast to the appropriate C type if you want to keep the C
7580 compiler from complaining.
7581
7582 When calling an <:CallingFromSMLToC: imported C function from SML>
7583 that returns an array, ref, or vector result or when calling an
7584 <:CallingFromCToSML: exported SML function from C> that takes an
array, ref, or string argument, the object must be an ML object
7586 allocated on the ML heap.  (Although an array, ref, or vector object
7587 has the natural C representation, the object also has an additional
7588 header used by the SML runtime system.)
7589
7590 In addition, there is an <:MLBasis:> file, `$(SML_LIB)/basis/c-types.mlb`,
7591 which provides structure aliases for various C types:
7592
[options="header"]
|====
7594 | C type | Structure | Signature
7595 | `char` | `C_Char` | `INTEGER`
7596 | `signed char` | `C_SChar` | `INTEGER`
7597 | `unsigned char` | `C_UChar` | `WORD`
7598 | `short` | `C_Short` | `INTEGER`
7599 | `signed short` | `C_SShort` | `INTEGER`
7600 | `unsigned short` | `C_UShort` | `WORD`
7601 | `int` | `C_Int` | `INTEGER`
7602 | `signed int` | `C_SInt` | `INTEGER`
7603 | `unsigned int` | `C_UInt` | `WORD`
7604 | `long` | `C_Long` | `INTEGER`
7605 | `signed long` | `C_SLong` | `INTEGER`
7606 | `unsigned long` | `C_ULong` | `WORD`
7607 | `long long` | `C_LongLong` | `INTEGER`
7608 | `signed long long` | `C_SLongLong` | `INTEGER`
7609 | `unsigned long long` | `C_ULongLong` | `WORD`
7610 | `float` | `C_Float` | `REAL`
7611 | `double` | `C_Double` | `REAL`
7612 | `size_t` | `C_Size` | `WORD`
7613 | `ptrdiff_t` | `C_Ptrdiff` | `INTEGER`
7614 | `intmax_t` | `C_Intmax` | `INTEGER`
7615 | `uintmax_t` | `C_UIntmax` | `WORD`
7616 | `intptr_t` | `C_Intptr` | `INTEGER`
7617 | `uintptr_t` | `C_UIntptr` | `WORD`
7618 | `void *` | `C_Pointer` | `WORD`
7619 |====
7620
7621 These aliases depend on the configuration of the C compiler for the
7622 target architecture, and are independent of the configuration of MLton
7623 (including the ++-default-type __type__++
7624 <:CompileTimeOptions: compiler option>).
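
As a rough sketch of how these aliases might be used, the following
assumes that the program's <:MLBasis:> file includes
`$(SML_LIB)/basis/c-types.mlb` alongside the Basis Library.  The
values `n` and `len` are purely illustrative.

[source,sml]
----
(* C_Int matches INTEGER and C_Size matches WORD, so the usual
 * conversion functions of those signatures are available. *)
val n : C_Int.int = C_Int.fromInt 42
val len : C_Size.word = C_Size.fromInt 16

(* WORD structures format in hexadecimal, hence the "0x" prefix. *)
val () = print (concat ["n = ", C_Int.toString n, "\n",
                        "len = 0x", C_Size.toString len, "\n"])
----

Writing FFI-facing code against `C_Int` rather than `Int32` keeps the
SML types in step with whatever size the C compiler picks for `int` on
the target.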
7625
7626 <<<
7627
7628 :mlton-guide-page: ForLoops
7629 [[ForLoops]]
7630 ForLoops
7631 ========
7632
7633 A `for`-loop is typically used to iterate over a range of consecutive
7634 integers that denote indices of some sort.  For example, in <:OCaml:>
7635 a `for`-loop takes either the form
7636 ----
7637 for <name> = <lower> to <upper> do <body> done
7638 ----
7639 or the form
7640 ----
7641 for <name> = <upper> downto <lower> do <body> done
7642 ----
7643
7644 Some languages provide considerably more flexible `for`-loop or
7645 `foreach`-constructs.
7646
7647 A bit surprisingly, <:StandardML:Standard ML> provides special syntax
7648 for `while`-loops, but not for `for`-loops.  Indeed, in SML, many uses
7649 of `for`-loops are better expressed using `app`, `foldl`/`foldr`,
7650 `map` and many other higher-order functions provided by the
7651 <:BasisLibrary:Basis Library> for manipulating lists, vectors and
7652 arrays.  However, the Basis Library does not provide a function for
7653 iterating over a range of integer values.  Fortunately, it is very
7654 easy to write one.
7655
7656
7657 == A fairly simple design ==
7658
7659 The following implementation imitates both the syntax and semantics of
7660 the OCaml `for`-loop.
7661
7662 [source,sml]
7663 ----
7664 datatype for = to of int * int
7665              | downto of int * int
7666
7667 infix to downto
7668
7669 val for =
7670     fn lo to up =>
7671        (fn f => let fun loop lo = if lo > up then ()
7672                                   else (f lo; loop (lo+1))
7673                 in loop lo end)
7674      | up downto lo =>
7675        (fn f => let fun loop up = if up < lo then ()
7676                                   else (f up; loop (up-1))
7677                 in loop up end)
7678 ----
7679
7680 For example,
7681
7682 [source,sml]
7683 ----
7684 for (1 to 9)
7685     (fn i => print (Int.toString i))
7686 ----
7687
7688 would print `123456789` and
7689
7690 [source,sml]
7691 ----
7692 for (9 downto 1)
7693     (fn i => print (Int.toString i))
7694 ----
7695
7696 would print `987654321`.
7697
7698 Straightforward formatting of nested loops
7699
7700 [source,sml]
7701 ----
7702 for (a to b)
7703     (fn i =>
7704         for (c to d)
7705             (fn j =>
7706                 ...))
7707 ----
7708
7709 is fairly readable, but tends to cause the body of the loop to be
7710 indented quite deeply.
7711
7712
7713 == Off-by-one ==
7714
7715 The above design has an annoying feature.  In practice, the upper
7716 bound of the iterated range is almost always excluded and most loops
7717 would subtract one from the upper bound:
7718
7719 [source,sml]
7720 ----
7721 for (0 to n-1) ...
7722 for (n-1 downto 0) ...
7723 ----
7724
7725 It is probably better to break convention and exclude the upper bound
7726 by default, because it leads to more concise code and becomes
7727 idiomatic with very little practice.  The iterator combinators
7728 described below exclude the upper bound by default.
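
To make the convention concrete, here is a sketch of how the simple
`for` function from the previous section could be adapted to exclude
the upper bound.  It is only an illustration of the idea and is not
the implementation used by the combinators below.

[source,sml]
----
datatype for = to of int * int
             | downto of int * int

infix to downto

(* Half-open variant: `lo to up` visits lo, ..., up-1 and
 * `up downto lo` visits up-1, ..., lo. *)
val for =
    fn lo to up =>
       (fn f => let fun loop i = if i >= up then ()
                                 else (f i; loop (i+1))
                in loop lo end)
     | up downto lo =>
       (fn f => let fun loop i = if i <= lo then ()
                                 else (f (i-1); loop (i-1))
                in loop up end)

val () = for (0 to 3) (fn i => print (Int.toString i))      (* prints 012 *)
val () = for (3 downto 0) (fn i => print (Int.toString i))  (* prints 210 *)
----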
7729
7730
7731 == Iterator combinators ==
7732
7733 While the simple `for`-function described in the previous section is
7734 probably good enough for many uses, it is a bit cumbersome when one
7735 needs to iterate over a Cartesian product.  One might also want to
7736 iterate over more than just consecutive integers.  It turns out that
7737 one can provide a library of iterator combinators that allow one to
7738 implement iterators more flexibly.
7739
7740 Since the types of the combinators may be a bit difficult to infer
7741 from their implementations, let's first take a look at a signature of
7742 the iterator combinator library:
7743
7744 [source,sml]
7745 ----
7746 signature ITER =
7747   sig
7748     type 'a t = ('a -> unit) -> unit
7749
7750     val return : 'a -> 'a t
7751     val >>= : 'a t * ('a -> 'b t) -> 'b t
7752
7753     val none : 'a t
7754
7755     val to : int * int -> int t
7756     val downto : int * int -> int t
7757
7758     val inList : 'a list -> 'a t
7759     val inVector : 'a vector -> 'a t
7760     val inArray : 'a array -> 'a t
7761
7762     val using : ('a, 'b) StringCvt.reader -> 'b -> 'a t
7763
7764     val when : 'a t * ('a -> bool) -> 'a t
7765     val by : 'a t * ('a -> 'b) -> 'b t
7766     val @@ : 'a t * 'a t -> 'a t
7767     val ** : 'a t * 'b t -> ('a, 'b) product t
7768
7769     val for : 'a -> 'a
7770   end
7771 ----
7772
7773 Several of the above combinators are meant to be used as infix
7774 operators.  Here is a set of suitable infix declarations:
7775
7776 [source,sml]
7777 ----
7778 infix 2 to downto
7779 infix 1 @@ when by
7780 infix 0 >>= **
7781 ----
7782
7783 A few notes are in order:
7784
7785 * The `'a t` type constructor with the `return` and `>>=` operators forms a monad.
7786
7787 * The `to` and `downto` combinators will omit the upper bound of the range.
7788
7789 * `for` is the identity function.  It is purely for syntactic sugar and is not strictly required.
7790
7791 * The `@@` combinator produces an iterator for the concatenation of the given iterators.
7792
7793 * The `**` combinator produces an iterator for the Cartesian product of the given iterators.
7794 ** See <:ProductType:> for the type constructor `('a, 'b) product` used in the type of the iterator produced by `**`.
7795
7796 * The `using` combinator allows one to iterate over slices, streams and many other kinds of sequences.
7797
* `when` is the filtering combinator.  The name `when` is inspired by <:OCaml:>'s guard clauses.
7799
7800 * `by` is the mapping combinator.
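
To illustrate the note about `using`, here is a standalone sketch of
that combinator, written directly with `case`, applied to two readers
from the Basis Library: `Substring.getc` and `List.getItem`.  Any
function of type `('a, 'b) StringCvt.reader` should work the same way.

[source,sml]
----
(* Repeatedly apply the reader `get` to the state `s`, calling `f` on
 * each element produced, until the reader returns NONE. *)
fun using get s f =
   let
      fun loop s =
         case get s of
            NONE => ()
          | SOME (x, s') => (f x; loop s')
   in
      loop s
   end

(* Iterate over the characters of a substring: prints "a b c ". *)
val () = using Substring.getc (Substring.full "abc")
               (fn c => print (str c ^ " "))

(* Iterate over the elements of a list: prints "1 2 3 ". *)
val () = using List.getItem [1, 2, 3]
               (fn i => print (Int.toString i ^ " "))
----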
7801
7802 The below implementation of the `ITER`-signature makes use of the
7803 following basic combinators:
7804
7805 [source,sml]
7806 ----
7807 fun const x _ = x
7808 fun flip f x y = f y x
7809 fun id x = x
7810 fun opt fno fso = fn NONE => fno () | SOME ? => fso ?
7811 fun pass x f = f x
7812 ----
7813
Here is an implementation of the `ITER`-signature:
7815
7816 [source,sml]
7817 ----
7818 structure Iter :> ITER =
7819   struct
7820     type 'a t = ('a -> unit) -> unit
7821
7822     val return = pass
7823     fun (iA >>= a2iB) f = iA (flip a2iB f)
7824
7825     val none = ignore
7826
7827     fun (l to u) f = let fun `l = if l<u then (f l; `(l+1)) else () in `l end
7828     fun (u downto l) f = let fun `u = if u>l then (f (u-1); `(u-1)) else () in `u end
7829
7830     fun inList ? = flip List.app ?
7831     fun inVector ? = flip Vector.app ?
7832     fun inArray ? = flip Array.app ?
7833
7834     fun using get s f = let fun `s = opt (const ()) (fn (x, s) => (f x; `s)) (get s) in `s end
7835
7836     fun (iA when p) f = iA (fn a => if p a then f a else ())
7837     fun (iA by g) f = iA (f o g)
7838     fun (iA @@ iB) f = (iA f : unit; iB f)
7839     fun (iA ** iB) f = iA (fn a => iB (fn b => f (a & b)))
7840
7841     val for = id
7842   end
7843 ----
7844
7845 Note that some of the above combinators (e.g. `**`) could be expressed
7846 in terms of the other combinators, most notably `return` and `>>=`.
7847 Another implementation issue worth mentioning is that `downto` is
7848 written specifically to avoid computing `l-1`, which could cause an
7849 `Overflow`.
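
As an illustration of the first point, the following self-contained
sketch expresses the Cartesian product using only `return` and `>>=`.
The function name `cross` is hypothetical, and the `product` datatype
is declared locally here, following <:ProductType:>, so that the
example stands on its own.

[source,sml]
----
datatype ('a, 'b) product = & of 'a * 'b
infix &
infix 0 >>=

fun return x f = f x
fun (iA >>= a2iB) f = iA (fn a => a2iB a f)
fun inList xs f = List.app f xs

(* Cartesian product written monadically: for each `a` produced by
 * `iA`, run `iB` and yield the pair `a & b`. *)
fun cross (iA, iB) = iA >>= (fn a => iB >>= (fn b => return (a & b)))

(* Prints "1a 1b 2a 2b ". *)
val () = cross (inList [1, 2], inList ["a", "b"])
               (fn i & s => print (Int.toString i ^ s ^ " "))
----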
7850
To use the above combinators, the `Iter`-structure needs to be opened
7852
7853 [source,sml]
7854 ----
7855 open Iter
7856 ----
7857
7858 and one usually also wants to declare the infix status of the
7859 operators as shown earlier.
7860
7861 Here is an example that illustrates some of the features:
7862
7863 [source,sml]
7864 ----
7865 for (0 to 10 when (fn x => x mod 3 <> 0) ** inList ["a", "b"] ** 2 downto 1 by real)
7866     (fn x & y & z =>
7867        print ("("^Int.toString x^", \""^y^"\", "^Real.toString z^")\n"))
7868 ----
7869
7870 Using the `Iter` combinators one can easily produce more complicated
7871 iterators.  For example, here is an iterator over a "triangle":
7872
7873 [source,sml]
7874 ----
7875 fun triangle (l, u) = l to u >>= (fn i => i to u >>= (fn j => return (i, j)))
7876 ----
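
To see which pairs `triangle` visits, here is a small standalone
sketch.  It redefines only the three combinators that `triangle`
needs, in the same spirit as the `Iter` structure above, so that it
can be run without the rest of the library.

[source,sml]
----
infix 2 to
infix 0 >>=

fun return x f = f x
fun (iA >>= a2iB) f = iA (fn a => a2iB a f)
(* Half-open range: visits l, ..., u-1. *)
fun (l to u) f =
   let fun loop i = if i < u then (f i; loop (i+1)) else ()
   in loop l end

fun triangle (l, u) = l to u >>= (fn i => i to u >>= (fn j => return (i, j)))

(* Prints "(0,0) (0,1) (0,2) (1,1) (1,2) (2,2) ". *)
val () = triangle (0, 3)
            (fn (i, j) =>
                print (concat ["(", Int.toString i, ",", Int.toString j, ") "]))
----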
7877
7878 <<<
7879
7880 :mlton-guide-page: FrontEnd
7881 [[FrontEnd]]
7882 FrontEnd
7883 ========
7884
7885 <:FrontEnd:> is a translation pass from source to the <:AST:>
7886 <:IntermediateLanguage:>.
7887
7888 == Description ==
7889
7890 This pass performs lexing and parsing to produce an abstract syntax
7891 tree.
7892
7893 == Implementation ==
7894
7895 * <!ViewGitFile(mlton,master,mlton/front-end/front-end.sig)>
7896 * <!ViewGitFile(mlton,master,mlton/front-end/front-end.fun)>
7897
7898 == Details and Notes ==
7899
7900 The lexer is produced by <:MLLex:> from
7901 <!ViewGitFile(mlton,master,mlton/front-end/ml.lex)>.
7902
7903 The parser is produced by <:MLYacc:> from
7904 <!ViewGitFile(mlton,master,mlton/front-end/ml.grm)>.
7905
7906 The specifications for the lexer and parser were originally taken from
7907 <:SMLNJ: SML/NJ> (version 109.32), but have been heavily modified
7908 since then.
7909
7910 <<<
7911
7912 :mlton-guide-page: FSharp
7913 [[FSharp]]
7914 FSharp
7915 ======
7916
7917 http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/[F#]
7918 is a functional programming language developed at Microsoft Research.
7919 F# was partly inspired by the <:OCaml:OCaml> language and shares some
7920 common core constructs with it.  F# is integrated with Visual Studio
7921 2010 as a first-class language.
7922
7923 <<<
7924
7925 :mlton-guide-page: FunctionalRecordUpdate
7926 [[FunctionalRecordUpdate]]
7927 FunctionalRecordUpdate
7928 ======================
7929
7930 Functional record update is the copying of a record while replacing
7931 the values of some of the fields.  <:StandardML:Standard ML> does not
7932 have explicit syntax for functional record update.  We will show below
7933 how to implement functional record update in SML, with a little
7934 boilerplate code.
7935
7936 As an example, the functional update of the record
7937
7938 [source,sml]
7939 ----
7940 {a = 13, b = 14, c = 15}
7941 ----
7942
7943 with `c = 16` yields a new record
7944
7945 [source,sml]
7946 ----
7947 {a = 13, b = 14, c = 16}
7948 ----
7949
7950 Functional record update also makes sense with multiple simultaneous
7951 updates.  For example, the functional update of the record above with
7952 `a = 18, c = 19` yields a new record
7953
7954 [source,sml]
7955 ----
7956 {a = 18, b = 14, c = 19}
7957 ----
7958
7959
One could easily imagine an extension of SML that supports
7961 functional record update.  For example
7962
7963 [source,sml]
7964 ----
7965 e with {a = 16, b = 17}
7966 ----
7967
7968 would create a copy of the record denoted by `e` with field `a`
7969 replaced with `16` and `b` replaced with `17`.
7970
7971 Since there is no such syntax in SML, we now show how to implement
7972 functional record update directly.  We first give a simple
7973 implementation that has a number of problems.  We then give an
7974 advanced implementation, that, while complex underneath, is a reusable
7975 library that admits simple use.
7976
7977
7978 == Simple implementation ==
7979
7980 To support functional record update on the record type
7981
7982 [source,sml]
7983 ----
7984 {a: 'a, b: 'b, c: 'c}
7985 ----
7986
7987 first, define an update function for each component.
7988
7989 [source,sml]
7990 ----
7991 fun withA ({a = _, b, c}, a) = {a = a, b = b, c = c}
7992 fun withB ({a, b = _, c}, b) = {a = a, b = b, c = c}
7993 fun withC ({a, b, c = _}, c) = {a = a, b = b, c = c}
7994 ----
7995
7996 Then, one can express `e with {a = 16, b = 17}` as
7997
7998 [source,sml]
7999 ----
8000 withB (withA (e, 16), 17)
8001 ----
8002
8003 With infix notation
8004
8005 [source,sml]
8006 ----
8007 infix withA withB withC
8008 ----
8009
8010 the syntax is almost as concise as a language extension.
8011
8012 [source,sml]
8013 ----
8014 e withA 16 withB 17
8015 ----
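
Putting these pieces together, here is a complete sketch that
reproduces the simultaneous update of `a` and `c` from the
introduction.  The names `r` and `r'` are chosen only for
illustration.

[source,sml]
----
fun withA ({a = _, b, c}, a) = {a = a, b = b, c = c}
fun withB ({a, b = _, c}, b) = {a = a, b = b, c = c}
fun withC ({a, b, c = _}, c) = {a = a, b = b, c = c}

infix withA withB withC

val r = {a = 13, b = 14, c = 15}

(* Left-associative, so this is (r withA 18) withC 19. *)
val r' = r withA 18 withC 19

(* Prints "{a = 18, b = 14, c = 19}". *)
val () = print (concat ["{a = ", Int.toString (#a r'),
                        ", b = ", Int.toString (#b r'),
                        ", c = ", Int.toString (#c r'), "}\n"])
----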
8016
8017 This approach suffers from the fact that the amount of boilerplate
8018 code is quadratic in the number of record fields.  Furthermore,
8019 changing, adding, or deleting a field requires time proportional to
8020 the number of fields (because each ++with__<L>__++ function must be
8021 changed).  It is also annoying to have to define a ++with__<L>__++
8022 function, possibly with a fixity declaration, for each field.
8023
8024 Fortunately, there is a solution to these problems.
8025
8026
8027 == Advanced implementation ==
8028
Using <:Fold:> one can define a family of ++makeUpdate__<N>__++
functions and a single _update_ operator `set` so that one can define a
functional record update function for any record type simply by
specifying a (trivial) isomorphism between that type and a function
argument list.  For example, suppose that we would like to do
8034 functional record update on records with fields `a` and `b`.  Then one
8035 defines a function `updateAB` as follows.
8036
8037 [source,sml]
8038 ----
8039 val updateAB =
8040    fn z =>
8041    let
8042       fun from v1 v2 = {a = v1, b = v2}
8043       fun to f {a = v1, b = v2} = f v1 v2
8044    in
8045       makeUpdate2 (from, from, to)
8046    end
8047    z
8048 ----
8049
8050 The functions `from` (think _from function arguments_) and `to` (think
_to function arguments_) specify an isomorphism between `a`,`b`
8052 records and function arguments.  There is a second use of `from` to
8053 work around the lack of
8054 <:FirstClassPolymorphism:first-class polymorphism> in SML.
8055
8056 With the definition of `updateAB` in place, the following expressions
8057 are valid.
8058
8059 [source,sml]
8060 ----
8061 updateAB {a = 13, b = "hello"} (set#b "goodbye") $
8062 updateAB {a = 13.5, b = true} (set#b false) (set#a 12.5) $
8063 ----
8064
8065 As another example, suppose that we would like to do functional record
8066 update on records with fields `b`, `c`, and `d`.  Then one defines a
8067 function `updateBCD` as follows.
8068
8069 [source,sml]
8070 ----
8071 val updateBCD =
8072    fn z =>
8073    let
8074       fun from v1 v2 v3 = {b = v1, c = v2, d = v3}
8075       fun to f {b = v1, c = v2, d = v3} = f v1 v2 v3
8076    in
8077       makeUpdate3 (from, from, to)
8078    end
8079    z
8080 ----
8081
8082 With the definition of `updateBCD` in place, the following expression
8083 is valid.
8084
8085 [source,sml]
8086 ----
8087 updateBCD {b = 1, c = 2, d = 3} (set#c 4) (set#c 5) $
8088 ----
8089
8090 Note that not all fields need be updated and that the same field may
8091 be updated multiple times.  Further note that the same `set` operator
8092 is used for all update functions (in the above, for both `updateAB`
8093 and `updateBCD`).
8094
8095 In general, to define a functional-record-update function on records
8096 with fields `f1`, `f2`, ..., `fN`, use the following template.
8097
8098 [source,sml]
8099 ----
8100 val update =
8101    fn z =>
8102    let
8103       fun from v1 v2 ... vn = {f1 = v1, f2 = v2, ..., fn = vn}
      fun to f {f1 = v1, f2 = v2, ..., fn = vn} = f v1 v2 ... vn
8105    in
8106       makeUpdateN (from, from, to)
8107    end
8108    z
8109 ----
8110
8111 With this, one can update a record as follows.
8112
8113 [source,sml]
8114 ----
8115 update {f1 = v1, ..., fn = vn} (set#fi1 vi1) ... (set#fim vim) $
8116 ----
8117
8118
8119 == The `FunctionalRecordUpdate` structure ==
8120
8121 Here is the implementation of functional record update.
8122
8123 [source,sml]
8124 ----
8125 structure FunctionalRecordUpdate =
8126    struct
8127       local
8128          fun next g (f, z) x = g (f x, z)
8129          fun f1 (f, z) x = f (z x)
8130          fun f2  z = next f1  z
8131          fun f3  z = next f2  z
8132
8133          fun c0  from = from
8134          fun c1  from = c0  from f1
8135          fun c2  from = c1  from f2
8136          fun c3  from = c2  from f3
8137
8138          fun makeUpdate cX (from, from', to) record =
8139             let
8140                fun ops () = cX from'
8141                fun vars f = to f record
8142             in
8143                Fold.fold ((vars, ops), fn (vars, _) => vars from)
8144             end
8145       in
8146          fun makeUpdate0  z = makeUpdate c0  z
8147          fun makeUpdate1  z = makeUpdate c1  z
8148          fun makeUpdate2  z = makeUpdate c2  z
8149          fun makeUpdate3  z = makeUpdate c3  z
8150
8151          fun upd z = Fold.step2 (fn (s, f, (vars, ops)) => (fn out => vars (s (ops ()) (out, f)), ops)) z
8152          fun set z = Fold.step2 (fn (s, v, (vars, ops)) => (fn out => vars (s (ops ()) (out, fn _ => v)), ops)) z
8153       end
8154    end
8155 ----
8156
The idea of `makeUpdate` is to build a record of functions, each of
which can replace the contents of one argument out of a list of
arguments.  The function ++f__<X>__++ replaces the __X__th argument
with the result of applying its argument `z` to it.  The ++c__<X>__++
functions pass the first __X__ `f` functions to the record constructor.
8162
8163 The `#field` notation of Standard ML allows us to select the map
8164 function which replaces the corresponding argument. By converting the
8165 record to an argument list, feeding that list through the selected map
8166 function and piping the list into the record constructor, functional
8167 record update is achieved.
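
The mechanism can be tried out in isolation, without <:Fold:>.  The
following sketch copies `next`, `f1` and `f2` from the structure above
and applies them by hand to the `from`/`to` isomorphism for `a`,`b`
records.  It is meant only to show what a single update step computes,
not how `set` and `upd` thread the steps together.

[source,sml]
----
fun next g (f, z) x = g (f x, z)
fun f1 (f, z) x = f (z x)     (* apply z to the 1st argument *)
fun f2 z = next f1 z          (* apply z to the 2nd argument *)

fun from v1 v2 = {a = v1, b = v2}
fun to f {a = v1, b = v2} = f v1 v2

val r = {a = 13, b = "hello"}

(* Convert r to an argument list, replace the 2nd argument with a
 * constant, and rebuild the record. *)
val r' = to (f2 (from, fn _ => "goodbye")) r

(* Transform the 1st argument instead of overwriting it. *)
val r'' = to (f1 (from, fn x => x + 1)) r

val () = print (concat [Int.toString (#a r'), " ", #b r', "\n"])    (* prints 13 goodbye *)
val () = print (concat [Int.toString (#a r''), " ", #b r'', "\n"])  (* prints 14 hello *)
----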
8168
8169
8170 == Efficiency ==
8171
8172 With MLton, the efficiency of this approach is as good as one would
expect with the special syntax.  Namely, a sequence of updates will be
8174 optimized into a single record construction that copies the unchanged
8175 fields and fills in the changed fields with their new values.
8176
8177 Before Sep 14, 2009, this page advocated an alternative implementation
8178 of <:FunctionalRecordUpdate:>.  However, the old structure caused
8179 exponentially increasing compile times.  We advise you to switch to
8180 the newer version.
8181
8182
8183 == Applications ==
8184
8185 Functional record update can be used to implement labelled
8186 <:OptionalArguments:optional arguments>.
8187
8188 <<<
8189
8190 :mlton-guide-page: fxp
8191 [[fxp]]
8192 fxp
8193 ===
8194
8195 http://atseidl2.informatik.tu-muenchen.de/%7Eberlea/Fxp/[fxp] is an XML
8196 parser written in Standard ML.
8197
8198 It has a
8199 http://atseidl2.informatik.tu-muenchen.de/%7Eberlea/Fxp/mlton.html[patch]
8200 to compile with MLton.
8201
8202 <<<
8203
8204 :mlton-guide-page: GarbageCollection
8205 [[GarbageCollection]]
8206 GarbageCollection
8207 =================
8208
For a good introduction to and overview of garbage collection, see
8210 <!Cite(Jones99)>.
8211
8212 MLton's garbage collector uses copying, mark-compact, and generational
8213 collection, automatically switching between them at run time based on
8214 the amount of live data relative to the amount of RAM.  The runtime
8215 system tries to keep the heap within RAM if at all possible.
8216
8217 MLton's copying collector is a simple, two-space, breadth-first,
8218 Cheney-style collector.  The design for the generational and
8219 mark-compact GC is based on <!Cite(Sansom91)>.
8220
8221 == Design notes ==
8222
8223 * http://www.mlton.org/pipermail/mlton/2002-May/012420.html
8224 +
8225 object layout and header word design
8226
8227 == Also see ==
8228
8229  * <:Regions:>
8230
8231 <<<
8232
8233 :mlton-guide-page: GenerativeDatatype
8234 [[GenerativeDatatype]]
8235 GenerativeDatatype
8236 ==================
8237
8238 In <:StandardML:Standard ML>, datatype declarations are said to be
8239 _generative_, because each time a datatype declaration is evaluated,
8240 it yields a new type.  Thus, any attempt to mix the types will lead to
8241 a type error at compile-time.  The following program, which does not
8242 type check, demonstrates this.
8243
8244 [source,sml]
8245 ----
8246 functor F () =
8247    struct
8248       datatype t = T
8249    end
8250 structure S1 = F ()
8251 structure S2 = F ()
8252 val _: S1.t -> S2.t = fn x => x
8253 ----
8254
8255 Generativity also means that two different datatype declarations
8256 define different types, even if they define identical constructors.
8257 The following program does not type check due to this.
8258
8259 [source,sml]
8260 ----
8261 datatype t = A | B
8262 val a1 = A
8263 datatype t = A | B
8264 val a2 = A
8265 val _ = if true then a1 else a2
8266 ----
8267
8268 == Also see ==
8269
8270  * <:GenerativeException:>
8271
8272 <<<
8273
8274 :mlton-guide-page: GenerativeException
8275 [[GenerativeException]]
8276 GenerativeException
8277 ===================
8278
8279 In <:StandardML:Standard ML>, exception declarations are said to be
8280 _generative_, because each time an exception declaration is evaluated,
8281 it yields a new exception.
8282
8283 The following program demonstrates the generativity of exceptions.
8284
8285 [source,sml]
8286 ----
8287 exception E
8288 val e1 = E
8289 fun isE1 (e: exn): bool =
8290    case e of
8291       E => true
8292     | _ => false
8293 exception E
8294 val e2 = E
8295 fun isE2 (e: exn): bool =
8296    case e of
8297       E => true
8298     | _ => false
8299 fun pb (b: bool): unit =
8300    print (concat [Bool.toString b, "\n"])
8301 val () = (pb (isE1 e1)
          ; pb (isE1 e2)
8303           ; pb (isE2 e1)
8304           ; pb (isE2 e2))
8305 ----
8306
8307 In the above program, two different exception declarations declare an
8308 exception `E` and a corresponding function that returns `true` only on
8309 that exception.  Although declared by syntactically identical
8310 exception declarations, `e1` and `e2` are different exceptions.  The
8311 program, when run, prints `true`, `false`, `false`, `true`.
8312
8313 A slight modification of the above program shows that even a single
8314 exception declaration yields a new exception each time it is
8315 evaluated.
8316
8317 [source,sml]
8318 ----
8319 fun f (): exn * (exn -> bool) =
8320    let
8321       exception E
8322    in
8323       (E, fn E => true | _ => false)
8324    end
8325 val (e1, isE1) = f ()
8326 val (e2, isE2) = f ()
8327 fun pb (b: bool): unit =
8328    print (concat [Bool.toString b, "\n"])
8329 val () = (pb (isE1 e1)
8330           ; pb (isE1 e2)
8331           ; pb (isE2 e1)
8332           ; pb (isE2 e2))
8333 ----
8334
8335 Each call to `f` yields a new exception and a function that returns
8336 `true` only on that exception.  The program, when run, prints `true`,
8337 `false`, `false`, `true`.
8338
8339
8340 == Type Safety ==
8341
8342 Exception generativity is required for type safety.  Consider the
8343 following valid SML program.
8344
8345 [source,sml]
8346 ----
8347 fun f (): ('a -> exn) * (exn -> 'a) =
8348    let
8349       exception E of 'a
8350    in
8351       (E, fn E x => x | _ => raise Fail "f")
8352    end
8353 fun cast (a: 'a): 'b =
8354    let
8355       val (make: 'a -> exn, _) = f ()
8356       val (_, get: exn -> 'b) = f ()
8357    in
8358       get (make a)
8359    end
8360 val _ = ((cast 13): int -> int) 14
8361 ----
8362
8363 If exceptions weren't generative, then each call `f ()` would yield
8364 the same exception constructor `E`.  Then, our `cast` function could
8365 use `make: 'a -> exn` to convert any value into an exception and then
8366 `get: exn -> 'b` to convert that exception to a value of arbitrary
type.  If `cast` worked, then we could cast an integer as a function
and apply it.  Of course, because of generative exceptions, this program
8369 raises `Fail "f"`.
8370
8371
8372 == Applications ==
8373
8374 The `exn` type is effectively a <:UniversalType:universal type>.
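
As a brief sketch of this idea, each call to the function below
creates a fresh exception constructor and returns an
injection/projection pair for one particular type.  The name `embed`
and the other identifiers are hypothetical and chosen only for this
example.

[source,sml]
----
fun embed (): ('a -> exn) * (exn -> 'a option) =
   let
      exception E of 'a
   in
      (E, fn E x => SOME x | _ => NONE)
   end

val (ofInt: int -> exn, toInt: exn -> int option) = embed ()
val (ofString: string -> exn, toString: exn -> string option) = embed ()

(* Values of different types stored together in one list. *)
val values: exn list = [ofInt 13, ofString "foo"]

(* Each projection recovers only the values made by its own injection. *)
val ints = List.mapPartial toInt values        (* [13] *)
val strings = List.mapPartial toString values  (* ["foo"] *)

val () = app (fn i => print (Int.toString i ^ "\n")) ints
val () = app (fn s => print (s ^ "\n")) strings
----

Because exception declarations are generative, the projection from one
call to `embed` never accepts a value made by the injection from
another call, which is what keeps this encoding type safe.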
8375
8376
8377 == Also see ==
8378
8379  * <:GenerativeDatatype:>
8380
8381 <<<
8382
8383 :mlton-guide-page: Git
8384 [[Git]]
8385 Git
8386 ===
8387
8388 http://git-scm.com/[Git] is a distributed version control system.  The
8389 MLton project currently uses Git to maintain its
8390 <:Sources:source code>.
8391
8392 Here are some online Git resources.
8393
8394 * http://git-scm.com/docs[Reference Manual]
8395 * http://git-scm.com/book[ProGit, by Scott Chacon]
8396
8397 <<<
8398
8399 :mlton-guide-page: Glade
8400 [[Glade]]
8401 Glade
8402 =====
8403
8404 http://glade.gnome.org/features.html[Glade] is a tool for generating
8405 Gtk user interfaces.
8406
8407 <:WesleyTerpstra:> is working on a Glade->mGTK converter.
8408
8409 * http://www.mlton.org/pipermail/mlton/2004-December/016865.html
8410
8411 <<<
8412
8413 :mlton-guide-page: Globalize
8414 [[Globalize]]
8415 Globalize
8416 =========
8417
8418 <:Globalize:> is an analysis pass for the <:SXML:>
8419 <:IntermediateLanguage:>, invoked from <:ClosureConvert:>.
8420
8421 == Description ==
8422
8423 This pass marks values that are constant, allowing <:ClosureConvert:>
8424 to move them out to the top level so they are only evaluated once and
8425 do not appear in closures.
8426
8427 == Implementation ==
8428
8429 * <!ViewGitFile(mlton,master,mlton/closure-convert/globalize.sig)>
8430 * <!ViewGitFile(mlton,master,mlton/closure-convert/globalize.fun)>
8431
8432 == Details and Notes ==
8433
8434 {empty}
8435
8436 <<<
8437
8438 :mlton-guide-page: GnuMP
8439 [[GnuMP]]
8440 GnuMP
8441 =====
8442
8443 The http://gmplib.org[GnuMP] library (GNU Multiple Precision
8444 arithmetic library) is a library for arbitrary precision integer
8445 arithmetic.  MLton uses the GnuMP library to implement the
8446 <:BasisLibrary: Basis Library> `IntInf` module.
8447
8448 == Known issues ==
8449
8450 * There is a known problem with the GnuMP library (prior to version
8451 4.2.x), where it requires a lot of stack space for some computations,
8452 e.g. `IntInf.toString` of a million digit number.  If you run with
8453 stack size limited, you may see a segfault in such programs.  This
8454 problem is mentioned in the http://gmplib.org/#FAQ[GnuMP FAQ], where
8455 they describe two solutions.
8456
8457 ** Increase (or unlimit) your stack space.  From your program, use
8458 `setrlimit`, or from the shell, use `ulimit`.
8459
8460 ** Configure and rebuild `libgmp` with `--disable-alloca`, which will
8461 cause it to allocate temporaries using `malloc` instead of on the
8462 stack.
8463
8464 * On some platforms, the GnuMP library may be configured to use one of
8465 multiple ABIs (Application Binary Interfaces).  For example, on some
8466 32-bit architectures, GnuMP may be configured to represent a limb as
8467 either a 32-bit `long` or as a 64-bit `long long`.  Similarly, GnuMP
8468 may be configured to use specific CPU features.
8469 +
8470 In order to efficiently use the GnuMP library, MLton represents an
8471 `IntInf.int` value in a manner compatible with the GnuMP library's
8472 representation of a limb.  Hence, it is important that MLton and the
8473 GnuMP library agree upon the representation of a limb.
8474
8475 ** When using a source package of MLton, building will detect the
8476 GnuMP library's representation of a limb.
8477
8478 ** When using a binary package of MLton that is dynamically linked
8479 against the GnuMP library, the build machine and the install machine
8480 must have the GnuMP library configured with the same representation of
8481 a limb.  (On the other hand, the build machine need not have the GnuMP
8482 library configured with CPU features compatible with the install
8483 machine.)
8484
8485 ** When using a binary package of MLton that is statically linked
8486 against the GnuMP library, the build machine and the install machine
8487 need not have the GnuMP library configured with the same
8488 representation of a limb.  (On the other hand, the build machine must
8489 have the GnuMP library configured with CPU features compatible with
8490 the install machine.)
8491 +
8492 However, MLton will be configured with the representation of a limb
8493 from the GnuMP library of the build machine.  Executables produced by
8494 MLton will be incompatible with the GnuMP library of the install
8495 machine.  To _reconfigure_ MLton with the representation of a limb
8496 from the GnuMP library of the install machine, one must edit:
8497 +
8498 ----
8499 /usr/lib/mlton/self/sizes
8500 ----
8501 +
8502 changing the
8503 +
8504 ----
8505 mplimb = ??
8506 ----
8507 +
8508 entry so that `??` corresponds to the bytes in a limb; and, one must edit:
8509 +
8510 ----
8511 /usr/lib/mlton/sml/basis/config/c/arch-os/c-types.sml
8512 ----
8513 +
8514 changing the
8515 +
8516 ----
8517 (* from "gmp.h" *)
8518 structure C_MPLimb = struct open Word?? type t = word end
8519 functor C_MPLimb_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word?? (A)
8520 ----
8521 +
8522 entries so that `??` corresponds to the bits in a limb.
8523
8524 <<<
8525
8526 :mlton-guide-page: GoogleSummerOfCode2013
8527 [[GoogleSummerOfCode2013]]
8528 Google Summer of Code (2013)
8529 ============================
8530
8531 == Mentors ==
8532
8533 The following developers have agreed to serve as mentors for the 2013 Google Summer of Code:
8534
8535 * http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8536 * http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8537 * http://www.cs.purdue.edu/homes/suresh/[Suresh Jagannathan]
8538
8539 == Ideas List ==
8540
8541 === Implement a Partial Redundancy Elimination (PRE) Optimization ===
8542
Partial redundancy elimination (PRE) is a program transformation that
removes operations that are redundant on some, but not necessarily all,
paths through the program.  PRE can subsume both common subexpression
8546 elimination and loop-invariant code motion, and is therefore a
8547 potentially powerful optimization.  However, a na&iuml;ve
8548 implementation of PRE on a program in static single assignment (SSA)
8549 form is unlikely to be effective.  This project aims to adapt and
8550 implement the SSAPRE algorithm(s) of Thomas VanDrunen in MLton's SSA
8551 intermediate language.
8552
8553 Background:
8554 --
8555 * http://onlinelibrary.wiley.com/doi/10.1002/spe.618/abstract[Anticipation-based partial redundancy elimination for static single assignment form]; Thomas VanDrunen and Antony L. Hosking
8556 * http://cs.wheaton.edu/%7Etvandrun/writings/thesis.pdf[Partial Redundancy Elimination for Global Value Numbering]; Thomas VanDrunen
8557 * http://www.springerlink.com/content/w06m3cw453nphm1u/[Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
8558 * http://portal.acm.org/citation.cfm?doid=319301.319348[Partial redundancy elimination in SSA form]; Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow
8559 --
8560
8561 Recommended Skills: SML programming experience; some middle-end compiler experience
8562
8563 /////
8564 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8565 /////
8566
8567 === Design and Implement a Heap Profiler ===
8568
8569 A heap profile is a description of the space usage of a program.  A
8570 heap profile is concerned with the allocation, retention, and
8571 deallocation (via garbage collection) of heap data during the
8572 execution of a program.  A heap profile can be used to diagnose
8573 performance problems in a functional program that arise from space
8574 leaks.  This project aims to design and implement a heap profiler for
8575 MLton-compiled programs.
8576
8577 Background:
8578 --
8579 * http://portal.acm.org/citation.cfm?doid=583854.582451[GCspy: an adaptable heap visualisation framework]; Tony Printezis and Richard Jones
8580 * http://journals.cambridge.org/action/displayAbstract?aid=1349892[New dimensions in heap profiling]; Colin Runciman and Niklas R&ouml;jemo
8581 * http://www.springerlink.com/content/710501660722gw37/[Heap profiling for space efficiency]; Colin Runciman and Niklas R&ouml;jemo
8582 * http://journals.cambridge.org/action/displayAbstract?aid=1323096[Heap profiling of lazy functional programs]; Colin Runciman and David Wakeling
8583 --
8584
8585 Recommended Skills: C and SML programming experience; some experience with UI and visualization
8586
8587 /////
8588 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8589 /////
8590
8591 === Garbage Collector Improvements ===
8592
8593 The garbage collector plays a significant role in the performance of
8594 functional languages.  Garbage collect too often, and program
8595 performance suffers due to the excessive time spent in the garbage
8596 collector.  Garbage collect not often enough, and program performance
8597 suffers due to the excessive space used by the uncollected garbage.
8598 One particular issue is ensuring that a program utilizing a garbage
8599 collector "plays nice" with other processes on the system, by not
8600 using too much or too little physical memory.  While there are some
8601 reasonable theoretical results about garbage collections with heaps of
8602 fixed size, there seems to be insufficient work that really looks
8603 carefully at the question of dynamically resizing the heap in response
8604 to the live data demands of the application and, similarly, in
8605 response to the behavior of the operating system and other processes.
8606 This project aims to investigate improvements to the memory behavior of
8607 MLton-compiled programs through better tuning of the garbage
8608 collector.
8609
8610 Background:
8611 --
8612 * http://www.dcs.gla.ac.uk/%7Ewhited/papers/automated_heap_sizing.pdf[Automated Heap Sizing in the Poly/ML Runtime (Position Paper)]; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
8613 * http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125[Isla Vista Heap Sizing: Using Feedback to Avoid Paging]; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
8614 * http://portal.acm.org/citation.cfm?doid=1152649.1152652[Controlling garbage collection and heap growth to reduce the execution time of Java applications]; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
8615 * http://portal.acm.org/citation.cfm?doid=1065010.1065028[Garbage collection without paging]; Matthew Hertz, Yi Feng, and Emery D. Berger
8616 * http://portal.acm.org/citation.cfm?doid=1029873.1029881[Automatic heap sizing: taking real memory into account]; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
8617 --
8618
8619 Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience
8620
8621 /////
8622 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8623 /////
8624
8625 === Implement Successor{nbsp}ML Language Features ===
8626
8627 Any programming language, including Standard{nbsp}ML, can be improved.
8628 The community has identified a number of modest extensions and
8629 revisions to the Standard{nbsp}ML programming language that would
8630 likely prove useful in practice.  This project aims to implement these
8631 language features in the MLton compiler.
8632
8633 Background:
8634 --
8635 * http://successor-ml.org/index.php?title=Main_Page[Successor{nbsp}ML]
8636 * http://www.mpi-sws.org/%7Erossberg/hamlet/index.html#successor-ml[HaMLet (Successor{nbsp}ML)]
8637 * http://journals.cambridge.org/action/displayAbstract?aid=1322628[A critique of Standard{nbsp}ML]; Andrew W. Appel
8638 --
8639
8640 Recommended Skills: SML programming experience; some front-end compiler experience (i.e., scanners and parsers)
8641
8642 /////
8643 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8644 /////
8645
8646 === Implement Source-level Debugging ===
8647
8648 Debugging is a fact of programming life.  Unfortunately, most SML
8649 implementations (including MLton) provide little to no source-level
8650 debugging support.  This project aims to add basic to intermediate
8651 source-level debugging support to the MLton compiler.  MLton already
8652 supports source-level profiling, which can be used to attribute bytes
8653 allocated or time spent in source functions.  It should be relatively
8654 straightforward to leverage this source-level information into basic
8655 source-level debugging support, with the ability to set/unset
8656 breakpoints and step through declarations and functions.  It may be
8657 possible to also provide intermediate source-level debugging support,
8658 with the ability to inspect in-scope variables of basic types (e.g.,
8659 types compatible with MLton's foreign function interface).
8660
8661 Background:
8662 --
8663 * http://mlton.org/HowProfilingWorks[MLton -- How Profiling Works]
8664 * http://mlton.org/ForeignFunctionInterfaceTypes[MLton -- Foreign Function Interface Types]
8665 * http://dwarfstd.org/[DWARF Debugging Standard]
8666 * http://sourceware.org/gdb/current/onlinedocs/stabs/index.html[STABS Debugging Format]
8667 --
8668
8669 Recommended Skills: SML programming experience; some compiler experience
8670
8671 /////
8672 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8673 /////
8674
8675 === SIMD Primitives ===
8676
8677 Most modern processors offer some direct support for SIMD (Single
8678 Instruction, Multiple Data) operations, such as Intel's MMX/SSE
8679 instructions, AMD's 3DNow!  instructions, and IBM's AltiVec.  Such
8680 instructions are particularly useful for multimedia, scientific, and
8681 cryptographic applications.  This project aims to add preliminary
8682 support for vector data and vector operations to the MLton compiler.
8683 Ideally, after surveying SIMD instruction sets and SIMD support in
8684 other compilers, a core set of SIMD primitives with broad architecture
8685 and compiler support can be identified.  After adding SIMD primitives
8686 to the core compiler and carrying them through to the various
8687 backends, there will be opportunities to design and implement an SML
8688 library that exposes the primitives to the SML programmer as well as
8689 opportunities to design and implement auto-vectorization
8690 optimizations.
8691
8692 Background:
8693 --
8694 * http://en.wikipedia.org/wiki/SIMD[SIMD]
8695 * http://gcc.gnu.org/projects/tree-ssa/vectorization.html[Auto-vectorization in GCC]
8696 * http://llvm.org/docs/Vectorizers.html[Auto-vectorization in LLVM]
8697 --
8698
8699 Recommended Skills: SML programming experience; some compiler experience; some computer architecture experience
8700
8701 /////
8702 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8703 /////
8704
8705 === RTOS Support ===
8706
8707 This project entails porting the MLton compiler to RTOSs such as
8708 RTEMS, RT Linux, and FreeRTOS.  The project will include modifications
8709 to the MLton build and configuration process.  Students will need to
8710 extend the MLton configuration process for each of the RTOSs.  The
8711 MLton compilation process will need to be extended to invoke the C
8712 cross compilers the RTOSs provide for embedded support.  Test scripts
8713 for validation will be necessary and these will need to be run in
8714 emulators for supported architectures.
8715
8716 Recommended Skills: C programming experience; some scripting experience
8717
8718 /////
8719 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8720 /////
8721
8722 === Region Based Memory Management ===
8723
8724 Region based memory management is an alternative automatic memory
8725 management scheme to garbage collection.  Regions can be inferred by
8726 the compiler (e.g., Cyclone and MLKit) or provided to the programmer
8727 through a library.  Since many students do not have extensive
8728 experience with compilers, we plan on adopting the latter approach.
8729 Creating a viable region based memory solution requires the removal of
8730 the GC and changes to the allocator.  Additionally, write barriers
8731 will be necessary to ensure that a reference between two ML objects is never
8732 established if the left hand side of the assignment has a longer
8733 lifetime than the right hand side.  Students will need to come up with
8734 an appropriate interface for creating, entering, and exiting regions
8735 (examples include RTSJ scoped memory and SCJ scoped memory).
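
To make the lifetime rule concrete, here is a minimal model; it is not a proposed MLton API.  It assumes a single chain of strictly nested (scoped) regions, each identified only by its nesting depth, and every name in it (`ScopedRegionModel`, `enter`, `outlives`, `storeAllowed`) is hypothetical.

[source,sml]
----
structure ScopedRegionModel =
struct
   (* A region is modeled by its nesting depth; depth 0 is the
    * outermost, longest-lived region. *)
   datatype region = Region of int

   val outermost = Region 0

   (* Run `body` in a fresh region nested inside the given one.  A real
    * implementation would release the region's memory on exit. *)
   fun enter (Region d, body: region -> 'a) : 'a = body (Region (d + 1))

   (* Within one chain of nested regions, a shallower region outlives
    * a deeper one. *)
   fun outlives (Region d1, Region d2) = d1 < d2

   (* Write-barrier check: may an object in `lhs` be assigned a
    * reference to an object in `rhs`?  Not if `lhs` outlives `rhs`. *)
   fun storeAllowed {lhs: region, rhs: region} = not (outlives (lhs, rhs))
end

val () =
   ScopedRegionModel.enter (ScopedRegionModel.outermost, fn inner =>
      let
         val outer = ScopedRegionModel.outermost
         val ok = ScopedRegionModel.storeAllowed {lhs = inner, rhs = outer}
         val bad = ScopedRegionModel.storeAllowed {lhs = outer, rhs = inner}
      in
         print (Bool.toString ok ^ " " ^ Bool.toString bad ^ "\n")
      end)
----

Here the store from the inner region to the outer one is allowed, while the store from the outer region to the inner one is rejected, because it would leave a dangling reference once the inner region exits.  Sibling regions at the same depth are outside what this simplified model can express.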
8736
8737 Background:
8738 --
8739 * Cyclone
8740 * MLKit
8741 * RTSJ + SCJ scopes
8742 --
8743
8744 Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience
8745
8746 /////
8747 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8748 /////
8749
8750 === Integration of Multi-MLton ===
8751
8752 http://multimlton.cs.purdue.edu[MultiMLton] is a compiler and runtime
8753 environment that targets scalable multicore platforms.  It is an
8754 extension of MLton.  It combines new language abstractions and
8755 associated compiler analyses for expressing and implementing various
8756 kinds of fine-grained parallelism (safe futures, speculation,
8757 transactions, etc.), along with a sophisticated runtime system tuned
8758 to efficiently handle large numbers of lightweight threads.  The core
8759 stable features of MultiMLton will need to be integrated with the
8760 latest MLton public release.  Certain experimental features, such as
8761 support for the Intel SCC and distributed runtime, will be omitted.
8762 This project requires students to understand the delta between the
8763 MultiMLton code base and the MLton code base.  Students will need to
8764 create build and configuration scripts for MLton to enable MultiMLton
8765 features.
8766
8767 Background:
8768 --
8769 * http://multimlton.cs.purdue.edu/mML/Publications.html[MultiMLton -- Publications]
8770 --
8771
8772 Recommended Skills: SML programming experience; C programming experience; some compiler experience
8773
8774 /////
8775 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8776 /////
8777
8778 <<<
8779
8780 :mlton-guide-page: GoogleSummerOfCode2014
8781 [[GoogleSummerOfCode2014]]
8782 Google Summer of Code (2014)
8783 ============================
8784
8785 == Mentors ==
8786
8787 The following developers have agreed to serve as mentors for the 2014 Google Summer of Code:
8788
8789 * http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8790 * http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8791 * http://people.cs.uchicago.edu/~jhr/[John Reppy]
8792 * http://www.cs.purdue.edu/homes/chandras[KC Sivaramakrishnan]
8793 /////
8794 * http://www.cs.purdue.edu/homes/suresh/[Suresh Jagannathan]
8795 /////
8796
8797 == Ideas List ==
8798
8799 === Implement a Partial Redundancy Elimination (PRE) Optimization ===
8800
8801 Partial redundancy elimination (PRE) is a program transformation that
8802 removes operations that are redundant on some, but not necessarily all,
8803 paths through the program.  PRE can subsume both common subexpression
8804 elimination and loop-invariant code motion, and is therefore a
8805 potentially powerful optimization.  However, a na&iuml;ve
8806 implementation of PRE on a program in static single assignment (SSA)
8807 form is unlikely to be effective.  This project aims to adapt and
8808 implement the SSAPRE algorithm(s) of Thomas VanDrunen in MLton's SSA
8809 intermediate language.
8810
8811 Background:
8812 --
8813 * http://onlinelibrary.wiley.com/doi/10.1002/spe.618/abstract[Anticipation-based partial redundancy elimination for static single assignment form]; Thomas VanDrunen and Antony L. Hosking
8814 * http://cs.wheaton.edu/%7Etvandrun/writings/thesis.pdf[Partial Redundancy Elimination for Global Value Numbering]; Thomas VanDrunen
8815 * http://www.springerlink.com/content/w06m3cw453nphm1u/[Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
8816 * http://portal.acm.org/citation.cfm?doid=319301.319348[Partial redundancy elimination in SSA form]; Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow
8817 --
8818
8819 Recommended Skills: SML programming experience; some middle-end compiler experience
8820
8821 /////
8822 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8823 /////
8824
8825 === Design and Implement a Heap Profiler ===
8826
8827 A heap profile is a description of the space usage of a program.  A
8828 heap profile is concerned with the allocation, retention, and
8829 deallocation (via garbage collection) of heap data during the
8830 execution of a program.  A heap profile can be used to diagnose
8831 performance problems in a functional program that arise from space
8832 leaks.  This project aims to design and implement a heap profiler for
8833 MLton-compiled programs.
8834
8835 Background:
8836 --
8837 * http://portal.acm.org/citation.cfm?doid=583854.582451[GCspy: an adaptable heap visualisation framework]; Tony Printezis and Richard Jones
8838 * http://journals.cambridge.org/action/displayAbstract?aid=1349892[New dimensions in heap profiling]; Colin Runciman and Niklas R&ouml;jemo
8839 * http://www.springerlink.com/content/710501660722gw37/[Heap profiling for space efficiency]; Colin Runciman and Niklas R&ouml;jemo
8840 * http://journals.cambridge.org/action/displayAbstract?aid=1323096[Heap profiling of lazy functional programs]; Colin Runciman and David Wakeling
8841 --
8842
8843 Recommended Skills: C and SML programming experience; some experience with UI and visualization
8844
8845 /////
8846 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8847 /////
8848
8849 === Garbage Collector Improvements ===
8850
8851 The garbage collector plays a significant role in the performance of
8852 functional languages.  Garbage collect too often, and program
8853 performance suffers due to the excessive time spent in the garbage
8854 collector.  Garbage collect not often enough, and program performance
8855 suffers due to the excessive space used by the uncollected garbage.
8856 One particular issue is ensuring that a program utilizing a garbage
8857 collector "plays nice" with other processes on the system, by not
8858 using too much or too little physical memory.  While there are some
8859 reasonable theoretical results about garbage collections with heaps of
8860 fixed size, there seems to be insufficient work that really looks
8861 carefully at the question of dynamically resizing the heap in response
8862 to the live data demands of the application and, similarly, in
8863 response to the behavior of the operating system and other processes.
8864 This project aims to investigate improvements to the memory behavior of
8865 MLton-compiled programs through better tuning of the garbage
8866 collector.
8867
8868 Background:
8869 --
8870 * http://www.dcs.gla.ac.uk/%7Ewhited/papers/automated_heap_sizing.pdf[Automated Heap Sizing in the Poly/ML Runtime (Position Paper)]; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
8871 * http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125[Isla Vista Heap Sizing: Using Feedback to Avoid Paging]; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
8872 * http://portal.acm.org/citation.cfm?doid=1152649.1152652[Controlling garbage collection and heap growth to reduce the execution time of Java applications]; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
8873 * http://portal.acm.org/citation.cfm?doid=1065010.1065028[Garbage collection without paging]; Matthew Hertz, Yi Feng, and Emery D. Berger
8874 * http://portal.acm.org/citation.cfm?doid=1029873.1029881[Automatic heap sizing: taking real memory into account]; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
8875 --
8876
8877 Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience
8878
8879 /////
8880 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8881 /////
8882
8883 === Implement Successor{nbsp}ML Language Features ===
8884
8885 Any programming language, including Standard{nbsp}ML, can be improved.
8886 The community has identified a number of modest extensions and
8887 revisions to the Standard{nbsp}ML programming language that would
8888 likely prove useful in practice.  This project aims to implement these
8889 language features in the MLton compiler.
8890
8891 Background:
8892 --
8893 * http://successor-ml.org/index.php?title=Main_Page[Successor{nbsp}ML]
8894 * http://www.mpi-sws.org/%7Erossberg/hamlet/index.html#successor-ml[HaMLet (Successor{nbsp}ML)]
8895 * http://journals.cambridge.org/action/displayAbstract?aid=1322628[A critique of Standard{nbsp}ML]; Andrew W. Appel
8896 --
8897
8898 Recommended Skills: SML programming experience; some front-end compiler experience (i.e., scanners and parsers)
8899
8900 /////
8901 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8902 /////
8903
8904 === Implement Source-level Debugging ===
8905
8906 Debugging is a fact of programming life.  Unfortunately, most SML
8907 implementations (including MLton) provide little to no source-level
8908 debugging support.  This project aims to add basic to intermediate
8909 source-level debugging support to the MLton compiler.  MLton already
8910 supports source-level profiling, which can be used to attribute bytes
8911 allocated or time spent in source functions.  It should be relatively
8912 straightforward to leverage this source-level information into basic
8913 source-level debugging support, with the ability to set/unset
8914 breakpoints and step through declarations and functions.  It may be
8915 possible to also provide intermediate source-level debugging support,
8916 with the ability to inspect in-scope variables of basic types (e.g.,
8917 types compatible with MLton's foreign function interface).
8918
8919 Background:
8920 --
8921 * http://mlton.org/HowProfilingWorks[MLton -- How Profiling Works]
8922 * http://mlton.org/ForeignFunctionInterfaceTypes[MLton -- Foreign Function Interface Types]
8923 * http://dwarfstd.org/[DWARF Debugging Standard]
8924 * http://sourceware.org/gdb/current/onlinedocs/stabs/index.html[STABS Debugging Format]
8925 --
8926
8927 Recommended Skills: SML programming experience; some compiler experience
8928
8929 /////
8930 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
8931 /////
8932
8933 === Region Based Memory Management ===
8934
8935 Region based memory management is an alternative automatic memory
8936 management scheme to garbage collection.  Regions can be inferred by
8937 the compiler (e.g., Cyclone and MLKit) or provided to the programmer
8938 through a library.  Since many students do not have extensive
8939 experience with compilers, we plan on adopting the latter approach.
8940 Creating a viable region based memory solution requires the removal of
8941 the GC and changes to the allocator.  Additionally, write barriers
8942 will be necessary to ensure that a reference between two ML objects is never
8943 established if the left hand side of the assignment has a longer
8944 lifetime than the right hand side.  Students will need to come up with
8945 an appropriate interface for creating, entering, and exiting regions
8946 (examples include RTSJ scoped memory and SCJ scoped memory).
8947
8948 Background:
8949 --
8950 * Cyclone
8951 * MLKit
8952 * RTSJ + SCJ scopes
8953 --
8954
8955 Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience
8956
8957 /////
8958 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8959 /////
8960
8961 === Integration of Multi-MLton ===
8962
8963 http://multimlton.cs.purdue.edu[MultiMLton] is a compiler and runtime
8964 environment that targets scalable multicore platforms.  It is an
8965 extension of MLton.  It combines new language abstractions and
8966 associated compiler analyses for expressing and implementing various
8967 kinds of fine-grained parallelism (safe futures, speculation,
8968 transactions, etc.), along with a sophisticated runtime system tuned
8969 to efficiently handle large numbers of lightweight threads.  The core
8970 stable features of MultiMLton will need to be integrated with the
8971 latest MLton public release.  Certain experimental features, such as
8972 support for the Intel SCC and distributed runtime, will be omitted.
8973 This project requires students to understand the delta between the
8974 MultiMLton code base and the MLton code base.  Students will need to
8975 create build and configuration scripts for MLton to enable MultiMLton
8976 features.
8977
8978 Background:
8979 --
8980 * http://multimlton.cs.purdue.edu/mML/Publications.html[MultiMLton -- Publications]
8981 --
8982
8983 Recommended Skills: SML programming experience; C programming experience; some compiler experience
8984
8985 /////
8986 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
8987 /////
8988
8989 === Concurrent{nbsp}ML Improvements ===
8990
8991 http://cml.cs.uchicago.edu/[Concurrent ML] is an SML concurrency
8992 library based on synchronous message passing.  MLton has a partial
8993 implementation of the CML message-passing primitives, but its use in
8994 real-world applications has been stymied by the lack of completeness
8995 and thread-safe I/O libraries.  This project would aim to flesh out
8996 the CML implementation in MLton to be fully compatible with the
8997 "official" version distributed as part of SML/NJ.  Furthermore, time
8998 permitting, runtime system support could be added to allow use of
8999 modern OS features, such as asynchronous I/O, in the implementation of
9000 CML's system interfaces.
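
For orientation, the synchronous message passing that this project would build on looks roughly like the following.  This is a minimal sketch using `CML.channel`, `CML.spawn`, `CML.send`, `CML.recv`, and `RunCML.doit`; it assumes the program is compiled with an MLB file that includes MLton's CML library (`$(SML_LIB)/cml/cml.mlb`) alongside the Basis Library.

[source,sml]
----
fun main () =
   let
      (* A channel carrying integers; send and recv on it are synchronous. *)
      val ch : int CML.chan = CML.channel ()
      (* The spawned thread blocks in `send` until a receiver is ready. *)
      val _ = CML.spawn (fn () => CML.send (ch, 42))
      (* The main thread blocks in `recv` until the sender is ready. *)
      val n = CML.recv ch
   in
      print (Int.toString n ^ "\n")
   end

(* Start the CML scheduler; NONE requests the default time quantum. *)
val _ = RunCML.doit (main, NONE)
----

The single `print` here runs in one thread only, so it sidesteps the thread-safe I/O gap described above.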
9001
9002 Background:
9003 --
9004 * http://cml.cs.uchicago.edu/
9005 * http://mlton.org/ConcurrentML
9006 * http://mlton.org/ConcurrentMLImplementation
9007 --
9008
9009 Recommended Skills: SML programming experience; knowledge of concurrent programming; some operating systems and/or systems programming experience
9010
9011 /////
9012 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
9013 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9014 /////
9015
9016 /////
9017 === SML3d Development ===
9018
9019 The SML3d Project is a collection of libraries to support 3D graphics
9020 programming using Standard ML and the http://opengl.org/[OpenGL]
9021 graphics API. It currently requires the MLton implementation of SML
9022 and is supported on Linux, Mac OS X, and Microsoft Windows. There is
9023 also support for http://www.khronos.org/opencl/[OpenCL].  This project
9024 aims to continue development of the SML3d Project.
9025
9026 Background
9027 --
9028 * http://sml3d.cs.uchicago.edu/
9029 --
9030
9031 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
9032 /////
9033
9034 <<<
9035
9036 :mlton-guide-page: GoogleSummerOfCode2015
9037 [[GoogleSummerOfCode2015]]
9038 Google Summer of Code (2015)
9039 ============================
9040
9041 == Mentors ==
9042
9043 The following developers have agreed to serve as mentors for the 2015 Google Summer of Code:
9044
9045 * http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9046 * http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
9047 /////
9048 * http://people.cs.uchicago.edu/~jhr/[John Reppy]
9049 * http://www.cs.purdue.edu/homes/chandras[KC Sivaramakrishnan]
9050 * http://www.cs.purdue.edu/homes/suresh/[Suresh Jagannathan]
9051 /////
9052
9053 == Ideas List ==
9054
9055 /////
9056 === Implement a Partial Redundancy Elimination (PRE) Optimization ===
9057
9058 Partial redundancy elimination (PRE) is a program transformation that
9059 removes operations that are redundant on some, but not necessarily all,
9060 paths through the program.  PRE can subsume both common subexpression
9061 elimination and loop-invariant code motion, and is therefore a
9062 potentially powerful optimization.  However, a naïve implementation of
9063 PRE on a program in static single assignment (SSA) form is unlikely to
9064 be effective.  This project aims to adapt and implement the GVN-PRE
9065 algorithm of Thomas VanDrunen in MLton's SSA intermediate language.
9066
9067 Background:
9068 --
9069 * http://cs.wheaton.edu/%7Etvandrun/writings/thesis.pdf[Partial Redundancy Elimination for Global Value Numbering]; Thomas VanDrunen
9070 * http://www.cs.purdue.edu/research/technical_reports/2003/TR%2003-032.pdf[Corner-cases in Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
9071 * http://www.springerlink.com/content/w06m3cw453nphm1u/[Value-Based Partial Redundancy Elimination]; Thomas VanDrunen and Antony L. Hosking
9072 * http://onlinelibrary.wiley.com/doi/10.1002/spe.618/abstract[Anticipation-based Partial Redundancy Elimination for Static Single Assignment Form]; Thomas VanDrunen and Antony L. Hosking
9073 * http://portal.acm.org/citation.cfm?doid=319301.319348[Partial Redundancy Elimination in SSA Form]; Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow
9074 --
9075
9076 Recommended Skills: SML programming experience; some middle-end compiler experience
9077
9078 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9079 /////
9080
9081 === Design and Implement a Heap Profiler ===
9082
9083 A heap profile is a description of the space usage of a program.  A
9084 heap profile is concerned with the allocation, retention, and
9085 deallocation (via garbage collection) of heap data during the
9086 execution of a program.  A heap profile can be used to diagnose
9087 performance problems in a functional program that arise from space
9088 leaks.  This project aims to design and implement a heap profiler for
9089 MLton-compiled programs.
9090
9091 Background:
9092 --
9093 * http://portal.acm.org/citation.cfm?doid=583854.582451[GCspy: an adaptable heap visualisation framework]; Tony Printezis and Richard Jones
9094 * http://journals.cambridge.org/action/displayAbstract?aid=1349892[New dimensions in heap profiling]; Colin Runciman and Niklas R&ouml;jemo
9095 * http://www.springerlink.com/content/710501660722gw37/[Heap profiling for space efficiency]; Colin Runciman and Niklas R&ouml;jemo
9096 * http://journals.cambridge.org/action/displayAbstract?aid=1323096[Heap profiling of lazy functional programs]; Colin Runciman and David Wakeling
9097 --
9098
9099 Recommended Skills: C and SML programming experience; some experience with UI and visualization
9100
9101 /////
9102 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9103 /////
9104
9105 === Garbage Collector Improvements ===
9106
9107 The garbage collector plays a significant role in the performance of
9108 functional languages.  Garbage collect too often, and program
9109 performance suffers due to the excessive time spent in the garbage
9110 collector.  Garbage collect not often enough, and program performance
9111 suffers due to the excessive space used by the uncollected
9112 garbage.  One particular issue is ensuring that a program utilizing a
9113 garbage collector "plays nice" with other processes on the system, by
9114 not using too much or too little physical memory.  While there are some
9115 reasonable theoretical results about garbage collections with heaps of
9116 fixed size, there seems to be insufficient work that really looks
9117 carefully at the question of dynamically resizing the heap in response
9118 to the live data demands of the application and, similarly, in
9119 response to the behavior of the operating system and other
9120 processes.  This project aims to investigate improvements to the memory
9121 behavior of MLton-compiled programs through better tuning of the
9122 garbage collector.
9123
9124 Background:
9125 --
9126 * http://gchandbook.org/[The Garbage Collection Handbook: The Art of Automatic Memory Management]; Richard Jones, Antony Hosking, Eliot Moss
9127 * http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1020[Dual-Mode Garbage Collection]; Patrick Sansom
9128 * http://portal.acm.org/citation.cfm?doid=1029873.1029881[Automatic Heap Sizing: Taking Real Memory into Account]; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
9129 * http://portal.acm.org/citation.cfm?doid=1152649.1152652[Controlling Garbage Collection and Heap Growth to Reduce the Execution Time of Java Applications]; Tim Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
9130 * http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4145125[Isla Vista Heap Sizing: Using Feedback to Avoid Paging]; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
9131 * http://portal.acm.org/citation.cfm?doid=1806651.1806669[The Economics of Garbage Collection]; Jeremy Singer, Richard E. Jones, Gavin Brown, and Mikel Luján
9132 * http://www.dcs.gla.ac.uk/%7Ejsinger/pdfs/tfp12.pdf[Automated Heap Sizing in the Poly/ML Runtime (Position Paper)]; David White, Jeremy Singer, Jonathan Aitken, and David Matthews
9133 * http://portal.acm.org/citation.cfm?doid=2555670.2466481[Control Theory for Principled Heap Sizing]; David R. White, Jeremy Singer, Jonathan M. Aitken, and Richard E. Jones
9134 --
9135
9136 Recommended Skills: C programming experience; some operating systems and/or systems programming experience; some compiler and garbage collector experience
9137
9138 /////
9139 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9140 /////
9141
9142 === Heap-allocated Activation Records ===
9143
9144 Activation records (a.k.a., stack frames) are traditionally allocated
9145 on a stack.  This naturally corresponds to the call-return pattern of
9146 function invocation.  However, there are some disadvantages to
9147 stack-allocated activation records.  In a functional programming
9148 language, functions may be deeply recursive, resulting in call stacks
9149 that are much larger than typically supported by the operating system;
9150 hence, a functional programming language implementation will typically
9151 store its stack in its heap.  Furthermore, a functional programming
9152 language implementation must handle and recover from stack overflow,
9153 by allocating a larger stack (again, in its heap) and copying
9154 activation records from the old stack to the new stack.  In the
9155 presence of threads, stacks must be allocated in a heap and, in the
9156 presence of a garbage collector, should be garbage collected when
9157 unreachable.  While heap-allocated activation records avoid many of
9158 these disadvantages, they have not been widely implemented.  This
9159 project aims to implement and evaluate heap-allocated activation
9160 records in the MLton compiler.
9161
9162 Background:
9163 --
9164 * http://journals.cambridge.org/action/displayAbstract?aid=1295104[Empirical and Analytic Study of Stack Versus Heap Cost for Languages with Closures]; Andrew W. Appel and Zhong Shao
9165 * http://portal.acm.org/citation.cfm?doid=182590.156783[Space-efficient closure representations]; Zhong Shao and Andrew W. Appel
9166 * http://portal.acm.org/citation.cfm?doid=93548.93554[Representing control in the presence of first-class continuations]; R. Hieb, R. Kent Dybvig, and Carl Bruggeman
9167 --
9168
9169 Recommended Skills: SML programming experience; some middle- and back-end compiler experience
9170
9171 /////
9172 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9173 /////
9174
9175 === Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML ===
9176
9177 The
9178 http://en.wikipedia.org/wiki/IEEE_754-2008[IEEE Standard for Floating-Point Arithmetic (IEEE 754)]
9179 is the de facto representation for floating-point computation.
9180 However, it is a _binary_ (base 2) representation of floating-point
9181 values, while many applications call for input and output of
9182 floating-point values in _decimal_ (base 10) representation.  The
9183 _decimal-to-binary_ conversion problem takes a decimal floating-point
9184 representation (e.g., a string like +"0.1"+) and returns the best
9185 binary floating-point representation of that number.  The
9186 _binary-to-decimal_ conversion problem takes a binary floating-point
9187 representation and returns a decimal floating-point representation
9188 using the smallest number of digits that allow the decimal
9189 floating-point representation to be converted to the original binary
9190 floating-point representation.  For both conversion routines, "best"
9191 is dependent upon the current floating-point rounding mode.
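
The binary-to-decimal criterion above (the fewest digits that still
convert back to the original value) can be illustrated with a
brute-force sketch that uses only the Basis Library functions
`Real.fmt` and `Real.fromString`.  The name `shortestDigits` and the
cap of 17 significant digits (assumed sufficient for a 64-bit `real`)
are choices made for this illustration.  This is not one of the
published algorithms, and it relies on the very conversion routines
that the project proposes to replace.

[source,sml]
----
(* Brute-force search for the fewest significant digits that round-trip. *)
fun shortestDigits (x : real) : string =
   let
      (* Does the decimal string s read back as exactly x? *)
      fun roundTrips s =
         case Real.fromString s of
            SOME y => Real.== (x, y)
          | NONE => false
      (* Try 1, 2, ... significant digits; stop at the assumed cap of 17. *)
      fun loop n =
         let
            val s = Real.fmt (StringCvt.GEN (SOME n)) x
         in
            if n >= 17 orelse roundTrips s then s else loop (n + 1)
         end
   in
      loop 1
   end

val tenth = shortestDigits 0.1
val third = shortestDigits (1.0 / 3.0)
----

A value such as `0.1` needs very few digits to round-trip, while
`1.0 / 3.0` needs many more.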
9192
9193 MLton uses David Gay's
9194 http://www.netlib.org/fp/gdtoa.tgz[gdtoa library] for floating-point
conversions.  While this is an excellent library, it generalizes the
9196 decimal-to-binary and binary-to-decimal conversion routines beyond
9197 what is required by the
9198 http://standardml.org/Basis/[Standard ML Basis Library] and induces an
9199 external dependency on the compiler.  Native implementations of these
9200 conversion routines in Standard ML would obviate the dependency on the
9201 +gdtoa+ library, while also being able to take advantage of Standard
9202 ML features in the implementation (e.g., the published algorithms
9203 often require use of infinite precision arithmetic, which is provided
9204 by the +IntInf+ structure in Standard ML, but is provided in an ad hoc
fashion in the +gdtoa+ library).
9206
9207 This project aims to develop a native implementation of the conversion
9208 routines in Standard ML.
9209
9210 Background:
9211 --
9212 * http://dl.acm.org/citation.cfm?doid=103162.103163[What every computer scientist should know about floating-point arithmetic]; David Goldberg
9213 * http://dl.acm.org/citation.cfm?doid=93542.93559[How to print floating-point numbers accurately]; Guy L. Steele, Jr. and Jon L. White
9214 * http://dl.acm.org/citation.cfm?doid=93542.93557[How to read floating point numbers accurately]; William D. Clinger
9215 * http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz[Correctly Rounded Binary-Decimal and Decimal-Binary Conversions]; David Gay
9216 * http://dl.acm.org/citation.cfm?doid=249069.231397[Printing floating-point numbers quickly and accurately]; Robert G. Burger and R. Kent Dybvig
9217 * http://dl.acm.org/citation.cfm?doid=1806596.1806623[Printing floating-point numbers quickly and accurately with integers]; Florian Loitsch
9218 --
9219
9220 Recommended Skills: SML programming experience; algorithm design and implementation
9221
9222 /////
9223 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9224 /////
9225
9226 === Implement Source-level Debugging ===
9227
9228 Debugging is a fact of programming life.  Unfortunately, most SML
9229 implementations (including MLton) provide little to no source-level
9230 debugging support.  This project aims to add basic to intermediate
9231 source-level debugging support to the MLton compiler.  MLton already
9232 supports source-level profiling, which can be used to attribute bytes
9233 allocated or time spent in source functions.  It should be relatively
9234 straightforward to leverage this source-level information into basic
9235 source-level debugging support, with the ability to set/unset
9236 breakpoints and step through declarations and functions.  It may be
9237 possible to also provide intermediate source-level debugging support,
9238 with the ability to inspect in-scope variables of basic types (e.g.,
9239 types compatible with MLton's foreign function interface).
9240
9241 Background:
9242 --
9243 * http://mlton.org/HowProfilingWorks[MLton -- How Profiling Works]
9244 * http://mlton.org/ForeignFunctionInterfaceTypes[MLton -- Foreign Function Interface Types]
9245 * http://dwarfstd.org/[DWARF Debugging Standard]
9246 * http://sourceware.org/gdb/current/onlinedocs/stabs/index.html[STABS Debugging Format]
9247 --
9248
9249 Recommended Skills: SML programming experience; some compiler experience
9250
9251 /////
9252 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9253 /////
9254
9255 === Region Based Memory Management ===
9256
9257 Region based memory management is an alternative automatic memory
9258 management scheme to garbage collection.  Regions can be inferred by
9259 the compiler (e.g., Cyclone and MLKit) or provided to the programmer
through a library.  Since many students do not have extensive
experience with compilers, we plan on adopting the latter approach.
Creating a viable region based memory solution requires the removal of
the GC and changes to the allocator.  Additionally, write barriers
will be necessary to ensure that a reference between two ML objects is never
9265 established if the left hand side of the assignment has a longer
9266 lifetime than the right hand side.  Students will need to come up with
9267 an appropriate interface for creating, entering, and exiting regions
9268 (examples include RTSJ scoped memory and SCJ scoped memory).
9269
9270 Background:
9271 --
9272 * Cyclone
9273 * MLKit
9274 * RTSJ + SCJ scopes
9275 --
9276
9277 Recommended Skills: SML programming experience; C programming experience; some compiler and garbage collector experience
9278
9279 /////
9280 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
9281 /////
9282
9283 === Adding Real-Time Capabilities ===
9284
9285 This project focuses on exposing real-time APIs from a real-time OS
9286 kernel at the SML level.  This will require mapping the current MLton
9287 (or http://multimlton.cs.purdue.edu[MultiMLton]) threading framework
9288 to real-time threads that the RTOS provides.  This will include
9289 associating priorities with MLton threads and building priority based
scheduling algorithms.  Additionally, periodic, aperiodic,
and sporadic tasks should be supported.  A real-time SML library will
9292 need to be created to provide a forward facing interface for
9293 programmers.  Stretch goals include reworking the MLton +atomic+
9294 statement and associated synchronization primitives built on top of
9295 the MLton +atomic+ statement.
9296
9297 Recommended Skills: SML programming experience; C programming experience; real-time experience a plus but not required
9298
9299 /////
9300 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
9301 /////
9302
9303 === Real-Time Garbage Collection ===
9304
9305 This project focuses on modifications to the MLton GC to support
9306 real-time garbage collection.  We will model the real-time GC on the
9307 Schism RTGC.  The first task will be to create a fixed size runtime
9308 object representation.  Large structures will need to be represented
as linked lists of fixed-size objects.  Arrays and vectors will be
transformed into dense trees.  Compaction and copying can therefore be
9311 removed from the GC algorithms that MLton currently supports.  Lastly,
9312 the GC will be made concurrent, allowing for the execution of the GC
9313 threads as the lowest priority task in the system.  Stretch goals
9314 include a priority aware mechanism for the GC to signal to real-time
9315 ML threads that it needs to scan their stack and identification of
9316 places where the stack is shallow to bound priority inversion during
9317 this procedure.
9318
9319 Recommended Skills: C programming experience; garbage collector experience a plus but not required
9320
9321 /////
9322 Mentor: http://www.cse.buffalo.edu/%7Elziarek/[Lukasz (Luke) Ziarek]
9323 /////
9324
9325 /////
9326 === Concurrent{nbsp}ML Improvements ===
9327
9328 http://cml.cs.uchicago.edu/[Concurrent ML] is an SML concurrency
9329 library based on synchronous message passing.  MLton has a partial
9330 implementation of the CML message-passing primitives, but its use in
9331 real-world applications has been stymied by the lack of completeness
9332 and thread-safe I/O libraries.  This project would aim to flesh out
9333 the CML implementation in MLton to be fully compatible with the
9334 "official" version distributed as part of SML/NJ.  Furthermore, time
9335 permitting, runtime system support could be added to allow use of
9336 modern OS features, such as asynchronous I/O, in the implementation of
9337 CML's system interfaces.
9338
9339 Background
9340 --
9341 * http://cml.cs.uchicago.edu/
9342 * http://mlton.org/ConcurrentML
9343 * http://mlton.org/ConcurrentMLImplementation
9344 --
9345
9346 Recommended Skills: SML programming experience; knowledge of concurrent programming; some operating systems and/or systems programming experience
9347
9348 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
9349 Mentor: http://www.cs.rit.edu/%7Emtf[Matthew Fluet]
9350 /////
9351
9352 /////
9353 === SML3d Development ===
9354
9355 The SML3d Project is a collection of libraries to support 3D graphics
9356 programming using Standard ML and the http://opengl.org/[OpenGL]
9357 graphics API. It currently requires the MLton implementation of SML
9358 and is supported on Linux, Mac OS X, and Microsoft Windows. There is
9359 also support for http://www.khronos.org/opencl/[OpenCL].  This project
9360 aims to continue development of the SML3d Project.
9361
9362 Background
9363 --
9364 * http://sml3d.cs.uchicago.edu/
9365 --
9366
9367 Mentor: http://people.cs.uchicago.edu/~jhr/[John Reppy]
9368 /////
9369
9370 <<<
9371
9372 :mlton-guide-page: HaMLet
9373 [[HaMLet]]
9374 HaMLet
9375 ======
9376
9377 http://www.mpi-sws.org/~rossberg/hamlet/[HaMLet] is a
9378 <:StandardMLImplementations:Standard ML implementation>.  It is
intended as a reference implementation of
9380 <:DefinitionOfStandardML:The Definition of Standard ML (Revised)> and
9381 not for serious practical work.
9382
9383 <<<
9384
9385 :mlton-guide-page: HenryCejtin
9386 [[HenryCejtin]]
9387 HenryCejtin
9388 ===========
9389
9390 I was one of the original developers of Mathematica (actually employee #1).
9391 My background is a combination of mathematics and computer science.
9392 Currently I am doing various things in Chicago.
9393
9394 <<<
9395
9396 :mlton-guide-page: History
9397 [[History]]
9398 History
9399 =======
9400
9401 In April 1997, Stephen Weeks wrote a defunctorizer for Standard ML and
9402 integrated it with SML/NJ.  The defunctorizer used SML/NJ's visible
9403 compiler and operated on the `Ast` intermediate representation
9404 produced by the SML/NJ front end.  Experiments showed that
9405 defunctorization gave a speedup of up to six times over separate
9406 compilation and up to two times over batch compilation without functor
9407 expansion.
9408
9409 In August 1997, we began development of an independent compiler for
9410 SML.  At the time the compiler was called `smlc`.  By October, we had
9411 a working monomorphiser.  By November, we added a polyvariant
9412 higher-order control-flow analysis.  At that point, MLton was about
9413 10,000 lines of code.
9414
Over the next year and a half, `smlc` morphed into a full-fledged
9416 compiler for SML.  It was renamed MLton, and first released in March
9417 1999.
9418
9419 From the start, MLton has been driven by whole-program optimization
9420 and an emphasis on performance.  Also from the start, MLton has had a
9421 fast C FFI and `IntInf` based on the GNU multiprecision library.  At
9422 its first release, MLton was 48,006 lines.
9423
Between March 1999 and January 2002, MLton grew to 102,541 lines,
9425 as we added a native code generator, mllex, mlyacc, a profiler, many
9426 optimizations, and many libraries including threads and signal
9427 handling.
9428
9429 During 2002, MLton grew to 112,204 lines and we had releases in April
9430 and September.  We added support for cross compilation and used this
9431 to enable MLton to run on Cygwin/Windows and FreeBSD.  We also made
9432 improvements to the garbage collector, so that it now works with large
9433 arrays and up to 4G of memory and so that it automatically uses
9434 copying, mark-compact, or generational collection depending on heap
9435 usage and RAM size.  We also continued improvements to the optimizer
9436 and libraries.
9437
9438 During 2003, MLton grew to 122,299 lines and we had releases in March
9439 and July.  We extended the profiler to support source-level profiling
9440 of time and allocation and to display call graphs.  We completed the
9441 Basis Library implementation, and added new MLton-specific libraries
9442 for weak pointers and finalization.  We extended the FFI to allow
9443 callbacks from C to SML.  We added support for the Sparc/Solaris
9444 platform, and made many improvements to the C code generator.
9445
9446 <<<
9447
9448 :mlton-guide-page: HowProfilingWorks
9449 [[HowProfilingWorks]]
9450 HowProfilingWorks
9451 =================
9452
9453 Here's how <:Profiling:> works.  If profiling is on, the front end
9454 (elaborator) inserts `Enter` and `Leave` statements into the source
9455 program for function entry and exit.  For example,
9456 [source,sml]
9457 ----
9458 fun f n = if n = 0 then 0 else 1 + f (n - 1)
9459 ----
9460 becomes
9461 [source,sml]
9462 ----
9463 fun f n =
9464    let
9465       val () = Enter "f"
9466       val res = (if n = 0 then 0 else 1 + f (n - 1))
9467                 handle e => (Leave "f"; raise e)
9468       val () = Leave "f"
9469    in
9470       res
9471    end
9472 ----
9473
9474 Actually there is a bit more information than just the source function
9475 name; there is also lexical nesting and file position.
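
To see what the inserted statements track, the instrumented function
can be run against ordinary SML definitions of `Enter` and `Leave`.
In the sketch below they are hypothetical stand-ins that maintain a
list of active source functions; the names `active` and `deepest` are
invented for this illustration, and the real statements are compiler
internals rather than SML functions.

[source,sml]
----
(* Hypothetical stand-ins for the profiling statements: a stack of the
   source functions that are currently active. *)
val active : string list ref = ref []
val deepest = ref 0

fun Enter name =
   ( active := name :: !active
   ; deepest := Int.max (!deepest, length (!active)) )

fun Leave name =
   case !active of
      top :: rest =>
         if top = name
            then active := rest
         else raise Fail ("unbalanced Leave " ^ name)
    | [] => raise Fail ("Leave without Enter: " ^ name)

(* The instrumented function, exactly as the elaborator would produce it. *)
fun f n =
   let
      val () = Enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (Leave "f"; raise e)
      val () = Leave "f"
   in
      res
   end

val result = f 3
----

After `f 3` returns, `active` is empty again because every `Enter` was
matched by a `Leave`, and `deepest` records that four activations of
`f` were live at once.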
9476
9477 Most of the middle of the compiler ignores, but preserves, `Enter` and
9478 `Leave`.  However, so that profiling preserves tail calls, the
9479 <:Shrink:SSA shrinker> has an optimization that notices when the only
9480 operations that cause a call to be a nontail call are profiling
9481 operations, and if so, moves them before the call, turning it into a
9482 tail call. If you observe a program that has a tail call that appears
to be turned into a nontail call when compiled with profiling, please
9484 <:Bug:report a bug>.
9485
9486 There is the `checkProf` function in
9487 <!ViewGitFile(mlton,master,mlton/ssa/type-check.fun)>, which checks that
9488 the `Enter`/`Leave` statements match up.
9489
9490 In the backend, just before translating to the <:Machine: Machine IL>,
9491 the profiler uses the `Enter`/`Leave` statements to infer the "local"
9492 portion of the control stack at each program point.  The profiler then
9493 removes the ++Enter++s/++Leave++s and inserts different information
9494 depending on which kind of profiling is happening.  For time profiling
9495 (with the <:AMD64Codegen:> and <:X86Codegen:>), the profiler inserts labels that cover the
9496 code (i.e. each statement has a unique label in its basic block that
9497 prefixes it) and associates each label with the local control stack.
9498 For time profiling (with the <:CCodegen:> and <:LLVMCodegen:>), the profiler
9499 inserts code that sets a global field that records the local control
9500 stack.  For allocation profiling, the profiler inserts calls to a C
9501 function that will maintain byte counts.  With stack profiling, the
9502 profiler also inserts a call to a C function at each nontail call in
9503 order to maintain information at runtime about what SML functions are
9504 on the stack.
9505
9506 At run time, the profiler associates counters (either clock ticks or
9507 byte counts) with source functions.  When the program finishes, the
9508 profiler writes the counts out to the `mlmon.out` file.  Then,
9509 `mlprof` uses source information stored in the executable to
9510 associate the counts in the `mlmon.out` file with source
9511 functions.
9512
9513 For time profiling, the profiler catches the `SIGPROF` signal 100
9514 times per second and increments the appropriate counter, determined by
9515 looking at the label prefixing the current program counter and mapping
9516 that to the current source function.
9517
9518 == Caveats ==
9519
9520 There may be a few missed clock ticks or bytes allocated at the very
9521 end of the program after the data is written.
9522
9523 Profiling has not been tested with signals or threads.  In particular,
9524 stack profiling may behave strangely.
9525
9526 <<<
9527
9528 :mlton-guide-page: Identifier
9529 [[Identifier]]
9530 Identifier
9531 ==========
9532
9533 In <:StandardML:Standard ML>, there are syntactically two kinds of
9534 identifiers.
9535
9536 * Alphanumeric: starts with a letter or prime (`'`) and is followed by letters, digits, primes and underbars (`_`).
9537 +
9538 Examples: `abc`, `ABC123`, `Abc_123`, `'a`.
9539
9540 * Symbolic: a sequence of the following
9541 +
9542 ----
 ! % & $ # + - / : < = > ? @ \ ~ ` ^ | *
9544 ----
9545 +
9546 Examples: `+=`, `<=`, `>>`, `$`.
9547
9548 With the exception of `=`, reserved words can not be identifiers.
9549
9550 There are a number of different classes of identifiers, some of which
9551 have additional syntactic rules.
9552
9553 * Identifiers not starting with a prime.
9554 ** value identifier (includes variables and constructors)
9555 ** type constructor
9556 ** structure identifier
9557 ** signature identifier
9558 ** functor identifier
9559 * Identifiers starting with a prime.
9560 ** type variable
9561 * Identifiers not starting with a prime and numeric labels (`1`, `2`, ...).
9562 ** record label
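
The following sketch exercises several of these identifier classes in
one place.  All of the names in it (`abc`, `Abc_123'`, `$`, `+=`,
`tagged`, and so on) are chosen only for illustration.

[source,sml]
----
(* Alphanumeric value identifiers; primes may follow the first letter. *)
val abc = 1
val Abc_123' = abc + 1

(* Symbolic value identifiers. *)
val $ = 40
infix 6 +=
fun x += y = x + y
val total = Abc_123' += $

(* 'a is a type variable; 1, 2, and label are record labels. *)
type 'a tagged = {1 : 'a, 2 : int, label : string}
val r : string tagged = {1 = "one", 2 = 2, label = "three"}
val first = #1 r
----

Note that a symbolic identifier must be separated from adjacent
symbolic characters by whitespace, since the lexer takes the longest
possible sequence of symbolic characters as one identifier.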
9563
9564 <<<
9565
9566 :mlton-guide-page: Immutable
9567 [[Immutable]]
9568 Immutable
9569 =========
9570
9571 Immutable means not <:Mutable:mutable> and is an adjective meaning
9572 "can not be modified".  Most values in <:StandardML:Standard ML> are
9573 immutable.  For example, constants, tuples, records, lists, and
9574 vectors are all immutable.
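
As a small illustration, "updating" an immutable value yields a new
value and leaves the original untouched.  The bindings below are
invented for this sketch; `Vector.update` is the Basis Library
function that returns a fresh vector.

[source,sml]
----
val v = Vector.fromList [1, 2, 3]
(* Vector.update returns a new vector; v itself is not modified. *)
val v' = Vector.update (v, 0, 10)

val r = {name = "x", count = 1}
(* A "changed" record is a new value built from the old one. *)
val r' = {name = #name r, count = #count r + 1}
----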
9575
9576 <<<
9577
9578 :mlton-guide-page: ImperativeTypeVariable
9579 [[ImperativeTypeVariable]]
9580 ImperativeTypeVariable
9581 ======================
9582
9583 In <:StandardML:Standard ML>, an imperative type variable is a type
9584 variable whose second character is a digit, as in `'1a` or
9585 `'2b`.  Imperative type variables were used as an alternative to
9586 the <:ValueRestriction:> in an earlier version of SML, but no longer play
9587 a role.  They are treated exactly as other type variables.
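
A minimal sketch, assuming the implementation accepts such type
variables as described above: the function below is as polymorphic as
it would be if `'1a` were written `'a`.  The name `id` is chosen for
illustration.

[source,sml]
----
(* '1a behaves like any other type variable. *)
fun id (x : '1a) : '1a = x

val n = id 3
val s = id "three"
----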
9588
9589 <<<
9590
9591 :mlton-guide-page: ImplementExceptions
9592 [[ImplementExceptions]]
9593 ImplementExceptions
9594 ===================
9595
9596 <:ImplementExceptions:> is a pass for the <:SXML:>
9597 <:IntermediateLanguage:>, invoked from <:SXMLSimplify:>.
9598
9599 == Description ==
9600
9601 This pass implements exceptions.
9602
9603 == Implementation ==
9604
9605 * <!ViewGitFile(mlton,master,mlton/xml/implement-exceptions.fun)>
9606
9607 == Details and Notes ==
9608
9609 {empty}
9610
9611 <<<
9612
9613 :mlton-guide-page: ImplementHandlers
9614 [[ImplementHandlers]]
9615 ImplementHandlers
9616 =================
9617
9618 <:ImplementHandlers:> is a pass for the <:RSSA:>
9619 <:IntermediateLanguage:>, invoked from <:RSSASimplify:>.
9620
9621 == Description ==
9622
9623 This pass implements the (threaded) exception handler stack.
9624
9625 == Implementation ==
9626
9627 * <!ViewGitFile(mlton,master,mlton/backend/implement-handlers.fun)>
9628
9629 == Details and Notes ==
9630
9631 {empty}
9632
9633 <<<
9634
9635 :mlton-guide-page: ImplementProfiling
9636 [[ImplementProfiling]]
9637 ImplementProfiling
9638 ==================
9639
9640 <:ImplementProfiling:> is a pass for the <:RSSA:>
9641 <:IntermediateLanguage:>, invoked from <:RSSASimplify:>.
9642
9643 == Description ==
9644
9645 This pass implements profiling.
9646
9647 == Implementation ==
9648
9649 * <!ViewGitFile(mlton,master,mlton/backend/implement-profiling.fun)>
9650
9651 == Details and Notes ==
9652
9653 See <:HowProfilingWorks:>.
9654
9655 <<<
9656
9657 :mlton-guide-page: ImplementSuffix
9658 [[ImplementSuffix]]
9659 ImplementSuffix
9660 ===============
9661
9662 <:ImplementSuffix:> is a pass for the <:SXML:>
9663 <:IntermediateLanguage:>, invoked from <:SXMLSimplify:>.
9664
9665 == Description ==
9666
9667 This pass implements the `TopLevel_setSuffix` primitive, which
9668 installs a function to exit the program.
9669
9670 == Implementation ==
9671
9672 * <!ViewGitFile(mlton,master,mlton/xml/implement-suffix.fun)>
9673
9674 == Details and Notes ==
9675
9676 <:ImplementSuffix:> works by introducing a new `ref` cell to contain
9677 the function of type `unit -> unit` that should be called on program
9678 exit.
9679
* The following code (appropriately alpha-converted) is prepended to the <:SXML:> program:
9681 +
9682 [source,sml]
9683 ----
9684 val z_0 =
9685   fn a_0 =>
9686   let
9687     val x_0 =
9688       "toplevel suffix not installed"
9689     val x_1 =
9690       MLton_bug (x_0)
9691   in
9692     x_1
9693   end
9694 val topLevelSuffixCell =
9695   Ref_ref (z_0)
9696 ----
9697
9698 * Any occurrence of
9699 +
9700 [source,sml]
9701 ----
9702 val x_0 =
9703   TopLevel_setSuffix (f_0)
9704 ----
9705 +
9706 is rewritten to
9707 +
9708 [source,sml]
9709 ----
9710 val x_0 =
9711   Ref_assign (topLevelSuffixCell, f_0)
9712 ----
9713
9714 * The following code (appropriately alpha-converted) is appended to the end of the <:SXML:> program:
9715 +
9716 [source,sml]
9717 ----
9718 val f_0 =
9719   Ref_deref (topLevelSuffixCell)
9720 val z_0 =
9721   ()
9722 val x_0 =
9723   f_0 z_0
9724 ----
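
The combined effect of the three rewrites can be approximated at the
source level.  The sketch below is only an analogy: `setSuffix` is a
hypothetical stand-in for the `TopLevel_setSuffix` primitive,
`raise Fail` stands in for `MLton_bug`, and `trace` exists only to
make the call observable.

[source,sml]
----
(* The cell initially holds a function that reports a bug if called. *)
val topLevelSuffixCell : (unit -> unit) ref =
   ref (fn () => raise Fail "toplevel suffix not installed")

(* Stand-in for the rewritten TopLevel_setSuffix: assign to the cell. *)
fun setSuffix (f : unit -> unit) : unit = topLevelSuffixCell := f

val trace : string list ref = ref []
val () = setSuffix (fn () => trace := "suffix ran" :: !trace)

(* Code appended to the end of the program: fetch and call the suffix. *)
val () = (!topLevelSuffixCell) ()
----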
9725
9726 <<<
9727
9728 :mlton-guide-page: InfixingOperators
9729 [[InfixingOperators]]
9730 InfixingOperators
9731 =================
9732
9733 Fixity specifications are not part of signatures in
9734 <:StandardML:Standard ML>. When one wants to use a module that
9735 provides functions designed to be used as infix operators there are
9736 several obvious alternatives:
9737
9738 * Use only prefix applications. Unfortunately there are situations
9739 where infix applications lead to considerably more readable code.
9740
9741 * Make the fixity declarations at the top-level. This may lead to
9742 collisions and may be unsustainable in a large project. Pollution of
9743 the top-level should be avoided.
9744
9745 * Make the fixity declarations at each scope where you want to use
9746 infix applications. The duplication becomes inconvenient if the
9747 operators are widely used. Duplication of code should be avoided.
9748
9749 * Use non-standard extensions, such as the <:MLBasis: ML Basis system>
9750 to control the scope of fixity declarations. This has the obvious
9751 drawback of reduced portability.
9752
9753 * Reuse existing infix operator symbols (`^`, `+`, `-`, ...).  This
9754 can be convenient when the standard operators aren't needed in the
9755 same scope with the new operators.  On the other hand, one is limited
9756 to the standard operator symbols and the code may appear confusing.
9757
9758 None of the obvious alternatives is best in every case. The following
9759 describes a slightly less obvious alternative that can sometimes be
9760 useful. The idea is to approximate Haskell's special syntax for
9761 treating any identifier enclosed in grave accents (backquotes) as an
9762 infix operator. In Haskell, instead of writing the prefix application
9763 `f x y` one can write the infix application ++x &grave;f&grave; y++.
9764
9765
9766 == Infixing operators ==
9767
9768 Let's first take a look at the definitions of the operators:
9769
9770 [source,sml]
9771 ----
9772 infix  3 <\     fun x <\ f = fn y => f (x, y)     (* Left section      *)
9773 infix  3 \>     fun f \> y = f y                  (* Left application  *)
9774 infixr 3 />     fun f /> y = fn x => f (x, y)     (* Right section     *)
9775 infixr 3 </     fun x </ f = f x                  (* Right application *)
9776
9777 infix  2 o  (* See motivation below *)
9778 infix  0 :=
9779 ----
9780
9781 The left and right sectioning operators, `<\` and `/>`, are useful in
9782 SML for partial application of infix operators.
9783 <!Cite(Paulson96, ML For the Working Programmer)> describes curried
9784 functions `secl` and `secr` for the same purpose on pages 179-181.
9785 For example,
9786
9787 [source,sml]
9788 ----
9789 List.map (op- /> y)
9790 ----
9791
9792 is a function for subtracting `y` from a list of integers and
9793
9794 [source,sml]
9795 ----
9796 List.exists (x <\ op=)
9797 ----
9798
9799 is a function for testing whether a list contains an `x`.
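
For reference, here are the two sectioning examples as a
self-contained sketch with concrete values substituted for `x` and
`y`; the bindings `decremented` and `found` are named for
illustration only.

[source,sml]
----
infix  3 <\     fun x <\ f = fn y => f (x, y)     (* Left section  *)
infixr 3 />     fun f /> y = fn x => f (x, y)     (* Right section *)

(* Subtract 1 from every element: each x becomes x - 1. *)
val decremented = List.map (op- /> 1) [3, 4, 5]

(* Is 4 a member?  Each element y is tested with 4 = y. *)
val found = List.exists (4 <\ op=) [3, 4, 5]
----

The space between `op-` and `/>` matters: without it, the lexer would
read `-/>` as a single symbolic identifier.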
9800
9801 Together with the left and right application operators, `\>` and `</`,
9802 the sectioning operators provide a way to treat any binary function
9803 (i.e. a function whose domain is a pair) as an infix operator.  In
9804 general,
9805
9806 ----
9807 x0 <\f1\> x1 <\f2\> x2 ... <\fN\> xN = fN (... f2 (f1 (x0, x1), x2) ..., xN)
9808 ----
9809
9810 and
9811
9812 ----
9813 xN </fN/> ... x2 </f2/> x1 </f1/> x0  =  fN (xN, ... f2 (x2, f1 (x1, x0)) ...)
9814 ----
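
A worked instance of both equations, using a non-commutative function
so that the grouping is visible.  The function `sub` and the result
names are introduced here for illustration.

[source,sml]
----
infix  3 <\     fun x <\ f = fn y => f (x, y)
infix  3 \>     fun f \> y = f y
infixr 3 />     fun f /> y = fn x => f (x, y)
infixr 3 </     fun x </ f = f x

fun sub (a, b) = a - b

val leftward  = 1 <\sub\> 2 <\sub\> 3     (* sub (sub (1, 2), 3) *)
val rightward = 1 </sub/> 2 </sub/> 3     (* sub (1, sub (2, 3)) *)
----

Here `leftward` is `~4` and `rightward` is `2`.  A named function is
used rather than `op-` because `op-\>` would be lexed as `op`
followed by the single symbolic identifier `-\>`.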
9815
9816
9817 === Examples ===
9818
9819 As a fairly realistic example, consider providing a function for sequencing
9820 comparisons:
9821
9822 [source,sml]
9823 ----
9824 structure Order (* ... *) =
9825    struct
9826       (* ... *)
9827       val orWhenEq = fn (EQUAL, th) => th ()
9828                       | (other,  _) => other
9829       (* ... *)
9830    end
9831 ----
9832 Using `orWhenEq` and the infixing operators, one can write a
9833 `compare` function for triples as
9834
9835 [source,sml]
9836 ----
9837 fun compare (fad, fbe, fcf) ((a, b, c), (d, e, f)) =
9838     fad (a, d) <\Order.orWhenEq\> `fbe (b, e) <\Order.orWhenEq\> `fcf (c, f)
9839 ----
9840
9841 where +&grave;+ is defined as
9842
9843 [source,sml]
9844 ----
9845 fun `f x = fn () => f x
9846 ----
9847
9848 Although `orWhenEq` can be convenient (try rewriting the above without
9849 it), it is probably not useful enough to be defined at the top level
9850 as an infix operator. Fortunately we can use the infixing operators
9851 and don't have to.
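
Putting the pieces together, the following self-contained sketch
compares triples lexicographically.  The component comparison
functions passed in and the name `result` are choices made for this
illustration.

[source,sml]
----
infix  3 <\     fun x <\ f = fn y => f (x, y)
infix  3 \>     fun f \> y = f y

structure Order =
   struct
      (* Use the first ordering unless it is EQUAL; then force the thunk. *)
      val orWhenEq = fn (EQUAL, th) => th ()
                      | (other,  _) => other
   end

(* Delay an application until it is needed. *)
fun `f x = fn () => f x

fun compare (fad, fbe, fcf) ((a, b, c), (d, e, f)) =
    fad (a, d) <\Order.orWhenEq\> `fbe (b, e) <\Order.orWhenEq\> `fcf (c, f)

val result =
   compare (Int.compare, String.compare, Char.compare)
           ((1, "a", #"x"), (1, "a", #"y"))
----

The first two components are equal, so the third decides and `result`
is `LESS`.  Because the later comparisons are wrapped in thunks, they
are evaluated only when the earlier ones return `EQUAL`.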
9852
9853 Another fairly realistic example would be to use the infixing operators with
9854 the technique described on the <:Printf:> page. Assuming that you would have
9855 a `Printf` module binding `printf`, +&grave;+, and formatting combinators
9856 named `int` and `string`, you could write
9857
9858 [source,sml]
9859 ----
9860 let open Printf in
9861   printf (`"Here's an int "<\int\>" and a string "<\string\>".") 13 "foo" end
9862 ----
9863
9864 without having to duplicate the fixity declarations. Alternatively, you could
9865 write
9866
9867 [source,sml]
9868 ----
9869 P.printf (P.`"Here's an int "<\P.int\>" and a string "<\P.string\>".") 13 "foo"
9870 ----
9871
assuming you have made the binding
9873
9874 [source,sml]
9875 ----
9876 structure P = Printf
9877 ----
9878
9879
9880 == Application and piping operators ==
9881
9882 The left and right application operators may also provide some notational
9883 convenience on their own. In general,
9884
9885 ----
9886 f \> x1 \> ... \> xN = f x1 ... xN
9887 ----
9888
9889 and
9890
9891 ----
9892 xN </ ... </ x1 </ f = f x1 ... xN
9893 ----
9894
9895 If nothing else, both of them can eliminate parentheses. For example,
9896
9897 [source,sml]
9898 ----
9899 foo (1 + 2) = foo \> 1 + 2
9900 ----
9901
9902 The left and right application operators are related to operators
9903 that could be described as the right and left piping operators:
9904
9905 [source,sml]
9906 ----
9907 infix  1 >|     val op>| = op</      (* Left pipe *)
9908 infixr 1 |<     val op|< = op\>      (* Right pipe *)
9909 ----
9910
9911 As you can see, the left and right piping operators, `>|` and `|<`,
9912 are the same as the right and left application operators,
9913 respectively, except the associativities are reversed and the binding
9914 strength is lower. They are useful for piping data through a sequence
9915 of operations. In general,
9916
9917 ----
9918 x >| f1 >| ... >| fN = fN (... (f1 x) ...) = (fN o ... o f1) x
9919 ----
9920
9921 and
9922
9923 ----
9924 fN |< ... |< f1 |< x = fN (... (f1 x) ...) = (fN o ... o f1) x
9925 ----
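
A small self-contained sketch of the application and piping operators
in use; the example pipelines and the names `viaApply`, `leftPipe`,
and `rightPipe` are invented for illustration.

[source,sml]
----
infix  3 \>     fun f \> y = f y
infixr 3 </     fun x </ f = f x
infix  1 >|     val op>| = op</      (* Left pipe *)
infixr 1 |<     val op|< = op\>      (* Right pipe *)

(* \> binds more loosely than +, so this is Int.toString (1 + 2). *)
val viaApply = Int.toString \> 1 + 2

(* Data flows left to right: double each element, then sum. *)
val leftPipe = [1, 2, 3] >| List.map (fn x => x * 2) >| List.foldl op+ 0

(* Data flows right to left: take the length, then convert it. *)
val rightPipe = Int.toString |< List.length |< [1, 2, 3]
----

Here `viaApply` is `"3"`, `leftPipe` is `12`, and `rightPipe` is `"3"`.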
9926
9927 The right piping operator, `|<`, is provided by the Haskell prelude as
9928 `$`. It can be convenient in CPS or continuation passing style.
9929
9930 A use for the left piping operator is with parsing combinators. In a
9931 strict language, like SML, eta-reduction is generally unsafe. Using
9932 the left piping operator, parsing functions can be formatted
9933 conveniently as
9934
9935 [source,sml]
9936 ----
9937 fun parsingFunc input =
9938    input >| (* ... *)
9939          || (* ... *)
9940          || (* ... *)
9941 ----
9942
9943 where `||` is supposed to be a combinator provided by the parsing combinator
9944 library.
9945
9946
9947 == About precedences ==
9948
9949 You probably noticed that we redefined the
9950 <:OperatorPrecedence:precedences> of the function composition operator
9951 `o` and the assignment operator `:=`. Doing so is not strictly
9952 necessary, but can be convenient and should be relatively
9953 safe. Consider the following motivating examples from
9954 <:WesleyTerpstra: Wesley W. Terpstra> relying on the redefined
9955 precedences:
9956
9957 [source,sml]
9958 ----
9959 Word8.fromInt o Char.ord o s <\String.sub
9960 (* Combining sectioning and composition *)
9961
9962 x := s <\String.sub\> i
9963 (* Assigning the result of an infixed application *)
9964 ----
9965
9966 In imperative languages, assignment usually has the lowest precedence
9967 (ignoring statement separators). The precedence of `:=` in the
9968 <:BasisLibrary: Basis Library> is perhaps unnecessarily high, because
9969 an expression of the form `r := x` always returns a unit, which makes
9970 little sense to combine with anything. Dropping `:=` to the lowest
9971 precedence level makes it behave more like in other imperative
9972 languages.
9973
9974 The case for `o` is different. With the exception of `before` and
9975 `:=`, it doesn't seem to make much sense to use `o` with any of the
9976 operators defined by the <:BasisLibrary: Basis Library> in an
9977 unparenthesized expression. This is simply because none of the other
9978 operators deal with functions. It would seem that the precedence of
9979 `o` could be chosen completely arbitrarily from the set `{1, ..., 9}`
9980 without having any adverse effects with respect to other infix
9981 operators defined by the <:BasisLibrary: Basis Library>.
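
The two motivating examples can be run as follows once the
precedences have been redefined.  The bindings `s`, `byteAt`, and `x`
are assumptions made for this sketch.

[source,sml]
----
infix  3 <\     fun x <\ f = fn y => f (x, y)
infix  3 \>     fun f \> y = f y
infix  2 o
infix  0 :=

val s = "abc"

(* Parsed as (Word8.fromInt o Char.ord) o (s <\ String.sub). *)
val byteAt = Word8.fromInt o Char.ord o s <\String.sub

(* Parsed as x := ((s <\ String.sub) \> 1). *)
val x = ref #"?"
val () = x := s <\String.sub\> 1
----

With these declarations, `byteAt 0` is `0w97` and `x` holds `#"b"`.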
9982
9983
9984 == Design of the symbols ==
9985
9986 The closest approximation of Haskell's ++x &grave;f&grave; y++ syntax
9987 achievable in Standard ML would probably be something like
9988 ++x &grave;f^ y++, but `^` is already used for string
9989 concatenation by the <:BasisLibrary: Basis Library>. Other
9990 combinations of the characters +&grave;+ and `^` would be
9991 possible, but none seems clearly the best visually. The symbols `<\`,
9992 `\>`, `</`, and `/>` are reasonably concise and have a certain
self-documenting appearance and symmetry, which can make them easier
to remember.  As the names suggest, the symbols of the piping operators
`>|` and `|<` are inspired by Unix shell pipelines.
9996
9997
9998 == Also see ==
9999
10000  * <:Utilities:>
10001
10002 <<<
10003
10004 :mlton-guide-page: Inline
10005 [[Inline]]
10006 Inline
10007 ======
10008
10009 <:Inline:> is an optimization pass for the <:SSA:>
10010 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10011
10012 == Description ==
10013
10014 This pass inlines <:SSA:> functions using a size-based metric.
10015
10016 == Implementation ==
10017
10018 * <!ViewGitFile(mlton,master,mlton/ssa/inline.sig)>
10019 * <!ViewGitFile(mlton,master,mlton/ssa/inline.fun)>
10020
10021 == Details and Notes ==
10022
10023 The <:Inline:> pass can be invoked to use one of three metrics:
10024
10025 * `NonRecursive(product, small)` -- inline any function satisfying `(numCalls - 1) * (size - small) <= product`, where `numCalls` is the static number of calls to the function and `size` is the size of the function.
10026 * `Leaf(size)` -- inline any leaf function smaller than `size`
10027 * `LeafNoLoop(size)` -- inline any leaf function without loops smaller than `size`
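
The `NonRecursive` condition can be read as a simple predicate.  The
following is a minimal sketch, not the code in `inline.fun`: the
function name, the record shapes, and the parameter values are all
hypothetical and chosen only to exercise the inequality.

[source,sml]
----
(* Hypothetical helper; names and record shapes are illustrative. *)
fun nonRecursiveOk {product: int, small: int}
                   {numCalls: int, size: int}: bool =
   (numCalls - 1) * (size - small) <= product

(* Illustrative parameter values only. *)
val params = {product = 320, small = 60}

(* A function called once always qualifies: 0 * 940 <= 320. *)
val calledOnce = nonRecursiveOk params {numCalls = 1, size = 1000}

(* A body smaller than `small` qualifies: 99 * ~10 <= 320. *)
val smallBody = nonRecursiveOk params {numCalls = 100, size = 50}

(* A large body called several times does not: 2 * 240 > 320. *)
val bigAndOften = nonRecursiveOk params {numCalls = 3, size = 300}
----

Under this reading, the left-hand side estimates the code growth from
duplicating the body at every call site but one.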
10028
10029 <<<
10030
10031 :mlton-guide-page: InsertLimitChecks
10032 [[InsertLimitChecks]]
10033 InsertLimitChecks
10034 =================
10035
10036 <:InsertLimitChecks:> is a pass for the <:RSSA:>
10037 <:IntermediateLanguage:>, invoked from <:RSSASimplify:>.
10038
10039 == Description ==
10040
10041 This pass inserts limit checks.
10042
10043 == Implementation ==
10044
10045 * <!ViewGitFile(mlton,master,mlton/backend/limit-check.fun)>
10046
10047 == Details and Notes ==
10048
10049 {empty}
10050
10051 <<<
10052
10053 :mlton-guide-page: InsertSignalChecks
10054 [[InsertSignalChecks]]
10055 InsertSignalChecks
10056 ==================
10057
10058 <:InsertSignalChecks:> is a pass for the <:RSSA:>
10059 <:IntermediateLanguage:>, invoked from <:RSSASimplify:>.
10060
10061 == Description ==
10062
10063 This pass inserts signal checks.
10064
10065 == Implementation ==
10066
10067 * <!ViewGitFile(mlton,master,mlton/backend/limit-check.fun)>
10068
10069 == Details and Notes ==
10070
10071 {empty}
10072
10073 <<<
10074
10075 :mlton-guide-page: Installation
10076 [[Installation]]
10077 Installation
10078 ============
10079
10080 MLton runs on a variety of platforms and is distributed in both source and
10081 binary form.
10082
10083 A `.tgz` or `.tbz` binary package can be extracted at any location, yielding
10084 `README.adoc` (this file), `CHANGELOG.adoc`, `LICENSE`, `Makefile`, `bin/`,
10085 `lib/`, and `share/`.  The compiler and tools can be executed in-place (e.g.,
10086 `./bin/mlton`).
10087
10088 A small set of `Makefile` variables can be used to customize the binary package
10089 via `make update`:
10090
10091  * `CC`: Specify C compiler.  Can be used for alternative tools (e.g.,
10092    `CC=clang` or `CC=gcc-7`).
10093  * `WITH_GMP_DIR`, `WITH_GMP_INC_DIR`, `WITH_GMP_LIB_DIR`: Specify GMP include
10094    and library paths, if not on default search paths.  (If `WITH_GMP_DIR` is
10095    set, then `WITH_GMP_INC_DIR` defaults to `$(WITH_GMP_DIR)/include` and
10096    `WITH_GMP_LIB_DIR` defaults to `$(WITH_GMP_DIR)/lib`.)
10097
10098 For example:
10099
[source,shell]
10101 ----
10102 $ make CC=clang WITH_GMP_DIR=/opt/gmp update
10103 ----
10104
10105 On typical platforms, installing MLton (after optionally performing
10106 `make update`) to `/usr/local` can be accomplished via:
10107
[source,shell]
10109 ----
10110 $ make install
10111 ----
10112
10113 A small set of `Makefile` variables can be used to customize the installation:
10114
10115  * `PREFIX`: Specify the installation prefix.
10116  * `CC`: Specify C compiler.  Can be used for alternative tools (e.g.,
10117    `CC=clang` or `CC=gcc-7`).
10118  * `WITH_GMP_DIR`, `WITH_GMP_INC_DIR`, `WITH_GMP_LIB_DIR`: Specify GMP include
10119    and library paths, if not on default search paths.  (If `WITH_GMP_DIR` is
10120    set, then `WITH_GMP_INC_DIR` defaults to `$(WITH_GMP_DIR)/include` and
10121    `WITH_GMP_LIB_DIR` defaults to `$(WITH_GMP_DIR)/lib`.)
10122
10123 For example:
10124
[source,shell]
10126 ----
10127 $ make PREFIX=/opt/mlton install
10128 ----
10129
10130 Installation of MLton creates the following files and directories.
10131
10132 * ++__prefix__/bin/mllex++
10133 +
10134 The <:MLLex:> lexer generator.
10135
10136 * ++__prefix__/bin/mlnlffigen++
10137 +
10138 The <:MLNLFFI:ML-NLFFI> tool.
10139
10140 * ++__prefix__/bin/mlprof++
10141 +
10142 A <:Profiling:> tool.
10143
10144 * ++__prefix__/bin/mlton++
10145 +
A script to call the compiler.  This script may be moved anywhere;
however, it makes use of files in ++__prefix__/lib/mlton++.
10148
10149 * ++__prefix__/bin/mlyacc++
10150 +
10151 The <:MLYacc:> parser generator.
10152
10153 * ++__prefix__/lib/mlton++
10154 +
10155 Directory containing libraries and include files needed during compilation.
10156
10157 * ++__prefix__/share/man/man1/{mllex,mlnlffigen,mlprof,mlton,mlyacc}.1++
10158 +
10159 Man pages.
10160
10161 * ++__prefix__/share/doc/mlton++
10162 +
10163 Directory containing the user guide for MLton, mllex, and mlyacc, as
10164 well as example SML programs (in the `examples` directory), and license
10165 information.
10166
10167
10168 == Hello, World! ==
10169
10170 Once you have installed MLton, create a file called `hello-world.sml`
10171 with the following contents.
10172
10173 ----
10174 print "Hello, world!\n";
10175 ----
10176
10177 Now create an executable, `hello-world`, with the following command.
10178 ----
10179 mlton hello-world.sml
10180 ----
10181
10182 You can now run `hello-world` to verify that it works.  There are more
10183 small examples in ++__prefix__/share/doc/mlton/examples++.
10184
10185
10186 == Installation on Cygwin ==
10187
10188 When installing the Cygwin `tgz`, you should use Cygwin's `bash` and
10189 `tar`.  The use of an archiving tool that is not aware of Cygwin's
10190 mounts will put the files in the wrong place.
10191
10192 <<<
10193
10194 :mlton-guide-page: IntermediateLanguage
10195 [[IntermediateLanguage]]
10196 IntermediateLanguage
10197 ====================
10198
MLton uses a number of intermediate languages in translating from the input source program to low-level code.  Here they are, in the order in which the program is translated through them.
10200
10201  * <:AST:>.  Pretty close to the source.
10202  * <:CoreML:>.  Explicitly typed, no module constructs.
10203  * <:XML:>.  Polymorphic, <:HigherOrder:>.
10204  * <:SXML:>.  SimplyTyped, <:HigherOrder:>.
10205  * <:SSA:>.  SimplyTyped, <:FirstOrder:>.
10206  * <:SSA2:>.  SimplyTyped, <:FirstOrder:>.
10207  * <:RSSA:>.  Explicit data representations.
10208  * <:Machine:>.  Untyped register transfer language.
10209
10210 <<<
10211
10212 :mlton-guide-page: IntroduceLoops
10213 [[IntroduceLoops]]
10214 IntroduceLoops
10215 ==============
10216
10217 <:IntroduceLoops:> is an optimization pass for the <:SSA:>
10218 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10219
10220 == Description ==
10221
10222 This pass rewrites any <:SSA:> function that calls itself in tail
10223 position into one with a local loop and no self tail calls.
10224
An <:SSA:> function like
10226 ----
10227 fun F (arg_0, arg_1) = L_0 ()
10228   ...
10229   L_16 (x_0)
10230     ...
10231     F (z_0, z_1) Tail
10232   ...
10233 ----
10234 becomes
10235 ----
10236 fun F (arg_0', arg_1') = loopS_0 ()
10237   loopS_0 ()
10238     loop_0 (arg_0', arg_1')
10239   loop_0 (arg_0, arg_1)
10240     L_0 ()
10241   ...
10242   L_16 (x_0)
10243     ...
10244     loop_0 (z_0, z_1)
10245   ...
10246 ----
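
At the source level, an accumulator-style function is a typical
candidate for this rewrite.  The sketch below is only an illustration:
`sum` is a hypothetical example, and it assumes the function reaches
the <:SSA:> stage still calling itself in tail position.

[source,sml]
----
(* The recursive call is the last thing the function does, so it is
   a self tail call. *)
fun sum (n, acc) =
   if n = 0
      then acc
   else sum (n - 1, acc + n)

val total = sum (10, 0)
----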
10247
10248 == Implementation ==
10249
10250 * <!ViewGitFile(mlton,master,mlton/ssa/introduce-loops.fun)>
10251
10252 == Details and Notes ==
10253
10254 {empty}
10255
10256 <<<
10257
10258 :mlton-guide-page: JesperLouisAndersen
10259 [[JesperLouisAndersen]]
10260 JesperLouisAndersen
10261 ===================
10262
Jesper Louis Andersen is an undergraduate student at DIKU, the Department of Computer Science at the University of Copenhagen. His contributions to MLton are few, though he did port MLton to the NetBSD and OpenBSD platforms.
10264
His general interests in computer science are compiler theory, language theory, algorithms and data structures, and programming. His assets are his general knowledge of UNIX systems, system administration, and operating system kernels, NetBSD in particular.
10266
10267 He was employed by the university as a system administrator for 2 years, which has set him back somewhat in his studies. Currently he is trying to learn mathematics (real analysis, general topology, complex functional analysis and algebra).
10268
10269
10270 == Projects using MLton ==
10271
10272 === A register allocator ===
For internal use in a compiler course at DIKU. It is written in the literate programming style and implements the _Iterated Register Coalescing_ algorithm by Lal George and Andrew Appel http://citeseer.ist.psu.edu/george96iterated.html. The project is unfinished. Most of the basic parts of the algorithm are done, but the interface to the students' (simple) datatype takes some conversion.
10274
10275 === A configuration management system in SML ===
At this time, only loose plans exist for this. The plan is to build a configuration management system on the principles of the OpenCM system; see http://www.opencm.org/docs.html. The basic idea is to unify "naming" and "identity" into one by uniquely identifying all objects managed in the repository by the use of cryptographic checksums. This mantra guides the rest of the system, providing integrity, accessibility and confidentiality.
10277
10278 <<<
10279
10280 :mlton-guide-page: JohnnyAndersen
10281 [[JohnnyAndersen]]
10282 JohnnyAndersen
10283 ==============
10284
10285 Johnny Andersen (aka Anoq of the Sun)
10286
10287 Here is a picture in front of the academy building
10288 at the University of Athens, Greece, taken in September 2003.
10289
10290 image::JohnnyAndersen.attachments/anoq.jpg[align="center"]
10291
10292 <<<
10293
10294 :mlton-guide-page: KnownCase
10295 [[KnownCase]]
10296 KnownCase
10297 =========
10298
10299 <:KnownCase:> is an optimization pass for the <:SSA:>
10300 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10301
10302 == Description ==
10303
10304 This pass duplicates and simplifies `Case` transfers when the
10305 constructor of the scrutinee is known.
10306
10307 Uses <:Restore:>.
10308
10309 For example, the program
10310 [source,sml]
10311 ----
10312 val rec last =
10313   fn [] => 0
10314    | [x] => x
10315    | _ :: l => last l
10316
10317 val _ = 1 + last [2, 3, 4, 5, 6, 7]
10318 ----
10319
10320 gives rise to the <:SSA:> function
10321
10322 ----
10323 fun last_0 (x_142) = loopS_1 ()
10324   loopS_1 ()
10325     loop_11 (x_142)
10326   loop_11 (x_143)
10327     case x_143 of
10328       nil_1 => L_73 | ::_0 => L_74
10329   L_73 ()
10330     return global_5
10331   L_74 (x_145, x_144)
10332     case x_145 of
10333       nil_1 => L_75 | _ => L_76
10334   L_75 ()
10335     return x_144
10336   L_76 ()
10337     loop_11 (x_145)
10338 ----
10339
10340 which is simplified to
10341
10342 ----
10343 fun last_0 (x_142) = loopS_1 ()
10344   loopS_1 ()
10345     case x_142 of
10346       nil_1 => L_73 | ::_0 => L_118
10347   L_73 ()
10348     return global_5
10349   L_118 (x_230, x_229)
10350     L_74 (x_230, x_229, x_142)
10351   L_74 (x_145, x_144, x_232)
10352     case x_145 of
10353       nil_1 => L_75 | ::_0 => L_114
10354   L_75 ()
10355     return x_144
10356   L_114 (x_227, x_226)
10357     L_74 (x_227, x_226, x_145)
10358 ----
10359
10360 == Implementation ==
10361
10362 * <!ViewGitFile(mlton,master,mlton/ssa/known-case.fun)>
10363
10364 == Details and Notes ==
10365
One interesting aspect of <:KnownCase:> is that it often has the
10367 effect of unrolling list traversals by one iteration, moving the
10368 `nil`/`::` check to the end of the loop, rather than the beginning.
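
Read back as source code, the simplified <:SSA:> function above
roughly corresponds to the following hand-written sketch.  It is an
interpretation for illustration only, not output of the compiler, and
`loop` is a hypothetical name.

[source,sml]
----
fun last [] = 0
  | last (x :: l) =
       let
          (* The first nil/:: test has been peeled off; inside the
             loop the test sits at the end of each iteration. *)
          fun loop (x, l) =
             case l of
                [] => x
              | y :: l' => loop (y, l')
       in
          loop (x, l)
       end

val result = 1 + last [2, 3, 4, 5, 6, 7]
----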
10369
10370 <<<
10371
10372 :mlton-guide-page: LambdaCalculus
10373 [[LambdaCalculus]]
10374 LambdaCalculus
10375 ==============
10376
10377 The http://en.wikipedia.org/wiki/Lambda_calculus[lambda calculus] is
10378 the formal system underlying <:StandardML:Standard ML>.
10379
10380 <<<
10381
10382 :mlton-guide-page: LambdaFree
10383 [[LambdaFree]]
10384 LambdaFree
10385 ==========
10386
10387 <:LambdaFree:> is an analysis pass for the <:SXML:>
10388 <:IntermediateLanguage:>, invoked from <:ClosureConvert:>.
10389
10390 == Description ==
10391
10392 This pass descends the entire <:SXML:> program and attaches a property
10393 to each `Lambda` `PrimExp.t` in the program.  Then, you can use
10394 `lambdaFree` and `lambdaRec` to get free variables of that `Lambda`.
10395
10396 == Implementation ==
10397
10398 * <!ViewGitFile(mlton,master,mlton/closure-convert/lambda-free.sig)>
10399 * <!ViewGitFile(mlton,master,mlton/closure-convert/lambda-free.fun)>
10400
10401 == Details and Notes ==
10402
10403 For `Lambda`-s bound in a `Fun` dec, `lambdaFree` gives the union of
10404 the frees of the entire group of mutually recursive functions.  Hence,
10405 `lambdaFree` for every `Lambda` in a single `Fun` dec is the same.
10406 Furthermore, for a `Lambda` bound in a `Fun` dec, `lambdaRec` gives
10407 the list of other functions bound in the same dec defining that
10408 `Lambda`.
10409
10410 For example:
10411 ----
10412 val rec f = fn x => ... y ... g ... f ...
10413 and g = fn z => ... f ... w ...
10414 ----
10415
10416 ----
10417 lambdaFree(fn x =>) = [y, w]
10418 lambdaFree(fn z =>) = [y, w]
10419 lambdaRec(fn x =>) = [g, f]
10420 lambdaRec(fn z =>) = [f]
10421 ----
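
The notion of free variables used here can be illustrated on a toy
lambda language.  This is a minimal sketch under stated assumptions:
the `exp` datatype and the `free` function are hypothetical, and they
are neither <:SXML:> nor the code in `lambda-free.fun`.

[source,sml]
----
(* Toy expression language, for illustration only. *)
datatype exp =
   Var of string
 | Lam of string * exp
 | App of exp * exp

(* Free variables of an expression, without duplicates, in order of
   first occurrence. *)
fun free (Var x) = [x]
  | free (Lam (x, e)) = List.filter (fn y => y <> x) (free e)
  | free (App (e1, e2)) =
       let
          val fs = free e1
          fun seen y = List.exists (fn z => z = y) fs
       in
          fs @ List.filter (not o seen) (free e2)
       end

(* fn x => (y x) w  has free variables y and w. *)
val frees = free (Lam ("x", App (App (Var "y", Var "x"), Var "w")))
----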
10422
10423 <<<
10424
10425 :mlton-guide-page: LanguageChanges
10426 [[LanguageChanges]]
10427 LanguageChanges
10428 ===============
10429
10430 We are sometimes asked to modify MLton to change the language it
10431 compiles.  In short, we are conservative about making such changes.
10432 There are a number of reasons for this.
10433
10434 * <:DefinitionOfStandardML:The Definition of Standard ML> is an
10435 extremely high standard of specification.  The value of the Definition
10436 would be significantly diluted by changes that are not specified at an
10437 equally high level, and the dilution increases with the complexity of
10438 the language change and its interaction with other language features.
10439
10440 * The SML community is small and there are a number of
10441 <:StandardMLImplementations:SML implementations>.  Without an
10442 agreed-upon standard, it becomes very difficult to port programs
10443 between compilers, and the community would be balkanized.
10444
10445 * Our main goal is to enable programmers to be as effective as
10446 possible with MLton/SML.  There are a number of improvements other
10447 than language changes that we could spend our time on that would
10448 provide more benefit to programmers.
10449
10450 * The more the language that MLton compiles changes over time, the
10451 more difficult it is to use MLton as a stable platform for serious
10452 program development.
10453
Despite these drawbacks, we have extended SML in a few cases.
10455
10456 * <:ForeignFunctionInterface: Foreign function interface>
10457 * <:MLBasis: ML Basis system>
10458 * <:SuccessorML: Successor ML features>
10459
10460 We allow these language extensions because they provide functionality
10461 that is impossible to achieve without them or have non-trivial
10462 community support.  The Definition does not define a foreign function
10463 interface.  So, we must either extend the language or greatly restrict
10464 the class of programs that can be written.  Similarly, the Definition
10465 does not provide a mechanism for namespace control at the module
10466 level, making it impossible to deliver packaged libraries and have a
10467 hope of users using them without name clashes.  The ML Basis system
10468 addresses this problem.  We have also provided a formal specification
10469 of the ML Basis system at the level of the Definition.
10470
10471 == Also see ==
10472
10473 * http://www.mlton.org/pipermail/mlton/2004-August/016165.html
10474 * http://www.mlton.org/pipermail/mlton-user/2004-December/000320.html
10475
10476 <<<
10477
10478 :mlton-guide-page: Lazy
10479 [[Lazy]]
10480 Lazy
10481 ====
10482
10483 In a lazy (or non-strict) language, the arguments to a function are
10484 not evaluated before calling the function.  Instead, the arguments are
10485 suspended and only evaluated by the function if needed.
10486
10487 <:StandardML:Standard ML> is an eager (or strict) language, not a lazy
10488 language.  However, it is easy to delay evaluation of an expression in
10489 SML by creating a _thunk_, which is a nullary function.  In SML, a
10490 thunk is written `fn () => e`.  Another essential feature of laziness
10491 is _memoization_, meaning that once a suspended argument is evaluated,
10492 subsequent references look up the value.  We can express this in SML
10493 with a function that maps a thunk to a memoized thunk.
10494
10495 [source,sml]
10496 ----
10497 signature LAZY =
10498    sig
10499       val lazy: (unit -> 'a) -> unit -> 'a
10500    end
10501 ----
10502
10503 This is easy to implement in SML.
10504
10505 [source,sml]
10506 ----
10507 structure Lazy: LAZY =
10508    struct
10509       fun lazy (th: unit -> 'a): unit -> 'a =
10510          let
10511             datatype 'a lazy_result = Unevaluated of (unit -> 'a)
10512                                     | Evaluated of 'a
10513                                     | Failed of exn
10514
10515             val r = ref (Unevaluated th)
10516          in
10517             fn () =>
10518                case !r of
10519                    Unevaluated th => let
10520                                        val a  = th ()
10521                                            handle x => (r := Failed x; raise x)
10522                                        val () =         r := Evaluated a
10523                                      in
10524                                        a
10525                                      end
10526                  | Evaluated a => a
10527                  | Failed x    => raise x
10528          end
10529    end
10530 ----
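
The following usage sketch shows the memoization at work by counting
how often the suspended computation runs.  To keep the block
self-contained it restates `lazy` in a simplified form (an `option`
cache, without the `Failed` case), so it is a stand-in for the
structure above rather than a copy of it; `runs` and `answer` are
illustrative names.

[source,sml]
----
(* Simplified stand-in for Lazy.lazy: caches the result in an option
   ref and does not remember exceptions. *)
fun lazy (th: unit -> 'a): unit -> 'a =
   let
      val cache: 'a option ref = ref NONE
   in
      fn () =>
         case !cache of
            SOME a => a
          | NONE =>
               let
                  val a = th ()
               in
                  cache := SOME a
                  ; a
               end
   end

(* Count how many times the suspended computation actually runs. *)
val runs = ref 0
val answer = lazy (fn () => (runs := !runs + 1; 6 * 7))

val first = answer ()    (* forces the thunk *)
val second = answer ()   (* served from the cache *)
----

After both calls, `runs` holds 1: the thunk body was evaluated only on
the first call.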
10531
10532 <<<
10533
10534 :mlton-guide-page: Libraries
10535 [[Libraries]]
10536 Libraries
10537 =========
10538
In theory, every strictly conforming Standard ML program should run on
MLton.  However, large SML projects often use implementation-specific
features, so some "porting" is required. Here is a partial list of
software that is known to run on MLton.
10543
10544 * Utility libraries:
10545 ** <:SMLNJLibrary:> - distributed with MLton
10546 ** <:MLtonLibraryProject:> - various libraries located on the MLton subversion repository
** <!ViewGitDir(mlton,master,lib/mlton)> - the internal MLton utility library, which we hope to clean up and make more accessible someday
10548 ** http://github.com/seanmcl/sml-ext[sml-ext], a grab bag of libraries for MLton and other SML implementations (by Sean McLaughlin)
10549 ** http://tom7misc.cvs.sourceforge.net/tom7misc/sml-lib/[sml-lib], a grab bag of libraries for MLton and other SML implementations (by <:TomMurphy:>)
10550 * Scanner generators:
10551 ** <:MLLPTLibrary:> - distributed with MLton
10552 ** <:MLLex:> - distributed with MLton
** <:MLULex:>
10554 * Parser generators:
** <:MLAntlr:>
10556 ** <:MLLPTLibrary:> - distributed with MLton
10557 ** <:MLYacc:> - distributed with MLton
10558 * Concurrency: <:ConcurrentML:> - distributed with MLton
10559 * Graphics
10560 ** <:SML3d:>
10561 ** <:mGTK:>
10562 * Misc. libraries:
10563 ** <:CKitLibrary:> - distributed with MLton
10564 ** <:MLRISCLibrary:> - distributed with MLton
10565 ** <:MLNLFFI:ML-NLFFI> - distributed with MLton
10566 ** <:Swerve:>, an HTTP server
10567 ** <:fxp:>, an XML parser
10568
10569 == Ports in progress ==
10570
10571 <:Contact:> us for details on any of these.
10572
10573 * <:MLDoc:> http://people.cs.uchicago.edu/%7Ejhr/tools/ml-doc.html
10574 * <:Unicode:>
10575
10576 == More ==
10577
10578 More projects using MLton can be seen on the <:Users:> page.
10579
10580 == Software for SML implementations other than MLton ==
10581
10582 * PostgreSQL
10583 ** Moscow ML: http://www.dina.kvl.dk/%7Esestoft/mosmllib/Postgres.html
10584 ** SML/NJ NLFFI: http://smlweb.sourceforge.net/smlsql/
10585 * Web:
10586 ** ML Kit: http://www.smlserver.org[SMLserver]  (a plugin for AOLserver)
10587 ** Moscow ML: http://ellemose.dina.kvl.dk/%7Esestoft/msp/index.msp[ML Server Pages] (support for PHP-style CGI scripting)
10588 ** SML/NJ: http://smlweb.sourceforge.net/[smlweb]
10589
10590 <<<
10591
10592 :mlton-guide-page: LibrarySupport
10593 [[LibrarySupport]]
10594 LibrarySupport
10595 ==============
10596
10597 MLton supports both linking to and creating system-level libraries.
10598 While Standard ML libraries should be designed with the <:MLBasis:> system to work with other Standard ML programs,
10599 system-level library support allows MLton to create libraries for use by other programming languages.
10600 Even more importantly, system-level library support allows MLton to access libraries from other languages.
10601 This article will explain how to use libraries portably with MLton.
10602
10603 == The Basics ==
10604
10605 A Dynamic Shared Object (DSO) is a piece of executable code written in a format understood by the operating system.
10606 Executable programs and dynamic libraries are the two most common examples of a DSO.
10607 They are called shared because if they are used more than once, they are only loaded once into main memory.
10608 For example, if you start two instances of your web browser (an executable), there may be two processes running, but the program code of the executable is only loaded once.
10609 A dynamic library, for example a graphical toolkit, might be used by several different executable programs, each possibly running multiple times.
Nevertheless, the dynamic library is only loaded once and its program code is shared between all of the processes.
10611
10612 In addition to program code, DSOs contain a table of textual strings called symbols.
10613 These are used in order to make the DSO do something useful, like execute.
For example, on Linux the symbol `_start` refers to the point in the program code where the operating system should start executing the program.
10615 Dynamic libraries generally provide many symbols, corresponding to functions which can be called and variables which can be read or written.
10616 Symbols can be used by the DSO itself, or by other DSOs which require services.
10617
10618 When a DSO creates a symbol, this is called 'exporting'.
10619 If a DSO needs to use a symbol, this is called 'importing'.
10620 A DSO might need to use symbols defined within itself or perhaps from another DSO.
10621 In both cases, it is importing that symbol, but the scope of the import differs.
10622 Similarly, a DSO might export a symbol for use only within itself, or it might export a symbol for use by other DSOs.
10623 Some symbols are resolved at compile time by the linker (those used within the DSO) and some are resolved at runtime by the dynamic link loader (symbols accessed between DSOs).
10624
10625 == Symbols in MLton ==
10626
10627 Symbols in MLton are both imported and exported via the <:ForeignFunctionInterface:>.
10628 The notation `_import "symbolname"` imports functions, `_symbol "symbolname"` imports variables, and `_address "symbolname"` imports an address.
10629 To create and export a symbol, `_export "symbolname"` creates a function symbol and `_symbol "symbolname" 'alloc'` creates and exports a variable.
10630 For details of the syntax and restrictions on the supported FFI types, read the <:ForeignFunctionInterface:> page.
10631 In this discussion it only matters that every FFI use is either an import or an export.
10632
10633 When exporting a symbol, MLton supports controlling the export scope.
10634 If the symbol should only be used within the same DSO, that symbol has '`private`' scope.
10635 Conversely, if the symbol should also be available to other DSOs the symbol has '`public`' scope.
10636 Generally, one should have as few public exports as possible.
10637 Since they are public, other DSOs will come to depend on them, limiting your ability to change them.
10638 You specify the export scope in MLton by putting `private` or `public` after the symbol's name in an FFI directive.
For example: `_export "foo" private: int->int;` or `_export "bar" public: int->int;`.
10640
10641 For technical reasons, the linker and loader on various platforms need to know the scope of a symbol being imported.
10642 If the symbol is exported by the same DSO, use `public` or `private` as appropriate.
10643 If the symbol is exported by a different DSO, then the scope '`external`' should be used to import it.
10644 Within a DSO, all references to a symbol must use the same scope.
10645 MLton will check this at compile time, reporting: `symbol "foo" redeclared as public (previously external)`. This may cause linker errors.
10646 However, MLton can only check usage within Standard ML.
10647 All objects being linked into a resulting DSO must agree, and it is the programmer's responsibility to ensure this.
10648
10649 Summary of symbol scopes:
10650
10651 * `private`: used for symbols exported within a DSO only for use within that DSO
10652 * `public`: used for symbols exported within a DSO that may also be used outside that DSO
10653 * `external`: used for importing symbols from another DSO
10654 * All uses of a symbol within a DSO (both imports and exports) must agree on the symbol scope
10655
10656 == Output Formats ==
10657
10658 MLton can create executables (`-format executable`) and dynamic shared libraries (`-format library`).
10659 To link a shared library, use `-link-opt -l<dso_name>`.
10660 The default output format is executable.
10661
10662 MLton can also create archives.
10663 An archive is not a DSO, but it does have a collection of symbols.
10664 When an archive is linked into a DSO, it is completely absorbed.
10665 Other objects being compiled into the DSO should refer to the public symbols in the archive as public, since they are still in the same DSO.
10666 However, in the interest of modular programming, private symbols in an archive cannot be used outside of that archive, even within the same DSO.
10667
10668 Although both executables and libraries are DSOs, some implementation details differ on some platforms.
For this reason, MLton can create two types of archives.
10670 A normal archive (`-format archive`) is appropriate for linking into an executable.
10671 Conversely, a libarchive (`-format libarchive`) should be used if it will be linked into a dynamic library.
10672
10673 When MLton does not create an executable, it creates two special symbols.
10674 The symbol `libname_open` is a function which must be called before any other symbols are accessed.
The `libname` is controlled by the `-libname` compile option and defaults to the name of the output, with any `lib` prefix stripped (e.g., `foo` -> `foo`, `libfoo` -> `foo`).
10676 The symbol `libname_close` is a function which should be called to clean up memory once done.
10677
10678 Summary of `-format` options:
10679
10680 * `executable`: create an executable (a DSO)
10681 * `library`: create a dynamic shared library (a DSO)
10682 * `archive`: create an archive of symbols (not a DSO) that can be linked into an executable
10683 * `libarchive`: create an archive of symbols (not a DSO) that can be linked into a library
10684
10685 Related options:
10686
10687 * `-libname x`: controls the name of the special `_open` and `_close` functions.
10688
10689
10690 == Interfacing with C ==
10691
10692 MLton can generate a C header file.
10693 When the output format is not an executable, it creates one by default named `libname.h`.
10694 This can be overridden with `-export-header foo.h`.
10695 This header file should be included by any C files using the exported Standard ML symbols.
10696
10697 If C is being linked with Standard ML into the same output archive or DSO,
10698 then the C code should `#define PART_OF_LIBNAME` before it includes the header file.
10699 This ensures that the C code is using the symbols with correct scope.
10700 Any symbols exported from C should also be marked using the `PRIVATE`/`PUBLIC`/`EXTERNAL` macros defined in the Standard ML export header.
10701 The declared C scope on exported C symbols should match the import scope used in Standard ML.
10702
10703 An example:
10704 [source,c]
10705 ----
10706 #define PART_OF_FOO
10707 #include "foo.h"
10708
10709 PUBLIC int cFoo() {
10710   return smlFoo();
10711 }
10712 ----
10713
10714 [source,sml]
10715 ----
10716 val () = _export "smlFoo" private: unit -> int; (fn () => 5)
10717 val cFoo = _import "cFoo" public: unit -> int;
10718 ----
10719
10720
10721 == Operating-system specific details ==
10722
10723 On Windows, `libarchive` and `archive` are the same.
10724 However, depending on this will lead to portability problems.
10725 Windows is also especially sensitive to mixups of '`public`' and '`external`'.
If an archive is linked, make sure its symbols are imported as `public`.
If a DLL is linked, make sure its symbols are imported as `external`.
Using `external` instead of `public` will result in link errors stating that `__imp__foo is undefined`.
10729 Using `public` instead of `external` will result in inconsistent function pointer addresses and failure to update the imported variables.
10730
10731 On Linux, `libarchive` and `archive` are different.
10732 Libarchives are quite rare, but necessary if creating a library from an archive.
10733 It is common for a library to provide both an archive and a dynamic library on this platform.
10734 The linker will pick one or the other, usually preferring the dynamic library.
10735 While a quirk of the operating system allows external import to work for both archives and libraries,
10736 portable projects should not depend on this behaviour.
10737 On other systems it can matter how the library is linked (static or dynamic).
10738
10739 <<<
10740
10741 :mlton-guide-page: License
10742 [[License]]
10743 License
10744 =======
10745
10746 == Web Site ==
10747 In order to allow the maximum freedom for the future use of the
10748 content in this web site, we require that contributions to the web
site be dedicated to the public domain.  That means that you can only
add works that are already in the public domain, or works on which you
hold the copyright and which you agree to dedicate to the public
domain.
10753
10754 By contributing to this web site, you agree to dedicate your
10755 contribution to the public domain.
10756
10757 == Software ==
10758
10759 As of 20050812, MLton software is licensed under the BSD-style license
10760 below.  By contributing code to the project, you agree to release the
10761 code under this license.  Contributors can retain copyright to their
10762 contributions by asserting copyright in their code.  Contributors may
10763 also add to the list of copyright holders in
10764 `doc/license/MLton-LICENSE`, which appears below.
10765
10766 [source,text]
10767 ----
10768 sys::[./bin/InclGitFile.py mlton master doc/license/MLton-LICENSE]
10769 ----
10770
10771 <<<
10772
10773 :mlton-guide-page: LineDirective
10774 [[LineDirective]]
10775 LineDirective
10776 =============
10777
10778 To aid in the debugging of code produced by program generators such
10779 as http://www.eecs.harvard.edu/%7Enr/noweb/[Noweb], MLton supports
10780 comments with line directives of the form
10781 [source,sml]
10782 ----
10783 (*#line l.c "f"*)
10784 ----
10785 Here, _l_ and _c_ are sequences of decimal digits and _f_ is the
10786 source file.  The first character of a source file has the position
10787 1.1.  A line directive causes the front end to believe that the
10788 character following the right parenthesis is at the line and column of
10789 the specified file.  A line directive only affects the reporting of
10790 error messages and does not affect program semantics (except for
10791 functions like `MLton.Exn.history` that report source file positions).
10792 Syntactically invalid line directives are ignored.  To prevent
10793 incompatibilities with SML, the file name may not contain the
10794 character sequence `*)`.
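
As a small illustration (the file name `parser.nw` and the position
12.1 are made up for this sketch), a program generator might emit the
following.  The `v` of `val` immediately follows the right
parenthesis, so the front end should treat it as line 12, column 1 of
`parser.nw`.  A compiler without line-directive support sees only an
ordinary comment.

[source,sml]
----
(*#line 12.1 "parser.nw"*)val answer = 6 * 7
----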
10795
10796 <<<
10797
10798 :mlton-guide-page: LLVM
10799 [[LLVM]]
10800 LLVM
10801 ====
10802
10803 The http://www.llvm.org/[LLVM Project] is a collection of modular and
10804 reusable compiler and toolchain technologies.
10805
10806 MLton supports code generation via LLVM (`-codegen llvm`); see
10807 <:LLVMCodegen:>.
10808
10809 == Also see ==
10810
10811 * <:CMinusMinus:>
10812
10813 <<<
10814
10815 :mlton-guide-page: LLVMCodegen
10816 [[LLVMCodegen]]
10817 LLVMCodegen
10818 ===========
10819
10820 The <:LLVMCodegen:> is a <:Codegen:code generator> that translates the
10821 <:Machine:> <:IntermediateLanguage:> to <:LLVM:> assembly, which is
10822 further optimized and compiled to native object code by the <:LLVM:>
10823 toolchain.
10824
10825 It requires <:LLVM:> version 3.7 or greater to be installed.
10826
10827 In benchmarks performed on the <:RunningOnAMD64:AMD64> architecture,
10828 code size with this generator is usually slightly smaller than either
10829 the <:AMD64Codegen:native> or the <:CCodegen:C> code generators. Compile
10830 time is worse than <:AMD64Codegen:native>, but slightly better than
10831 <:CCodegen:C>. Run time is often better than either <:AMD64Codegen:native>
10832 or <:CCodegen:C>.
10833
10834 == Implementation ==
10835
10836 * <!ViewGitFile(mlton,master,mlton/codegen/llvm-codegen/llvm-codegen.sig)>
10837 * <!ViewGitFile(mlton,master,mlton/codegen/llvm-codegen/llvm-codegen.fun)>
10838
10839 == Details and Notes ==
10840
10841 The <:LLVMCodegen:> was initially developed by Brian Leibig (see
10842 <!Cite(Leibig13,An LLVM Back-end for MLton)>).
10843
10844 <<<
10845
10846 :mlton-guide-page: LocalFlatten
10847 [[LocalFlatten]]
10848 LocalFlatten
10849 ============
10850
10851 <:LocalFlatten:> is an optimization pass for the <:SSA:>
10852 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10853
10854 == Description ==
10855
10856 This pass flattens arguments to <:SSA:> blocks.
10857
10858 A block argument is flattened as long as it only flows to selects and
10859 there is some tuple constructed in this function that flows to it.
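
The condition above can be pictured at the source level.  The sketch
below is only an analogy (the pass works on <:SSA:> blocks, and the
function name `sumAndCount` is invented here): the loop argument `acc`
is only ever selected from, and every value flowing to it is a tuple
built inside the function, so the loop could carry two integers
instead of one tuple.  Whether MLton produces exactly this shape
depends on the earlier passes.

[source,sml]
----
fun sumAndCount (xs : int list) : int * int =
   let
      (* `acc` flows only to the selects #1 and #2, and each value
         passed for it is a tuple constructed in this function. *)
      fun loop (acc : int * int, ys) =
         case ys of
            [] => (#1 acc, #2 acc)
          | y :: ys' => loop ((#1 acc + y, #2 acc + 1), ys')
   in
      loop ((0, 0), xs)
   end
----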
10860
10861 == Implementation ==
10862
10863 * <!ViewGitFile(mlton,master,mlton/ssa/local-flatten.fun)>
10864
10865 == Details and Notes ==
10866
10867 {empty}
10868
10869 <<<
10870
10871 :mlton-guide-page: LocalRef
10872 [[LocalRef]]
10873 LocalRef
10874 ========
10875
10876 <:LocalRef:> is an optimization pass for the <:SSA:>
10877 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10878
10879 == Description ==
10880
This pass optimizes `ref` cells local to an <:SSA:> function:
10882
10883 * global `ref`-s only used in one function are moved to the function
10884
10885 * `ref`-s only created, read from, and written to (i.e., don't escape)
10886 are converted into function local variables
10887
10888 Uses <:Multi:> and <:Restore:>.
10889
10890 == Implementation ==
10891
10892 * <!ViewGitFile(mlton,master,mlton/ssa/local-ref.fun)>
10893
10894 == Details and Notes ==
10895
10896 Moving a global `ref` requires the <:Multi:> analysis, because a
10897 global `ref` can only be moved into a function that is executed at
10898 most once.
10899
10900 Conversion of non-escaping `ref`-s is structured in three phases:
10901
10902 * analysis -- a variable `r = Ref_ref x` escapes if
10903 ** `r` is used in any context besides `Ref_assign (r, _)` or `Ref_deref r`
** all uses of `r` reachable from a (direct or indirect) call to `Thread_copyCurrent` are of the same flavor (either `Ref_assign` or `Ref_deref`); this also requires the <:Multi:> analysis.
10905
10906 * transformation
10907 +
10908 --
10909 ** rewrites `r = Ref_ref x` to `r = x`
10910 ** rewrites `_ = Ref_assign (r, y)` to `r = y`
10911 ** rewrites `z = Ref_deref r` to `z = r`
10912 --
10913 +
10914 Note that the resulting program violates the SSA condition.
10915
10916 * <:Restore:> -- restore the SSA condition.
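
A source-level sketch of a candidate for the second case (the function
name `sumTo` is hypothetical): the cell `acc` is created, read, and
written, but never stored or passed elsewhere.  Assuming the helper
`go` ends up as a loop inside the same <:SSA:> function, `acc` could
become an ordinary local variable.

[source,sml]
----
fun sumTo (n : int) : int =
   let
      val acc = ref 0              (* created here *)
      fun go i =
         if i > n
            then ()
         else (acc := !acc + i     (* read and written *)
               ; go (i + 1))
   in
      go 1
      ; !acc                       (* read; the cell never escapes *)
   end
----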
10917
10918 <<<
10919
10920 :mlton-guide-page: Logo
10921 [[Logo]]
10922 Logo
10923 ====
10924
10925 ifdef::basebackend-html[]
10926 image::Logo.attachments/mlton.svg[align="center",height="128",width="128"]
10927 endif::[]
10928 ifdef::basebackend-docbook[]
10929 image::Logo.attachments/mlton-128.pdf[align="center"]
10930 endif::[]
10931
10932 == Files ==
10933
10934 * <!Attachment(Logo,mlton.svg)>
10935 * <!Attachment(Logo,mlton-1024.png)>
10936 * <!Attachment(Logo,mlton-512.png)>
10937 * <!Attachment(Logo,mlton-256.png)>
10938 * <!Attachment(Logo,mlton-128.png)>
10939 * <!Attachment(Logo,mlton-64.png)>
10940 * <!Attachment(Logo,mlton-32.png)>
10941
10942 <<<
10943
10944 :mlton-guide-page: LoopInvariant
10945 [[LoopInvariant]]
10946 LoopInvariant
10947 =============
10948
10949 <:LoopInvariant:> is an optimization pass for the <:SSA:>
10950 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
10951
10952 == Description ==
10953
10954 This pass removes loop invariant arguments to local loops.
10955
10956 ----
10957   loop (x, y)
10958     ...
10959   ...
10960     loop (x, z)
10961   ...
10962 ----
10963
10964 becomes
10965
10966 ----
10967   loop' (x, y)
10968     loop (y)
10969   loop (y)
10970     ...
10971   ...
10972     loop (z)
10973   ...
10974 ----
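
The same situation at the source level, as a minimal sketch (the
function `scale` is invented for illustration): the first argument of
`loop` is passed unchanged on every recursive call, so it plays the
role of the invariant `x` in the schematic above.

[source,sml]
----
fun scale (x : int, n : int) : int =
   let
      (* `x` is the same value on every iteration; only `i` and
         `acc` actually vary. *)
      fun loop (x, i, acc) =
         if i >= n
            then acc
         else loop (x, i + 1, acc + x)
   in
      loop (x, 0, 0)
   end
----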
10975
10976 == Implementation ==
10977
10978 * <!ViewGitFile(mlton,master,mlton/ssa/loop-invariant.fun)>
10979
10980 == Details and Notes ==
10981
10982 {empty}
10983
10984 <<<
10985
10986 :mlton-guide-page: LoopUnroll
10987 [[LoopUnroll]]
10988 LoopUnroll
10989 ==========
10990
10991 <:LoopUnroll:> is an optimization pass for the <:SSA:> <:IntermediateLanguage:>,
10992 invoked from <:SSASimplify:>.
10993
10994 == Description ==
10995
10996 A simple loop unrolling optimization.
10997
10998 == Implementation ==
10999
11000 * <!ViewGitFile(mlton,master,mlton/ssa/loop-unroll.fun)>
11001
11002 == Details and Notes ==
11003
11004 {empty}
11005
11006 <<<
11007
11008 :mlton-guide-page: LoopUnswitch
11009 [[LoopUnswitch]]
11010 LoopUnswitch
11011 ============
11012
11013 <:LoopUnswitch:> is an optimization pass for the <:SSA:>
11014 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
11015
11016 == Description ==
11017
11018 A simple loop unswitching optimization.
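
For readers unfamiliar with the term, the sketch below shows loop
unswitching in its general textbook form, written by hand at the
source level.  It is not taken from MLton's implementation, the
function names are invented, and this page does not say which loops
the pass actually handles.  A test on a loop-invariant value is
hoisted out of the loop, leaving one specialized loop per branch.

[source,sml]
----
(* Before: `verbose` never changes, yet is tested on every iteration. *)
fun sumBefore (verbose : bool, xs : int list) : int =
   let
      fun loop (ys, acc) =
         case ys of
            [] => acc
          | y :: ys' =>
               (if verbose then print (Int.toString y ^ "\n") else ()
                ; loop (ys', acc + y))
   in
      loop (xs, 0)
   end

(* After: test once, then run the loop specialized to that outcome. *)
fun sumAfter (verbose : bool, xs : int list) : int =
   let
      fun loopVerbose (ys, acc) =
         case ys of
            [] => acc
          | y :: ys' =>
               (print (Int.toString y ^ "\n")
                ; loopVerbose (ys', acc + y))
      fun loopQuiet (ys, acc) =
         case ys of
            [] => acc
          | y :: ys' => loopQuiet (ys', acc + y)
   in
      if verbose then loopVerbose (xs, 0) else loopQuiet (xs, 0)
   end
----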
11019
11020 == Implementation ==
11021
11022 * <!ViewGitFile(mlton,master,mlton/ssa/loop-unswitch.fun)>
11023
11024 == Details and Notes ==
11025
11026 {empty}
11027
11028 <<<
11029
11030 :mlton-guide-page: Machine
11031 [[Machine]]
11032 Machine
11033 =======
11034
11035 <:Machine:> is an <:IntermediateLanguage:>, translated from <:RSSA:>
11036 by <:ToMachine:> and used as input by the <:Codegen:>.
11037
11038 == Description ==
11039
11040 <:Machine:> is an <:Untyped:> <:IntermediateLanguage:>, corresponding
to an abstract register machine.
11042
11043 == Implementation ==
11044
11045 * <!ViewGitFile(mlton,master,mlton/backend/machine.sig)>
11046 * <!ViewGitFile(mlton,master,mlton/backend/machine.fun)>
11047
11048 == Type Checking ==
11049
11050 The <:Machine:> <:IntermediateLanguage:> has a primitive type checker
11051 (<!ViewGitFile(mlton,master,mlton/backend/machine.sig)>,
11052 <!ViewGitFile(mlton,master,mlton/backend/machine.fun)>), which only checks
11053 some liveness properties.
11054
11055 == Details and Notes ==
11056
11057 The runtime structure sets some constants according to the
11058 configuration files on the target architecture and OS.
11059
11060 <<<
11061
11062 :mlton-guide-page: ManualPage
11063 [[ManualPage]]
11064 ManualPage
11065 ==========
11066
11067 MLton is run from the command line with a collection of options
11068 followed by a file name and a list of files to compile, assemble, and
11069 link with.
11070
11071 ----
11072 mlton [option ...] file.{c|mlb|o|sml} [file.{c|o|s|S} ...]
11073 ----
11074
11075 The simplest case is to run `mlton foo.sml`, where `foo.sml` contains
11076 a valid SML program, in which case MLton compiles the program to
11077 produce an executable `foo`.  Since MLton does not support separate
11078 compilation, the program must be the entire program you wish to
11079 compile.  However, the program may refer to signatures and structures
11080 defined in the <:BasisLibrary:Basis Library>.
11081
11082 Larger programs, spanning many files, can be compiled with the
11083 <:MLBasis:ML Basis system>.  In this case, `mlton foo.mlb` will
11084 compile the complete SML program described by the basis `foo.mlb`,
11085 which may specify both SML files and additional bases.
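
As an illustration, a `foo.mlb` for a two-file program might look like
the sketch below.  The names `util.sml` and `main.sml` are
placeholders; the first line brings in the
<:BasisLibrary:Basis Library>.  Running `mlton foo.mlb` on such a file
would compile the listed sources, in the order given, into an
executable `foo`.

----
$(SML_LIB)/basis/basis.mlb
util.sml
main.sml
----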
11086
11087 == Next Steps ==
11088
11089 * <:CompileTimeOptions:>
11090 * <:RunTimeOptions:>
11091
11092 <<<
11093
11094 :mlton-guide-page: MatchCompilation
11095 [[MatchCompilation]]
11096 MatchCompilation
11097 ================
11098
11099 Match compilation is the process of translating an SML match into a
nested tree (or DAG) of simple case expressions and tests.
11101
11102 MLton's match compiler is described <:MatchCompile:here>.
11103
11104 == Match compilation in other compilers ==
11105
11106 * <!Cite(BaudinetMacQueen85)>
11107 * <!Cite(Leroy90)>, pages 60-69.
11108 * <!Cite(Sestoft96)>
11109 * <!Cite(ScottRamsey00)>
11110
11111 <<<
11112
11113 :mlton-guide-page: MatchCompile
11114 [[MatchCompile]]
11115 MatchCompile
11116 ============
11117
11118 <:MatchCompile:> is a translation pass, agnostic in the
11119 <:IntermediateLanguage:>s between which it translates.
11120
11121 == Description ==
11122
11123 <:MatchCompilation:Match compilation> converts a case expression with
11124 nested patterns into a case expression with flat patterns.
11125
11126 == Implementation ==
11127
11128 * <!ViewGitFile(mlton,master,mlton/match-compile/match-compile.sig)>
11129 * <!ViewGitFile(mlton,master,mlton/match-compile/match-compile.fun)>
11130
11131 == Details and Notes ==
11132
11133 [source,sml]
11134 ----
11135 val matchCompile:
11136    {caseType: Type.t, (* type of entire expression *)
11137     cases: (NestedPat.t * ((Var.t -> Var.t) -> Exp.t)) vector,
11138     conTycon: Con.t -> Tycon.t,
11139     region: Region.t,
11140     test: Var.t,
11141     testType: Type.t,
11142     tyconCons: Tycon.t -> {con: Con.t, hasArg: bool} vector}
11143    -> Exp.t * (unit -> ((Layout.t * {isOnlyExns: bool}) vector) vector)
11144 ----
11145
11146 `matchCompile` is complicated by the desire for modularity between the
11147 match compiler and its caller.  Its caller is responsible for building
11148 the right hand side of a rule `p => e`.  On the other hand, the match
11149 compiler is responsible for destructing the test and binding new
11150 variables to the components.  In order to connect the new variables
11151 created by the match compiler with the variables in the pattern `p`,
11152 the match compiler passes an environment back to its caller that maps
11153 each variable in `p` to the corresponding variable introduced by the
11154 match compiler.
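
The shape of that hand-off, `(Var.t -> Var.t) -> Exp.t` in the
signature above, can be sketched with toy types.  Everything here is a
stand-in: `var` and `exp` are plain strings rather than MLton's
`Var.t` and `Exp.t`, and the names `rhs` and `env` are invented.

[source,sml]
----
(* Toy stand-ins for Var.t and Exp.t; not MLton's real types. *)
type var = string
type exp = string

(* Caller side: the right hand side of a rule `(a, b) => a + b`,
   built only once it is told how each pattern variable was renamed. *)
fun rhs (rename : var -> var) : exp =
   rename "a" ^ " + " ^ rename "b"

(* Match-compiler side: the environment mapping each pattern variable
   to the fresh variable bound while destructing the test. *)
fun env (v : var) : var =
   case v of
      "a" => "x1"
    | "b" => "x2"
    | _ => v

(* The match compiler applies the caller's builder to its environment. *)
val body : exp = rhs env
----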
11155
11156 The match compiler builds a tree of n-way case expressions by working
11157 from outside to inside and left to right in the patterns.  For example,
11158 [source,sml]
11159 ----
11160 case x of
11161   (_, C1 a) => e1
11162 | (C2 b, C3 c) => e2
11163 ----
11164 is translated to
11165 [source,sml]
11166 ----
11167 let
11168    fun f1 a = e1
11169    fun f2 (b, c) = e2
11170 in
11171   case x of
11172      (x1, x2) =>
11173        (case x1 of
11174           C2 b' => (case x2 of
11175                       C1 a' => f1 a'
11176                     | C3 c' => f2(b',c')
11177                     | _ => raise Match)
11178         | _ => (case x2 of
11179                   C1 a_ => f1 a_
11180                 | _ => raise Match))
11181 end
11182 ----
11183
Here you can see the necessity of abstracting out the right hand sides
11185 of the cases in order to avoid code duplication.  Right hand sides are
11186 always abstracted.  The simplifier cleans things up.  You can also see
11187 the new (primed) variables introduced by the match compiler and how
11188 the renaming works.  Finally, you can see how the match compiler
11189 introduces the necessary default clauses in order to make a match
exhaustive, i.e., cover all the cases.
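
To experiment with the translation, the schematic example can be made
concrete.  The datatypes and the choices `e1 = a` and `e2 = b + c`
below are assumptions added for this sketch; the second function is a
hand transcription of the translated form, not actual compiler output.

[source,sml]
----
datatype t1 = C2 of int | D
datatype t2 = C1 of int | C3 of int | E

(* The source match, with the default rule written out explicitly. *)
fun direct (x : t1 * t2) : int =
   case x of
      (_, C1 a) => a
    | (C2 b, C3 c) => b + c
    | _ => raise Match

(* Hand translation: abstracted right hand sides plus nested,
   flat case expressions with default clauses. *)
fun translated (x : t1 * t2) : int =
   let
      fun f1 a = a
      fun f2 (b, c) = b + c
   in
      case x of
         (x1, x2) =>
            (case x1 of
                C2 b' => (case x2 of
                             C1 a' => f1 a'
                           | C3 c' => f2 (b', c')
                           | _ => raise Match)
              | _ => (case x2 of
                         C1 a_ => f1 a_
                       | _ => raise Match))
   end
----

Both functions should agree on every input, including the inputs on
which they raise `Match`.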
11191
The match compiler uses `conTycon` and `tyconCons` to determine
11193 the exhaustivity of matches against constructors.
11194
11195 <<<
11196
11197 :mlton-guide-page: MatthewFluet
11198 [[MatthewFluet]]
11199 MatthewFluet
11200 ============
11201
11202 Matthew Fluet (
11203 mailto:matthew.fluet@gmail.com[matthew.fluet@gmail.com]
11204 ,
11205 http://www.cs.rit.edu/%7Emtf
11206 )
11207 is an Assistant Professor at the http://www.rit.edu[Rochester Institute of Technology].
11208
11209 ''''
11210
11211 Current MLton projects:
11212
11213 * general maintenance
11214 * release new version
11215
11216 ''''
11217
11218 Misc. and underspecified TODOs:
11219
11220 * understand <:RefFlatten:> and <:DeepFlatten:>
11221 ** http://www.mlton.org/pipermail/mlton/2005-April/026990.html
11222 ** http://www.mlton.org/pipermail/mlton/2007-November/030056.html
11223 ** http://www.mlton.org/pipermail/mlton/2008-April/030250.html
11224 ** http://www.mlton.org/pipermail/mlton/2008-July/030279.html
11225 ** http://www.mlton.org/pipermail/mlton/2008-August/030312.html
11226 ** http://www.mlton.org/pipermail/mlton/2008-September/030360.html
11227 ** http://www.mlton.org/pipermail/mlton-user/2009-June/001542.html
11228 * `MSG_DONTWAIT` isn't Posix
11229 * coordinate w/ Dan Spoonhower and Lukasz Ziarek and Armand Navabi on multi-threaded
11230 ** http://www.mlton.org/pipermail/mlton/2008-March/030214.html
11231 * Intel Research bug: `no tyconRep property` (company won't release sample code)
11232 ** http://www.mlton.org/pipermail/mlton-user/2008-March/001358.html
11233 * treatment of real constants
11234 ** http://www.mlton.org/pipermail/mlton/2008-May/030262.html
11235 ** http://www.mlton.org/pipermail/mlton/2008-June/030271.html
11236 * representation of `bool` and `_bool` in <:ForeignFunctionInterface:>
11237 ** http://www.mlton.org/pipermail/mlton/2008-May/030264.html
11238 * http://www.icfpcontest.org
11239 ** John Reppy claims that "It looks like the card-marking overhead that one incurs when using generational collection swamps the benefits of generational collection."
11240 * page to disk policy / single heap
11241 ** http://www.mlton.org/pipermail/mlton/2008-June/030278.html
11242 ** http://www.mlton.org/pipermail/mlton/2008-August/030318.html
11243 * `MLton.GC.pack` doesn't keep a small heap if a garbage collection occurs before `MLton.GC.unpack`.
11244 ** It might be preferable for `MLton.GC.pack` to be implemented as a (new) `MLton.GC.Ratios.setLive 1.1` followed by `MLton.GC.collect ()` and for `MLton.GC.unpack` to be implemented as `MLton.GC.Ratios.setLive 8.0` followed by `MLton.GC.collect ()`.
11245 * The `static struct GC_objectType objectTypes[] =` array includes many duplicates.  Objects of distinct source type, but equivalent representations (in terms of size, bytes non-pointers, number pointers) can share the objectType index.
11246 * PolySpace bug: <:Redundant:> optimization (company won't release sample code)
11247 ** http://www.mlton.org/pipermail/mlton/2008-September/030355.html
11248 * treatment of exception raised during <:BasisLibrary:> evaluation
11249 ** http://www.mlton.org/pipermail/mlton/2008-December/030501.html
11250 ** http://www.mlton.org/pipermail/mlton/2008-December/030502.html
11251 ** http://www.mlton.org/pipermail/mlton/2008-December/030503.html
11252 * Use `memcpy`
11253 ** http://www.mlton.org/pipermail/mlton-user/2009-January/001506.html
11254 ** http://www.mlton.org/pipermail/mlton/2009-January/030506.html
11255 * Implement more 64bit primops in x86 codegen
11256 ** http://www.mlton.org/pipermail/mlton/2009-January/030507.html
11257 * Enrich path-map file syntax:
11258 ** http://www.mlton.org/pipermail/mlton/2008-September/030348.html
11259 ** http://www.mlton.org/pipermail/mlton-user/2009-January/001507.html
11260 * PolySpace bug: crash during Cheney-copy collection
11261 ** http://www.mlton.org/pipermail/mlton/2009-February/030513.html
11262 * eliminate `-build-constants`
11263 ** all `_const`-s are known by `runtime/gen/basis-ffi.def`
11264 ** generate `gen-constants.c` from `basis-ffi.def`
11265 ** generate `constants` from `gen-constants.c` and `libmlton.a`
11266 ** similar to `gen-sizes.c` and `sizes`
11267 * eliminate "Windows hacks" for Cygwin from `Path` module
11268 ** http://www.mlton.org/pipermail/mlton/2009-July/030606.html
11269 * extend IL type checkers to check for empty property lists
11270 * make (unsafe) `IntInf` conversions into primitives
11271 ** http://www.mlton.org/pipermail/mlton/2009-July/030622.html
11272
11273 <<<
11274
11275 :mlton-guide-page: mGTK
11276 [[mGTK]]
11277 mGTK
11278 ====
11279
11280 http://mgtk.sourceforge.net/[mGTK] is a wrapper for
11281 http://www.gtk.org/[GTK+], a GUI toolkit.
11282
11283 We recommend using mGTK 0.93, which is not listed on their home page,
11284 but is available at the
11285 http://sourceforge.net/project/showfiles.php?group_id=23226&package_id=16523[file
11286 release page].  To test it, after unpacking, do `cd examples; make
11287 mlton`, after which you should be able to run the many examples
11288 (`signup-mlton`, `listview-mlton`, ...).
11289
11290 == Also see ==
11291
11292 * <:Glade:>
11293
11294 <<<
11295
11296 :mlton-guide-page: MichaelNorrish
11297 [[MichaelNorrish]]
11298 MichaelNorrish
11299 ==============
11300
11301 I am a researcher at http://nicta.com.au[NICTA], with a web-page http://web.rsise.anu.edu.au/%7Emichaeln/[here].
11302
11303 I'm interested in MLton because of the chance that it might be a good vehicle for future implementations of the http://hol.sf.net[HOL] theorem-proving system. It's beginning to look as if one route forward will be to embed an SML interpreter into a MLton-compiled executable.  I don't know if an extensible interpreter of the kind we're looking for already exists.
11304
11305 <<<
11306
11307 :mlton-guide-page: MikeThomas
11308 [[MikeThomas]]
11309 MikeThomas
11310 ==========
11311
11312 Here is a picture at home in Brisbane, Queensland, Australia, taken in January 2004.
11313
11314 image::MikeThomas.attachments/picture.jpg[align="center"]
11315
11316 <<<
11317
11318 :mlton-guide-page: ML
11319 [[ML]]
11320 ML
11321 ==
11322
11323 ML stands for _meta language_.  ML was originally designed in the
11324 1970s as a programming language to assist theorem proving in the logic
11325 LCF.  In the 1980s, ML split into two variants,
11326 <:StandardML:Standard ML> and <:OCaml:>, both of which are still used
11327 today.
11328
11329 <<<
11330
11331 :mlton-guide-page: MLAntlr
11332 [[MLAntlr]]
11333 MLAntlr
11334 =======
11335
11336 http://smlnj-gforge.cs.uchicago.edu/projects/ml-lpt/[MLAntlr] is a
11337 parser generator for <:StandardML:Standard ML>.
11338
11339 == Also see ==
11340
11341 * <:MLULex:>
11342 * <:MLLPTLibrary:>
11343
11344 <<<
11345
11346 :mlton-guide-page: MLBasis
11347 [[MLBasis]]
11348 MLBasis
11349 =======
11350
11351 The ML Basis system extends <:StandardML:Standard ML> to support
11352 programming-in-the-very-large, namespace management at the module
11353 level, separate delivery of library sources, and more.  While Standard
11354 ML modules are a sophisticated language for programming-in-the-large,
11355 it is difficult, if not impossible, to accomplish a number of routine
11356 namespace management operations when a program draws upon multiple
11357 libraries provided by different vendors.
11358
11359 The ML Basis system is a simple, yet powerful, approach that builds
11360 upon the programmer's intuitive notion (and
11361 <:DefinitionOfStandardML: The Definition of Standard ML (Revised)>'s
11362 formal notion) of the top-level environment (a _basis_).  The system
11363 is designed as a natural extension of <:StandardML: Standard ML>; the
11364 formal specification of the ML Basis system
11365 (<!Attachment(MLBasis,mlb-formal.pdf)>) is given in the style
11366 of the Definition.
11367
11368 Here are some of the key features of the ML Basis system:
11369
11370 1. Explicit file order: The order of files (and, hence, the order of
11371 evaluation) in the program is explicit.  The ML Basis system's
11372 semantics are structured in such a way that for any well-formed
11373 project, there will be exactly one possible interpretation of the
11374 project's syntax, static semantics, and dynamic semantics.
11375
11376 2. Implicit dependencies: A source file (corresponding to an SML
11377 top-level declaration) is elaborated in the environment described by
11378 preceding declarations.  It is not necessary to explicitly list the
11379 dependencies of a file.
11380
11381 3. Scoping and renaming: The ML Basis system provides mechanisms for
limiting the scope of (i.e., hiding) and renaming identifiers.
11383
11384 4. No naming convention for finding the file that defines a module.
11385 To import a module, its defining file must appear in some ML Basis
11386 file.
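
A small sketch of how these features look in practice.  The file names
and the structures `Util` and `MainImpl` are hypothetical; assume
`util.sml` defines `Util` and `main-impl.sml` defines `MainImpl`.
Files are elaborated in the order listed, `main-impl.sml` sees the
declarations of everything before it without naming them, and only
`Main` (a renaming of `MainImpl`) is visible outside the `local`.

----
local
  $(SML_LIB)/basis/basis.mlb
  util.sml
  main-impl.sml
in
  structure Main = MainImpl
end
----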
11387
11388 == Next steps ==
11389
11390 * <:MLBasisSyntaxAndSemantics:>
11391 * <:MLBasisExamples:>
11392 * <:MLBasisPathMap:>
11393 * <:MLBasisAnnotations:>
11394 * <:MLBasisAvailableLibraries:>
11395
11396 <<<
11397
11398 :mlton-guide-page: MLBasisAnnotationExamples
11399 [[MLBasisAnnotationExamples]]
11400 MLBasisAnnotationExamples
11401 =========================
11402
11403 Here are some example uses of <:MLBasisAnnotations:>.
11404
11405 == Eliminate spurious warnings in automatically generated code ==
11406
11407 Programs that automatically generate source code can often produce
11408 nonexhaustive patterns, relying on invariants of the generated code to
11409 ensure that the pattern matchings never fail.  A programmer may wish
11410 to elide the nonexhaustive warnings from this code, in order that
11411 legitimate warnings are not missed in a flurry of false positives.  To
11412 do so, the programmer simply annotates the generated code with the
11413 `nonexhaustiveBind ignore` and `nonexhaustiveMatch ignore`
11414 annotations:
11415
11416 ----
11417 local
11418   $(GEN_ROOT)/gen-lib.mlb
11419
11420   ann
11421     "nonexhaustiveBind ignore"
11422     "nonexhaustiveMatch ignore"
11423   in
11424     foo.gen.sml
11425   end
11426 in
11427   signature FOO
11428   structure Foo
11429 end
11430 ----
11431
11432
11433 == Deliver a library ==
11434
11435 Standard ML libraries can be delivered via `.mlb` files.  Authors of
11436 such libraries should strive to be mindful of the ways in which
11437 programmers may choose to compile their programs.  For example,
11438 although the defaults for `sequenceNonUnit` and `warnUnused` are
11439 `ignore` and `false`, periodically compiling with these annotations
11440 defaulted to `warn` and `true` can help uncover likely bugs.  However,
11441 a programmer is unlikely to be interested in unused modules from an
11442 imported library, and the behavior of `sequenceNonUnit error` may be
11443 incompatible with some libraries.  Hence, a library author may choose
11444 to deliver a library as follows:
11445
11446 ----
11447 ann
11448   "nonexhaustiveBind warn" "nonexhaustiveMatch warn"
11449   "redundantBind warn" "redundantMatch warn"
11450   "sequenceNonUnit warn"
11451   "warnUnused true" "forceUsed"
11452 in
11453   local
11454     file1.sml
11455     ...
11456     filen.sml
11457   in
11458     functor F1
11459     ...
11460     signature S1
11461     ...
11462     structure SN
11463     ...
11464   end
11465 end
11466 ----
11467
11468 The annotations `nonexhaustiveBind warn`, `redundantBind warn`,
11469 `nonexhaustiveMatch warn`, `redundantMatch warn`, and `sequenceNonUnit
11470 warn` have the obvious effect on elaboration.  The annotations
11471 `warnUnused true` and `forceUsed` work in conjunction -- warning on
11472 any identifiers that do not contribute to the exported modules, and
11473 preventing warnings on exported modules that are not used in the
11474 remainder of the program.  Many of the
11475 <:MLBasisAvailableLibraries:available libraries> are delivered with
11476 these annotations.
11477
11478 <<<
11479
11480 :mlton-guide-page: MLBasisAnnotations
11481 [[MLBasisAnnotations]]
11482 MLBasisAnnotations
11483 ==================
11484
11485 <:MLBasis:ML Basis> annotations control options that affect the
11486 elaboration of SML source files.  Conceptually, a basis file is
11487 elaborated in a default annotation environment (just as it is
11488 elaborated in an empty basis).  The declaration
11489 ++ann++{nbsp}++"++__ann__++"++{nbsp}++in++{nbsp}__basdec__{nbsp}++end++
11490 merges the annotation _ann_ with the "current" annotation environment
11491 for the elaboration of _basdec_.  To allow for future expansion,
11492 ++"++__ann__++"++ is lexed as a single SML string constant.  To
11493 conveniently specify multiple annotations, the following derived form
11494 is provided:
11495
11496 ****
11497 +ann+ ++"++__ann__++"++ (++"++__ann__++"++ )^\+^ +in+ _basdec_ +end+
11498 =>
11499 +ann+ ++"++__ann__++"++ +in+ +ann+ (++"++__ann__++"++)^\+^ +in+ _basdec_ +end+ +end+
11500 ****
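
For example (with `foo.sml` as a placeholder file name), the two
basis declarations below should be equivalent under this derived
form.  The short form

----
ann "warnUnused true" "sequenceNonUnit warn" in
  foo.sml
end
----

stands for the nested form

----
ann "warnUnused true" in
  ann "sequenceNonUnit warn" in
    foo.sml
  end
end
----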
11501
11502 Here are the available annotations.  In the explanation below, for
11503 annotations that take an argument, the first value listed is the
11504 default.
11505
11506 * +allowFFI {false|true}+
11507 +
11508 If `true`, allow `_address`, `_export`, `_import`, and `_symbol`
11509 expressions to appear in source files.  See
11510 <:ForeignFunctionInterface:>.
11511
11512 * +allowSuccessorML {false|true}+
11513 +
11514 --
11515 Allow or disallow all of the <:SuccessorML:> features.  This is a
11516 proxy for all of the following annotations.
11517
11518 ** +allowDoDecls {false|true}+
11519 +
11520 If `true`, allow a +do _exp_+ declaration form.
11521
11522 ** +allowExtendedConsts {false|true}+
11523 +
11524 --
11525 Allow or disallow all of the extended constants features.  This is a
11526 proxy for all of the following annotations.
11527
11528 *** +allowExtendedNumConsts {false|true}+
11529 +
11530 If `true`, allow extended numeric constants.
11531
11532 *** +allowExtendedTextConsts {false|true}+
11533 +
11534 If `true`, allow extended text constants.
11535 --
11536
11537 ** +allowLineComments {false|true}+
11538 +
11539 If `true`, allow line comments beginning with the token ++(*)++.
11540
11541 ** +allowOptBar {false|true}+
11542 +
11543 If `true`, allow a bar to appear before the first match rule of a
11544 `case`, `fn`, or `handle` expression, allow a bar to appear before the
11545 first function-value binding of a `fun` declaration, and allow a bar
11546 to appear before the first constructor binding or description of a
11547 `datatype` declaration or specification.
11548
11549 ** +allowOptSemicolon {false|true}+
11550 +
If `true`, allow a semicolon to appear after the last expression in a
11552 sequence expression or `let` body.
11553
11554 ** +allowOrPats {false|true}+
11555 +
If `true`, allow disjunctive (a.k.a., "or") patterns of the form
11557 +_pat_ | _pat_+.
11558
11559 ** +allowRecordPunExps {false|true}+
11560 +
If `true`, allow record punning expressions.
11562
11563 ** +allowSigWithtype {false|true}+
11564 +
If `true`, allow `withtype` to modify a `datatype` specification in a
11566 signature.
11567
11568 ** +allowVectorExpsAndPats {false|true}+
11569 +
11570 --
11571 Allow or disallow vector expressions and vector patterns.  This is a
11572 proxy for all of the following annotations.
11573
11574 *** +allowVectorExps {false|true}+
11575 +
11576 If `true`, allow vector expressions.
11577
11578 *** +allowVectorPats {false|true}+
11579 +
11580 If `true`, allow vector patterns.
11581 --
11582 --
11583
11584 * +forceUsed+
11585 +
11586 Force all identifiers in the basis denoted by the body of the `ann` to
11587 be considered used; use in conjunction with `warnUnused true`.
11588
11589 * +nonexhaustiveBind {warn|error|ignore}+
11590 +
11591 If `error` or `warn`, report nonexhaustive patterns in `val`
11592 declarations (i.e., pattern-match failures that raise the `Bind`
11593 exception).  An error will abort a compile, while a warning will not.
11594
11595 * +nonexhaustiveExnBind {default|ignore}+
11596 +
11597 If `ignore`, suppress errors and warnings about nonexhaustive matches
11598 in `val` declarations that arise solely from unmatched exceptions.
11599 If `default`, follow the behavior of `nonexhaustiveBind`.
11600
11601 * +nonexhaustiveExnMatch {default|ignore}+
11602 +
11603 If `ignore`, suppress errors and warnings about nonexhaustive matches
11604 in `fn` expressions, `case` expressions, and `fun` declarations that
11605 arise solely from unmatched exceptions.  If `default`, follow the
11606 behavior of `nonexhaustiveMatch`.
11607
11608 * +nonexhaustiveExnRaise {ignore|default}+
11609 +
11610 If `ignore`, suppress errors and warnings about nonexhaustive matches
11611 in `handle` expressions that arise solely from unmatched exceptions.
11612 If `default`, follow the behavior of `nonexhaustiveRaise`.
11613
11614 * +nonexhaustiveMatch {warn|error|ignore}+
11615 +
11616 If `error` or `warn`, report nonexhaustive patterns in `fn`
11617 expressions, `case` expressions, and `fun` declarations (i.e.,
11618 pattern-match failures that raise the `Match` exception).  An error
11619 will abort a compile, while a warning will not.
11620
11621 * +nonexhaustiveRaise {ignore|warn|error}+
11622 +
11623 If `error` or `warn`, report nonexhaustive patterns in `handle`
11624 expressions (i.e., pattern-match failures that implicitly (re)raise
11625 the unmatched exception).  An error will abort a compile, while a
11626 warning will not.
11627
11628 * +redundantBind {warn|error|ignore}+
11629 +
11630 If `error` or `warn`, report redundant patterns in `val` declarations.
11631 An error will abort a compile, while a warning will not.
11632
11633 * +redundantMatch {warn|error|ignore}+
11634 +
11635 If `error` or `warn`, report redundant patterns in `fn` expressions,
11636 `case` expressions, and `fun` declarations.  An error will abort a
11637 compile, while a warning will not.
11638
11639 * +redundantRaise {warn|error|ignore}+
11640 +
11641 If `error` or `warn`, report redundant patterns in `handle`
11642 expressions.  An error will abort a compile, while a warning will not.
11643
11644 * +resolveScope {strdec|dec|topdec|program}+
11645 +
11646 Used to control the scope at which overload constraints are resolved
11647 to default types (if not otherwise resolved by type inference) and the
11648 scope at which unresolved flexible record constraints are reported.
11649 +
11650 The syntactic-class argument means to perform resolution checks at the
11651 smallest enclosing syntactic form of the given class.  The default
11652 behavior is to resolve at the smallest enclosing _strdec_ (which is
11653 equivalent to the largest enclosing _dec_).  Other useful behaviors
11654 are to resolve at the smallest enclosing _topdec_ (which is equivalent
11655 to the largest enclosing _strdec_) and at the smallest enclosing
11656 _program_ (which corresponds to a single `.sml` file and does not
11657 correspond to the whole `.mlb` program).
11658
11659 * +sequenceNonUnit {ignore|error|warn}+
11660 +
11661 If `error` or `warn`, report when `e1` is not of type `unit` in the
11662 sequence expression `(e1; e2)`.  This can be helpful in detecting
11663 curried applications that are mistakenly not fully applied.  To
11664 silence spurious messages, you can use `ignore e1`.
11665
11666 * +valrecConstr {warn|error|ignore}+
11667 +
11668 If `error` or `warn`, report when a `val rec` (or `fun`) declaration
11669 redefines an identifier that previously had constructor status.  An
11670 error will abort a compile, while a warning will not.
11671
11672 * +warnUnused {false|true}+
11673 +
11674 Report unused identifiers.
11675
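As an illustration of `sequenceNonUnit`, the following sketch assumes a
hypothetical `main.mlb` that enables the warning for a hypothetical
`main.sml`; the file names and the function `add` are made up for this
example.  The partial application `add 1` has type `int -> int`, so the
first sequence expression would be reported, while the second one uses
`ignore` to silence the report.

`main.mlb`
----
$(SML_LIB)/basis/basis.mlb
ann "sequenceNonUnit warn" in
  main.sml
end
----

`main.sml`
[source,sml]
----
fun add (x: int) (y: int) : int = x + y

(* Reported: add 1 has type int -> int, not unit. *)
val () = (add 1; print "first\n")

(* Not reported: ignore discards the value and returns unit. *)
val () = (ignore (add 1); print "second\n")
----
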
11676 == Next Steps ==
11677
11678  * <:MLBasisAnnotationExamples:>
11679  * <:WarnUnusedAnomalies:>
11680
11681 <<<
11682
11683 :mlton-guide-page: MLBasisAvailableLibraries
11684 [[MLBasisAvailableLibraries]]
11685 MLBasisAvailableLibraries
11686 =========================
11687
11688 MLton comes with the following <:MLBasis:ML Basis> files available.
11689
11690 * `$(SML_LIB)/basis/basis.mlb`
11691 +
11692 The <:BasisLibrary:Basis Library>.
11693
11694 * `$(SML_LIB)/basis/basis-1997.mlb`
11695 +
11696 The (deprecated) 1997 version of the <:BasisLibrary:Basis Library>.
11697
11698 * `$(SML_LIB)/basis/mlton.mlb`
11699 +
11700 The <:MLtonStructure:MLton> structure and signatures.
11701
11702 * `$(SML_LIB)/basis/c-types.mlb`
11703 +
11704 Various structure aliases useful as <:ForeignFunctionInterfaceTypes:>.
11705
11706 * `$(SML_LIB)/basis/unsafe.mlb`
11707 +
11708 The <:UnsafeStructure:Unsafe> structure and signature.
11709
11710 * `$(SML_LIB)/basis/sml-nj.mlb`
11711 +
11712 The <:SMLofNJStructure:SMLofNJ> structure and signature.
11713
11714 * `$(SML_LIB)/mlyacc-lib/mlyacc-lib.mlb`
11715 +
11716 Modules used by parsers built with <:MLYacc:>.
11717
11718 * `$(SML_LIB)/cml/cml.mlb`
11719 +
11720 <:ConcurrentML:>, a library for message-passing concurrency.
11721
11722 * `$(SML_LIB)/mlnlffi-lib/mlnlffi-lib.mlb`
11723 +
11724 <:MLNLFFI:ML-NLFFI>, a library for foreign function interfaces.
11725
11726 * `$(SML_LIB)/mlrisc-lib/...`
11727 +
11728 <:MLRISCLibrary:>, a library for retargetable and optimizing compiler back ends.
11729
11730 * `$(SML_LIB)/smlnj-lib/...`
11731 +
11732 <:SMLNJLibrary:>, a collection of libraries distributed with SML/NJ.
11733
11734 * `$(SML_LIB)/ckit-lib/ckit-lib.mlb`
11735 +
11736 <:CKitLibrary:>, a library for C source code.
11737
11738 * `$(SML_LIB)/mllpt-lib/mllpt-lib.mlb`
11739 +
11740 <:MLLPTLibrary:>, a support library for the <:MLULex:> scanner generator and the <:MLAntlr:> parser generator.
11741
11742
11743 == Basis fragments ==
11744
11745 There are a number of specialized ML Basis files for importing
fragments of the <:BasisLibrary: Basis Library> that cannot be
11747 expressed within SML.
11748
11749 * `$(SML_LIB)/basis/pervasive-types.mlb`
11750 +
11751 The top-level types and constructors of the Basis Library.
11752
11753 * `$(SML_LIB)/basis/pervasive-exns.mlb`
11754 +
11755 The top-level exception constructors of the Basis Library.
11756
11757 * `$(SML_LIB)/basis/pervasive-vals.mlb`
11758 +
11759 The top-level values of the Basis Library, without infix status.
11760
11761 * `$(SML_LIB)/basis/overloads.mlb`
11762 +
11763 The top-level overloaded values of the Basis Library, without infix status.
11764
11765 * `$(SML_LIB)/basis/equal.mlb`
11766 +
11767 The polymorphic equality `=` and inequality `<>` values, without infix status.
11768
11769 * `$(SML_LIB)/basis/infixes.mlb`
11770 +
11771 The infix declarations of the Basis Library.
11772
11773 * `$(SML_LIB)/basis/pervasive.mlb`
11774 +
11775 The entire top-level value and type environment of the Basis Library, with infix status.  This is the same as importing the above six MLB files.
11776
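As a sketch of how the fragments might be combined, suppose a
hypothetical source file `prelude.sml` should see the top-level types,
exception constructors, and infix declarations, but none of the Basis
Library structures or top-level values.  The file name is an assumption
made for this example.

----
local
  $(SML_LIB)/basis/pervasive-types.mlb
  $(SML_LIB)/basis/pervasive-exns.mlb
  $(SML_LIB)/basis/infixes.mlb
in
  prelude.sml
end
----
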
11777 <<<
11778
11779 :mlton-guide-page: MLBasisExamples
11780 [[MLBasisExamples]]
11781 MLBasisExamples
11782 ===============
11783
11784 Here are some example uses of <:MLBasis:ML Basis> files.
11785
11786
11787 == Complete program ==
11788
11789 Suppose your complete program consists of the files `file1.sml`, ...,
11790 `filen.sml`, which depend upon libraries `lib1.mlb`, ..., `libm.mlb`.
11791
11792 ----
11793 (* import libraries *)
11794 lib1.mlb
11795 ...
11796 libm.mlb
11797
11798 (* program files *)
11799 file1.sml
11800 ...
11801 filen.sml
11802 ----
11803
11804 The bases denoted by `lib1.mlb`, ..., `libm.mlb` are merged (bindings
11805 of names in later bases take precedence over bindings of the same name
11806 in earlier bases), producing a basis in which `file1.sml`, ...,
11807 `filen.sml` are elaborated, adding additional bindings to the basis.
11808
11809
11810 == Export filter ==
11811
11812 Suppose you only want to export certain structures, signatures, and
11813 functors from a collection of files.
11814
11815 ----
11816 local
11817   file1.sml
11818   ...
11819   filen.sml
11820 in
11821   (* export filter here *)
11822   functor F
11823   structure S
11824 end
11825 ----
11826
11827 While `file1.sml`, ..., `filen.sml` may declare top-level identifiers
11828 in addition to `F` and `S`, such names are not accessible to programs
11829 and libraries that import this `.mlb`.
11830
11831
11832 == Export filter with renaming ==
11833
11834 Suppose you want an export filter, but want to rename one of the
11835 modules.
11836
11837 ----
11838 local
11839   file1.sml
11840   ...
11841   filen.sml
11842 in
11843   (* export filter, with renaming, here *)
11844   functor F
11845   structure S' = S
11846 end
11847 ----
11848
11849 Note that `functor F` is an abbreviation for `functor F = F`, which
11850 simply exports an identifier under the same name.
11851
11852
11853 == Import filter ==
11854
11855 Suppose you only want to import a functor `F` from one library and a
11856 structure `S` from another library.
11857
11858 ----
11859 local
11860   lib1.mlb
11861 in
11862   (* import filter here *)
11863   functor F
11864 end
11865 local
11866   lib2.mlb
11867 in
11868   (* import filter here *)
11869   structure S
11870 end
11871 file1.sml
11872 ...
11873 filen.sml
11874 ----
11875
11876
11877 == Import filter with renaming ==
11878
11879 Suppose you want to import a structure `S` from one library and
11880 another structure `S` from another library.
11881
11882 ----
11883 local
11884   lib1.mlb
11885 in
11886   (* import filter, with renaming, here *)
11887   structure S1 = S
11888 end
11889 local
11890   lib2.mlb
11891 in
11892   (* import filter, with renaming, here *)
11893   structure S2 = S
11894 end
11895 file1.sml
11896 ...
11897 filen.sml
11898 ----
11899
11900
11901 == Full Basis ==
11902
11903 Since the Modules level of SML is the natural means for organizing
11904 program and library components, MLB files provide convenient syntax
11905 for renaming Modules level identifiers (in fact, renaming of functor
11906 identifiers provides a mechanism that is not available in SML).
11907 However, please note that `.mlb` files elaborate to full bases
11908 including top-level types and values (including infix status), in
11909 addition to structures, signatures, and functors.  For example,
11910 suppose you wished to extend the <:BasisLibrary:Basis Library> with an
11911 `('a, 'b) either` datatype corresponding to a disjoint sum; the type
11912 and some operations should be available at the top-level;
11913 additionally, a signature and structure provide the complete
11914 interface.
11915
11916 We could use the following files.
11917
11918 `either-sigs.sml`
11919 [source,sml]
11920 ----
11921 signature EITHER_GLOBAL =
11922   sig
11923     datatype ('a, 'b) either = Left of 'a | Right of 'b
11924     val &  : ('a -> 'c) * ('b -> 'c) -> ('a, 'b) either -> 'c
11925     val && : ('a -> 'c) * ('b -> 'd) -> ('a, 'b) either -> ('c, 'd) either
11926   end
11927
11928 signature EITHER =
11929   sig
11930     include EITHER_GLOBAL
11931     val isLeft  : ('a, 'b) either -> bool
11932     val isRight : ('a, 'b) either -> bool
11933     ...
11934   end
11935 ----
11936
11937 `either-strs.sml`
11938 [source,sml]
11939 ----
11940 structure Either : EITHER =
11941   struct
11942     datatype ('a, 'b) either = Left of 'a | Right of 'b
11943     fun f & g = fn x =>
11944       case x of Left z => f z | Right z => g z
11945     fun f && g = (Left o f) & (Right o g)
11946     fun isLeft x = ((fn _ => true) & (fn _ => false)) x
11947     fun isRight x = (not o isLeft) x
11948     ...
11949   end
11950 structure EitherGlobal : EITHER_GLOBAL = Either
11951 ----
11952
11953 `either-infixes.sml`
11954 [source,sml]
11955 ----
11956 infixr 3 & &&
11957 ----
11958
11959 `either-open.sml`
11960 [source,sml]
11961 ----
11962 open EitherGlobal
11963 ----
11964
11965 `either.mlb`
11966 ----
11967 either-infixes.sml
11968 local
11969   (* import Basis Library *)
11970   $(SML_LIB)/basis/basis.mlb
11971   either-sigs.sml
11972   either-strs.sml
11973 in
11974   signature EITHER
11975   structure Either
11976   either-open.sml
11977 end
11978 ----
11979
11980 A client that imports `either.mlb` will have access to neither
11981 `EITHER_GLOBAL` nor `EitherGlobal`, but will have access to the type
11982 `either` and the values `&` and `&&` (with infix status) in the
11983 top-level environment.  Note that `either-infixes.sml` is outside the
11984 scope of the local, because we want the infixes available in the
11985 implementation of the library and to clients of the library.
11986
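A client of the library above might look like the following sketch.
The file names `client.mlb` and `client.sml` and the value `describe`
are hypothetical, and the sketch assumes that the elided parts (`...`)
of the signature and structure above have been filled in or removed so
that the library compiles.

`client.mlb`
----
$(SML_LIB)/basis/basis.mlb
either.mlb
client.sml
----

`client.sml`
[source,sml]
----
(* Left, Right, and the infix operator & are visible at the top level
   because either.mlb exports them. *)
val describe : (int, string) either -> string =
   (fn n => Int.toString n) & (fn s => s)

val () = print (describe (Left 42) ^ "\n")
val () = print (describe (Right "hello") ^ "\n")

(* The structure is visible under its exported name as well. *)
val leftFirst : bool = Either.isLeft (Left 42 : (int, string) either)
----
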
11987 <<<
11988
11989 :mlton-guide-page: MLBasisPathMap
11990 [[MLBasisPathMap]]
11991 MLBasisPathMap
11992 ==============
11993
11994 An <:MLBasis:ML Basis> _path map_ describes a map from ML Basis path
11995 variables (of the form `$(VAR)`) to file system paths.  ML Basis path
11996 variables provide a flexible way to refer to libraries while allowing
11997 them to be moved without changing their clients.
11998
The format of an `mlb-path-map` file is a sequence of lines; each line
consists of two whitespace-delimited tokens.  The first token is a
12001 path variable `VAR` and the second token is the path to which the
12002 variable is mapped.  The path may include path variables, which are
12003 recursively expanded.
12004
12005 The mapping from path variables to paths is initialized by the compiler.
12006 Additional path maps can be specified with `-mlb-path-map` and
12007 individual path variable mappings can be specified with
12008 `-mlb-path-var` (see <:CompileTimeOptions:>).  Configuration files are
processed from first to last and from top to bottom; later mappings
12010 take precedence over earlier mappings.
12011
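A small sketch of a path map file, here called `my-path-map`, is shown
below.  The file name, the variable names, and the directories are
hypothetical.  The second mapping refers to the first, which is
expanded recursively.

----
MY_LIBS   /home/user/sml-libs
JSON_LIB  $(MY_LIBS)/json
----

Such a file could be passed to the compiler with `-mlb-path-map`, and
a further individual mapping supplied with `-mlb-path-var`; the
program name `program.mlb` is again an assumption.

----
mlton -mlb-path-map my-path-map -mlb-path-var 'XML_LIB /opt/xml' program.mlb
----
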
The compiler and system-wide configuration file make the following
12013 path variables available.
12014
12015 [options="header",cols="^25%,<75%"]
12016 |====
12017 |MLB path variable|Description
12018 |`SML_LIB`|path to system-wide libraries, usually `/usr/lib/mlton/sml`
12019 |`TARGET_ARCH`|string representation of target architecture
12020 |`TARGET_OS`|string representation of target operating system
12021 |`DEFAULT_INT`|binding for default int, usually `int32`
12022 |`DEFAULT_WORD`|binding for default word, usually `word32`
12023 |`DEFAULT_REAL`|binding for default real, usually `real64`
12024 |====
12025
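Path variables can appear anywhere in a path within an MLB file.  The
following sketch selects a platform-specific source file; it assumes
hypothetical files such as `platform/linux.sml` exist, one for each
string to which `TARGET_OS` may expand on the targets of interest.

----
$(SML_LIB)/basis/basis.mlb
platform/$(TARGET_OS).sml
main.sml
----
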
12026 <<<
12027
12028 :mlton-guide-page: MLBasisSyntaxAndSemantics
12029 [[MLBasisSyntaxAndSemantics]]
12030 MLBasisSyntaxAndSemantics
12031 =========================
12032
12033 An <:MLBasis:ML Basis> (MLB) file should have the `.mlb` suffix and
12034 should contain a basis declaration.
12035
12036 == Syntax ==
12037
12038 A basis declaration (_basdec_) must be one of the following forms.
12039
12040 * +basis+ _basid_ +=+ _basexp_ (+and+ _basid_ +=+ _basexp_)^*^
12041 * +open+ _basid~1~_ ... _basid~n~_
12042 * +local+ _basdec_ +in+ _basdec_ +end+
12043 * _basdec_ [+;+] _basdec_
* +structure+ _strid_ [+=+ _strid_]  (+and+ _strid_ [+=+ _strid_])^*^
12045 * +signature+ _sigid_ [+=+ _sigid_]  (+and+ _sigid_ [+=+ _sigid_])^*^
12046 * +functor+ _funid_ [+=+ _funid_]  (+and+ _funid_ [+=+ _funid_])^*^
12047 * __path__++.sml++, __path__++.sig++, or __path__++.fun++
12048 * __path__++.mlb++
12049 * +ann+ ++"++_ann_++"++ +in+ _basdec_ +end+
12050
A basis expression (_basexp_) must be one of the following forms.
12052
12053 * +bas+ _basdec_ +end+
12054 * _basid_
12055 * +let+ _basdec_ +in+ _basexp_ +end+
12056
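The following sketch combines several of the forms above; the basis
identifiers `Util` and `App` and the file names are hypothetical.
`Util` names the basis produced by the Basis Library and `util.sml`.
`App` is built by a `let` expression: `app.sml` is elaborated with
`Util` opened, but, as we read the grammar, only the bindings made by
`app.sml` become part of `App`.  The final `open` makes those bindings
part of the basis denoted by the MLB file.

----
basis Util =
  bas
    $(SML_LIB)/basis/basis.mlb
    util.sml
  end

basis App =
  let
    open Util
  in
    bas
      app.sml
    end
  end

open App
----
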
12057 Nested SML-style comments (enclosed with `(*` and `*)`) are ignored
12058 (but <:LineDirective:>s are recognized).
12059
12060 Paths can be relative or absolute.  Relative paths are relative to the
12061 directory containing the MLB file.  Paths may include path variables
12062 and are expanded according to a <:MLBasisPathMap:path map>.  Unquoted
12063 paths may include alpha-numeric characters and the symbols "`-`" and
12064 "`_`", along with the arc separator "`/`" and extension separator
12065 "`.`".  More complicated paths, including paths with spaces, may be
12066 included by quoting the path with `"`.  A quoted path is lexed as an
12067 SML string constant.
12068
12069 <:MLBasisAnnotations:Annotations> allow a library author to
12070 control options that affect the elaboration of SML source files.
12071
12072 == Semantics ==
12073
12074 There is a <!Attachment(MLBasis,mlb-formal.pdf,formal semantics)> for
12075 ML Basis files in the style of the
12076 <:DefinitionOfStandardML:Definition>.  Here, we give an informal
12077 explanation.
12078
12079 An SML structure is a collection of types, values, and other
12080 structures.  Similarly, a basis is a collection, but of more kinds of
12081 objects: types, values, structures, fixities, signatures, functors,
12082 and other bases.
12083
12084 A basis declaration denotes a basis.  A structure, signature, or
12085 functor declaration denotes a basis containing the corresponding
12086 module.  Sequencing of basis declarations merges bases, with later
12087 definitions taking precedence over earlier ones, just like sequencing
12088 of SML declarations.  Local declarations provide name hiding, just
12089 like SML local declarations.  A reference to an SML source file causes
12090 the file to be elaborated in the basis extant at the point of
12091 reference.  A reference to an MLB file causes the basis denoted by
12092 that MLB file to be imported -- the basis at the point of reference
12093 does _not_ affect the imported basis.
12094
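The difference between referring to an SML file and referring to an
MLB file can be seen in the following sketch, in which all file names
are hypothetical.  Because `util.mlb` is imported independently of the
basis at the point of reference, it has to import the Basis Library
itself, even though `main.mlb` has already done so.  In contrast,
`main.sml` is elaborated in the basis extant at its point of reference
and therefore sees both the Basis Library and the bindings of
`util.sml`.

`util.mlb`
----
local
  $(SML_LIB)/basis/basis.mlb
in
  util.sml
end
----

`main.mlb`
----
$(SML_LIB)/basis/basis.mlb
util.mlb
main.sml
----
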
12095 Basis expressions and basis identifiers allow binding a basis to a
12096 name.
12097
12098 An MLB file is elaborated starting in an empty basis.  Each MLB file
12099 is elaborated and evaluated only once, with the result being cached.
12100 Subsequent references use the cached value.  Thus, any observable
12101 effects due to evaluation are not duplicated if the MLB file is
12102 referred to multiple times.
12103
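A minimal sketch of the caching behavior, using hypothetical files
`log.sml`, `log.mlb`, and `main.mlb`: although `main.mlb` refers to
`log.mlb` twice, the description above implies that the message is
printed only once when the resulting program runs.

`log.sml`
[source,sml]
----
(* An observable effect performed when this file is evaluated. *)
val () = print "log loaded\n"
----

`log.mlb`
----
local
  $(SML_LIB)/basis/basis.mlb
in
  log.sml
end
----

`main.mlb`
----
log.mlb
log.mlb
----
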
12104 <<<
12105
12106 :mlton-guide-page: MLj
12107 [[MLj]]
12108 MLj
12109 ===
12110
12111 http://www.dcs.ed.ac.uk/home/mlj/[MLj] is a
12112 <:StandardMLImplementations:Standard ML implementation> that targets
12113 Java bytecode.  It is no longer maintained.  It has morphed into
12114 <:SMLNET:SML.NET>.
12115
12116 == Also see ==
12117
12118 * <!Cite(BentonEtAl98)>
12119 * <!Cite(BentonKennedy99)>
12120
12121 <<<
12122
12123 :mlton-guide-page: MLKit
12124 [[MLKit]]
12125 MLKit
12126 =====
12127
12128 The http://sourceforge.net/apps/mediawiki/mlkit[ML Kit] is a
12129 <:StandardMLImplementations:Standard ML implementation>.
12130
12131 MLKit supports:
12132
12133 * <:DefinitionOfStandardML:SML'97>
12134 ** including most of the latest <:BasisLibrary:Basis Library>
12135 http://www.standardml.org/Basis[specification],
12136 * <:MLBasis:ML Basis> files
12137 ** and separate compilation,
12138 * <:Regions:Region-Based Memory Management>
12139 ** and <:GarbageCollection:garbage collection>,
12140 * Multiple backends, including
12141 ** native x86,
12142 ** bytecode, and
12143 ** JavaScript (see http://www.itu.dk/people/mael/smltojs/[SMLtoJs]).
12144
12145 At the time of writing, MLKit does not support:
12146
12147 * concurrent programming / threads,
12148 * calling from C to SML.
12149
12150 <<<
12151
12152 :mlton-guide-page: MLLex
12153 [[MLLex]]
12154 MLLex
12155 =====
12156
12157 <:MLLex:> is a lexical analyzer generator for <:StandardML:Standard ML>
12158 modeled after the Lex lexical analyzer generator.
12159
12160 A version of MLLex, ported from the <:SMLNJ:SML/NJ> sources, is
12161 distributed with MLton.
12162
12163 == Description ==
12164
12165 MLLex takes as input the lex language as defined in the ML-Lex manual,
12166 and outputs a lexical analyzer in SML.
12167
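As a rough sketch of what such an input looks like, the hypothetical
specification `calc.lex` below recognizes unsigned integers and `+`.
The token datatype, the structure name `CalcLex`, and the file name
are assumptions made for this example; the ML-Lex manual remains the
authority on the specification language.

----
(* user declarations: lexresult and eof must be defined here *)
datatype lexresult = NUM of int | PLUS | EOF
fun eof () = EOF
%%
%structure CalcLex
digit = [0-9];
ws = [\ \t\n];
%%
{ws}+    => (lex());
{digit}+ => (NUM (valOf (Int.fromString yytext)));
"+"      => (PLUS);
----

Running `mllex calc.lex` is expected to write the generated lexer to
`calc.lex.sml`.  Assuming the generated structure has the usual ML-Lex
shape (a `UserDeclarations` substructure and a `makeLexer` function),
it could be driven from a string as follows.

[source,sml]
----
(* Collect all tokens of a string, ending with EOF. *)
fun tokens (s: string) : CalcLex.UserDeclarations.lexresult list =
   let
      val consumed = ref false
      (* The reader returns "" once the input has been handed over. *)
      fun read (_: int) = if !consumed then "" else (consumed := true; s)
      val next = CalcLex.makeLexer read
      fun loop acc =
         case next () of
            CalcLex.UserDeclarations.EOF =>
               List.rev (CalcLex.UserDeclarations.EOF :: acc)
          | tok => loop (tok :: acc)
   in
      loop []
   end
----
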
12168 == Implementation ==
12169
12170 * <!ViewGitFile(mlton,master,mllex/lexgen.sml)>
12171 * <!ViewGitFile(mlton,master,mllex/main.sml)>
12172 * <!ViewGitFile(mlton,master,mllex/call-main.sml)>
12173
12174 == Details and Notes ==
12175
12176 There are 3 main passes in the MLLex tool:
12177
* Source parsing. In this pass, lex source programs are parsed into internal representations. The core part of this pass is a hand-written lexer and an LL(1) parser. The output of this pass is a record of user code, rules (along with start states) and actions. (MLLex definitions are discarded.)
* DFA construction. In this pass, a DFA is constructed by the algorithm of H. Yamada et al.
12180 * Output. In this pass, the generated DFA is written out as a transition table, along with a table-driven algorithm, to an SML file.
12181
12182 == Also see ==
12183
12184 * <!Attachment(Documentation,mllex.pdf)>
12185 * <:MLYacc:>
12186 * <!Cite(AppelEtAl94)>
12187 * <!Cite(Price09)>
12188
12189 <<<
12190
12191 :mlton-guide-page: MLLPTLibrary
12192 [[MLLPTLibrary]]
12193 MLLPTLibrary
12194 ============
12195
12196 The
12197 http://smlnj-gforge.cs.uchicago.edu/projects/ml-lpt/[ML-LPT Library]
12198 is a support library for the <:MLULex:> scanner generator and the
12199 <:MLAntlr:> parser generator.  The ML-LPT Library is distributed with
12200 SML/NJ.
12201
12202 As of 20180119, MLton includes the ML-LPT Library synchronized with
12203 SML/NJ version 110.82.
12204
12205 == Usage ==
12206
12207 * You can import the ML-LPT Library into an MLB file with:
12208 +
12209 [options="header"]
12210 |=====
12211 |MLB file|Description
12212 |`$(SML_LIB)/mllpt-lib/mllpt-lib.mlb`|
12213 |=====
12214
12215 * If you are porting a project from SML/NJ's <:CompilationManager:> to
12216 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
12217 following map is included by default:
12218 +
12219 ----
12220 # MLLPT Library
12221 $ml-lpt-lib.cm                          $(SML_LIB)/mllpt-lib
12222 $ml-lpt-lib.cm/ml-lpt-lib.cm            $(SML_LIB)/mllpt-lib/mllpt-lib.mlb
12223 ----
12224 +
This will automatically convert a `$/ml-lpt-lib.cm` import in an input
12226 `.cm` file into a `$(SML_LIB)/mllpt-lib/mllpt-lib.mlb` import in the
12227 output `.mlb` file.
12228
12229 == Details ==
12230
12231 {empty}
12232
12233 == Patch ==
12234
12235 * <!ViewGitFile(mlton,master,lib/mllpt-lib/ml-lpt.patch)>
12236
12237 <<<
12238
12239 :mlton-guide-page: MLmon
12240 [[MLmon]]
12241 MLmon
12242 =====
12243
12244 An `mlmon.out` file records dynamic <:Profiling:profiling> counts.
12245
12246 == File format ==
12247
12248 An `mlmon.out` file is a text file with a sequence of lines.
12249
12250 * The string "`MLton prof`".
12251
12252 * The string "`alloc`", "`count`", or "`time`", depending on the kind
12253 of profiling information, corresponding to the command-line argument
12254 supplied to `mlton -profile`.
12255
12256 * The string "`current`" or "`stack`" depending on whether profiling
12257 data was gathered for only the current function (the top of the stack)
12258 or for all functions on the stack.  This corresponds to whether the
12259 executable was compiled with `-profile-stack false` or `-profile-stack
12260 true`.
12261
12262 * The magic number of the executable.
12263
* The number of non-GC ticks, followed by a space, then the number of
12265 GC ticks.
12266
12267 * The number of (split) functions for which data is recorded.
12268
12269 * A line for each (split) function with counts.  Each line contains an
12270 integer count of the number of ticks while the function was current.
12271 In addition, if stack data was gathered (`-profile-stack true`), then
12272 the line contains two additional tick counts:
12273
12274 ** the number of ticks while the function was on the stack.
12275 ** the number of ticks while the function was on the stack and a GC
12276    was performed.
12277
12278 * The number of (master) functions for which data is recorded.
12279
12280 * A line for each (master) function with counts.  The lines have the
12281 same format and meaning as with split-function counts.
12282
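The header lines described above can be decoded with a few lines of
SML.  The following is only a sketch: the names `kind`, `style`,
`readLines`, and `parseHeader` are made up for this example and are
not part of MLton, and the tick counts are kept as strings so that no
assumption is made about their magnitude.

[source,sml]
----
datatype kind = Alloc | Count | Time
datatype style = Current | Stack

fun parseKind "alloc" = SOME Alloc
  | parseKind "count" = SOME Count
  | parseKind "time" = SOME Time
  | parseKind _ = NONE

fun parseStyle "current" = SOME Current
  | parseStyle "stack" = SOME Stack
  | parseStyle _ = NONE

(* Read a file as a list of lines without their trailing newlines. *)
fun readLines (path: string) : string list =
   let
      val ins = TextIO.openIn path
      fun loop acc =
         case TextIO.inputLine ins of
            NONE => List.rev acc
          | SOME line =>
               loop (String.substring (line, 0, String.size line - 1) :: acc)
      val lines = loop []
   in
      TextIO.closeIn ins; lines
   end

(* Decode the first five lines; return the header and the remaining lines,
   or NONE if the lines do not have the expected shape. *)
fun parseHeader (first :: k :: s :: magic :: ticks :: rest) =
      if first <> "MLton prof" then NONE
      else
         (case (parseKind k, parseStyle s, String.tokens Char.isSpace ticks) of
             (SOME kind, SOME style, [nonGc, gc]) =>
                SOME ({kind = kind, style = style, magic = magic,
                       nonGcTicks = nonGc, gcTicks = gc}, rest)
           | _ => NONE)
  | parseHeader _ = NONE

(* Example use: parseHeader (readLines "mlmon.out") *)
----
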
12283 <<<
12284
12285 :mlton-guide-page: MLNLFFI
12286 [[MLNLFFI]]
12287 MLNLFFI
12288 =======
12289
12290 <!Cite(Blume01, ML-NLFFI)> is the no-longer-foreign-function interface
12291 library for SML.
12292
12293 As of 20050212, MLton has an initial port of ML-NLFFI from SML/NJ to
12294 MLton.  All of the ML-NLFFI functionality is present.
12295
12296 Additionally, MLton has an initial port of the
12297 <:MLNLFFIGen:mlnlffigen> tool from SML/NJ to MLton.  Due to low-level
12298 details, the code generated by SML/NJ's `ml-nlffigen` is not
12299 compatible with MLton, and vice-versa.  However, the generated code
12300 has the same interface, so portable client code can be written.
12301 MLton's `mlnlffigen` does not currently support C functions with
12302 `struct` or `union` arguments.
12303
12304 == Usage ==
12305
* You can import the ML-NLFFI Library into an MLB file with:
12307 +
12308 [options="header"]
12309 |=====
12310 |MLB file|Description
12311 |`$(SML_LIB)/mlnlffi-lib/mlnlffi-lib.mlb`|
12312 |=====
12313
12314 * If you are porting a project from SML/NJ's <:CompilationManager:> to
12315 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
12316 following maps are included by default:
12317 +
12318 ----
12319 # MLNLFFI Library
12320 $c                                      $(SML_LIB)/mlnlffi-lib
12321 $c/c.cm                                 $(SML_LIB)/mlnlffi-lib/mlnlffi-lib.mlb
12322 ----
12323 +
12324 This will automatically convert a `$/c.cm` import in an input `.cm`
12325 file into a `$(SML_LIB)/mlnlffi-lib/mlnlffi-lib.mlb` import in the
12326 output `.mlb` file.
12327
12328 == Also see ==
12329
12330 * <!Cite(Blume01)>
12331 * <:MLNLFFIImplementation:>
12332 * <:MLNLFFIGen:>
12333
12334 <<<
12335
12336 :mlton-guide-page: MLNLFFIGen
12337 [[MLNLFFIGen]]
12338 MLNLFFIGen
12339 ==========
12340
12341 `mlnlffigen` generates a <:MLNLFFI:> binding from a collection of `.c`
12342 files. It is based on the <:CKitLibrary:>, which is primarily designed
12343 to handle standardized C and thus does not understand many (any?)
12344 compiler extensions; however, it attempts to recover from errors when
12345 seeing unrecognized definitions.
12346
12347 In order to work around common gcc extensions, it may be useful to add
12348 `-cppopt` options to the command line; for example
12349 `-cppopt '-D__extension__'` may be occasionally useful. Fortunately,
12350 most portable libraries largely avoid the use of these types of
12351 extensions in header files.
12352
12353 `mlnlffigen` will normally not generate bindings for `#included`
12354 files; see `-match` and `-allSU` if this is desirable.
12355
12356 <<<
12357
12358 :mlton-guide-page: MLNLFFIImplementation
12359 [[MLNLFFIImplementation]]
12360 MLNLFFIImplementation
12361 =====================
12362
12363 MLton's implementation(s) of the <:MLNLFFI:> library differs from the
12364 SML/NJ implementation in two important ways:
12365
12366 * MLton cannot utilize the `Unsafe.cast` "cheat" described in Section
12367 3.7 of <!Cite(Blume01)>.  (MLton's representation of
12368 <:Closure:closures> and
12369 <:PackedRepresentation:aggressive representation> optimizations make
12370 an `Unsafe.cast` even more "unsafe" than in other implementations.)
12371 +
12372 --
12373 We have considered two solutions:
12374
12375 ** One solution is to utilize an additional type parameter (as
12376 described in Section 3.7 of <!Cite(Blume01)>):
12377 +
12378 --
12379 __________
12380 [source,sml]
12381 ----
12382 signature C = sig
12383     type ('t, 'f, 'c) obj
12384     eqtype ('t, 'f, 'c) obj'
12385     ...
12386     type ('o, 'f) ptr
12387     eqtype ('o, 'f) ptr'
12388     ...
12389     type 'f fptr
    type 'f fptr'
12391     ...
12392     structure T : sig
12393         type ('t, 'f) typ
12394         ...
12395     end
12396 end
12397 ----
12398
The rule for `('t, 'f, 'c) obj`, `('t, 'f, 'c) ptr`, and also `('t, 'f)
12400 T.typ` is that whenever `F fptr` occurs within the instantiation of
12401 `'t`, then `'f` must be instantiated to `F`.  In all other cases, `'f`
12402 will be instantiated to `unit`.
12403 __________
12404
12405 (In the actual MLton implementation, an abstract type `naf`
12406 (not-a-function) is used instead of `unit`.)
12407
12408 While this means that type-annotated programs may not type-check under
12409 both the SML/NJ implementation and the MLton implementation, this
12410 should not be a problem in practice.  Tools, like `ml-nlffigen`, which
12411 are necessarily implementation dependent (in order to make
12412 <:CallingFromSMLToCFunctionPointer:calls through a C function
12413 pointer>), may be easily extended to emit the additional type
12414 parameter.  Client code which uses such generated glue-code (e.g.,
12415 Section 1 of <!Cite(Blume01)>) need rarely write type-annotations,
12416 thanks to the magic of type inference.
12417 --
12418
12419 ** The above implementation suffers from two disadvantages.
12420 +
12421 --
12422 First, it changes the MLNLFFI Library interface, meaning that the same
12423 program may not type-check under both the SML/NJ implementation and
12424 the MLton implementation (though, in light of type inference and the
12425 richer `MLRep` structure provided by MLton, this point is mostly
12426 moot).
12427
12428 Second, it appears to unnecessarily duplicate type information.  For
12429 example, an external C variable of type `int (* f[3])(int)` (that is,
12430 an array of three function pointers), would be represented by the SML
12431 type `(((sint -> sint) fptr, dec dg3) arr, sint -> sint, rw) obj`.
12432 One might well ask why the `'f` instantiation (`sint -> sint` in this
12433 case) cannot be _extracted_ from the `'t` instantiation
12434 (`((sint -> sint) fptr, dec dg3) arr` in this case), obviating the
12435 need for a separate _function-type_ type argument.  There are a number
of components to a complete answer to this question.  Foremost is the
12437 fact that <:StandardML: Standard ML> supports neither (general)
12438 type-level functions nor intensional polymorphism.
12439
A more direct answer for MLNLFFI is that in the SML/NJ implementation,
the definitions of the types `('t, 'c) obj` and `('t, 'c) ptr` are made
12442 in such a way that the type variables `'t` and `'c` are <:PhantomType:
12443 phantom> (not contributing to the run-time representation of an
12444 `('t, 'c) obj` or `('t, 'c) ptr` value), despite the fact that the
12445 types `((sint -> sint) fptr, rw) ptr` and
12446 `((double -> double) fptr, rw) ptr` necessarily carry distinct (and
12447 type incompatible) run-time (C-)type information (RTTI), corresponding
12448 to the different calling conventions of the two C functions.  The
12449 `Unsafe.cast` "cheat" overcomes the type incompatibility without
12450 introducing a new type variable (as in the first solution above).
12451
Hence, the reason that the _function-type_ type cannot be extracted from
12453 the `'t` type variable instantiation is that the type of the
12454 representation of RTTI doesn't even _see_ the (phantom) `'t` type
12455 variable.  The solution which presents itself is to give up on the
12456 phantomness of the `'t` type variable, making it available to the
12457 representation of RTTI.
12458
12459 This is not without some small drawbacks.  Because many of the types
12460 used to instantiate `'t` carry more structure than is strictly
12461 necessary for `'t`'s RTTI, it is sometimes necessary to wrap and
12462 unwrap RTTI to accommodate the additional structure.  (In the other
12463 implementations, the corresponding operations can pass along the RTTI
12464 unchanged.)  However, these coercions contribute minuscule overhead;
12465 in fact, in a majority of cases, MLton's optimizations will completely
12466 eliminate the RTTI from the final program.
12467 --
12468
12469 The implementation distributed with MLton uses the second solution.
12470
12471 Bonus question: Why can't one use a <:UniversalType: universal type>
12472 to eliminate the use of `Unsafe.cast`?
12473
12474 ** Answer: ???
12475 --
12476
12477 * MLton (in both of the above implementations) provides a richer
12478 `MLRep` structure, utilizing ++Int__<N>__++ and ++Word__<N>__++
12479 structures.
12480 +
12481 --
12482 [source,sml]
----
structure MLRep = struct
    structure Char =
       struct
          structure Signed = Int8
          structure Unsigned = Word8
          (* word-style bit-operations on integers... *)
          structure SignedBitops = IntBitOps(structure I = Signed
                                             structure W = Unsigned)
       end
    structure Short =
       struct
          structure Signed = Int16
          structure Unsigned = Word16
          (* word-style bit-operations on integers... *)
          structure SignedBitops = IntBitOps(structure I = Signed
                                             structure W = Unsigned)
       end
    structure Int =
       struct
          structure Signed = Int32
          structure Unsigned = Word32
          (* word-style bit-operations on integers... *)
          structure SignedBitops = IntBitOps(structure I = Signed
                                             structure W = Unsigned)
       end
    structure Long =
       struct
          structure Signed = Int32
          structure Unsigned = Word32
          (* word-style bit-operations on integers... *)
          structure SignedBitops = IntBitOps(structure I = Signed
                                             structure W = Unsigned)
       end
    structure LongLong =
       struct
          structure Signed = Int64
          structure Unsigned = Word64
          (* word-style bit-operations on integers... *)
          structure SignedBitops = IntBitOps(structure I = Signed
                                             structure W = Unsigned)
       end
    structure Float = Real32
    structure Double = Real64
end
12528 ----
12529
12530 This would appear to be a better interface, even when an
12531 implementation must choose `Int32` and `Word32` as the representation
12532 for smaller C-types.
12533 --
12534
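The notion of a phantom type variable used in the discussion above can
be illustrated in isolation.  The following sketch is not the MLNLFFI
implementation; the structure `Ptr` and its operations are
hypothetical.  The type variable `'t` appears in the type `'t ptr` but
not in its representation, so values at different instantiations share
one run-time representation while remaining distinct to the type
checker.

[source,sml]
----
structure Ptr :> sig
   type 't ptr                      (* 't is phantom *)
   val fromWord : word -> 't ptr    (* unchecked, for illustration only *)
   val toWord : 't ptr -> word
end = struct
   type 't ptr = word               (* 't does not occur on the right *)
   fun fromWord w = w
   fun toWord p = p
end

val p : int Ptr.ptr = Ptr.fromWord 0wx10
val q : real Ptr.ptr = Ptr.fromWord 0wx10

(* The representations coincide, although p and q have different types
   and cannot be used interchangeably. *)
val sameRep : bool = (Ptr.toWord p = Ptr.toWord q)
----
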
12535 <<<
12536
12537 :mlton-guide-page: MLRISCLibrary
12538 [[MLRISCLibrary]]
12539 MLRISCLibrary
12540 =============
12541
12542 The http://www.cs.nyu.edu/leunga/www/MLRISC/Doc/html/index.html[MLRISC
12543 Library] is a framework for retargetable and optimizing compiler back
12544 ends.  The MLRISC Library is distributed with SML/NJ.  Due to
12545 differences between SML/NJ and MLton, this library will not work
out of the box with MLton.
12547
12548 As of 20180119, MLton includes a port of the MLRISC Library
12549 synchronized with SML/NJ version 110.82.
12550
12551 == Usage ==
12552
12553 * You can import a sub-library of the MLRISC Library into an MLB file with:
12554 +
12555 [options="header"]
12556 |====
12557 |MLB file|Description
12558 |`$(SML_LIB)/mlrisc-lib/mlb/ALPHA.mlb`|The ALPHA backend
12559 |`$(SML_LIB)/mlrisc-lib/mlb/AMD64.mlb`|The AMD64 backend
12560 |`$(SML_LIB)/mlrisc-lib/mlb/AMD64-Peephole.mlb`|The AMD64 peephole optimizer
12561 |`$(SML_LIB)/mlrisc-lib/mlb/CCall.mlb`|
12562 |`$(SML_LIB)/mlrisc-lib/mlb/CCall-sparc.mlb`|
12563 |`$(SML_LIB)/mlrisc-lib/mlb/CCall-x86-64.mlb`|
12564 |`$(SML_LIB)/mlrisc-lib/mlb/CCall-x86.mlb`|
12565 |`$(SML_LIB)/mlrisc-lib/mlb/Control.mlb`|
12566 |`$(SML_LIB)/mlrisc-lib/mlb/Graphs.mlb`|
12567 |`$(SML_LIB)/mlrisc-lib/mlb/HPPA.mlb`|The HPPA backend
12568 |`$(SML_LIB)/mlrisc-lib/mlb/IA32.mlb`|The IA32 backend
12569 |`$(SML_LIB)/mlrisc-lib/mlb/IA32-Peephole.mlb`|The IA32 peephole optimizer
12570 |`$(SML_LIB)/mlrisc-lib/mlb/Lib.mlb`|
12571 |`$(SML_LIB)/mlrisc-lib/mlb/MLRISC.mlb`|
12572 |`$(SML_LIB)/mlrisc-lib/mlb/MLTREE.mlb`|
12573 |`$(SML_LIB)/mlrisc-lib/mlb/Peephole.mlb`|
12574 |`$(SML_LIB)/mlrisc-lib/mlb/PPC.mlb`|The PPC backend
12575 |`$(SML_LIB)/mlrisc-lib/mlb/RA.mlb`|
12576 |`$(SML_LIB)/mlrisc-lib/mlb/SPARC.mlb`|The Sparc backend
12577 |`$(SML_LIB)/mlrisc-lib/mlb/StagedAlloc.mlb`|
12578 |`$(SML_LIB)/mlrisc-lib/mlb/Visual.mlb`|
|====
12580
12581 * If you are porting a project from SML/NJ's <:CompilationManager:> to
12582 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
12583 following map is included by default:
12584 +
12585 ----
12586 # MLRISC Library
12587 $SMLNJ-MLRISC                           $(SML_LIB)/mlrisc-lib/mlb
12588 ----
12589 +
12590 This will automatically convert a `$SMLNJ-MLRISC/MLRISC.cm` import in
12591 an input `.cm` file into a `$(SML_LIB)/mlrisc-lib/mlb/MLRISC.mlb`
12592 import in the output `.mlb` file.
12593
12594 == Details ==
12595
12596 The following changes were made to the MLRISC Library, in addition to
12597 deriving the `.mlb` files from the `.cm` files:
12598
12599 * eliminate sequential `withtype` expansions: Most could be rewritten as a sequence of type definitions and datatype definitions.
12600 * eliminate higher-order functors: Every higher-order functor definition and application could be uncurried in the obvious way.
12601 * eliminate `where <str> = <str>`: Quite painful to expand out all the flexible types in the respective structures.  Furthermore, many of the implied type equalities aren't needed, but it's too hard to pick out the right ones.
* `library/array-noneq.sml` (added, not exported): Implements `signature ARRAY_NONEQ`, similar to `signature ARRAY` from the <:BasisLibrary:Basis Library>, but replacing the latter's `eqtype 'a array = 'a array` and `type 'a vector = 'a Vector.vector` with `type 'a array` and `type 'a vector`.  Thus, array-like containers may match `ARRAY_NONEQ`, whereas only the pervasive `'a array` container may match `ARRAY`.  (SML/NJ's implementation of `signature ARRAY` omits the type realizations.)
* `library/dynamic-array.sml` and `library/hash-array.sml` (modified): Replace `include ARRAY` with `include ARRAY_NONEQ`; see above.
12604
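As a rough illustration of the first two items, the following sketch shows the kind of rewriting involved.  All of the names (`label`, `tree`, `forest`, `ORD`, `PairOrd`, `IntOrd`) are hypothetical and are not taken from the MLRISC sources; the SML/NJ-only forms appear only in comments.

[source,sml]
----
(* SML/NJ accepts a sequential withtype, in which a later abbreviation
 * may mention an earlier one:
 *
 *   datatype tree = Node of forest
 *   withtype label = string * int
 *        and forest = (label * tree) list
 *
 * One standard-conforming rewrite hoists the independent abbreviation
 * into an ordinary type definition that precedes the datatype. *)
type label = string * int
datatype tree = Node of forest
withtype forest = (label * tree) list

(* SML/NJ also accepts curried (higher-order) functors:
 *
 *   functor PairOrd (A: ORD) (B: ORD) = struct ... end
 *   structure P = PairOrd (IntOrd) (IntOrd)
 *
 * The uncurried form takes both structures in a single argument. *)
signature ORD =
   sig
      type t
      val less: t * t -> bool
   end

functor PairOrd (structure A: ORD
                 structure B: ORD) =
   struct
      type t = A.t * B.t
      (* lexicographic order on pairs *)
      fun less ((a1, b1), (a2, b2)) =
         A.less (a1, a2)
         orelse (not (A.less (a2, a1)) andalso B.less (b1, b2))
   end

structure IntOrd =
   struct
      type t = int
      fun less (a: int, b) = a < b
   end

structure P = PairOrd (structure A = IntOrd
                       structure B = IntOrd)

val leaf: tree = Node []
val ok = P.less ((1, 2), (1, 3))
----

Both rewrites are purely syntactic: the types and the functor body are unchanged, and only the binding structure differs.
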
12605 == Patch ==
12606
12607 * <!ViewGitFile(mlton,master,lib/mlrisc-lib/MLRISC.patch)>
12608
12609 <<<
12610
12611 :mlton-guide-page: MLtonArray
12612 [[MLtonArray]]
12613 MLtonArray
12614 ==========
12615
12616 [source,sml]
12617 ----
12618 signature MLTON_ARRAY =
12619    sig
12620       val unfoldi: int * 'b * (int * 'b -> 'a * 'b) -> 'a array * 'b
12621    end
12622 ----
12623
12624 * `unfoldi (n, b, f)`
12625 +
12626 constructs an array _a_ of length `n`, whose elements _a~i~_ are
12627 determined by the equations __b~0~ = b__ and
12628 __(a~i~, b~i+1~) = f (i, b~i~)__.
12629
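A small usage sketch, assuming only the signature above: the state `b` carries a running sum of the indices, and each element records the state before its own index is added.  The names `a` and `total` are illustrative.

[source,sml]
----
(* b_0 = 0 and (a_i, b_{i+1}) = (b_i, b_i + i) *)
val (a, total) =
   MLton.Array.unfoldi (5, 0, fn (i, b) => (b, b + i))

(* Following the equations, a = [0, 0, 1, 3, 6] and total = 10. *)
val () =
   Array.appi
   (fn (i, x) =>
       print (concat ["a[", Int.toString i, "] = ", Int.toString x, "\n"]))
   a
val () = print (concat ["total = ", Int.toString total, "\n"])
----
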
12630 <<<
12631
12632 :mlton-guide-page: MLtonBinIO
12633 [[MLtonBinIO]]
12634 MLtonBinIO
12635 ==========
12636
12637 [source,sml]
12638 ----
12639 signature MLTON_BIN_IO = MLTON_IO
12640 ----
12641
12642 See <:MLtonIO:>.
12643
12644 <<<
12645
12646 :mlton-guide-page: MLtonCont
12647 [[MLtonCont]]
12648 MLtonCont
12649 =========
12650
12651 [source,sml]
12652 ----
12653 signature MLTON_CONT =
12654    sig
12655       type 'a t
12656
12657       val callcc: ('a t -> 'a) -> 'a
12658       val isolate: ('a -> unit) -> 'a t
12659       val prepend: 'a t * ('b -> 'a) -> 'b t
12660       val throw: 'a t * 'a -> 'b
12661       val throw': 'a t * (unit -> 'a) -> 'b
12662    end
12663 ----
12664
12665 * `type 'a t`
12666 +
12667 the type of continuations that expect a value of type `'a`.
12668
12669 * `callcc f`
12670 +
12671 applies `f` to the current continuation.  This copies the entire
12672 stack; hence, `callcc` takes time proportional to the size of the
12673 current stack.
12674
12675 * `isolate f`
12676 +
12677 creates a continuation that evaluates `f` in an empty context.  This
12678 is a constant time operation, and yields a constant size stack.
12679
12680 * `prepend (k, f)`
12681 +
12682 composes a function `f` with a continuation `k` to create a
12683 continuation that first does `f` and then does `k`.  This is a
12684 constant time operation.
12685
12686 * `throw (k, v)`
12687 +
12688 throws value `v` to continuation `k`.  This copies the entire stack of
12689 `k`; hence, `throw` takes time proportional to the size of this stack.
12690
12691 * `throw' (k, th)`
12692 +
12693 a generalization of throw that evaluates `th ()` in the context of
12694 `k`.  Thus, for example, if `th ()` raises an exception or captures
12695 another continuation, it will see `k`, not the current continuation.
12696
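As a minimal sketch of `callcc` and `throw`, the following uses a continuation as an early exit from a list traversal.  The function name `product` is hypothetical; only `callcc` and `throw` come from the signature above.

[source,sml]
----
structure C = MLton.Cont

(* Multiply the elements of a list, escaping as soon as a zero is seen. *)
fun product (xs: int list): int =
   C.callcc
   (fn k =>
    let
       fun loop ([], acc) = acc
         | loop (0 :: _, _) = C.throw (k, 0)  (* abandon the traversal *)
         | loop (x :: rest, acc) = loop (rest, acc * x)
    in
       loop (xs, 1)
    end)

val p1 = product [1, 2, 3, 4]  (* 24 *)
val p2 = product [5, 0, 7]     (* 0, delivered by throw *)
----

Because `callcc` copies the stack, a pattern like this is best kept out of deeply nested or frequently executed code; an exception is usually the cheaper way to escape.
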
12697
12698 == Also see ==
12699
12700 * <:MLtonContIsolateImplementation:>
12701
12702 <<<
12703
12704 :mlton-guide-page: MLtonContIsolateImplementation
12705 [[MLtonContIsolateImplementation]]
12706 MLtonContIsolateImplementation
12707 ==============================
12708
12709 As noted before, it is fairly easy to get the operational behavior of `isolate` with just `callcc` and `throw`, but establishing the right space behavior is trickier.  Here, we show how to start from the obvious, but inefficient, implementation of `isolate` using only `callcc` and `throw`, and 'derive' an equivalent, but more efficient, implementation of `isolate` using MLton's primitive stack capture and copy operations.  This isn't a formal derivation, as we are not formally showing the equivalence of the programs (though I believe that they are all equivalent, modulo the space behavior).
12710
Here is a direct implementation of `isolate` using only `callcc` and `throw`:
12712
12713 [source,sml]
12714 ----
12715 val isolate: ('a -> unit) -> 'a t =
12716   fn (f: 'a -> unit) =>
12717   callcc
12718   (fn k1 =>
12719    let
12720       val x = callcc (fn k2 => throw (k1, k2))
12721       val _ = (f x ; Exit.topLevelSuffix ())
12722               handle exn => MLtonExn.topLevelHandler exn
12723    in
12724       raise Fail "MLton.Cont.isolate: return from (wrapped) func"
12725    end)
12726 ----
12727
12728
12729 We use the standard nested `callcc` trick to return a continuation that is ready to receive an argument, execute the isolated function, and exit the program.  Both `Exit.topLevelSuffix` and `MLtonExn.topLevelHandler` will terminate the program.
12730
12731 Throwing to an isolated function will execute the function in a 'semantically' empty context, in the sense that we never re-execute the 'original' continuation of the call to isolate (i.e., the context that was in place at the time `isolate` was called).  However, we assume that the compiler isn't able to recognize that the 'original' continuation is unused; for example, while we (the programmer) know that `Exit.topLevelSuffix` and `MLtonExn.topLevelHandler` will terminate the program, the compiler may only see opaque calls to unknown foreign-functions.  So, that original continuation (in its entirety) is part of the continuation returned by `isolate` and throwing to the continuation returned by `isolate` will execute `f x` (with the exit wrapper) in the context of that original continuation.  Thus, the garbage collector will retain  everything reachable from that original continuation during the evaluation of `f x`, even though it is 'semantically' garbage.
12732
12733 Note that this space-leak is independent of the implementation of continuations (it arises in both MLton's stack copying implementation of continuations and would arise in SML/NJ's CPS-translation implementation); we are only assuming that the implementation can't 'see' the program termination, and so must retain the original continuation (and anything reachable from it).
12734
12735 So, we need an 'empty' continuation in which to execute `f x`.  (No surprise there, as that is the written description of `isolate`.)  To do this, we capture a top-level continuation and throw to that in order to execute `f x`:
12736
12737 [source,sml]
12738 ----
12739 local
12740 val base: (unit -> unit) t =
12741   callcc
12742   (fn k1 =>
12743    let
12744       val th = callcc (fn k2 => throw (k1, k2))
12745       val _ = (th () ; Exit.topLevelSuffix ())
12746               handle exn => MLtonExn.topLevelHandler exn
12747    in
12748       raise Fail "MLton.Cont.isolate: return from (wrapped) func"
12749    end)
12750 in
12751 val isolate: ('a -> unit) -> 'a t =
12752   fn (f: 'a -> unit) =>
12753   callcc
12754   (fn k1 =>
12755    let
12756       val x = callcc (fn k2 => throw (k1, k2))
12757    in
12758       throw (base, fn () => f x)
12759    end)
12760 end
12761 ----
12762
12763
12764 We presume that `base` is evaluated 'early' in the program.  There is a subtlety here, because one needs to believe that this `base` continuation (which technically corresponds to the entire rest of the program evaluation) 'works' as an empty context; in particular, we want it to be the case that executing `f x` in the `base` context retains less space than executing `f x` in the context in place at the call to `isolate` (as occurred in the previous implementation of `isolate`).  This isn't particularly easy to believe if one takes a normal substitution-based operational semantics, because it seems that the context captured and bound to `base` is arbitrarily large.  However, this context is mostly unevaluated code; the only heap-allocated values that are reachable from it are those that were evaluated before the evaluation of `base` (and used in the program after the evaluation of `base`).  Assuming that `base` is evaluated 'early' in the program, we conclude that there are few heap-allocated values reachable from its continuation.  In contrast, the previous implementation of `isolate` could capture a context that has many heap-allocated values reachable from it (because we could evaluate `isolate f` 'late' in the program and 'deep' in a call stack), which would all remain reachable during the evaluation of
12765 `f x`.  [We'll return to this point later, as it is taking a slightly MLton-esque view of the evaluation of a program, and may not apply as strongly to other implementations (e.g., SML/NJ).]
12766
12767 Now, once we throw to `base` and begin executing `f x`, only the heap-allocated values reachable from `f` and `x` and the few heap-allocated values reachable from `base` are retained by the garbage collector.  So, it seems that `base` 'works' as an empty context.
12768
12769 But, what about the continuation returned from `isolate f`?  Note that the continuation returned by `isolate` is one that receives an argument `x` and then
12770 throws to `base` to evaluate `f x`.  If we used a CPS-translation implementation (and assume sufficient beta-contractions to eliminate administrative redexes), then the original continuation passed to `isolate` (i.e., the continuation bound to `k1`) will not be free in the continuation returned by `isolate f`.  Rather, the only free variables in the continuation returned by `isolate f` will be `base` and `f`, so the only heap-allocated values reachable from the continuation returned by `isolate f` will be those values reachable from `base` (assumed to be few) and those values reachable from `f` (necessary in order to execute `f` at some later point).
12771
But, MLton doesn't use a CPS-translation implementation.  Rather, at each call to `callcc` in the body of `isolate`, MLton will copy the current execution stack.  Thus, `k2` (the continuation returned by `isolate f`) will include the execution stack at the time of the call to `isolate f` -- that is, it will include the 'original' continuation of the call to `isolate f`.  Thus, the heap-allocated values reachable from the continuation returned by `isolate f` will include those values reachable from `base`, those values reachable from `f`, and those values reachable from the original continuation of the call to `isolate f`.  So, just holding on to the continuation returned by `isolate f` will retain all of the heap-allocated values live at the time `isolate f` was called.  This leaks space, since, 'semantically', the
continuation returned by `isolate f` only needs the heap-allocated values reachable from `f` (and `base`).
12774
In practice, this probably isn't a significant issue.  A common use of `isolate` is to implement `abort`:
12776 [source,sml]
12777 ----
12778 fun abort th = throw (isolate th, ())
12779 ----
12780
12781 The continuation returned by `isolate th` is dead immediately after being thrown to -- the continuation isn't retained, so neither is the 'semantic'
12782 garbage it would have retained.
12783
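To make the `abort` pattern concrete, here is a hedged usage sketch written against the public `MLton.Cont` structure.  The function `main` and the printed strings are hypothetical.

[source,sml]
----
structure C = MLton.Cont

(* Run th in an empty context; control never returns to the caller. *)
fun abort (th: unit -> unit): 'a = C.throw (C.isolate th, ())

fun main () =
   let
      val () = print "before abort\n"
      val () = abort (fn () => print "inside the isolated thunk\n")
   in
      (* Not reached: once the thunk returns, the program exits. *)
      print "after abort\n"
   end
----

Going by the description of `isolate` above, calling `main ()` should print the first two messages and then terminate the program, and the continuation built by `isolate` becomes garbage as soon as it has been thrown to.
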
12784 But, it is easy enough to 'move' onto the 'empty' context `base` the capturing of the context that we want to be returned by `isolate f`:
12785
12786 [source,sml]
12787 ----
12788 local
12789 val base: (unit -> unit) t =
12790   callcc
12791   (fn k1 =>
12792    let
12793       val th = callcc (fn k2 => throw (k1, k2))
12794       val _ = (th () ; Exit.topLevelSuffix ())
12795               handle exn => MLtonExn.topLevelHandler exn
12796    in
12797       raise Fail "MLton.Cont.isolate: return from (wrapped) func"
12798    end)
12799 in
12800 val isolate: ('a -> unit) -> 'a t =
12801   fn (f: 'a -> unit) =>
12802   callcc
12803   (fn k1 =>
12804    throw (base, fn () =>
12805           let
12806              val x = callcc (fn k2 => throw (k1, k2))
12807           in
12808              throw (base, fn () => f x)
12809           end))
12810 end
12811 ----
12812
12813
This implementation now has the right space behavior; the continuation returned by `isolate f` will only retain the heap-allocated values reachable from `f` and from `base`.  (Technically, the continuation will retain two copies of the stack that was in place at the time `base` was evaluated, but we are assuming that that stack is small.)
12815
12816 One minor inefficiency of this implementation (given MLton's implementation of continuations) is that every `callcc` and `throw` entails copying a stack (albeit, some of them are small).  We can avoid this in the evaluation of `base` by using a reference cell, because `base` is evaluated at the top-level:
12817
12818 [source,sml]
12819 ----
local
val base: (unit -> unit) option t =
  let
     val baseRef: (unit -> unit) option t option ref = ref NONE
     (* store the captured continuation in baseRef; base is not yet bound *)
     val th = callcc (fn k => (baseRef := SOME k; NONE))
  in
     case th of
        NONE => (case !baseRef of
                    NONE => raise Fail "MLton.Cont.isolate: missing base"
                  | SOME base => base)
      | SOME th => let
                      val _ = (th () ; Exit.topLevelSuffix ())
                              handle exn => MLtonExn.topLevelHandler exn
                   in
                      raise Fail "MLton.Cont.isolate: return from (wrapped) func"
                   end
  end
in
val isolate: ('a -> unit) -> 'a t =
  fn (f: 'a -> unit) =>
  callcc
  (fn k1 =>
   throw (base, SOME (fn () =>
          let
             val x = callcc (fn k2 => throw (k1, k2))
          in
             throw (base, SOME (fn () => f x))
          end)))
end
12850 ----
12851
12852
12853 Now, to evaluate `base`, we only copy the stack once (instead of 3 times).  Because we don't have a dummy continuation around to initialize the reference cell, the reference cell holds a continuation `option`.  To distinguish between the original evaluation of `base` (when we want to return the continuation) and the subsequent evaluations of `base` (when we want to evaluate a thunk), we capture a `(unit -> unit) option` continuation.
12854
12855 This seems to be as far as we can go without exploiting the concrete implementation of continuations in <:MLtonCont:>.  Examining the implementation, we note that the type of
12856 continuations is given by
12857 [source,sml]
12858 ----
12859 type 'a t = (unit -> 'a) -> unit
12860 ----
12861
12862 and the implementation of `throw` is given by
12863 [source,sml]
12864 ----
12865 fun ('a, 'b) throw' (k: 'a t, v: unit -> 'a): 'b =
12866   (k v; raise Fail "MLton.Cont.throw': return from continuation")
12867
12868 fun ('a, 'b) throw (k: 'a t, v: 'a): 'b = throw' (k, fn () => v)
12869 ----
12870
12871
Suffice it to say, a continuation is simply a function that accepts a thunk to yield the thrown value and the body of the function performs the actual throw. Using this knowledge, we can create a dummy continuation to initialize `baseRef` and greatly simplify the body of `isolate`:
12873
12874 [source,sml]
12875 ----
local
val base: (unit -> unit) option t =
  let
     val baseRef: (unit -> unit) option t ref =
        ref (fn _ => raise Fail "MLton.Cont.isolate: missing base")
     val th = callcc (fn k => (baseRef := k; NONE))
  in
     case th of
        NONE => !baseRef
      | SOME th => let
                      val _ = (th () ; Exit.topLevelSuffix ())
                              handle exn => MLtonExn.topLevelHandler exn
                   in
                      raise Fail "MLton.Cont.isolate: return from (wrapped) func"
                   end
  end
in
val isolate: ('a -> unit) -> 'a t =
  fn (f: 'a -> unit) =>
  fn (v: unit -> 'a) =>
  throw (base, SOME (f o v))
end
12899 ----
12900
12901
12902 Note that this implementation of `isolate` makes it clear that the continuation returned by `isolate f` only retains the heap-allocated values reachable from `f` and `base`.  It also retains only one copy of the stack that was in place at the time `base` was evaluated.  Finally, it completely avoids making any copies of the stack that is in place at the time `isolate f` is evaluated; indeed, `isolate f` is a constant-time operation.
12903
12904 Next, suppose we limited ourselves to capturing `unit` continuations with `callcc`.  We can't pass the thunk to be evaluated in the 'empty' context directly, but we can use a reference cell.
12905
12906 [source,sml]
12907 ----
12908 local
12909 val thRef: (unit -> unit) option ref = ref NONE
12910 val base: unit t =
12911   let
12912      val baseRef: unit t ref =
12913         ref (fn _ => raise Fail "MLton.Cont.isolate: missing base")
12914      val () = callcc (fn k => baseRef := k)
12915   in
12916      case !thRef of
12917         NONE => !baseRef
12918       | SOME th =>
12919            let
12920               val _ = thRef := NONE
12921               val _ = (th () ; Exit.topLevelSuffix ())
12922                       handle exn => MLtonExn.topLevelHandler exn
12923            in
12924               raise Fail "MLton.Cont.isolate: return from (wrapped) func"
12925            end
12926   end
12927 in
12928 val isolate: ('a -> unit) -> 'a t =
12929   fn (f: 'a -> unit) =>
12930   fn (v: unit -> 'a) =>
12931   let
12932      val () = thRef := SOME (f o v)
12933   in
12934      throw (base, ())
12935   end
12936 end
12937 ----
12938
12939
Note that it is important to set `thRef` to `NONE` before evaluating the thunk, so that the garbage collector doesn't retain all the heap-allocated values reachable from `f` and `v` during the evaluation of `f (v ())`.  This is because `thRef` is still live during the evaluation of the thunk; in particular, it was allocated before the evaluation of `base` (and used after), and so is retained by the continuation on which the thunk is evaluated.
12941
12942 This implementation can be easily adapted to use MLton's primitive stack copying operations.
12943
12944 [source,sml]
12945 ----
12946 local
12947 val thRef: (unit -> unit) option ref = ref NONE
12948 val base: Thread.preThread =
12949    let
12950       val () = Thread.copyCurrent ()
12951    in
12952       case !thRef of
12953          NONE => Thread.savedPre ()
12954        | SOME th =>
12955             let
12956                val () = thRef := NONE
12957                val _ = (th () ; Exit.topLevelSuffix ())
12958                        handle exn => MLtonExn.topLevelHandler exn
12959             in
12960                raise Fail "MLton.Cont.isolate: return from (wrapped) func"
12961             end
12962    end
12963 in
12964 val isolate: ('a -> unit) -> 'a t =
12965    fn (f: 'a -> unit) =>
12966    fn (v: unit -> 'a) =>
12967    let
12968       val () = thRef := SOME (f o v)
12969       val new = Thread.copy base
12970    in
12971       Thread.switchTo new
12972    end
12973 end
12974 ----
12975
12976
12977 In essence, `Thread.copyCurrent` copies the current execution stack and stores it in an implicit reference cell in the runtime system, which is fetchable with `Thread.savedPre`.  When we are ready to throw to the isolated function, `Thread.copy` copies the saved execution stack (because the stack is modified in place during execution, we need to retain a pristine copy in case the isolated function itself throws to other isolated functions) and `Thread.switchTo` abandons the current execution stack, installing the newly copied execution stack.
12978
12979 The actual implementation of `MLton.Cont.isolate` simply adds some `Thread.atomicBegin` and `Thread.atomicEnd` commands, which effectively protect the global `thRef` and accommodate the fact that `Thread.switchTo` does an implicit `Thread.atomicEnd` (used for leaving a signal handler thread).
12980
12981 [source,sml]
12982 ----
local
val thRef: (unit -> unit) option ref = ref NONE
val base: Thread.preThread =
   let
      val () = Thread.copyCurrent ()
   in
      case !thRef of
         NONE => Thread.savedPre ()
       | SOME th =>
            let
               val () = thRef := NONE
               val _ = Thread.atomicEnd () (* Match 1 *)
               val _ = (th () ; Exit.topLevelSuffix ())
                       handle exn => MLtonExn.topLevelHandler exn
            in
               raise Fail "MLton.Cont.isolate: return from (wrapped) func"
            end
   end
in
val isolate: ('a -> unit) -> 'a t =
   fn (f: 'a -> unit) =>
   fn (v: unit -> 'a) =>
   let
      val _ = Thread.atomicBegin () (* Match 1 *)
      val () = thRef := SOME (f o v)
      val new = Thread.copy base
      val _ = Thread.atomicBegin () (* Match 2 *)
   in
      Thread.switchTo new (* Match 2 *)
   end
end
13014 ----
13015
13016
13017 It is perhaps interesting to note that the above implementation was originally 'derived' by specializing implementations of the <:MLtonThread:> `new`, `prepare`, and `switch` functions as if their only use was in the following implementation of `isolate`:
13018
13019 [source,sml]
13020 ----
val isolate: ('a -> unit) -> 'a t =
   fn (f: 'a -> unit) =>
   fn (v: unit -> 'a) =>
   let
      (* th must be a thunk, since MLton.Thread.new expects a function *)
      val th = fn () => (f (v ()) ; Exit.topLevelSuffix ())
                        handle exn => MLtonExn.topLevelHandler exn
      val t = MLton.Thread.prepare (MLton.Thread.new th, ())
   in
      MLton.Thread.switch (fn _ => t)
   end
13031 ----
13032
13033
13034 It was pleasant to discover that it could equally well be 'derived' starting from the `callcc` and `throw` implementation.
13035
13036 As a final comment, we noted that the degree to which the context of `base` could be considered 'empty' (i.e., retaining few heap-allocated values) depended upon a slightly MLton-esque view.  In particular, MLton does not heap allocate executable code.  So, although the `base` context keeps a lot of unevaluated code 'live', such code is not heap allocated.  In a system like SML/NJ, that does heap allocate executable code, one might want it to be the case that after throwing to an isolated function, the garbage collector retains only the code necessary to evaluate the function, and not any code that was necessary to evaluate the `base` context.
13037
13038 <<<
13039
13040 :mlton-guide-page: MLtonCross
13041 [[MLtonCross]]
13042 MLtonCross
13043 ==========
13044
The Debian package MLton-Cross adds various targets to MLton. In
combination with the Emdebian project, this allows a Debian system to
compile SML files to other architectures.
13048
13049 Currently, these targets are supported:
13050
13051 * _Windows (MinGW)_
13052 ** -target i586-mingw32msvc (mlton-target-i586-mingw32msvc)
** -target amd64-mingw32msvc (mlton-target-amd64-mingw32msvc)
13054 * _Linux (Debian)_
13055 ** -target alpha-linux-gnu (mlton-target-alpha-linux-gnu)
13056 ** -target arm-linux-gnueabi (mlton-target-arm-linux-gnueabi)
13057 ** -target hppa-linux-gnu (mlton-target-hppa-linux-gnu)
13058 ** -target i486-linux-gnu (mlton-target-i486-linux-gnu)
13059 ** -target ia64-linux-gnu (mlton-target-ia64-linux-gnu)
13060 ** -target mips-linux-gnu (mlton-target-mips-linux-gnu)
13061 ** -target mipsel-linux-gnu (mlton-target-mipsel-linux-gnu)
13062 ** -target powerpc-linux-gnu (mlton-target-powerpc-linux-gnu)
13063 ** -target s390-linux-gnu (mlton-target-s390-linux-gnu)
13064 ** -target sparc-linux-gnu (mlton-target-sparc-linux-gnu)
13065 ** -target x86-64-linux-gnu (mlton-target-x86-64-linux-gnu)
13066
13067
13068 == Download ==
13069
MLton-Cross is kept in sync with the current MLton release.
13071
13072 * <!Attachment(MLtonCross,mlton-cross_20100608.orig.tar.gz)>
13073
13074 <<<
13075
13076 :mlton-guide-page: MLtonExn
13077 [[MLtonExn]]
13078 MLtonExn
13079 ========
13080
13081 [source,sml]
13082 ----
13083 signature MLTON_EXN =
13084    sig
13085       val addExnMessager: (exn -> string option) -> unit
13086       val history: exn -> string list
13087
13088       val defaultTopLevelHandler: exn -> 'a
13089       val getTopLevelHandler: unit -> (exn -> unit)
13090       val setTopLevelHandler: (exn -> unit) -> unit
13091       val topLevelHandler: exn -> 'a
13092    end
13093 ----
13094
13095 * `addExnMessager f`
13096 +
13097 adds `f` as a pretty-printer to be used by `General.exnMessage` for
13098 converting exceptions to strings.  Messagers are tried in order from
13099 most recently added to least recently added.
13100
13101 * `history e`
13102 +
returns the call stack at the point that `e` was first raised.  Each
element of the list is a file position.  The elements are in reverse
chronological order, i.e. the function called last is at the front of
the list.
13107 +
13108 `history e` will return `[]` unless the program is compiled with
13109 `-const 'Exn.keepHistory true'`.
13110
13111 * `defaultTopLevelHandler e`
13112 +
behaves as the default top level handler; that is, it prints the
unhandled exception message for `e` and exits.
13115
13116 * `getTopLevelHandler ()`
13117 +
returns the current top level handler.
13119
13120 * `setTopLevelHandler f`
13121 +
sets the top level handler to the function `f`.  The function `f`
should not raise an exception or return normally.
13124
13125 * `topLevelHandler e`
13126 +
13127 behaves as if the top level handler received the exception `e`.
13128
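The following sketch shows one way `addExnMessager` and `history` might be used together.  The exception `ParseError` and the helper `printHistory` are hypothetical names introduced for illustration.

[source,sml]
----
exception ParseError of {line: int, msg: string}

(* Register a messager; exceptions it does not recognize fall through
 * to previously registered messagers by returning NONE. *)
val () =
   MLton.Exn.addExnMessager
   (fn ParseError {line, msg} =>
          SOME (concat ["ParseError at line ", Int.toString line, ": ", msg])
     | _ => NONE)

val s = General.exnMessage (ParseError {line = 3, msg = "unexpected token"})
val () = print (s ^ "\n")

(* Prints nothing unless compiled with -const 'Exn.keepHistory true'. *)
fun printHistory (e: exn): unit =
   List.app (fn pos => print (concat ["  ", pos, "\n"])) (MLton.Exn.history e)

val () = (raise ParseError {line = 7, msg = "bad input"})
         handle e => printHistory e
----

Since messagers are tried from most recently added to least recently added, a messager registered late in program start-up takes precedence over earlier ones for the exceptions it handles.
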
13129 <<<
13130
13131 :mlton-guide-page: MLtonFinalizable
13132 [[MLtonFinalizable]]
13133 MLtonFinalizable
13134 ================
13135
13136 [source,sml]
13137 ----
13138 signature MLTON_FINALIZABLE =
13139    sig
13140       type 'a t
13141
13142       val addFinalizer: 'a t * ('a -> unit) -> unit
13143       val finalizeBefore: 'a t * 'b t -> unit
13144       val new: 'a -> 'a t
13145       val touch: 'a t -> unit
13146       val withValue: 'a t * ('a -> 'b) -> 'b
13147    end
13148 ----
13149
13150 A _finalizable_ value is a container to which finalizers can be
13151 attached.  A container holds a value, which is reachable as long as
13152 the container itself is reachable.  A _finalizer_ is a function that
13153 runs at some point after garbage collection determines that the
13154 container to which it is attached has become
13155 <:Reachability:unreachable>.  A finalizer is treated like a signal
13156 handler, in that it runs asynchronously in a separate thread, with
13157 signals blocked, and will not interrupt a critical section (see
13158 <:MLtonThread:>).
13159
13160 * `addFinalizer (v, f)`
13161 +
13162 adds `f` as a finalizer to `v`.  This means that sometime after the
13163 last call to `withValue` on `v` completes and `v` becomes unreachable,
13164 `f` will be called with the value of `v`.
13165
13166 * `finalizeBefore (v1, v2)`
13167 +
13168 ensures that `v1` will be finalized before `v2`.  A cycle of values
13169 `v` = `v1`, ..., `vn` = `v` with `finalizeBefore (vi, vi+1)` will
13170 result in none of the `vi` being finalized.
13171
13172 * `new x`
13173 +
13174 creates a new finalizable value, `v`, with value `x`.  The finalizers
13175 of `v` will run sometime after the last call to `withValue` on `v`
13176 when the garbage collector determines that `v` is unreachable.
13177
13178 * `touch v`
13179 +
13180 ensures that `v`'s finalizers will not run before the call to `touch`.
13181
13182 * `withValue (v, f)`
13183 +
13184 returns the result of applying `f` to the value of `v` and ensures
13185 that `v`'s finalizers will not run before `f` completes.  The call to
13186 `f` is a nontail call.
13187
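A minimal pure-SML sketch of the interface, with no foreign code involved; the value `42` and the message text are illustrative only.

[source,sml]
----
structure F = MLton.Finalizable

val v: int F.t = F.new 42

(* The finalizer receives the contained value, not the container. *)
val () =
   F.addFinalizer
   (v, fn n => print (concat ["finalizing ", Int.toString n, "\n"]))

(* The finalizer cannot run while withValue is applying the function. *)
val r = F.withValue (v, fn n => n + 1)

val () = print (concat ["r = ", Int.toString r, "\n"])
----

When the finalizer message appears depends on when the garbage collector finds `v` unreachable, so the sketch makes no assumption about its position in the output.
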
13188
13189 == Example ==
13190
13191 Suppose that `finalizable.sml` contains the following:
13192 [source,sml]
13193 ----
13194 sys::[./bin/InclGitFile.py mlton master doc/examples/finalizable/finalizable.sml]
13195 ----
13196
13197 Suppose that `cons.c` contains the following.
13198 [source,c]
13199 ----
13200 sys::[./bin/InclGitFile.py mlton master doc/examples/finalizable/cons.c]
13201 ----
13202
13203 We can compile these to create an executable with
13204 ----
13205 % mlton -default-ann 'allowFFI true' finalizable.sml cons.c
13206 ----
13207
13208 Running this executable will create output like the following.
13209 ----
13210 % finalizable
13211 0x08072890 = listSing (2)
13212 0x080728a0 = listCons (2)
13213 0x080728b0 = listCons (2)
13214 0x080728c0 = listCons (2)
13215 0x080728d0 = listCons (2)
13216 0x080728e0 = listCons (2)
13217 0x080728f0 = listCons (2)
13218 listSum
13219 listSum(l) = 14
13220 listFree (0x080728f0)
13221 listFree (0x080728e0)
13222 listFree (0x080728d0)
13223 listFree (0x080728c0)
13224 listFree (0x080728b0)
13225 listFree (0x080728a0)
13226 listFree (0x08072890)
13227 ----
13228
13229
13230 == Synchronous Finalizers ==
13231
13232 Finalizers in MLton are asynchronous.  That is, they run at an
13233 unspecified time, interrupting the user program.  It is also possible,
13234 and sometimes useful, to have synchronous finalizers, where the user
13235 program explicitly decides when to run enabled finalizers.  We have
13236 considered this in MLton, and it seems possible, but there are some
13237 unresolved design issues.  See the thread at
13238
13239 * http://www.mlton.org/pipermail/mlton/2004-September/016570.html
13240
13241 == Also see ==
13242
13243 * <!Cite(Boehm03)>
13244
13245 <<<
13246
13247 :mlton-guide-page: MLtonGC
13248 [[MLtonGC]]
13249 MLtonGC
13250 =======
13251
13252 [source,sml]
13253 ----
13254 signature MLTON_GC =
13255    sig
13256       val collect: unit -> unit
13257       val pack: unit -> unit
13258       val setMessages: bool -> unit
13259       val setSummary: bool -> unit
13260       val unpack: unit -> unit
13261       structure Statistics :
13262          sig
13263             val bytesAllocated: unit -> IntInf.int
13264             val lastBytesLive: unit -> IntInf.int
13265             val numCopyingGCs: unit -> IntInf.int
13266             val numMarkCompactGCs: unit -> IntInf.int
13267             val numMinorGCs: unit -> IntInf.int
13268             val maxBytesLive: unit -> IntInf.int
13269          end
13270    end
13271 ----
13272
13273 * `collect ()`
13274 +
13275 causes a garbage collection to occur.
13276
13277 * `pack ()`
13278 +
13279 shrinks the heap as much as possible so that other processes can use
13280 available RAM.
13281
13282 * `setMessages b`
13283 +
13284 controls whether diagnostic messages are printed at the beginning and
13285 end of each garbage collection.  It is the same as the `gc-messages`
13286 runtime system option.
13287
13288 * `setSummary b`
13289 +
13290 controls whether a summary of garbage collection statistics is printed
13291 upon termination of the program.  It is the same as the `gc-summary`
13292 runtime system option.
13293
13294 * `unpack ()`
13295 +
13296 resizes a packed heap to the size desired by the runtime.
13297
13298 * `Statistics.bytesAllocated ()`
13299 +
13300 returns bytes allocated (as of the most recent garbage collection).
13301
13302 * `Statistics.lastBytesLive ()`
13303 +
13304 returns bytes live (as of the most recent garbage collection).
13305
13306 * `Statistics.numCopyingGCs ()`
13307 +
13308 returns number of (major) copying garbage collections performed (as of
13309 the most recent garbage collection).
13310
13311 * `Statistics.numMarkCompactGCs ()`
13312 +
13313 returns number of (major) mark-compact garbage collections performed
13314 (as of the most recent garbage collection).
13315
13316 * `Statistics.numMinorGCs ()`
13317 +
13318 returns number of minor garbage collections performed (as of the most
13319 recent garbage collection).
13320
13321 * `Statistics.maxBytesLive ()`
13322 +
13323 returns maximum bytes live (as of the most recent garbage collection).
13324
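As a rough illustration of the functions above, the following sketch forces a collection and then reads two of the counters. The list `junk` is a hypothetical workload that exists only to allocate memory; the printed values vary from run to run.

[source,sml]
----
(* Allocate some data so that the collector has work to do. *)
val junk = List.tabulate (100000, fn i => Int.toString i)
val () = print (Int.toString (length junk) ^ " strings allocated\n")

(* Force a collection; the statistics are updated as of this point. *)
val () = MLton.GC.collect ()

val allocated = MLton.GC.Statistics.bytesAllocated ()
val live = MLton.GC.Statistics.lastBytesLive ()

val () = print ("bytes allocated: " ^ IntInf.toString allocated ^ "\n")
val () = print ("bytes live:      " ^ IntInf.toString live ^ "\n")
----
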
13325 <<<
13326
13327 :mlton-guide-page: MLtonIntInf
13328 [[MLtonIntInf]]
13329 MLtonIntInf
13330 ===========
13331
13332 [source,sml]
13333 ----
13334 signature MLTON_INT_INF =
13335    sig
13336       type t = IntInf.int
13337
13338       val areSmall: t * t -> bool
13339       val gcd: t * t -> t
13340       val isSmall: t -> bool
13341
13342       structure BigWord : WORD
13343       structure SmallInt : INTEGER
13344       datatype rep =
13345          Big of BigWord.word vector
13346        | Small of SmallInt.int
13347       val rep: t -> rep
13348       val fromRep : rep -> t option
13349    end
13350 ----
13351
13352 MLton represents an arbitrary precision integer either as an unboxed
13353 word with the bottom bit set to 1 and the top bits representing a
13354 small signed integer, or as a pointer to a vector of words, where the
13355 first word indicates the sign and the rest are the limbs of a
13356 <:GnuMP:> big integer.
13357
13358 * `type t`
13359 +
13360 the same as type `IntInf.int`.
13361
13362 * `areSmall (a, b)`
13363 +
13364 returns true iff both `a` and `b` are small.
13365
13366 * `gcd (a, b)`
13367 +
13368 computes the greatest common divisor of `a` and `b` using <:GnuMP:GnuMP's> fast gcd implementation.
13369
13370 * `isSmall a`
13371 +
13372 returns true iff `a` is small.
13373
13374 * `BigWord : WORD`
13375 +
13376 representation of a big `IntInf.int` as a vector of words; on 32-bit
13377 platforms, `BigWord` is likely to be equivalent to `Word32`, and on
13378 64-bit platforms, `BigWord` is likely to be equivalent to `Word64`.
13379
13380 * `SmallInt : INTEGER`
13381 +
13382 representation of a small `IntInf.int` as a signed integer; on 32-bit
13383 platforms, `SmallInt` is likely to be equivalent to `Int32`, and on
13384 64-bit platforms, `SmallInt` is likely to be equivalent to `Int64`.
13385
13386 * `datatype rep`
13387 +
13388 the underlying representation of an `IntInf.int`.
13389
13390 * `rep i`
13391 +
13392 returns the underlying representation of `i`.
13393
13394 * `fromRep r`
13395 +
13396 converts from the underlying representation back to an `IntInf.int`.
13397 If `r` is anything other than the valid result of `rep i`
13398 for some `i`, `fromRep r` returns `NONE`.
13399
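A minimal sketch of how the representation functions might be used, assuming the signature shown above. The helper `describe` and the chosen values are illustrative only; `2^200` is large enough to require the big representation on both 32-bit and 64-bit platforms.

[source,sml]
----
structure I = MLton.IntInf

(* Report which representation a value currently uses. *)
fun describe (i: IntInf.int): string =
   case I.rep i of
      I.Small _ => "small"
    | I.Big _ => "big"

val small: IntInf.int = 42
val big: IntInf.int = IntInf.pow (2, 200)

val () = print (IntInf.toString small ^ " is " ^ describe small ^ "\n")
val () = print (IntInf.toString big ^ " is " ^ describe big ^ "\n")

(* A representation obtained from rep converts back to the same value. *)
val roundTrip = I.fromRep (I.rep big) = SOME big
----
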
13400 <<<
13401
13402 :mlton-guide-page: MLtonIO
13403 [[MLtonIO]]
13404 MLtonIO
13405 =======
13406
13407 [source,sml]
13408 ----
13409 signature MLTON_IO =
13410    sig
13411       type instream
13412       type outstream
13413
13414       val inFd: instream -> Posix.IO.file_desc
13415       val mkstemp: string -> string * outstream
13416       val mkstemps: {prefix: string, suffix: string} -> string * outstream
13417       val newIn: Posix.IO.file_desc * string -> instream
13418       val newOut: Posix.IO.file_desc * string -> outstream
13419       val outFd: outstream -> Posix.IO.file_desc
13420       val tempPrefix: string -> string
13421    end
13422 ----
13423
13424 * `inFd ins`
13425 +
13426 returns the file descriptor corresponding to `ins`.
13427
13428 * `mkstemp s`
13429 +
13430 like the C `mkstemp` function, generates and opens a temporary file
13431 with prefix `s`.
13432
13433 * `mkstemps {prefix, suffix}`
13434 +
13435 like `mkstemp`, except it has both a prefix and suffix.
13436
13437 * `newIn (fd, name)`
13438 +
13439 creates a new instream from file descriptor `fd`, with `name` used in
13440 any `Io` exceptions later raised.
13441
13442 * `newOut (fd, name)`
13443 +
13444 creates a new outstream from file descriptor `fd`, with `name` used in
13445 any `Io` exceptions later raised.
13446
13447 * `outFd out`
13448 +
13449 returns the file descriptor corresponding to `out`.
13450
13451 * `tempPrefix s`
13452 +
13453 adds a suitable system or user specific prefix (directory) for temp
13454 files.
13455
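The sketch below creates a temporary file, writes to it, and reads it back. It assumes that the `MLton.TextIO` structure matches this signature with `instream` and `outstream` equal to the `TextIO` stream types; the prefix `"example"` is arbitrary.

[source,sml]
----
(* tempPrefix places the file in a suitable temporary directory. *)
val (name, out) = MLton.TextIO.mkstemp (MLton.TextIO.tempPrefix "example")

val () = TextIO.output (out, "hello\n")
val () = TextIO.closeOut out

(* Read the contents back through the ordinary TextIO interface. *)
val ins = TextIO.openIn name
val line = TextIO.inputLine ins
val () = TextIO.closeIn ins

(* mkstemp does not delete the file, so remove it explicitly. *)
val () = OS.FileSys.remove name
----
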
13456 <<<
13457
13458 :mlton-guide-page: MLtonItimer
13459 [[MLtonItimer]]
13460 MLtonItimer
13461 ===========
13462
13463 [source,sml]
13464 ----
13465 signature MLTON_ITIMER =
13466    sig
13467       datatype t =
13468          Prof
13469        | Real
13470        | Virtual
13471
13472       val set: t * {interval: Time.time, value: Time.time} -> unit
13473       val signal: t -> Posix.Signal.signal
13474    end
13475 ----
13476
13477 * `set (t, {interval, value})`
13478 +
13479 sets the interval timer (using `setitimer`) specified by `t` to the
13480 given `interval` and `value`.
13481
13482 * `signal t`
13483 +
13484 returns the signal corresponding to `t`.
13485
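A small sketch of a periodic timer, assuming that `MLton.Signal.setHandler` and `MLton.Signal.Handler.simple` are available as described on the <:MLtonSignal:> page. The counter `ticks` and the 10 ms period are illustrative choices, and the busy-wait relies on the handler being run while the loop executes.

[source,sml]
----
structure Itimer = MLton.Itimer
structure Signal = MLton.Signal

(* Count how many times the timer signal has been delivered. *)
val ticks = ref 0
val () =
   Signal.setHandler
   (Itimer.signal Itimer.Real,
    Signal.Handler.simple (fn () => ticks := !ticks + 1))

(* First expiry after 10 ms, then every 10 ms thereafter. *)
val period = Time.fromMilliseconds 10
val () = Itimer.set (Itimer.Real, {interval = period, value = period})

(* Busy-wait until a few ticks have been observed. *)
fun spin () = if !ticks < 3 then spin () else ()
val () = spin ()

(* As with setitimer, zero values disarm the timer. *)
val () =
   Itimer.set (Itimer.Real,
               {interval = Time.zeroTime, value = Time.zeroTime})
val () = print ("ticks observed: " ^ Int.toString (!ticks) ^ "\n")
----
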
13486 <<<
13487
13488 :mlton-guide-page: MLtonLibraryProject
13489 [[MLtonLibraryProject]]
13490 MLtonLibraryProject
13491 ===================
13492
13493 We have a https://github.com/MLton/mltonlib[MLton Library repository]
13494 that is intended to collect libraries.
13495
13496 =====
13497   https://github.com/MLton/mltonlib
13498 =====
13499
13500 Libraries are kept in the `master` branch, and are grouped according
13501 to domain name, in the Java package style.  For example,
13502 <:VesaKarvonen:>, who works at `ssh.com`, has been putting code at:
13503
13504 =====
13505   https://github.com/MLton/mltonlib/tree/master/com/ssh
13506 =====
13507
13508 <:StephenWeeks:>, owning `sweeks.com`, has been putting code at:
13509
13510 =====
13511   https://github.com/MLton/mltonlib/tree/master/com/sweeks
13512 =====
13513
13514 A "library" is a subdirectory of some such directory.  For example,
13515 Stephen's basis-library replacement library is at
13516
13517 =====
13518   https://github.com/MLton/mltonlib/tree/master/com/sweeks/basic
13519 =====
13520
13521 We use "transparent per-library branching" to handle library
13522 versioning.  Each library has an "unstable" subdirectory in which work
13523 happens.  When one is happy with a library, one tags it by copying it
13524 to a stable version directory.  Stable libraries are immutable -- when
13525 one refers to a stable library, one always gets exactly the same code.
13526 No one has actually made a stable library yet, but, when I'm ready to
13527 tag my library, I was thinking that I would do something like copying
13528
13529 =====
13530   https://github.com/MLton/mltonlib/tree/master/com/sweeks/basic/unstable
13531 =====
13532
13533 to
13534
13535 =====
13536   https://github.com/MLton/mltonlib/tree/master/com/sweeks/basic/v1
13537 =====
13538
13539 So far, libraries in the MLton repository have been licensed under
13540 MLton's <:License:>.  We haven't decided on whether that will be a
13541 requirement to be in the repository or not.  For the sake of
13542 simplicity (a single license) and encouraging widest use of code,
13543 contributors are encouraged to use that license.  But it may be too
13544 strict to require it.
13545
13546 If someone wants to contribute a new library to our repository or to
13547 work on an old one, they can make a pull request.  If people want to
13548 work in their own repository, they can do so -- that's the point of
13549 using domain names to prevent clashes.  The idea is that a user should
13550 be able to bring library collections in from many different
13551 repositories without problems.  And those libraries could even work
13552 with each other.
13553
13554 At some point we may want to settle on an <:MLBasisPathMap:> variable
13555 for the root of the library project.  Or, we could reuse `SML_LIB`,
13556 and migrate what we currently keep there into the library
13557 infrastructure.
13558
13559 <<<
13560
13561 :mlton-guide-page: MLtonMonoArray
13562 [[MLtonMonoArray]]
13563 MLtonMonoArray
13564 ==============
13565
13566 [source,sml]
13567 ----
13568 signature MLTON_MONO_ARRAY =
13569    sig
13570       type t
13571       type elem
13572       val fromPoly: elem array -> t
13573       val toPoly: t -> elem array
13574    end
13575 ----
13576
13577 * `type t`
13578 +
13579 type of monomorphic array
13580
13581 * `type elem`
13582 +
13583 type of array elements
13584
13585 * `fromPoly a`
13586 +
13587 type cast a polymorphic array to its monomorphic counterpart; the
13588 argument and result arrays share the same identity
13589
13590 * `toPoly a`
13591 +
13592 type cast a monomorphic array to its polymorphic counterpart; the
13593 argument and result arrays share the same identity
13594
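The shared identity can be seen in a short sketch. It assumes that `MLton.Word8Array` matches this signature with `t = Word8Array.array` and `elem = Word8.word`.

[source,sml]
----
val poly: Word8.word array = Array.fromList [0w1, 0w2, 0w3]
val mono: Word8Array.array = MLton.Word8Array.fromPoly poly

(* Both names refer to the same array, so an update through one
   is visible through the other. *)
val () = Array.update (poly, 0, 0w255)
val first = Word8Array.sub (mono, 0)
----
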
13595 <<<
13596
13597 :mlton-guide-page: MLtonMonoVector
13598 [[MLtonMonoVector]]
13599 MLtonMonoVector
13600 ===============
13601
13602 [source,sml]
13603 ----
13604 signature MLTON_MONO_VECTOR =
13605    sig
13606       type t
13607       type elem
13608       val fromPoly: elem vector -> t
13609       val toPoly: t -> elem vector
13610    end
13611 ----
13612
13613 * `type t`
13614 +
13615 type of monomorphic vector
13616
13617 * `type elem`
13618 +
13619 type of vector elements
13620
13621 * `fromPoly v`
13622 +
13623 type cast a polymorphic vector to its monomorphic counterpart; in
13624 MLton, this is a constant-time operation
13625
13626 * `toPoly v`
13627 +
13628 type cast a monomorphic vector to its polymorphic counterpart; in
13629 MLton, this is a constant-time operation
13630
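For instance, since `CharVector.vector` is `string`, a polymorphic vector of characters can be viewed as a string without copying. This sketch assumes that `MLton.CharVector` matches this signature with `t = CharVector.vector` and `elem = char`.

[source,sml]
----
val chars: char vector = Vector.fromList [#"M", #"L", #"t", #"o", #"n"]

(* View the polymorphic vector as a string, and back again. *)
val s: string = MLton.CharVector.fromPoly chars
val back: char vector = MLton.CharVector.toPoly s

val () = print (s ^ "\n")
----
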
13631 <<<
13632
13633 :mlton-guide-page: MLtonPlatform
13634 [[MLtonPlatform]]
13635 MLtonPlatform
13636 =============
13637
13638 [source,sml]
13639 ----
13640 signature MLTON_PLATFORM =
13641    sig
13642       structure Arch:
13643          sig
13644             datatype t = Alpha | AMD64 | ARM | ARM64 | HPPA | IA64 | m68k
13645                        | MIPS | PowerPC | PowerPC64 | S390 | Sparc | X86
13646
13647             val fromString: string -> t option
13648             val host: t
13649             val toString: t -> string
13650          end
13651
13652       structure OS:
13653          sig
13654             datatype t = AIX | Cygwin | Darwin | FreeBSD | Hurd | HPUX
13655                        | Linux | MinGW | NetBSD | OpenBSD | Solaris
13656
13657             val fromString: string -> t option
13658             val host: t
13659             val toString: t -> string
13660          end
13661    end
13662 ----
13663
13664 * `datatype Arch.t`
13665 +
13666 processor architectures
13667
13668 * `Arch.fromString a`
13669 +
13670 converts from string to architecture.  Case insensitive.
13671
13672 * `Arch.host`
13673 +
13674 the architecture for which the program is compiled.
13675
13676 * `Arch.toString`
13677 +
13678 string for architecture.
13679
13680 * `datatype OS.t`
13681 +
13682 operating systems
13683
13684 * `OS.fromString`
13685 +
13686 converts from string to operating system.  Case insensitive.
13687
13688 * `OS.host`
13689 +
13690 the operating system for which the program is compiled.
13691
13692 * `OS.toString`
13693 +
13694 string for operating system.
13695
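A brief sketch of typical uses: reporting the host platform and branching on it. The `pathSep` value is a hypothetical example of platform-dependent configuration, not part of the library.

[source,sml]
----
structure P = MLton.Platform

val () =
   print ("compiled for " ^ P.Arch.toString P.Arch.host
          ^ " / " ^ P.OS.toString P.OS.host ^ "\n")

(* Branch on the host operating system. *)
val pathSep =
   case P.OS.host of
      P.OS.MinGW => "\\"
    | _ => "/"

(* fromString ignores case and returns NONE for unknown names. *)
val linux = P.OS.fromString "LINUX"
val roundTrip = P.Arch.fromString (P.Arch.toString P.Arch.host)
----
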
13696 <<<
13697
13698 :mlton-guide-page: MLtonPointer
13699 [[MLtonPointer]]
13700 MLtonPointer
13701 ============
13702
13703 [source,sml]
13704 ----
13705 signature MLTON_POINTER =
13706    sig
13707       eqtype t
13708
13709       val add: t * word -> t
13710       val compare: t * t -> order
13711       val diff: t * t -> word
13712       val getInt8: t * int -> Int8.int
13713       val getInt16: t * int -> Int16.int
13714       val getInt32: t * int -> Int32.int
13715       val getInt64: t * int -> Int64.int
13716       val getPointer: t * int -> t
13717       val getReal32: t * int -> Real32.real
13718       val getReal64: t * int -> Real64.real
13719       val getWord8: t * int -> Word8.word
13720       val getWord16: t * int -> Word16.word
13721       val getWord32: t * int -> Word32.word
13722       val getWord64: t * int -> Word64.word
13723       val null: t
13724       val setInt8: t * int * Int8.int -> unit
13725       val setInt16: t * int * Int16.int -> unit
13726       val setInt32: t * int * Int32.int -> unit
13727       val setInt64: t * int * Int64.int -> unit
13728       val setPointer: t * int * t -> unit
13729       val setReal32: t * int * Real32.real -> unit
13730       val setReal64: t * int * Real64.real -> unit
13731       val setWord8: t * int * Word8.word -> unit
13732       val setWord16: t * int * Word16.word -> unit
13733       val setWord32: t * int * Word32.word -> unit
13734       val setWord64: t * int * Word64.word -> unit
13735       val sizeofPointer: word
13736       val sub: t * word -> t
13737    end
13738 ----
13739
13740 * `eqtype t`
13741 +
13742 the type of pointers, i.e. machine addresses.
13743
13744 * `add (p, w)`
13745 +
13746 returns the pointer `w` bytes after `p`.  Does not check for
13747 overflow.
13748
13749 * `compare (p1, p2)`
13750 +
13751 compares the pointer `p1` to the pointer `p2` (as addresses).
13752
13753 * `diff (p1, p2)`
13754 +
13755 returns the number of bytes `w` such that `add (p2, w) = p1`.  Does
13756 not check for overflow.
13757
13758 * ++get__<X>__ (p, i)++
13759 +
13760 returns the object stored at index `i` of the array of _X_ objects
13761 pointed to by `p`.  For example, `getWord32 (p, 7)` returns the 32-bit
13762 word stored 28 bytes beyond `p`.
13763
13764 * `null`
13765 +
13766 the null pointer, i.e. 0.
13767
13768 * ++set__<X>__ (p, i, v)++
13769 +
13770 assigns `v` to the object stored at index `i` of the array of _X_
13771 objects pointed to by `p`.  For example, `setWord32 (p, 7, w)` stores
13772 the 32-bit word `w` at the address 28 bytes beyond `p`.
13773
13774 * `sizeofPointer`
13775 +
13776 size, in bytes, of a pointer.
13777
13778 * `sub (p, w)`
13779 +
13780 returns the pointer `w` bytes before `p`.  Does not check for
13781 overflow.
13782
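The address arithmetic can be sketched without touching memory. The helper `word32Offset` is hypothetical; it computes the byte offset that `getWord32 (p, i)` would read from. Nothing is dereferenced here, because reading through `null` or any other invalid address is unsafe.

[source,sml]
----
structure Ptr = MLton.Pointer

(* Byte offset of element i in an array of 32-bit words. *)
fun word32Offset (i: int): word = Word.fromInt (i * 4)

val base = Ptr.null
val p = Ptr.add (base, word32Offset 7)

(* diff inverts add: p lies 28 bytes past base. *)
val bytes = Ptr.diff (p, base)
val back = Ptr.sub (p, bytes)
val order = Ptr.compare (base, p)
----
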
13783 <<<
13784
13785 :mlton-guide-page: MLtonProcEnv
13786 [[MLtonProcEnv]]
13787 MLtonProcEnv
13788 ============
13789
13790 [source,sml]
13791 ----
13792 signature MLTON_PROC_ENV =
13793    sig
13794       type gid
13795
13796       val setenv: {name: string, value: string} -> unit
13797       val setgroups: gid list -> unit
13798   end
13799 ----
13800
13801 * `setenv {name, value}`
13802 +
13803 like the C `setenv` function.  Does not require `name` or `value` to
13804 be null terminated.
13805
13806 * `setgroups grps`
13807 +
13808 like the C `setgroups` function.
13809
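A short sketch of `setenv`, assuming that the change is visible to `OS.Process.getEnv` in the same process. The variable name `GREETING` is arbitrary.

[source,sml]
----
val () = MLton.ProcEnv.setenv {name = "GREETING", value = "hello"}

(* Read the variable back through the Basis Library. *)
val greeting = OS.Process.getEnv "GREETING"

val () =
   case greeting of
      SOME v => print ("GREETING=" ^ v ^ "\n")
    | NONE => print "GREETING is unset\n"
----
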
13810 <<<
13811
13812 :mlton-guide-page: MLtonProcess
13813 [[MLtonProcess]]
13814 MLtonProcess
13815 ============
13816
13817 [source,sml]
13818 ----
13819 signature MLTON_PROCESS =
13820    sig
13821       type pid
13822
13823       val spawn: {args: string list, path: string} -> pid
13824       val spawne: {args: string list, env: string list, path: string} -> pid
13825       val spawnp: {args: string list, file: string} -> pid
13826
13827       type ('stdin, 'stdout, 'stderr) t
13828
13829       type input
13830       type output
13831
13832       type none
13833       type chain
13834       type any
13835
13836       exception MisuseOfForget
13837       exception DoublyRedirected
13838
13839       structure Child:
13840         sig
13841           type ('use, 'dir) t
13842
13843           val binIn: (BinIO.instream, input) t -> BinIO.instream
13844           val binOut: (BinIO.outstream, output) t -> BinIO.outstream
13845           val fd: (Posix.FileSys.file_desc, 'dir) t -> Posix.FileSys.file_desc
13846           val remember: (any, 'dir) t -> ('use, 'dir) t
13847           val textIn: (TextIO.instream, input) t -> TextIO.instream
13848           val textOut: (TextIO.outstream, output) t -> TextIO.outstream
13849         end
13850
13851       structure Param:
13852         sig
13853           type ('use, 'dir) t
13854
13855           val child: (chain, 'dir) Child.t -> (none, 'dir) t
13856           val fd: Posix.FileSys.file_desc -> (none, 'dir) t
13857           val file: string -> (none, 'dir) t
13858           val forget: ('use, 'dir) t -> (any, 'dir) t
13859           val null: (none, 'dir) t
13860           val pipe: ('use, 'dir) t
13861           val self: (none, 'dir) t
13862         end
13863
13864       val create:
13865          {args: string list,
13866           env: string list option,
13867           path: string,
13868           stderr: ('stderr, output) Param.t,
13869           stdin: ('stdin, input) Param.t,
13870           stdout: ('stdout, output) Param.t}
13871          -> ('stdin, 'stdout, 'stderr) t
13872       val getStderr: ('stdin, 'stdout, 'stderr) t -> ('stderr, input) Child.t
13873       val getStdin:  ('stdin, 'stdout, 'stderr) t -> ('stdin, output) Child.t
13874       val getStdout: ('stdin, 'stdout, 'stderr) t -> ('stdout, input) Child.t
13875       val kill: ('stdin, 'stdout, 'stderr) t * Posix.Signal.signal -> unit
13876       val reap: ('stdin, 'stdout, 'stderr) t -> Posix.Process.exit_status
13877    end
13878 ----
13879
13880
13881 == Spawn ==
13882
13883 The `spawn` functions provide an alternative to the
13884 `fork`/`exec` idiom that is typically used to create a new
13885 process.  On most platforms, the `spawn` functions are simple
13886 wrappers around `fork`/`exec`.  However, under Windows, the
13887 `spawn` functions are primitive.  All `spawn` functions return
13888 the process id of the spawned process.  They differ in how the
13889 executable is found and the environment that it uses.
13890
13891 * `spawn {args, path}`
13892 +
13893 starts a new process running the executable specified by `path`
13894 with the arguments `args`.  Like `Posix.Process.exec`.
13895
13896 * `spawne {args, env, path}`
13897 +
13898 starts a new process running the executable specified by `path` with
13899 the arguments `args` and environment `env`.  Like
13900 `Posix.Process.exece`.
13901
13902 * `spawnp {args, file}`
13903 +
13904 searches the `PATH` environment variable for an executable named `file`,
13905 and starts a new process running that executable with the arguments
13906 `args`.  Like `Posix.Process.execp`.
13907
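The following sketch spawns `echo` and waits for it. It rests on two assumptions that are not stated above: that `args` is the complete argument vector (so its first element is the program name, as with `Posix.Process.exec`), and that `MLton.Process.pid` is the same type as `Posix.Process.pid`, so that the child can be collected with `Posix.Process.waitpid`.

[source,sml]
----
(* Assumption: args is the full argv, so the first element is the
   program name. *)
val pid =
   MLton.Process.spawnp {file = "echo", args = ["echo", "hello from child"]}

(* Assumption: MLton.Process.pid = Posix.Process.pid. *)
val (_, status) = Posix.Process.waitpid (Posix.Process.W_CHILD pid, [])

val () =
   case status of
      Posix.Process.W_EXITED => print "child exited normally\n"
    | _ => print "child did not exit normally\n"
----
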
13908
13909 == Create ==
13910
13911 `MLton.Process.create` provides functionality similar to
13912 `Unix.executeInEnv`, but provides more control over the input,
13913 output, and error streams.  In addition, `create` works on all
13914 platforms, including Cygwin and MinGW (Windows) where `Posix.Process.fork` is
13915 unavailable.  For greatest portability, programs should still use the
13916 standard `Unix.execute`, `Unix.executeInEnv`, and `OS.Process.system`.
13917
13918 The following types and sub-structures are used by the `create`
13919 function.  They provide static type checking of correct stream usage.
13920
13921 === Child ===
13922
13923 * `('use, 'dir) Child.t`
13924 +
13925 This represents a handle to one of a child's standard streams. The
13926 `'dir` is viewed with respect to the parent. Thus a `('a, input)
13927 Child.t` handle means that the parent may input the output from the
13928 child.
13929
13930 * `Child.{bin,text}{In,Out} h`
13931 +
13932 These functions take a handle and bind it to a stream of the named
13933 type.  The type system will detect attempts to reverse the direction
13934 of a stream or to use the same stream in multiple, incompatible ways.
13935
13936 * `Child.fd h`
13937 +
13938 This function behaves like the other `Child.*` functions; it opens a
13939 stream. However, it does not enforce that you read or write from the
13940 handle. If you use the descriptor in an inappropriate direction, the
13941 behavior is undefined. Furthermore, this function may potentially be
13942 unavailable on future MLton host platforms.
13943
13944 * `Child.remember h`
13945 +
13946 This function takes a stream of use `any` and resets the use of the
13947 stream so that the stream may be used by `Child.*`. An `any` stream
13948 may have had use `none` or `'use` prior to calling `Param.forget`. If
13949 the stream was `none` and is used, `MisuseOfForget` is raised.
13950
13951 === Param ===
13952
13953 * `('use, 'dir) Param.t`
13954 +
13955 This is a handle to an input/output source and will be passed to the
13956 created child process. The `'dir` is relative to the child process.
13957 Input means that the child process will read from this stream.
13958
13959 * `Param.child h`
13960 +
13961 Connect the stream of the new child process to the stream of a
13962 previously created child process. A single child stream should be
13963 connected to only one child process or else `DoublyRedirected` will be
13964 raised.
13965
13966 * `Param.fd fd`
13967 +
13968 This creates a stream from the provided file descriptor which will be
13969 closed when `create` is called. This function may not be available on
13970 future MLton host platforms.
13971
13972 * `Param.forget h`
13973 +
13974 This hides the type of the actual parameter as `any`. This is useful
13975 if you are implementing an application which conditionally attaches
13976 the child process to files or pipes. However, you must ensure that
13977 your use after `Child.remember` matches the original type.
13978
13979 * `Param.file s`
13980 +
13981 Open the given file and connect it to the child process. Note that the
13982 file will be opened only when `create` is called. So any exceptions
13983 will be raised there and not by this function. If used for `input`,
13984 the file is opened read-only. If used for `output`, the file is opened
13985 read-write.
13986
13987 * `Param.null`
13988 +
13989 In some situations, the child process should have its output
13990 discarded.  The `null` param when passed as `stdout` or `stderr` does
13991 this.  When used for `stdin`, the child process will either receive
13992 `EOF` or a failure condition if it attempts to read from `stdin`.
13993
13994 * `Param.pipe`
13995 +
13996 This will connect the input/output of the child process to a pipe
13997 which the parent process holds. This may later form the input to one
13998 of the `Child.*` functions and/or the `Param.child` function.
13999
14000 * `Param.self`
14001 +
14002 This will connect the input/output of the child process to the
14003 corresponding stream of the parent process.
14004
14005 === Process ===
14006
14007 * `type ('stdin, 'stdout, 'stderr) t`
14008 +
14009 represents a handle to a child process.  The type arguments capture
14010 how the named stream of the child process may be used.
14011
14012 * `type any`
14013 +
14014 bypasses the type system in situations where an application does not
14015 want it to enforce correct usage.  See `Child.remember` and
14016 `Param.forget`.
14017
14018 * `type chain`
14019 +
14020 means that the child process's stream was connected via a pipe to the
14021 parent process. The parent process may pass this pipe in turn to
14022 another child, thus chaining them together.
14023
14024 * `type input, output`
14025 +
14026 record the direction that a stream flows.  They are used as a part of
14027 `Param.t` and `Child.t` and are detailed there.
14028
14029 * `type none`
14030 +
14031 means that the child process's stream may not be used by the parent
14032 process.  This happens when the child process is connected directly to
14033 some source.
14034 +
14035 The types `BinIO.instream`, `BinIO.outstream`, `TextIO.instream`,
14036 `TextIO.outstream`, and `Posix.FileSys.file_desc` are also valid types
14037 with which to instantiate child streams.
14038
14039 * `exception MisuseOfForget`
14040 +
14041 may be raised if `Child.remember` and `Param.forget` are used to
14042 bypass the normal type checking.  This exception will only be raised
14043 in cases where the `forget` mechanism allows a misuse that would be
14044 impossible with the type-safe versions.
14045
14046 * `exception DoublyRedirected`
14047 +
14048 raised if a stream connected to a child process is redirected to two
14049 separate child processes.  It is safe, though bad style, to use a
14050 `Child.t` with the same `Child.*` function repeatedly.
14051
14052 * `create {args, path, env, stderr, stdin, stdout}`
14053 +
14054 starts a child process with the given command-line `args` (excluding
14055 the program name). `path` should be an absolute path to the executable
14056 run in the new child process; relative paths work, but are less
14057 robust.  Optionally, the environment may be overridden with `env`
14058 where each string element has the form `"key=value"`. The `std*`
14059 options must be provided by the `Param.*` functions documented above.
14060 +
14061 Processes which are `create`-d must be either `reap`-ed or `kill`-ed.
14062
14063 * `getStd{in,out,err} proc`
14064 +
14065 gets a handle to the specified stream. These should be used by the
14066 `Child.*` functions. Failure to use a stream connected via pipe to a
14067 child process may result in run-time deadlock and elicits a compiler
14068 warning.
14069
14070 * `kill (proc, sig)`
14071 +
14072 terminates the child process immediately.  The signal may or may not
14073 mean anything depending on the host platform.  A good value is
14074 `Posix.Signal.term`.
14075
14076 * `reap proc`
14077 +
14078 waits for the child process to terminate and returns its exit status.
14079
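As a compact illustration of `create`, `getStdout`, and `reap` together, the sketch below runs a single child on a Unix-like system. The path `/bin/echo` is an assumption about the host; adjust it as needed.

[source,sml]
----
structure P = MLton.Process

(* Hypothetical Unix path; adjust for the host system. *)
val p =
   P.create {args = ["hello"],
             env = NONE,
             path = "/bin/echo",
             stderr = P.Param.self,
             stdin = P.Param.null,
             stdout = P.Param.pipe}

(* Drain the pipe before reaping, so that the child never blocks on
   a full pipe buffer. *)
val text = TextIO.inputAll (P.Child.textIn (P.getStdout p))
val status = P.reap p

val () = print ("child wrote: " ^ text)
----

Reading the child's output to the end before calling `reap` keeps the parent from waiting on a child that is itself blocked writing to the pipe.
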
14080
14081 == Important usage notes ==
14082
14083 When building an application with many pipes between child processes,
14084 it is important to ensure that there are no cycles in the undirected
14085 pipe graph.  If this property is not maintained, deadlock becomes a
14086 serious risk, and it may appear only under conditions that are
14087 difficult to reproduce.
14088
14089 The danger lies in that most operating systems implement pipes with a
14090 fixed buffer size. If process A has two output pipes which process B
14091 reads, it can happen that process A blocks writing to pipe 2 because
14092 it is full while process B blocks reading from pipe 1 because it is
14093 empty. This same situation can happen with any undirected cycle formed
14094 between processes (vertexes) and pipes (undirected edges) in the
14095 graph.
14096
14097 It is possible to make this safe using low-level I/O primitives for
14098 polling.  However, these primitives are not very portable and are
14099 difficult to use properly.  A far better approach is to make sure you
14100 never create a cycle in the first place.
14101
14102 For these reasons, `Unix.executeInEnv` is a very dangerous
14103 function. Be careful when using it to ensure that the child process
14104 only operates on either `stdin` or `stdout`, but not both.
14105
14106
14107 == Example use of MLton.Process.create ==
14108
14109 The following example program launches the `ipconfig` utility, pipes
14110 its output through `grep`, and then reads the result back into the
14111 program.
14112
14113 [source,sml]
14114 ----
14115 open MLton.Process
14116 val p =
14117         create {args = [ "/all" ],
14118                 env = NONE,
14119                 path = "C:\\WINDOWS\\system32\\ipconfig.exe",
14120                 stderr = Param.self,
14121                 stdin = Param.null,
14122                 stdout = Param.pipe}
14123 val q =
14124         create {args = [ "IP-Ad" ],
14125                 env = NONE,
14126                 path = "C:\\msys\\bin\\grep.exe",
14127                 stderr = Param.self,
14128                 stdin = Param.child (getStdout p),
14129                 stdout = Param.pipe}
14130 fun suck h =
14131         case TextIO.inputLine h of
14132                 NONE => ()
14133                 | SOME s => (print ("'" ^ s ^ "'\n"); suck h)
14134
14135 val () = suck (Child.textIn (getStdout q))
14136 ----
14137
14138 <<<
14139
14140 :mlton-guide-page: MLtonProfile
14141 [[MLtonProfile]]
14142 MLtonProfile
14143 ============
14144
14145 [source,sml]
14146 ----
14147 signature MLTON_PROFILE =
14148    sig
14149       structure Data:
14150          sig
14151             type t
14152
14153             val equals: t * t -> bool
14154             val free: t -> unit
14155             val malloc: unit -> t
14156             val write: t * string -> unit
14157          end
14158
14159       val isOn: bool
14160       val withData: Data.t * (unit -> 'a) -> 'a
14161    end
14162 ----
14163
14164 `MLton.Profile` provides <:Profiling:> control from within the
14165 program, allowing you to profile individual portions of your
14166 program. With `MLton.Profile`, you can create many units of profiling
14167 data (essentially, mappings from functions to counts) during a run of
14168 a program, switch between them while the program is running, and
14169 output multiple `mlmon.out` files.
14170
14171 * `isOn`
14172 +
14173 a compile-time constant that is false only when compiling `-profile no`.
14174
14175 * `type Data.t`
14176 +
14177 the type of a unit of profiling data.  In order to most efficiently
14178 execute non-profiled programs, when compiling `-profile no` (the
14179 default), `Data.t` is equivalent to `unit ref`.
14180
14181 * `Data.equals (x, y)`
14182 +
14183 returns true if `x` and `y` are the same unit of profiling data.
14184
14185 * `Data.free x`
14186 +
14187 frees the memory associated with the unit of profiling data `x`.  It
14188 is an error to free the current unit of profiling data or to free a
14189 previously freed unit of profiling data.  When compiling
14190 `-profile no`, `Data.free x` is a no-op.
14191
14192 * `Data.malloc ()`
14193 +
14194 returns a new unit of profiling data.  Each unit of profiling data is
14195 allocated from the process address space (but is _not_ in the MLton
14196 heap) and consumes memory proportional to the number of source
14197 functions.  When compiling `-profile no`, `Data.malloc ()` is
14198 equivalent to allocating a new `unit ref`.
14199
14200 * `Data.write (x, f)`
14201 +
14202 writes the accumulated ticks in the unit of profiling data `x` to file
14203 `f`.  It is an error to write a previously freed unit of profiling
14204 data.  When compiling `-profile no`, `Data.write (x, f)` is a no-op.  A
14205 profiled program will always write the current unit of profiling data
14206 at program exit to a file named `mlmon.out`.
14207
14208 * `withData (d, f)`
14209 +
14210 runs `f` with `d` as the unit of profiling data, and returns the
14211 result of `f` after restoring the current unit of profiling data.
14212 When compiling `-profile no`, `withData (d, f)` is equivalent to
14213 `f ()`.
14214
14215
14216 == Example ==
14217
14218 Here is an example, taken from the `examples/profiling` directory,
14219 showing how to profile the executions of the `fib` and `tak` functions
14220 separately.  Suppose that `fib-tak.sml` contains the following.
14221 [source,sml]
14222 ----
14223 structure Profile = MLton.Profile
14224
14225 val fibData = Profile.Data.malloc ()
14226 val takData = Profile.Data.malloc ()
14227
14228 fun wrap (f, d) x =
14229    Profile.withData (d, fn () => f x)
14230
14231 val rec fib =
14232    fn 0 => 0
14233     | 1 => 1
14234     | n => fib (n - 1) + fib (n - 2)
14235 val fib = wrap (fib, fibData)
14236
14237 fun tak (x,y,z) =
14238    if not (y < x)
14239       then z
14240    else tak (tak (x - 1, y, z),
14241              tak (y - 1, z, x),
14242              tak (z - 1, x, y))
14243 val tak = wrap (tak, takData)
14244
14245 val rec f =
14246    fn 0 => ()
14247     | n => (fib 38; f (n-1))
14248 val _ = f 2
14249
14250 val rec g =
14251    fn 0 => ()
14252     | n => (tak (18,12,6); g (n-1))
14253 val _ = g 500
14254
14255 fun done (data, file) =
14256    (Profile.Data.write (data, file)
14257     ; Profile.Data.free data)
14258
14259 val _ = done (fibData, "mlmon.fib.out")
14260 val _ = done (takData, "mlmon.tak.out")
14261 ----
14262
14263 Compile and run the program.
14264 ----
14265 % mlton -profile time fib-tak.sml
14266 % ./fib-tak
14267 ----
14268
14269 Separately display the profiling data for `fib`
14270 ----
14271 % mlprof fib-tak mlmon.fib.out
14272 5.77 seconds of CPU time (0.00 seconds GC)
14273 function   cur
14274 --------- -----
14275 fib       96.9%
14276 <unknown>  3.1%
14277 ----
14278 and for `tak`
14279 ----
14280 % mlprof fib-tak mlmon.tak.out
14281 0.68 seconds of CPU time (0.00 seconds GC)
14282 function  cur
14283 -------- ------
14284 tak      100.0%
14285 ----
14286
14287 Combine the data for `fib` and `tak` by calling `mlprof`
14288 with multiple `mlmon.out` files.
14289 ----
14290 % mlprof fib-tak mlmon.fib.out mlmon.tak.out mlmon.out
14291 6.45 seconds of CPU time (0.00 seconds GC)
14292 function   cur
14293 --------- -----
14294 fib       86.7%
14295 tak       10.5%
14296 <unknown>  2.8%
14297 ----
14298
14299 <<<
14300
14301 :mlton-guide-page: MLtonRandom
14302 [[MLtonRandom]]
14303 MLtonRandom
14304 ===========
14305
14306 [source,sml]
14307 ----
14308 signature MLTON_RANDOM =
14309    sig
14310       val alphaNumChar: unit -> char
14311       val alphaNumString: int -> string
14312       val rand: unit -> word
14313       val seed: unit -> word option
14314       val srand: word -> unit
14315       val useed: unit -> word option
14316    end
14317 ----
14318
14319 * `alphaNumChar ()`
14320 +
14321 returns a random alphanumeric character.
14322
14323 * `alphaNumString n`
14324 +
14325 returns a string of length `n` of random alphanumeric characters.
14326
14327 * `rand ()`
14328 +
14329 returns the next pseudo-random number.
14330
14331 * `seed ()`
14332 +
returns a random word from `/dev/random`.  Useful as an argument to
`srand`.  If `/dev/random` cannot be read, `seed ()` returns
14335 `NONE`.  A call to `seed` may block until enough random bits are
14336 available.
14337
14338 * `srand w`
14339 +
14340 sets the seed used by `rand` to `w`.
14341
14342 * `useed ()`
14343 +
returns a random word from `/dev/urandom`.  Useful as an argument to
`srand`.  If `/dev/urandom` cannot be read, `useed ()` returns
14346 `NONE`.  A call to `useed` will never block -- it will instead return
14347 lower quality random bits.
14348
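As a minimal sketch of how these functions fit together, the following
seeds the generator from `/dev/urandom` when it is readable and then
draws a random token.  The fallback seed `0w42` and the token length
`16` are arbitrary choices made for illustration.

[source,sml]
----
(* Prefer a seed from /dev/urandom; fall back to a fixed (assumed) seed. *)
val () =
   case MLton.Random.useed () of
      SOME w => MLton.Random.srand w
    | NONE => MLton.Random.srand 0w42

(* A 16-character string drawn from the alphanumeric characters. *)
val token = MLton.Random.alphaNumString 16
val () = print (token ^ "\n")
----
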
14349 <<<
14350
14351 :mlton-guide-page: MLtonReal
14352 [[MLtonReal]]
14353 MLtonReal
14354 =========
14355
14356 [source,sml]
14357 ----
14358 signature MLTON_REAL =
14359    sig
14360       type t
14361
14362       val fromWord: word -> t
14363       val fromLargeWord: LargeWord.word -> t
14364       val toWord: IEEEReal.rounding_mode -> t -> word
14365       val toLargeWord: IEEEReal.rounding_mode -> t -> LargeWord.word
14366    end
14367 ----
14368
14369 * `type t`
14370 +
14371 the type of reals.  For `MLton.LargeReal` this is `LargeReal.real`,
14372 for `MLton.Real` this is `Real.real`, for `MLton.Real32` this is
14373 `Real32.real`, for `MLton.Real64` this is `Real64.real`.
14374
14375 * `fromWord w`
14376 * `fromLargeWord w`
14377 +
14378 convert the word `w` to a real value.  If the value of `w` is larger
14379 than (the appropriate) `REAL.maxFinite`, then infinity is returned.
14380 If `w` cannot be exactly represented as a real value, then the current
14381 rounding mode is used to determine the resulting value.
14382
14383 * `toWord mode r`
14384 * `toLargeWord mode r`
14385 +
14386 convert the argument `r` to a word type using the specified rounding
14387 mode. They raise `Overflow` if the result is not representable, in
14388 particular, if `r` is an infinity. They raise `Domain` if `r` is NaN.
14389
14390 * `MLton.Real32.castFromWord w`
14391 * `MLton.Real64.castFromWord w`
14392 +
14393 convert the argument `w` to a real type as a bit-wise cast.
14394
14395 * `MLton.Real32.castToWord r`
14396 * `MLton.Real64.castToWord r`
14397 +
14398 convert the argument `r` to a word type as a bit-wise cast.
14399
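The difference between value conversion and bit-wise casting can be
sketched as follows.  The particular inputs (`0w42`, `2.7`, `1.0`) are
chosen only for illustration, and the sketch assumes the default
setting in which `real` is a 64-bit type.

[source,sml]
----
(* Value conversion: the word 42 becomes the real 42.0. *)
val r = MLton.Real.fromWord 0w42

(* The rounding mode selects which neighbouring word is returned. *)
val down = MLton.Real.toWord IEEEReal.TO_NEGINF 2.7
val up = MLton.Real.toWord IEEEReal.TO_POSINF 2.7

(* Bit-wise cast: exposes the IEEE 754 encoding of 1.0 and back. *)
val bits = MLton.Real64.castToWord 1.0
val one = MLton.Real64.castFromWord bits

val () =
   print (concat [Real.toString r, " ",
                  Word.toString down, " ",
                  Word.toString up, " ",
                  Word64.toString bits, "\n"])
----
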
14400 <<<
14401
14402 :mlton-guide-page: MLtonRlimit
14403 [[MLtonRlimit]]
14404 MLtonRlimit
14405 ===========
14406
14407 [source,sml]
14408 ----
14409 signature MLTON_RLIMIT =
14410    sig
14411       structure RLim : sig
14412                           type t
14413                           val castFromSysWord: SysWord.word -> t
14414                           val castToSysWord: t -> SysWord.word
14415                        end
14416
14417       val infinity: RLim.t
14418
14419       type t
14420
14421       val coreFileSize: t        (* CORE    max core file size *)
14422       val cpuTime: t             (* CPU     CPU time in seconds *)
14423       val dataSize: t            (* DATA    max data size *)
14424       val fileSize: t            (* FSIZE   Maximum filesize *)
14425       val numFiles: t            (* NOFILE  max number of open files *)
14426       val lockedInMemorySize: t  (* MEMLOCK max locked address space *)
14427       val numProcesses: t        (* NPROC   max number of processes *)
14428       val residentSetSize: t     (* RSS     max resident set size *)
14429       val stackSize: t           (* STACK   max stack size *)
14430       val virtualMemorySize: t   (* AS      virtual memory limit *)
14431
      val get: t -> {hard: RLim.t, soft: RLim.t}
      val set: t * {hard: RLim.t, soft: RLim.t} -> unit
14434    end
14435 ----
14436
14437 `MLton.Rlimit` provides a wrapper around the C `getrlimit` and
14438 `setrlimit` functions.
14439
* `type RLim.t`
14441 +
14442 the type of resource limits.
14443
14444 * `infinity`
14445 +
14446 indicates that a resource is unlimited.
14447
14448 * `type t`
14449 +
14450 the types of resources that can be inspected and modified.
14451
14452 * `get r`
14453 +
14454 returns the current hard and soft limits for resource `r`. May raise
14455 `OS.SysErr`.
14456
14457 * `set (r, {hard, soft})`
14458 +
14459 sets the hard and soft limits for resource `r`.  May raise
14460 `OS.SysErr`.
14461
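A short sketch of inspecting and adjusting a limit follows.  The
helper `show` is a name introduced here, and raising the soft limit to
the hard limit is only one possible policy; the operating system may
refuse the request, in which case `OS.SysErr` is reported.

[source,sml]
----
structure Rlimit = MLton.Rlimit

(* Hypothetical helper: render a limit as a decimal string. *)
fun show (l: Rlimit.RLim.t): string =
   SysWord.fmt StringCvt.DEC (Rlimit.RLim.castToSysWord l)

val {hard, soft} = Rlimit.get Rlimit.numFiles
val () =
   print (concat ["open files: soft = ", show soft,
                  ", hard = ", show hard, "\n"])

(* Try to raise the soft limit to the hard limit. *)
val () =
   Rlimit.set (Rlimit.numFiles, {hard = hard, soft = hard})
   handle OS.SysErr (msg, _) => print ("setrlimit failed: " ^ msg ^ "\n")
----
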
14462 <<<
14463
14464 :mlton-guide-page: MLtonRusage
14465 [[MLtonRusage]]
14466 MLtonRusage
14467 ===========
14468
14469 [source,sml]
14470 ----
14471 signature MLTON_RUSAGE =
14472    sig
14473       type t = {utime: Time.time, (* user time *)
14474                 stime: Time.time} (* system time *)
14475
14476       val measureGC: bool -> unit
14477       val rusage: unit -> {children: t, gc: t, self: t}
14478    end
14479 ----
14480
14481 * `type t`
14482 +
14483 corresponds to a subset of the C `struct rusage`.
14484
14485 * `measureGC b`
14486 +
14487 controls whether garbage collection time is separately measured during
14488 program execution.  This affects the behavior of both `rusage` and
14489 `Timer.checkCPUTimes`, both of which will return gc times of zero with
14490 `measureGC false`.  Garbage collection time is always measured when
14491 either `gc-messages` or `gc-summary` is given as a
14492 <:RunTimeOptions:runtime system option>.
14493
14494 * `rusage ()`
14495 +
14496 corresponds to the C `getrusage` function.  It returns the resource
14497 usage of the exited children, the garbage collector, and the process
14498 itself.  The `self` component includes the usage of the `gc`
14499 component, regardless of whether `measureGC` is `true` or `false`.  If
14500 `rusage` is used in a program, either directly, or indirectly via the
14501 `Timer` structure, then `measureGC true` is automatically called at
the start of the program (it can still be disabled by user code later).
14503
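The following sketch reads the resource usage after some work has
been done.  The workload (summing a large list) is an arbitrary
stand-in, and the reported times depend on the machine.

[source,sml]
----
structure Rusage = MLton.Rusage

(* Ask for garbage collection time to be tracked separately. *)
val () = Rusage.measureGC true

(* Some allocation-heavy work so that there is something to measure. *)
val total = List.foldl (op +) 0 (List.tabulate (1000000, fn i => i mod 7))

val {self, gc, ...} = Rusage.rusage ()
val () =
   print (concat ["total = ", Int.toString total, "\n",
                  "user time:    ", Time.toString (#utime self), "s\n",
                  "system time:  ", Time.toString (#stime self), "s\n",
                  "GC user time: ", Time.toString (#utime gc), "s\n"])
----
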
14504 <<<
14505
14506 :mlton-guide-page: MLtonSignal
14507 [[MLtonSignal]]
14508 MLtonSignal
14509 ===========
14510
14511 [source,sml]
14512 ----
14513 signature MLTON_SIGNAL =
14514    sig
14515       type t = Posix.Signal.signal
14516       type signal = t
14517
14518       structure Handler:
14519          sig
14520             type t
14521
14522             val default: t
14523             val handler: (Thread.Runnable.t -> Thread.Runnable.t) -> t
14524             val ignore: t
14525             val isDefault: t -> bool
14526             val isIgnore: t -> bool
14527             val simple: (unit -> unit) -> t
14528          end
14529
14530       structure Mask:
14531          sig
14532             type t
14533
14534             val all: t
14535             val allBut: signal list -> t
14536             val block: t -> unit
14537             val getBlocked: unit -> t
14538             val isMember: t * signal -> bool
14539             val none: t
14540             val setBlocked: t -> unit
14541             val some: signal list -> t
14542             val unblock: t -> unit
14543          end
14544
14545       val getHandler: t -> Handler.t
14546       val handled: unit -> Mask.t
14547       val prof: t
14548       val restart: bool ref
14549       val setHandler: t * Handler.t -> unit
14550       val suspend: Mask.t -> unit
14551       val vtalrm: t
14552    end
14553 ----
14554
Signal handlers are functions from (runnable) threads to (runnable)
14556 threads.  When a signal arrives, the corresponding signal handler is
14557 invoked, its argument being the thread that was interrupted by the
14558 signal.  The signal handler runs asynchronously, in its own thread.
14559 The signal handler returns the thread that it would like to resume
14560 execution (this is often the thread that it was passed).  It is an
14561 error for a signal handler to raise an exception that is not handled
14562 within the signal handler itself.
14563
14564 A signal handler is never invoked while the running thread is in a
14565 critical section (see <:MLtonThread:>).  Invoking a signal handler
14566 implicitly enters a critical section and the normal return of a signal
14567 handler implicitly exits the critical section; hence, a signal handler
14568 is never interrupted by another signal handler.
14569
14570 * `type t`
14571 +
14572 the type of signals.
14573
14574 * `type Handler.t`
14575 +
14576 the type of signal handlers.
14577
14578 * `Handler.default`
14579 +
14580 handles the signal with the default action.
14581
14582 * `Handler.handler f`
14583 +
14584 returns a handler `h` such that when a signal `s` is handled by `h`,
14585 `f` will be passed the thread that was interrupted by `s` and should
14586 return the thread that will resume execution.
14587
14588 * `Handler.ignore`
14589 +
14590 is a handler that will ignore the signal.
14591
14592 * `Handler.isDefault`
14593 +
14594 returns true if the handler is the default handler.
14595
14596 * `Handler.isIgnore`
14597 +
14598 returns true if the handler is the ignore handler.
14599
14600 * `Handler.simple f`
14601 +
14602 returns a handler that executes `f ()` and does not switch threads.
14603
14604 * `type Mask.t`
14605 +
14606 the type of signal masks, which are sets of blocked signals.
14607
14608 * `Mask.all`
14609 +
14610 a mask of all signals.
14611
14612 * `Mask.allBut l`
14613 +
14614 a mask of all signals except for those in `l`.
14615
14616 * `Mask.block m`
14617 +
14618 blocks all signals in `m`.
14619
14620 * `Mask.getBlocked ()`
14621 +
14622 gets the signal mask `m`, i.e. a signal is blocked if and only if it
14623 is in `m`.
14624
14625 * `Mask.isMember (m, s)`
14626 +
14627 returns true if the signal `s` is in `m`.
14628
14629 * `Mask.none`
14630 +
14631 a mask of no signals.
14632
14633 * `Mask.setBlocked m`
14634 +
14635 sets the signal mask to `m`, i.e. a signal is blocked if and only if
14636 it is in `m`.
14637
14638 * `Mask.some l`
14639 +
14640 a mask of the signals in `l`.
14641
14642 * `Mask.unblock m`
14643 +
14644 unblocks all signals in `m`.
14645
14646 * `getHandler s`
14647 +
14648 returns the current handler for signal `s`.
14649
14650 * `handled ()`
14651 +
14652 returns the signal mask `m` corresponding to the currently handled
14653 signals; i.e., a signal is handled if and only if it is in `m`.
14654
14655 * `prof`
14656 +
14657 `SIGPROF`, the profiling signal.
14658
14659 * `restart`
14660 +
14661 dynamically determines the behavior of interrupted system calls; when
14662 `true`, interrupted system calls are restarted; when `false`,
interrupted system calls raise `OS.SysErr`.
14664
14665 * `setHandler (s, h)`
14666 +
14667 sets the handler for signal `s` to `h`.
14668
14669 * `suspend m`
14670 +
14671 temporarily sets the signal mask to `m` and suspends until an unmasked
14672 signal is received and handled, at which point `suspend` resets the
14673 mask and returns.
14674
14675 * `vtalrm`
14676 +
14677 `SIGVTALRM`, the signal for virtual timers.
14678
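A minimal sketch of installing a handler and using a mask follows.
The flag `interrupted` and the choice of `SIGINT` are assumptions made
for illustration; the handler only records that the signal arrived.

[source,sml]
----
structure Signal = MLton.Signal

val interrupted = ref false

(* Handler.simple runs the thunk and resumes the interrupted thread. *)
val () =
   Signal.setHandler
   (Posix.Signal.int, Signal.Handler.simple (fn () => interrupted := true))

(* Block SIGINT around a section that should not be interrupted. *)
val sigint = Signal.Mask.some [Posix.Signal.int]
val () = Signal.Mask.block sigint
(* ... work that must not observe `interrupted` changing ... *)
val () = Signal.Mask.unblock sigint

(* Restore the default action when the handler is no longer needed. *)
val () = Signal.setHandler (Posix.Signal.int, Signal.Handler.default)
----
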
14679
14680 == Interruptible System Calls ==
14681
14682 Signal handling interacts in a non-trivial way with those functions in
14683 the <:BasisLibrary:Basis Library> that correspond directly to
14684 interruptible system calls (a subset of those functions that may raise
`OS.SysErr`).  The desire is that these functions should have
14686 predictable semantics.  The principal concerns are:
14687
14688 1. System calls that are interrupted by signals should, by default, be
14689 restarted; the alternative is to raise
14690 +
14691 [source,sml]
14692 ----
OS.SysErr (Posix.Error.errorMsg Posix.Error.intr,
           SOME Posix.Error.intr)
14695 ----
14696 +
14697 This behavior is determined dynamically by the value of `Signal.restart`.
14698
14699 2. Signal handlers should always get a chance to run (when outside a
14700 critical region).  If a system call is interrupted by a signal, then
14701 the signal handler will run before the call is restarted or
`OS.SysErr` is raised; that is, before the `Signal.restart` check.
14703
14704 3. A system call that must be restarted while in a critical section
14705 will be restarted with the handled signals blocked (and the previously
14706 blocked signals remembered).  This encourages the system call to
14707 complete, allowing the program to make progress towards leaving the
14708 critical section where the signal can be handled.  If the system call
completes, the set of blocked signals is restored to those previously
14710 blocked.
14711
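The non-restarting mode described above can be sketched as follows.
The function `readChunk` and its buffer size of 1024 bytes are
assumptions introduced for illustration; the interruption is reported
through the Basis Library exception `OS.SysErr` carrying
`Posix.Error.intr`.

[source,sml]
----
(* Ask for interrupted calls to be reported rather than restarted. *)
val () = MLton.Signal.restart := false

(* Read up to 1024 bytes; NONE means a signal interrupted the read. *)
fun readChunk (fd: Posix.IO.file_desc): Word8Vector.vector option =
   SOME (Posix.IO.readVec (fd, 1024))
   handle exn as OS.SysErr (_, SOME e) =>
      if e = Posix.Error.intr then NONE else raise exn
----

A caller would typically retry on `NONE` after inspecting whatever
state its signal handler updated.
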
14712 <<<
14713
14714 :mlton-guide-page: MLtonStructure
14715 [[MLtonStructure]]
14716 MLtonStructure
14717 ==============
14718
14719 The `MLton` structure contains a lot of functionality that is not
14720 available in the <:BasisLibrary:Basis Library>.  As a warning,
14721 please keep in mind that the `MLton` structure and its
14722 substructures do change from release to release of MLton.
14723
14724 [source,sml]
14725 ----
14726 structure MLton:
14727    sig
14728       val eq: 'a * 'a -> bool
14729       val equal: 'a * 'a -> bool
14730       val hash: 'a -> Word32.word
14731       val isMLton: bool
14732       val share: 'a -> unit
14733       val shareAll: unit -> unit
14734       val size: 'a -> int
14735
14736       structure Array: MLTON_ARRAY
14737       structure BinIO: MLTON_BIN_IO
14738       structure CharArray: MLTON_MONO_ARRAY where type t = CharArray.array
14739                                             where type elem = CharArray.elem
14740       structure CharVector: MLTON_MONO_VECTOR where type t = CharVector.vector
14741                                               where type elem = CharVector.elem
14742       structure Cont: MLTON_CONT
14743       structure Exn: MLTON_EXN
14744       structure Finalizable: MLTON_FINALIZABLE
14745       structure GC: MLTON_GC
14746       structure IntInf: MLTON_INT_INF
14747       structure Itimer: MLTON_ITIMER
14748       structure LargeReal: MLTON_REAL where type t = LargeReal.real
14749       structure LargeWord: MLTON_WORD where type t = LargeWord.word
14750       structure Platform: MLTON_PLATFORM
14751       structure Pointer: MLTON_POINTER
14752       structure ProcEnv: MLTON_PROC_ENV
14753       structure Process: MLTON_PROCESS
14754       structure Profile: MLTON_PROFILE
14755       structure Random: MLTON_RANDOM
14756       structure Real: MLTON_REAL where type t = Real.real
14757       structure Real32: sig
14758                            include MLTON_REAL
14759                            val castFromWord: Word32.word -> t
14760                            val castToWord: t -> Word32.word
14761                         end where type t = Real32.real
14762       structure Real64: sig
14763                            include MLTON_REAL
14764                            val castFromWord: Word64.word -> t
14765                            val castToWord: t -> Word64.word
14766                         end where type t = Real64.real
14767       structure Rlimit: MLTON_RLIMIT
14768       structure Rusage: MLTON_RUSAGE
14769       structure Signal: MLTON_SIGNAL
14770       structure Syslog: MLTON_SYSLOG
14771       structure TextIO: MLTON_TEXT_IO
14772       structure Thread: MLTON_THREAD
14773       structure Vector: MLTON_VECTOR
14774       structure Weak: MLTON_WEAK
14775       structure Word: MLTON_WORD where type t = Word.word
14776       structure Word8: MLTON_WORD where type t = Word8.word
14777       structure Word16: MLTON_WORD where type t = Word16.word
14778       structure Word32: MLTON_WORD where type t = Word32.word
14779       structure Word64: MLTON_WORD where type t = Word64.word
14780       structure Word8Array: MLTON_MONO_ARRAY where type t = Word8Array.array
14781                                              where type elem = Word8Array.elem
14782       structure Word8Vector: MLTON_MONO_VECTOR where type t = Word8Vector.vector
14783                                                where type elem = Word8Vector.elem
14784       structure World: MLTON_WORLD
14785    end
14786 ----
14787
14788
14789 == Substructures ==
14790
14791 * <:MLtonArray:>
14792 * <:MLtonBinIO:>
14793 * <:MLtonCont:>
14794 * <:MLtonExn:>
14795 * <:MLtonFinalizable:>
14796 * <:MLtonGC:>
14797 * <:MLtonIntInf:>
14798 * <:MLtonIO:>
14799 * <:MLtonItimer:>
14800 * <:MLtonMonoArray:>
14801 * <:MLtonMonoVector:>
14802 * <:MLtonPlatform:>
14803 * <:MLtonPointer:>
14804 * <:MLtonProcEnv:>
14805 * <:MLtonProcess:>
14806 * <:MLtonRandom:>
14807 * <:MLtonReal:>
14808 * <:MLtonRlimit:>
14809 * <:MLtonRusage:>
14810 * <:MLtonSignal:>
14811 * <:MLtonSyslog:>
14812 * <:MLtonTextIO:>
14813 * <:MLtonThread:>
14814 * <:MLtonVector:>
14815 * <:MLtonWeak:>
14816 * <:MLtonWord:>
14817 * <:MLtonWorld:>
14818
14819 == Values ==
14820
14821 * `eq (x, y)`
14822 +
14823 returns true if `x` and `y` are equal as pointers.  For simple types
like `char`, `int`, and `word`, this is the same as equality.  For
14825 arrays, datatypes, strings, tuples, and vectors, this is a simple
14826 pointer equality.  The semantics is a bit murky.
14827
14828 * `equal (x, y)`
14829 +
14830 returns true if `x` and `y` are structurally equal.  For equality
14831 types, this is the same as <:PolymorphicEquality:>.  For other types,
14832 it is a conservative approximation of equivalence.
14833
14834 * `hash x`
14835 +
14836 returns a structural hash of `x`.  The hash function is consistent
between executions of the same program, but may not be consistent
14838 between different programs.
14839
14840 * `isMLton`
14841 +
14842 is always `true` in a MLton implementation, and is always `false` in a
14843 stub implementation.
14844
14845 * `share x`
14846 +
14847 maximizes sharing in the heap for the object graph reachable from `x`.
14848
14849 * `shareAll ()`
14850 +
14851 maximizes sharing in the heap by sharing space for equivalent
14852 immutable objects.  A call to `shareAll` performs a major garbage
14853 collection, and takes time proportional to the size of the heap.
14854
14855 * `size x`
14856 +
14857 returns the amount of heap space (in bytes) taken by the value of `x`,
14858 including all objects reachable from `x` by following pointers.  It
14859 takes time proportional to the size of `x`.  See below for an example.
14860
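The relationship between `eq`, `equal`, and `hash` can be sketched as
follows.  The two lists are illustrative; `ys` is built separately so
that it is structurally equal to `xs` without being the same binding.

[source,sml]
----
val xs = [1, 2, 3]
val ys = List.tabulate (3, fn i => i + 1)   (* a separately built [1, 2, 3] *)

(* A value is always pointer-equal to itself. *)
val samePointer = MLton.eq (xs, xs)

(* Structural equality and structural hashing agree on xs and ys. *)
val sameShape = MLton.equal (xs, ys)
val sameHash = MLton.hash xs = MLton.hash ys

val () =
   print (concat [Bool.toString samePointer, " ",
                  Bool.toString sameShape, " ",
                  Bool.toString sameHash, "\n"])
----

Whether `MLton.eq (xs, ys)` holds is deliberately not shown, since it
depends on how the compiler represents the two lists.
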
14861
14862 == <!Anchor(size)>Example of `MLton.size` ==
14863
14864 This example, `size.sml`, demonstrates the application of `MLton.size`
14865 to many different kinds of objects.
14866 [source,sml]
14867 ----
14868 sys::[./bin/InclGitFile.py mlton master doc/examples/size/size.sml]
14869 ----
14870
14871 Compile and run as usual.
14872 ----
14873 % mlton size.sml
14874 % ./size
14875 The size of an int list of length 4 is 48 bytes.
14876 The size of a string of length 10 is 24 bytes.
14877 The size of an int array of length 10 is 52 bytes.
14878 The size of a double array of length 10 is 92 bytes.
14879 The size of an array of length 10 of 2-ples of ints is 92 bytes.
14880 The size of a useless function is 0 bytes.
14881 The size of a continuation option ref is 4544 bytes.
14882 13
14883 The size of a continuation option ref is 8 bytes.
14884 ----
14885
14886 Note that sizes are dependent upon the target platform and compiler
14887 optimizations.
14888
14889 <<<
14890
14891 :mlton-guide-page: MLtonSyslog
14892 [[MLtonSyslog]]
14893 MLtonSyslog
14894 ===========
14895
14896 [source,sml]
14897 ----
14898 signature MLTON_SYSLOG =
14899    sig
14900       type openflag
14901
14902       val CONS     : openflag
14903       val NDELAY   : openflag
14904       val NOWAIT   : openflag
14905       val ODELAY   : openflag
14906       val PERROR   : openflag
14907       val PID      : openflag
14908
14909       type facility
14910
14911       val AUTHPRIV : facility
14912       val CRON     : facility
14913       val DAEMON   : facility
14914       val KERN     : facility
14915       val LOCAL0   : facility
14916       val LOCAL1   : facility
14917       val LOCAL2   : facility
14918       val LOCAL3   : facility
14919       val LOCAL4   : facility
14920       val LOCAL5   : facility
14921       val LOCAL6   : facility
14922       val LOCAL7   : facility
14923       val LPR      : facility
14924       val MAIL     : facility
14925       val NEWS     : facility
14926       val SYSLOG   : facility
14927       val USER     : facility
14928       val UUCP     : facility
14929
14930       type loglevel
14931
14932       val EMERG    : loglevel
14933       val ALERT    : loglevel
14934       val CRIT     : loglevel
14935       val ERR      : loglevel
14936       val WARNING  : loglevel
14937       val NOTICE   : loglevel
14938       val INFO     : loglevel
14939       val DEBUG    : loglevel
14940
14941       val closelog: unit -> unit
14942       val log: loglevel * string -> unit
14943       val openlog: string * openflag list * facility -> unit
14944    end
14945 ----
14946
14947 `MLton.Syslog` is a complete interface to the system logging
14948 facilities.  See `man 3 syslog` for more details.
14949
14950 * `closelog ()`
14951 +
14952 closes the connection to the system logger.
14953
14954 * `log (l, s)`
14955 +
14956 logs message `s` at a loglevel `l`.
14957
14958 * `openlog (name, flags, facility)`
14959 +
14960 opens a connection to the system logger. `name` will be prefixed to
14961 each message, and is typically set to the program name.
14962
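A typical logging session might look like the following sketch.  The
program name `"my-daemon"`, the chosen flags and facility, and the
message texts are all placeholders.

[source,sml]
----
structure Syslog = MLton.Syslog

(* Tag each message with the program name and process id. *)
val () = Syslog.openlog ("my-daemon", [Syslog.PID, Syslog.CONS], Syslog.USER)

val () = Syslog.log (Syslog.INFO, "service started")
val () = Syslog.log (Syslog.WARNING, "disk space is running low")

val () = Syslog.closelog ()
----

Where the messages end up depends on the configuration of the system
logger.
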
14963 <<<
14964
14965 :mlton-guide-page: MLtonTextIO
14966 [[MLtonTextIO]]
14967 MLtonTextIO
14968 ===========
14969
14970 [source,sml]
14971 ----
14972 signature MLTON_TEXT_IO = MLTON_IO
14973 ----
14974
14975 See <:MLtonIO:>.
14976
14977 <<<
14978
14979 :mlton-guide-page: MLtonThread
14980 [[MLtonThread]]
14981 MLtonThread
14982 ===========
14983
14984 [source,sml]
14985 ----
14986 signature MLTON_THREAD =
14987    sig
14988       structure AtomicState:
14989          sig
14990             datatype t = NonAtomic | Atomic of int
14991          end
14992
14993       val atomically: (unit -> 'a) -> 'a
14994       val atomicBegin: unit -> unit
14995       val atomicEnd: unit -> unit
14996       val atomicState: unit -> AtomicState.t
14997
14998       structure Runnable:
14999          sig
15000             type t
15001          end
15002
15003       type 'a t
15004
15005       val atomicSwitch: ('a t -> Runnable.t) -> 'a
15006       val new: ('a -> unit) -> 'a t
15007       val prepend: 'a t * ('b -> 'a) -> 'b t
15008       val prepare: 'a t * 'a -> Runnable.t
15009       val switch: ('a t -> Runnable.t) -> 'a
15010    end
15011 ----
15012
15013 `MLton.Thread` provides access to MLton's user-level thread
15014 implementation (i.e. not OS-level threads).  Threads are lightweight
15015 data structures that represent a paused computation.  Runnable threads
15016 are threads that will begin or continue computing when `switch`-ed to.
15017 `MLton.Thread` does not include a default scheduling mechanism, but it
15018 can be used to implement both preemptive and non-preemptive threads.
15019
15020 * `type AtomicState.t`
15021 +
15022 the type of atomic states.
15023
15024
15025 * `atomically f`
15026 +
15027 runs `f` in a critical section.
15028
15029 * `atomicBegin ()`
15030 +
15031 begins a critical section.
15032
15033 * `atomicEnd ()`
15034 +
15035 ends a critical section.
15036
15037 * `atomicState ()`
15038 +
15039 returns the current atomic state.
15040
15041 * `type Runnable.t`
15042 +
15043 the type of threads that can be resumed.
15044
15045 * `type 'a t`
15046 +
15047 the type of threads that expect a value of type `'a`.
15048
15049 * `atomicSwitch f`
15050 +
15051 like `switch`, but assumes an atomic calling context.  Upon
15052 `switch`-ing back to the current thread, an implicit `atomicEnd` is
15053 performed.
15054
15055 * `new f`
15056 +
15057 creates a new thread that, when run, applies `f` to the value given to
15058 the thread.  `f` must terminate by `switch`ing to another thread or
15059 exiting the process.
15060
15061 * `prepend (t, f)`
15062 +
15063 creates a new thread (destroying `t` in the process) that first
15064 applies `f` to the value given to the thread and then continues with
15065 `t`.  This is a constant time operation.
15066
15067 * `prepare (t, v)`
15068 +
15069 prepares a new runnable thread (destroying `t` in the process) that
15070 will evaluate `t` on `v`.
15071
15072 * `switch f`
15073 +
applies `f` to the current thread to get `rt`, and then starts running
15075 thread `rt`.  It is an error for `f` to perform another `switch`.  `f`
15076 is guaranteed to run atomically.
15077
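The interplay of `new`, `prepare`, and `switch` can be sketched with
a single round trip between two threads.  The names `trace`, `note`,
`main`, and `child` are introduced here for illustration only.

[source,sml]
----
structure T = MLton.Thread

(* Record the order in which the two threads run. *)
val trace: string list ref = ref []
fun note s = trace := s :: !trace

val () =
   T.switch
   (fn main =>
    let
       (* The child records a note and then hands control back to `main`. *)
       val child =
          T.new (fn () =>
                 (note "child"
                  ; T.switch (fn _ => T.prepare (main, ()))))
    in
       T.prepare (child, ())
    end)

val () = note "main"
val () = print (String.concatWith ", " (rev (!trace)) ^ "\n")
----

The child runs first and then resumes the paused main thread, so the
program prints `child, main`.
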
15078
15079 == Example of non-preemptive threads ==
15080
15081 [source,sml]
15082 ----
15083 sys::[./bin/InclGitFile.py mlton master doc/examples/thread/non-preemptive-threads.sml]
15084 ----
15085
15086
15087 == Example of preemptive threads ==
15088
15089 [source,sml]
15090 ----
15091 sys::[./bin/InclGitFile.py mlton master doc/examples/thread/preemptive-threads.sml]
15092 ----
15093
15094 <<<
15095
15096 :mlton-guide-page: MLtonVector
15097 [[MLtonVector]]
15098 MLtonVector
15099 ===========
15100
15101 [source,sml]
15102 ----
15103 signature MLTON_VECTOR =
15104    sig
15105       val create: int -> {done: unit -> 'a vector,
15106                           sub: int -> 'a,
15107                           update: int * 'a -> unit}
15108       val unfoldi: int * 'b * (int * 'b -> 'a * 'b) -> 'a vector * 'b
15109    end
15110 ----
15111
15112 * `create n`
15113 +
initiates the construction of a vector _v_ of length `n`, returning
15115 functions to manipulate the vector.  The `done` function may be called
15116 to return the created vector; it is an error to call `done` before all
15117 entries have been initialized; it is an error to call `done` after
15118 having called `done`.  The `sub` function may be called to return an
15119 initialized vector entry; it is not an error to call `sub` after
15120 having called `done`.  The `update` function may be called to
15121 initialize a vector entry; it is an error to call `update` after
15122 having called `done`.  One must initialize vector entries in order
15123 from lowest to highest; that is, before calling `update (i, x)`, one
15124 must have already called `update (j, x)` for all `j` in `[0, i)`.  The
15125 `done`, `sub`, and `update` functions are all constant-time
15126 operations.
15127
15128 * `unfoldi (n, b, f)`
15129 +
15130 constructs a vector _v_ of length `n`, whose elements __v~i~__ are
15131 determined by the equations __b~0~ = b__ and
15132 __(v~i~, b~i+1~) = f (i, b~i~)__.
15133
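Both functions can be sketched on small inputs.  The helper names
`powersOfTwo` and `toList` are introduced here for illustration;
`powersOfTwo` fills entries from lowest to highest index and uses
`sub` to read the entry it has just initialized.

[source,sml]
----
(* powersOfTwo n: each entry is twice the previous one, starting at 1. *)
fun powersOfTwo n =
   let
      val {done, sub, update} = MLton.Vector.create n
      fun loop i =
         if i >= n then ()
         else (update (i, if i = 0 then 1 else 2 * sub (i - 1))
               ; loop (i + 1))
   in
      loop 0; done ()
   end

(* Running sums 0, 0+1, 0+1+2, ...; the final state is returned as well. *)
val (sums, final) =
   MLton.Vector.unfoldi (5, 0, fn (i, acc) => (acc + i, acc + i))

fun toList v = Vector.foldr (op ::) [] v
----

Here `toList (powersOfTwo 5)` is `[1, 2, 4, 8, 16]`, `toList sums` is
`[0, 1, 3, 6, 10]`, and `final` is `10`.
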
15134 <<<
15135
15136 :mlton-guide-page: MLtonWeak
15137 [[MLtonWeak]]
15138 MLtonWeak
15139 =========
15140
15141 [source,sml]
15142 ----
15143 signature MLTON_WEAK =
15144    sig
15145       type 'a t
15146
15147       val get: 'a t -> 'a option
15148       val new: 'a -> 'a t
15149    end
15150 ----
15151
15152 A weak pointer is a pointer to an object that is nulled if the object
15153 becomes <:Reachability:unreachable> due to garbage collection.  The
15154 weak pointer does not itself cause the object it points to be retained
15155 by the garbage collector -- only other strong pointers can do that.
15156 For objects that are not allocated in the heap, like integers, a weak
15157 pointer will always be nulled.  So, if `w: int Weak.t`, then
15158 `Weak.get w = NONE`.
15159
15160 * `type 'a t`
15161 +
the type of weak pointers to objects of type `'a`.
15163
15164 * `get w`
15165 +
15166 returns `NONE` if the object pointed to by `w` no longer exists.
15167 Otherwise, returns `SOME` of the object pointed to by `w`.
15168
15169 * `new x`
15170 +
15171 returns a weak pointer to `x`.
15172
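A minimal sketch of both cases described above follows.  The names
`cell` and `intIsGone` are illustrative.  Because `cell` is still used
after the call to `get`, the weak pointer is expected to be live at
that point, although the sketch handles both outcomes.

[source,sml]
----
val cell = ref 13
val w = MLton.Weak.new cell

(* `cell` is referenced again below, so it is still strongly reachable. *)
val () =
   case MLton.Weak.get w of
      SOME r => print ("still alive: " ^ Int.toString (!r) ^ "\n")
    | NONE => print "collected\n"
val () = cell := 14

(* Integers are not heap-allocated, so their weak pointers are nulled. *)
val intIsGone = not (isSome (MLton.Weak.get (MLton.Weak.new 13)))
----
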
15173 <<<
15174
15175 :mlton-guide-page: MLtonWord
15176 [[MLtonWord]]
15177 MLtonWord
15178 =========
15179
15180 [source,sml]
15181 ----
15182 signature MLTON_WORD =
15183    sig
15184       type t
15185
15186       val bswap: t -> t
15187       val rol: t * word -> t
15188       val ror: t * word -> t
15189    end
15190 ----
15191
15192 * `type t`
15193 +
15194 the type of words.  For `MLton.LargeWord` this is `LargeWord.word`,
15195 for `MLton.Word` this is `Word.word`, for `MLton.Word8` this is
15196 `Word8.word`, for `MLton.Word16` this is `Word16.word`, for
15197 `MLton.Word32` this is `Word32.word`, for `MLton.Word64` this is
15198 `Word64.word`.
15199
15200 * `bswap w`
15201 +
reverses the byte order of `w`.
15203
15204 * `rol (w, w')`
15205 +
rotates `w` left by `w'` bits (circular).
15207
15208 * `ror (w, w')`
15209 +
rotates `w` right by `w'` bits (circular).
15211
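The three operations can be sketched on 32-bit words; the input
constants are arbitrary and chosen so that the effect is easy to see
in hexadecimal.

[source,sml]
----
(* The top bit wraps around to the bottom: 0x80000001 becomes 0x00000003. *)
val a = MLton.Word32.rol (0wx80000001, 0w1)

(* Rotating right by the same amount undoes the rotation. *)
val b = MLton.Word32.ror (a, 0w1)

(* Byte swap reverses the four bytes: 0x11223344 becomes 0x44332211. *)
val c = MLton.Word32.bswap 0wx11223344

val () =
   print (String.concatWith " " (map Word32.toString [a, b, c]) ^ "\n")
----
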
15212 <<<
15213
15214 :mlton-guide-page: MLtonWorld
15215 [[MLtonWorld]]
15216 MLtonWorld
15217 ==========
15218
15219 [source,sml]
15220 ----
15221 signature MLTON_WORLD =
15222    sig
15223       datatype status = Clone | Original
15224
15225       val load: string -> 'a
15226       val save: string -> status
15227       val saveThread: string * Thread.Runnable.t -> unit
15228    end
15229 ----
15230
15231 * `datatype status`
15232 +
15233 specifies whether a world is original or restarted (a clone).
15234
15235 * `load f`
15236 +
15237 loads the saved computation from file `f`.
15238
15239 * `save f`
15240 +
15241 saves the entire state of the computation to the file `f`.  The
15242 computation can then be restarted at a later time using `World.load`
15243 or the `load-world` <:RunTimeOptions:runtime option>.  The call to
15244 `save` in the original computation returns `Original` and the call in
15245 the restarted world returns `Clone`.
15246
15247 * `saveThread (f, rt)`
15248 +
15249 saves the entire state of the computation to the file `f` that will
15250 resume with thread `rt` upon restart.
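
A typical use of `save` branches on the returned status, as in the
following sketch.  The file name `checkpoint.world` and the helper
`describe` are assumptions made for this illustration.

[source,sml]
----
(* Turn the status returned by save into a message. *)
fun describe (s : MLton.World.status) : string =
   case s of
      MLton.World.Original => "original: world saved, continuing"
    | MLton.World.Clone => "clone: restarted from the saved world"

(* In the original run this writes checkpoint.world and reports
 * "original"; a run restarted from that file resumes here and
 * reports "clone". *)
val () = print (describe (MLton.World.save "checkpoint.world") ^ "\n")
----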
15251
15252
15253 == Notes ==
15254
15255 <!Anchor(ASLR)>
15256 Executables that save and load worlds are incompatible with
15257 http://en.wikipedia.org/wiki/Address_space_layout_randomization[address space layout randomization (ASLR)]
of the executable (though not of shared libraries).  The state of a
15259 computation includes addresses into the code and data segments of the
15260 executable (e.g., static runtime-system data, return addresses); such
15261 addresses are invalid when interpreted by the executable loaded at a
15262 different base address.
15263
15264 Executables that save and load worlds should be compiled with an
15265 option to suppress the generation of position-independent executables.
15266
15267 * <:RunningOnDarwin:Darwin 11 (Mac OS X Lion) and higher> : `-link-opt -fno-PIE`
15268
15269
15270 == Example ==
15271
15272 Suppose that `save-world.sml` contains the following.
15273 [source,sml]
15274 ----
15275 sys::[./bin/InclGitFile.py mlton master doc/examples/save-world/save-world.sml]
15276 ----
15277
15278 Then, if we compile `save-world.sml` and run it, the `Original`
15279 branch will execute, and a file named `world` will be created.
15280 ----
15281 % mlton save-world.sml
15282 % ./save-world
15283 I am the original
15284 ----
15285
We can then load `world` using the `load-world`
<:RunTimeOptions:runtime option>.
15288 ----
15289 % ./save-world @MLton load-world world --
15290 I am the clone
15291 ----
15292
15293 <<<
15294
15295 :mlton-guide-page: MLULex
15296 [[MLULex]]
15297 MLULex
15298 ======
15299
15300 http://smlnj-gforge.cs.uchicago.edu/projects/ml-lpt/[MLULex] is a
15301 scanner generator for <:StandardML:Standard ML>.
15302
15303 == Also see ==
15304
15305 * <:MLAntlr:>
15306 * <:MLLPTLibrary:>
15307 * <!Cite(OwensEtAl09)>
15308
15309 <<<
15310
15311 :mlton-guide-page: MLYacc
15312 [[MLYacc]]
15313 MLYacc
15314 ======
15315
15316 <:MLYacc:> is a parser generator for <:StandardML:Standard ML> modeled
15317 after the Yacc parser generator.
15318
15319 A version of MLYacc, ported from the <:SMLNJ:SML/NJ> sources, is
15320 distributed with MLton.
15321
15322 == Also see ==
15323
15324 * <!Attachment(Documentation,mlyacc.pdf)>
15325 * <:MLLex:>
15326 * <!Cite(TarditiAppel00)>
15327 * <!Cite(Price09)>
15328
15329 <<<
15330
15331 :mlton-guide-page: Monomorphise
15332 [[Monomorphise]]
15333 Monomorphise
15334 ============
15335
15336 <:Monomorphise:> is a translation pass from the <:XML:>
15337 <:IntermediateLanguage:> to the <:SXML:> <:IntermediateLanguage:>.
15338
15339 == Description ==
15340
15341 Monomorphisation eliminates polymorphic values and datatype
15342 declarations by duplicating them for each type at which they are used.
15343
15344 Consider the following <:XML:> program.
15345 [source,sml]
15346 ----
15347 datatype 'a t = T of 'a
15348 fun 'a f (x: 'a) = T x
15349 val a = f 1
15350 val b = f 2
15351 val z = f (3, 4)
15352 ----
15353
15354 The result of monomorphising this program is the following <:SXML:> program:
15355 [source,sml]
15356 ----
15357 datatype t1 = T1 of int
15358 datatype t2 = T2 of int * int
15359 fun f1 (x: int) = T1 x
15360 fun f2 (x: int * int) = T2 x
15361 val a = f1 1
15362 val b = f1 2
15363 val z = f2 (3, 4)
15364 ----
15365
15366 == Implementation ==
15367
15368 * <!ViewGitFile(mlton,master,mlton/xml/monomorphise.sig)>
15369 * <!ViewGitFile(mlton,master,mlton/xml/monomorphise.fun)>
15370
15371 == Details and Notes ==
15372
15373 The monomorphiser works by making one pass over the entire program.
15374 On the way down, it creates a cache for each variable declared in a
polymorphic declaration that maps a list of type arguments to a new
15376 variable name.  At a variable reference, it consults the cache (based
15377 on the types the variable is applied to).  If there is already an
15378 entry in the cache, it is used.  If not, a new entry is created.  On
15379 the way up, the monomorphiser duplicates a variable declaration for
15380 each entry in the cache.
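
The cache can be pictured with the following toy model.  It is only a
sketch of the idea, not the code in `monomorphise.fun`; the datatype
`ty`, the association-list cache, and the function `instance` are all
invented for this illustration.

[source,sml]
----
(* A toy model of types: just enough to tell instances apart. *)
datatype ty = TInt | TTuple of ty list

(* Cache for one polymorphic variable: type arguments -> new name. *)
type cache = (ty list * string) list ref

(* At a variable reference: reuse the entry for these type
 * arguments if there is one, otherwise create a new entry. *)
fun instance (name : string, cache : cache) (tyargs : ty list) : string =
   case List.find (fn (ts, _) => ts = tyargs) (!cache) of
      SOME (_, v) => v
    | NONE =>
         let
            val v = name ^ Int.toString (length (!cache) + 1)
         in
            cache := (tyargs, v) :: !cache
            ; v
         end

val fCache : cache = ref []
val f = instance ("f", fCache)

val "f1" = f [TInt]                  (* f 1: new entry *)
val "f1" = f [TInt]                  (* f 2: cache hit *)
val "f2" = f [TTuple [TInt, TInt]]   (* f (3, 4): new entry *)

(* On the way up, the declaration is duplicated once per entry. *)
val 2 = length (!fCache)
----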
15381
As with variables, the monomorphiser records all of the types at which
15383 constructors are used.  After the entire program is processed, the
15384 monomorphiser duplicates each datatype declaration and its associated
15385 constructors.
15386
15387 The monomorphiser duplicates all of the functions declared in a
15388 `fun` declaration as a unit.  Consider the following program
15389 [source,sml]
15390 ----
15391 fun 'a f (x: 'a) = g x
15392 and g (y: 'a) = f y
15393 val a = f 13
15394 val b = g 14
15395 val c = f (1, 2)
15396 ----
15397
15398 and its monomorphisation
15399
15400 [source,sml]
15401 ----
15402 fun f1 (x: int) = g1 x
15403 and g1 (y: int) = f1 y
15404 fun f2 (x : int * int) = g2 x
15405 and g2 (y : int * int) = f2 y
15406 val a = f1 13
15407 val b = g1 14
15408 val c = f2 (1, 2)
15409 ----
15410
15411 == Pathological datatype declarations ==
15412
15413 SML allows a pathological polymorphic datatype declaration in which
15414 recursive uses of the defined type constructor are applied to
15415 different type arguments than the definition.  This has been
15416 disallowed by others on type theoretic grounds.  A canonical example
15417 is the following.
15418 [source,sml]
15419 ----
15420 datatype 'a t = A of 'a | B of ('a * 'a) t
15421 val z : int t = B (B (A ((1, 2), (3, 4))))
15422 ----
15423
15424 The presence of the recursion in the datatype declaration might appear
15425 to cause the need for the monomorphiser to create an infinite number
15426 of types.  However, due to the absence of polymorphic recursion in
15427 SML, there are in fact only a finite number of instances of such types
15428 in any given program.  The monomorphiser translates the above program
15429 to the following one.
15430 [source,sml]
15431 ----
15432 datatype t1 = B1 of t2
15433 datatype t2 = B2 of t3
15434 datatype t3 = A3 of (int * int) * (int * int)
val z : t1 = B1 (B2 (A3 ((1, 2), (3, 4))))
15436 ----
15437
15438 It is crucial that the monomorphiser be allowed to drop unused
15439 constructors from datatype declarations in order for the translation
15440 to terminate.
15441
15442 <<<
15443
15444 :mlton-guide-page: MoscowML
15445 [[MoscowML]]
15446 MoscowML
15447 ========
15448
15449 http://mosml.org[Moscow ML] is a
15450 <:StandardMLImplementations:Standard ML implementation>.  It is a
15451 byte-code compiler, so it compiles code quickly, but the code runs
15452 slowly.  See <:Performance:>.
15453
15454 <<<
15455
15456 :mlton-guide-page: Multi
15457 [[Multi]]
15458 Multi
15459 =====
15460
15461 <:Multi:> is an analysis pass for the <:SSA:>
15462 <:IntermediateLanguage:>, invoked from <:ConstantPropagation:> and
15463 <:LocalRef:>.
15464
15465 == Description ==
15466
15467 This pass analyzes the control flow of a <:SSA:> program to determine
15468 which <:SSA:> functions and blocks might be executed more than once or
15469 by more than one thread.  It also determines when a program uses
15470 threads and when functions and blocks directly or indirectly invoke
15471 `Thread_copyCurrent`.
15472
15473 == Implementation ==
15474
15475 * <!ViewGitFile(mlton,master,mlton/ssa/multi.sig)>
15476 * <!ViewGitFile(mlton,master,mlton/ssa/multi.fun)>
15477
15478 == Details and Notes ==
15479
15480 {empty}
15481
15482 <<<
15483
15484 :mlton-guide-page: Mutable
15485 [[Mutable]]
15486 Mutable
15487 =======
15488
15489 Mutable is an adjective meaning "can be modified".  In
15490 <:StandardML:Standard ML>, ref cells and arrays are mutable, while all
15491 other values are <:Immutable:immutable>.
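
A short sketch of the distinction follows; the names `r`, `a`, `xs`,
and `ys` are illustrative only.

[source,sml]
----
(* A ref cell is mutable: := changes its contents in place. *)
val r = ref 0
val () = r := !r + 1
val 1 = !r

(* An array is mutable: Array.update changes an element in place. *)
val a = Array.array (3, 0)
val () = Array.update (a, 1, 42)
val 42 = Array.sub (a, 1)

(* A list is immutable: consing builds a new list and leaves the
 * old one unchanged. *)
val xs = [1, 2, 3]
val ys = 0 :: xs
val 3 = length xs
val 4 = length ys
----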
15492
15493 <<<
15494
15495 :mlton-guide-page: NeedsReview
15496 [[NeedsReview]]
15497 NeedsReview
15498 ===========
15499
15500 This page documents some patches and bug fixes that need additional review by experienced developers:
15501
15502 * Bug in transparent signature match:
** What is an 'original' interface and why does the equivalence of original interfaces imply the equivalence of the actual interfaces?
15504 ** http://www.mlton.org/pipermail/mlton/2007-September/029991.html
15505 ** http://www.mlton.org/pipermail/mlton/2007-September/029995.html
15506 ** SVN Revision: <!ViewSVNRev(6046)>
15507
15508 * Bug in <:DeepFlatten:> pass:
15509 ** Should we allow argument to `Weak_new` to be flattened?
15510 ** SVN Revision: <!ViewSVNRev(6189)> (regression test demonstrating bug)
15511 ** SVN Revision: <!ViewSVNRev(6191)>
15512
15513 <<<
15514
15515 :mlton-guide-page: NumericLiteral
15516 [[NumericLiteral]]
15517 NumericLiteral
15518 ==============
15519
15520 Numeric literals in <:StandardML:Standard ML> can be written in either
15521 decimal or hexadecimal notation.  Sometimes it can be convenient to
15522 write numbers down in other bases.  Fortunately, using <:Fold:>, it is
15523 possible to define a concise syntax for numeric literals that allows
15524 one to write numeric constants in any base and of various types
15525 (`int`, `IntInf.int`, `word`, and more).
15526
15527 We will define constants `I`, `II`, `W`, and +`+ so
15528 that, for example,
15529 [source,sml]
15530 ----
15531 I 10 `1`2`3 $
15532 ----
15533 denotes `123:int` in base 10, while
15534 [source,sml]
15535 ----
15536 II 8 `2`3 $
15537 ----
15538 denotes `19:IntInf.int` in base 8, and
15539 [source,sml]
15540 ----
15541 W 2 `1`1`0`1 $
15542 ----
15543 denotes `0w13: word`.
15544
15545 Here is the code.
15546
15547 [source,sml]
15548 ----
15549 structure Num =
15550    struct
15551       fun make (op *, op +, i2x) iBase =
15552           let
15553              val xBase = i2x iBase
15554           in
15555              Fold.fold
15556                 ((i2x 0,
15557                   fn (i, x) =>
15558                      if 0 <= i andalso i < iBase then
15559                         x * xBase + i2x i
15560                      else
15561                         raise Fail (concat
15562                                        ["Num: ", Int.toString i,
15563                                         " is not a valid\
15564                                         \ digit in base ",
15565                                         Int.toString iBase])),
15566                  fst)
15567           end
15568
15569       fun I  ? = make (op *, op +, id) ?
15570       fun II ? = make (op *, op +, IntInf.fromInt) ?
15571       fun W  ? = make (op *, op +, Word.fromInt) ?
15572
15573       fun ` ? = Fold.step1 (fn (i, (x, step)) =>
15574                                (step (i, x), step)) ?
15575
15576       val a = 10
15577       val b = 11
15578       val c = 12
15579       val d = 13
15580       val e = 14
15581       val f = 15
15582    end
15583 ----
15584 where
15585 [source,sml]
15586 ----
15587 fun fst (x, _) = x
15588 ----
15589
15590 The idea is for the fold to start with zero and to construct the
15591 result one digit at a time, with each stepper multiplying the previous
15592 result by the base and adding the next digit.  The code is abstracted
15593 in two different ways for extra generality.  First, the `make`
15594 function abstracts over the various primitive operations (addition,
multiplication, etc.) that are needed to construct a number.  This
allows the same code to be shared for constants `I`, `II`, `W` used to
write down the various numeric types.  It also allows users to add new
constants for additional numeric types, by supplying the necessary
arguments to `make`.
15600
15601 Second, the step function, +&grave;+, is abstracted over the actual
construction operation, which is created by `make`, and passed along the
15603 fold.  This allows the same constant, +&grave;+, to be used for all
15604 numeric types.  The alternative approach, having a different step
15605 function for each numeric type, would be more painful to use.
15606
15607 On the surface, it appears that the code checks the digits dynamically
15608 to ensure they are valid for the base.  However, MLton will simplify
15609 everything away at compile time, leaving just the final numeric
15610 constant.
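
The digit-by-digit construction can also be seen without the fold
machinery.  The following sketch uses an ordinary list of digits and
`List.foldl`; the function `fromDigits` is not part of `Num` and is
introduced here only to illustrate the arithmetic that each step
performs.

[source,sml]
----
(* Start from zero; for each digit, multiply the running result by
 * the base and add the digit, rejecting digits outside the base. *)
fun fromDigits (base : int) (digits : int list) : int =
   List.foldl
      (fn (d, acc) =>
          if 0 <= d andalso d < base then
             acc * base + d
          else
             raise Fail (concat ["Num: ", Int.toString d,
                                 " is not a valid digit in base ",
                                 Int.toString base]))
      0
      digits

val 123 = fromDigits 10 [1, 2, 3]     (* compare: I 10 `1`2`3 $ *)
val 19 = fromDigits 8 [2, 3]          (* compare: II 8 `2`3 $ *)
val 13 = fromDigits 2 [1, 1, 0, 1]    (* compare: W 2 `1`1`0`1 $ *)
----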
15611
15612 <<<
15613
15614 :mlton-guide-page: ObjectOrientedProgramming
15615 [[ObjectOrientedProgramming]]
15616 ObjectOrientedProgramming
15617 =========================
15618
15619 <:StandardML:Standard ML> does not have explicit support for
15620 object-oriented programming.  Here are some papers that show how to
15621 express certain object-oriented concepts in SML.
15622
15623 * <!Cite(Berthomieu00, OO Programming styles in ML)>
15624
15625 * <!Cite(ThorupTofte94, Object-oriented programming and Standard ML)>
15626
15627 * <!Cite(LarsenNiss04, mGTK: An SML binding of Gtk+)>
15628
15629 * <!Cite(FluetPucella06, Phantom Types and Subtyping)>
15630
15631 The question of OO programming in SML comes up every now and then.
15632 The following discusses a simple object-oriented (OO) programming
15633 technique in Standard ML.  The reader is assumed to be able to read
15634 Java and SML code.
15635
15636
15637 == Motivation ==
15638
15639 SML doesn't provide subtyping, but it does provide parametric
15640 polymorphism, which can be used to encode some forms of subtyping.
15641 Most articles on OO programming in SML concentrate on such encoding
techniques.  Those techniques are interesting and sometimes useful,
and the articles are worth reading.  However, there now seems to be
broad agreement among OO practitioners that deep subtyping (or
inheritance) hierarchies aren't as practical as they were thought to
be in the early OO days.  "Good", flexible, "OO" designs tend to have
a flat structure
15648
15649 ----
15650          Interface
15651              ^
15652              |
15653 - - -+-------+-------+- - -
15654      |       |       |
15655    ImplA   ImplB   ImplC
15656 ----
15657
15658
15659 and deep inheritance hierarchies
15660
15661 ----
15662 ClassA
15663   ^
15664   |
15665 ClassB
15666   ^
15667   |
15668 ClassC
15669   ^
15670   |
15671 ----
15672
15673 tend to be signs of design mistakes.  There are good underlying
15674 reasons for this, but a thorough discussion is not in the scope of
15675 this article.  However, the point is that perhaps the encoding of
15676 subtyping is not as important as one might believe.  In the following
15677 we ignore subtyping and rather concentrate on a very simple and basic
15678 dynamic dispatch technique.
15679
15680
15681 == Dynamic Dispatch Using a Recursive Record of Functions ==
15682
15683 Quite simply, the basic idea is to implement a "virtual function
15684 table" using a record that is wrapped inside a (possibly recursive)
15685 datatype.  Let's first take a look at a simple concrete example.
15686
15687 Consider the following Java interface:
15688
15689 ----
15690 public interface Counter {
15691   public void inc();
15692   public int get();
15693 }
15694 ----
15695
15696 We can translate the `Counter` interface to SML as follows:
15697
15698 [source,sml]
15699 ----
15700 datatype counter = Counter of {inc : unit -> unit, get : unit -> int}
15701 ----
15702
15703 Each value of type `counter` can be thought of as an object that
15704 responds to two messages `inc` and `get`.  To actually send messages
15705 to a counter, it is useful to define auxiliary functions
15706
15707 [source,sml]
15708 ----
15709 local
15710    fun mk m (Counter t) = m t ()
15711 in
15712    val cGet = mk#get
15713    val cInc = mk#inc
15714 end
15715 ----
15716
15717 that basically extract the "function table" `t` from a counter object
15718 and then select the specified method `m` from the table.
15719
15720 Let's then implement a simple function that increments a counter until a
15721 given maximum is reached:
15722
15723 [source,sml]
15724 ----
15725 fun incUpto counter max = while cGet counter < max do cInc counter
15726 ----
15727
15728 You can easily verify that the above code compiles even without any
15729 concrete implementation of a counter, thus it is clear that it doesn't
15730 depend on a particular counter implementation.
15731
15732 Let's then implement a couple of counters.  First consider the
15733 following Java class implementing the `Counter` interface given earlier.
15734
15735 ----
15736 public class BasicCounter implements Counter {
15737   private int cnt;
15738   public BasicCounter(int initialCnt) { this.cnt = initialCnt; }
15739   public void inc() { this.cnt += 1; }
15740   public int get() { return this.cnt; }
15741 }
15742 ----
15743
15744 We can translate the above to SML as follows:
15745
15746 [source,sml]
15747 ----
15748 fun newBasicCounter initialCnt = let
15749        val cnt = ref initialCnt
15750     in
15751        Counter {inc = fn () => cnt := !cnt + 1,
15752                 get = fn () => !cnt}
15753     end
15754 ----
15755
15756 The SML function `newBasicCounter` can be described as a constructor
15757 function for counter objects of the `BasicCounter` "class".  We can
15758 also have other counter implementations.  Here is the constructor for
15759 a counter decorator that logs messages:
15760
15761 [source,sml]
15762 ----
15763 fun newLoggedCounter counter =
15764     Counter {inc = fn () => (print "inc\n" ; cInc counter),
15765              get = fn () => (print "get\n" ; cGet counter)}
15766 ----
15767
15768 The `incUpto` function works just as well with objects of either
15769 class:
15770
15771 [source,sml]
15772 ----
15773 val aCounter = newBasicCounter 0
15774 val () = incUpto aCounter 5
15775 val () = print (Int.toString (cGet aCounter) ^"\n")
15776
15777 val aCounter = newLoggedCounter (newBasicCounter 0)
15778 val () = incUpto aCounter 5
15779 val () = print (Int.toString (cGet aCounter) ^"\n")
15780 ----
15781
15782 In general, a dynamic dispatch interface is represented as a record
15783 type wrapped inside a datatype.  Each field of the record corresponds
15784 to a public method or field of the object:
15785
15786 [source,sml]
15787 ----
15788 datatype interface =
15789    Interface of {method : t1 -> t2,
15790                  immutableField : t,
15791                  mutableField : t ref}
15792 ----
15793
15794 The reason for wrapping the record inside a datatype is that records,
in SML, cannot be recursive.  However, SML datatypes can be
15796 recursive.  A record wrapped in a datatype can contain fields that
15797 contain the datatype.  For example, an interface such as `Cloneable`
15798
15799 [source,sml]
15800 ----
15801 datatype cloneable = Cloneable of {clone : unit -> cloneable}
15802 ----
15803
15804 can be represented using recursive datatypes.
15805
As in OO languages, interfaces are abstract and cannot be
15807 instantiated to produce objects.  To be able to instantiate objects,
15808 the constructors of a concrete class are needed.  In SML, we can
15809 implement constructors as simple functions from arbitrary arguments to
15810 values of the interface type.  Such a constructor function can
15811 encapsulate arbitrary private state and functions using lexical
15812 closure.  It is also easy to share implementations of methods between
15813 two or more constructors.
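
As a sketch of the last two points, the following code hides the
counter state in a closure and shares one method-table builder between
two constructors.  The helper `mkCounter` and the constructor
`newStepCounter` are hypothetical names introduced for this example;
`newBasicCounter` is rewritten here in terms of the shared helper.

[source,sml]
----
datatype counter = Counter of {inc : unit -> unit, get : unit -> int}

(* Shared implementation: the ref cell is private state, reachable
 * only through the two closures stored in the record. *)
fun mkCounter (cnt : int ref, step : int) =
   Counter {inc = fn () => cnt := !cnt + step,
            get = fn () => !cnt}

(* Two constructors that share the method implementations. *)
fun newBasicCounter initialCnt = mkCounter (ref initialCnt, 1)
fun newStepCounter (initialCnt, step) = mkCounter (ref initialCnt, step)

val Counter {inc, get} = newStepCounter (0, 5)
val () = (inc (); inc ())
val 10 = get ()
----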
15814
While the `Counter` example is rather trivial, it shows that this
technique requires little extra verbiage and is quite usable in
practice.
15818
15819
15820 == SML Modules and Dynamic Dispatch ==
15821
15822 One might wonder about how SML modules and the dynamic dispatch
15823 technique work together.  Let's investigate!  Let's use a simple
15824 dispenser framework as a concrete example.  (Note that this isn't
15825 intended to be an introduction to the SML module system.)
15826
15827 === Programming with SML Modules ===
15828
15829 Using SML signatures we can specify abstract data types (ADTs) such as
15830 dispensers.  Here is a signature for an "abstract" functional (as
15831 opposed to imperative) dispenser:
15832
15833 [source,sml]
15834 ----
15835 signature ABSTRACT_DISPENSER = sig
15836    type 'a t
15837    val isEmpty : 'a t -> bool
15838    val push : 'a * 'a t -> 'a t
15839    val pop : 'a t -> ('a * 'a t) option
15840 end
15841 ----
15842
15843 The term "abstract" in the name of the signature refers to the fact that
15844 the signature gives no way to instantiate a dispenser.  It has nothing to
15845 do with the concept of abstract data types.
15846
15847 Using SML functors we can write "generic" algorithms that manipulate
15848 dispensers of an unknown type.  Here are a couple of very simple
15849 algorithms:
15850
15851 [source,sml]
15852 ----
15853 functor DispenserAlgs (D : ABSTRACT_DISPENSER) = struct
15854    open D
15855
15856    fun pushAll (xs, d) = foldl push d xs
15857
15858    fun popAll d = let
15859           fun lp (xs, NONE) = rev xs
15860             | lp (xs, SOME (x, d)) = lp (x::xs, pop d)
15861        in
15862           lp ([], pop d)
15863        end
15864
15865    fun cp (from, to) = pushAll (popAll from, to)
15866 end
15867 ----
15868
15869 As one can easily verify, the above compiles even without any concrete
dispenser structure.  Functors essentially provide a form of static
15871 dispatch that one can use to break compile-time dependencies.
15872
15873 We can also give a signature for a concrete dispenser
15874
15875 [source,sml]
15876 ----
15877 signature DISPENSER = sig
15878    include ABSTRACT_DISPENSER
15879    val empty : 'a t
15880 end
15881 ----
15882
15883 and write any number of concrete structures implementing the signature.
15884 For example, we could implement stacks
15885
15886 [source,sml]
15887 ----
15888 structure Stack :> DISPENSER = struct
15889    type 'a t = 'a list
15890    val empty = []
15891    val isEmpty = null
15892    val push = op ::
15893    val pop = List.getItem
15894 end
15895 ----
15896
15897 and queues
15898
15899 [source,sml]
15900 ----
15901 structure Queue :> DISPENSER = struct
15902    datatype 'a t = T of 'a list * 'a list
15903    val empty = T ([], [])
15904    val isEmpty = fn T ([], _) => true | _ => false
15905    val normalize = fn ([], ys) => (rev ys, []) | q => q
15906    fun push (y, T (xs, ys)) = T (normalize (xs, y::ys))
15907    val pop = fn (T (x::xs, ys)) => SOME (x, T (normalize (xs, ys))) | _ => NONE
15908 end
15909 ----
15910
15911 One can now write code that uses either the `Stack` or the `Queue`
15912 dispenser.  One can also instantiate the previously defined functor to
15913 create functions for manipulating dispensers of a type:
15914
15915 [source,sml]
15916 ----
15917 structure S = DispenserAlgs (Stack)
15918 val [4,3,2,1] = S.popAll (S.pushAll ([1,2,3,4], Stack.empty))
15919
15920 structure Q = DispenserAlgs (Queue)
15921 val [1,2,3,4] = Q.popAll (Q.pushAll ([1,2,3,4], Queue.empty))
15922 ----
15923
15924 There is no dynamic dispatch involved at the module level in SML.  An
15925 attempt to do dynamic dispatch
15926
15927 [source,sml]
15928 ----
15929 val q = Q.push (1, Stack.empty)
15930 ----
15931
15932 will give a type error.
15933
15934 === Combining SML Modules and Dynamic Dispatch ===
15935
15936 Let's then combine SML modules and the dynamic dispatch technique
15937 introduced in this article.  First we define an interface for
15938 dispensers:
15939
15940 [source,sml]
15941 ----
15942 structure Dispenser = struct
15943    datatype 'a t =
15944       I of {isEmpty : unit -> bool,
15945             push : 'a -> 'a t,
15946             pop : unit -> ('a * 'a t) option}
15947
15948    fun O m (I t) = m t
15949
15950    fun isEmpty t = O#isEmpty t ()
15951    fun push (v, t) = O#push t v
15952    fun pop t = O#pop t ()
15953 end
15954 ----
15955
15956 The `Dispenser` module, which we can think of as an interface for
15957 dispensers, implements the `ABSTRACT_DISPENSER` signature using
15958 the dynamic dispatch technique, but we leave the signature ascription
15959 until later.
15960
15961 Then we define a `DispenserClass` functor that makes a "class" out of
15962 a given dispenser module:
15963
15964 [source,sml]
15965 ----
15966 functor DispenserClass (D : DISPENSER) : DISPENSER = struct
15967    open Dispenser
15968
15969    fun make d =
15970        I {isEmpty = fn () => D.isEmpty d,
15971           push = fn x => make (D.push (x, d)),
15972           pop = fn () =>
15973                    case D.pop d of
15974                       NONE => NONE
15975                     | SOME (x, d) => SOME (x, make d)}
15976
15977    val empty =
15978        I {isEmpty = fn () => true,
15979           push = fn x => make (D.push (x, D.empty)),
15980           pop = fn () => NONE}
15981 end
15982 ----
15983
15984 Finally we seal the `Dispenser` module:
15985
15986 [source,sml]
15987 ----
15988 structure Dispenser : ABSTRACT_DISPENSER = Dispenser
15989 ----
15990
This isn't necessary for type safety, because the unsealed `Dispenser`
module does not allow one to break encapsulation, but it makes sure that
only the `DispenserClass` functor can create dispenser classes
(because the constructor `Dispenser.I` is no longer accessible).  Note
that the ascription is transparent (`:` rather than `:>`), so the type
`Dispenser.t` remains the same type that the dispenser classes use.
15995
15996 Using the `DispenserClass` functor we can turn any concrete dispenser
15997 module into a dispenser class:
15998
15999 [source,sml]
16000 ----
16001 structure StackClass = DispenserClass (Stack)
16002 structure QueueClass = DispenserClass (Queue)
16003 ----
16004
Each dispenser class implements the same dynamic dispatch interface
and the `ABSTRACT_DISPENSER` signature.

Because the dynamic dispatch `Dispenser` module implements the
`ABSTRACT_DISPENSER` signature, we can use it to instantiate the
`DispenserAlgs` functor:
16011
16012 [source,sml]
16013 ----
16014 structure D = DispenserAlgs (Dispenser)
16015 ----
16016
16017 The resulting `D` module, like the `Dispenser` module, works with
16018 any dispenser class and uses dynamic dispatch:
16019
16020 [source,sml]
16021 ----
16022 val [4, 3, 2, 1] = D.popAll (D.pushAll ([1, 2, 3, 4], StackClass.empty))
16023 val [1, 2, 3, 4] = D.popAll (D.pushAll ([1, 2, 3, 4], QueueClass.empty))
16024 ----
16025
16026 <<<
16027
16028 :mlton-guide-page: OCaml
16029 [[OCaml]]
16030 OCaml
16031 =====
16032
16033 http://caml.inria.fr/[OCaml] is a variant of <:ML:> and is similar to
16034 <:StandardML:Standard ML>.
16035
16036 == OCaml and SML ==
16037
16038 Here's a comparison of some aspects of the OCaml and SML languages.
16039
16040 * Standard ML has a formal <:DefinitionOfStandardML:Definition>, while
16041 OCaml is specified by its lone implementation and informal
16042 documentation.
16043
16044 * Standard ML has a number of <:StandardMLImplementations:compilers>,
16045 while OCaml has only one.
16046
16047 * OCaml has built-in support for object-oriented programming, while
16048 Standard ML does not (however, see <:ObjectOrientedProgramming:>).
16049
16050 * Andreas Rossberg has a
16051 http://www.mpi-sws.org/%7Erossberg/sml-vs-ocaml.html[side-by-side
16052 comparison] of the syntax of SML and OCaml.
16053
16054 * Adam Chlipala has a
16055 http://adam.chlipala.net/mlcomp[point-by-point comparison] of OCaml
16056 and SML.
16057
16058 == OCaml and MLton ==
16059
16060 Here's a comparison of some aspects of OCaml and MLton.
16061
16062 * Performance
16063
16064 ** Both OCaml and MLton have excellent performance.
16065
16066 ** MLton performs extensive <:WholeProgramOptimization:>, which can
16067 provide substantial improvements in large, modular programs.
16068
16069 ** MLton uses native types, like 32-bit integers, without any penalty
16070 due to tagging or boxing.  OCaml uses 31-bit integers with a penalty
16071 due to tagging, and 32-bit integers with a penalty due to boxing.
16072
16073 ** MLton uses native types, like 64-bit floats, without any penalty
16074 due to boxing.  OCaml, in some situations, boxes 64-bit floats.
16075
16076 ** MLton represents arrays of all types unboxed.  In OCaml, only
16077 arrays of 64-bit floats are unboxed, and then only when it is
16078 syntactically apparent.
16079
16080 ** MLton represents records compactly by reordering and packing the
16081 fields.
16082
16083 ** In MLton, polymorphic and monomorphic code have the same
16084 performance.  In OCaml, polymorphism can introduce a performance
16085 penalty.
16086
16087 ** In MLton, module boundaries have no impact on performance.  In
16088 OCaml, moving code between modules can cause a performance penalty.
16089
16090 ** MLton's <:ForeignFunctionInterface:> is simpler than OCaml's.
16091
16092 * Tools
16093
16094 ** OCaml has a debugger, while MLton does not.
16095
16096 ** OCaml supports separate compilation, while MLton does not.
16097
16098 ** OCaml compiles faster than MLton.
16099
16100 ** MLton supports profiling of both time and allocation.
16101
16102 * Libraries
16103
16104 ** OCaml has more available libraries.
16105
16106 * Community
16107
16108 ** OCaml has a larger community than MLton.
16109
16110 ** MLton has a very responsive
16111    http://www.mlton.org/mailman/listinfo/mlton[developer list].
16112
16113 <<<
16114
16115 :mlton-guide-page: OpenGL
16116 [[OpenGL]]
16117 OpenGL
16118 ======
16119
16120 There are at least two interfaces to OpenGL for MLton/SML, both of
16121 which should be considered alpha quality.
16122
16123 * <:MikeThomas:> built a low-level interface, directly translating
16124 many of the functions, covering GL, GLU, and GLUT.  This is available
16125 in the MLton <:Sources:>:
16126 <!ViewGitDir(mltonlib,master,org/mlton/mike/opengl)>.  The code
16127 contains a number of small, standard OpenGL examples translated to
16128 SML.
16129
16130 * <:ChrisClearwater:> has written at least an interface to GL, and
16131 possibly more.  See
16132 ** http://mlton.org/pipermail/mlton/2005-January/026669.html
16133
16134 <:Contact:> us for more information or an update on the status of
16135 these projects.
16136
16137 <<<
16138
16139 :mlton-guide-page: OperatorPrecedence
16140 [[OperatorPrecedence]]
16141 OperatorPrecedence
16142 ==================
16143
<:StandardML:Standard ML> has a built-in notion of precedence for
certain symbols.  Every program that includes the
<:BasisLibrary:Basis Library> automatically gets the following infix
declarations.  A higher number indicates higher precedence.
16148
16149 [source,sml]
16150 ----
16151 infix 7 * / mod div
16152 infix 6 + - ^
16153 infixr 5 :: @
16154 infix 4 = <> > >= < <=
16155 infix 3 := o
16156 infix 0 before
16157 ----
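
The bindings below sketch how these declarations affect parsing.  The
expressions are chosen only for illustration; operators declared with
`infix` associate to the left and those declared with `infixr`
associate to the right.

[source,sml]
----
(* * (level 7) binds tighter than + (level 6): 1 + (2 * 3). *)
val 7 = 1 + 2 * 3

(* - is left associative: (10 - 3) - 2. *)
val 5 = 10 - 3 - 2

(* :: is right associative: 1 :: (2 :: []). *)
val [1, 2] = 1 :: 2 :: []

(* + (level 6) binds tighter than = (level 4): (1 + 1) = 2. *)
val true = 1 + 1 = 2

(* :: (level 5) binds tighter than = (level 4): (1 :: [2]) = [1, 2]. *)
val true = 1 :: [2] = [1, 2]
----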
16158
16159 <<<
16160
16161 :mlton-guide-page: OptionalArguments
16162 [[OptionalArguments]]
16163 OptionalArguments
16164 =================
16165
16166 <:StandardML:Standard ML> does not have built-in support for optional
16167 arguments.  Nevertheless, using <:Fold:>, it is easy to define
16168 functions that take optional arguments.
16169
16170 For example, suppose that we have the following definition of a
16171 function `f`.
16172
16173 [source,sml]
16174 ----
16175 fun f (i, r, s) =
16176    concat [Int.toString i, ", ", Real.toString r, ", ", s]
16177 ----
16178
16179 Using the `OptionalArg` structure described below, we can define a
16180 function `f'`, an optionalized version of `f`, that takes 0, 1, 2, or
16181 3 arguments.  Embedded within `f'` will be default values for `i`,
16182 `r`, and `s`.  If `f'` gets no arguments, then all the defaults are
16183 used.  If `f'` gets one argument, then that will be used for `i`.  Two
16184 arguments will be used for `i` and `r` respectively.  Three arguments
16185 will override all default values.  Calls to `f'` will look like the
16186 following.
16187
16188 [source,sml]
16189 ----
16190 f' $
16191 f' `2 $
16192 f' `2 `3.0 $
16193 f' `2 `3.0 `"four" $
16194 ----
16195
16196 The optional argument indicator, +&grave;+, is not special syntax ---
16197 it is a normal SML value, defined in the `OptionalArg` structure
16198 below.
16199
16200 Here is the definition of `f'` using the `OptionalArg` structure, in
16201 particular, `OptionalArg.make` and `OptionalArg.D`.
16202
16203 [source,sml]
16204 ----
16205 val f' =
16206    fn z =>
16207    let open OptionalArg in
16208       make (D 1) (D 2.0) (D "three") $
16209    end (fn i & r & s => f (i, r, s))
16210    z
16211 ----
16212
16213 The definition of `f'` is eta expanded as with all uses of fold.  A
16214 call to `OptionalArg.make` is supplied with a variable number of
16215 defaults (in this case, three), the end-of-arguments terminator, `$`,
16216 and the function to run, taking its arguments as an n-ary
16217 <:ProductType:product>.  In this case, the function simply converts
16218 the product to an ordinary tuple and calls `f`.  Often, the function
16219 body will simply be written directly.
16220
16221 In general, the definition of an optional-argument function looks like
16222 the following.
16223
16224 [source,sml]
16225 ----
16226 val f =
16227    fn z =>
16228    let open OptionalArg in
16229       make (D <default1>) (D <default2>) ... (D <defaultn>) $
16230    end (fn x1 & x2 & ... & xn =>
16231         <function code goes here>)
16232    z
16233 ----
16234
16235 Here is the definition of `OptionalArg`.
16236
16237 [source,sml]
16238 ----
16239 structure OptionalArg =
16240    struct
16241       val make =
16242          fn z =>
16243          Fold.fold
16244          ((id, fn (f, x) => f x),
16245           fn (d, r) => fn func =>
16246           Fold.fold ((id, d ()), fn (f, d) =>
16247                      let
16248                         val d & () = r (id, f d)
16249                      in
16250                         func d
16251                      end))
16252          z
16253
16254       fun D d = Fold.step0 (fn (f, r) =>
16255                             (fn ds => f (d & ds),
16256                              fn (f, a & b) => r (fn x => f a & x, b)))
16257
16258       val ` =
16259          fn z =>
16260          Fold.step1 (fn (x, (f, _ & d)) => (fn d => f (x & d), d))
16261          z
16262    end
16263 ----
16264
16265 `OptionalArg.make` uses a nested fold.  The first `fold` accumulates
16266 the default values in a product, associated to the right, and a
16267 reversal function that converts a product (of the same arity as the
16268 number of defaults) from right associativity to left associativity.
16269 The accumulated defaults are used by the second fold, which recurs
16270 over the product, replacing the appropriate component as it encounters
16271 optional arguments.  The second fold also constructs a "fill"
16272 function, `f`, that is used to reconstruct the product once the
16273 end-of-arguments is reached.  Finally, the finisher reconstructs the
16274 product and uses the reversal function to convert the product from
16275 right associative to left associative, at which point it is passed to
16276 the user-supplied function.
16277
16278 Much of the complexity comes from the fact that while recurring over a
16279 product from left to right, one wants it to be right-associative,
16280 e.g., look like
16281
16282 [source,sml]
16283 ----
16284 a & (b & (c & d))
16285 ----
16286
16287 but the user function in the end wants the product to be left
16288 associative, so that the product argument pattern can be written
16289 without parentheses (since `&` is left associative).
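
To try the pieces together, the following self-contained sketch
repeats the definitions from this page and adds the supporting
declarations they rely on.  The definitions of `$`, `id`, the `Fold`
structure, and the `&` product are assumptions here: they are minimal
versions written to match what <:Fold:> and <:ProductType:> describe,
and only the three `Fold` functions used on this page are included.
The names `s0` through `s3` are chosen only for this example.

[source,sml]
----
(* Assumed supporting definitions; see Fold and ProductType. *)
fun $ (a, f) = f a
fun id x = x

structure Fold =
   struct
      fun fold (a, f) g = g (a, f)
      fun step0 h (a, f) = fold (h a, f)
      fun step1 h (a, f) b = fold (h (b, a), f)
   end

datatype ('a, 'b) product = & of 'a * 'b
infix &

(* OptionalArg, exactly as defined above. *)
structure OptionalArg =
   struct
      val make =
         fn z =>
         Fold.fold
         ((id, fn (f, x) => f x),
          fn (d, r) => fn func =>
          Fold.fold ((id, d ()), fn (f, d) =>
                     let
                        val d & () = r (id, f d)
                     in
                        func d
                     end))
         z

      fun D d = Fold.step0 (fn (f, r) =>
                            (fn ds => f (d & ds),
                             fn (f, a & b) => r (fn x => f a & x, b)))

      val ` =
         fn z =>
         Fold.step1 (fn (x, (f, _ & d)) => (fn d => f (x & d), d))
         z
   end

(* Bring the optional argument indicator into scope for callers. *)
open OptionalArg

fun f (i, r, s) =
   concat [Int.toString i, ", ", Real.toString r, ", ", s]

val f' =
   fn z =>
   let open OptionalArg in
      make (D 1) (D 2.0) (D "three") $
   end (fn i & r & s => f (i, r, s))
   z

val s0 = f' $                    (* same as f (1, 2.0, "three") *)
val s1 = f' `2 $                 (* same as f (2, 2.0, "three") *)
val s2 = f' `2 `3.0 $            (* same as f (2, 3.0, "three") *)
val s3 = f' `2 `3.0 `"four" $    (* same as f (2, 3.0, "four") *)
----

Each call overrides a prefix of the defaults and leaves the remaining
ones in place.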
16290
16291
16292 == Labelled optional arguments ==
16293
16294 In addition to the positional optional arguments described above, it
16295 is sometimes useful to have labelled optional arguments.  These allow
16296 one to define a function, `f`, with defaults, say `a` and `b`.  Then,
16297 a caller of `f` can supply values for `a` and `b` by name.  If no
16298 value is supplied then the default is used.
16299
16300 Labelled optional arguments are a simple extension of
16301 <:FunctionalRecordUpdate:> using post composition.  Suppose, for
16302 example, that one wants a function `f` with labelled optional
16303 arguments `a` and `b` with default values `0` and `0.0` respectively.
16304 If one has a functional-record-update function `updateAB` for records
16305 with `a` and `b` fields, then one can define `f` in the following way.
16306
16307 [source,sml]
16308 ----
16309 val f =
16310    fn z =>
16311    Fold.post
16312    (updateAB {a = 0, b = 0.0},
16313     fn {a, b} => print (concat [Int.toString a, " ",
16314                                 Real.toString b, "\n"]))
16315    z
16316 ----
16317
16318 The idea is that `f` is the post composition (using `Fold.post`) of
16319 the actual code for the function with a functional-record updater that
16320 starts with the defaults.
16321
Here are some example calls to `f`.

16323 [source,sml]
16324 ----
16325 val () = f $
16326 val () = f (U#a 13) $
16327 val () = f (U#a 13) (U#b 17.5) $
16328 val () = f (U#b 17.5) (U#a 13) $
16329 ----
16330
Notice that a caller can supply neither of the arguments, either of
the arguments, or both of the arguments, and in either order.  All
that matters is that the arguments are labelled correctly (and are of
the right type, of course).
16335
16336 Here is another example.
16337
16338 [source,sml]
16339 ----
16340 val f =
16341    fn z =>
16342    Fold.post
16343    (updateBCD {b = 0, c = 0.0, d = "<>"},
16344     fn {b, c, d} =>
16345     print (concat [Int.toString b, " ",
16346                    Real.toString c, " ",
16347                    d, "\n"]))
16348    z
16349 ----
16350
16351 Here are some example calls.
16352
16353 [source,sml]
16354 ----
16355 val () = f $
16356 val () = f (U#d "goodbye") $
16357 val () = f (U#d "hello") (U#b 17) (U#c 19.3) $
16358 ----
16359
16360 <<<
16361
16362 :mlton-guide-page: Overloading
16363 [[Overloading]]
16364 Overloading
16365 ===========
16366
16367 In <:StandardML:Standard ML>, constants (like `13`, `0w13`, `13.0`)
16368 are overloaded, meaning that they can denote a constant of the
16369 appropriate type as determined by context.  SML defines the
16370 overloading classes _Int_, _Real_, and _Word_, which denote the sets
16371 of types that integer, real, and word constants may take on.  In
16372 MLton, these are defined as follows.
16373
16374 [cols="^25%,<75%"]
16375 |=====
16376 | _Int_  | `Int2.int`, `Int3.int`, ... `Int32.int`, `Int64.int`, `Int.int`, `IntInf.int`, `LargeInt.int`, `FixedInt.int`, `Position.int`
16377 | _Real_ | `Real32.real`, `Real64.real`, `Real.real`, `LargeReal.real`
16378 | _Word_ | `Word2.word`, `Word3.word`, ... `Word32.word`, `Word64.word`, `Word.word`, `LargeWord.word`, `SysWord.word`
16379 |=====
16380
16381 The <:DefinitionOfStandardML:Definition> allows flexibility in how
16382 much context is used to resolve overloading.  It says that the context
16383 is _no larger than the smallest enclosing structure-level
16384 declaration_, but that _an implementation may require that a smaller
16385 context determines the type_.  MLton uses the largest possible context
16386 allowed by SML in resolving overloading.  If the type of a constant is
16387 not determined by context, then it takes on a default type.  In MLton,
16388 these are defined as follows.
16389
16390 [cols="^25%,<75%"]
16391 |=====
16392 | _Int_ | `Int.int`
16393 | _Real_ | `Real.real`
16394 | _Word_ | `Word.word`
16395 |=====
16396
16397 Other implementations may use a smaller context or different default
16398 types.
16399
16400 == Also see ==
16401
16402  * http://www.standardml.org/Basis/top-level-chapter.html[discussion of overloading in the Basis Library]
16403
16404 == Examples ==
16405
16406  * The following program is rejected.
16407 +
16408 [source,sml]
16409 ----
16410 structure S:
16411    sig
16412       val x: Word8.word
16413    end =
16414    struct
16415       val x = 0w0
16416    end
16417 ----
16418 +
16419 The smallest enclosing structure declaration for `0w0` is
16420 `val x = 0w0`.  Hence, `0w0` receives the default type for words,
16421 which is `Word.word`.
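
One way to make the program above acceptable is to place a type
constraint inside the declaration that contains the constant.  The
following sketch shows that variant along with a few constants that
are resolved by context or by default.  The names `w`, `w8`, `sum`,
`n`, and `big` are chosen only for this example.

[source,sml]
----
(* Accepted: the constraint is part of the declaration containing 0w0. *)
structure S:
   sig
      val x: Word8.word
   end =
   struct
      val x: Word8.word = 0w0
   end

val w = 0w13                    (* unconstrained: defaults to Word.word *)
val w8: Word8.word = 0w13       (* resolved by the type annotation *)
val sum = Word8.+ (0w1, 0w2)    (* resolved by the argument type of Word8.+ *)
val n = 13                      (* unconstrained: defaults to Int.int *)
val big: IntInf.int = 13        (* resolved by the type annotation *)
----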
16422
16423 <<<
16424
16425 :mlton-guide-page: PackedRepresentation
16426 [[PackedRepresentation]]
16427 PackedRepresentation
16428 ====================
16429
16430 <:PackedRepresentation:> is an analysis pass for the <:SSA2:>
16431 <:IntermediateLanguage:>, invoked from <:ToRSSA:>.
16432
16433 == Description ==
16434
16435 This pass analyzes a <:SSA2:> program to compute a packed
16436 representation for each object.
16437
16438 == Implementation ==
16439
16440 * <!ViewGitFile(mlton,master,mlton/backend/representation.sig)>
16441 * <!ViewGitFile(mlton,master,mlton/backend/packed-representation.fun)>
16442
16443 == Details and Notes ==
16444
16445 Has a special case to make sure that `true` is represented as `1` and
16446 `false` is represented as `0`.
16447
16448 <<<
16449
16450 :mlton-guide-page: ParallelMove
16451 [[ParallelMove]]
16452 ParallelMove
16453 ============
16454
16455 <:ParallelMove:> is a rewrite pass, agnostic in the
16456 <:IntermediateLanguage:> which it produces.
16457
16458 == Description ==
16459
16460 This function computes a sequence of individual moves to effect a
16461 parallel move (with possibly overlapping froms and tos).
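
The following is a minimal sketch of the general technique, written
for illustration only; it is not the code in `parallel-move.fun`.
The type `move`, the function `sequentialize`, and the use of strings
as locations are assumptions made for this example.  It also assumes
that no two moves share a destination and that the scratch location
`temp` does not occur in the input.

[source,sml]
----
(* A move copies the contents of location `src` into location `dst`. *)
type move = {dst: string, src: string}

(* Order the moves so that running them one at a time has the same
 * effect as running them all at once.
 *)
fun sequentialize (temp: string) (moves: move list): move list =
   let
      (* Is location `loc` still read by some pending move? *)
      fun isRead (pending: move list, loc) =
         List.exists (fn (m: move) => #src m = loc) pending
      fun loop ([], acc) = rev acc
        | loop (pending as first :: _, acc) =
             let
                (* A move is ready once nothing pending reads its target. *)
                val (ready, blocked) =
                   List.partition
                   (fn (m: move) => not (isRead (pending, #dst m)))
                   pending
             in
                if not (null ready)
                   then loop (blocked, List.revAppend (ready, acc))
                else
                   (* Only cycles remain: save one target in `temp` and
                    * make its reader use `temp` instead.
                    *)
                   let
                      val saved = #dst (first: move)
                      fun redirect ({dst, src}: move): move =
                         {dst = dst,
                          src = if src = saved then temp else src}
                   in
                      loop (map redirect pending,
                            {dst = temp, src = saved} :: acc)
                   end
             end
   in
      (* Moves from a location to itself need no code. *)
      loop (List.filter (fn (m: move) => #dst m <> #src m) moves, [])
   end

(* Swapping two locations needs the scratch location. *)
val swap =
   sequentialize "t" [{dst = "a", src = "b"}, {dst = "b", src = "a"}]

(* A chain needs only reordering: `c` must be written before `b`. *)
val chain =
   sequentialize "t" [{dst = "b", src = "a"}, {dst = "c", src = "b"}]
----

In this sketch, `swap` is the sequence `t <- a`, `a <- b`, `b <- t`,
and `chain` is the sequence `c <- b`, `b <- a`.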
16462
16463 == Implementation ==
16464
16465 * <!ViewGitFile(mlton,master,mlton/backend/parallel-move.sig)>
16466 * <!ViewGitFile(mlton,master,mlton/backend/parallel-move.fun)>
16467
16468 == Details and Notes ==
16469
16470 {empty}
16471
16472 <<<
16473
16474 :mlton-guide-page: Performance
16475 [[Performance]]
16476 Performance
16477 ===========
16478
16479 This page compares the performance of a number of SML compilers on a
16480 range of benchmarks.
16481
The following SML compiler versions are compared.
16483
16484 * <:Home:MLton> 20171211 (git 79d4a623c)
16485 * <:MLKit:ML Kit> 4.3.12 (20171210)
16486 * <:MoscowML:Moscow ML> 2.10.1 ++ (git f529b33bb, 20170711)
16487 * <:PolyML:Poly/ML> 5.7.2 Testing (git 5.7.1-35-gcb73407a)
16488 * <:SMLNJ:SML/NJ> 110.81 (20170501)
16489
16490 There are tables for <:#RunTime:run time>, <:#CodeSize:code size>, and
16491 <:#CompileTime:compile time>.
16492
16493
16494 == Setup ==
16495
16496 All benchmarks were compiled and run on a 2.6 GHz Core i7-5600U with 16G of
16497 RAM.  The benchmarks were compiled with the default settings for all
16498 the compilers, except for Moscow ML, which was passed the
16499 `-orthodox -standalone -toplevel` switches.  The Poly/ML executables
16500 were produced using `polyc`.
16501 The SML/NJ executables were produced by wrapping the entire program in
16502 a `local` declaration whose body performs an `SMLofNJ.exportFn`.
16503
16504 For more details, or if you want to run the benchmarks yourself,
16505 please see the <!ViewGitDir(mlton,master,benchmark)> directory of our
16506 <:Sources:>.
16507
16508 All of the benchmarks are available for download from this page.  Some
16509 of the benchmarks were obtained from the SML/NJ benchmark suite.  Some
16510 of the benchmarks expect certain input files to exist in the
16511 <!ViewGitDir(mlton,master,benchmark/tests/DATA)> subdirectory.
16512
16513 * <!RawGitFile(mlton,master,benchmark/tests/hamlet.sml)> <!RawGitFile(mlton,master,benchmark/tests/DATA/hamlet-input.sml)>
16514 * <!RawGitFile(mlton,master,benchmark/tests/ray.sml)> <!RawGitFile(mlton,master,benchmark/tests/DATA/ray)>
16515 * <!RawGitFile(mlton,master,benchmark/tests/raytrace.sml)> <!RawGitFile(mlton,master,benchmark/tests/DATA/chess.gml)>
16516 * <!RawGitFile(mlton,master,benchmark/tests/vliw.sml)> <!RawGitFile(mlton,master,benchmark/tests/DATA/ndotprod.s)>
16517
16518
16519 == <!Anchor(RunTime)>Run-time ratio ==
16520
16521 The following table gives the ratio of the run time of each benchmark
16522 when compiled by another compiler to the run time when compiled by
16523 MLton.  That is, the larger the number, the slower the generated code
16524 runs.  A number larger than one indicates that the corresponding
16525 compiler produces code that runs more slowly than MLton.  A * in an
16526 entry means the compiler failed to compile the benchmark or that the
16527 benchmark failed to run.
16528
16529 [options="header",cols="<2,5*<1"]
16530 |====
16531 |benchmark|MLton|ML-Kit|MosML|Poly/ML|SML/NJ
16532 |<!RawGitFile(mlton,master,benchmark/tests/barnes-hut.sml)>|1.00|10.11|19.36|2.98|1.24
16533 |<!RawGitFile(mlton,master,benchmark/tests/boyer.sml)>|1.00|*|7.87|1.22|1.75
16534 |<!RawGitFile(mlton,master,benchmark/tests/checksum.sml)>|1.00|30.79|*|10.94|9.08
16535 |<!RawGitFile(mlton,master,benchmark/tests/count-graphs.sml)>|1.00|6.51|40.42|2.34|2.32
16536 |<!RawGitFile(mlton,master,benchmark/tests/DLXSimulator.sml)>|1.00|0.97|*|0.60|*
16537 |<!RawGitFile(mlton,master,benchmark/tests/even-odd.sml)>|1.00|0.50|11.50|0.42|0.42
16538 |<!RawGitFile(mlton,master,benchmark/tests/fft.sml)>|1.00|7.35|81.51|4.03|1.19
16539 |<!RawGitFile(mlton,master,benchmark/tests/fib.sml)>|1.00|1.41|10.94|1.25|1.17
16540 |<!RawGitFile(mlton,master,benchmark/tests/flat-array.sml)>|1.00|7.19|68.33|5.28|13.16
16541 |<!RawGitFile(mlton,master,benchmark/tests/hamlet.sml)>|1.00|4.97|22.85|1.58|*
16542 |<!RawGitFile(mlton,master,benchmark/tests/imp-for.sml)>|1.00|4.99|57.84|3.34|4.67
16543 |<!RawGitFile(mlton,master,benchmark/tests/knuth-bendix.sml)>|1.00|*|18.43|3.18|3.06
16544 |<!RawGitFile(mlton,master,benchmark/tests/lexgen.sml)>|1.00|2.76|7.94|3.19|*
16545 |<!RawGitFile(mlton,master,benchmark/tests/life.sml)>|1.00|1.80|20.19|0.89|1.50
16546 |<!RawGitFile(mlton,master,benchmark/tests/logic.sml)>|1.00|5.10|11.06|1.15|1.27
16547 |<!RawGitFile(mlton,master,benchmark/tests/mandelbrot.sml)>|1.00|3.50|25.52|1.33|1.28
16548 |<!RawGitFile(mlton,master,benchmark/tests/matrix-multiply.sml)>|1.00|29.40|183.02|7.41|15.19
16549 |<!RawGitFile(mlton,master,benchmark/tests/md5.sml)>|1.00|95.18|*|32.61|47.47
16550 |<!RawGitFile(mlton,master,benchmark/tests/merge.sml)>|1.00|1.42|*|0.74|3.24
16551 |<!RawGitFile(mlton,master,benchmark/tests/mlyacc.sml)>|1.00|1.83|8.45|0.84|*
16552 |<!RawGitFile(mlton,master,benchmark/tests/model-elimination.sml)>|1.00|4.03|12.42|1.70|2.25
16553 |<!RawGitFile(mlton,master,benchmark/tests/mpuz.sml)>|1.00|3.73|57.44|2.05|3.22
16554 |<!RawGitFile(mlton,master,benchmark/tests/nucleic.sml)>|1.00|3.96|*|1.73|1.20
16555 |<!RawGitFile(mlton,master,benchmark/tests/output1.sml)>|1.00|6.26|30.85|7.82|5.99
16556 |<!RawGitFile(mlton,master,benchmark/tests/peek.sml)>|1.00|9.37|44.78|2.18|2.15
16557 |<!RawGitFile(mlton,master,benchmark/tests/psdes-random.sml)>|1.00|*|*|2.79|3.59
16558 |<!RawGitFile(mlton,master,benchmark/tests/ratio-regions.sml)>|1.00|5.68|165.56|3.92|37.52
16559 |<!RawGitFile(mlton,master,benchmark/tests/ray.sml)>|1.00|12.05|25.08|8.73|1.75
16560 |<!RawGitFile(mlton,master,benchmark/tests/raytrace.sml)>|1.00|*|*|2.11|3.33
16561 |<!RawGitFile(mlton,master,benchmark/tests/simple.sml)>|1.00|2.95|24.03|3.67|1.93
16562 |<!RawGitFile(mlton,master,benchmark/tests/smith-normal-form.sml)>|1.00|*|*|1.04|*
16563 |<!RawGitFile(mlton,master,benchmark/tests/string-concat.sml)>|1.00|1.88|28.01|0.70|2.67
16564 |<!RawGitFile(mlton,master,benchmark/tests/tailfib.sml)>|1.00|1.58|23.57|0.90|1.04
16565 |<!RawGitFile(mlton,master,benchmark/tests/tak.sml)>|1.00|1.69|15.90|1.57|2.01
16566 |<!RawGitFile(mlton,master,benchmark/tests/tensor.sml)>|1.00|*|*|*|2.07
16567 |<!RawGitFile(mlton,master,benchmark/tests/tsp.sml)>|1.00|2.19|66.76|3.27|1.48
16568 |<!RawGitFile(mlton,master,benchmark/tests/tyan.sml)>|1.00|*|19.43|1.08|1.03
16569 |<!RawGitFile(mlton,master,benchmark/tests/vector32-concat.sml)>|1.00|13.85|*|1.80|12.48
16570 |<!RawGitFile(mlton,master,benchmark/tests/vector64-concat.sml)>|1.00|*|*|*|13.92
16571 |<!RawGitFile(mlton,master,benchmark/tests/vector-rev.sml)>|1.00|7.88|68.85|9.39|68.80
16572 |<!RawGitFile(mlton,master,benchmark/tests/vliw.sml)>|1.00|2.46|15.39|1.43|1.55
16573 |<!RawGitFile(mlton,master,benchmark/tests/wc-input1.sml)>|1.00|6.00|*|29.25|9.54
16574 |<!RawGitFile(mlton,master,benchmark/tests/wc-scanStream.sml)>|1.00|80.43|*|19.45|8.71
16575 |<!RawGitFile(mlton,master,benchmark/tests/zebra.sml)>|1.00|4.62|35.56|1.68|9.97
16576 |<!RawGitFile(mlton,master,benchmark/tests/zern.sml)>|1.00|*|*|*|1.60
16577 |====
16578
16579 <!Anchor(SNFNote)>
16580 Note: for SML/NJ, the
16581 <!RawGitFile(mlton,master,benchmark/tests/smith-normal-form.sml)>
16582 benchmark was killed after running for over 51,000 seconds.
16583
16584
16585 == <!Anchor(CodeSize)>Code size ==
16586
16587 The following table gives the code size of each benchmark in bytes.
16588 The size for MLton and the ML Kit is the sum of text and data for the
16589 standalone executable as reported by `size`.  The size for Moscow
16590 ML is the size in bytes of the executable `a.out`.  The size for
16591 Poly/ML is the difference in size of the database before the session
16592 start and after the commit.  The size for SML/NJ is the size of the
16593 heap file created by `exportFn` and does not include the size of
16594 the SML/NJ runtime system (approximately 100K).  A * in an entry means
16595 that the compiler failed to compile the benchmark.
16596
16597 [options="header",cols="<2,5*<1"]
16598 |====
16599 |benchmark|MLton|ML-Kit|MosML|Poly/ML|SML/NJ
16600 |<!RawGitFile(mlton,master,benchmark/tests/barnes-hut.sml)>|180,788|810,267|199,503|148,120|402,480
16601 |<!RawGitFile(mlton,master,benchmark/tests/boyer.sml)>|250,246|*|248,018|196,984|496,664
16602 |<!RawGitFile(mlton,master,benchmark/tests/checksum.sml)>|122,422|225,274|*|106,088|406,560
16603 |<!RawGitFile(mlton,master,benchmark/tests/count-graphs.sml)>|151,878|250,126|187,048|144,032|428,136
16604 |<!RawGitFile(mlton,master,benchmark/tests/DLXSimulator.sml)>|223,073|827,483|*|272,664|*
16605 |<!RawGitFile(mlton,master,benchmark/tests/even-odd.sml)>|122,350|87,586|181,415|106,072|380,928
16606 |<!RawGitFile(mlton,master,benchmark/tests/fft.sml)>|145,008|237,230|186,228|131,400|418,896
16607 |<!RawGitFile(mlton,master,benchmark/tests/fib.sml)>|122,310|87,402|181,312|106,088|380,928
16608 |<!RawGitFile(mlton,master,benchmark/tests/flat-array.sml)>|121,958|104,102|181,464|106,072|394,256
16609 |<!RawGitFile(mlton,master,benchmark/tests/hamlet.sml)>|1,503,849|2,280,691|407,219|2,249,504|*
16610 |<!RawGitFile(mlton,master,benchmark/tests/imp-for.sml)>|122,078|89,346|181,470|106,088|381,952
16611 |<!RawGitFile(mlton,master,benchmark/tests/knuth-bendix.sml)>|193,145|*|192,659|161,080|400,408
16612 |<!RawGitFile(mlton,master,benchmark/tests/lexgen.sml)>|308,296|826,819|213,128|268,272|*
16613 |<!RawGitFile(mlton,master,benchmark/tests/life.sml)>|141,862|721,419|186,463|118,552|384,024
16614 |<!RawGitFile(mlton,master,benchmark/tests/logic.sml)>|211,086|782,667|188,908|198,408|409,624
16615 |<!RawGitFile(mlton,master,benchmark/tests/mandelbrot.sml)>|122,086|700,075|183,037|106,104|386,048
16616 |<!RawGitFile(mlton,master,benchmark/tests/matrix-multiply.sml)>|124,398|280,006|184,328|110,232|416,784
16617 |<!RawGitFile(mlton,master,benchmark/tests/md5.sml)>|150,497|271,794|*|122,624|399,416
16618 |<!RawGitFile(mlton,master,benchmark/tests/merge.sml)>|123,846|100,858|181,542|106,136|381,960
16619 |<!RawGitFile(mlton,master,benchmark/tests/mlyacc.sml)>|678,920|1,233,587|263,721|576,728|*
16620 |<!RawGitFile(mlton,master,benchmark/tests/model-elimination.sml)>|846,779|1,432,283|297,108|777,664|985,304
16621 |<!RawGitFile(mlton,master,benchmark/tests/mpuz.sml)>|124,126|229,078|184,440|114,584|392,232
16622 |<!RawGitFile(mlton,master,benchmark/tests/nucleic.sml)>|298,038|507,186|*|475,808|456,744
16623 |<!RawGitFile(mlton,master,benchmark/tests/output1.sml)>|157,973|699,003|181,680|118,800|380,928
16624 |<!RawGitFile(mlton,master,benchmark/tests/peek.sml)>|156,401|201,138|183,438|110,456|385,072
16625 |<!RawGitFile(mlton,master,benchmark/tests/psdes-random.sml)>|126,486|106,166|*|106,088|393,256
16626 |<!RawGitFile(mlton,master,benchmark/tests/ratio-regions.sml)>|150,174|265,694|190,088|184,536|414,760
16627 |<!RawGitFile(mlton,master,benchmark/tests/ray.sml)>|260,863|736,795|195,064|198,976|512,160
16628 |<!RawGitFile(mlton,master,benchmark/tests/raytrace.sml)>|384,905|*|*|446,424|623,824
16629 |<!RawGitFile(mlton,master,benchmark/tests/simple.sml)>|365,578|895,139|197,765|1,051,952|708,696
16630 |<!RawGitFile(mlton,master,benchmark/tests/smith-normal-form.sml)>|286,474|*|*|262,616|547,984
16631 |<!RawGitFile(mlton,master,benchmark/tests/string-concat.sml)>|119,102|140,626|183,249|106,088|390,160
16632 |<!RawGitFile(mlton,master,benchmark/tests/tailfib.sml)>|122,110|87,890|181,369|106,072|381,952
16633 |<!RawGitFile(mlton,master,benchmark/tests/tak.sml)>|122,246|87,402|181,349|106,088|376,832
16634 |<!RawGitFile(mlton,master,benchmark/tests/tensor.sml)>|186,545|*|*|*|421,984
16635 |<!RawGitFile(mlton,master,benchmark/tests/tsp.sml)>|163,033|722,571|188,634|126,984|393,264
16636 |<!RawGitFile(mlton,master,benchmark/tests/tyan.sml)>|235,449|*|195,401|184,816|478,296
16637 |<!RawGitFile(mlton,master,benchmark/tests/vector32-concat.sml)>|123,790|104,398|*|106,200|394,256
16638 |<!RawGitFile(mlton,master,benchmark/tests/vector64-concat.sml)>|123,846|*|*|*|405,552
16639 |<!RawGitFile(mlton,master,benchmark/tests/vector-rev.sml)>|122,982|104,614|181,534|106,072|394,256
16640 |<!RawGitFile(mlton,master,benchmark/tests/vliw.sml)>|538,074|1,182,851|249,884|580,792|749,752
16641 |<!RawGitFile(mlton,master,benchmark/tests/wc-input1.sml)>|186,152|699,459|191,347|127,200|386,048
16642 |<!RawGitFile(mlton,master,benchmark/tests/wc-scanStream.sml)>|196,232|700,131|191,539|127,232|387,072
16643 |<!RawGitFile(mlton,master,benchmark/tests/zebra.sml)>|230,433|128,354|186,322|127,048|390,184
16644 |<!RawGitFile(mlton,master,benchmark/tests/zern.sml)>|156,902|*|*|*|453,768
16645 |====
16646
16647
16648 == <!Anchor(CompileTime)>Compile time ==
16649
16650 The following table gives the compile time of each benchmark in
16651 seconds.  A * in an entry means that the compiler failed to compile
16652 the benchmark.
16653
16654 [options="header",cols="<2,5*<1"]
16655 |====
16656 |benchmark|MLton|ML-Kit|MosML|Poly/ML|SML/NJ
16657 |<!RawGitFile(mlton,master,benchmark/tests/barnes-hut.sml)>|2.70|0.89|0.15|0.29|0.20
16658 |<!RawGitFile(mlton,master,benchmark/tests/boyer.sml)>|2.87|*|0.14|0.20|0.41
16659 |<!RawGitFile(mlton,master,benchmark/tests/checksum.sml)>|2.21|0.24|*|0.07|0.05
16660 |<!RawGitFile(mlton,master,benchmark/tests/count-graphs.sml)>|2.28|0.34|0.04|0.11|0.21
16661 |<!RawGitFile(mlton,master,benchmark/tests/DLXSimulator.sml)>|2.93|1.01|*|0.27|*
16662 |<!RawGitFile(mlton,master,benchmark/tests/even-odd.sml)>|2.23|0.20|0.01|0.07|0.04
16663 |<!RawGitFile(mlton,master,benchmark/tests/fft.sml)>|2.35|0.28|0.03|0.09|0.10
16664 |<!RawGitFile(mlton,master,benchmark/tests/fib.sml)>|2.16|0.19|0.01|0.07|0.04
16665 |<!RawGitFile(mlton,master,benchmark/tests/flat-array.sml)>|2.16|0.20|0.01|0.07|0.04
16666 |<!RawGitFile(mlton,master,benchmark/tests/hamlet.sml)>|12.28|19.25|23.75|6.44|*
16667 |<!RawGitFile(mlton,master,benchmark/tests/imp-for.sml)>|2.14|0.20|0.01|0.08|0.04
16668 |<!RawGitFile(mlton,master,benchmark/tests/knuth-bendix.sml)>|2.48|*|0.08|0.14|0.23
16669 |<!RawGitFile(mlton,master,benchmark/tests/lexgen.sml)>|3.31|0.75|0.15|0.22|*
16670 |<!RawGitFile(mlton,master,benchmark/tests/life.sml)>|2.25|0.32|0.03|0.09|0.10
16671 |<!RawGitFile(mlton,master,benchmark/tests/logic.sml)>|2.72|0.57|0.07|0.17|0.21
16672 |<!RawGitFile(mlton,master,benchmark/tests/mandelbrot.sml)>|2.14|0.24|0.01|0.07|0.04
16673 |<!RawGitFile(mlton,master,benchmark/tests/matrix-multiply.sml)>|2.14|0.24|0.01|0.08|0.05
16674 |<!RawGitFile(mlton,master,benchmark/tests/md5.sml)>|2.31|0.39|*|0.12|0.27
16675 |<!RawGitFile(mlton,master,benchmark/tests/merge.sml)>|2.15|0.21|0.01|0.07|0.04
16676 |<!RawGitFile(mlton,master,benchmark/tests/mlyacc.sml)>|7.07|4.53|2.05|0.80|*
16677 |<!RawGitFile(mlton,master,benchmark/tests/model-elimination.sml)>|6.78|4.76|1.20|1.65|4.78
16678 |<!RawGitFile(mlton,master,benchmark/tests/mpuz.sml)>|2.14|0.28|0.02|0.08|0.07
16679 |<!RawGitFile(mlton,master,benchmark/tests/nucleic.sml)>|3.96|2.12|*|0.37|0.49
16680 |<!RawGitFile(mlton,master,benchmark/tests/output1.sml)>|2.30|0.22|0.01|0.07|0.04
16681 |<!RawGitFile(mlton,master,benchmark/tests/peek.sml)>|2.26|0.20|0.01|0.07|0.04
16682 |<!RawGitFile(mlton,master,benchmark/tests/psdes-random.sml)>|2.12|0.22|*|9.83|12.55
16683 |<!RawGitFile(mlton,master,benchmark/tests/ratio-regions.sml)>|2.59|0.47|0.07|0.16|0.24
16684 |<!RawGitFile(mlton,master,benchmark/tests/ray.sml)>|2.95|0.46|0.05|0.17|0.14
16685 |<!RawGitFile(mlton,master,benchmark/tests/raytrace.sml)>|3.93|*|*|0.45|0.74
16686 |<!RawGitFile(mlton,master,benchmark/tests/simple.sml)>|3.42|1.23|0.30|0.32|0.53
16687 |<!RawGitFile(mlton,master,benchmark/tests/smith-normal-form.sml)>|3.23|*|*|0.15|0.32
16688 |<!RawGitFile(mlton,master,benchmark/tests/string-concat.sml)>|2.25|0.28|0.01|0.08|0.05
16689 |<!RawGitFile(mlton,master,benchmark/tests/tailfib.sml)>|2.24|0.21|0.01|0.08|0.05
16690 |<!RawGitFile(mlton,master,benchmark/tests/tak.sml)>|2.23|0.20|0.01|0.08|0.05
16691 |<!RawGitFile(mlton,master,benchmark/tests/tensor.sml)>|2.73|*|*|*|0.44
16692 |<!RawGitFile(mlton,master,benchmark/tests/tsp.sml)>|2.42|0.38|0.05|0.11|0.11
16693 |<!RawGitFile(mlton,master,benchmark/tests/tyan.sml)>|2.93|*|0.10|0.27|0.31
16694 |<!RawGitFile(mlton,master,benchmark/tests/vector32-concat.sml)>|2.23|0.22|*|0.07|0.04
16695 |<!RawGitFile(mlton,master,benchmark/tests/vector64-concat.sml)>|2.18|*|*|*|0.04
16696 |<!RawGitFile(mlton,master,benchmark/tests/vector-rev.sml)>|2.23|0.22|0.01|0.08|0.05
16697 |<!RawGitFile(mlton,master,benchmark/tests/vliw.sml)>|5.25|2.93|0.63|0.94|1.85
16698 |<!RawGitFile(mlton,master,benchmark/tests/wc-input1.sml)>|2.46|0.24|0.01|0.08|0.05
16699 |<!RawGitFile(mlton,master,benchmark/tests/wc-scanStream.sml)>|2.61|0.25|0.01|0.08|0.05
16700 |<!RawGitFile(mlton,master,benchmark/tests/zebra.sml)>|2.99|0.35|0.03|0.09|0.11
16701 |<!RawGitFile(mlton,master,benchmark/tests/zern.sml)>|2.31|*|*|*|0.11
16702 |====
16703
16704 <<<
16705
16706 :mlton-guide-page: PhantomType
16707 [[PhantomType]]
16708 PhantomType
16709 ===========
16710
A phantom type is a type that has no run-time representation, but is
used to force the type checker to ensure invariants at compile time.
This is done by augmenting a type with additional arguments (phantom
type variables) and expressing constraints by choosing phantom types
to instantiate those variables in the types of values.
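
As a minimal sketch of the idea, the following example tags lengths
with a unit of measure so that the type checker rejects mixing units.
The signature `LENGTH`, the structure `Length`, and all names inside
them are hypothetical and are not part of MLton or the Basis Library.

[source,sml]
----
signature LENGTH =
   sig
      type 'unit t      (* 'unit is a phantom type variable *)
      type meters       (* phantom types: used only as arguments to t *)
      type feet
      val meters: real -> meters t
      val feet: real -> feet t
      val add: 'unit t * 'unit t -> 'unit t
      val toReal: 'unit t -> real
   end

structure Length :> LENGTH =
   struct
      type 'unit t = real   (* the phantom argument is ignored *)
      type meters = unit
      type feet = unit
      fun meters r = r
      fun feet r = r
      fun add (x: real, y) = x + y
      fun toReal r = r
   end

val total = Length.add (Length.meters 1.0, Length.meters 2.0)

(* Rejected by the type checker, because meters t and feet t differ:
 *   Length.add (Length.meters 1.0, Length.feet 2.0)
 *)
----

Because of the opaque signature constraint, clients cannot see that
every `'unit t` is a `real`, so the unit tag exists only at compile
time.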
16716
16717 == Also see ==
16718
16719 * <!Cite(Blume01)>
16720 ** dimensions
16721 ** C type system
16722 * <!Cite(FluetPucella06)>
16723 ** subtyping
16724 * socket module in <:BasisLibrary:Basis Library>
16725
16726 <<<
16727
16728 :mlton-guide-page: PlatformSpecificNotes
16729 [[PlatformSpecificNotes]]
16730 PlatformSpecificNotes
16731 =====================
16732
16733 Here are notes about using MLton on the following platforms.
16734
16735 == Operating Systems ==
16736
16737 * <:RunningOnAIX:AIX>
16738 * <:RunningOnCygwin:Cygwin>
16739 * <:RunningOnDarwin:Darwin>
16740 * <:RunningOnFreeBSD:FreeBSD>
16741 * <:RunningOnHPUX:HPUX>
16742 * <:RunningOnLinux:Linux>
16743 * <:RunningOnMinGW:MinGW>
16744 * <:RunningOnNetBSD:NetBSD>
16745 * <:RunningOnOpenBSD:OpenBSD>
16746 * <:RunningOnSolaris:Solaris>
16747
16748 == Architectures ==
16749
16750 * <:RunningOnAMD64:AMD64>
16751 * <:RunningOnHPPA:HPPA>
16752 * <:RunningOnPowerPC:PowerPC>
16753 * <:RunningOnPowerPC64:PowerPC64>
16754 * <:RunningOnSparc:Sparc>
16755 * <:RunningOnX86:X86>
16756
16757 == Also see ==
16758
16759 * <:PortingMLton:>
16760
16761 <<<
16762
16763 :mlton-guide-page: PolyEqual
16764 [[PolyEqual]]
16765 PolyEqual
16766 =========
16767
16768 <:PolyEqual:> is an optimization pass for the <:SSA:>
16769 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
16770
16771 == Description ==
16772
16773 This pass implements polymorphic equality.
16774
16775 == Implementation ==
16776
16777 * <!ViewGitFile(mlton,master,mlton/ssa/poly-equal.fun)>
16778
16779 == Details and Notes ==
16780
For each datatype, tycon, and vector type, it builds an equality
function and translates calls to `MLton_equal` into calls to that
function.
16784
16785 Also generates calls to `Word_equal`.
16786
16787 For tuples, it does the equality test inline; i.e., it does not create
16788 a separate equality function for each tuple type.
16789
16790 All equality functions are created only if necessary, i.e., if
16791 equality is actually used at a type.
16792
16793 Optimizations:
16794
16795 * for datatypes that are enumerations, do not build a case dispatch,
16796 just use `MLton_eq`, as the backend will represent these as ints
16797
16798 * deep equality always does an `MLton_eq` test first
16799
16800 * If one argument to `=` is a constant and the type will get
16801 translated to an `IntOrPointer`, then just use `eq` instead of the
16802 full equality.  This is important for implementing code like the
16803 following efficiently:
16804 +
16805 ----
16806 if x = 0  ...    (* where x is of type IntInf.int *)
16807 ----
16808
* Also converts pointer equality on scalar types to type-specific
primitives.
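
To give a feel for what a per-datatype equality function computes,
here is a hand-written source-level analogue.  It is only a sketch:
the pass itself operates on <:SSA:> programs, and the names `tree` and
`equalTree` are hypothetical.

[source,sml]
----
datatype tree = Leaf | Node of tree * int * tree

(* Dispatch on the constructors, then compare the components of the
 * carried tuple in place, recurring on subtrees.
 *)
fun equalTree (t1, t2) =
   case (t1, t2) of
      (Leaf, Leaf) => true
    | (Node (l1, x1, r1), Node (l2, x2, r2)) =>
         x1 = x2 andalso equalTree (l1, l2) andalso equalTree (r1, r2)
    | _ => false

val t = Node (Leaf, 1, Node (Leaf, 2, Leaf))
val u = Node (Leaf, 1, Node (Leaf, 3, Leaf))

(* equalTree agrees with the built-in `=` on these values. *)
val same = equalTree (t, t) = (t = t)
val diff = equalTree (t, u) = (t = u)
----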
16811
16812 <<<
16813
16814 :mlton-guide-page: PolyHash
16815 [[PolyHash]]
16816 PolyHash
16817 ========
16818
16819 <:PolyHash:> is an optimization pass for the <:SSA:>
16820 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
16821
16822 == Description ==
16823
16824 This pass implements polymorphic, structural hashing.
16825
16826 == Implementation ==
16827
16828 * <!ViewGitFile(mlton,master,mlton/ssa/poly-hash.fun)>
16829
16830 == Details and Notes ==
16831
For each datatype, tycon, and vector type, it builds a hash
function and translates calls to `MLton_hash` into calls to that
function.

For tuples, it does the hashing inline; i.e., it does not create
a separate hash function for each tuple type.

All hash functions are created only if necessary, i.e., if
hashing is actually used at a type.
16841
16842 <<<
16843
16844 :mlton-guide-page: PolyML
16845 [[PolyML]]
16846 PolyML
16847 ======
16848
16849 http://www.polyml.org/[Poly/ML] is a
16850 <:StandardMLImplementations:Standard ML implementation>.
16851
16852 == Also see ==
16853
16854  * <!Cite(Matthews95)>
16855
16856 <<<
16857
16858 :mlton-guide-page: PolymorphicEquality
16859 [[PolymorphicEquality]]
16860 PolymorphicEquality
16861 ===================
16862
16863 Polymorphic equality is a built-in function in
16864 <:StandardML:Standard ML> that compares two values of the same type
16865 for equality.  It is specified as
16866
16867 [source,sml]
16868 ----
16869 val = : ''a * ''a -> bool
16870 ----
16871
The occurrences of `''a` in the specification are
<:EqualityTypeVariable:equality type variables>, and indicate that
polymorphic equality can only be applied to values of an
<:EqualityType:equality type>.  SML does not allow `=` to be rebound,
so a programmer is guaranteed that `=` always denotes polymorphic
equality.
16878
16879
16880 == Equality of ground types ==
16881
16882 Ground types like `char`, `int`, and `word` may be compared (to values
16883 of the same type).  For example, `13 = 14` is type correct and yields
16884 `false`.
16885
16886
16887 == Equality of reals ==
16888
The one ground type that cannot be compared is `real`.  So,
`13.0 = 14.0` is not type correct.  One can use `Real.==` to compare
reals for equality, but beware that this has different algebraic
properties than polymorphic equality; for example, `Real.==` is not
reflexive, because a NaN is not equal to itself.
16893
16894 See http://standardml.org/Basis/real.html for a discussion of why
16895 `real` is not an equality type.
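
The following short sketch shows `Real.==` and the related `Real.?=`
on a few values.  The names `nan` and `a` through `d` are chosen only
for this example.

[source,sml]
----
val nan = 0.0 / 0.0

val a = Real.== (13.0, 14.0)   (* false *)
val b = Real.== (nan, nan)     (* false: a NaN is unequal to everything *)
val c = Real.== (0.0, ~0.0)    (* true: the two zeros compare equal *)
val d = Real.?= (nan, nan)     (* true: ?= holds if either side is a NaN *)
----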
16896
16897
16898 == Equality of functions ==
16899
16900 Comparison of functions is not allowed.
16901
16902
16903 == Equality of immutable types ==
16904
16905 Polymorphic equality can be used on <:Immutable:immutable> values like
16906 tuples, records, lists, and vectors.  For example,
16907
[source,sml]
----
(1, 2, 3) = (4, 5, 6)
----
16911
16912 is a type-correct expression yielding `false`, while
16913
[source,sml]
----
[1, 2, 3] = [1, 2, 3]
----
16917
16918 is type correct and yields `true`.
16919
16920 Equality on immutable values is computed by structure, which means
16921 that values are compared by recursively descending the data structure
16922 until ground types are reached, at which point the ground types are
16923 compared with primitive equality tests (like comparison of
16924 characters).  So, the expression
16925
[source,sml]
----
[1, 2, 3] = [1, 1 + 1, 1 + 1 + 1]
----
16929
16930 is guaranteed to yield `true`, even though the lists may occupy
16931 different locations in memory.
16932
Because of structural equality, immutable values can only be compared
if their components can be compared.  For example, `[1, 2, 3]` can be
compared, but `[1.0, 2.0, 3.0]` cannot.  The SML type system uses
<:EqualityType:equality types> to ensure that structural equality is
only applied to valid values.
16938
16939
16940 == Equality of mutable values ==
16941
16942 In contrast to immutable values, polymorphic equality of
16943 <:Mutable:mutable> values (like ref cells and arrays) is performed by
16944 pointer comparison, not by structure.  So, the expression
16945
[source,sml]
----
ref 13 = ref 13
----
16949
16950 is guaranteed to yield `false`, even though the ref cells hold the
16951 same contents.
16952
Because equality of mutable values is not structural, arrays and refs
can be compared _even if their components are not equality types_.
Hence, the following expression is type correct (and yields `true`).
16956
16957 [source,sml]
16958 ----
16959 let
16960    val r = ref 13.0
16961 in
16962    r = r
16963 end
16964 ----
16965
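The same holds for arrays.  The sketch below compares two `real array` values by identity, and then compares their contents with a hand-written traversal; `sameContents` is a hypothetical helper, not a Basis Library function.

[source,sml]
----
val a1 = Array.fromList [1.0, 2.0]
val a2 = Array.fromList [1.0, 2.0]

val sameArray = (a1 = a1)        (* true: one array compared with itself *)
val distinctArrays = (a1 = a2)   (* false: equal contents, different arrays *)

(* Comparing contents needs an explicit traversal.  Real.== is used for
   the elements because real is not an equality type. *)
fun sameContents (x, y) =
   Array.length x = Array.length y
   andalso
   Array.foldli
      (fn (i, v, ok) => ok andalso Real.== (v, Array.sub (y, i)))
      true x

val contentsEqual = sameContents (a1, a2)   (* true *)
----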
16966
16967 == Equality of datatypes ==
16968
16969 Polymorphic equality of datatypes is structural.  Two values of the
16970 same datatype are equal if they are of the same <:Variant:variant> and
16971 if the <:Variant:variant>'s arguments are equal (recursively).  So,
16972 with the datatype
16973
16974 [source,sml]
16975 ----
16976 datatype t = A | B of t
16977 ----
16978
16979 then `B (B A) = B A` is type correct and yields `false`, while `A = A`
16980 and `B A = B A` yield `true`.
16981
16982 As polymorphic equality descends two values to compare them, it uses
16983 pointer equality whenever it reaches a mutable value.  So, with the
16984 datatype
16985
16986 [source,sml]
16987 ----
16988 datatype t = A of int ref | ...
16989 ----
16990
16991 then `A (ref 13) = A (ref 13)` is type correct and yields `false`,
16992 because the pointer equality on the two ref cells yields `false`.
16993
16994 One weakness of the SML type system is that datatypes do not inherit
16995 the special property of the `ref` and `array` type constructors that
16996 allows them to be compared regardless of their component type.  For
16997 example, after declaring
16998
16999 [source,sml]
17000 ----
17001 datatype 'a t = A of 'a ref
17002 ----
17003
17004 one might expect to be able to compare two values of type `real t`,
17005 because pointer comparison on a ref cell would suffice.
Unfortunately, the type system can only express whether a user-defined
datatype <:AdmitsEquality:admits equality> or not.  In this case, `t`
admits equality, which means that `int t` can be compared but that
`real t` cannot.  We can confirm this with the program
17010
17011 [source,sml]
17012 ----
17013 datatype 'a t = A of 'a ref
17014 fun f (x: real t, y: real t) = x = y
17015 ----
17016
17017 on which MLton reports the following error.
17018
17019 ----
17020 Error: z.sml 2.32-2.36.
17021   Function applied to incorrect argument.
17022     expects: [<equality>] t * [<equality>] t
17023     but got: [real] t * [real] t
17024     in: = (x, y)
17025 ----
17026
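One way around this limitation, offered here as a sketch rather than as something the guide prescribes, is to write the comparison by hand.  Pattern matching exposes the ref cells, and the comparison then happens at type `'a ref`, which admits equality for every `'a`.  The name `eq` is hypothetical.

[source,sml]
----
datatype 'a t = A of 'a ref

(* eq : 'a t * 'a t -> bool, with an ordinary type variable.  The test
   r1 = r2 is a comparison of ref cells, so 'a need not admit equality. *)
fun eq (A r1, A r2) = r1 = r2

val r = ref 13.0
val same = eq (A r, A r)                          (* true: the same cell *)
val different = eq (A (ref 13.0), A (ref 13.0))   (* false: distinct cells *)
----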
17027
17028 == Implementation ==
17029
17030 Polymorphic equality is implemented by recursively descending the two
17031 values being compared, stopping as soon as they are determined to be
17032 unequal, or exploring the entire values to determine that they are
17033 equal.  Hence, polymorphic equality can take time proportional to the
17034 size of the smaller value.
17035
17036 MLton uses some optimizations to improve performance.
17037
17038 * When computing structural equality, first do a pointer comparison.
17039 If the comparison yields `true`, then stop and return `true`, since
17040 the structural comparison is guaranteed to do so.  If the pointer
17041 comparison fails, then recursively descend the values.
17042
17043 * If a datatype is an enum (e.g. `datatype t = A | B | C`), then a
17044 single comparison suffices to compare values of the datatype.  No case
17045 dispatch is required to determine whether the two values are of the
17046 same <:Variant:variant>.
17047
17048 * When comparing a known constant non-value-carrying
17049 <:Variant:variant>, use a single comparison.  For example, the
17050 following code will compile into a single comparison for `A = x`.
17051 +
17052 [source,sml]
17053 ----
17054 datatype t = A | B | C of ...
17055 fun f x = ... if A = x then ...
17056 ----
17057
17058 * When comparing a small constant `IntInf.int` to another
17059 `IntInf.int`, use a single comparison against the constant.  No case
17060 dispatch is required.
17061
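The recursive descent described at the start of this section can be pictured, for lists, with an ordinary SML function.  This is only a source-level sketch of the idea; it is not the code MLton generates, and it cannot express the pointer-comparison shortcut.  The name `listEq` is hypothetical.

[source,sml]
----
(* listEq eq (xs, ys): structural equality on lists, given an equality
   test for the elements.  andalso stops at the first mismatch. *)
fun listEq eq ([], []) = true
  | listEq eq (x :: xs, y :: ys) = eq (x, y) andalso listEq eq (xs, ys)
  | listEq _ _ = false   (* the lengths differ *)

val t1 = listEq (op =) ([1, 2, 3], [1, 1 + 1, 1 + 1 + 1])   (* true *)
val t2 = listEq (op =) ([1, 2, 3], [1, 5, 3])   (* false, at the second element *)
val t3 = listEq (op =) ([1, 2], [1, 2, 3])      (* false: lengths differ *)
----

When the lists are equal, the whole of both lists is traversed; when they differ in length, the traversal ends once the shorter list is exhausted.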
17062
17063 == Also see ==
17064
17065 * <:AdmitsEquality:>
17066 * <:EqualityType:>
17067 * <:EqualityTypeVariable:>
17068
17069 <<<
17070
17071 :mlton-guide-page: Polyvariance
17072 [[Polyvariance]]
17073 Polyvariance
17074 ============
17075
17076 Polyvariance is an optimization pass for the <:SXML:>
17077 <:IntermediateLanguage:>, invoked from <:SXMLSimplify:>.
17078
17079 == Description ==
17080
This pass duplicates a higher-order, `let`-bound function at each
variable reference, if the cost is smaller than some threshold.
17083
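The effect can be pictured at the source level, with the caveat that the real pass operates on <:SXML:> and only duplicates when its cost estimate allows.  In the sketch below, `twice`, `twice1`, and `twice2` are hypothetical names; the second binding shows, conceptually, what the program looks like once each reference has its own copy.

[source,sml]
----
(* Before: one let-bound higher-order function, referenced twice. *)
val original =
   let
      fun twice f = fn x => f (f x)
   in
      (twice (fn n => n + 1) 0, twice (fn n => n * 2) 1)
   end

(* After (conceptually): one copy per reference, so each copy is only
   ever applied to a single known function argument. *)
val duplicated =
   let
      fun twice1 f = fn x => f (f x)
      fun twice2 f = fn x => f (f x)
   in
      (twice1 (fn n => n + 1) 0, twice2 (fn n => n * 2) 1)
   end
----

Both bindings evaluate to `(2, 4)`; only the shape of the program differs.
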
17084 == Implementation ==
17085
17086 * <!ViewGitFile(mlton,master,mlton/xml/polyvariance.fun)>
17087
17088 == Details and Notes ==
17089
17090 {empty}
17091
17092 <<<
17093
17094 :mlton-guide-page: Poplog
17095 [[Poplog]]
17096 Poplog
17097 ======
17098
17099 http://www.cs.bham.ac.uk/research/poplog/poplog.info.html[POPLOG] is a
17100 development environment that includes implementations of a number of
17101 languages, including <:StandardML:Standard ML>.
17102
17103 While POPLOG is actively developed, the <:ML:> support predates
17104 <:DefinitionOfStandardML:SML'97>, and there is no support for the
17105 <:BasisLibrary:Basis Library>
17106 http://www.standardml.org/Basis[specification].
17107
17108 == Also see ==
17109
17110  * http://www.cs.bham.ac.uk/research/poplog/doc/pmlhelp/mlinpop[Mixed-language programming in ML and Pop-11].
17111
17112 <<<
17113
17114 :mlton-guide-page: PortingMLton
17115 [[PortingMLton]]
17116 PortingMLton
17117 ============
17118
17119 Porting MLton to a new target platform (architecture or OS) involves
17120 the following steps.
17121
17122 1. Make the necessary changes to the scripts, runtime system,
17123 <:BasisLibrary: Basis Library> implementation, and compiler.
17124
17125 2. Get the regressions working using a cross compiler.
17126
17127 3. <:CrossCompiling: Cross compile> MLton and bootstrap on the target.
17128
17129 MLton has a native code generator only for AMD64 and X86, so, if you
17130 are porting to another architecture, you must use the C code
17131 generator.  These notes do not cover building a new native code
17132 generator.
17133
17134 Some of the following steps will not be necessary if MLton already
17135 supports the architecture or operating system you are porting to.
17136
17137
17138 == What code to change ==
17139
17140 * Scripts.
17141 +
17142 --
17143 * In `bin/platform`, add new cases to define `$HOST_OS` and `$HOST_ARCH`.
17144 --
17145
17146 * Runtime system.
17147 +
17148 --
17149 The goal of this step is to be able to successfully run `make` in the
17150 `runtime` directory on the target machine.
17151
17152 * In `platform.h`, add a new case to include `platform/<arch>.h` and `platform/<os>.h`.
17153
17154 * In `platform/<arch>.h`:
17155 ** define `MLton_Platform_Arch_host`.
17156
17157 * In `platform/<os>.h`:
17158 ** include platform-specific includes.
17159 ** define `MLton_Platform_OS_host`.
17160 ** define all of the `HAS_*` macros.
17161
17162 * In `platform/<os>.c` implement any platform-dependent functions that the runtime needs.
17163
* Add rounding mode control to `basis/Real/IEEEReal.c` for the new architecture (if not `HAS_FEROUND`).
17165
17166 * Compile and install the <:GnuMP:>.  This varies from platform to platform.  In `platform/<os>.h`, you need to include the appropriate `gmp.h`.
17167 --
17168
17169 * Basis Library implementation (`basis-library/*`)
17170 +
17171 --
17172 * In `primitive/prim-mlton.sml`:
17173 ** Add a new variant to the `MLton.Platform.Arch.t` datatype.
17174 ** modify the constants that define `MLton.Platform.Arch.host` to match with `MLton_Platform_Arch_host`, as set in `runtime/platform/<arch>.h`.
17175 ** Add a new variant to the `MLton.Platform.OS.t` datatype.
17176 ** modify the constants that define `MLton.Platform.OS.host` to match with `MLton_Platform_OS_host`, as set in `runtime/platform/<os>.h`.
17177
17178 * In `mlton/platform.{sig,sml}` add a new variant.
17179
17180 * In `sml-nj/sml-nj.sml`, modify `getOSKind`.
17181
17182 * Look at all the uses of `MLton.Platform` in the Basis Library implementation and see if you need to do anything special.  You might use the following command to see where to look.
17183 +
17184 ----
17185 find basis-library -type f | xargs grep 'MLton\.Platform'
17186 ----
17187 +
17188 If in doubt, leave the code alone and wait to see what happens when you run the regression tests.
17189 --
17190
17191 * Compiler.
17192 +
17193 --
17194 * In `lib/stubs/mlton-stubs/platform.sig` add any new variants, as was done in the Basis Library.
17195
17196 * In `lib/stubs/mlton-stubs/mlton.sml` add any new variants in `MLton.Platform`, as was done in the Basis Library.
17197 --
17198
17199 The string used to identify a particular architecture or operating
17200 system must be the same (except for possibly case of letters) in the
17201 scripts, runtime, Basis Library implementation, and compiler (stubs).
17202 In `mlton/main/main.fun`, MLton itself uses the conversions to and
17203 from strings:
17204 ----
17205 MLton.Platform.{Arch,OS}.{from,to}String
17206 ----
17207
If there is a mismatch, you may see the error message
`strange arch` or `strange os`.
17210
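The requirement that names agree up to letter case can be illustrated with a self-contained sketch.  The `Arch` structure below is a hypothetical stand-in, not the source of `MLton.Platform.Arch`, and `MyArch` stands for the architecture being ported to.

[source,sml]
----
(* Hypothetical stand-in for MLton.Platform.Arch. *)
structure Arch =
   struct
      datatype t = AMD64 | X86 | MyArch

      (* One table keeps the constructor and its name together. *)
      val all = [(AMD64, "AMD64"), (X86, "X86"), (MyArch, "MyArch")]

      fun toString a =
         case List.find (fn (a', _) => a = a') all of
            SOME (_, s) => s
          | NONE => raise Fail "strange arch"

      (* Case-insensitive lookup, so "myarch" coming from a script
         still matches the constructor MyArch. *)
      fun fromString s =
         let
            val lower = String.map Char.toLower
         in
            Option.map (fn (a, _) => a)
               (List.find (fn (_, s') => lower s = lower s') all)
         end
   end

val roundTrip = Arch.fromString (Arch.toString Arch.MyArch)   (* SOME MyArch *)
val fromScript = Arch.fromString "myarch"                     (* SOME MyArch *)
val unknown = Arch.fromString "vax"                           (* NONE *)
----

A name that appears in one place but not in the table behaves like `unknown` here, which is the situation that leads to a `strange arch` style of failure.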
17211
17212 == Running the regressions with a cross compiler ==
17213
When porting to a new platform, it is always best to get all (or as
many as possible) of the regressions working before moving to a self
compile.  It is easiest to do this by modifying and rebuilding the
compiler on a working machine and then running the regressions with a
cross compiler.  It is not easy to build a gcc cross compiler, so we
recommend generating the C and assembly on a working machine (using
MLton's `-target` and `-stop g` flags), copying the generated files to
the target machine, and then compiling and linking there.
17222
17223 1. Remake the compiler on a working machine.
17224
2. Use `bin/add-cross` to add support for the new target.  In particular, this should create `build/lib/mlton/targets/<target>/` with the necessary platform-specific cross-compilation information.
17226
17227 3. Run the regression tests with the cross-compiler.  To cross-compile all the tests, do
17228 +
17229 ----
17230 bin/regression -cross <target>
17231 ----
17232 +
17233 This will create all the executables.  Then, copy `bin/regression` and
17234 the `regression` directory to the target machine, and do
17235 +
17236 ----
17237 bin/regression -run-only <target>
17238 ----
17239 +
17240 This should run all the tests.
17241
17242 Repeat this step, interleaved with appropriate compiler modifications,
17243 until all the regressions pass.
17244
17245
17246 == Bootstrap ==
17247
17248 Once you've got all the regressions working, you can build MLton for
17249 the new target.  As with the regressions, the idea for bootstrapping
17250 is to generate the C and assembly on a working machine, copy it to the
17251 target machine, and then compile and link there.  Here's the sequence
17252 of steps.
17253
17254 1. On a working machine, with the newly rebuilt compiler, in the `mlton` directory, do:
17255 +
17256 ----
17257 mlton -stop g -target <target> mlton.mlb
17258 ----
17259
17260 2. Copy to the target machine.
17261
17262 3. On the target machine, move the libraries to the right place. That is, in `build/lib/mlton/targets`, do:
17263 +
17264 ----
17265 rm -rf self
17266 mv <target> self
17267 ----
17268 +
Also make sure you have all the header files in `build/lib/mlton/include`. You can copy them from a host machine that has run `make runtime`.
17270
4. On the target machine, compile and link MLton.  That is, in the `mlton` directory, do something like:
17272 +
17273 ----
17274 gcc -c -Ibuild/lib/mlton/include -Ibuild/lib/mlton/targets/self/include -O1 -w mlton/mlton.*.[cs]
17275 gcc -o build/lib/mlton/mlton-compile \
17276         -Lbuild/lib/mlton/targets/self \
17277         -L/usr/local/lib \
17278         mlton.*.o \
17279         -lmlton -lgmp -lgdtoa -lm
17280 ----
17281
17282 5. At this point, MLton should be working and you can finish the rest of a usual make on the target machine.
17283 +
17284 ----
17285 make basis-no-check script mlbpathmap constants libraries tools
17286 ----
17287
6. Making the last tool, mlyacc, will fail, because mlyacc cannot bootstrap its own `yacc.grm.*` files. On the host machine, run `make -C mlyacc src/yacc.grm.sml`. Then copy both files to the target machine, and compile mlyacc, making sure to supply the path to your newly compiled mllex: `make -C mlyacc MLLEX=mllex/mllex`.
17289
There are other details to get right, like making sure that the tools
directories are clean so that the tools are rebuilt on the new
platform, but hopefully this structure works.  Once you've got a
compiler on the target machine, you should test it by running all the
regressions normally (i.e. without the `-cross` flag) and by running a
couple of rounds of self compiles.
17296
17297
17298 == Also see ==
17299
17300 The above description is based on the following emails sent to the
17301 MLton list.
17302
17303 * http://www.mlton.org/pipermail/mlton/2002-October/013110.html
17304 * http://www.mlton.org/pipermail/mlton/2004-July/016029.html
17305
17306 <<<
17307
17308 :mlton-guide-page: PrecedenceParse
17309 [[PrecedenceParse]]
17310 PrecedenceParse
17311 ===============
17312
17313 <:PrecedenceParse:> is an analysis/rewrite pass for the <:AST:>
17314 <:IntermediateLanguage:>, invoked from <:Elaborate:>.
17315
17316 == Description ==
17317
17318 This pass rewrites <:AST:> function clauses, expressions, and patterns
17319 to resolve <:OperatorPrecedence:>.
17320
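What resolving precedence means for a program can be seen from the source side.  In the sketch below, the operators `minusL` and `minusR` are hypothetical; they compute the same function and differ only in their fixity declarations, which is exactly the information this pass uses to decide how a flat sequence of operands and operators is grouped.

[source,sml]
----
infix  6 minusL   (* left associative *)
infixr 6 minusR   (* right associative *)
fun a minusL b = a - b
fun a minusR b = a - b

val l = 10 minusL 3 minusL 2   (* grouped as (10 - 3) - 2, i.e. 5 *)
val r = 10 minusR 3 minusR 2   (* grouped as 10 - (3 - 2), i.e. 9 *)
val p = 2 + 3 * 4              (* * binds tighter than +, i.e. 14 *)
----
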
17321 == Implementation ==
17322
17323 * <!ViewGitFile(mlton,master,mlton/elaborate/precedence-parse.sig)>
17324 * <!ViewGitFile(mlton,master,mlton/elaborate/precedence-parse.fun)>
17325
17326 == Details and Notes ==
17327
17328 {empty}
17329
17330 <<<
17331
17332 :mlton-guide-page: Printf
17333 [[Printf]]
17334 Printf
17335 ======
17336
17337 Programmers coming from C or Java often ask if
17338 <:StandardML:Standard ML> has a `printf` function.  It does not.
17339 However, it is possible to implement your own version with only a few
17340 lines of code.
17341
17342 Here is a definition for `printf` and `fprintf`, along with format
17343 specifiers for booleans, integers, and reals.
17344
17345 [source,sml]
17346 ----
structure Printf =
   struct
      fun id x = x  (* identity function; not provided by the Basis Library *)
      fun $ (_, f) = f (fn p => p ()) ignore
      fun fprintf out f = f (out, id)
      val printf = fn z => fprintf TextIO.stdOut z
      fun one ((out, f), make) g =
         g (out, fn r =>
            f (fn p =>
               make (fn s =>
                     r (fn () => (p (); TextIO.output (out, s))))))
      fun ` x s = one (x, fn f => f s)
      fun spec to x = one (x, fn f => f o to)
      val B = fn z => spec Bool.toString z
      val I = fn z => spec Int.toString z
      val R = fn z => spec Real.toString z
   end
17363 ----
17364
17365 Here's an example use.
17366
17367 [source,sml]
17368 ----
17369 val () = printf `"Int="I`"  Bool="B`"  Real="R`"\n" $ 1 false 2.0
17370 ----
17371
17372 This prints the following.
17373
17374 ----
17375 Int=1  Bool=false  Real=2.0
17376 ----
17377
17378 In general, a use of `printf` looks like
17379
17380 ----
17381 printf <spec1> ... <specn> $ <arg1> ... <argm>
17382 ----
17383
17384 where each `<speci>` is either a specifier like `B`, `I`, or `R`, or
17385 is an inline string, like ++&grave;"foo"++.  A backtick (+&grave;+)
17386 must precede each inline string.  Each `<argi>` must be of the
17387 appropriate type for the corresponding specifier.
17388
17389 SML `printf` is more powerful than its C counterpart in a number of
17390 ways.  In particular, the function produced by `printf` is a perfectly
17391 ordinary SML function, and can be passed around, used multiple times,
17392 etc.  For example:
17393
17394 [source,sml]
17395 ----
17396 val f: int -> bool -> unit = printf `"Int="I`"  Bool="B`"\n" $
17397 val () = f 1 true
17398 val () = f 2 false
17399 ----
17400
The definition of `printf` is even careful not to print anything until
it is fully applied.  So, examples like the following will work as
expected.

[source,sml]
----
val f: int -> bool -> unit = printf `"Int="I`"  Bool="B`"\n" $ 13
val () = f true
val () = f false
----
17410
It is also easy to define new format specifiers.  For example, suppose
we wanted format specifiers for characters and strings.

[source,sml]
----
val C = fn z => spec Char.toString z
val S = fn z => spec (fn s => s) z
----

One can define format specifiers for more complex types, e.g. pairs of
integers.

[source,sml]
----
val I2 =
   fn z =>
   spec (fn (i, j) =>
         concat ["(", Int.toString i, ", ", Int.toString j, ")"])
   z
----

Here's an example use.

[source,sml]
----
val () = printf `"Test "I2`"  a string "S`"\n" $ (1, 2) "hello"
----
17435
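To try the pieces of this section as one program, the following sketch repeats a trimmed copy of the structure, opens it, and uses `fprintf` on a file stream so that the result can be read back and inspected.  The helper `id` and the file handling are assumptions added for the example; `S` is the string specifier defined above.

[source,sml]
----
structure Printf =
   struct
      fun id x = x   (* assumed helper: the identity function *)
      fun $ (_, f) = f (fn p => p ()) ignore
      fun fprintf out f = f (out, id)
      val printf = fn z => fprintf TextIO.stdOut z
      fun one ((out, f), make) g =
         g (out, fn r =>
            f (fn p =>
               make (fn s =>
                     r (fn () => (p (); TextIO.output (out, s))))))
      fun ` x s = one (x, fn f => f s)
      fun spec to x = one (x, fn f => f o to)
      val I = fn z => spec Int.toString z
      val S = fn z => spec (fn s => s) z
   end

open Printf

(* Write one formatted line to a temporary file, then read it back. *)
val file = OS.FileSys.tmpName ()
val out = TextIO.openOut file
val () = fprintf out `"tool="S`" warnings="I`"\n" $ "mlton" 3
val () = TextIO.closeOut out

val ins = TextIO.openIn file
val contents = TextIO.inputAll ins   (* "tool=mlton warnings=3\n" *)
val () = TextIO.closeIn ins
val () = OS.FileSys.remove file
----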
17436
17437 == Printf via <:Fold:> ==
17438
17439 `printf` is best viewed as a special case of variable-argument
17440 <:Fold:> that inductively builds a function as it processes its
17441 arguments.  Here is the definition of a `Printf` structure in terms of
17442 fold.  The structure is equivalent to the above one, except that it
17443 uses the standard `$` instead of a specialized one.
17444
17445 [source,sml]
17446 ----
17447 structure Printf =
17448    struct
17449       fun fprintf out =
17450          Fold.fold ((out, id), fn (_, f) => f (fn p => p ()) ignore)
17451
17452       val printf = fn z => fprintf TextIO.stdOut z
17453
17454       fun one ((out, f), make) =
17455          (out, fn r =>
17456           f (fn p =>
17457              make (fn s =>
17458                    r (fn () => (p (); TextIO.output (out, s))))))
17459
17460       val ` =
17461          fn z => Fold.step1 (fn (s, x) => one (x, fn f => f s)) z
17462
17463       fun spec to = Fold.step0 (fn x => one (x, fn f => f o to))
17464
17465       val B = fn z => spec Bool.toString z
17466       val I = fn z => spec Int.toString z
17467       val R = fn z => spec Real.toString z
17468    end
17469 ----
17470
Viewing `printf` as a fold opens up a number of possibilities.  For
example, one can name parts of format strings using the fold idiom for
naming sequences of steps.

[source,sml]
----
val IB = fn u => Fold.fold u `"Int="I`" Bool="B
val () = printf IB`"  "IB`"\n" $ 1 true 3 false
----

One can even parametrize over partial format strings.

[source,sml]
----
fun XB X = fn u => Fold.fold u `"X="X`" Bool="B
val () = printf (XB I)`"  "(XB R)`"\n" $ 1 true 2.0 false
----
17486
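The `Printf` structure in this section assumes that `Fold`, the standard `$`, and `id` are in scope.  A minimal sketch of those helpers follows, restricted to the three `Fold` functions used here; it follows the shape described on the <:Fold:> page, which remains the place to look for the full structure.  The `add` step is a hypothetical example, included only to show the calling convention.

[source,sml]
----
fun id x = x
fun $ (a, f) = f a

structure Fold =
   struct
      (* fold (a, f) passes the accumulator a and finisher f to the next step *)
      fun fold (a, f) g = g (a, f)
      (* step0: a step that takes no argument *)
      fun step0 h (a, f) = fold (h a, f)
      (* step1: a step that takes one argument b *)
      fun step1 h (a, f) b = fold (h (b, a), f)
   end

(* A tiny fold in the same style: each add consumes the next integer. *)
val add = fn z => Fold.step1 (op +) z
val six = Fold.fold (0, id) add 1 add 2 add 3 $   (* 6 *)
----

In `Printf`, the backtick plays the role of `add` (a `step1` that consumes a string) and the specifiers `B`, `I`, and `R` are `step0` steps.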
17487
17488 == Also see ==
17489
17490 * <:PrintfGentle:>
17491 * <!Cite(Danvy98, Functional Unparsing)>
17492
17493 <<<
17494
17495 :mlton-guide-page: PrintfGentle
17496 [[PrintfGentle]]
17497 PrintfGentle
17498 ============
17499
This page provides a gentle introduction to, and derivation of,
<:Printf:>, with sections and arrangement more suitable to a talk.
17502
17503
17504 == Introduction ==
17505
17506 SML does not have `printf`.  Could we define it ourselves?
17507
17508 [source,sml]
17509 ----
17510 val () = printf ("here's an int %d and a real %f.\n", 13, 17.0)
17511 val () = printf ("here's three values (%d, %f, %f).\n", 13, 17.0, 19.0)
17512 ----
17513
17514 What could the type of `printf` be?
17515
17516 This obviously can't work, because SML functions take a fixed number
17517 of arguments.  Actually they take one argument, but if that's a tuple,
17518 it can only have a fixed number of components.
17519
17520
17521 == From tupling to currying ==
17522
17523 What about currying to get around the typing problem?
17524
17525 [source,sml]
17526 ----
17527 val () = printf "here's an int %d and a real %f.\n" 13 17.0
17528 val () = printf "here's three values (%d, %f, %f).\n" 13 17.0 19.0
17529 ----
17530
17531 That fails for a similar reason.  We need two types for `printf`.
17532
[source,sml]
----
val printf: string -> int -> real -> unit
val printf: string -> int -> real -> real -> unit
----
17537
17538 This can't work, because `printf` can only have one type.  SML doesn't
17539 support programmer-defined overloading.
17540
17541
17542 == Overloading and dependent types ==
17543
17544 Even without worrying about number of arguments, there is another
17545 problem.  The type of `printf` depends on the format string.
17546
17547 [source,sml]
17548 ----
17549 val () = printf "here's an int %d and a real %f.\n" 13 17.0
17550 val () = printf "here's a real %f and an int %d.\n" 17.0 13
17551 ----
17552
Now we need

[source,sml]
----
val printf: string -> int -> real -> unit
val printf: string -> real -> int -> unit
----

Again, this can't possibly work, because SML doesn't have
overloading, and types can't depend on values.
17562
17563
17564 == Idea: express type information in the format string ==
17565
17566 If we express type information in the format string, then different
17567 uses of `printf` can have different types.
17568
17569 [source,sml]
17570 ----
17571 type 'a t  (* the type of format strings *)
17572 val printf: 'a t -> 'a
17573 infix D F
17574 val fs1: (int -> real -> unit) t = "here's an int "D" and a real "F".\n"
17575 val fs2: (int -> real -> real -> unit) t =
17576    "here's three values ("D", "F", "F").\n"
17577 val () = printf fs1 13 17.0
17578 val () = printf fs2 13 17.0 19.0
17579 ----
17580
17581 Now, our two calls to `printf` type check, because the format
17582 string specializes `printf` to the appropriate type.
17583
17584
17585 == The types of format characters ==
17586
17587 What should the type of format characters `D` and `F` be?  Each format
17588 character requires an additional argument of the appropriate type to
17589 be supplied to `printf`.
17590
Idea: guess the final type that `printf` will need for the format
string, and verify it with each format character.
17593
17594 [source,sml]
17595 ----
17596 type ('a, 'b) t   (* 'a = rest of type to verify, 'b = final type *)
17597 val ` : string -> ('a, 'a) t  (* guess the type, which must be verified *)
17598 val D: (int -> 'a, 'b) t * string -> ('a, 'b) t  (* consume an int *)
17599 val F: (real -> 'a, 'b) t * string -> ('a, 'b) t  (* consume a real *)
17600 val printf: (unit, 'a) t -> 'a
17601 ----
17602
17603 Don't worry.  In the end, type inference will guess and verify for us.
17604
17605
17606 == Understanding guess and verify ==
17607
17608 Now, let's build up a format string and a specialized `printf`.
17609
17610 [source,sml]
17611 ----
17612 infix D F
17613 val f0 = `"here's an int "
17614 val f1 = f0 D " and a real "
17615 val f2 = f1 F ".\n"
17616 val p = printf f2
17617 ----
17618
17619 These definitions yield the following types.
17620
17621 [source,sml]
17622 ----
17623 val f0: (int -> real -> unit, int -> real -> unit) t
17624 val f1: (real -> unit, int -> real -> unit) t
17625 val f2: (unit, int -> real -> unit) t
17626 val p: int -> real -> unit
17627 ----
17628
So, `p` is a specialized `printf` function.  We could use it as
follows.
17631
17632 [source,sml]
17633 ----
17634 val () = p 13 17.0
17635 val () = p 14 19.0
17636 ----
17637
17638
17639 == Type checking this using a functor ==
17640
17641 [source,sml]
17642 ----
17643 signature PRINTF =
17644    sig
17645       type ('a, 'b) t
17646       val ` : string -> ('a, 'a) t
17647       val D: (int -> 'a, 'b) t * string -> ('a, 'b) t
17648       val F: (real -> 'a, 'b) t * string -> ('a, 'b) t
17649       val printf: (unit, 'a) t -> 'a
17650    end
17651
17652 functor Test (P: PRINTF) =
17653    struct
17654       open P
17655       infix D F
17656
17657       val () = printf (`"here's an int "D" and a real "F".\n") 13 17.0
      val () = printf (`"here's three values ("D", "F", "F").\n") 13 17.0 19.0
17659    end
17660 ----
17661
17662
17663 == Implementing `Printf` ==
17664
Think of a format character as a formatter transformer.  It takes the
formatter for the part of the format string before it and transforms
it into a new formatter that first does the left-hand bit, then does
its own bit, then continues on with the rest of the format string.
17669
17670 [source,sml]
17671 ----
17672 structure Printf: PRINTF =
17673    struct
17674       datatype ('a, 'b) t = T of (unit -> 'a) -> 'b
17675
17676       fun printf (T f) = f (fn () => ())
17677
17678       fun ` s = T (fn a => (print s; a ()))
17679
17680       fun D (T f, s) =
17681          T (fn g => f (fn () => fn i =>
17682                        (print (Int.toString i); print s; g ())))
17683
17684       fun F (T f, s) =
17685          T (fn g => f (fn () => fn i =>
17686                        (print (Real.toString i); print s; g ())))
17687    end
17688 ----
17689
17690
17691 == Testing printf ==
17692
17693 [source,sml]
17694 ----
17695 structure Z = Test (Printf)
17696 ----
17697
17698
17699 == User-definable formats ==
17700
The definitions of the format characters `D` and `F` are pretty much
the same.  Within the `Printf` structure we can define a format
character generator.
17704
17705 [source,sml]
17706 ----
17707 val newFormat: ('a -> string) -> ('a -> 'b, 'c) t * string -> ('b, 'c) t =
17708    fn toString => fn (T f, s) =>
17709    T (fn th => f (fn () => fn a => (print (toString a); print s ; th ())))
17710 val D = fn z => newFormat Int.toString z
17711 val F = fn z => newFormat Real.toString z
17712 ----
17713
17714
17715 == A core `Printf` ==
17716
We can now have a very small `PRINTF` signature, and define all
the format characters externally to the core module.
17719
17720 [source,sml]
17721 ----
17722 signature PRINTF =
17723    sig
17724       type ('a, 'b) t
17725       val ` : string -> ('a, 'a) t
17726       val newFormat: ('a -> string) -> ('a -> 'b, 'c) t * string -> ('b, 'c) t
17727       val printf: (unit, 'a) t -> 'a
17728    end
17729
17730 structure Printf: PRINTF =
17731    struct
17732       datatype ('a, 'b) t = T of (unit -> 'a) -> 'b
17733
17734       fun printf (T f) = f (fn () => ())
17735
17736       fun ` s = T (fn a => (print s; a ()))
17737
17738       fun newFormat toString (T f, s) =
17739          T (fn th =>
17740             f (fn () => fn a =>
17741                (print (toString a)
17742                 ; print s
17743                 ; th ())))
17744    end
17745 ----
17746
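With the core in place, client code can supply its own format characters.  The sketch below repeats the core module so that it stands alone, then defines `D` and `F` as before plus two hypothetical formats, `B` for booleans and `P` for pairs of integers.  Note that the values are bound before the `infix` declaration, since an infix identifier cannot be bound without `op`.

[source,sml]
----
signature PRINTF =
   sig
      type ('a, 'b) t
      val ` : string -> ('a, 'a) t
      val newFormat: ('a -> string) -> ('a -> 'b, 'c) t * string -> ('b, 'c) t
      val printf: (unit, 'a) t -> 'a
   end

structure Printf: PRINTF =
   struct
      datatype ('a, 'b) t = T of (unit -> 'a) -> 'b
      fun printf (T f) = f (fn () => ())
      fun ` s = T (fn a => (print s; a ()))
      fun newFormat toString (T f, s) =
         T (fn th =>
            f (fn () => fn a =>
               (print (toString a); print s; th ())))
   end

open Printf

(* Format characters defined outside the core module. *)
val D = fn z => newFormat Int.toString z
val F = fn z => newFormat Real.toString z
val B = fn z => newFormat Bool.toString z   (* hypothetical *)
val P = fn z =>                             (* hypothetical: pairs of ints *)
   newFormat
      (fn (i, j) => concat ["(", Int.toString i, ", ", Int.toString j, ")"])
      z
infix D F B P

(* Prints: ok=true at (3, 4) after 2 tries *)
val () = printf (`"ok="B" at "P" after "D" tries\n") true (3, 4) 2
----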
17747
17748 == Extending to fprintf ==
17749
One can implement `fprintf` by threading the outstream through all the
transformers.
17752
17753 [source,sml]
17754 ----
17755 signature PRINTF =
17756    sig
17757       type ('a, 'b) t
17758       val ` : string -> ('a, 'a) t
17759       val fprintf: (unit, 'a) t * TextIO.outstream -> 'a
17760       val newFormat: ('a -> string) -> ('a -> 'b, 'c) t * string -> ('b, 'c) t
17761       val printf: (unit, 'a) t -> 'a
17762    end
17763
17764 structure Printf: PRINTF =
17765    struct
17766       type out = TextIO.outstream
17767       val output = TextIO.output
17768
17769       datatype ('a, 'b) t = T of (out -> 'a) -> out -> 'b
17770
17771       fun fprintf (T f, out) = f (fn _ => ()) out
17772
17773       fun printf t = fprintf (t, TextIO.stdOut)
17774
17775       fun ` s = T (fn a => fn out => (output (out, s); a out))
17776
17777       fun newFormat toString (T f, s) =
17778          T (fn g =>
17779             f (fn out => fn a =>
17780                (output (out, toString a)
17781                 ; output (out, s)
17782                 ; g out)))
17783    end
17784 ----
17785
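A usage sketch for `fprintf` follows.  It repeats the structure without the signature ascription so that it stands alone, writes to a temporary file, and reads the text back.  The format `B` and the file handling are assumptions made for the example.

[source,sml]
----
structure Printf =
   struct
      type out = TextIO.outstream
      val output = TextIO.output
      datatype ('a, 'b) t = T of (out -> 'a) -> out -> 'b
      fun fprintf (T f, out) = f (fn _ => ()) out
      fun ` s = T (fn a => fn out => (output (out, s); a out))
      fun newFormat toString (T f, s) =
         T (fn g =>
            f (fn out => fn a =>
               (output (out, toString a); output (out, s); g out)))
   end

open Printf
val D = fn z => newFormat Int.toString z
val B = fn z => newFormat Bool.toString z   (* hypothetical format *)
infix D B

(* The same stream is handed to every transformer in the format. *)
val file = OS.FileSys.tmpName ()
val out = TextIO.openOut file
val () = fprintf (`"here's an int "D" and a bool "B".\n", out) 13 true
val () = TextIO.closeOut out

val ins = TextIO.openIn file
val written = TextIO.inputAll ins   (* "here's an int 13 and a bool true.\n" *)
val () = TextIO.closeIn ins
val () = OS.FileSys.remove file
----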
17786
17787 == Notes ==
17788
* Lesson: instead of using dependent types for a function, express the
dependency in the type of the argument.
17791
17792 * If `printf` is partially applied, it will do the printing then and
17793 there.  Perhaps this could be fixed with some kind of terminator.
17794 +
A syntactic or argument terminator is not necessary.  A formatter can
either be eager (as above) or lazy (as below).  A lazy formatter
accumulates enough state to print the entire string.  Both structures
below match the core `PRINTF` signature (the one without `fprintf`).
The simplest lazy formatter concatenates the strings as they become
available:
17799 +
17800 [source,sml]
17801 ----
17802 structure PrintfLazyConcat: PRINTF =
17803    struct
17804       datatype ('a, 'b) t = T of (string -> 'a) -> string -> 'b
17805
17806       fun printf (T f) = f print ""
17807
17808       fun ` s = T (fn th => fn s' => th (s' ^ s))
17809
17810       fun newFormat toString (T f, s) =
17811          T (fn th =>
17812             f (fn s' => fn a =>
17813                th (s' ^ toString a ^ s)))
17814    end
17815 ----
17816 +
17817 It is somewhat more efficient to accumulate the strings as a list:
17818 +
17819 [source,sml]
17820 ----
17821 structure PrintfLazyList: PRINTF =
17822    struct
17823       datatype ('a, 'b) t = T of (string list -> 'a) -> string list -> 'b
17824
17825       fun printf (T f) = f (List.app print o List.rev) []
17826
17827       fun ` s = T (fn th => fn ss => th (s::ss))
17828
17829       fun newFormat toString (T f, s) =
17830          T (fn th =>
17831             f (fn ss => fn a =>
17832                th (s::toString a::ss)))
17833    end
17834 ----
17835
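The lazy representation also makes it easy to return the text instead of printing it.  The following is a hypothetical variant of the concatenating formatter, not part of the original derivation: the only change is that the final continuation is the identity on strings, so the result type ends in `string` rather than `unit`.

[source,sml]
----
structure Sprintf =
   struct
      datatype ('a, 'b) t = T of (string -> 'a) -> string -> 'b

      (* Finish with the accumulated string itself. *)
      fun sprintf (T f) = f (fn s => s) ""

      fun ` s = T (fn th => fn s' => th (s' ^ s))

      fun newFormat toString (T f, s) =
         T (fn th =>
            f (fn s' => fn a =>
               th (s' ^ toString a ^ s)))
   end

open Sprintf
val D = fn z => newFormat Int.toString z
val B = fn z => newFormat Bool.toString z
infix D B

val fmt = sprintf (`"x="D", ok="B".")   (* fmt : int -> bool -> string *)
val s1 = fmt 13 true     (* "x=13, ok=true." *)
val s2 = fmt 14 false    (* "x=14, ok=false." *)
----

Because nothing is written to a stream, `fmt` can be applied partially and reused without any output appearing early.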
17836
17837 == Also see ==
17838
17839 * <:Printf:>
17840 * <!Cite(Danvy98, Functional Unparsing)>
17841
17842 <<<
17843
17844 :mlton-guide-page: ProductType
17845 [[ProductType]]
17846 ProductType
17847 ===========
17848
17849 <:StandardML:Standard ML> has special syntax for products (tuples). A
17850 product type is written as
17851 [source,sml]
17852 ----
17853 t1 * t2 * ... * tN
17854 ----
17855 and a product pattern is written as
17856 [source,sml]
17857 ----
17858 (p1, p2, ..., pN)
17859 ----
17860
17861 In most situations the syntax is quite convenient.  However, there are
17862 situations where the syntax is cumbersome.  There are also situations
17863 in which it is useful to construct and destruct n-ary products
17864 inductively, especially when using <:Fold:>.
17865
17866 In such situations, it is useful to have a binary product datatype
17867 with an infix constructor defined as follows.
17868 [source,sml]
17869 ----
17870 datatype ('a, 'b) product = & of 'a * 'b
17871 infix &
17872 ----
17873
17874 With these definitions, one can write an n-ary product as a nested
17875 binary product quite conveniently.
17876 [source,sml]
17877 ----
17878 x1 & x2 & ... & xn
17879 ----
17880
17881 Because of left associativity, this is the same as
17882 [source,sml]
17883 ----
17884 (((x1 & x2) & ...) & xn)
17885 ----
17886
17887 Because `&` is a constructor, the syntax can also be used for
17888 patterns.
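
As a small illustration, the following sketch builds a 3-ary product
and takes it apart again with a pattern.  The names `p`, `describe`,
and `text` are made up for this example.

[source,sml]
----
datatype ('a, 'b) product = & of 'a * 'b
infix &

(* A 3-ary product; by left associativity this is (1 & "two") & true. *)
val p = 1 & "two" & true

(* The same infix syntax destructures the product in a pattern. *)
fun describe (i & s & b) =
   concat [Int.toString i, " ", s, " ", Bool.toString b]

val text = describe p
----

Here `text` is bound to `"1 two true"`.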
17889
17890 The symbol `&` is inspired by the Curry-Howard isomorphism: the proof
17891 of a conjunction `(A & B)` is a pair of proofs `(a, b)`.
17892
17893
17894 == Example: parser combinators ==
17895
A typical parser combinator library provides a combinator that has a
type of the form
17898 [source,sml]
17899 ----
17900 'a parser * 'b parser -> ('a * 'b) parser
17901 ----
and produces a parser for the concatenation of two parsers. When more
than two parsers are concatenated, the result of the combined parser
is a nested structure of pairs
17905 [source,sml]
17906 ----
17907 (...((p1, p2), p3)..., pN)
17908 ----
17909 which is somewhat cumbersome.
17910
17911 By using a product type, the type of the concatenation combinator then
17912 becomes
17913 [source,sml]
17914 ----
17915 'a parser * 'b parser -> ('a, 'b) product parser
17916 ----
17917 While this doesn't stop the nesting, it makes the pattern significantly
17918 easier to write. Instead of
17919 [source,sml]
17920 ----
17921 (...((p1, p2), p3)..., pN)
17922 ----
17923 the pattern is written as
17924 [source,sml]
17925 ----
17926 p1 & p2 & p3 & ... & pN
17927 ----
17928 which is considerably more concise.
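
The following is a minimal sketch of such a combinator, not the
interface of any particular library.  The parser representation
(`char list -> ('a * char list) option`), the infix name `>>`, and the
`item` parser are assumptions made for illustration.

[source,sml]
----
datatype ('a, 'b) product = & of 'a * 'b
infix &

(* Hypothetical parser type: consume a prefix of the input, or fail. *)
type 'a parser = char list -> ('a * char list) option

(* Hypothetical concatenation combinator returning a product. *)
infix >>
fun p >> q =
   fn cs =>
      case p cs of
         NONE => NONE
       | SOME (a, cs') =>
            (case q cs' of
                NONE => NONE
              | SOME (b, cs'') => SOME (a & b, cs''))

(* Parse any single character. *)
val item: char parser = fn [] => NONE | c :: cs => SOME (c, cs)

val three = item >> item >> item

(* The nested result is matched with a flat-looking pattern. *)
val result =
   case three (String.explode "abcd") of
      NONE => NONE
    | SOME (x & y & z, rest) =>
         SOME (String.implode [x, y, z], String.implode rest)
----

Here `result` is `SOME ("abc", "d")`.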
17929
17930
17931 == Also see ==
17932
17933 * <:VariableArityPolymorphism:>
17934 * <:Utilities:>
17935
17936 <<<
17937
17938 :mlton-guide-page: Profiling
17939 [[Profiling]]
17940 Profiling
17941 =========
17942
With MLton and `mlprof`, you can profile your program to find out
bytes allocated, execution counts, or time spent in each function.  To
profile your program, compile with ++-profile __kind__++, where _kind_
is one of `alloc`, `count`, or `time`.  Then, run the executable,
which will write an `mlmon.out` file when it finishes.  You can then
run `mlprof` on the executable and the `mlmon.out` file to see the
performance data.
17950
17951 Here are the three kinds of profiling that MLton supports.
17952
17953 * <:ProfilingAllocation:>
17954 * <:ProfilingCounts:>
17955 * <:ProfilingTime:>
17956
17957 == Next steps ==
17958
17959 * <:CallGraph:>s to visualize profiling data.
17960 * <:HowProfilingWorks:>
17961 * <:MLmon:>
17962 * <:MLtonProfile:> to selectively profile parts of your program.
17963 * <:ProfilingTheStack:>
17964 * <:ShowProf:>
17965
17966 <<<
17967
17968 :mlton-guide-page: ProfilingAllocation
17969 [[ProfilingAllocation]]
17970 ProfilingAllocation
17971 ===================
17972
17973 With MLton and `mlprof`, you can <:Profiling:profile> your program to
17974 find out how many bytes each function allocates.  To do so, compile
17975 your program with `-profile alloc`.  For example, suppose that
17976 `list-rev.sml` is the following.
17977
17978 [source,sml]
17979 ----
17980 sys::[./bin/InclGitFile.py mlton master doc/examples/profiling/list-rev.sml]
17981 ----
17982
17983 Compile and run `list-rev` as follows.
17984 ----
17985 % mlton -profile alloc list-rev.sml
17986 % ./list-rev
17987 % mlprof -show-line true list-rev mlmon.out
17988 6,030,136 bytes allocated (108,336 bytes by GC)
17989        function          cur
17990 ----------------------- -----
17991 append  list-rev.sml: 1 97.6%
17992 <gc>                     1.8%
17993 <main>                   0.4%
17994 rev  list-rev.sml: 6     0.2%
17995 ----
17996
17997 The data shows that most of the allocation is done by the `append`
17998 function defined on line 1 of `list-rev.sml`.  The table also shows
17999 how special functions like `gc` and `main` are handled: they are
18000 printed with surrounding brackets.  C functions are displayed
18001 similarly.  In this example, the allocation done by the garbage
18002 collector is due to stack growth, which is usually the case.
18003
18004 The run-time performance impact of allocation profiling is noticeable,
18005 because it inserts additional C calls for object allocation.
18006
18007 Compile with `-profile alloc -profile-branch true` to find out how
18008 much allocation is done in each branch of a function; see
18009 <:ProfilingCounts:> for more details on `-profile-branch`.
18010
18011 <<<
18012
18013 :mlton-guide-page: ProfilingCounts
18014 [[ProfilingCounts]]
18015 ProfilingCounts
18016 ===============
18017
18018 With MLton and `mlprof`, you can <:Profiling:profile> your program to
18019 find out how many times each function is called and how many times
18020 each branch is taken.  To do so, compile your program with
18021 `-profile count -profile-branch true`. For example, suppose that
18022 `tak.sml` contains the following.
18023
18024 [source,sml]
18025 ----
18026 sys::[./bin/InclGitFile.py mlton master doc/examples/profiling/tak.sml]
18027 ----
18028
18029 Compile with count profiling and run the program.
18030 ----
18031 % mlton -profile count -profile-branch true tak.sml
18032 % ./tak
18033 ----
18034
18035 Display the profiling data, along with raw counts and file positions.
18036 ----
18037 % mlprof -raw true -show-line true tak mlmon.out
18038 623,610,002 ticks
18039             function               cur       raw
18040 --------------------------------- ----- -------------
18041 Tak.tak1.tak2  tak.sml: 5         38.2% (238,530,000)
18042 Tak.tak1.tak2.<true>  tak.sml: 7  27.5% (171,510,000)
18043 Tak.tak1  tak.sml: 3              10.7%  (67,025,000)
18044 Tak.tak1.<true>  tak.sml: 14      10.7%  (67,025,000)
18045 Tak.tak1.tak2.<false>  tak.sml: 9 10.7%  (67,020,000)
18046 Tak.tak1.<false>  tak.sml: 16      2.0%  (12,490,000)
18047 f  tak.sml: 23                     0.0%       (5,001)
18048 f.<branch>  tak.sml: 25            0.0%       (5,000)
18049 f.<branch>  tak.sml: 23            0.0%           (1)
18050 uncalled  tak.sml: 29              0.0%           (0)
18051 f.<branch>  tak.sml: 24            0.0%           (0)
18052 ----
18053
18054 Branches are displayed with lexical nesting followed by `<branch>`
18055 where the function name would normally be, or `<true>` or `<false>`
18056 for if-expressions.  It is best to run `mlprof` with `-show-line true`
18057 to help identify the branch.
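
To make the two kinds of branch concrete, here is a small hypothetical
function (it is not part of `tak.sml`).  Under the naming scheme just
described, one would expect the arms of the `if` to be reported as
`<true>` and `<false>` and the rules of the `case` as `<branch>`
entries nested under `classify`; no `mlprof` output is shown because
none was generated for this sketch.

[source,sml]
----
(* Hypothetical example; not one of MLton's profiling examples. *)
fun classify (n: int): string =
   if n < 0               (* if-expression: <true> and <false> arms *)
      then "negative"
   else
      case n of           (* each rule of the match is a <branch> *)
         0 => "zero"
       | _ => "positive"

val () = print (classify 0 ^ "\n")
----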
18058
18059 One use of `-profile count` is as a code-coverage tool, to help find
18060 code in your program that hasn't been tested.  For this reason,
18061 `mlprof` displays functions and branches even if they have a count of
18062 zero.  As the above output shows, the branch on line 24 was never
18063 taken and the function defined on line 29 was never called.  To see
18064 zero counts, it is best to run `mlprof` with `-raw true`, since some
18065 code (e.g. the branch on line 23 above) will show up with `0.0%` but
18066 may still have been executed and hence have a nonzero raw count.
18067
18068 <<<
18069
18070 :mlton-guide-page: ProfilingTheStack
18071 [[ProfilingTheStack]]
18072 ProfilingTheStack
18073 =================
18074
18075 For all forms of <:Profiling:>, you can gather counts for all
18076 functions on the stack, not just the currently executing function.  To
18077 do so, compile your program with `-profile-stack true`.  For example,
18078 suppose that `list-rev.sml` contains the following.
18079
18080 [source,sml]
18081 ----
18082 sys::[./bin/InclGitFile.py mlton master doc/examples/profiling/list-rev.sml]
18083 ----
18084
18085 Compile with stack profiling and then run the program.
18086 ----
18087 % mlton -profile alloc -profile-stack true list-rev.sml
18088 % ./list-rev
18089 ----
18090
18091 Display the profiling data.
18092 ----
18093 % mlprof -show-line true list-rev mlmon.out
18094 6,030,136 bytes allocated (108,336 bytes by GC)
18095        function          cur  stack  GC
18096 ----------------------- ----- ----- ----
18097 append  list-rev.sml: 1 97.6% 97.6% 1.4%
18098 <gc>                     1.8%  0.0% 1.8%
18099 <main>                   0.4% 98.2% 1.8%
18100 rev  list-rev.sml: 6     0.2% 97.6% 1.8%
18101 ----
18102
18103 In the above table, we see that `rev`, defined on line 6 of
18104 `list-rev.sml`, is only responsible for 0.2% of the allocation, but is
18105 on the stack while 97.6% of the allocation is done by the user program
18106 and while 1.8% of the allocation is done by the garbage collector.
18107
18108 The run-time performance impact of `-profile-stack true` can be
18109 noticeable since there is some extra bookkeeping at every nontail call
18110 and return.
18111
18112 <<<
18113
18114 :mlton-guide-page: ProfilingTime
18115 [[ProfilingTime]]
18116 ProfilingTime
18117 =============
18118
18119 With MLton and `mlprof`, you can <:Profiling:profile> your program to
18120 find out how much time is spent in each function over an entire run of
18121 the program.  To do so, compile your program with `-profile time`.
18122 For example, suppose that `tak.sml` contains the following.
18123
18124 [source,sml]
18125 ----
18126 sys::[./bin/InclGitFile.py mlton master doc/examples/profiling/tak.sml]
18127 ----
18128
18129 Compile with time profiling and run the program.
18130 ----
18131 % mlton -profile time tak.sml
18132 % ./tak
18133 ----
18134
18135 Display the profiling data.
18136 ----
% mlprof tak mlmon.out
6.00 seconds of CPU time (0.00 seconds GC)
  function     cur
------------- -----
Tak.tak1.tak2 75.8%
Tak.tak1      24.2%
18143 ----
18144
18145 This example shows how `mlprof` indicates lexical nesting: as a
18146 sequence of period-separated names indicating the structures and
18147 functions in which a function definition is nested.  The profiling
18148 data shows that roughly three-quarters of the time is spent in the
18149 `Tak.tak1.tak2` function, while the rest is spent in `Tak.tak1`.
18150
18151 Display raw counts in addition to percentages with `-raw true`.
18152 ----
18153 % mlprof -raw true tak mlmon.out
18154 6.00 seconds of CPU time (0.00 seconds GC)
18155   function     cur    raw
18156 ------------- ----- -------
18157 Tak.tak1.tak2 75.8% (4.55s)
18158 Tak.tak1      24.2% (1.45s)
18159 ----
18160
18161 Display the file name and line number for each function in addition to
18162 its name with `-show-line true`.
18163 ----
18164 % mlprof -show-line true tak mlmon.out
18165 6.00 seconds of CPU time (0.00 seconds GC)
18166         function           cur
18167 ------------------------- -----
18168 Tak.tak1.tak2  tak.sml: 5 75.8%
18169 Tak.tak1  tak.sml: 3      24.2%
18170 ----
18171
18172 Time profiling is designed to have a very small performance impact.
18173 However, in some cases there will be a run-time performance cost,
18174 which may perturb the results.  There is more likely to be an impact
18175 with `-codegen c` than `-codegen native`.
18176
18177 You can also compile with `-profile time -profile-branch true` to find
18178 out how much time is spent in each branch of a function; see
18179 <:ProfilingCounts:> for more details on `-profile-branch`.
18180
18181
18182 == Caveats ==
18183
18184 With `-profile time`, use of the following in your program will cause
18185 a run-time error, since they would interfere with the profiler signal
18186 handler.
18187
18188 * `MLton.Itimer.set (MLton.Itimer.Prof, ...)`
18189 * `MLton.Signal.setHandler (MLton.Signal.prof, ...)`
18190
Also, because of the random sampling used to implement
`-profile time`, it is best to have a long-running program (at least
tens of seconds) in order to get reasonable timing data.
18194
18195 <<<
18196
18197 :mlton-guide-page: Projects
18198 [[Projects]]
18199 Projects
18200 ========
18201
18202 We have lots of ideas for projects to improve MLton, many of which we
18203 do not have time to implement, or at least haven't started on yet.
18204 Here is a list of some of those improvements, ranging from the easy (1
18205 week) to the difficult (several months).  If you have any interest in
18206 working on one of these, or some other improvement to MLton not listed
18207 here, please send mail to
18208 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`].
18209
18210 * Port to new platform: Windows (native, not Cygwin or MinGW), ...
18211 * Source-level debugger
18212 * Heap profiler
18213 * Interfaces to libraries: OpenGL, Gtk+, D-BUS, ...
18214 * More libraries written in SML (see <!ViewGitProj(mltonlib)>)
18215 * Additional constant types: `structure Real80: REAL`, ...
18216 * An IDE (possibly integrated with <:Eclipse:>)
18217 * Port MLRISC and use for code generation
18218 * Optimizations
18219 ** Improved closure representation
18220 +
18221 Right now, MLton's closure conversion algorithm uses a simple flat closure to represent each function.
18222 +
18223 *** http://www.mlton.org/pipermail/mlton/2003-October/024570.html
18224 *** http://www.mlton.org/pipermail/mlton-user/2007-July/001150.html
18225 *** <!Cite(ShaoAppel94)>
18226 ** Elimination of array bounds checks in loops
18227 ** Elimination of overflow checks on array index computations
18228 ** Common-subexpression elimination of repeated array subscripts
18229 ** Loop-invariant code motion, especially for tuple selects
18230 ** Partial redundancy elimination
18231 *** http://www.mlton.org/pipermail/mlton/2006-April/028598.html
18232 ** Loop unrolling, especially for small loops
18233 ** Auto-vectorization, for MMX/SSE/3DNow!/AltiVec (see the http://gcc.gnu.org/projects/tree-ssa/vectorization.html[work done on GCC])
18234 ** Optimize `MLton_eq`: pointer equality is necessarily false when one of the arguments is freshly allocated in the block
18235 * Analyses
18236 ** Uncaught exception analysis
18237
18238 <<<
18239
18240 :mlton-guide-page: Pronounce
18241 [[Pronounce]]
18242 Pronounce
18243 =========
18244
18245 Here is <!Attachment(Pronounce,pronounce-mlton.mp3,how "MLton" sounds)>.
18246
18247 "MLton" is pronounced in two syllables, with stress on the first
18248 syllable.  The first syllable sounds like the word _mill_ (as in
18249 "steel mill"), the second like the word _tin_ (as in "cookie tin").
18250
18251 <<<
18252
18253 :mlton-guide-page: PropertyList
18254 [[PropertyList]]
18255 PropertyList
18256 ============
18257
A property list is a dictionary-like data structure into which
properties (name-value pairs) can be inserted and from which
properties can be looked up by name.  The term comes from the Lisp
language, where every symbol has a property list for storing
information, and where the names are typically symbols and the values
can be of any type.
18264
18265 Here is an SML signature for property lists such that for any type of
18266 value a new property can be dynamically created to manipulate that
18267 type of value in a property list.
18268
18269 [source,sml]
18270 ----
18271 signature PROPERTY_LIST =
18272    sig
18273       type t
18274
18275       val new: unit -> t
18276       val newProperty: unit -> {add: t * 'a -> unit,
18277                                 peek: t -> 'a option}
18278    end
18279 ----
18280
18281 Here is a functor demonstrating the use of property lists.  It first
18282 creates a property list, then two new properties (of different types),
18283 and adds a value to the list for each property.
18284
18285 [source,sml]
18286 ----
18287 functor Test (P: PROPERTY_LIST) =
18288    struct
18289       val pl = P.new ()
18290
18291       val {add = addInt: P.t * int -> unit, peek = peekInt} = P.newProperty ()
18292       val {add = addReal: P.t * real -> unit, peek = peekReal} = P.newProperty ()
18293
18294       val () = addInt (pl, 13)
18295       val () = addReal (pl, 17.0)
18296       val s1 = Int.toString (valOf (peekInt pl))
18297       val s2 = Real.toString (valOf (peekReal pl))
18298       val () = print (concat [s1, " ", s2, "\n"])
18299    end
18300 ----
18301
Applied to an appropriate implementation of `PROPERTY_LIST`, the
`Test` functor will produce the following output.
18304
18305 ----
18306 13 17.0
18307 ----
18308
18309
18310 == Implementation ==
18311
18312 Because property lists can hold values of any type, their
18313 implementation requires a <:UniversalType:>.  Given that, a property
18314 list is simply a list of elements of the universal type.  Adding a
18315 property adds to the front of the list, and looking up a property
18316 scans the list.
18317
18318 [source,sml]
18319 ----
18320 functor PropertyList (U: UNIVERSAL_TYPE): PROPERTY_LIST =
18321    struct
18322       datatype t = T of U.t list ref
18323
18324       fun new () = T (ref [])
18325
18326       fun 'a newProperty () =
18327          let
18328             val (inject, out) = U.embed ()
18329             fun add (T r, a: 'a): unit = r := inject a :: (!r)
18330             fun peek (T r) =
18331                Option.map (valOf o out) (List.find (isSome o out) (!r))
18332          in
18333             {add = add, peek = peek}
18334          end
18335    end
18336 ----
18337
18338
18339 If `U: UNIVERSAL_TYPE`, then we can test our code as follows.
18340
18341 [source,sml]
18342 ----
18343 structure Z = Test (PropertyList (U))
18344 ----
18345
18346 Of course, a serious implementation of property lists would have to
18347 handle duplicate insertions of the same property, as well as the
18348 removal of elements in order to avoid space leaks.
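
As a rough sketch of both points, the variant below replaces an
existing binding on `add` and offers a `remove` operation.  It is an
illustration only, not MLton's implementation: the structure name
`PL`, the `remove` field, and the `exn`-based universal type `U` are
assumptions made here so that the example is self-contained.

[source,sml]
----
signature UNIVERSAL_TYPE =
   sig
      type t
      val embed: unit -> ('a -> t) * (t -> 'a option)
   end

(* One possible universal type, built on exn (an assumption). *)
structure U:> UNIVERSAL_TYPE =
   struct
      type t = exn

      fun 'a embed () =
         let
            exception E of 'a
         in
            (E, fn E a => SOME a | _ => NONE)
         end
   end

structure PL =
   struct
      datatype t = T of U.t list ref

      fun new () = T (ref [])

      fun 'a newProperty () =
         let
            val (inject: 'a -> U.t, out) = U.embed ()
            fun isMine u = isSome (out u)
            (* Drop every element that belongs to this property. *)
            fun remove (T r) = r := List.filter (not o isMine) (!r)
            (* Remove any old binding first, so that the list holds
             * at most one element per property. *)
            fun add (pl as T r, a: 'a): unit =
               (remove pl; r := inject a :: (!r))
            fun peek (T r) =
               Option.mapPartial out (List.find isMine (!r))
         in
            {add = add, peek = peek, remove = remove}
         end
   end

val pl = PL.new ()
val {add = addInt: PL.t * int -> unit,
     peek = peekInt,
     remove = removeInt} = PL.newProperty ()

val () = addInt (pl, 13)
val () = addInt (pl, 14)      (* replaces 13 instead of shadowing it *)
val PL.T r = pl
val size = length (!r)        (* 1 *)
val latest = peekInt pl       (* SOME 14 *)
val () = removeInt pl
val gone = peekInt pl         (* NONE *)
----

Both `add` and `remove` scan the whole list in this sketch, so it
trades speed for simplicity.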
18349
18350 == Also see ==
18351
18352 * MLton relies heavily on property lists for attaching information to
18353 syntax tree nodes in its intermediate languages.  See
18354 <!ViewGitFile(mlton,master,lib/mlton/basic/property-list.sig)> and
18355 <!ViewGitFile(mlton,master,lib/mlton/basic/property-list.fun)>.
18356
18357 * The <:MLRISCLibrary:> <!Cite(LeungGeorge99, uses property lists
18358 extensively)>.
18359
18360 <<<
18361
18362 :mlton-guide-page: Pygments
18363 [[Pygments]]
18364 Pygments
18365 ========
18366
18367 http://pygments.org/[Pygments] is a generic syntax highlighter.  Here is a _lexer_ for highlighting
18368 <:StandardML: Standard ML>.
18369
18370 * <!ViewGitDir(mlton,master,ide/pygments/sml_lexer)> -- Provides highlighting of keywords, special constants, and (nested) comments.
18371
18372 == Install and use ==
* Check out all files and install as a http://pygments.org/[Pygments] plugin.
18374 +
18375 ----
18376 $ git clone https://github.com/MLton/mlton.git mlton
18377 $ cd mlton/ide/pygments
18378 $ python setup.py install
18379 ----
18380
18381 * Invoke `pygmentize` with `-l sml`.
18382
18383 == Feedback ==
18384
18385 Comments and suggestions should be directed to <:MatthewFluet:>.
18386
18387 <<<
18388
18389 :mlton-guide-page: RayRacine
18390 [[RayRacine]]
18391 RayRacine
18392 =========
18393
18394 Using SML in some _Semantic Web_ stuff.   Anyone interested in
18395 similar, please contact me.  GreyLensman on #sml on IRC or rracine at
18396 this domain adelphia with a dot here net.
18397
18398 Current areas of coding.
18399
18400 . Pretty solid, high performance Rete implementation - base functionality is complete.
18401 . N3 parser - mostly complete
18402 . RDF parser based on fxg - not started.
18403 . Swerve HTTP server - 1/2 done.
18404 . SPARQL implementation - not started.
. Persistent engine based on BerkeleyDB - not started.
. Native implementation of PostgreSQL protocol - underway, ways to go.
18407 . I also have a small change to the MLton compiler to add ++PackWord__<N>__++ - changes compile but needs some more work, clean-up and unit tests.
18408
18409 <<<
18410
18411 :mlton-guide-page: Reachability
18412 [[Reachability]]
18413 Reachability
18414 ============
18415
18416 Reachability is a notion dealing with the graph of heap objects
18417 maintained at runtime.  Nodes in the graph are heap objects and edges
18418 correspond to the pointers between heap objects.  As the program runs,
18419 it allocates new objects (adds nodes to the graph), and those new
18420 objects can contain pointers to other objects (new edges in the
18421 graph).  If the program uses mutable objects (refs or arrays), it can
18422 also change edges in the graph.
18423
18424 At any time, the program has access to some finite set of _root_
18425 nodes, and can only ever access nodes that are reachable by following
18426 edges from these root nodes.  Nodes that are _unreachable_ can be
18427 garbage collected.
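
The set of reachable nodes can be computed by a traversal from the
roots.  The sketch below does this for a toy heap in which nodes are
numbered and `heap` lists the outgoing edges of each node.  The
representation and the names `heap`, `visit`, and `reachable` are
invented for illustration and are not taken from MLton's garbage
collector.

[source,sml]
----
(* Toy heap: node i points to the nodes in Vector.sub (heap, i).
 * Nodes 3 and 4 point to each other but nothing else points to them. *)
val heap: int list vector = Vector.fromList [[1], [2], [], [4], [3]]

(* Depth-first traversal that collects every node reachable from
 * the given roots, in order of first visit. *)
fun reachable (roots: int list): int list =
   let
      fun visit (n, seen) =
         if List.exists (fn m => m = n) seen
            then seen
         else List.foldl visit (n :: seen) (Vector.sub (heap, n))
   in
      List.rev (List.foldl visit [] roots)
   end

val live = reachable [0]
----

With node 0 as the only root, `live` is `[0, 1, 2]`.  Nodes 3 and 4
form a cycle, yet they are unreachable and could therefore be
collected.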
18428
18429 == Also see ==
18430
18431  * <:MLtonFinalizable:>
18432  * <:MLtonWeak:>
18433
18434 <<<
18435
18436 :mlton-guide-page: Redundant
18437 [[Redundant]]
18438 Redundant
18439 =========
18440
18441 <:Redundant:> is an optimization pass for the <:SSA:>
18442 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
18443
18444 == Description ==
18445
18446 The redundant SSA optimization eliminates redundant function and label
18447 arguments; an argument of a function or label is redundant if it is
18448 always the same as another argument of the same function or label.
18449 The analysis finds an equivalence relation on the arguments of a
18450 function or label, such that all arguments in an equivalence class are
18451 redundant with respect to the other arguments in the equivalence
18452 class; the transformation selects one representative of each
18453 equivalence class and drops the binding occurrence of
18454 non-representative variables and renames use occurrences of the
18455 non-representative variables to the representative variable.  The
18456 analysis finds the equivalence classes via a fixed-point analysis.
18457 Each vector of arguments to a function or label is initialized to
18458 equivalence classes that equate all arguments of the same type; one
18459 could start with an equivalence class that equates all arguments, but
18460 arguments of different type cannot be redundant.  Variables bound in
18461 statements are initialized to singleton equivalence classes.  The
18462 fixed-point analysis repeatedly refines these equivalence classes on
18463 the formals by the equivalence classes of the actuals.
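
The following source-level sketch shows the kind of redundancy meant
here.  It is only an analogy: the pass itself works on SSA functions
and labels, and the names `demo` and `loop` are hypothetical.  The
arguments `n` and `m` of `loop` receive the same value at the initial
call and are passed along unchanged at the recursive call, so they
would end up in one equivalence class, while `i` and `acc` are given
different values at the recursive call and would be separated.

[source,sml]
----
fun demo (k: int): int =
   let
      (* n and m are always passed the same value. *)
      fun loop (i, n, m, acc) =
         if i >= n
            then acc
         else loop (i + 1, n, m, acc + m)
   in
      loop (0, k, k, 0)
   end

val r = demo 5
----

After eliminating the redundant argument, `loop` would need only one
of `n` and `m`, with uses of the other renamed to the representative.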
18464
18465 == Implementation ==
18466
18467 * <!ViewGitFile(mlton,master,mlton/ssa/redundant.fun)>
18468
18469 == Details and Notes ==
18470
<:Redundant:> was originally added because of some output of the
<:ClosureConvert:> pass in which the environment record, or
components of it, were passed around in several places.  That may have
been more relevant with polyvariant analyses (which are long gone).
But it still seems possibly relevant, especially with more aggressive
flattening, which should reveal some fields in nested closure records
that are redundant.
18478
18479 <<<
18480
18481 :mlton-guide-page: RedundantTests
18482 [[RedundantTests]]
18483 RedundantTests
18484 ==============
18485
18486 <:RedundantTests:> is an optimization pass for the <:SSA:>
18487 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
18488
18489 == Description ==
18490
18491 This pass simplifies conditionals whose results are implied by a
18492 previous conditional test.
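
As a source-level analogy (the pass itself works on SSA, and the
function below is hypothetical), each inner test repeats the outer
one, so its outcome is already known on that path.  Tests like these
are rarely written by hand but can appear once other code has been
inlined.

[source,sml]
----
fun describe (x: int, y: int): string =
   if x < y
      then (if x < y then "less" else "never")       (* known true *)
   else (if x < y then "never" else "not less")      (* known false *)

val a = describe (1, 2)
val b = describe (2, 2)
----

Here `a` is `"less"` and `b` is `"not less"`; the `"never"` arms
cannot be reached.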
18493
18494 == Implementation ==
18495
18496 * <!ViewGitFile(mlton,master,mlton/ssa/redundant-tests.fun)>
18497
18498 == Details and Notes ==
18499
18500 An additional test will sometimes eliminate the overflow test when
18501 adding or subtracting 1.  In particular, it will eliminate it in the
18502 following cases:
18503 [source,sml]
18504 ----
18505 if x < y
18506   then ... x + 1 ...
18507 else ... y - 1 ...
18508 ----
18509
18510 <<<
18511
18512 :mlton-guide-page: References
18513 [[References]]
18514 References
18515 ==========
18516
18517 <:#AAA:A>
18518 <:#BBB:B>
18519 <:#CCC:C>
18520 <:#DDD:D>
18521 <:#EEE:E>
18522 <:#FFF:F>
18523 <:#GGG:G>
18524 <:#HHH:H>
18525 <:#III:I>
18526 <:#JJJ:J>
18527 <:#KKK:K>
18528 <:#LLL:L>
18529 <:#MMM:M>
18530 <:#NNN:N>
18531 <:#OOO:O>
18532 <:#PPP:P>
18533 <:#QQQ:Q>
18534 <:#RRR:R>
18535 <:#SSS:S>
18536 <:#TTT:T>
18537 <:#UUU:U>
18538 <:#VVV:V>
18539 <:#WWW:W>
18540 <:#XXX:X>
18541 <:#YYY:Y>
18542 <:#ZZZ:Z>
18543
18544 == <!Anchor(AAA)>A ==
18545
18546  * <!Anchor(AcarEtAl06)>
 http://www.umut-acar.org/publications/pldi2006.pdf[An Experimental Analysis of Self-Adjusting Computation].
18548  Umut Acar, Guy Blelloch, Matthias Blume, and Kanat Tangwongsan.
18549  <:#PLDI:> 2006.
18550
18551  * <!Anchor(Appel92)>
18552  http://us.cambridge.org/titles/catalogue.asp?isbn=0521416957[Compiling with Continuations]
18553  (http://www.addall.com/New/submitNew.cgi?query=0-521-41695-7&type=ISBN&location=10000&state=&dispCurr=USD[addall]).
18554  ISBN 0521416957.
18555  Andrew W. Appel.
18556  Cambridge University Press, 1992.
18557
18558  * <!Anchor(Appel93)>
18559  http://www.cs.princeton.edu/research/techreps/TR-364-92[A Critique of Standard ML].
18560  Andrew W. Appel.
18561  <:#JFP:> 1993.
18562
18563  * <!Anchor(Appel98)>
18564  http://us.cambridge.org/titles/catalogue.asp?isbn=0521582741[Modern Compiler Implementation in ML]
18565  (http://www.addall.com/New/submitNew.cgi?query=0-521-58274-1&type=ISBN&location=10000&state=&dispCurr=USD[addall]).
 ISBN 0521582741.
18567  Andrew W. Appel.
18568  Cambridge University Press, 1998.
18569
18570  * <!Anchor(AppelJim97)>
 http://ncstrl.cs.princeton.edu/expand.php?id=TR-556-97[Shrinking Lambda Expressions in Linear Time].
18572  Andrew Appel and Trevor Jim.
18573  <:#JFP:> 1997.
18574
18575  * <!Anchor(AppelEtAl94)>
 http://www.smlnj.org/doc/ML-Lex/manual.html[A lexical analyzer generator for Standard ML. Version 1.6.0].
 Andrew W. Appel, James S. Mattson, and David R. Tarditi.  1994.
18578
18579 == <!Anchor(BBB)>B ==
18580
18581  * <!Anchor(BaudinetMacQueen85)>
18582  http://www.classes.cs.uchicago.edu/archive/2011/spring/22620-1/papers/macqueen-baudinet85.pdf[Tree Pattern Matching for ML].
18583  Marianne Baudinet, David MacQueen.  1985.
18584 +
18585 ____
18586 Describes the match compiler used in an early version of
18587 <:SMLNJ:SML/NJ>.
18588 ____
18589
18590  * <!Anchor(BentonEtAl98)>
18591  http://research.microsoft.com/en-us/um/people/nick/icfp98.pdf[Compiling Standard ML to Java Bytecodes].
18592  Nick Benton, Andrew Kennedy, and George Russell.
18593  <:#ICFP:> 1998.
18594
18595  * <!Anchor(BentonKennedy99)>
18596  http://research.microsoft.com/en-us/um/people/nick/SMLJavaInterop.pdf[Interlanguage Working Without Tears: Blending SML with Java].
18597  Nick Benton and Andrew Kennedy.
18598  <:#ICFP:> 1999.
18599
18600  * <!Anchor(BentonKennedy01)>
18601  http://research.microsoft.com/en-us/um/people/akenn/sml/ExceptionalSyntax.pdf[Exceptional Syntax].
18602  Nick Benton and Andrew Kennedy.
18603  <:#JFP:> 2001.
18604
18605  * <!Anchor(BentonEtAl04)>
18606  http://research.microsoft.com/en-us/um/people/nick/p53-Benton.pdf[Adventures in Interoperability: The SML.NET Experience].
18607  Nick Benton, Andrew Kennedy, and Claudio Russo.
18608  <:#PPDP:> 2004.
18609
18610  * <!Anchor(BentonEtAl04_2)>
18611  http://research.microsoft.com/en-us/um/people/nick/shrinking.pdf[Shrinking Reductions in SML.NET].
18612  Nick Benton, Andrew Kennedy, Sam Lindley and Claudio Russo.
18613  <:#IFL:> 2004.
18614 +
18615 ____
18616 Describes a linear-time implementation of an
18617 <!Cite(AppelJim97,Appel-Jim shrinker)>, using a mutable IL, and shows
18618 that it yields nice speedups in SML.NET's compile times.  There are
18619 also benchmarks showing that SML.NET when compiled by MLton runs
18620 roughly five times faster than when compiled by SML/NJ.
18621 ____
18622
18623  * <!Anchor(Benton05)>
18624  http://research.microsoft.com/en-us/um/people/nick/benton03.pdf[Embedded Interpreters].
18625  Nick Benton.
18626  <:#JFP:> 2005.
18627
18628  * <!Anchor(Berry91)>
18629  http://www.lfcs.inf.ed.ac.uk/reports/91/ECS-LFCS-91-148/ECS-LFCS-91-148.pdf[The Edinburgh SML Library].
18630  Dave Berry.
18631  University of Edinburgh Technical Report ECS-LFCS-91-148, 1991.
18632
18633  * <!Anchor(BerryEtAl93)>
18634  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.7958&rep=rep1&type=ps[A semantics for ML concurrency primitives].
18635  Dave Berry, Robin Milner, and David N. Turner.
18636  <:#POPL:> 1992.
18637
18638  * <!Anchor(Berry93)>
18639  http://journals.cambridge.org/abstract_S0956796800000873[Lessons From the Design of a Standard ML Library].
18640  Dave Berry.
18641  <:#JFP:> 1993.
18642
18643  * <!Anchor(Bertelsen98)>
18644  http://www.petermb.dk/sml2jvm.ps.gz[Compiling SML to Java Bytecode].
18645  Peter Bertelsen.
18646  Master's Thesis, 1998.
18647
18648  * <!Anchor(Berthomieu00)>
18649  http://homepages.laas.fr/bernard/oo/ooml.html[OO Programming styles in ML].
18650  Bernard Berthomieu.
18651  LAAS Report #2000111, 2000.
18652
18653  * <!Anchor(Blume01)>
18654  http://people.cs.uchicago.edu/~blume/papers/nlffi-entcs.pdf[No-Longer-Foreign: Teaching an ML compiler to speak C "natively"].
18655  Matthias Blume.
18656  <:#BABEL:> 2001.
18657
18658  * <!Anchor(Blume01_02)>
18659  http://people.cs.uchicago.edu/~blume/pgraph/proposal.pdf[Portable library descriptions for Standard ML].
18660  Matthias Blume.  2001.
18661
18662  * <!Anchor(Boehm03)>
18663  http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html[Destructors, Finalizers, and Synchronization].
18664  Hans Boehm.
18665  <:#POPL:> 2003.
18666 +
18667 ____
18668 Discusses a number of issues in the design of finalizers.  Many of the
18669 design choices are consistent with <:MLtonFinalizable:>.
18670 ____
18671
18672 == <!Anchor(CCC)>C ==
18673
18674  * <!Anchor(CejtinEtAl00)>
18675  http://www.cs.purdue.edu/homes/suresh/papers/icfp99.ps.gz[Flow-directed Closure Conversion for Typed Languages].
18676  Henry Cejtin, Suresh Jagannathan, and Stephen Weeks.
18677  <:#ESOP:> 2000.
18678 +
18679 ____
18680 Describes MLton's closure-conversion algorithm, which translates from
18681 its simply-typed higher-order intermediate language to its
18682 simply-typed first-order intermediate language.
18683 ____
18684
18685  * <!Anchor(ChengBlelloch01)>
18686  http://www.cs.cmu.edu/afs/cs/project/pscico/pscico/papers/gc01/pldi-final.pdf[A Parallel, Real-Time Garbage Collector].
18687  Perry Cheng and Guy E. Blelloch.
18688  <:#PLDI:> 2001.
18689
18690  * <!Anchor(Claessen00)>
18691  http://users.eecs.northwestern.edu/~robby/courses/395-495-2009-fall/quick.pdf[QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs].
18692  Koen Claessen and John Hughes.
18693  <:#ICFP:> 2000.
18694
18695  * <!Anchor(Clinger98)>
18696  http://www.cesura17.net/~will/Professional/Research/Papers/tail.pdf[Proper Tail Recursion and Space Efficiency].
18697  William D. Clinger.
18698  <:#PLDI:> 1998.
18699
18700  * <!Anchor(CooperMorrisett90)>
18701  http://www.eecs.harvard.edu/~greg/papers/jgmorris-mlthreads.ps[Adding Threads to Standard ML].
18702  Eric C. Cooper and J. Gregory Morrisett.
18703  CMU Technical Report CMU-CS-90-186, 1990.
18704
18705  * <!Anchor(CouttsEtAl07)>
18706  http://metagraph.org/papers/stream_fusion.pdf[Stream Fusion: From Lists to Streams to Nothing at All].
18707  Duncan Coutts, Roman Leshchinskiy, and Don Stewart.
18708  Submitted for publication.  April 2007.
18709
18710 == <!Anchor(DDD)>D ==
18711
18712  * <!Anchor(DamasMilner82)>
18713  http://groups.csail.mit.edu/pag/6.883/readings/p207-damas.pdf[Principal Type-Schemes for Functional Programs].
18714  Luis Damas and Robin Milner.
18715  <:#POPL:> 1982.
18716
18717  * <!Anchor(Danvy98)>
18718  http://www.brics.dk/RS/98/12[Functional Unparsing].
18719  Olivier Danvy.
18720  BRICS Technical Report RS 98-12, 1998.
18721
18722  * <!Anchor(Deboer05)>
18723  http://alleystoughton.us/eXene/dusty-thesis.pdf[Exhancements to eXene].
18724  Dustin B. deBoer.
18725  Master of Science Thesis, 2005.
18726 +
18727 ____
18728 Describes ways to improve widget concurrency, handling of input focus,
18729 X resources and selections.
18730 ____
18731
18732  * <!Anchor(DoligezLeroy93)>
18733  http://cristal.inria.fr/~doligez/publications/doligez-leroy-popl-1993.pdf[A Concurrent, Generational Garbage Collector for a Multithreaded Implementation of ML].
18734  Damien Doligez and Xavier Leroy.
18735  <:#POPL:> 1993.
18736
18737  * <!Anchor(Dreyer07)>
18738  http://www.mpi-sws.org/~dreyer/papers/mtc/main-long.pdf[Modular Type Classes].
18739  Derek Dreyer, Robert Harper, Manuel M.T. Chakravarty, Gabriele Keller.
18740  University of Chicago Technical Report TR-2007-02, 2006.
18741
18742  * <!Anchor(DreyerBlume07)>
18743  http://www.mpi-sws.org/~dreyer/papers/infmod/main-long.pdf[Principal Type Schemes for Modular Programs].
18744  Derek Dreyer and Matthias Blume.
18745  <:#ESOP:> 2007.
18746
18747  * <!Anchor(Dubois95)>
18748  ftp://ftp.inria.fr/INRIA/Projects/cristal/Francois.Rouaix/generics.dvi.Z[Extensional Polymorphism].
 Catherine Dubois, Francois Rouaix, and Pierre Weis.
18750  <:#POPL:> 1995.
18751 +
18752 ____
18753 An extension of ML that allows the definition of ad-hoc polymorphic
18754 functions by inspecting the type of their argument.
18755 ____
18756
18757 == <!Anchor(EEE)>E ==
18758
18759  * <!Anchor(Elsman03)>
18760  http://www.elsman.com/tldi03.pdf[Garbage Collection Safety for Region-based Memory Management].
18761  Martin Elsman.
18762  <:#TLDI:> 2003.
18763
18764  * <!Anchor(Elsman04)>
18765  http://www.elsman.com/ITU-TR-2004-43.pdf[Type-Specialized Serialization with Sharing].
 Martin Elsman.
 IT University of Copenhagen Technical Report TR-2004-43, 2004.
18768
18769 == <!Anchor(FFF)>F ==
18770
18771  * <!Anchor(FelleisenFreidman98)>
18772  http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=4787[The Little MLer]
18773  (http://www3.addall.com/New/submitNew.cgi?query=026256114X&type=ISBN[addall]).
18774  ISBN 026256114X.
 Matthias Felleisen and Dan Friedman.
18776  The MIT Press, 1998.
18777
18778  * <!Anchor(FlattFindler04)>
18779  http://www.cs.utah.edu/plt/kill-safe/[Kill-Safe Synchronization Abstractions].
18780  Matthew Flatt and Robert Bruce Findler.
18781  <:#PLDI:> 2004.
18782
18783  * <!Anchor(FluetWeeks01)>
18784  http://www.cs.rit.edu/~mtf/research/contification[Contification Using Dominators].
18785  Matthew Fluet and Stephen Weeks.
18786  <:#ICFP:> 2001.
18787 +
18788 ____
18789 Describes contification, a generalization of tail-recursion
18790 elimination that is an optimization operating on MLton's static single
18791 assignment (SSA) intermediate language.
18792 ____
18793
18794  * <!Anchor(FluetPucella06)>
18795  http://www.cs.rit.edu/~mtf/research/phantom-subtyping/jfp06/jfp06.pdf[Phantom Types and Subtyping].
18796  Matthew Fluet and Riccardo Pucella.
18797  <:#JFP:> 2006.
18798
18799  * <!Anchor(Furuse01)>
18800  http://jfla.inria.fr/2001/actes/07-furuse.ps[Generic Polymorphism in ML].
18801  J{empty}. Furuse.
18802  <:#JFLA:> 2001.
18803 +
18804 ____
18805 The formalism behind G'CAML, which has an approach to ad-hoc
18806 polymorphism based on <!Cite(Dubois95)>, the differences being in how
type checking works and an improved compilation approach for typecase
18808 that does the matching at compile time, not run time.
18809 ____
18810
18811 == <!Anchor(GGG)>G ==
18812
18813  * <!Anchor(GansnerReppy93)>
18814  http://alleystoughton.us/eXene/1993-trends.pdf[A Multi-Threaded Higher-order User Interface Toolkit].
18815  Emden R. Gansner and John H. Reppy.
18816  User Interface Software, 1993.
18817
18818  * <!Anchor(GansnerReppy04)>
18819 http://www.cambridge.org/gb/academic/subjects/computer-science/programming-languages-and-applied-logic/standard-ml-basis-library[The Standard ML Basis Library].
18820  (http://www3.addall.com/New/submitNew.cgi?query=9780521794787&type=ISBN[addall])
18821  ISBN 9780521794787.
18822  Emden R. Gansner and John H. Reppy.
18823  Cambridge University Press, 2004.
18824 +
18825 ____
18826 An introduction and overview of the <:BasisLibrary:Basis Library>,
18827 followed by a detailed description of each module.  The module
18828 descriptions are also available
18829 http://www.standardml.org/Basis[online].
18830 ____
18831
18832  * <!Anchor(GrossmanEtAl02)>
18833  http://www.cs.umd.edu/projects/cyclone/papers/cyclone-regions.pdf[Region-based Memory Management in Cyclone].
18834  Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling
18835  Wang, and James Cheney.
18836  <:#PLDI:> 2002.
18837
18838 == <!Anchor(HHH)>H ==
18839
18840  * <!Anchor(HallenbergEtAl02)>
18841  http://www.itu.dk/people/tofte/publ/pldi2002.pdf[Combining Region Inference and Garbage Collection].
18842  Niels Hallenberg, Martin Elsman, and Mads Tofte.
18843  <:#PLDI:> 2002.
18844
18845  * <!Anchor(HansenRichel99)>
18846  http://www.it.dtu.dk/introSML[Introduction to Programming Using SML]
18847  (http://www3.addall.com/New/submitNew.cgi?query=0201398206&type=ISBN[addall]).
18848  ISBN 0201398206.
18849  Michael R. Hansen, Hans Rischel.
18850  Addison-Wesley, 1999.
18851
18852  * <!Anchor(Harper11)>
18853  http://www.cs.cmu.edu/~rwh/smlbook/book.pdf[Programming in Standard ML].
18854  Robert Harper.
18855
18856  * <!Anchor(HarperEtAl93)>
18857  http://www.cs.cmu.edu/~rwh/papers/callcc/jfp.pdf[Typing First-Class Continuations in ML].
18858  Robert Harper, Bruce F. Duba, and David MacQueen.
18859  <:#JFP:> 1993.
18860
18861  * <!Anchor(HarperMitchell92)>
18862  http://www.cs.cmu.edu/~rwh/papers/xml/toplas93.pdf[On the Type Structure of Standard ML].
18863  Robert Harper and John C. Mitchell.
18864  <:#TOPLAS:> 1992.
18865
18866  * <!Anchor(HauserBenson04)>
18867  http://doi.ieeecomputersociety.org/10.1109/CSD.2004.1309122[On the Practicality and Desirability of Highly-concurrent, Mostly-functional Programming].
18868  Carl H. Hauser and David B. Benson.
18869  <:#ACSD:> 2004.
18870 +
18871 ____
18872 Describes the use of <:ConcurrentML: Concurrent ML> in implementing
18873 the Ped text editor.  Argues that using large numbers of threads and
18874 message passing style is a practical and effective way of
18875 modularizing a program.
18876 ____
18877
18878  * <!Anchor(HeckmanWilhelm97)>
18879  http://rw4.cs.uni-sb.de/~heckmann/abstracts/neuform.html[A Functional Description of TeX's Formula Layout].
18880  Reinhold Heckmann and Reinhard Wilhelm.
18881  <:#JFP:> 1997.
18882
18883  * <!Anchor(HicksEtAl03)>
18884  http://wwwold.cs.umd.edu/Library/TRs/CS-TR-4514/CS-TR-4514.pdf[Safe and Flexible Memory Management in Cyclone].
18885  Mike Hicks, Greg Morrisett, Dan Grossman, and Trevor Jim.
18886  University of Maryland Technical Report CS-TR-4514, 2003.
18887
18888  * <!Anchor(Hurd04)>
18889  http://www.gilith.com/research/talks/tphols2004.pdf[Compiling HOL4 to Native Code].
18890  Joe Hurd.
18891  <:#TPHOLs:> 2004.
18892 +
18893 ____
18894 Describes a port of HOL from Moscow ML to MLton, the difficulties
18895 encountered in compiling large programs, and the speedups achieved
18896 (roughly 10x).
18897 ____
18898
18899 == <!Anchor(III)>I ==
18900
18901 {empty}
18902
18903 == <!Anchor(JJJ)>J ==
18904
18905  * <!Anchor(Jones99)>
18906  http://www.cs.kent.ac.uk/people/staff/rej/gcbook[Garbage Collection: Algorithms for Automatic Memory Management]
18907  (http://www3.addall.com/New/submitNew.cgi?query=0471941484&type=ISBN[addall]).
18908  ISBN 0471941484.
18909  Richard Jones.
18910  John Wiley & Sons, 1999.
18911
18912 == <!Anchor(KKK)>K ==
18913
18914  * <!Anchor(Kahrs93)>
18915  http://kar.kent.ac.uk/21122/[Mistakes and Ambiguities in the Definition of Standard ML].
18916  Stefan Kahrs.
18917  University of Edinburgh Technical Report ECS-LFCS-93-257, 1993.
18918 +
18919 ____
18920 Describes a number of problems with the
18921 <!Cite(MilnerEtAl90,1990 Definition)>, many of which were fixed in the
18922 <!Cite(MilnerEtAl97,1997 Definition)>.
18923
18924 Also see the http://www.cs.kent.ac.uk/~smk/errors-new.ps.Z[addenda]
18925 published in 1996.
18926 ____
18927
18928  * <!Anchor(Karvonen07)>
18929  http://dl.acm.org/citation.cfm?doid=1292535.1292547[Generics for the Working ML'er].
18930  Vesa Karvonen.
18931  <:#ML:> 2007. http://research.microsoft.com/~crusso/ml2007/slides/ml08rp-karvonen-slides.pdf[Slides] from the presentation are also available.
18932
18933  * <!Anchor(Kennedy04)>
18934  http://research.microsoft.com/~akenn/fun/picklercombinators.pdf[Pickler Combinators].
18935  Andrew Kennedy.
18936  <:#JFP:> 2004.
18937
18938  * <!Anchor(KoserEtAl03)>
18939  http://www.litech.org/~vaughan/pdf/dpcool2003.pdf[sml2java: A Source To Source Translator].
18940  Justin Koser, Haakon Larsen, Jeffrey A. Vaughan.
18941  <:#DPCOOL:> 2003.
18942
18943 == <!Anchor(LLL)>L ==
18944
18945  * <!Anchor(Lang99)>
18946  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.7130&rep=rep1&type=ps[Faster Algorithms for Finding Minimal Consistent DFAs].
18947  Kevin Lang. 1999.
18948
18949  * <!Anchor(LarsenNiss04)>
18950  http://usenix.org/publications/library/proceedings/usenix04/tech/freenix/full_papers/larsen/larsen.pdf[mGTK: An SML binding of Gtk+].
18951  Ken Larsen and Henning Niss.
18952  USENIX Annual Technical Conference, 2004.
18953
18954  * <!Anchor(Leibig13)>
18955  http://www.cs.rit.edu/~bal6053/msproject/[An LLVM Back-end for MLton].
18956  Brian Leibig.
18957  MS Project Report, 2013.
18958 +
18959 ____
18960 Describes MLton's <:LLVMCodegen:>.
18961 ____
18962
18963  * <!Anchor(Leroy90)>
18964  http://pauillac.inria.fr/~xleroy/bibrefs/Leroy-ZINC.html[The ZINC Experiment: an Economical Implementation of the ML Language].
18965  Xavier Leroy.
18966  Technical report 117, INRIA, 1990.
18967 +
18968 ____
18969 A detailed explanation of the design and implementation of a bytecode
18970 compiler and interpreter for ML with a machine model aimed at
18971 efficient implementation.
18972 ____
18973
18974  * <!Anchor(Leroy93)>
18975  http://pauillac.inria.fr/~xleroy/bibrefs/Leroy-poly-par-nom.html[Polymorphism by Name for References and Continuations].
18976  Xavier Leroy.
18977  <:#POPL:> 1993.
18978
18979  * <!Anchor(LeungGeorge99)>
18980  http://www.cs.nyu.edu/leunga/my-papers/annotations.ps[MLRISC Annotations].
18981  Allen Leung and Lal George. 1999.
18982
18983 == <!Anchor(MMM)>M ==
18984
18985  * <!Anchor(MarlowEtAl01)>
18986  http://community.haskell.org/~simonmar/papers/async.pdf[Asynchronous Exceptions in Haskell].
18987  Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy.
18988  <:#PLDI:> 2001.
18989 +
18990 ____
18991 An asynchronous exception is a signal that one thread can send to
18992 another, and is useful for the receiving thread to treat as an
18993 exception so that it can clean up locks or other state relevant to its
18994 current context.
18995 ____
18996
18997  * <!Anchor(MacQueenEtAl84)>
18998  http://homepages.inf.ed.ac.uk/gdp/publications/Ideal_model.pdf[An Ideal Model for Recursive Polymorphic Types].
18999  David MacQueen, Gordon Plotkin, Ravi Sethi.
19000  <:#POPL:> 1984.
19001
19002  * <!Anchor(Matthews91)>
19003  http://www.lfcs.inf.ed.ac.uk/reports/91/ECS-LFCS-91-174[A Distributed Concurrent Implementation of Standard ML].
19004  David Matthews.
19005  University of Edinburgh Technical Report ECS-LFCS-91-174, 1991.
19006
19007  * <!Anchor(Matthews95)>
19008  http://www.lfcs.inf.ed.ac.uk/reports/95/ECS-LFCS-95-335[Papers on Poly/ML].
19009  David C. J. Matthews.
19010  University of Edinburgh Technical Report ECS-LFCS-95-335, 1995.
19011
19012  * http://www.lfcs.inf.ed.ac.uk/reports/97/ECS-LFCS-97-375[That About Wraps it Up: Using FIX to Handle Errors Without Exceptions, and Other Programming Tricks].
19013  Bruce J. McAdam.
19014  University of Edinburgh Technical Report ECS-LFCS-97-375, 1997.
19015
19016  * <!Anchor(MeierNorgaard93)>
19017  A Just-In-Time Backend for Moscow ML 2.00 in SML.
19018  Bjarke Meier, Kristian Nørgaard.
19019  Masters Thesis, 2003.
19020 +
19021 ____
19022 A just-in-time compiler using GNU Lightning, showing a speedup of up
19023 to four times over Moscow ML's usual bytecode interpreter.
19024
19025 The full report is only available in
19026 http://www.itu.dk/stud/speciale/bmkn/fundanemt/download/report[Danish].
19027 ____
19028
19029  * <!Anchor(Milner78)>
19030  http://courses.engr.illinois.edu/cs421/sp2013/project/milner-polymorphism.pdf[A Theory of Type Polymorphism in Programming].
19031  Robin Milner.
19032  Journal of Computer and System Sciences, 1978.
19033
19034  * <!Anchor(Milner82)>
19035  http://homepages.inf.ed.ac.uk/dts/fps/papers/evolved.dvi.gz[How ML Evolved].
19036  Robin Milner.
19037  Polymorphism--The ML/LCF/Hope Newsletter, 1983.
19038
19039  * <!Anchor(MilnerTofte91)>
19040  http://www.itu.dk/people/tofte/publ/1990sml/1990sml.html[Commentary on Standard ML]
19041  (http://www3.addall.com/New/submitNew.cgi?query=0262631377&type=ISBN[addall])
19042  ISBN 0262631377.
19043  Robin Milner and Mads Tofte.
19044  The MIT Press, 1991.
19045 +
19046 ____
19047 Introduces and explains the notation and approach used in
19048 <!Cite(MilnerEtAl90,The Definition of Standard ML)>.
19049 ____
19050
19051  * <!Anchor(MilnerEtAl90)>
19052  http://www.itu.dk/people/tofte/publ/1990sml/1990sml.html[The Definition of Standard ML].
19053  (http://www3.addall.com/New/submitNew.cgi?query=0262631326&type=ISBN[addall])
19054  ISBN 0262631326.
19055  Robin Milner, Mads Tofte, and Robert Harper.
19056  The MIT Press, 1990.
19057 +
19058 ____
19059 Superseded by <!Cite(MilnerEtAl97,The Definition of Standard ML (Revised))>.
19060 Accompanied by the <!Cite(MilnerTofte91,Commentary on Standard ML)>.
19061 ____
19062
19063  * <!Anchor(MilnerEtAl97)>
19064  http://mitpress.mit.edu/books/definition-standard-ml[The Definition of Standard ML (Revised)].
19065  (http://www3.addall.com/New/submitNew.cgi?query=0262631814&type=ISBN[addall])
19066  ISBN 0262631814.
19067  Robin Milner, Mads Tofte, Robert Harper, and David MacQueen.
19068  The MIT Press, 1997.
19069 +
19070 ____
19071 A terse and formal specification of Standard ML's syntax and
19072 semantics.  Supersedes <!Cite(MilnerEtAl90,The Definition of Standard ML)>.
19073 ____
19074
19075  * <!Anchor(ML2000)>
19076  http://flint.cs.yale.edu/flint/publications/ml2000.html[Principles and a Preliminary Design for ML2000].
19077  The ML2000 working group, 1999.
19078
19079  * <!Anchor(Morentsen99)>
19080  http://daimi.au.dk/CPnets/workshop99/papers/Mortensen.pdf[Automatic Code Generation from Coloured Petri Nets for an Access Control System].
19081  Kjeld H. Mortensen.
19082  Workshop on Practical Use of Coloured Petri Nets and Design/CPN, 1999.
19083
19084  * <!Anchor(MorrisettTolmach93)>
19085  http://web.cecs.pdx.edu/~apt/ppopp93.ps[Procs and Locks: a Portable Multiprocessing Platform for Standard ML of New Jersey].
19086  J{empty}. Gregory Morrisett and Andrew Tolmach.
19087  <:#PPoPP:> 1993.
19088
19089  * <!Anchor(Murphy06)>
19090  http://www.cs.cmu.edu/~tom7/papers/grid-ml06.pdf[ML Grid Programming with ConCert].
19091  Tom Murphy VII.
19092  <:#ML:> 2006.
19093
19094 == <!Anchor(NNN)>N ==
19095
19096  * <!Anchor(Neumann99)>
19097  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.9485&rep=rep1&type=ps[fxp - Processing Structured Documents in SML].
19098  Andreas Neumann.
19099  Scottish Functional Programming Workshop, 1999.
19100 +
19101 ____
19102 Describes http://atseidl2.informatik.tu-muenchen.de/~berlea/Fxp[fxp],
19103 an XML parser implemented in Standard ML.
19104 ____
19105
19106  * <!Anchor(Neumann99Thesis)>
19107  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.8108&rep=rep1&type=ps[Parsing and Querying XML Documents in SML].
19108  Andreas Neumann.
19109  Doctoral Thesis, 1999.
19110
19111  * <!Anchor(NguyenOhori06)>
19112  http://www.pllab.riec.tohoku.ac.jp/~ohori/research/NguyenOhoriPPDP06.pdf[Compiling ML Polymorphism with Explicit Layout Bitmap].
19113  Huu-Duc Nguyen and Atsushi Ohori.
19114  <:#PPDP:> 2006.
19115
19116 == <!Anchor(OOO)>O ==
19117
19118  * <!Anchor(Okasaki99)>
19119 http://www.cambridge.org/gb/academic/subjects/computer-science/programming-languages-and-applied-logic/purely-functional-data-structures[Purely Functional Data Structures].
19120  ISBN 9780521663502.
19121  Chris Okasaki.
19122  Cambridge University Press, 1999.
19123
19124  * <!Anchor(Ohori89)>
19125  http://www.pllab.riec.tohoku.ac.jp/~ohori/research/fpca89.pdf[A Simple Semantics for ML Polymorphism].
19126  Atsushi Ohori.
19127  <:#FPCA:> 1989.
19128
19129  * <!Anchor(Ohori95)>
19130  http://www.pllab.riec.tohoku.ac.jp/~ohori/research/toplas95.pdf[A Polymorphic Record Calculus and Its Compilation].
19131  Atsushi Ohori.
19132  <:#TOPLAS:> 1995.
19133
19134  * <!Anchor(OhoriTakamizawa97)>
19135  http://www.pllab.riec.tohoku.ac.jp/~ohori/research/jlsc97.pdf[An Unboxed Operational Semantics for ML Polymorphism].
19136  Atsushi Ohori and Tomonobu Takamizawa.
19137  <:#LASC:> 1997.
19138
19139  * <!Anchor(Ohori99)>
19140  http://www.pllab.riec.tohoku.ac.jp/~ohori/research/ic98.pdf[Type-Directed Specialization of Polymorphism].
19141  Atsushi Ohori.
19142  <:#IC:> 1999.
19143
19144  * <!Anchor(OwensEtAl09)>
19145  http://www.mpi-sws.org/~turon/re-deriv.pdf[Regular-expression derivatives reexamined].
19146  Scott Owens, John Reppy, and Aaron Turon.
19147  <:#JFP:> 2009.
19148
19149 == <!Anchor(PPP)>P ==
19150
19151  * <!Anchor(Paulson96)>
19152  http://www.cambridge.org/co/academic/subjects/computer-science/programming-languages-and-applied-logic/ml-working-programmer-2nd-edition[ML For the Working Programmer]
19153  (http://www3.addall.com/New/submitNew.cgi?query=052156543X&type=ISBN[addall])
19154  ISBN 052156543X.
19155  Larry C. Paulson.
19156  Cambridge University Press, 1996.
19157
19158  * <!Anchor(PetterssonEtAl02)>
19159  http://user.it.uu.se/~kostis/Papers/flops02_22.ps.gz[The HiPE/x86 Erlang Compiler: System Description and Performance Evaluation].
19160  Mikael Pettersson, Konstantinos Sagonas, and Erik Johansson.
19161  <:#FLOPS:> 2002.
19162 +
19163 ____
19164 Describes a native x86 Erlang compiler and a comparison of many
19165 different native x86 compilers (including MLton) and their register
19166 usage and call stack implementations.
19167 ____
19168
19169  * <!Anchor(Price09)>
 http://rogerprice.org/#UG[User's Guide to ML-Lex and ML-Yacc].
19171  Roger Price.  2009.
19172
19173  * <!Anchor(Pucella98)>
19174  http://arxiv.org/abs/cs.PL/0405080[Reactive Programming in Standard ML].
 Riccardo R. Pucella.
 <:#ICCL:> 1998.
19177
19178 == <!Anchor(QQQ)>Q ==
19179
19180 {empty}
19181
19182 == <!Anchor(RRR)>R ==
19183
19184  * <!Anchor(Ramsey90)>
19185  https://www.cs.princeton.edu/research/techreps/TR-262-90[Concurrent Programming in ML].
19186  Norman Ramsey.
19187  Princeton University Technical Report CS-TR-262-90, 1990.
19188
19189  * <!Anchor(Ramsey11)>
19190  http://www.cs.tufts.edu/~nr/pubs/embedj-abstract.html[Embedding an Interpreted Language Using Higher-Order Functions and Types].
19191  Norman Ramsey.
19192  <:#JFP:> 2011.
19193
19194  * <!Anchor(RamseyFisherGovereau05)>
19195  http://www.cs.tufts.edu/~nr/pubs/els-abstract.html[An Expressive Language of Signatures].
19196  Norman Ramsey, Kathleen Fisher, and Paul Govereau.
19197  <:#ICFP:> 2005.
19198
19199  * <!Anchor(RedwineRamsey04)>
19200  http://www.cs.tufts.edu/~nr/pubs/widen-abstract.html[Widening Integer Arithmetic].
19201  Kevin Redwine and Norman Ramsey.
19202  <:#CC:> 2004.
19203 +
19204 ____
19205 Describes a method to implement numeric types and operations (like
19206 `Int31` or `Word17`) for sizes smaller than that provided by the
19207 processor.
19208 ____
19209
19210  * <!Anchor(Reppy88)>
19211  Synchronous Operations as First-Class Values.
19212  John Reppy.
19213  <:#PLDI:> 1988.
19214
19215  * <!Anchor(Reppy07)>
19216  http://www.cambridge.org/co/academic/subjects/computer-science/distributed-networked-and-mobile-computing/concurrent-programming-ml[Concurrent Programming in ML]
19217  (http://www3.addall.com/New/submitNew.cgi?query=9780521714723&type=ISBN[addall]).
19218  ISBN 9780521714723.
19219  John Reppy.
19220  Cambridge University Press, 2007.
19221 +
19222 ____
19223 Describes <:ConcurrentML:>.
19224 ____
19225
19226  * <!Anchor(Reynolds98)>
19227  https://users-cs.au.dk/hosc/local/HOSC-11-4-pp355-361.pdf[Definitional Interpreters Revisited].
19228  John C. Reynolds.
19229  <:#HOSC:> 1998.
19230
19231  * <!Anchor(Reynolds98_2)>
 https://users-cs.au.dk/hosc/local/HOSC-11-4-pp363-397.pdf[Definitional Interpreters for Higher-Order Programming Languages].
19233  John C. Reynolds.
19234  <:#HOSC:> 1998.
19235
19236  * <!Anchor(Rossberg01)>
19237  http://www.mpi-sws.org/~rossberg/papers/Rossberg%20-%20Defects%20in%20the%20Revised%20Definition%20of%20Standard%20ML%20%5B2007-01-22%20Update%5D.pdf[Defects in the Revised Definition of Standard ML].
19238  Andreas Rossberg. 2001.
19239
19240 == <!Anchor(SSS)>S ==
19241
19242  * <!Anchor(Sansom91)>
19243  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.1020&rep=rep1&type=ps[Dual-Mode Garbage Collection].
19244  Patrick M. Sansom.
19245  Workshop on the Parallel Implementation of Functional Languages, 1991.
19246
19247  * <!Anchor(ScottRamsey00)>
19248  http://www.cs.tufts.edu/~nr/pubs/match-abstract.html[When Do Match-Compilation Heuristics Matter].
19249  Kevin Scott and Norman Ramsey.
19250  University of Virginia Technical Report CS-2000-13, 2000.
19251 +
19252 ____
19253 Modified SML/NJ to experimentally compare a number of
19254 match-compilation heuristics and showed that choice of heuristic
19255 usually does not significantly affect code size or run time.
19256 ____
19257
19258  * <!Anchor(Sestoft96)>
19259  http://www.itu.dk/~sestoft/papers/match.ps.gz[ML Pattern Match Compilation and Partial Evaluation].
19260  Peter Sestoft.
19261  Partial Evaluation, 1996.
19262 +
19263 ____
19264 Describes the derivation of the match compiler used in
19265 <:MoscowML:Moscow ML>.
19266 ____
19267
19268  * <!Anchor(ShaoAppel94)>
19269  http://flint.cs.yale.edu/flint/publications/closure.html[Space-Efficient Closure Representations].
19270  Zhong Shao and Andrew W. Appel.
19271  <:#LFP:> 1994.
19272
19273  * <!Anchor(Shipman02)>
19274  <!Attachment(References,Shipman02.pdf,Unix System Programming with Standard ML)>.
19275  Anthony L. Shipman.
19276  2002.
19277 +
19278 ____
19279 Includes a description of the <:Swerve:> HTTP server written in SML.
19280 ____
19281
19282  * <!Anchor(Signoles03)>
19283  Calcul Statique des Applications de Modules Parametres.
19284  Julien Signoles.
19285  <:#JFLA:> 2003.
19286 +
19287 ____
19288 Describes a http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=382[defunctorizer]
19289 for OCaml, and compares it to existing defunctorizers, including MLton.
19290 ____
19291
19292  * <!Anchor(SittampalamEtAl04)>
19293  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.1349&rep=rep1&type=ps[Incremental Execution of Transformation Specifications].
19294  Ganesh Sittampalam, Oege de Moor, and Ken Friis Larsen.
19295  <:#POPL:> 2004.
19296 +
19297 ____
19298 Mentions a port from Moscow ML to MLton of
19299 http://www.itu.dk/research/muddy/[MuDDY], an SML wrapper around the
19300 http://sourceforge.net/projects/buddy[BuDDY] BDD package.
19301 ____
19302
19303  * <!Anchor(SwaseyEtAl06)>
19304  http://www.cs.cmu.edu/~tom7/papers/smlsc2-ml06.pdf[A Separate Compilation Extension to Standard ML].
19305  David Swasey, Tom Murphy VII, Karl Crary and Robert Harper.
19306  <:#ML:> 2006.
19307
19308 == <!Anchor(TTT)>T ==
19309
19310  * <!Anchor(TarditiAppel00)>
 http://www.smlnj.org/doc/ML-Yacc/index.html[ML-Yacc User's Manual. Version 2.4].
19312  David R. Tarditi and Andrew W. Appel. 2000.
19313
19314  * <!Anchor(TarditiEtAl90)>
19315  http://research.microsoft.com/pubs/68738/loplas-sml2c.ps[No Assembly Required: Compiling Standard ML to C].
19316  David Tarditi, Peter Lee, and Anurag Acharya. 1990.
19317
19318  * <!Anchor(ThorupTofte94)>
19319  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.5372&rep=rep1&type=ps[Object-oriented programming and Standard ML].
19320  Lars Thorup and Mads Tofte.
19321  <:#ML:>, 1994.
19322
19323  * <!Anchor(Tofte90)>
19324  Type Inference for Polymorphic References.
19325  Mads Tofte.
19326  <:#IC:> 1990.
19327
19328  * <!Anchor(Tofte96)>
19329  http://www.itu.dk/courses/FDP/E2004/Tofte-1996-Essentials_of_SML_Modules.pdf[Essentials of Standard ML Modules].
19330  Mads Tofte.
19331
19332  * <!Anchor(Tofte09)>
19333  http://www.itu.dk/people/tofte/publ/tips.pdf[Tips for Computer Scientists on Standard ML (Revised)].
19334  Mads Tofte.
19335
19336  * <!Anchor(TolmachAppel95)>
19337  http://web.cecs.pdx.edu/~apt/jfp95.ps[A Debugger for Standard ML].
19338  Andrew Tolmach and Andrew W. Appel.
19339  <:#JFP:> 1995.
19340
19341  * <!Anchor(Tolmach97)>
19342  http://web.cecs.pdx.edu/~apt/tic97.ps[Combining Closure Conversion with Closure Analysis using Algebraic Types].
19343  Andrew Tolmach.
19344  <:#TIC:> 1997.
19345 +
19346 ____
19347 Describes a closure-conversion algorithm for a monomorphic IL.  The
19348 algorithm uses a unification-based flow analysis followed by
19349 defunctionalization and is similar to the approach used in MLton
19350 (<!Cite(CejtinEtAl00)>).
19351 ____
19352
19353  * <!Anchor(TolmachOliva98)>
19354  http://web.cecs.pdx.edu/~apt/jfp98.ps[From ML to Ada: Strongly-typed Language Interoperability via Source Translation].
19355  Andrew Tolmach and Dino Oliva.
19356  <:#JFP:> 1998.
19357 +
19358 ____
19359 Describes a compiler for RML, a core SML-like language.  The compiler
19360 is similar in structure to MLton, using monomorphisation,
19361 defunctionalization, and optimization on a first-order IL.
19362 ____
19363
19364 == <!Anchor(UUU)>U ==
19365
19366  * <!Anchor(Ullman98)>
19367  http://www-db.stanford.edu/~ullman/emlp.html[Elements of ML Programming]
19368  (http://www3.addall.com/New/submitNew.cgi?query=0137903871&type=ISBN[addall]).
19369  ISBN 0137903871.
19370  Jeffrey D. Ullman.
19371  Prentice-Hall, 1998.
19372
19373 == <!Anchor(VVV)>V ==
19374
19375 {empty}
19376
19377 == <!Anchor(WWW)>W ==
19378
19379  * <!Anchor(Wand84)>
19380  http://portal.acm.org/citation.cfm?id=800527[A Types-as-Sets Semantics for Milner-Style Polymorphism].
19381  Mitchell Wand.
19382  <:#POPL:> 1984.
19383
19384  * <!Anchor(Wang01)>
19385  http://ncstrl.cs.princeton.edu/expand.php?id=TR-640-01[Managing Memory with Types].
19386  Daniel C. Wang.
19387  PhD Thesis.
19388 +
19389 ____
19390 Chapter 6 describes an implementation of a type-preserving garbage
19391 collector for MLton.
19392 ____
19393
19394  * <!Anchor(WangAppel01)>
19395  http://www.cs.princeton.edu/~appel/papers/typegc.pdf[Type-Preserving Garbage Collectors].
19396  Daniel C. Wang and Andrew W. Appel.
19397  <:#POPL:> 2001.
19398 +
19399 ____
19400 Shows how to modify MLton to generate a strongly-typed garbage
19401 collector as part of a program.
19402 ____
19403
19404  * <!Anchor(WangMurphy02)>
19405  http://www.cs.cmu.edu/~tom7/papers/wang-murphy-recursion.pdf[Programming With Recursion Schemes].
19406  Daniel C. Wang and Tom Murphy VII.
19407 +
19408 ____
19409 Describes a programming technique for data abstraction, along with
19410 benchmarks of MLton and other SML compilers.
19411 ____
19412
19413  * <!Anchor(Weeks06)>
19414  <!Attachment(References,060916-mlton.pdf,Whole-Program Compilation in MLton)>.
19415  Stephen Weeks.
19416  <:#ML:> 2006.
19417
19418  * <!Anchor(Wright95)>
19419  http://homepages.inf.ed.ac.uk/dts/fps/papers/wright.ps.gz[Simple Imperative Polymorphism].
19420  Andrew Wright.
19421  <:#LASC:>, 8(4):343-355, 1995.
19422 +
19423 ____
19424 The origin of the <:ValueRestriction:>.
19425 ____
19426
19427 == <!Anchor(XXX)>X ==
19428
19429 {empty}
19430
19431 == <!Anchor(YYY)>Y ==
19432
19433  * <!Anchor(Yang98)>
19434  http://cs.nyu.edu/zheyang/papers/YangZ\--ICFP98.html[Encoding Types in ML-like Languages].
19435  Zhe Yang.
19436  <:#ICFP:> 1998.
19437
19438 == <!Anchor(ZZZ)>Z ==
19439
19440  * <!Anchor(ZiarekEtAl06)>
19441  http://www.cs.purdue.edu/homes/lziarek/icfp06.pdf[Stabilizers: A Modular Checkpointing Abstraction for Concurrent Functional Programs].
19442  Lukasz Ziarek, Philip Schatz, and Suresh Jagannathan.
19443  <:#ICFP:> 2006.
19444
19445  * <!Anchor(ZiarekEtAl08)>
19446  http://www.cse.buffalo.edu/~lziarek/hosc.pdf[Flattening tuples in an SSA intermediate representation].
19447  Lukasz Ziarek, Stephen Weeks, and Suresh Jagannathan.
19448  <:#HOSC:> 2008.
19449
19450
19451 == Abbreviations ==
19452
19453 * <!Anchor(ACSD)> ACSD = International Conference on Application of Concurrency to System Design
19454 * <!Anchor(BABEL)> BABEL = Workshop on multi-language infrastructure and interoperability
19455 * <!Anchor(CC)> CC = International Conference on Compiler Construction
19456 * <!Anchor(DPCOOL)> DPCOOL = Workshop on Declarative Programming in the Context of OO Languages
19457 * <!Anchor(ESOP)> ESOP = European Symposium on Programming
19458 * <!Anchor(FLOPS)> FLOPS = Symposium on Functional and Logic Programming
19459 * <!Anchor(FPCA)> FPCA = Conference on Functional Programming Languages and Computer Architecture
19460 * <!Anchor(HOSC)> HOSC = Higher-Order and Symbolic Computation
19461 * <!Anchor(IC)> IC = Information and Computation
19462 * <!Anchor(ICCL)> ICCL = IEEE International Conference on Computer Languages
19463 * <!Anchor(ICFP)> ICFP = International Conference on Functional Programming
19464 * <!Anchor(IFL)> IFL = International Workshop on Implementation and Application of Functional Languages
19465 * <!Anchor(IVME)> IVME = Workshop on Interpreters, Virtual Machines and Emulators
19466 * <!Anchor(JFLA)> JFLA = Journees Francophones des Langages Applicatifs
19467 * <!Anchor(JFP)> JFP = Journal of Functional Programming
19468 * <!Anchor(LASC)> LASC = Lisp and Symbolic Computation
19469 * <!Anchor(LFP)> LFP = Lisp and Functional Programming
19470 * <!Anchor(ML)> ML = Workshop on ML
19471 * <!Anchor(PLDI)> PLDI = Conference on Programming Language Design and Implementation
19472 * <!Anchor(POPL)> POPL = Symposium on Principles of Programming Languages
19473 * <!Anchor(PPDP)> PPDP = International Conference on Principles and Practice of Declarative Programming
19474 * <!Anchor(PPoPP)> PPoPP = Principles and Practice of Parallel Programming
19475 * <!Anchor(TCS)> TCS = IFIP International Conference on Theoretical Computer Science
19476 * <!Anchor(TIC)> TIC = Types in Compilation
19477 * <!Anchor(TLDI)> TLDI = Workshop on Types in Language Design and Implementation
19478 * <!Anchor(TOPLAS)> TOPLAS = Transactions on Programming Languages and Systems
19479 * <!Anchor(TPHOLs)> TPHOLs = International Conference on Theorem Proving in Higher Order Logics
19480
19481 <<<
19482
19483 :mlton-guide-page: RefFlatten
19484 [[RefFlatten]]
19485 RefFlatten
19486 ==========
19487
19488 <:RefFlatten:> is an optimization pass for the <:SSA2:>
19489 <:IntermediateLanguage:>, invoked from <:SSA2Simplify:>.
19490
19491 == Description ==
19492
19493 This pass flattens a `ref` cell into its containing object.
19494 The idea is to replace, where possible, a type like
19495 ----
19496 (int ref * real)
19497 ----
19498
19499 with a type like
19500 ----
19501 (int[m] * real)
19502 ----
19503
19504 where the `[m]` indicates a mutable field of a tuple.
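
As a source-level illustration, the following Standard ML sketch builds
a tuple of the shape `int ref * real`.  The names `counter`,
`mkCounter`, `bump`, and `count` are made up for this example, and
whether MLton actually flattens a particular `ref` depends on the
analysis described under Details and Notes.  Conceptually, flattening
lets the counter live in the tuple itself instead of in a separate
heap object.

[source,sml]
----
(* Hypothetical example: a counter paired with a weight. *)
type counter = int ref * real

(* The ref and the tuple are allocated together, in one expression. *)
fun mkCounter (w: real) : counter = (ref 0, w)

(* Reads and writes go through the ref; after flattening they would
   access a mutable field of the tuple directly. *)
fun bump ((r, _): counter) : unit = r := !r + 1
fun count ((r, _): counter) : int = !r

val c = mkCounter 1.5
val () = bump c
val () = bump c
val () = print (Int.toString (count c) ^ "\n")
----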
19505
19506 == Implementation ==
19507
19508 * <!ViewGitFile(mlton,master,mlton/ssa/ref-flatten.fun)>
19509
19510 == Details and Notes ==
19511
19512 The savings are obvious, I hope.  We avoid an extra heap-allocated
19513 object for the `ref`, which in the above case saves two words.  We
19514 also save the time and code for the extra indirection at each get and
19515 set.  There are lots of useful data structures (singly-linked and
19516 doubly-linked lists, union-find, Fibonacci heaps, ...) for which I
19517 believe we are paying through the nose right now because of the
19518 absence of ref flattening.
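
As a concrete illustration of the kind of data structure meant here, the following is a minimal union-find sketch in which every node carries its `ref` cells directly inside a record.  The names `node`, `new`, `find`, `union`, and `sameNode` are made up for this example and do not come from the MLton sources; whether the pass actually flattens these particular cells depends on the analysis described on this page.

[source,sml]
----
(* Hypothetical example: each node holds two ref cells inside its record.
 * Without flattening, `parent` and `rank` are separate heap objects that
 * must be reached through an extra indirection on every get and set. *)
datatype node = Node of {id: int, parent: node option ref, rank: int ref}

fun new id = Node {id = id, parent = ref NONE, rank = ref 0}

(* Find the representative of a node, compressing the path on the way. *)
fun find (n as Node {parent, ...}) =
   case !parent of
      NONE => n
    | SOME p =>
         let
            val root = find p
         in
            parent := SOME root;
            root
         end

fun sameNode (Node {id = a, ...}, Node {id = b, ...}) = a = b

(* Union by rank: attach the shallower tree below the deeper one. *)
fun union (x, y) =
   let
      val rx as Node {parent = parentX, rank = rankX, ...} = find x
      val ry as Node {parent = parentY, rank = rankY, ...} = find y
   in
      if sameNode (rx, ry) then ()
      else if !rankX < !rankY then parentX := SOME ry
      else if !rankX > !rankY then parentY := SOME rx
      else (parentY := SOME rx; rankX := !rankX + 1)
   end

val a = new 1
val b = new 2
val c = new 3
val () = union (a, b)
val () = print (Bool.toString (sameNode (find a, find b)) ^ "\n")
val () = print (Bool.toString (sameNode (find a, find c)) ^ "\n")
----

Every call to `find` and `union` reads or writes a `ref`, so each avoided indirection is paid back many times over in a structure like this.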
19519
19520 The idea is to compute for each occurrence of a `ref` type in the
19521 program whether or not that `ref` can be represented as an offset of
19522 some object (constructor or tuple).  As before, a unification-based
19523 whole-program analysis with deep abstract values makes sure the
19524 results are consistent.
19525
19526 The only syntactic part of the analysis that remains is the part that
19527 checks that for a variable bound to a value constructed by `Ref_ref`:
19528
19529 * the object allocation is in the same block.  This is pretty
19530 draconian, and it would be nice to generalize it some day to allow
19531 flattening as long as the `ref` allocation and object allocation "line
19532 up one-to-one" in the same loop-free chunk of code.
19533
19534 * updates occur in the same block (and hence it is safe-for-space
19535 because the containing object is still alive).  It would be nice to
19536 relax this to allow updates as long as it can be proved that the
19537 container is live.
19538
19539 Prevent flattening of `unit ref`-s.
19540
19541 <:RefFlatten:> is safe for space.  The idea is to prevent a `ref`
19542 being flattened into an object that has a component of unbounded size
19543 (other than possibly the `ref` itself) unless we can prove that, at
19544 each point where the `ref` is live, the containing object is live too.
19545 I used a pretty simple approximation to liveness.
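
The space-safety concern can be made concrete with a small sketch.  The function `mkCounter` below is a hypothetical example, not code from MLton; it builds a pair whose first component has unbounded size and then lets only the `ref` escape.

[source,sml]
----
(* Hypothetical example of a ref that outlives its container. *)
fun mkCounter (n: int): int ref =
   let
      (* A component of unbounded size: its length depends on n. *)
      val big = Vector.tabulate (n, fn i => i)
      val count = ref 0
      val pair = (big, count)
      val () = count := Vector.length (#1 pair)
   in
      (* Only the ref escapes; `pair` and `big` are dead from here on. *)
      count
   end

val c = mkCounter 1000000
val () = print (Int.toString (!c) ^ "\n")
----

If `count` were represented as an offset into `pair`, holding on to `c` would keep the whole pair, and with it the million-element vector, reachable.  With a separately allocated `ref`, only the cell itself stays live.  This is the situation the liveness condition above is meant to rule out.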
19546
19547 <<<
19548
19549 :mlton-guide-page: Regions
19550 [[Regions]]
19551 Regions
19552 =======
19553
19554 In region-based memory management, the heap is divided into a
19555 collection of regions into which objects are allocated.  At compile
19556 time, either in the source program or through automatic inference,
19557 allocation points are annotated with the region in which the
19558 allocation will occur.  Typically, although not always, the regions
19559 are allocated and deallocated according to a stack discipline.
19560
19561 MLton does not use region-based memory management; it uses traditional
19562 <:GarbageCollection:>.  We have considered integrating regions with
19563 MLton, but in our opinion it is far from clear that regions would
19564 provide MLton with improved performance, while they would certainly
19565 add a lot of complexity to the compiler and complicate reasoning about
19566 and achieving <:SpaceSafety:>.  Region-based memory management and
19567 garbage collection have different strengths and weaknesses; it's
19568 pretty easy to come up with programs that do significantly better
19569 under regions than under GC, and vice versa.  We believe that
19570 common SML idioms tend to work better under GC than under
19571 regions.
19572
19573 One common argument for regions is that the region operations can all
19574 be done in (approximately) constant time; therefore, you eliminate GC
19575 pause times, leading to a real-time GC.  However, because of space
19576 safety concerns (see below), we believe that region-based memory
19577 management for SML must also include a traditional garbage collector.
19578 Hence, to achieve real-time memory management for MLton/SML, we
19579 believe that it would be both easier and more efficient to implement a
19580 traditional real-time garbage collector than it would be to implement
19581 a region system.
19582
19583 == Regions, the ML Kit, and space safety ==
19584
19585 The <:MLKit:ML Kit> pioneered the use of regions for compiling
19586 Standard ML.  The ML Kit maintains a stack of regions at run time.  At
19587 compile time, it uses region inference to decide when data can be
19588 allocated in a stack-like manner, assigning it to an appropriate
19589 region.  The ML Kit has put a lot of effort into improving the
19590 supporting analyses and representations of regions, which are all
19591 necessary to improve the performance.
19592
19593 Unfortunately, under a pure stack-based region system, space leaks are
19594 inevitable in theory, and costly in practice.  Data for which region
19595 inference cannot determine the lifetime is moved into the "global
19596 region" whose lifetime is the entire program.  There are two ways in
19597 which region inference will place an object in the global region.
19598
19599 * When the inference is too conservative, that is, when the data is
19600 used in a stack-like manner but the region inference can't figure it
19601 out.
19602
19603 * When data is not used in a stack-like manner.  In this case,
19604 correctness requires region inference to place the object in the global region.
19605
19606 This global region is a source of space leaks.  No matter what region
19607 system you use, there are some programs such that the global region
19608 must exist, and its size will grow to an unbounded multiple of the
19609 live data size.  For these programs one must have a GC to achieve
19610 space safety.
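
A minimal sketch of data that is not used in a stack-like manner is a sliding window, where the oldest element dies first.  The functions `step` and `run` and the window size of 10 are assumptions made for this illustration; how a particular region inference would treat this program is not taken from the ML Kit papers.

[source,sml]
----
(* Hypothetical example: records are created in increasing stamp order
 * and discarded oldest-first (FIFO), the opposite of a stack discipline. *)
type entry = {stamp: int, payload: string}

(* Add a fresh entry and keep at most the 9 most recent older ones. *)
fun step (window: entry list, i: int): entry list =
   let
      val fresh = {stamp = i, payload = Int.toString i}
      val kept = List.take (window, Int.min (length window, 9))
   in
      fresh :: kept
   end

fun run (n: int): entry list =
   let
      fun loop (i, window) =
         if i >= n then window else loop (i + 1, step (window, i))
   in
      loop (0, [])
   end

val final = run 25
val () = print (Int.toString (length final) ^ "\n")
val () = print (Int.toString (#stamp (hd final)) ^ "\n")
----

At most 10 entries are live at any time, yet `run n` allocates `n` of them.  A region that can only be released as a whole cannot drop the oldest entry while newer ones are still in use, so without a collector the space held would grow with `n` rather than with the live data.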
19611
19612 To solve this problem, the ML Kit has undergone work to combine
19613 garbage collection with region-based memory management.
19614 <!Cite(HallenbergEtAl02)> and <!Cite(Elsman03)> describe the addition
19615 of a garbage collector to the ML Kit's region-based system.  These
19616 papers provide convincing evidence for space leaks in the global
19617 region.  They show a number of benchmarks where the memory usage of
19618 the program running with just regions is a large multiple (2, 10, 50,
19619 even 150) of the program running with regions plus GC.
19620
19621 These papers also give some numbers to show the ML Kit with just
19622 regions does better than either a system with just GC or a combined
19623 system.  Unfortunately, a pure region system isn't practical because
19624 of the lack of space safety.  And the other performance numbers are
19625 not so convincing, because they compare to an old version of SML/NJ
19626 and not at all with MLton.  It would be interesting to see a
19627 comparison with a more serious collector.
19628
19629 == Regions, Garbage Collection, and Cyclone ==
19630
19631 One possibility is to take Cyclone's approach, and provide both
19632 region-based memory management and garbage collection, but at the
19633 programmer's option (<!Cite(GrossmanEtAl02)>, <!Cite(HicksEtAl03)>).
19634
19635 One might ask whether we might do the same thing -- i.e., provide a
19636 `MLton.Regions` structure with explicit region based memory
19637 management operations, so that the programmer could use them when
19638 appropriate.  <:MatthewFluet:> has thought about this question:
19639
19640 * http://www.cs.cornell.edu/People/fluet/rgn-monad/index.html
19641
19642 Unfortunately, his conclusion is that the SML type system is too weak
19643 to support this option, although there might be a "poor-man's" version
19644 with dynamic checks.
19645
19646 <<<
19647
19648 :mlton-guide-page: Release20041109
19649 [[Release20041109]]
19650 Release20041109
19651 ===============
19652
19653 This is an archived public release of MLton, version 20041109.
19654
19655 == Changes since the last public release ==
19656
19657 * New platforms:
19658 ** x86: FreeBSD 5.x, OpenBSD
19659 ** PowerPC: Darwin (MacOSX)
19660 * Support for the <:MLBasis: ML Basis system>, a new mechanism supporting programming in the very large, separate delivery of library sources, and more.
19661 * Support for dynamic libraries.
19662 * Support for <:ConcurrentML:> (CML).
19663 * New structures: `Int2`, `Int3`, ..., `Int31` and `Word2`, `Word3`, ..., `Word31`.
19664 * Front-end bug fixes and improvements.
19665 * A new form of profiling with ++-profile count++, which can be used to test code coverage.
19666 * A bytecode generator, available via ++-codegen bytecode++.
19667 * Representation improvements:
19668 ** Tuples and datatypes are packed to decrease space usage.
19669 ** Ref cells may be unboxed into their containing object.
19670 ** Arrays of tuples may represent the tuples unboxed.
19671
19672 For a complete list of changes and bug fixes since 20040227, see the
19673 <!RawGitFile(mlton,on-20041109-release,doc/changelog)>.
19674
19675 == Also see ==
19676
19677 * <:Bugs20041109:>
19678
19679 <<<
19680
19681 :mlton-guide-page: Release20051202
19682 [[Release20051202]]
19683 Release20051202
19684 ===============
19685
19686 This is an archived public release of MLton, version 20051202.
19687
19688 == Changes since the last public release ==
19689
19690 * The <:License:MLton license> is now BSD-style instead of the GPL.
19691 * New platforms: <:RunningOnMinGW:X86/MinGW> and HPPA/Linux.
19692 * Improved and expanded documentation, based on the MLton wiki.
19693 * Compiler.
19694 ** improved exception history.
19695 ** <:CompileTimeOptions:Command-line switches>.
19696 *** Added: ++-as-opt++, ++-mlb-path-map++, ++-target-as-opt++, ++-target-cc-opt++.
19697 *** Removed: ++-native++, ++-sequence-unit++, ++-warn-match++, ++-warn-unused++.
19698 * Language.
19699 ** <:ForeignFunctionInterface:FFI> syntax changes and extensions.
19700 *** Added: `_symbol`.
19701 *** Changed: `_export`, `_import`.
19702 *** Removed: `_ffi`.
19703 ** <:MLBasisAnnotations:ML Basis annotations>.
19704 *** Added: `allowFFI`, `nonexhaustiveExnMatch`, `nonexhaustiveMatch`, `redundantMatch`, `sequenceNonUnit`.
19705 *** Deprecated: `allowExport`, `allowImport`, `sequenceUnit`, `warnMatch`.
19706 * Libraries.
19707 ** Basis Library.
19708 *** Added: `Int1`, `Word1`.
19709 ** <:MLtonStructure:MLton structure>.
19710 *** Added: `Process.create`, `ProcEnv.setgroups`, `Rusage.measureGC`, `Socket.fdToSock`, `Socket.Ctl.getError`.
19711 *** Changed: `MLton.Platform.Arch`.
19712 ** Other libraries.
19713 *** Added: <:CKitLibrary:ckit>, <:MLNLFFI:ML-NLFFI library>, <:SMLNJLibrary:SML/NJ library>.
19714 * Tools.
19715 ** Updates of `mllex` and `mlyacc` from SML/NJ.
19716 ** Added <:MLNLFFI:mlnlffigen>.
19717 ** <:Profiling:> supports better inclusion/exclusion of code.
19718
19719 For a complete list of changes and bug fixes since
19720 <:Release20041109:>, see the
19721 <!RawGitFile(mlton,on-20051202-release,doc/changelog)> and
19722 <:Bugs20041109:>.
19723
19724 == 20051202 binary packages ==
19725
19726 * x86
19727 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-cygwin.tgz[Cygwin] 1.5.18-1
19728 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-freebsd.tbz[FreeBSD] 5.4
19729 ** Linux
19730 *** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton_20051202-1_i386.deb[Debian] sid
19731 *** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton_20051202-1_i386.stable.deb[Debian] stable (Sarge)
19732 *** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386.rpm[RedHat] 7.1-9.3 FC1-FC4
19733 *** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-linux.tgz[tgz] for other distributions (glibc 2.3)
19734 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-mingw.tgz[MinGW]
19735 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-netbsd.tgz[NetBSD] 2.0.2
19736 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.i386-openbsd.tgz[OpenBSD] 3.7
19737 * PowerPC
19738 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.powerpc-darwin.tgz[Darwin] 7.9.0 (Mac OS X)
19739 * Sparc
19740 ** http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.sparc-solaris.tgz[Solaris] 8
19741
19742 == 20051202 source packages ==
19743
19744 * http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.src.tgz[source tgz]
19745 * Debian http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton_20051202-1.dsc[dsc], http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton_20051202-1.diff.gz[diff.gz], http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton_20051202.orig.tar.gz[orig.tar.gz]
19746 * RedHat http://sourceforge.net/projects/mlton/files/mlton/20051202/mlton-20051202-1.src.rpm[source rpm]
19747
19748 == Packages available at other sites ==
19749
19750 * http://packages.debian.org/cgi-bin/search_packages.pl?searchon=names&version=all&exact=1&keywords=mlton[Debian]
19751 * http://www.freebsd.org/cgi/ports.cgi?query=mlton&stype=all[FreeBSD]
19752 * Fedora Core http://fedoraproject.org/extras/4/i386/repodata/repoview/mlton-0-20051202-8.fc4.html[4] http://fedoraproject.org/extras/5/i386/repodata/repoview/mlton-0-20051202-8.fc5.html[5]
19753 * http://packages.ubuntu.com/dapper/devel/mlton[Ubuntu]
19754
19755 == Also see ==
19756
19757 * <:Bugs20051202:>
19758 * http://www.mlton.org/guide/20051202/[MLton Guide (20051202)].
19759 +
19760 A snapshot of the MLton wiki at the time of release.
19761
19762 <<<
19763
19764 :mlton-guide-page: Release20070826
19765 [[Release20070826]]
19766 Release20070826
19767 ===============
19768
19769 This is an archived public release of MLton, version 20070826.
19770
19771 == Changes since the last public release ==
19772
19773 * New platforms:
19774 ** <:RunningOnAMD64:AMD64>/<:RunningOnLinux:Linux>, <:RunningOnAMD64:AMD64>/<:RunningOnFreeBSD:FreeBSD>
19775 ** <:RunningOnHPPA:HPPA>/<:RunningOnHPUX:HPUX>
19776 ** <:RunningOnPowerPC:PowerPC>/<:RunningOnAIX:AIX>
19777 ** <:RunningOnX86:X86>/<:RunningOnDarwin:Darwin (Mac OS X)>
19778 * Compiler.
19779 ** Support for 64-bit platforms.
19780 *** Native amd64 codegen.
19781 ** <:CompileTimeOptions:Compile-time options>.
19782 *** Added: ++-codegen amd64++, ++-codegen x86++, ++-default-type __type__++, ++-profile-val {false|true}++.
19783 *** Changed: ++-stop f++ (file listing now includes `.mlb` files).
19784 ** Bytecode codegen.
19785 *** Support for exception history.
19786 *** Support for profiling.
19787 * Language.
19788 ** <:MLBasisAnnotations:ML Basis annotations>.
19789 *** Removed: `allowExport`, `allowImport`, `sequenceUnit`, `warnMatch`.
19790 * Libraries.
19791 ** <:BasisLibrary:Basis Library>.
19792 *** Added: `PackWord16Big`, `PackWord16Little`, `PackWord64Big`, `PackWord64Little`.
19793 *** Bug fixes: see <!RawGitFile(mlton,on-20070826-release,doc/changelog)>.
19794 ** <:MLtonStructure:MLton structure>.
19795 *** Added: `MLTON_MONO_ARRAY`, `MLTON_MONO_VECTOR`, `MLTON_REAL`, `MLton.BinIO.tempPrefix`, `MLton.CharArray`, `MLton.CharVector`, `MLton.Exn.defaultTopLevelHandler`, `MLton.Exn.getTopLevelHandler`, `MLton.Exn.setTopLevelHandler`, `MLton.IntInf.BigWord`, `MLton.IntInf.SmallInt`, `MLton.LargeReal`, `MLton.LargeWord`, `MLton.Real`, `MLton.Real32`, `MLton.Real64`, `MLton.Rlimit.Rlim`, `MLton.TextIO.tempPrefix`, `MLton.Vector.create`, `MLton.Word.bswap`, `MLton.Word8.bswap`, `MLton.Word16`, `MLton.Word32`, `MLton.Word64`, `MLton.Word8Array`, `MLton.Word8Vector`.
19796 *** Changed: `MLton.Array.unfoldi`, `MLton.IntInf.rep`, `MLton.Rlimit`, `MLton.Vector.unfoldi`.
19797 *** Deprecated: `MLton.Socket`.
19798 ** Other libraries.
19799 *** Added: <:MLRISCLibrary:MLRISC library>.
19800 *** Updated: <:CKitLibrary:ckit library>, <:SMLNJLibrary:SML/NJ library>.
19801 * Tools.
19802
19803 For a complete list of changes and bug fixes since
19804 <:Release20051202:>, see the
19805 <!RawGitFile(mlton,on-20070826-release,doc/changelog)> and
19806 <:Bugs20051202:>.
19807
19808 == 20070826 binary packages ==
19809
19810 * AMD64
19811 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.amd64-linux.tgz[Linux], glibc 2.3
19812 * HPPA
19813 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.hppa-hpux1100.tgz[HPUX] 11.00 and above, statically linked against <:GnuMP:>
19814 * PowerPC
19815 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.powerpc-aix51.tgz[AIX] 5.1 and above, statically linked against <:GnuMP:>
19816 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.powerpc-darwin.gmp-static.tgz[Darwin] 8.10 (Mac OS X), statically linked against <:GnuMP:>
19817 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.powerpc-darwin.gmp-macports.tgz[Darwin] 8.10 (Mac OS X), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
19818 * Sparc
19819 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.sparc-solaris8.tgz[Solaris] 8 and above, statically linked against <:GnuMP:>
19820 * X86
19821 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-cygwin.tgz[Cygwin] 1.5.24-2
19822 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-darwin.gmp-macports.tgz[Darwin (.tgz)] 8.10 (Mac OS X), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
19823 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-darwin.gmp-macports.dmg[Darwin (.dmg)] 8.10 (Mac OS X), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
19824 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-darwin.gmp-static.tgz[Darwin (.tgz)] 8.10 (Mac OS X), statically linked against <:GnuMP:>
19825 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-darwin.gmp-static.dmg[Darwin (.dmg)] 8.10 (Mac OS X), statically linked against <:GnuMP:>
19826 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-freebsd.tgz[FreeBSD]
19827 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-linux.tgz[Linux], glibc 2.3
19828 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-linux.glibc213.gmp-static.tgz[Linux], glibc 2.1, statically linked against <:GnuMP:>
19829 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-mingw.gmp-dll.tgz[MinGW], dynamically linked against <:GnuMP:> (requires `libgmp-3.dll`)
19830 ** http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.x86-mingw.gmp-static.tgz[MinGW], statically linked against <:GnuMP:>
19831
19832 == 20070826 source packages ==
19833
19834  * http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton-20070826-1.src.tgz[source tgz]
19835
19836  * Debian http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton_20070826-1.dsc[dsc],
19837  http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton_20070826-1.diff.gz[diff.gz],
19838  http://sourceforge.net/projects/mlton/files/mlton/20070826/mlton_20070826.orig.tar.gz[orig.tar.gz]
19839
19840 == Packages available at other sites ==
19841
19842 * http://packages.debian.org/search?keywords=mlton&searchon=names&suite=all&section=all[Debian]
19843 * http://www.freebsd.org/cgi/ports.cgi?query=mlton&stype=all[FreeBSD]
19844 * https://admin.fedoraproject.org/pkgdb/packages/name/mlton[Fedora]
19845 * http://packages.ubuntu.com/cgi-bin/search_packages.pl?keywords=mlton&searchon=names&version=all&release=all[Ubuntu]
19846
19847 == Also see ==
19848
19849 * <:Bugs20070826:>
19850 * http://www.mlton.org/guide/20070826/[MLton Guide (20070826)].
19851 +
19852 A snapshot of the MLton wiki at the time of release.
19853
19854 <<<
19855
19856 :mlton-guide-page: Release20100608
19857 [[Release20100608]]
19858 Release20100608
19859 ===============
19860
19861 This is an archived public release of MLton, version 20100608.
19862
19863 == Changes since the last public release ==
19864
19865 * New platforms.
19866 ** <:RunningOnAMD64:AMD64>/<:RunningOnDarwin:Darwin> (Mac OS X Snow Leopard)
19867 ** <:RunningOnIA64:IA64>/<:RunningOnHPUX:HPUX>
19868 ** <:RunningOnPowerPC64:PowerPC64>/<:RunningOnAIX:AIX>
19869 * Compiler.
19870 ** <:CompileTimeOptions:Command-line switches>.
19871 *** Added: ++-mlb-path-var __<name> <value>__++
19872 *** Removed: ++-keep sml++, ++-stop sml++
19873 ** Improved constant folding of floating-point operations.
19874 ** Experimental: Support for compiling to a C library; see <:LibrarySupport: documentation>.
19875 ** Extended ++-show-def-use __output__++ to include types of variable definitions.
19876 ** Deprecated features (to be removed in a future release)
19877 *** Bytecode codegen: The bytecode codegen has not seen significant use and it is not well understood by any of the active developers.
19878 *** Support for `.cm` files as input: The ML Basis system provides much better infrastructure for "programming in the very large" than the (very) limited support for CM.  The `cm2mlb` tool (available in the source distribution) can be used to convert CM projects to MLB projects, preserving the CM scoping of module identifiers.
19879 ** Bug fixes: see <!RawGitFile(mlton,on-20100608-release,doc/changelog)>
19880 * Runtime.
19881 ** <:RunTimeOptions:@MLton switches>.
19882 *** Added: ++may-page-heap {false|true}++
19883 ** ++may-page-heap++: By default, MLton will not page the heap to disk when unable to grow the heap to accommodate an allocation. (Previously, this behavior was the default, with no means to disable, with security and least-surprise issues.)
19884 ** Bug fixes: see <!RawGitFile(mlton,on-20100608-release,doc/changelog)>
19885 * Language.
19886 ** Allow numeric characters in <:MLBasis:ML Basis> path variables.
19887 * Libraries.
19888 ** <:BasisLibrary:Basis Library>.
19889 *** Bug fixes: see <!RawGitFile(mlton,on-20100608-release,doc/changelog)>
19890 ** <:MLtonStructure:MLton structure>.
19891 *** Added: `MLton.equal`, `MLton.hash`, `MLton.Cont.isolate`, `MLton.GC.Statistics`, `MLton.Pointer.sizeofPointer`, `MLton.Socket.Address.toVector`
19892 *** Changed:
19893 *** Deprecated: `MLton.Socket`
19894 ** <:UnsafeStructure:Unsafe structure>.
19895 *** Added versions of all of the monomorphic array and vector structures.
19896 ** Other libraries.
19897 *** Updated: <:CKitLibrary:ckit library>, <:MLRISCLibrary:MLRISC library>, <:SMLNJLibrary:SML/NJ library>.
19898 * Tools.
19899 ** `mllex`
19900 *** Eliminated top-level `type int = Int.int` in output.
19901 *** Include `(*#line line:col "file.lex" *)` directives in output.
19902 *** Added `%posint` command, to set the `yypos` type and allow the lexing of multi-gigabyte files.
19903 ** `mlnlffigen`
19904 *** Added command-line switches `-linkage archive` and `-linkage shared`.
19905 *** Deprecated command-line switch `-linkage static`.
19906 *** Added support for <:RunningOnIA64:IA64> and <:RunningOnHPPA:HPPA> targets.
19907 ** `mlyacc`
19908 *** Eliminated top-level `type int = Int.int` in output.
19909 *** Include `(*#line line:col "file.grm" *)` directives in output.
19910
19911 For a complete list of changes and bug fixes since <:Release20070826:>, see the
19912 <!RawGitFile(mlton,on-20100608-release,doc/changelog)>
19913 and <:Bugs20070826:>.
19914
19915 == 20100608 binary packages ==
19916
19917 * AMD64 (aka "x86-64" or "x64")
19918 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.amd64-darwin.gmp-macports.tgz[Darwin (.tgz)] 10.3 (Mac OS X Snow Leopard), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
19919 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.amd64-darwin.gmp-static.tgz[Darwin (.tgz)] 10.3 (Mac OS X Snow Leopard), statically linked against <:GnuMP:> (but requires <:GnuMP:> for generated executables)
19920 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.amd64-linux.tgz[Linux], glibc 2.11
19921 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.amd64-linux.static.tgz[Linux], statically linked
19922 ** Windows MinGW 32/64 http://sourceforge.net/projects/mlton/files/mlton/20100608/MLton-20100608-1.exe[self-extracting] (28MB) or http://sourceforge.net/projects/mlton/files/mlton/20100608/MLton-20100608-1.msi[MSI] (61MB) installer
19923 * X86
19924 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.x86-cygwin.tgz[Cygwin] 1.7.5
19925 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.x86-darwin.gmp-macports.tgz[Darwin (.tgz)] 9.8 (Mac OS X Leopard), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
19926 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.x86-darwin.gmp-static.tgz[Darwin (.tgz)] 9.8 (Mac OS X Leopard), statically linked against <:GnuMP:> (but requires <:GnuMP:> for generated executables)
19927 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.x86-linux.tgz[Linux], glibc 2.11
19928 ** http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608-1.x86-linux.static.tgz[Linux], statically linked
19929 ** Windows MinGW 32/64 http://sourceforge.net/projects/mlton/files/mlton/20100608/MLton-20100608-1.exe[self-extracting] (28MB) or http://sourceforge.net/projects/mlton/files/mlton/20100608/MLton-20100608-1.msi[MSI] (61MB) installer
19930
19931 == 20100608 source packages ==
19932
19933  * http://sourceforge.net/projects/mlton/files/mlton/20100608/mlton-20100608.src.tgz[mlton-20100608.src.tgz]
19934
19935 == Packages available at other sites ==
19936
19937  * http://packages.debian.org/search?keywords=mlton&searchon=names&suite=all&section=all[Debian]
19938  * http://www.freebsd.org/cgi/ports.cgi?query=mlton&stype=all[FreeBSD]
19939  * https://admin.fedoraproject.org/pkgdb/acls/name/mlton[Fedora]
19940  * http://packages.ubuntu.com/search?suite=default&section=all&arch=any&searchon=names&keywords=mlton[Ubuntu]
19941
19942 == Also see ==
19943
19944 * <:Bugs20100608:>
19945 * http://www.mlton.org/guide/20100608/[MLton Guide (20100608)].
19946 +
19947 A snapshot of the MLton wiki at the time of release.
19948
19949 <<<
19950
19951 :mlton-guide-page: Release20130715
19952 [[Release20130715]]
19953 Release20130715
19954 ===============
19955
19956 This is an archived public release of MLton, version 20130715.
19957
19958 == Changes since the last public release ==
19959
19960 // * New platforms.
19961 // ** ???
19962 * Compiler.
19963 ** Cosmetic improvements to type-error messages.
19964 ** Removed features:
19965 *** Bytecode codegen: The bytecode codegen had not seen significant use and it was not well understood by any of the active developers.
19966 *** Support for `.cm` files as input: The <:MLBasis:ML Basis system> provides much better infrastructure for "programming in the very large" than the (very) limited support for CM.  The `cm2mlb` tool (available in the source distribution) can be used to convert CM projects to MLB projects, preserving the CM scoping of module identifiers.
19967 ** Bug fixes: see <!RawGitFile(mlton,on-20130715-release,doc/changelog)>
19968 * Runtime.
19969 ** Bug fixes: see <!RawGitFile(mlton,on-20130715-release,doc/changelog)>
19970 * Language.
19971 ** Interpret `(*#line line:col "file" *)` directives as relative file names.
19972 ** <:MLBasisAnnotations:ML Basis annotations>.
19973 *** Added: `resolveScope`
19974 * Libraries.
19975 ** <:BasisLibrary:Basis Library>.
19976 *** Improved performance of `String.concatWith`.
19977 *** Use bit operations for `REAL.class` and other low-level operations.
19978 *** Support additional variables with `Posix.ProcEnv.sysconf`.
19979 *** Bug fixes: see <!RawGitFile(mlton,on-20130715-release,doc/changelog)>
19980 ** <:MLtonStructure:MLton structure>.
19981 *** Removed: `MLton.Socket`
19982 ** Other libraries.
19983 *** Updated: <:CKitLibrary:ckit library>, <:MLRISCLibrary:MLRISC library>, <:SMLNJLibrary:SML/NJ library>
19984 *** Added: <:MLLPTLibrary:MLLPT library>
19985 * Tools.
19986 ** `mllex`
19987 *** Generate `(*#line line:col "file.lex" *)` directives with simple (relative) file names, rather than absolute paths.
19988 ** `mlyacc`
19989 *** Generate `(*#line line:col "file.grm" *)` directives with simple (relative) file names, rather than absolute paths.
19990 *** Fixed bug in comment-handling in lexer.
19991
19992 For a complete list of changes and bug fixes since
19993 <:Release20100608:>, see the
19994 <!RawGitFile(mlton,on-20130715-release,doc/changelog)> and
19995 <:Bugs20100608:>.
19996
19997 == 20130715 binary packages ==
19998
19999 * AMD64 (aka "x86-64" or "x64")
20000 ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.amd64-darwin.gmp-macports.tgz[Darwin (.tgz)] 11.4 (Mac OS X Lion), dynamically linked against <:GnuMP:> in `/opt/local/lib` (suitable for http://macports.org[MacPorts] install of <:GnuMP:>)
20001 ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.amd64-darwin.gmp-static.tgz[Darwin (.tgz)] 11.4 (Mac OS X Lion), statically linked against <:GnuMP:> (but requires <:GnuMP:> for generated executables)
20002 ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.amd64-linux.tgz[Linux], glibc 2.15
20003 // ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.amd64-linux.static.tgz[Linux], statically linked
20004 // ** Windows MinGW 32/64 http://sourceforge.net/projects/mlton/files/mlton/20130715/MLton-20130715-1.exe[self-extracting] (28MB) or http://sourceforge.net/projects/mlton/files/mlton/20130715/MLton-20130715-1.msi[MSI] (61MB) installer
20005 * X86
20006 // ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.x86-cygwin.tgz[Cygwin] 1.7.5
20007 ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.x86-linux.tgz[Linux], glibc 2.15
20008 // ** http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715-1.x86-linux.static.tgz[Linux], statically linked
20009 // ** Windows MinGW 32/64 http://sourceforge.net/projects/mlton/files/mlton/20130715/MLton-20130715-1.exe[self-extracting] (28MB) or http://sourceforge.net/projects/mlton/files/mlton/20130715/MLton-20130715-1.msi[MSI] (61MB) installer
20010
20011 == 20130715 source packages ==
20012
20013  * http://sourceforge.net/projects/mlton/files/mlton/20130715/mlton-20130715.src.tgz[mlton-20130715.src.tgz]
20014
20015 == Downstream packages ==
20016
20017  * http://packages.debian.org/search?keywords=mlton&searchon=names&suite=all&section=all[Debian]
20018  * http://www.freebsd.org/cgi/ports.cgi?query=mlton&stype=all[FreeBSD]
20019  * https://admin.fedoraproject.org/pkgdb/acls/name/mlton[Fedora]
20020  * http://packages.ubuntu.com/search?suite=default&section=all&arch=any&searchon=names&keywords=mlton[Ubuntu]
20021
20022 == Also see ==
20023
20024 * <:Bugs20130715:>
20025 * http://www.mlton.org/guide/20130715/[MLton Guide (20130715)].
20026 +
20027 A snapshot of the MLton website at the time of release.
20028
20029 <<<
20030
20031 :mlton-guide-page: Release20180207
20032 [[Release20180207]]
20033 Release20180207
20034 ===============
20035
20036 Here you can download the latest public release of MLton, version 20180207.
20037
20038 == Changes since the last public release ==
20039
20040 * Compiler.
20041   ** Added an experimental LLVM codegen (`-codegen llvm`); requires LLVM tools
20042   (`llvm-as`, `opt`, `llc`) version &ge; 3.7.
20043   ** Made many substantial cosmetic improvements to front-end diagnostic
20044   messages, especially with respect to source location regions, type inference
20045   for `fun` and `val rec` declarations, signature constraints applied to a
20046   structure, `sharing type` specifications and `where type` signature
20047   expressions, type constructor or type variable escaping scope, and
20048   nonexhaustive pattern matching.
20049   ** Fixed minor bugs with exception replication, precedence parsing of function
20050   clauses, and simultaneous `sharing` of multiple structures.
20051   ** Made compilation deterministic (eliminate output executable name from
20052   compile-time specified `@MLton` runtime arguments; deterministically generate
20053   magic constant for executable).
20054   ** Updated `-show-basis` (recursively expand structures in environments,
20055   displaying components with long identifiers; append `(* @ region *)`
20056   annotations to items shown in environment).
20057   ** Forced amd64 codegen to generate PIC on amd64-linux targets.
20058 * Runtime.
20059   ** Added `gc-summary-file file` runtime option.
20060   ** Reorganized runtime support for `IntInf` operations so that programs that
20061   do not use `IntInf` compile to executables with no residual dependency on GMP.
20062   ** Changed heap representation to store forwarding pointer for an object in
20063   the object header (rather than in the object data and setting the header to a
20064   sentinel value).
20065 * Language.
20066   ** Added support for selected SuccessorML features; see
20067   http://mlton.org/SuccessorML for details.
20068   ** Added `(*#showBasis "file" *)` directive; see
20069   http://mlton.org/ShowBasisDirective for details.
20070   ** FFI:
20071     *** Added `pure`, `impure`, and `reentrant` attributes to `_import`.  An
20072     unattributed `_import` is treated as `impure`.  A `pure` `_import` may be
20073     subject to more aggressive optimizations (common subexpression elimination,
20074     dead-code elimination).  An `_import`-ed C function that (directly or
20075     indirectly) calls an `_export`-ed SML function should be attributed
20076     `reentrant`.
20077   ** ML Basis annotations.
20078     *** Added `allowSuccessorML {false|true}` to enable all SuccessorML features
20079     and other annotations to enable specific SuccessorML features; see
20080     http://mlton.org/SuccessorML for details.
20081     *** Split `nonexhaustiveMatch {warn|error|ignore}` and `redundantMatch
20082     {warn|error|ignore}` into `nonexhaustiveMatch` and `redundantMatch`
20083     (controls diagnostics for `case` expressions, `fn` expressions, and `fun`
20084     declarations (which may raise `Match` on failure)) and `nonexhaustiveBind`
20085     and `redundantBind` (controls diagnostics for `val` declarations (which may
20086     raise `Bind` on failure)).
20087     *** Added `valrecConstr {warn|error|ignore}` to report when a `val rec` (or
20088     `fun`) declaration redefines an identifier that previously had constructor
20089     status.
20090 * Libraries.
20091   ** Basis Library.
20092     *** Improved performance of `Array.copy`, `Array.copyVec`, `Vector.append`,
20093     `String.^`, `String.concat`, `String.concatWith`, and other related
20094     functions by using `memmove` rather than element-by-element constructions.
20095   ** `Unsafe` structure.
20096     *** Added unsafe operations for array uninitialization and raw arrays; see
20097     https://github.com/MLton/mlton/pull/207 for details.
20098   ** Other libraries.
20099     *** Updated: ckit library, MLLPT library, MLRISC library, SML/NJ library
20100 * Tools.
20101   ** mlnlffigen
20102     *** Updated to warn and skip (rather than abort) when encountering functions
20103     with `struct`/`union` argument or return type.
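
As a rough illustration of the `_import` attributes listed under FFI above, the following sketch imports two standard C library functions. The choice of `sin` and `rand` is an assumption made for this example and does not come from the release notes; the program is also assumed to be compiled with the FFI enabled (for example, with `-default-ann 'allowFFI true'`).

[source,sml]
----
(* Assumption: the C math library's `double sin(double)`.  It has no side
   effects, so marking it `pure` lets the optimizer merge or drop calls. *)
val sin = _import "sin" pure: real -> real;

(* Assumption: the C library's `int rand(void)`.  It updates hidden state,
   so it must stay `impure`; omitting the attribute means the same thing. *)
val rand = _import "rand" impure: unit -> int;

val () = print (Real.toString (sin 0.0) ^ "\n")
val () = print (Int.toString (rand ()) ^ "\n")
----

A C function that calls back into `_export`-ed SML code would carry `reentrant` in the same position as `pure` and `impure` here.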
20104
20105 For a complete list of changes and bug fixes since
20106 <:Release20130715:>, see the
20107 <!ViewGitFile(mlton,on-20180207-release,CHANGELOG.adoc)> and
20108 <:Bugs20130715:>.
20109
20110 == 20180207 binary packages ==
20111
20112 * AMD64 (aka "x86-64" or "x64")
20113 ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.amd64-darwin.gmp-homebrew.tgz[Darwin (.tgz)] 16.7 (Mac OS X Sierra), dynamically linked against <:GnuMP:> in `/usr/local/lib` (suitable for https://brew.sh/[Homebrew] install of <:GnuMP:>)
20114 ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.amd64-darwin.gmp-static.tgz[Darwin (.tgz)] 16.7 (Mac OS X Sierra), statically linked against <:GnuMP:> (but requires <:GnuMP:> for generated executables)
20115 ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.amd64-linux.tgz[Linux], glibc 2.23
20116 // ** Windows MinGW 32/64 https://sourceforge.net/projects/mlton/files/mlton/20180207/MLton-20180207-1.exe[self-extracting] (28MB) or https://sourceforge.net/projects/mlton/files/mlton/20180207/MLton-20180207-1.msi[MSI] (61MB) installer
20117 // * X86
20118 // ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.x86-cygwin.tgz[Cygwin] 1.7.5
20119 // ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.x86-linux.tgz[Linux], glibc 2.23
20120 // ** https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207-1.x86-linux.static.tgz[Linux], statically linked
20121 // ** Windows MinGW 32/64 https://sourceforge.net/projects/mlton/files/mlton/20180207/MLton-20180207-1.exe[self-extracting] (28MB) or https://sourceforge.net/projects/mlton/files/mlton/20180207/MLton-20180207-1.msi[MSI] (61MB) installer
20122
20123 == 20180207 source packages ==
20124
20125  * https://sourceforge.net/projects/mlton/files/mlton/20180207/mlton-20180207.src.tgz[mlton-20180207.src.tgz]
20126
20127 == Also see ==
20128
20129 * <:Bugs20180207:>
20130 * http://www.mlton.org/guide/20180207/[MLton Guide (20180207)].
20131 +
20132 A snapshot of the MLton website at the time of release.
20133
20134 <<<
20135
20136 :mlton-guide-page: ReleaseChecklist
20137 [[ReleaseChecklist]]
20138 ReleaseChecklist
20139 ================
20140
20141 == Advance preparation for release ==
20142
20143 * Update `./CHANGELOG.adoc`.
20144 ** Write entries for missing notable commits.
20145 ** Write summary of changes from previous release.
20146 ** Update with estimated release date.
20147 * Update `./README.adoc`.
20148 ** Check features and description.
20149 * Update `man/{mlton,mlprof}.1`.
20150 ** Check compile-time and run-time options in `man/mlton.1`.
20151 ** Check options in `man/mlprof.1`.
20152 ** Update with estimated release date.
20153 * Update `doc/guide`.
20154 // ** Check <:OrphanedPages:> and <:WantedPages:>.
20155 ** Synchronize <:Features:> page with `./README.adoc`.
20156 ** Update <:Credits:> page with acknowledgements.
20157 ** Create *ReleaseYYYYMM??* page (i.e., forthcoming release) based on *ReleaseXXXXLLCC* (i.e., previous release).
20158 *** Update summary from `./CHANGELOG.adoc`.
20159 *** Update links to estimated release date.
20160 ** Create *BugsYYYYMM??* page based on *BugsXXXXLLCC*.
20161 *** Update links to estimated release date.
20162 ** Spell check pages.
20163 * Ensure that all updates are pushed to `master` branch of <!ViewGitProj(mlton)>.
20164
20165 == Prepare sources for tagging ==
20166
20167 * Update `./CHANGELOG.adoc`.
20168 ** Update with proper release date.
20169 * Update `man/{mlton,mlprof}.1`.
20170 ** Update with proper release date.
20171 * Update `doc/guide`.
20172 ** Rename *ReleaseYYYYMM??* to *ReleaseYYYYMMDD* with proper release date.
20173 *** Update links with proper release date.
20174 ** Rename *BugsYYYYMM??* to *BugsYYYYMMDD* with proper release date.
20175 *** Update links with proper release date.
20176 ** Update *ReleaseXXXXLLCC*.
20177 *** Change intro to "`This is an archived public release of MLton, version XXXXLLCC.`"
20178 ** Update <:Home:> with note of new release.
20179 *** Change `What's new?` text to `Please try out our new release, <:ReleaseYYYYMMDD:MLton YYYYMMDD>`.
20180 *** Update `Download` link with proper release date.
20181 ** Update <:Releases:> with new release.
20182 * Ensure that all updates are pushed to `master` branch of <!ViewGitProj(mlton)>.
20183
20184 == Tag sources ==
20185
20186 * Shell commands:
20187 +
20188 ----
20189 git clone http://github.com/MLton/mlton mlton.git
20190 cd mlton.git
20191 git checkout master
20192 git tag -a -m "Tagging YYYYMMDD release" on-YYYYMMDD-release master
20193 git push origin on-YYYYMMDD-release
20194 ----
20195
20196 == Packaging ==
20197
20198 === SourceForge FRS ===
20199
20200 * Create *YYYYMMDD* directory:
20201 +
20202 -----
20203 sftp user@frs.sourceforge.net:/home/frs/project/mlton/mlton
20204 sftp> mkdir YYYYMMDD
20205 sftp> quit
20206 -----
20207
20208 === Source release ===
20209
20210 * Create `mlton-YYYYMMDD.src.tgz`:
20211 +
20212 ----
20213 git clone http://github.com/MLton/mlton mlton
20214 cd mlton
20215 git checkout on-YYYYMMDD-release
20216 make MLTON_VERSION=YYYYMMDD source-release
20217 cd ..
20218 ----
20219 +
20220 or
20221 +
20222 ----
20223 wget https://github.com/MLton/mlton/archive/on-YYYYMMDD-release.tar.gz
20224 tar xzvf on-YYYYMMDD-release.tar.gz
20225 cd mlton-on-YYYYMMDD-release
20226 make MLTON_VERSION=YYYYMMDD source-release
20227 cd ..
20228 ----
20229
20230 * Upload `mlton-YYYYMMDD.src.tgz`:
20231 +
20232 -----
20233 scp mlton-YYYYMMDD.src.tgz user@frs.sourceforge.net:/home/frs/project/mlton/mlton/YYYYMMDD/
20234 -----
20235
20236 * Update *ReleaseYYYYMMDD* with `mlton-YYYYMMDD.src.tgz` link.
20237
20238 === Binary releases ===
20239
20240 * Build and create `mlton-YYYYMMDD-1.ARCH-OS.tgz`:
20241 +
20242 ----
20243 wget http://sourceforge.net/projects/mlton/files/mlton/YYYYMMDD/mlton-YYYYMMDD.src.tgz
20244 tar xzvf mlton-YYYYMMDD.src.tgz
20245 cd mlton-YYYYMMDD
20246 make binary-release
20247 cd ..
20248 ----
20249
20250 * Upload `mlton-YYYYMMDD-1.ARCH-OS.tgz`:
20251 +
20252 -----
20253 scp mlton-YYYYMMDD-1.ARCH-OS.tgz user@frs.sourceforge.net:/home/frs/project/mlton/mlton/YYYYMMDD/
20254 -----
20255
20256 * Update *ReleaseYYYYMMDD* with `mlton-YYYYMMDD-1.ARCH-OS.tgz` link.
20257
20258 == Website ==
20259
20260 * `guide/YYYYMMDD` gets a copy of `doc/guide/localhost`.
20261 * Shell commands:
20262 +
20263 ----
20264 wget http://sourceforge.net/projects/mlton/files/mlton/YYYYMMDD/mlton-YYYYMMDD.src.tgz
20265 tar xzvf mlton-YYYYMMDD.src.tgz
20266 cd mlton-YYYYMMDD
20267 cd doc/guide
20268 cp -prf localhost YYYYMMDD
20269 tar czvf guide-YYYYMMDD.tgz YYYYMMDD
20270 rsync -avzP --delete -e ssh YYYYMMDD user@web.sourceforge.net:/home/project-web/mlton/htdocs/guide/
20271 rsync -avzP --delete -e ssh guide-YYYYMMDD.tgz user@web.sourceforge.net:/home/project-web/mlton/htdocs/guide/
20272 ----
20273
20274 == Announce release ==
20275
20276 * Mail announcement to:
20277 ** mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`]
20278 ** mailto:MLton-user@mlton.org[`MLton-user@mlton.org`]
20279
20280 == Misc. ==
20281
20282 * Generate new <:Performance:> numbers.
20283
20284 <<<
20285
20286 :mlton-guide-page: Releases
20287 [[Releases]]
20288 Releases
20289 ========
20290
20291 Public releases of MLton:
20292
20293 * <:Release20180207:>
20294 * <:Release20130715:>
20295 * <:Release20100608:>
20296 * <:Release20070826:>
20297 * <:Release20051202:>
20298 * <:Release20041109:>
20299 * Release20040227
20300 * Release20030716
20301 * Release20030711
20302 * Release20030312
20303 * Release20020923
20304 * Release20020410
20305 * Release20011006
20306 * Release20010806
20307 * Release20010706
20308 * Release20000906
20309 * Release20000712
20310 * Release19990712
20311 * Release19990319
20312 * Release19980826
20313
20314 <<<
20315
20316 :mlton-guide-page: RemoveUnused
20317 [[RemoveUnused]]
20318 RemoveUnused
20319 ============
20320
20321 <:RemoveUnused:> is an optimization pass for both the <:SSA:> and
20322 <:SSA2:> <:IntermediateLanguage:>s, invoked from <:SSASimplify:> and
20323 <:SSA2Simplify:>.
20324
20325 == Description ==
20326
20327 This pass aggressively removes unused:
20328
20329 * datatypes
20330 * datatype constructors
20331 * datatype constructor arguments
20332 * functions
20333 * function arguments
20334 * function returns
20335 * blocks
20336 * block arguments
20337 * statements (variable bindings)
20338 * handlers from non-tail calls (mayRaise analysis)
20339 * continuations from non-tail calls (mayReturn analysis)
20340
20341 == Implementation ==
20342
20343 * <!ViewGitFile(mlton,master,mlton/ssa/remove-unused.fun)>
20344 * <!ViewGitFile(mlton,master,mlton/ssa/remove-unused2.fun)>
20345
20346 == Details and Notes ==
20347
20348 {empty}
20349
20350 <<<
20351
20352 :mlton-guide-page: Restore
20353 [[Restore]]
20354 Restore
20355 =======
20356
20357 <:Restore:> is a rewrite pass for the <:SSA:> and <:SSA2:>
20358 <:IntermediateLanguage:>s, invoked from <:KnownCase:> and
20359 <:LocalRef:>.
20360
20361 == Description ==
20362
20363 This pass restores the SSA condition for a violating <:SSA:> or
20364 <:SSA2:> program; the program must satisfy:
20365 ____
20366 Every path from the root to a use of a variable (excluding globals)
20367 passes through a def of that variable.
20368 ____
20369
20370 == Implementation ==
20371
20372 * <!ViewGitFile(mlton,master,mlton/ssa/restore.sig)>
20373 * <!ViewGitFile(mlton,master,mlton/ssa/restore.fun)>
20374 * <!ViewGitFile(mlton,master,mlton/ssa/restore2.sig)>
20375 * <!ViewGitFile(mlton,master,mlton/ssa/restore2.fun)>
20376
20377 == Details and Notes ==
20378
20379 Based primarily on Section 19.1 of <!Cite(Appel98, Modern Compiler
20380 Implementation in ML)>.
20381
20382 The main deviation is the calculation of liveness of the violating
20383 variables, which is used to predicate the insertion of phi arguments.
20384 This is due to the algorithm's bias towards imperative languages, for
20385 which it makes the assumption that all variables are defined in the
20386 start block and all variables are "used" at exit.
20387
20388 This is "optimized" for restoration of functions with small numbers of
20389 violating variables -- use bool vectors to represent sets of violating
20390 variables.
20391
20392 Also, we use a `Promise.t` to suspend part of the dominance frontier
20393 computation.
20394
20395 <<<
20396
20397 :mlton-guide-page: ReturnStatement
20398 [[ReturnStatement]]
20399 ReturnStatement
20400 ===============
20401
20402 Programmers coming from languages that have a `return` statement, such
20403 as C, Java, and Python, often ask how one can translate functions that
20404 return early into SML.  This page briefly describes a number of ways
20405 to translate uses of `return` to SML.
20406
20407 == Conditional iterator function ==
20408
20409 A conditional iterator function, such as
20410 http://www.standardml.org/Basis/list.html#SIG:LIST.find:VAL[`List.find`],
20411 http://www.standardml.org/Basis/list.html#SIG:LIST.exists:VAL[`List.exists`],
20412 or
20413 http://www.standardml.org/Basis/list.html#SIG:LIST.all:VAL[`List.all`]
20414 is probably what you want in most cases.  Unfortunately, it might be
20415 the case that the particular conditional iteration pattern that you
20416 want isn't provided for your data structure.  Usually the best
20417 alternative in such a case is to implement the desired iteration
20418 pattern as a higher-order function.  For example, to implement a
20419 `find` function for arrays (which already exists as
20420 http://www.standardml.org/Basis/array.html#SIG:ARRAY.findi:VAL[`Array.find`])
20421 one could write
20422
20423 [source,sml]
20424 ----
20425 fun find predicate array = let
20426    fun loop i =
20427        if i = Array.length array then
20428           NONE
20429        else if predicate (Array.sub (array, i)) then
20430           SOME (Array.sub (array, i))
20431        else
20432           loop (i+1)
20433 in
20434    loop 0
20435 end
20436 ----
20437
20438 Of course, this technique, while probably the most common case in
20439 practice, applies only if you are essentially iterating over some data
20440 structure.
20441
20442 == Escape handler ==
20443
20444 Probably the most direct way to translate code using `return`
20445 statements is to implement `return` using exception
20446 handling.  The mechanism can be packaged into a reusable module with
20447 the signature
20448 (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/public/control/exit.sig)>):
20449 [source,sml]
20450 ----
20451 sys::[./bin/InclGitFile.py mltonlib master com/ssh/extended-basis/unstable/public/control/exit.sig 6:]
20452 ----
20453
20454 (<!Cite(HarperEtAl93, Typing First-Class Continuations in ML)>
20455 discusses the typing of a related construct.)  The implementation
20456 (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/detail/control/exit.sml)>)
20457 is straightforward:
20458 [source,sml]
20459 ----
20460 sys::[./bin/InclGitFile.py mltonlib master com/ssh/extended-basis/unstable/detail/control/exit.sml 6:]
20461 ----
20462
20463 Here is an example of how one could implement a `find` function given
20464 an `app` function:
20465 [source,sml]
20466 ----
20467 fun appToFind (app : ('a -> unit) -> 'b -> unit)
20468               (predicate : 'a -> bool)
20469               (data : 'b) =
20470     Exit.call
20471        (fn return =>
20472            (app (fn x =>
20473                     if predicate x then
20474                        return (SOME x)
20475                     else
20476                        ())
20477                 data
20478           ; NONE))
20479 ----
20480
20481 In the above, as soon as the expression `predicate x` evaluates to
20482 `true` the `app` invocation is terminated.
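
To make the mechanism concrete without the library sources at hand, here is a minimal self-contained sketch of the same idea. `SimpleExit` is a hypothetical stand-in written for this page, not the mltonlib `Exit` module, and it provides only a `call` function; `firstNegative` is likewise an invented example.

[source,sml]
----
(* Hypothetical stand-in for the Exit module; only `call` is sketched. *)
structure SimpleExit :> sig
   val call : (('a -> 'b) -> 'a) -> 'a
end = struct
   fun call (block : ('a -> 'b) -> 'a) : 'a = let
      (* A fresh exception per invocation carries the early result. *)
      exception Escape of 'a
   in
      block (fn value => raise Escape value)
      handle Escape value => value
   end
end

(* Return the first negative element of a list, or NONE. *)
fun firstNegative (xs : int list) : int option =
    SimpleExit.call
       (fn return =>
           (List.app (fn x => if x < 0 then return (SOME x) else ()) xs
          ; NONE))

val r1 = firstNegative [3, ~1, 4, ~5]   (* SOME ~1 *)
val r2 = firstNegative [1, 2, 3]        (* NONE *)
----

Because the exception is declared inside `call`, each invocation gets its own exception, so nested uses of `call` do not intercept one another's escapes.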
20483
20484
20485 == Continuation-passing Style (CPS) ==
20486
20487 A general way to implement complex control patterns is to use
20488 http://en.wikipedia.org/wiki/Continuation-passing_style[CPS].  In CPS,
20489 instead of returning normally, functions invoke a function passed as
20490 an argument.  In general, multiple continuation functions may be
20491 passed as arguments and the ordinary return continuation may also be
20492 used.  As an example, here is a function that finds the leftmost
20493 element of a binary tree satisfying a given predicate:
20494 [source,sml]
20495 ----
20496 datatype 'a tree = LEAF | BRANCH of 'a tree * 'a * 'a tree
20497
20498 fun find predicate = let
20499    fun recurse continue =
20500        fn LEAF =>
20501           continue ()
20502         | BRANCH (lhs, elem, rhs) =>
20503           recurse
20504              (fn () =>
20505                  if predicate elem then
20506                     SOME elem
20507                  else
20508                     recurse continue rhs)
20509              lhs
20510 in
20511    recurse (fn () => NONE)
20512 end
20513 ----
20514
20515 Note that the above function returns as soon as the leftmost element
20516 satisfying the predicate is found.
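
As a variation on the same idea, the sketch below passes two explicit continuations, one for success and one for failure, so the caller decides what each outcome produces. The names `findCPS`, `found`, and `notFound` are chosen for this example only.

[source,sml]
----
datatype 'a tree = LEAF | BRANCH of 'a tree * 'a * 'a tree

(* `found` is invoked with the leftmost matching element;
   `notFound` is invoked if no element matches. *)
fun findCPS predicate tree found notFound =
    case tree of
       LEAF => notFound ()
     | BRANCH (lhs, elem, rhs) =>
       findCPS predicate lhs found
          (fn () =>
              if predicate elem then
                 found elem
              else
                 findCPS predicate rhs found notFound)

val t = BRANCH (BRANCH (LEAF, 1, LEAF), 2, BRANCH (LEAF, 3, LEAF))

(* Using `SOME` and a constant `NONE` recovers the option-returning find. *)
val r = findCPS (fn x => x > 1) t SOME (fn () => NONE)   (* SOME 2 *)
----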
20517
20518 <<<
20519
20520 :mlton-guide-page: RSSA
20521 [[RSSA]]
20522 RSSA
20523 ====
20524
20525 <:RSSA:> is an <:IntermediateLanguage:>, translated from <:SSA2:> by
20526 <:ToRSSA:>, optimized by <:RSSASimplify:>, and translated by
20527 <:ToMachine:> to <:Machine:>.
20528
20529 == Description ==
20530
20531 <:RSSA:> is an <:IntermediateLanguage:> that makes representation
20532 decisions explicit.
20533
20534 == Implementation ==
20535
20536 * <!ViewGitFile(mlton,master,mlton/backend/rssa.sig)>
20537 * <!ViewGitFile(mlton,master,mlton/backend/rssa.fun)>
20538
20539 == Type Checking ==
20540
20541 The new type language is aimed at expressing bit-level control over
20542 layout and associated packing of data representations.  There are
20543 singleton types that denote constants, other atomic types for things
20544 like integers and reals, and arbitrary sum types and sequence (tuple)
20545 types.  The big change to the type system is that type checking is now
20546 based on subtyping, not type equality.  So, for example, the singleton
20547 type `0xFFFFEEBB` whose only inhabitant is the eponymous constant is a
20548 subtype of the type `Word32`.
20549
20550 == Details and Notes ==
20551
20552 SSA is an abbreviation for Static Single Assignment.  The <:RSSA:>
20553 <:IntermediateLanguage:> is a variant of SSA.
20554
20555 <<<
20556
20557 :mlton-guide-page: RSSAShrink
20558 [[RSSAShrink]]
20559 RSSAShrink
20560 ==========
20561
20562 <:RSSAShrink:> is an optimization pass for the <:RSSA:>
20563 <:IntermediateLanguage:>.
20564
20565 == Description ==
20566
20567 This pass implements a whole family of compile-time reductions, like:
20568
20569 * constant folding, copy propagation
20570 * inline the `Goto` to a block with a unique predecessor
20571
20572 == Implementation ==
20573
20574 * <!ViewGitFile(mlton,master,mlton/backend/rssa.fun)>
20575
20576 == Details and Notes ==
20577
20578 {empty}
20579
20580 <<<
20581
20582 :mlton-guide-page: RSSASimplify
20583 [[RSSASimplify]]
20584 RSSASimplify
20585 ============
20586
20587 The optimization passes for the <:RSSA:> <:IntermediateLanguage:> are
20588 collected and controlled by the `Backend` functor
20589 (<!ViewGitFile(mlton,master,mlton/backend/backend.sig)>,
20590 <!ViewGitFile(mlton,master,mlton/backend/backend.fun)>).
20591
20592 The following optimization pass is implemented:
20593
20594 * <:RSSAShrink:>
20595
20596 The following implementation passes are implemented:
20597
20598 * <:ImplementHandlers:>
20599 * <:ImplementProfiling:>
20600 * <:InsertLimitChecks:>
20601 * <:InsertSignalChecks:>
20602
20603 The optimization passes can be controlled from the command-line by the options
20604
20605 * `-diag-pass <pass>` -- keep diagnostic info for pass
20606 * `-drop-pass <pass>` -- omit optimization pass
20607 * `-keep-pass <pass>` -- keep the results of pass
20608
20609 <<<
20610
20611 :mlton-guide-page: RunningOnAIX
20612 [[RunningOnAIX]]
20613 RunningOnAIX
20614 ============
20615
20616 MLton runs fine on AIX.
20617
20618 == Also see ==
20619
20620 * <:RunningOnPowerPC:>
20621 * <:RunningOnPowerPC64:>
20622
20623 <<<
20624
20625 :mlton-guide-page: RunningOnAlpha
20626 [[RunningOnAlpha]]
20627 RunningOnAlpha
20628 ==============
20629
20630 MLton runs fine on the Alpha architecture.
20631
20632 == Notes ==
20633
20634 * When compiling for Alpha, MLton doesn't support native code
20635 generation (`-codegen native`).  Hence, performance is not as good as
20636 it might be and compile times are longer.  Also, the quality of code
20637 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
20638 You can change this by calling MLton with `-cc-opt -O2`.
20639
20640 * When compiling for Alpha, MLton uses `-align 8` by default.
20641
20642 <<<
20643
20644 :mlton-guide-page: RunningOnAMD64
20645 [[RunningOnAMD64]]
20646 RunningOnAMD64
20647 ==============
20648
20649 MLton runs fine on the AMD64 (aka "x86-64" or "x64") architecture.
20650
20651 == Notes ==
20652
20653 * When compiling for AMD64, MLton targets the 64-bit ABI.
20654
20655 * On AMD64, MLton supports native code generation (`-codegen native` or `-codegen amd64`).
20656
20657 * When compiling for AMD64, MLton uses `-align 8` by default.  Using
20658 `-align 4` may be incompatible with optimized builds of the <:GnuMP:>
20659 library, which assume 8-byte alignment.  (See the thread at
20660 http://www.mlton.org/pipermail/mlton/2009-October/030674.html for more
20661 details.)
20662
20663 <<<
20664
20665 :mlton-guide-page: RunningOnARM
20666 [[RunningOnARM]]
20667 RunningOnARM
20668 ============
20669
20670 MLton runs fine on the ARM architecture.
20671
20672 == Notes ==
20673
20674 * When compiling for ARM, MLton doesn't support native code generation
20675 (`-codegen native`).  Hence, performance is not as good as it might be
20676 and compile times are longer.  Also, the quality of code generated by
20677 `gcc` is important.  By default, MLton calls `gcc -O1`.  You can
20678 change this by calling MLton with `-cc-opt -O2`.
20679
20680 <<<
20681
20682 :mlton-guide-page: RunningOnCygwin
20683 [[RunningOnCygwin]]
20684 RunningOnCygwin
20685 ===============
20686
20687 MLton runs on the http://www.cygwin.com/[Cygwin] emulation layer,
20688 which provides a Posix-like environment while running on Windows.  To
20689 run MLton with Cygwin, you must first install Cygwin on your Windows
20690 machine.  To do this, visit the Cygwin site from your Windows machine
20691 and run their `setup.exe` script.  Then, you can unpack the MLton
20692 binary `tgz` in your Cygwin environment.
20693
20694 To run MLton cross-compiled executables on Windows, you must install
20695 the Cygwin `dll` on the Windows machine.
20696
20697 == Known issues ==
20698
20699 * Time profiling is disabled.
20700
20701 * Cygwin's `mmap` emulation is less than perfect.  Sometimes it
20702 interacts badly with `Posix.Process.fork`.
20703
20704 * The <!RawGitFile(mlton,master,regression/socket.sml)> regression
20705 test fails.  We suspect this is not a bug and is simply due to our
20706 test relying on a certain behavior when connecting to a socket that
20707 has not yet accepted, which is handled differently on Cygwin than
20708 other platforms.  Any help in understanding and resolving this issue
20709 is appreciated.
20710
20711 == Also see ==
20712
20713 * <:RunningOnMinGW:RunningOnMinGW>
20714
20715 <<<
20716
20717 :mlton-guide-page: RunningOnDarwin
20718 [[RunningOnDarwin]]
20719 RunningOnDarwin
20720 ===============
20721
20722 MLton runs fine on Darwin (and on Mac OS X).
20723
20724 == Notes ==
20725
20726 * MLton requires the <:GnuMP:> library, which is available via
20727 http://www.finkproject.org[Fink], http://www.macports.com[MacPorts],
20728 or http://mxcl.github.io/homebrew/[Homebrew].
20729
20730 * For Intel-based Macs, MLton targets the <:RunningOnAMD64:AMD64
20731 architecture> on Darwin 10 (Mac OS X Snow Leopard) and higher and
20732 targets the <:RunningOnX86:x86 architecture> on Darwin 8 (Mac OS X
20733 Tiger) and Darwin 9 (Mac OS X Leopard).
20734
20735 == Known issues ==
20736
20737 * Executables that save and load worlds on Darwin 11 (Mac OS X Lion)
20738 and higher should be compiled with `-link-opt -fno-PIE`; see
20739 <:MLtonWorld:> for more details.
20740
20741 * <:ProfilingTime:> may give inaccurate results on multi-processor
20742 machines.  The `SIGPROF` signal, used to sample the profiled program,
20743 is supposed to be delivered 100 times a second (i.e., at 10000us
20744 intervals), but there can be delays of over 1 minute between the
20745 delivery of consecutive `SIGPROF` signals.  A more complete
20746 description may be found
20747 http://lists.apple.com/archives/Unix-porting/2007/Aug/msg00000.html[here]
20748 and
20749 http://lists.apple.com/archives/Darwin-dev/2007/Aug/msg00045.html[here].
20750
20751 == Also see ==
20752
20753 * <:RunningOnAMD64:>
20754 * <:RunningOnPowerPC:>
20755 * <:RunningOnX86:>
20756
20757 <<<
20758
20759 :mlton-guide-page: RunningOnFreeBSD
20760 [[RunningOnFreeBSD]]
20761 RunningOnFreeBSD
20762 ================
20763
20764 MLton runs fine on http://www.freebsd.org/[FreeBSD].
20765
20766 == Notes ==
20767
20768 * MLton is available as a http://www.freebsd.org/[FreeBSD]
20769 http://www.freebsd.org/cgi/ports.cgi?query=mlton&stype=all[port].
20770
20771 == Known issues ==
20772
20773 * Executables often run more slowly than on a comparable Linux
20774 machine.  We conjecture that part of this is due to the cost of heap
20775 resizing and kernel zeroing of pages.  Any help in solving the problem
20776 would be appreciated.
20777
20778 * FreeBSD defaults to a datasize limit of 512M, even if the machine
20779 has more memory than that. Hence, the memory available to your MLton
20780 process is limited. To fix this problem, raise both the maximum
20781 datasize and the default datasize available to a process by editing
20782 `/boot/loader.conf`. For example, the setting
20783 +
20784 ----
20785    kern.maxdsiz="671088640"
20786    kern.dfldsiz="671088640"
20787    kern.maxssiz="134217728"
20788 ----
20789 +
20790 allows a process up to 640M of datasize memory, makes 640M the default,
20791 and sets the maximum stack size to 128M.
20792
20793 <<<
20794
20795 :mlton-guide-page: RunningOnHPPA
20796 [[RunningOnHPPA]]
20797 RunningOnHPPA
20798 =============
20799
20800 MLton runs fine on the HPPA architecture.
20801
20802 == Notes ==
20803
20804 * When compiling for HPPA, MLton targets the 32-bit HPPA architecture.
20805
20806 * When compiling for HPPA, MLton doesn't support native code
20807 generation (`-codegen native`).  Hence, performance is not as good as
20808 it might be and compile times are longer.  Also, the quality of code
20809 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
20810 You can change this by calling MLton with `-cc-opt -O2`.
20811
20812 * When compiling for HPPA, MLton uses `-align 8` by default.  While
20813 this speeds up reals, it also may increase object sizes.  If your
20814 program does not make significant use of reals, you might see a
20815 speedup with `-align 4`.
20816
20817 <<<
20818
20819 :mlton-guide-page: RunningOnHPUX
20820 [[RunningOnHPUX]]
20821 RunningOnHPUX
20822 =============
20823
20824 MLton runs fine on HPUX.
20825
20826 == Also see ==
20827
20828 * <:RunningOnHPPA:>
20829
20830 <<<
20831
20832 :mlton-guide-page: RunningOnIA64
20833 [[RunningOnIA64]]
20834 RunningOnIA64
20835 =============
20836
20837 MLton runs fine on the IA64 architecture.
20838
20839 == Notes ==
20840
20841 * When compiling for IA64, MLton targets the 64-bit ABI.
20842
20843 * When compiling for IA64, MLton doesn't support native code
20844 generation (`-codegen native`).  Hence, performance is not as good as
20845 it might be and compile times are longer.  Also, the quality of code
20846 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
20847 You can change this by calling MLton with `-cc-opt -O2`.
20848
20849 * When compiling for IA64, MLton uses `-align 8` by default.
20850
20851 * On the IA64, the <:GnuMP:> library supports multiple ABIs.  See the
20852 <:GnuMP:> page for more details.
20853
20854 <<<
20855
20856 :mlton-guide-page: RunningOnLinux
20857 [[RunningOnLinux]]
20858 RunningOnLinux
20859 ==============
20860
20861 MLton runs fine on Linux.
20862
20863 <<<
20864
20865 :mlton-guide-page: RunningOnMinGW
20866 [[RunningOnMinGW]]
20867 RunningOnMinGW
20868 ==============
20869
20870 MLton runs on http://mingw.org[MinGW], a library for porting Unix
20871 applications to Windows.  Some library functionality is missing or
20872 changed.
20873
20874 == Notes ==
20875
20876 * To compile MLton on MinGW:
20877 ** The <:GnuMP:> library is required.
20878 ** The Bash shell is required.  If you are using a prebuilt MSYS, you
20879 probably want to symlink `bash` to `sh`.
20880
20881 == Known issues ==
20882
20883 * Many functions are unimplemented and will `raise SysErr`.
20884 ** `MLton.Itimer.set`
20885 ** `MLton.ProcEnv.setgroups`
20886 ** `MLton.Process.kill`
20887 ** `MLton.Process.reap`
20888 ** `MLton.World.load`
20889 ** `OS.FileSys.readLink`
20890 ** `OS.IO.poll`
20891 ** `OS.Process.terminate`
20892 ** `Posix.FileSys.chown`
20893 ** `Posix.FileSys.fchown`
20894 ** `Posix.FileSys.fpathconf`
20895 ** `Posix.FileSys.link`
20896 ** `Posix.FileSys.mkfifo`
20897 ** `Posix.FileSys.pathconf`
20898 ** `Posix.FileSys.readlink`
20899 ** `Posix.FileSys.symlink`
20900 ** `Posix.IO.dupfd`
20901 ** `Posix.IO.getfd`
20902 ** `Posix.IO.getfl`
20903 ** `Posix.IO.getlk`
20904 ** `Posix.IO.setfd`
20905 ** `Posix.IO.setfl`
20906 ** `Posix.IO.setlkw`
20907 ** `Posix.IO.setlk`
20908 ** `Posix.ProcEnv.ctermid`
20909 ** `Posix.ProcEnv.getegid`
20910 ** `Posix.ProcEnv.geteuid`
20911 ** `Posix.ProcEnv.getgid`
20912 ** `Posix.ProcEnv.getgroups`
20913 ** `Posix.ProcEnv.getlogin`
20914 ** `Posix.ProcEnv.getpgrp`
20915 ** `Posix.ProcEnv.getpid`
20916 ** `Posix.ProcEnv.getppid`
20917 ** `Posix.ProcEnv.getuid`
20918 ** `Posix.ProcEnv.setgid`
20919 ** `Posix.ProcEnv.setpgid`
20920 ** `Posix.ProcEnv.setsid`
20921 ** `Posix.ProcEnv.setuid`
20922 ** `Posix.ProcEnv.sysconf`
20923 ** `Posix.ProcEnv.times`
20924 ** `Posix.ProcEnv.ttyname`
20925 ** `Posix.Process.exece`
20926 ** `Posix.Process.execp`
20927 ** `Posix.Process.exit`
20928 ** `Posix.Process.fork`
20929 ** `Posix.Process.kill`
20930 ** `Posix.Process.pause`
20931 ** `Posix.Process.waitpid_nh`
20932 ** `Posix.Process.waitpid`
20933 ** `Posix.SysDB.getgrgid`
20934 ** `Posix.SysDB.getgrnam`
20935 ** `Posix.SysDB.getpwuid`
20936 ** `Posix.TTY.TC.drain`
20937 ** `Posix.TTY.TC.flow`
20938 ** `Posix.TTY.TC.flush`
20939 ** `Posix.TTY.TC.getattr`
20940 ** `Posix.TTY.TC.getpgrp`
20941 ** `Posix.TTY.TC.sendbreak`
20942 ** `Posix.TTY.TC.setattr`
20943 ** `Posix.TTY.TC.setpgrp`
20944 ** `Unix.kill`
20945 ** `Unix.reap`
20946 ** `UnixSock.fromAddr`
20947 ** `UnixSock.toAddr`
20948
20949 <<<
20950
20951 :mlton-guide-page: RunningOnNetBSD
20952 [[RunningOnNetBSD]]
20953 RunningOnNetBSD
20954 ===============
20955
20956 MLton runs fine on http://www.netbsd.org/[NetBSD].
20957
20958 == Installing the correct packages for NetBSD ==
20959
20960 The NetBSD system installs 3rd party packages by a mechanism known as
20961 pkgsrc. This is a tree of Makefiles which when invoked downloads the
20962 source code, builds a package and installs it on the system. In order
20963 to run MLton on NetBSD, you will have to install several packages for
20964 it to work:
20965
20966 * `shells/bash`
20967
20968 * `devel/gmp`
20969
20970 * `devel/gmake`
20971
20972 In order to get graphical call-graphs of profiling information, you
20973 will need the additional package
20974
20975 * `graphics/graphviz`
20976
20977 To build the documentation for MLton, you will need the additional
20978 package
20979
20980 * `textproc/asciidoc`.
20981
20982 == Tips for compiling and using MLton on NetBSD ==
20983
20984 MLton can be a memory hog on computers with little memory.  While
20985 640MB of RAM ought to be enough to self-compile MLton, one might want
20986 to tune the NetBSD VM subsystem in order to succeed.  The
20987 notes presented here are what <:JesperLouisAndersen:> uses for
20988 compiling MLton on his laptop.
20989
20990 === The NetBSD VM subsystem ===
20991
20992 NetBSD uses a VM subsystem named
20993 http://www.ccrc.wustl.edu/pub/chuck/tech/uvm/[UVM].
20994 http://www.selonen.org/arto/netbsd/vm_tune.html[Tuning the VM system]
20995 can be done via the `sysctl(8)`-interface with the "VM" MIB set.
20996
20997 === Tuning the NetBSD VM subsystem for MLton ===
20998
20999 MLton uses a lot of anonymous pages when it is running. Thus, we will
21000 need to tune up the default of 80 for anonymous pages.  Setting
21001
21002 ----
21003 sysctl -w vm.anonmax=95
21004 sysctl -w vm.anonmin=50
21005 sysctl -w vm.filemin=2
21006 sysctl -w vm.execmin=2
21007 sysctl -w vm.filemax=4
21008 sysctl -w vm.execmax=4
21009 ----
21010
21011 makes it less likely for the VM system to swap out anonymous pages.
21012 For a full explanation of the above flags, see the documentation.
21013
21014 The result is that my laptop goes from a MLton compile where it swaps
21015 a lot to a MLton compile with no swapping.
21016
21017 <<<
21018
21019 :mlton-guide-page: RunningOnOpenBSD
21020 [[RunningOnOpenBSD]]
21021 RunningOnOpenBSD
21022 ================
21023
21024 MLton runs fine on http://www.openbsd.org/[OpenBSD].
21025
21026 == Known issues ==
21027
21028 * The <!RawGitFile(mlton,master,regression/socket.sml)> regression
21029 test fails.  We suspect this is not a bug and is simply due to our
21030 test relying on a certain behavior when connecting to a socket that
21031 has not yet accepted, which is handled differently on OpenBSD than
21032 other platforms.  Any help in understanding and resolving this issue
21033 is appreciated.
21034
21035 <<<
21036
21037 :mlton-guide-page: RunningOnPowerPC
21038 [[RunningOnPowerPC]]
21039 RunningOnPowerPC
21040 ================
21041
21042 MLton runs fine on the PowerPC architecture.
21043
21044 == Notes ==
21045
21046 * When compiling for PowerPC, MLton targets the 32-bit PowerPC
21047 architecture.
21048
21049 * When compiling for PowerPC, MLton doesn't support native code
21050 generation (`-codegen native`).  Hence, performance is not as good as
21051 it might be and compile times are longer.  Also, the quality of code
21052 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
21053 You can change this by calling MLton with `-cc-opt -O2`.
21054
21055 * On the PowerPC, the <:GnuMP:> library supports multiple ABIs.  See
21056 the <:GnuMP:> page for more details.
21057
21058 <<<
21059
21060 :mlton-guide-page: RunningOnPowerPC64
21061 [[RunningOnPowerPC64]]
21062 RunningOnPowerPC64
21063 ==================
21064
21065 MLton runs fine on the PowerPC64 architecture.
21066
21067 == Notes ==
21068
21069 * When compiling for PowerPC64, MLton targets the 64-bit PowerPC
21070 architecture.
21071
21072 * When compiling for PowerPC64, MLton doesn't support native code
21073 generation (`-codegen native`).  Hence, performance is not as good as
21074 it might be and compile times are longer.  Also, the quality of code
21075 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
21076 You can change this by calling MLton with `-cc-opt -O2`.
21077
21078 * On the PowerPC64, the <:GnuMP:> library supports multiple ABIs.  See
21079 the <:GnuMP:> page for more details.
21080
21081 <<<
21082
21083 :mlton-guide-page: RunningOnS390
21084 [[RunningOnS390]]
21085 RunningOnS390
21086 =============
21087
21088 MLton runs fine on the S390 architecture.
21089
21090 == Notes ==
21091
21092 * When compiling for S390, MLton doesn't support native code
21093 generation (`-codegen native`).  Hence, performance is not as good as
21094 it might be and compile times are longer.  Also, the quality of code
21095 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
21096 You can change this by calling MLton with `-cc-opt -O2`.
21097
21098 <<<
21099
21100 :mlton-guide-page: RunningOnSolaris
21101 [[RunningOnSolaris]]
21102 RunningOnSolaris
21103 ================
21104
21105 MLton runs fine on Solaris.
21106
21107 == Notes ==
21108
21109 * You must install the `binutils`, `gcc`, and `make` packages.  You
21110 can find out how to get these at
21111 http://www.sunfreeware.com[sunfreeware.com].
21112
21113 * Making the documentation requires that you install `latex` and
21114 `dvips`, which are available in the `tetex` package.
21115
21116 == Known issues ==
21117
21118 * Bootstrapping on the <:RunningOnSparc:Sparc architecture> is so slow
21119 as to be impractical (many hours on a 500MHz UltraSparc).  For this
21120 reason, we strongly recommend building with a
21121 <:CrossCompiling:cross compiler>.
21122
21123 == Also see ==
21124
21125 * <:RunningOnAMD64:>
21126 * <:RunningOnSparc:>
21127 * <:RunningOnX86:>
21128
21129 <<<
21130
21131 :mlton-guide-page: RunningOnSparc
21132 [[RunningOnSparc]]
21133 RunningOnSparc
21134 ==============
21135
21136 MLton runs fine on the Sparc architecture.
21137
21138 == Notes ==
21139
21140 * When compiling for Sparc, MLton targets the 32-bit Sparc
21141 architecture (i.e., Sparc V8).
21142
21143 * When compiling for Sparc, MLton doesn't support native code
21144 generation (`-codegen native`).  Hence, performance is not as good as
21145 it might be and compile times are longer.  Also, the quality of code
21146 generated by `gcc` is important.  By default, MLton calls `gcc -O1`.
21147 You can change this by calling MLton with `-cc-opt -O2`.  We have seen
21148 this speed up some programs by as much as 30%, especially those
21149 involving floating point; however, it can also more than double
21150 compile times.
21151
21152 * When compiling for Sparc, MLton uses `-align 8` by default.  While
21153 this speeds up reals, it also may increase object sizes.  If your
21154 program does not make significant use of reals, you might see a
21155 speedup with `-align 4`.
21156
21157 == Known issues ==
21158
21159 * Bootstrapping on the <:RunningOnSparc:Sparc architecture> is so slow
21160 as to be impractical (many hours on a 500MHz UltraSparc).  For this
21161 reason, we strongly recommend building with a
21162 <:CrossCompiling:cross compiler>.
21163
21164 == Also see ==
21165
21166 * <:RunningOnSolaris:>
21167
21168 <<<
21169
21170 :mlton-guide-page: RunningOnX86
21171 [[RunningOnX86]]
21172 RunningOnX86
21173 ============
21174
21175 MLton runs fine on the x86 architecture.
21176
21177 == Notes ==
21178
21179 * On x86, MLton supports native code generation (`-codegen native` or
21180 `-codegen x86`).
21181
21182 <<<
21183
21184 :mlton-guide-page: RunTimeOptions
21185 [[RunTimeOptions]]
21186 RunTimeOptions
21187 ==============
21188
21189 Executables produced by MLton take command line arguments that control
21190 the runtime system.  These arguments are optional, and occur before
21191 the executable's usual arguments.  To use these options, the first
21192 argument to the executable must be `@MLton`.  The optional arguments
21193 then follow, must be terminated by `--`, and are followed by any
21194 arguments to the program.  The optional arguments are _not_ made
21195 available to the SML program via `CommandLine.arguments`.  For
21196 example, a valid call to `hello-world` is:
21197
21198 ----
21199 hello-world @MLton gc-summary fixed-heap 10k -- a b c
21200 ----
21201
21202 In the above example,
21203 `CommandLine.arguments () = ["a", "b", "c"]`.
21204
21205 It is allowed to have a sequence of `@MLton` arguments, as in:
21206
21207 ----
21208 hello-world @MLton gc-summary -- @MLton fixed-heap 10k -- a b c
21209 ----
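
The argument convention described above can be modelled in a few lines
of SML.  This is only an illustrative sketch: the real processing is
done by MLton's runtime system, the function name `splitRuntimeArgs`
is hypothetical, and treating a group with a missing `--` as
consisting entirely of options is an assumption of the sketch (the
`stop` option is not modelled).

[source,sml]
----
(* Hypothetical model of the "@MLton ... --" convention.  Given the
 * arguments after the program name, return the runtime options and
 * the arguments left over for the SML program. *)
fun splitRuntimeArgs (args: string list): string list * string list =
   let
      (* Collect one group of options, up to its terminating "--".
       * Options are accumulated in reverse order. *)
      fun group (acc, rest) =
         case rest of
            [] => (acc, [])
          | "--" :: rest' => (acc, rest')
          | opt :: rest' => group (opt :: acc, rest')
      (* Consume consecutive "@MLton ... --" groups. *)
      fun loop (acc, rest) =
         case rest of
            "@MLton" :: rest' => loop (group (acc, rest'))
          | _ => (rev acc, rest)
   in
      loop ([], args)
   end

val (opts, progArgs) =
   splitRuntimeArgs ["@MLton", "gc-summary", "--",
                     "@MLton", "fixed-heap", "10k", "--", "a", "b", "c"]
(* opts     = ["gc-summary", "fixed-heap", "10k"]
 * progArgs = ["a", "b", "c"] *)
----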
21210
21211 Run-time options can also control MLton, as in
21212
21213 ----
21214 mlton @MLton fixed-heap 0.5g -- foo.sml
21215 ----
21216
21217
21218 == Options ==
21219
21220 * ++fixed-heap __x__{k|K|m|M|g|G}++
21221 +
21222 Use a fixed size heap of size _x_, where _x_ is a real number and the
21223 trailing letter indicates its units.
21224 +
21225 [cols="^25%,<75%"]
21226 |====
21227 | `k` or `K` | 1024
21228 | `m` or `M` | 1,048,576
21229 | `g` or `G` | 1,073,741,824
21230 |====
21231 +
21232 A value of `0` means to use almost all the RAM present on the machine.
21233 +
21234 The heap size used by `fixed-heap` includes all memory allocated by
21235 SML code, including memory for the stack (or stacks, if there are
21236 multiple threads).  It does not, however, include any memory used for
21237 code itself or memory used by C globals, the C stack, or malloc.
21238
21239 * ++gc-messages++
21240 +
21241 Print a message at the start and end of every garbage collection.
21242
21243 * ++gc-summary++
21244 +
21245 Print a summary of garbage collection statistics upon program
21246 termination to standard error.
21247
21248 * ++gc-summary-file __file__++
21249 +
21250 Print a summary of garbage collection statistics upon program
21251 termination to the file specified by _file_.
21252
21253 * ++load-world __world__++
21254 +
21255 Restart the computation with the file specified by _world_, which must
21256 have been created by a call to `MLton.World.save` by the same
21257 executable.  See <:MLtonWorld:>.
21258
21259 * ++max-heap __x__{k|K|m|M|g|G}++
21260 +
21261 Run the computation with an automatically resized heap that is never
21262 larger than _x_, where _x_ is a real number and the trailing letter
21263 indicates the units as with `fixed-heap`.  The heap size for
21264 `max-heap` is accounted for as with `fixed-heap`.
21265
21266 * ++may-page-heap {false|true}++
21267 +
21268 Enable paging the heap to disk when unable to grow the heap to a
21269 desired size.
21270
21271 * ++no-load-world++
21272 +
21273 Disable `load-world`.  This can be used as an argument to the compiler
21274 via `-runtime no-load-world` to create executables that will not load
21275 a world.  This may be useful to ensure that set-uid executables do not
21276 load some strange world.
21277
21278 * ++ram-slop __x__++
21279 +
21280 Multiply _x_ by the amount of RAM on the machine to obtain what the
21281 runtime views as the amount of RAM it can use.  Typically _x_ is less
21282 than 1, and is used to account for space used by other programs
21283 running on the same machine.
21284
21285 * ++stop++
21286 +
21287 Causes the runtime to stop processing `@MLton` arguments once the next
21288 `--` is reached.  This can be used as an argument to the compiler via
21289 `-runtime stop` to create executables that don't process any `@MLton`
21290 arguments.
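
The size arguments accepted by `fixed-heap` and `max-heap` are a real
number followed by a unit letter.  The following is a minimal sketch
of that conversion, not the runtime's actual parser; the function name
`parseSize` is hypothetical, and rejecting an argument without a unit
letter is an assumption of the sketch.

[source,sml]
----
(* Hypothetical helper: convert a size such as "10k" or "0.5g" to a
 * number of bytes, using the unit table given for fixed-heap. *)
fun parseSize (s: string): real option =
   let
      val n = String.size s
   in
      if n = 0
         then NONE
      else
         let
            val unit =
               case Char.toLower (String.sub (s, n - 1)) of
                  #"k" => SOME 1024.0
                | #"m" => SOME 1048576.0
                | #"g" => SOME 1073741824.0
                | _ => NONE
            (* Real.fromString parses the longest numeric prefix, so
             * this sketch is more permissive than a strict parser. *)
            val number = Real.fromString (String.substring (s, 0, n - 1))
         in
            case (unit, number) of
               (SOME u, SOME x) => SOME (x * u)
             | _ => NONE
         end
   end

val tenK = parseSize "10k"     (* SOME 10240.0 *)
val halfG = parseSize "0.5g"   (* SOME 536870912.0 *)
----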
21291
21292 <<<
21293
21294 :mlton-guide-page: ScopeInference
21295 [[ScopeInference]]
21296 ScopeInference
21297 ==============
21298
21299 Scope inference is an analysis/rewrite pass for the <:AST:>
21300 <:IntermediateLanguage:>, invoked from <:Elaborate:>.
21301
21302 == Description ==
21303
21304 This pass adds free type variables to the `val` or `fun`
21305 declaration where they are implicitly scoped.
21306
21307 == Implementation ==
21308
21309 <!ViewGitFile(mlton,master,mlton/elaborate/scope.sig)>
21310 <!ViewGitFile(mlton,master,mlton/elaborate/scope.fun)>
21311
21312 == Details and Notes ==
21313
21314 Scope inference determines for each type variable, the declaration
21315 where it is bound.  Scope inference is a direct implementation of the
21316 specification given in section 4.6 of the
21317 <:DefinitionOfStandardML: Definition>.  Recall that a free occurrence
21318 of a type variable `'a` in a declaration `d` is _unguarded_
21319 in `d` if `'a` is not part of a smaller declaration.  A type
21320 variable `'a` is implicitly scoped at `d` if `'a` is
21321 unguarded in `d` and `'a` does not occur unguarded in any
21322 declaration containing `d`.
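
As an illustration of these rules (the declarations `pair` and `f` are
made up for this example), compare where `'a` is scoped in the
following two programs.

[source,sml]
----
(* Every occurrence of 'a lies inside the smaller declaration
 * `val id`, so 'a is not unguarded in `val pair`.  It is implicitly
 * scoped at `val id`, and `id` is polymorphic. *)
val pair =
   let
      val id: 'a -> 'a = fn z => z
   in
      (id 1, id "one")
   end

(* Here 'a occurs unguarded in `fun f` (in the pattern `x: 'a`), so it
 * is implicitly scoped at `fun f`.  Within the body, `id` is
 * monomorphic in 'a: `id x` is fine, but `id 1` would be rejected. *)
fun f (x: 'a) =
   let
      val id: 'a -> 'a = fn z => z
   in
      id x
   end
----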
21323
21324 The first pass of scope inference walks down the tree and renames all
21325 explicitly bound type variables in order to avoid name collisions.  It
21326 then walks up the tree and adds to each declaration the set of
21327 unguarded type variables occurring in that declaration.  At this
21328 point, if declaration `d` contains an unguarded type variable
21329 `'a` and the immediately containing declaration does not contain
21330 `'a`, then `'a` is implicitly scoped at `d`.  The final
21331 pass walks down the tree leaving `'a` at the declaration where
21332 it is scoped and removing it from all enclosed declarations.
21333
21334 <<<
21335
21336 :mlton-guide-page: SelfCompiling
21337 [[SelfCompiling]]
21338 SelfCompiling
21339 =============
21340
21341 If you want to compile MLton, you must first get the <:Sources:>. You
21342 can compile with either MLton or SML/NJ, but we strongly recommend
21343 using MLton, since it generates a much faster and more robust
21344 executable.
21345
21346 == Compiling with MLton ==
21347
21348 To compile with MLton, you need the binary versions of `mlton`,
21349 `mllex`, and `mlyacc` that come with the MLton binary package.  To be
21350 safe, you should use the same version of MLton that you are building.
21351 However, older versions may work, as long as they don't go back too
21352 far.  To build MLton, run `make` from within the root directory of the
21353 sources.  This will build MLton first with the already installed
21354 binary version of MLton and will then rebuild MLton with itself.
21355
21356 First, the `Makefile` calls `mllex` and `mlyacc` to build the lexer
21357 and parser, and then calls `mlton` to compile itself.  When making
21358 MLton using another version the `Makefile` automatically uses
21359 `mlton-stubs.mlb`, which will put in enough stubs to emulate the
21360 `structure MLton`.  Once MLton is built, the `Makefile` will rebuild
21361 MLton with itself, this time using `mlton.mlb` and the real
21362 `structure MLton` from the <:BasisLibrary:Basis Library>.  This second round
21363 of compilation is essential in order to achieve a fast and robust
21364 MLton.
21365
21366 Compiling MLton requires at least 1GB of RAM for 32-bit platforms (2GB is
21367 preferable) and at least 2GB RAM for 64-bit platforms (4GB is preferable).
21368 If your machine has less RAM, self-compilation will
21369 likely fail, or at least take a very long time due to paging.  Even if
21370 you have enough memory, there simply may not be enough available, due
21371 to memory consumed by other processes.  In this case, you may see an
21372 `Out of memory` message, or self-compilation may become extremely
21373 slow.  The only fix is to make sure that enough memory is available.
21374
21375 === Possible Errors ===
21376
21377 * The C compiler may not be able to find the <:GnuMP:> header file,
21378 `gmp.h`, leading to an error like the following.
21379 +
21380 ----
21381   cenv.h:49:18: fatal error: gmp.h: No such file or directory
21382 ----
21383 +
21384 The solution is to install (or build) GnuMP on your machine.  If you
21385 install it at a location not on the default search path, then run
21386 ++make WITH_GMP_INC_DIR=__/path/to/gmp/include__ WITH_GMP_LIB_DIR=__/path/to/gmp/lib__++.
21387
21388 * The following errors indicate that a binary version of MLton could
21389 not be found in your path.
21390 +
21391 ----
21392 /bin/sh: mlton: command not found
21393 ----
21394 +
21395 ----
21396 make[2]: mlton: Command not found
21397 ----
21398 +
21399 You need to have `mlton` in your path to build MLton from source.
21400 +
21401 During the build process, there are various times that the `Makefile`-s
21402 look for a `mlton` in your path and in `src/build/bin`.  It is OK if
21403 the latter doesn't exist when the build starts; it is the target being
21404 built.  Failure to find a `mlton` in your path will abort the build.
21405
21406
21407 == Compiling with SML/NJ ==
21408
21409 To compile with SML/NJ, run `make bootstrap-smlnj` from within the
21410 root directory of the sources.  You must use a recent version of
21411 SML/NJ.  First, the `Makefile` calls `ml-lex` and `ml-yacc` to build
21412 the lexer and parser.  Then, it calls SML/NJ with the appropriate
21413 `sources.cm` file.  Once MLton is built with SML/NJ, the `Makefile`
21414 will rebuild MLton with this SML/NJ built MLton and then will rebuild
21415 MLton with the MLton built MLton.  Building with SML/NJ takes
21416 significant time (particularly during the "`parseAndElaborate`" phase
21417 when the SML/NJ built MLton is compiling MLton).  Unless you are doing
21418 compiler development and need rapid recompilation, we recommend
21419 compiling with MLton.
21420
21421 <<<
21422
21423 :mlton-guide-page: Serialization
21424 [[Serialization]]
21425 Serialization
21426 =============
21427
21428 <:StandardML:Standard ML> does not have built-in support for
21429 serialization.  Here are papers that describe user-level approaches:
21430
21431 * <!Cite(Elsman04)>
21432 * <!Cite(Kennedy04)>
21433
21434 The MLton repository also contains an experimental generic programming
21435 library (see
21436 <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>) that
21437 includes a pickling (serialization) generic (see
21438 <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/public/value/pickle.sig)>).
21439
21440 <<<
21441
21442 :mlton-guide-page: ShareZeroVec
21443 [[ShareZeroVec]]
21444 ShareZeroVec
21445 ============
21446
21447 <:ShareZeroVec:> is an optimization pass for the <:SSA:>
21448 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
21449
21450 == Description ==
21451
21452 An SSA optimization to share zero-length vectors.
21453
21454 From <!ViewGitCommit(mlton,be8c5f576)>, which replaced the use of the
21455 `Array_array0Const` primitive in the Basis Library implementation with a
21456 (nullary) `Vector_vector` primitive:
21457
21458 ________
21459
21460 The original motivation for the `Array_array0Const` primitive was to share the
21461 heap space required for zero-length vectors among all vectors (of a given type).
21462 It was claimed that this optimization is important, e.g., in a self-compile,
21463 where vectors are used for lots of syntax tree elements and many of those
21464 vectors are empty. See:
21465 http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html
21466
21467 Curiously, the full effect of this optimization has been missing for quite some
21468 time (perhaps since the port of <:ConstantPropagation:> to the SSA IL).  While
21469 <:ConstantPropagation:> has "globalized" the nullary application of the
21470 `Array_array0Const` primitive, it also simultaneously transformed it to an
21471 application of the `Array_uninit` (previously, the `Array_array`) primitive to
21472 the zero constant.  The hash-consing of globals, meant to create exactly one
21473 global for each distinct constant, treats `Array_uninit` primitives as unequal
21474 (appropriately, since `Array_uninit` allocates an array with identity (though
21475 the identity may be suppressed by a subsequent `Array_toVector`)), hence each
21476 distinct `Array_array0Const` primitive in the program remained as distinct
21477 globals.  The limited amount of inlining prior to <:ConstantPropagation:> meant
21478 that there were typically fewer than a dozen "copies" of the same empty vector
21479 in a program for a given type.
21480
21481 As a "functional" primitive, a nullary `Vector_vector` is globalized by
21482 ClosureConvert, but is further recognized by ConstantPropagation and hash-consed
21483 into a unique instance for each type.
21484 ________
21485
21486 However, a single, shared, global `Vector_vector ()` inhibits the
21487 coercion-based optimizations of `Useless`.  For example, consider the
21488 following program:
21489
21490 [source,sml]
21491 ----
21492     val n = valOf (Int.fromString (hd (CommandLine.arguments ())))
21493
21494     val v1 = Vector.tabulate (n, fn i =>
21495                               let val w = Word16.fromInt i
21496                               in (w - 0wx1, w, w + 0wx1 + w)
21497                               end)
21498     val v2 = Vector.map (fn (w1, w2, w3) => (w1, 0wx2 * w2, 0wx3 * w3)) v1
21499     val v3 = VectorSlice.vector (VectorSlice.slice (v1, 1, SOME (n - 2)))
21500     val ans1 = Vector.foldl (fn ((w1,w2,w3),w) => w + w1 + w2 + w3) 0wx0 v1
21501     val ans2 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v2
21502     val ans3 = Vector.foldl (fn ((_,w2,_),w) => w + w2) 0wx0 v3
21503
21504     val _ = print (concat ["ans1 = ", Word16.toString ans1, "  ",
21505                            "ans2 = ", Word16.toString ans2, "  ",
21506                            "ans3 = ", Word16.toString ans3, "\n"])
21507 ----
21508
21509 We would like `v2` and `v3` to be optimized from
21510 `(word16 * word16 * word16) vector` to `word16 vector` because only
21511 the 2nd component of the elements is needed to compute the answer.
21512
21513 With `Array_array0Const`, each distinct occurrence of
21514 `Array_array0Const((word16 * word16 * word16))` arising from
21515 polyvariance and inlining remained a distinct
21516 `Array_uninit((word16 * word16 * word16)) (0x0)` global, which
21517 resulted in distinct occurrences for the
21518 `val v1 = Vector.tabulate ...` and for the
21519 `val v2 = Vector.map ...`. The latter could be optimized to
21520 `Array_uninit(word16) (0x0)` by `Useless`, because its result only
21521 flows to places requiring the 2nd component of the elements.
21522
21523 With `Vector_vector ()`, the distinct occurrences of
21524 `Vector_vector((word16 * word16 * word16)) ()` arising from
21525 polyvariance are globalized during `ClosureConvert`, those global
21526 references may be further duplicated by inlining, but the distinct
21527 occurrences of `Vector_vector((word16 * word16 * word16)) ()` are
21528 merged to a single occurrence.  Because this result flows to places
21529 requiring all three components of the elements, it remains
21530 `Vector_vector((word16 * word16 * word16)) ()` after
21531 `Useless`. Furthermore, because one cannot (in constant time) coerce a
21532 `(word16 * word16 * word16) vector` to a `word16 vector`, the `v2`
21533 value remains of type `(word16 * word16 * word16) vector`.
21534
21535 One option would be to drop the 0-element vector "optimization"
21536 entirely.  This costs some space (no sharing of empty vectors) and
21537 some time (allocation and garbage collection of empty vectors).
21538
21539 Another option would be to reinstate the `Array_array0Const` primitive
21540 and associated `ConstantPropagation` treatment.  But, the semantics
21541 and purpose of `Array_array0Const` were poorly understood, resulting in
21542 this break.
21543
21544 The <:ShareZeroVec:> pass pursues a different approach: perform the 0-element
21545 vector "optimization" as a separate optimization, after
21546 `ConstantPropagation` and `Useless`.  A trivial static analysis is
21547 used to match `val v: t vector = Array_toVector(t) (a)` with
21548 corresponding `val a: t array = Array_uninit(t) (l)` and the latter are
21549 expanded to
21550 `val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)`
21551 with a single global `val zeroArr_[t] = Array_uninit(t) (0)` created
21552 for each distinct type (after coercion-based optimizations).
21553
21554 One disadvantage of this approach, compared to the `Vector_vector(t) ()`
21555 approach, is that `Array_toVector` is applied each time a vector
21556 is created, even if it is being applied to the `zeroArr_[t]`
21557 zero-length array.  (Although, this was the behavior of the
21558 `Array_array0Const` approach.)  This updates the object header each
21559 time, whereas the `Vector_vector(t) ()` approach would have updated
21560 the object header once, when the global was created, and the
21561 `zeroVec_[t]` global and the `Array_toVector` result would flow to the
21562 join point.
21563
21564 It would be possible to properly share zero-length vectors, but doing
21565 so is a more sophisticated analysis and transformation, because there
21566 can be arbitrary code between the
21567 `val a: t array = Array_uninit(t) (l)` and the corresponding
21568 `val v: t vector = Array_toVector(t) (a)`, although, in practice,
21569 nothing happens when a zero-length vector is created.  It may be best
21570 to pursue a more general "array to vector" optimization that
21571 transforms creations of static-length vectors (e.g., all the
21572 `Vector.new<N>` functions) into `Vector_vector` primitives (some of
21573 which could be globalized).
21574
21575 == Implementation ==
21576
21577 * <!ViewGitFile(mlton,master,mlton/ssa/share-zero-vec.fun)>
21578
21579 == Details and Notes ==
21580
21581 {empty}
21582
21583 <<<
21584
21585 :mlton-guide-page: ShowBasis
21586 [[ShowBasis]]
21587 ShowBasis
21588 =========
21589
21590 MLton has a flag, `-show-basis <file>`, that causes MLton to pretty
21591 print to _file_ the basis defined by the input program.  For example,
21592 if `foo.sml` contains
21593 [source,sml]
21594 ----
21595 fun f x = x + 1
21596 ----
21597 then `mlton -show-basis foo.basis foo.sml` will create `foo.basis`
21598 with the following contents.
21599 ----
21600 val f: int -> int
21601 ----
21602
21603 If you only want to see the basis and do not wish to compile the
21604 program, you can call MLton with `-stop tc`.
21605
21606 == Displaying signatures ==
21607
21608 When displaying signatures, MLton prefixes types defined in the
21609 signature with `_sig.` to distinguish them from types defined in the
21610 environment.  For example,
21611 [source,sml]
21612 ----
21613 signature SIG =
21614    sig
21615       type t
21616       val x: t * int -> unit
21617    end
21618 ----
21619 is displayed as
21620 ----
21621 signature SIG =
21622    sig
21623       type t
21624       val x: _sig.t * int -> unit
21625    end
21626 ----
21627
21628 Notice that `int` occurs without the `_sig.` prefix.
21629
21630 MLton also uses a canonical name for each type in the signature, and
21631 that name is used everywhere for that type, no matter what the input
21632 signature looked like.  For example:
21633 [source,sml]
21634 ----
21635 signature SIG =
21636    sig
21637       type t
21638       type u = t
21639       val x: t
21640       val y: u
21641    end
21642 ----
21643 is displayed as
21644 ----
21645 signature SIG =
21646    sig
21647       type t
21648       type u = _sig.t
21649       val x: _sig.t
21650       val y: _sig.t
21651    end
21652 ----
21653
21654 Canonical names are always relative to the "top" of the signature,
21655 even when used in nested substructures.  For example:
21656 [source,sml]
21657 ----
21658 signature S =
21659    sig
21660       type t
21661       val w: t
21662       structure U:
21663          sig
21664             type u
21665             val x: t
21666             val y: u
21667          end
21668       val z: U.u
21669    end
21670 ----
21671 is displayed as
21672 ----
21673 signature S =
21674    sig
21675       type t
21676       val w: _sig.t
21677       val z: _sig.U.u
21678       structure U:
21679          sig
21680             type u
21681             val x: _sig.t
21682             val y: _sig.U.u
21683          end
21684    end
21685 ----
21686
21687 == Displaying structures ==
21688
21689 When displaying structures, MLton uses signature constraints wherever
21690 possible, combined with `where type` clauses to specify the meanings
21691 of the types defined within the signature.  For example:
21692 [source,sml]
21693 ----
21694 signature SIG =
21695    sig
21696       type t
21697       val x: t
21698    end
21699 structure S: SIG =
21700    struct
21701       type t = int
21702       val x = 13
21703    end
21704 structure S2:> SIG = S
21705 ----
21706 is displayed as
21707 ----
21708 signature SIG =
21709    sig
21710       type t
21711       val x: _sig.t
21712    end
21713 structure S: SIG
21714              where type t = int
21715 structure S2: SIG
21716               where type t = S2.t
21717 ----
21718
21719 <<<
21720
21721 :mlton-guide-page: ShowBasisDirective
21722 [[ShowBasisDirective]]
21723 ShowBasisDirective
21724 ==================
21725
21726 A comment of the form `(*#showBasis "<file>"*)` is recognized as a directive to
21727 save the current basis (i.e., environment) to `<file>` (in the same format as
21728 the `-show-basis <file>` <:CompileTimeOptions: compile-time option>).  The
21729 `<file>` is interpreted relative to the source file in which it appears.  The
21730 comment is lexed as a distinct token and is parsed as a structure-level
21731 declaration.  [Note that treating the directive as a top-level declaration would
21732 prohibit using it inside a functor body, which would make the feature
21733 significantly less useful in the context of the MLton compiler sources (with its
21734 nearly fully functorial style).]
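
A minimal sketch of the directive inside a functor body follows.  The
functor `F`, its argument signature, and the file name `f-body.basis`
are hypothetical; only the comment syntax is taken from the
description above.

[source,sml]
----
functor F (S: sig
                 type t
                 val x: t
              end) =
   struct
      val y = S.x
      (* Save the basis visible at this point, including S and y, to a
       * file named relative to this source file.  Compilers that do
       * not recognize the directive see an ordinary comment. *)
      (*#showBasis "f-body.basis"*)
   end

structure A = F (struct
                    type t = int
                    val x = 3
                 end)
----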
21735
21736 This feature is meant to facilitate auto-completion via
21737 https://github.com/MatthewFluet/company-mlton[`company-mlton`] and similar
21738 tools.
21739
21740 <<<
21741
21742 :mlton-guide-page: ShowProf
21743 [[ShowProf]]
21744 ShowProf
21745 ========
21746
21747 If an executable is compiled for <:Profiling:profiling>, then it
21748 accepts a special command-line runtime system argument, `show-prof`,
21749 that outputs information about the source functions that are profiled.
21750 Normally, this information is used by `mlprof`.  This page documents
21751 the `show-prof` output format, and is intended for those working on
21752 the profiler internals.
21753
21754 The `show-prof` output is ASCII, and consists of a sequence of lines.
21755
21756 * The magic number of the executable.
21757 * The number of source names in the executable.
21758 * A line for each source name giving the name of the function, a tab,
21759 the filename of the file containing the function, a colon, a space,
21760 and the line number that the function starts on in that file.
21761 * The number of (split) source functions.
21762 * A line for each (split) source function, where each line consists of
21763 a source-name index (into the array of source names) and a successors
21764 index (into the array of split-source sequences, defined below).
21765 * The number of split-source sequences.
21766 * A line for each split-source sequence, where each line is a
21767 space-separated list of (split) source functions.
21768
21769 The latter two arrays, split sources and split-source sequences,
21770 define a directed graph, which is the call-graph of the program.
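
As a sketch of how a tool might read one source-name line of this
format, the following function splits a line into its three parts.
The function name `parseSourceName` and the sample line are
hypothetical, and this is not the code used by `mlprof`.

[source,sml]
----
(* Parse "name<TAB>file: line".  The file name is taken to be
 * everything before the last colon of the second field. *)
fun parseSourceName (s: string)
   : {name: string, file: string, line: int} option =
   case String.fields (fn c => c = #"\t") s of
      [name, pos] =>
         let
            (* back is the longest suffix containing no colon;
             * front is the rest, ending with the colon. *)
            val (front, back) =
               Substring.splitr (fn c => c <> #":") (Substring.full pos)
         in
            if Substring.isEmpty front
               then NONE
            else
               Option.map
               (fn n => {name = name,
                         file = Substring.string (Substring.trimr 1 front),
                         line = n})
               (* Int.fromString skips the space after the colon. *)
               (Int.fromString (Substring.string back))
         end
    | _ => NONE

val example = parseSourceName "f\tfoo.sml: 12"
(* SOME {name = "f", file = "foo.sml", line = 12} *)
----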
21771
21772 <<<
21773
21774 :mlton-guide-page: Shrink
21775 [[Shrink]]
21776 Shrink
21777 ======
21778
21779 <:Shrink:> is a rewrite pass for the <:SSA:> and <:SSA2:>
21780 <:IntermediateLanguage:>s, invoked from every optimization pass (see
21781 <:SSASimplify:> and <:SSA2Simplify:>).
21782
21783 == Description ==
21784
21785 This pass implements a whole family of compile-time reductions, like:
21786
21787 * `#1(a, b)` => `a`
21788 * `case C x of C y => e` => `let y = x in e`
21789 * constant folding, copy propagation
21790 * eta blocks
21791 * tuple reconstruction elimination
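
The first two reductions can be pictured at the source level as follows.  This is only an analogy: the pass works on <:SSA:> and <:SSA2:> programs, not on source SML, and the names `sel`, `cas`, and so on are illustrative.

[source,sml]
----
datatype t = C of int

val (a, b, x) = (1, 2, 3)

(* `#1 (a, b)` => `a` *)
val sel = #1 (a, b)
val sel' = a

(* `case C x of C y => e` => `let y = x in e`, here with e = y + 1 *)
val cas = case C x of C y => y + 1
val cas' = let val y = x in y + 1 end
----

In each pair, the primed binding is the reduced form and evaluates to the same value as the unprimed one.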
21792
21793 == Implementation ==
21794
21795 * <!ViewGitFile(mlton,master,mlton/ssa/shrink.sig)>
21796 * <!ViewGitFile(mlton,master,mlton/ssa/shrink.fun)>
21797 * <!ViewGitFile(mlton,master,mlton/ssa/shrink2.sig)>
21798 * <!ViewGitFile(mlton,master,mlton/ssa/shrink2.fun)>
21799
21800 == Details and Notes ==
21801
21802 The <:Shrink:> pass is run after every <:SSA:> and <:SSA2:>
21803 optimization pass.
21804
21805 The <:Shrink:> implementation also includes functions to eliminate
unreachable blocks from an <:SSA:> or <:SSA2:> program or function.
21807 The <:Shrink:> pass does not guarantee to eliminate all unreachable
21808 blocks.  Doing so would unduly complicate the implementation, and it
21809 is almost always the case that all unreachable blocks are eliminated.
21810 However, a small number of optimization passes require that the input
21811 have no unreachable blocks (essentially, when the analysis works on
21812 the control flow graph and the rewrite iterates on the vector of
21813 blocks).  These passes explicitly call `eliminateDeadBlocks`.
21814
21815 The <:Shrink:> pass has a special case to turn a non-tail call where
21816 the continuation and handler only do `Profile` statements into a tail
21817 call where the `Profile` statements precede the tail call.
21818
21819 <<<
21820
21821 :mlton-guide-page: SimplifyTypes
21822 [[SimplifyTypes]]
21823 SimplifyTypes
21824 =============
21825
21826 <:SimplifyTypes:> is an optimization pass for the <:SSA:>
21827 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
21828
21829 == Description ==
21830
21831 This pass computes a "cardinality" of each datatype, which is an
21832 abstraction of the number of values of the datatype.
21833
21834 * `Zero` means the datatype has no values (except for bottom).
21835 * `One` means the datatype has one value (except for bottom).
21836 * `Many` means the datatype has many values.
21837
21838 This pass removes all datatypes whose cardinality is `Zero` or `One`
21839 and removes:
21840
21841 * components of tuples
21842 * function args
21843 * constructor args
21844
21845 which are such datatypes.
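
A source-level sketch of the three cardinalities, assuming that the intuition carries over from SML datatypes to <:SSA:> datatypes.  All names below are illustrative.

[source,sml]
----
datatype zero = Z of zero     (* no value can ever be constructed: Zero *)
datatype one = Only           (* exactly one value: One *)
datatype many = Red | Green   (* more than one value: Many *)

(* A tuple component of cardinality One carries no information ... *)
fun scale (_: one, n: int): int = 2 * n

(* ... so, conceptually, the component can be dropped. *)
fun scale' (n: int): int = 2 * n
----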
21846
21847 This pass marks constructors as one of:
21848
21849 * `Useless`: it never appears in a `ConApp`.
21850 * `Transparent`: it is the only variant in its datatype and its argument type does not contain any uses of `array` or `vector`.
21851 * `Useful`: otherwise
21852
21853 This pass also removes `Useless` and `Transparent` constructors.
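
As a source-level analogy (the pass itself works on <:SSA:>), a single-variant datatype whose argument mentions neither `array` nor `vector` would be classified `Transparent`.  The names `wrapped`, `double`, and `double'` are illustrative.

[source,sml]
----
(* `W` is the only variant of its datatype and its argument type
 * contains neither `array` nor `vector`.
 *)
datatype wrapped = W of int

fun double (W n) = W (2 * n)

(* Conceptually, once the constructor is removed: *)
fun double' (n: int): int = 2 * n
----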
21854
21855 == Implementation ==
21856
21857 * <!ViewGitFile(mlton,master,mlton/ssa/simplify-types.fun)>
21858
21859 == Details and Notes ==
21860
21861 This pass must happen before polymorphic equality is implemented because
21862
21863 * it will make polymorphic equality faster because some types are simpler
21864 * it removes uses of polymorphic equality that must return true
21865
21866 We must keep track of `Transparent` constructors whose argument type
21867 uses `array` because of datatypes like the following:
21868 [source,sml]
21869 ----
21870 datatype t = T of t array
21871 ----
21872
21873 Such a datatype has `Cardinality.Many`, but we cannot eliminate the
21874 datatype and replace the lhs by the rhs, i.e. we must keep the
21875 circularity around.
21876
Similar care is required for `vector`.
21878
21879 Also, to eliminate as many `Transparent` constructors as possible, for
21880 something like the following,
21881 [source,sml]
21882 ----
21883 datatype t = T of u array
21884      and u = U of t vector
21885 ----
21886 we (arbitrarily) expand one of the datatypes first.  The result will
21887 be something like
21888 [source,sml]
21889 ----
21890 datatype u = U of u array array
21891 ----
21892 where all uses of `t` are replaced by `u array`.
21893
21894 <<<
21895
21896 :mlton-guide-page: SML3d
21897 [[SML3d]]
21898 SML3d
21899 =====
21900
21901 The http://sml3d.cs.uchicago.edu/[SML3d Project] is a collection of
21902 libraries to support 3D graphics programming using Standard ML and the
21903 http://www.opengl.org/[OpenGL] graphics API. It currently requires the
21904 MLton implementation of SML and is supported on Linux, Mac OS X, and
21905 Microsoft Windows. There is also support for
21906 http://www.khronos.org/opencl/[OpenCL].
21907
21908 <<<
21909
21910 :mlton-guide-page: SMLNET
21911 [[SMLNET]]
21912 SMLNET
21913 ======
21914
21915 http://www.cl.cam.ac.uk/research/tsg/SMLNET[SML.NET] is a
21916 <:StandardMLImplementations:Standard ML implementation> that
21917 targets the .NET Common Language Runtime.
21918
21919 SML.NET is based on the <:MLj:MLj> compiler.
21920
21921 == Also see ==
21922
21923 * <!Cite(BentonEtAl04)>
21924
21925 <<<
21926
21927 :mlton-guide-page: SMLNJ
21928 [[SMLNJ]]
21929 SMLNJ
21930 =====
21931
21932 http://www.smlnj.org/[SML/NJ] is a
21933 <:StandardMLImplementations:Standard ML implementation>.  It is a
21934 native code compiler that runs on a variety of platforms and has a
21935 number of libraries and tools.
21936
21937 We maintain a list of SML/NJ's <:SMLNJDeviations:deviations> from
21938 <:DefinitionOfStandardML:The Definition of Standard ML>.
21939
21940 MLton has support for some features of SML/NJ in order to ease porting
21941 between MLton and SML/NJ.
21942
21943 * <:CompilationManager:> (CM)
21944 * <:LineDirective:>s
21945 * <:SMLofNJStructure:>
21946 * <:UnsafeStructure:>
21947
21948 <<<
21949
21950 :mlton-guide-page: SMLNJDeviations
21951 [[SMLNJDeviations]]
21952 SMLNJDeviations
21953 ===============
21954
21955 Here are some deviations of <:SMLNJ:SML/NJ> from
21956 <:DefinitionOfStandardML:The Definition of Standard ML (Revised)>.
21957 Some of these are documented in the
21958 http://www.smlnj.org/doc/Conversion/index.html[SML '97 Conversion Guide].
21959 Since MLton does not deviate from the Definition, you should look here
21960 if you are having trouble porting a program from MLton to SML/NJ or
21961 vice versa.  If you discover other deviations of SML/NJ that aren't
21962 listed here, please send mail to
21963 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`].
21964
21965 * SML/NJ allows spaces in long identifiers, as in `S . x`.  Section
21966 2.5 of the Definition implies that `S . x` should be treated as three
21967 separate lexical items.
21968
21969 * SML/NJ allows `op` to appear in `val` specifications:
21970 +
21971 [source,sml]
21972 ----
21973 signature FOO = sig
21974    val op + : int * int -> int
21975 end
21976 ----
21977 +
21978 The grammar on page 14 of the Definition does not allow it. Recent
21979 versions of SML/NJ do give a warning.
21980
21981 * SML/NJ rejects
21982 +
21983 [source,sml]
21984 ----
21985 (op *)
21986 ----
21987 +
21988 as an unmatched close comment.
21989
21990 * SML/NJ allows `=` to be rebound by the declaration:
21991 +
21992 [source,sml]
21993 ----
21994 val op = = 13
21995 ----
21996 +
21997 This is explicitly forbidden on page 5 of the Definition. Recent
21998 versions of SML/NJ do give a warning.
21999
22000 * SML/NJ allows rebinding `true`, `false`, `nil`, `::`, and `ref` by
22001 the declarations:
22002 +
22003 [source,sml]
22004 ----
22005 fun true () = ()
22006 fun false () = ()
22007 fun nil () = ()
22008 fun op :: () = ()
22009 fun ref () = ()
22010 ----
22011 +
22012 This is explicitly forbidden on page 9 of the Definition.
22013
22014 * SML/NJ extends the syntax of the language to allow vector
22015 expressions and patterns like the following:
22016 +
22017 [source,sml]
22018 ----
22019 val v = #[1,2,3]
22020 val #[x,y,z] = v
22021 ----
22022 +
22023 MLton supports vector expressions and patterns with the <:SuccessorML#VectorExpsAndPats:`allowVectorExpsAndPats`> <:MLBasisAnnotations:ML Basis annotation>.
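+
A minimal sketch of an MLB file that enables the annotation for a single source file; the file name `vectors.sml` is hypothetical:
+
----
$(SML_LIB)/basis/basis.mlb
ann "allowVectorExpsAndPats true" in
   vectors.sml
end
----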
22024
22025 * SML/NJ extends the syntax of the language to allow _or patterns_
22026 like the following:
22027 +
22028 [source,sml]
22029 ----
22030 datatype foo = Foo of int | Bar of int
22031 val (Foo x | Bar x) = Foo 13
22032 ----
22033 +
22034 MLton supports or patterns with the <:SuccessorML#OrPats:`allowOrPats`> <:MLBasisAnnotations:ML Basis annotation>.
22035
22036 * SML/NJ allows higher-order functors, that is, functors can be
22037 components of structures and can be passed as functor arguments and
22038 returned as functor results.  As a consequence, SML/NJ allows
22039 abbreviated functor definitions, as in the following:
22040 +
22041 [source,sml]
22042 ----
22043 signature S =
22044   sig
22045     type t
22046     val x: t
22047   end
22048 functor F (structure A: S): S =
22049   struct
22050     type t = A.t * A.t
22051     val x = (A.x, A.x)
22052   end
22053 functor G = F
22054 ----
22055
22056 * SML/NJ extends the syntax of the language to allow `functor` and
22057 `signature` declarations to occur within the scope of `local` and
22058 `structure` declarations.
22059
22060 * SML/NJ allows duplicate type specifications in signatures when the
22061 duplicates are introduced by `include`, as in the following:
22062 +
22063 [source,sml]
22064 ----
22065 signature SIG1 =
22066    sig
22067       type t
22068       type u
22069    end
22070 signature SIG2 =
22071    sig
22072       type t
22073       type v
22074    end
22075 signature SIG =
22076    sig
22077       include SIG1
22078       include SIG2
22079    end
22080 ----
22081 +
22082 This is disallowed by rule 77 of the Definition.
22083
22084 * SML/NJ allows sharing constraints between type abbreviations in
22085 signatures, as in the following:
22086 +
22087 [source,sml]
22088 ----
22089 signature SIG =
22090    sig
22091       type t = int * int
22092       type u = int * int
22093       sharing type t = u
22094    end
22095 ----
22096 +
22097 These are disallowed by rule 78 of the Definition.  Recent versions of
22098 SML/NJ correctly disallow sharing constraints between type
22099 abbreviations in signatures.
22100
22101 * SML/NJ disallows multiple `where type` specifications of the same
type name, as in the following:
22103 +
22104 [source,sml]
22105 ----
22106 signature S =
22107   sig
22108      type t
22109      type u = t
22110   end
22111   where type u = int
22112 ----
22113 +
22114 This is allowed by rule 64 of the Definition.
22115
22116 * SML/NJ allows `and` in `sharing` specs in signatures, as in
22117 +
22118 [source,sml]
22119 ----
22120 signature S =
22121    sig
22122       type t
22123       type u
22124       type v
22125       sharing type t = u
22126       and type u = v
22127    end
22128 ----
22129
22130 * SML/NJ does not expand the `withtype` derived form as described by
22131 the Definition.  According to page 55 of the Definition, the type
22132 bindings of a `withtype` declaration are substituted simultaneously in
22133 the connected datatype.  Consider the following program.
22134 +
22135 [source,sml]
22136 ----
22137 type u = real ;
22138 datatype a =
22139     A of t
22140   | B of u
22141 withtype u = int
22142 and t = u
22143 ----
22144 +
22145 According to the Definition, it should be expanded to the following.
22146 +
22147 [source,sml]
22148 ----
22149 type u = real ;
22150 datatype a =
22151     A of u
22152   | B of int ;
22153 type u = int
22154 and t = u
22155 ----
22156 +
22157 However, SML/NJ expands `withtype` bindings sequentially, meaning that
22158 earlier bindings are expanded within later ones. Hence, the above
22159 program is expanded to the following.
22160 +
22161 [source,sml]
22162 ----
22163 type u = real ;
22164 datatype a =
22165     A of int
22166   | B of int ;
22167 type u = int
22168 type t = int
22169 ----
22170
22171 * SML/NJ allows `withtype` specifications in signatures.
22172 +
22173 MLton supports `withtype` specifications in signatures with the <:SuccessorML#SigWithtype:`allowSigWithtype`> <:MLBasisAnnotations:ML Basis annotation>.
22174
22175 * SML/NJ allows a `where` structure specification that is similar to a
22176 `where type` specification.  For example:
22177 +
22178 [source,sml]
22179 ----
22180 structure S = struct type t = int end
22181 signature SIG =
22182   sig
22183      structure T : sig type t end
22184   end where T = S
22185 ----
22186 +
22187 This is equivalent to:
22188 +
22189 [source,sml]
22190 ----
22191 structure S = struct type t = int end
22192 signature SIG =
22193   sig
22194      structure T : sig type t end
22195   end where type T.t = S.t
22196 ----
22197 +
22198 SML/NJ also allows a definitional structure specification that is
22199 similar to a definitional type specification.  For example:
22200 +
22201 [source,sml]
22202 ----
22203 structure S = struct type t = int end
22204 signature SIG =
22205   sig
22206      structure T : sig type t end = S
22207   end
22208 ----
22209 +
22210 This is equivalent to the previous examples and to:
22211 +
22212 [source,sml]
22213 ----
22214 structure S = struct type t = int end
22215 signature SIG =
22216   sig
22217      structure T : sig type t end where type t = S.t
22218   end
22219 ----
22220
22221 * SML/NJ disallows binding non-datatypes with datatype replication.
22222 For example, it rejects the following program that should be allowed
22223 according to the Definition.
22224 +
22225 [source,sml]
22226 ----
22227 type ('a, 'b) t = 'a * 'b
22228 datatype u = datatype t
22229 ----
22230 +
22231 This idiom can be useful when one wants to rename a type without
22232 rewriting all the type arguments.  For example, the above would have
22233 to be written in SML/NJ as follows.
22234 +
22235 [source,sml]
22236 ----
22237 type ('a, 'b) t = 'a * 'b
22238 type ('a, 'b) u = ('a, 'b) t
22239 ----
22240
22241 * SML/NJ disallows sharing a structure with one of its substructures.
22242 For example, SML/NJ disallows the following.
22243 +
22244 [source,sml]
22245 ----
22246 signature SIG =
22247    sig
22248       structure S:
22249          sig
22250             type t
22251             structure T: sig type t end
22252          end
22253       sharing S = S.T
22254    end
22255 ----
22256 +
22257 This signature is allowed by the Definition.
22258
22259 * SML/NJ disallows polymorphic generalization of refutable
22260 patterns. For example, SML/NJ disallows the following.
22261 +
22262 [source,sml]
22263 ----
22264 val [x] = [[]]
22265 val _ = (1 :: x, "one" :: x)
22266 ----
22267 +
22268 Recent versions of SML/NJ correctly allow polymorphic generalization
22269 of refutable patterns.
22270
22271 * SML/NJ uses an overly restrictive context for type inference.  For
22272 example, SML/NJ rejects both of the following.
22273 +
22274 [source,sml]
22275 ----
22276 structure S =
22277 struct
22278   val z = (fn x => x) []
22279   val y = z :: [true] :: nil
22280 end
22281 ----
22282 +
22283 [source,sml]
22284 ----
22285 structure S : sig val z : bool list end =
22286 struct
22287   val z = (fn x => x) []
22288 end
22289 ----
22290 +
22291 These structures are allowed by the Definition.
22292
22293 == Deviations from the Basis Library Specification ==
22294
22295 Here are some deviations of SML/NJ from the <:BasisLibrary:Basis Library>
22296 http://www.standardml.org/Basis[specification].
22297
22298 * SML/NJ exposes the equality of the `vector` type in structures such
22299 as `Word8Vector` that abstractly match `MONO_VECTOR`, which says
22300 `type vector`, not `eqtype vector`.  So, for example, SML/NJ accepts
22301 the following program:
22302 +
22303 [source,sml]
22304 ----
22305 fun f (v: Word8Vector.vector) = v = v
22306 ----
22307
22308 * SML/NJ exposes the equality property of the type `status` in
22309 `OS.Process`. This means that programs which directly compare two
22310 values of type `status` will work with SML/NJ but not MLton.
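+
One way to stay portable is to avoid `=` on `status` values and use the Basis Library predicate `OS.Process.isSuccess` instead.  The function name `succeeded` below is illustrative:
+
[source,sml]
----
(* Portable: does not rely on `OS.Process.status` admitting equality. *)
fun succeeded (st: OS.Process.status): bool =
   OS.Process.isSuccess st
----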
22311
22312 * Under SML/NJ on Windows, `OS.Path.validVolume` incorrectly considers
22313 absolute empty volumes to be valid. In other words, when the
22314 expression
22315 +
22316 [source,sml]
22317 ----
22318 OS.Path.validVolume { isAbs = true, vol = "" }
22319 ----
22320 +
22321 is evaluated by SML/NJ on Windows, the result is `true`.  MLton, on
22322 the other hand, correctly follows the Basis Library Specification,
22323 which states that on Windows, `OS.Path.validVolume` should return
22324 `false` whenever `isAbs = true` and `vol = ""`.
22325 +
22326 This incorrect behavior causes other `OS.Path` functions to behave
22327 differently. For example, when the expression
22328 +
22329 [source,sml]
22330 ----
22331 OS.Path.toString (OS.Path.fromString "\\usr\\local")
22332 ----
22333 +
22334 is evaluated by SML/NJ on Windows, the result is `"\\usr\\local"`,
22335 whereas under MLton on Windows, evaluating this expression (correctly)
22336 causes an `OS.Path.Path` exception to be raised.
22337
22338 <<<
22339
22340 :mlton-guide-page: SMLNJLibrary
22341 [[SMLNJLibrary]]
22342 SMLNJLibrary
22343 ============
22344
22345 The http://www.smlnj.org/doc/smlnj-lib/index.html[SML/NJ Library] is a
22346 collection of libraries that are distributed with SML/NJ.  Due to
22347 differences between SML/NJ and MLton, these libraries will not work
out of the box with MLton.
22349
22350 As of 20180119, MLton includes a port of the SML/NJ Library
22351 synchronized with SML/NJ version 110.82.
22352
22353 == Usage ==
22354
22355 * You can import a sub-library of the SML/NJ Library into an MLB file with:
22356 +
22357 [options="header"]
22358 |=====
22359 |MLB file|Description
|`$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb`|Various utility modules, including collections, simple formatting, ...
22361 |`$(SML_LIB)/smlnj-lib/Controls/controls-lib.mlb`|A library for managing control flags in an application.
22362 |`$(SML_LIB)/smlnj-lib/HashCons/hash-cons-lib.mlb`|Support for implementing hash-consed data structures.
22363 |`$(SML_LIB)/smlnj-lib/HTML/html-lib.mlb`|HTML 3.2 parsing and pretty-printing library.
22364 |`$(SML_LIB)/smlnj-lib/HTML4/html4-lib.mlb`|HTML 4.01 parsing and pretty-printing library.
22365 |`$(SML_LIB)/smlnj-lib/INet/inet-lib.mlb`|Networking utilities; supported on both Unix and Windows systems.
22366 |`$(SML_LIB)/smlnj-lib/JSON/json-lib.mlb`|JavaScript Object Notation (JSON) reading and writing library.
22367 |`$(SML_LIB)/smlnj-lib/PP/pp-lib.mlb`|Pretty-printing library.
22368 |`$(SML_LIB)/smlnj-lib/Reactive/reactive-lib.mlb`|Reactive scripting library.
22369 |`$(SML_LIB)/smlnj-lib/RegExp/regexp-lib.mlb`|Regular expression library.
22370 |`$(SML_LIB)/smlnj-lib/SExp/sexp-lib.mlb`|S-expression library.
22371 |`$(SML_LIB)/smlnj-lib/Unix/unix-lib.mlb`|Utilities for Unix-based operating systems.
22372 |`$(SML_LIB)/smlnj-lib/XML/xml-lib.mlb`|XML library.
22373 |=====
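+
For instance, a project that uses the utility library might have an MLB file along these lines, where `main.sml` is a hypothetical source file:
+
----
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb
main.sml
----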
22374
22375 * If you are porting a project from SML/NJ's <:CompilationManager:> to
22376 MLton's <:MLBasis: ML Basis system> using `cm2mlb`, note that the
22377 following maps are included by default:
22378 +
----
22380 # SMLNJ Library
22381 $SMLNJ-LIB                              $(SML_LIB)/smlnj-lib
22382 $smlnj-lib.cm                           $(SML_LIB)/smlnj-lib/Util
22383 $controls-lib.cm                        $(SML_LIB)/smlnj-lib/Controls
22384 $hash-cons-lib.cm                       $(SML_LIB)/smlnj-lib/HashCons
22385 $html-lib.cm                            $(SML_LIB)/smlnj-lib/HTML
22386 $html4-lib.cm                           $(SML_LIB)/smlnj-lib/HTML4
22387 $inet-lib.cm                            $(SML_LIB)/smlnj-lib/INet
22388 $json-lib.cm                            $(SML_LIB)/smlnj-lib/JSON
22389 $pp-lib.cm                              $(SML_LIB)/smlnj-lib/PP
22390 $reactive-lib.cm                        $(SML_LIB)/smlnj-lib/Reactive
22391 $regexp-lib.cm                          $(SML_LIB)/smlnj-lib/RegExp
22392 $sexp-lib.cm                            $(SML_LIB)/smlnj-lib/SExp
22393 $unix-lib.cm                            $(SML_LIB)/smlnj-lib/Unix
22394 $xml-lib.cm                             $(SML_LIB)/smlnj-lib/XML
22395 ----
22396 +
22397 This will automatically convert a `$/smlnj-lib.cm` import in an input
22398 `.cm` file into a `$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb` import in
22399 the output `.mlb` file.
22400
22401 == Details ==
22402
22403 The following changes were made to the SML/NJ Library, in addition to
22404 deriving the `.mlb` files from the `.cm` files:
22405
22406 * `HTML4/pp-init.sml` (added): Implements `structure PrettyPrint` using the SML/NJ PP Library.  This implementation is taken from the SML/NJ compiler source, since the SML/NJ HTML4 Library used the `structure PrettyPrint` provided by the SML/NJ compiler itself.
22407 * `Util/base64.sml` (modified): Rewrote use of `Unsafe.CharVector.create` and `Unsafe.CharVector.update`; MLton assumes that vectors are immutable.
22408 * `Util/engine.mlton.sml` (added, not exported): Implements `structure Engine`, providing time-limited, resumable computations using <:MLtonThread:>, <:MLtonSignal:>, and <:MLtonItimer:>.
22409 * `Util/graph-scc-fn.sml` (modified): Rewrote use of `where` structure specification.
22410 * `Util/redblack-map-fn.sml` (modified): Rewrote use of `where` structure specification.
22411 * `Util/redblack-set-fn.sml` (modified): Rewrote use of `where` structure specification.
22412 * `Util/time-limit.mlb` (added): Exports `structure TimeLimit`, which is _not_ exported by `smlnj-lib.mlb`.  Since MLton is very conservative in the presence of threads and signals, program performance may be adversely affected by unnecessarily including `structure TimeLimit`.
22413 * `Util/time-limit.mlton.sml` (added): Implements `structure TimeLimit` using `structure Engine`.  The SML/NJ implementation of `structure TimeLimit` uses SML/NJ's first-class continuations, signals, and interval timer.
22414
22415 == Patch ==
22416
22417 * <!ViewGitFile(mlton,master,lib/smlnj-lib/smlnj-lib.patch)>
22418
22419 <<<
22420
22421 :mlton-guide-page: SMLofNJStructure
22422 [[SMLofNJStructure]]
22423 SMLofNJStructure
22424 ================
22425
22426 [source,sml]
22427 ----
22428 signature SML_OF_NJ =
22429    sig
22430       structure Cont:
22431          sig
22432             type 'a cont
22433             val callcc: ('a cont -> 'a) -> 'a
22434             val isolate: ('a -> unit) -> 'a cont
22435             val throw: 'a cont -> 'a -> 'b
22436          end
22437       structure SysInfo:
22438          sig
22439             exception UNKNOWN
22440             datatype os_kind = BEOS | MACOS | OS2 | UNIX | WIN32
22441
22442             val getHostArch: unit -> string
22443             val getOSKind: unit -> os_kind
22444             val getOSName: unit -> string
22445          end
22446
22447       val exnHistory: exn -> string list
22448       val exportFn: string * (string * string list -> OS.Process.status) -> unit
22449       val exportML: string -> bool
22450       val getAllArgs: unit -> string list
22451       val getArgs: unit -> string list
22452       val getCmdName: unit -> string
22453    end
22454 ----
22455
22456 `SMLofNJ` implements a subset of the structure of the same name
22457 provided in <:SMLNJ:Standard ML of New Jersey>.  It is included to
22458 make it easier to port programs between the two systems.  The
semantics of these functions may differ from those in SML/NJ.
22460
22461 * `structure Cont`
22462 +
22463 implements continuations.
22464
22465 * `SysInfo.getHostArch ()`
22466 +
22467 returns the string for the architecture.
22468
* `SysInfo.getOSKind ()`
22470 +
22471 returns the OS kind.
22472
22473 * `SysInfo.getOSName ()`
22474 +
returns the string for the host operating system.
22476
22477 * `exnHistory`
22478 +
22479 the same as `MLton.Exn.history`.
22480
22481 * `getCmdName ()`
22482 +
22483 the same as `CommandLine.name ()`.
22484
22485 * `getArgs ()`
22486 +
22487 the same as `CommandLine.arguments ()`.
22488
22489 * `getAllArgs ()`
22490 +
22491 the same as `getCmdName()::getArgs()`.
22492
* `exportFn (file, f)`
+
saves the state of the computation to `file`; upon restart, the saved
computation applies `f` to the command name and the command-line
arguments.
22497
22498 * `exportML f`
22499 +
saves the state of the computation to file `f` and continues.  Returns
22501 `true` in the restarted computation and `false` in the continuing
22502 computation.
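
As a small usage sketch, the following uses `SMLofNJ.Cont` to exit early from a list traversal.  It assumes that `structure SMLofNJ` is in scope; the function name `firstNegative` is illustrative.

[source,sml]
----
(* Return the first negative element, if any, by throwing to the
 * continuation captured at the call to `callcc`.
 *)
fun firstNegative (xs: int list): int option =
   SMLofNJ.Cont.callcc
   (fn k =>
    (List.app
     (fn x => if x < 0 then SMLofNJ.Cont.throw k (SOME x) else ())
     xs
     ; NONE))

val () =
   case firstNegative [3, ~1, ~7] of
      SOME n => print (Int.toString n ^ "\n")
    | NONE => print "none\n"
----

If no element is negative, `List.app` runs to completion and the body of `callcc` returns `NONE` normally.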
22503
22504 <<<
22505
22506 :mlton-guide-page: SMLSharp
22507 [[SMLSharp]]
22508 SMLSharp
22509 ========
22510
22511 http://www.pllab.riec.tohoku.ac.jp/smlsharp/[SML#] is an
22512 <:StandardMLImplementations:implementation> of an extension of SML.
22513
22514 It includes some
22515 http://www.pllab.riec.tohoku.ac.jp/smlsharp/?Tools[generally useful SML tools]
22516 including a pretty printer generator, a document generator, and a
regression testing framework, and a
22518 http://www.pllab.riec.tohoku.ac.jp/smlsharp/?Library%2FScripting[scripting library].
22519
22520 <<<
22521
22522 :mlton-guide-page: Sources
22523 [[Sources]]
22524 Sources
22525 =======
22526
22527 We maintain our sources with <:Git:>.  You can
22528 https://github.com/MLton/mlton/[view them on the web] or access
22529 them with a git client.
22530
22531 Anonymous read-only access is available via
22532 ----------
22533 https://github.com/MLton/mlton.git
22534 ----------
22535 or
22536 ----------
22537 git://github.com/MLton/mlton.git
22538 ----------
22539
22540
22541 == Commit email ==
22542
22543 All commits are sent to
22544 mailto:MLton-commit@mlton.org[`MLton-commit@mlton.org`]
22545 (https://lists.sourceforge.net/lists/listinfo/mlton-commit[subscribe],
22546 https://sourceforge.net/mailarchive/forum.php?forum_name=mlton-commit[archive],
22547 http://www.mlton.org/pipermail/mlton-commit/[archive]) which is a
22548 read-only mailing list for commit emails.  Discussion should go to
22549 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`].
22550
22551 /////
22552 If the first line of a commit log message begins with "++MAIL{nbsp} ++",
22553 then the commit message will be sent with the subject as the rest of
22554 that first line, and will also be sent to
22555 mailto:MLton-devel@mlton.org[`MLton-devel@mlton.org`].
22556 /////
22557
22558
22559 == Changelog ==
22560
22561 See <!ViewGitFile(mlton,master,CHANGELOG.adoc)> for a list of
22562 changes and bug fixes.
22563
22564
22565 == Subversion ==
22566
22567 Prior to 20130308, we used <:Subversion:>.
22568
22569 == CVS ==
22570
22571 Prior to 20050730, we used <:CVS:>.
22572
22573 <<<
22574
22575 :mlton-guide-page: SpaceSafety
22576 [[SpaceSafety]]
22577 SpaceSafety
22578 ===========
22579
22580 Informally, space safety is a property of a language implementation
22581 that asymptotically bounds the space used by a running program.
22582
22583 == Also see ==
22584
22585 * Chapter 12 of <!Cite(Appel92)>
22586 * <!Cite(Clinger98)>
22587
22588 <<<
22589
22590 :mlton-guide-page: SSA
22591 [[SSA]]
22592 SSA
22593 ===
22594
22595 <:SSA:> is an <:IntermediateLanguage:>, translated from <:SXML:> by
22596 <:ClosureConvert:>, optimized by <:SSASimplify:>, and translated by
22597 <:ToSSA2:> to <:SSA2:>.
22598
22599 == Description ==
22600
22601 <:SSA:> is a <:FirstOrder:>, <:SimplyTyped:> <:IntermediateLanguage:>.
22602 It is the main <:IntermediateLanguage:> used for optimizations.
22603
22604 An <:SSA:> program consists of a collection of datatype declarations,
22605 a sequence of global statements, and a collection of functions, along
22606 with a distinguished "main" function.  Each function consists of a
22607 collection of basic blocks, where each basic block is a sequence of
22608 statements ending with some control transfer.
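
The shape described above can be pictured with the following simplified model.  All types, constructors, and field names here are hypothetical; the actual definitions are in the files listed under Implementation.

[source,sml]
----
(* Simplified, hypothetical model of the shape of an SSA program. *)
type var = string
type label = string

datatype transfer =
   Goto of label * var list   (* jump to another block of the function *)
 | Return of var list         (* leave the function *)

(* A basic block: a sequence of statements ending in a control transfer. *)
type block = {label: label, statements: string list, transfer: transfer}
type func = {name: string, start: label, blocks: block list}
type program =
   {datatypes: string list, globals: string list,
    functions: func list, main: string}

(* A one-function program whose single block returns immediately. *)
val example: program =
   {datatypes = [], globals = [],
    functions = [{name = "main", start = "L_0",
                  blocks = [{label = "L_0", statements = [],
                             transfer = Return []}]}],
    main = "main"}
----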
22609
22610 == Implementation ==
22611
22612 * <!ViewGitFile(mlton,master,mlton/ssa/ssa.sig)>
22613 * <!ViewGitFile(mlton,master,mlton/ssa/ssa.fun)>
22614 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-tree.sig)>
22615 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-tree.fun)>
22616
22617 == Type Checking ==
22618
22619 Type checking (<!ViewGitFile(mlton,master,mlton/ssa/type-check.sig)>,
22620 <!ViewGitFile(mlton,master,mlton/ssa/type-check.fun)>) of a <:SSA:> program
22621 verifies the following:
22622
22623 * no duplicate definitions (tycons, cons, vars, labels, funcs)
22624 * no out of scope references (tycons, cons, vars, labels, funcs)
22625 * variable definitions dominate variable uses
22626 * case transfers are exhaustive and irredundant
22627 * `Enter`/`Leave` profile statements match
22628 * "traditional" well-typedness
22629
22630 == Details and Notes ==
22631
22632 SSA is an abbreviation for Static Single Assignment.
22633
22634 For some initial design discussion, see the thread at:
22635
22636 * http://mlton.org/pipermail/mlton/2001-August/019689.html
22637
22638 For some retrospectives, see the threads at:
22639
22640 * http://mlton.org/pipermail/mlton/2003-January/023054.html
22641 * http://mlton.org/pipermail/mlton/2007-February/029597.html
22642
22643 <<<
22644
22645 :mlton-guide-page: SSA2
22646 [[SSA2]]
22647 SSA2
22648 ====
22649
22650 <:SSA2:> is an <:IntermediateLanguage:>, translated from <:SSA:> by
22651 <:ToSSA2:>, optimized by <:SSA2Simplify:>, and translated by
22652 <:ToRSSA:> to <:RSSA:>.
22653
22654 == Description ==
22655
22656 <:SSA2:> is a <:FirstOrder:>, <:SimplyTyped:>
22657 <:IntermediateLanguage:>, a slight variant of the <:SSA:>
<:IntermediateLanguage:>.
22659
22660 Like <:SSA:>, an <:SSA2:> program consists of a collection of datatype
22661 declarations, a sequence of global statements, and a collection of
22662 functions, along with a distinguished "main" function.  Each function
22663 consists of a collection of basic blocks, where each basic block is a
22664 sequence of statements ending with some control transfer.
22665
22666 Unlike <:SSA:>, <:SSA2:> includes mutable fields in objects and makes
22667 the vector type constructor n-ary instead of unary.  This allows
22668 optimizations like <:RefFlatten:> and <:DeepFlatten:> to be expressed.
22669
22670 == Implementation ==
22671
22672 * <!ViewGitFile(mlton,master,mlton/ssa/ssa2.sig)>
22673 * <!ViewGitFile(mlton,master,mlton/ssa/ssa2.fun)>
22674 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-tree2.sig)>
22675 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-tree2.fun)>
22676
22677 == Type Checking ==
22678
22679 Type checking (<!ViewGitFile(mlton,master,mlton/ssa/type-check2.sig)>,
<!ViewGitFile(mlton,master,mlton/ssa/type-check2.fun)>) of an <:SSA2:>
22681 program verifies the following:
22682
22683 * no duplicate definitions (tycons, cons, vars, labels, funcs)
22684 * no out of scope references (tycons, cons, vars, labels, funcs)
22685 * variable definitions dominate variable uses
22686 * case transfers are exhaustive and irredundant
22687 * `Enter`/`Leave` profile statements match
22688 * "traditional" well-typedness
22689
22690 == Details and Notes ==
22691
22692 SSA is an abbreviation for Static Single Assignment.
22693
22694 <<<
22695
22696 :mlton-guide-page: SSA2Simplify
22697 [[SSA2Simplify]]
22698 SSA2Simplify
22699 ============
22700
22701 The optimization passes for the <:SSA2:> <:IntermediateLanguage:> are
22702 collected and controlled by the `Simplify2` functor
22703 (<!ViewGitFile(mlton,master,mlton/ssa/simplify2.sig)>,
22704 <!ViewGitFile(mlton,master,mlton/ssa/simplify2.fun)>).
22705
22706 The following optimization passes are implemented:
22707
22708 * <:DeepFlatten:>
22709 * <:RefFlatten:>
22710 * <:RemoveUnused:>
22711 * <:Zone:>
22712
22713 There are additional analysis and rewrite passes that augment many of the other optimization passes:
22714
22715 * <:Restore:>
22716 * <:Shrink:>
22717
The optimization passes can be controlled from the command-line by the options:
22719
22720 * `-diag-pass <pass>` -- keep diagnostic info for pass
22721 * `-disable-pass <pass>` -- skip optimization pass (if normally performed)
22722 * `-enable-pass <pass>` -- perform optimization pass (if normally skipped)
22723 * `-keep-pass <pass>` -- keep the results of pass
22724 * `-loop-passes <n>` -- loop optimization passes
* `-ssa2-passes <passes>` -- ssa2 optimization passes
22726
22727 <<<
22728
22729 :mlton-guide-page: SSASimplify
22730 [[SSASimplify]]
22731 SSASimplify
22732 ===========
22733
22734 The optimization passes for the <:SSA:> <:IntermediateLanguage:> are
22735 collected and controlled by the `Simplify` functor
22736 (<!ViewGitFile(mlton,master,mlton/ssa/simplify.sig)>,
22737 <!ViewGitFile(mlton,master,mlton/ssa/simplify.fun)>).
22738
22739 The following optimization passes are implemented:
22740
22741 * <:CombineConversions:>
22742 * <:CommonArg:>
22743 * <:CommonBlock:>
22744 * <:CommonSubexp:>
22745 * <:ConstantPropagation:>
22746 * <:Contify:>
22747 * <:Flatten:>
22748 * <:Inline:>
22749 * <:IntroduceLoops:>
22750 * <:KnownCase:>
22751 * <:LocalFlatten:>
22752 * <:LocalRef:>
22753 * <:LoopInvariant:>
* <:LoopUnroll:>
22755 * <:LoopUnswitch:>
22756 * <:Redundant:>
22757 * <:RedundantTests:>
22758 * <:RemoveUnused:>
22759 * <:ShareZeroVec:>
22760 * <:SimplifyTypes:>
22761 * <:Useless:>
22762
The following implementation passes are also provided:
22764
22765 * <:PolyEqual:>
22766 * <:PolyHash:>
22767
22768 There are additional analysis and rewrite passes that augment many of the other optimization passes:
22769
22770 * <:Multi:>
22771 * <:Restore:>
22772 * <:Shrink:>
22773
22774 The optimization passes can be controlled from the command-line by the options:
22775
22776 * `-diag-pass <pass>` -- keep diagnostic info for pass
22777 * `-disable-pass <pass>` -- skip optimization pass (if normally performed)
22778 * `-enable-pass <pass>` -- perform optimization pass (if normally skipped)
22779 * `-keep-pass <pass>` -- keep the results of pass
22780 * `-loop-passes <n>` -- loop optimization passes
22781 * `-ssa-passes <passes>` -- ssa optimization passes
22782
22783 <<<
22784
22785 :mlton-guide-page: Stabilizers
22786 [[Stabilizers]]
22787 Stabilizers
22788 ===========
22789
22790 == Installation ==
22791
* Stabilizers currently require the MLton sources; this should be fixed by the next release
22793
22794 == License ==
22795
22796 * Stabilizers are released under the MLton License
22797
22798 == Instructions ==
22799
22800 * Download and build a source copy of MLton
22801 * Extract the tar.gz file attached to this page
* Some examples are provided in the "examples/" subdirectory; more examples will be added to this page in the following week
22803
22804 == Bug reports / Suggestions ==
22805
22806 * Please send any errors you encounter to schatzp and lziarek at cs.purdue.edu
22807 * We are looking to expand the usability of stabilizers
22808 * Please send any suggestions and desired functionality to the above email addresses
22809
22810 == Note ==
22811
* This is an alpha release. We expect to have another release with added functionality soon
22813 * More documentation, such as signatures and descriptions of functionality, will be forthcoming
22814
22815
22816 == Documentation ==
22817
22818 [source,sml]
22819 ----
22820 signature STABLE =
22821   sig
22822      type checkpoint
22823
22824      val stable: ('a -> 'b) -> ('a -> 'b)
22825      val stabilize: unit -> 'a
22826
22827      val stableCP: (('a -> 'b) * (unit -> unit)) ->
22828                     (('a -> 'b) *  checkpoint)
22829      val stabilizeCP: checkpoint -> unit
22830
22831      val unmonitoredAssign: ('a ref * 'a) -> unit
22832      val monitoredAssign: ('a ref * 'a) -> unit
22833   end
22834 ----
22835
22836
22837 `Stable` provides functions to manage stable sections.
22838
22839 * `type checkpoint`
22840 +
22841 handle used to stabilize contexts other than the current one.
22842
22843 * `stable f`
22844 +
22845 returns a function identical to `f` that will execute within a stable section.
22846
22847 * `stabilize ()`
22848 +
22849 unrolls the effects made up to the current context to at least the
22850 nearest enclosing _stable_ section.  These effects may have propagated
22851 to other threads, so all affected threads are returned to a globally
22852 consistent previous state.  The return is undefined because control
22853 cannot resume after stabilize is called.
22854
22855 * `stableCP (f, comp)`
22856 +
22857 returns a function `f'` and checkpoint tag `cp`.  Function `f'` is
22858 identical to `f` but when applied will execute within a stable
22859 section.  `comp` will be executed if `f'` is later stabilized.  `cp`
22860 is used by `stabilizeCP` to stabilize a given checkpoint.
22861
22862 * `stabilizeCP cp`
22863 +
22864 same as stabilize except that the (possibly current) checkpoint to
22865 stabilize is provided.
22866
22867 * `unmonitoredAssign (r, v)`
22868 +
standard assignment (`:=`).  The distributed version of CML rebinds
`:=` to a monitored version so that interesting effects can be recorded.
22871
22872 * `monitoredAssign (r, v)`
22873 +
22874 the assignment operator that should be used in programs that use
22875 stabilizers. `:=` is rebound to this by including CML.
22876
22877 == Download ==
22878
22879 * <!Attachment(Stabilizers,stabilizers_alpha_2006-10-09.tar.gz)>
22880
22881 == Also see ==
22882
22883 * <!Cite(ZiarekEtAl06)>
22884
22885 <<<
22886
22887 :mlton-guide-page: StandardML
22888 [[StandardML]]
22889 StandardML
22890 ==========
22891
22892 Standard ML (SML) is a programming language that combines excellent
22893 support for rapid prototyping, modularity, and development of large
22894 programs, with performance approaching that of C.
22895
22896 == SML Resources ==
22897
22898 * <:StandardMLTutorials:Tutorials>
22899 * <:StandardMLBooks:Books>
22900 * <:StandardMLImplementations:Implementations>
22901 // * http://google.com/coop/cse?cx=014714656471597805969%3Afzuz7eybmcy[SML web search] from Google Co-op
22902
22903 == Aspects of SML ==
22904
22905 * <:DefineTypeBeforeUse:>
22906 * <:EqualityType:>
22907 * <:EqualityTypeVariable:>
22908 * <:GenerativeDatatype:>
22909 * <:GenerativeException:>
22910 * <:Identifier:>
22911 * <:OperatorPrecedence:>
22912 * <:Overloading:>
22913 * <:PolymorphicEquality:>
22914 * <:TypeVariableScope:>
22915 * <:ValueRestriction:>
22916
22917 == Using SML ==
22918
22919 * <:Fixpoints:>
22920 * <:ForLoops:>
22921 * <:FunctionalRecordUpdate:>
22922 * <:InfixingOperators:>
22923 * <:Lazy:>
22924 * <:ObjectOrientedProgramming:>
22925 * <:OptionalArguments:>
22926 * <:Printf:>
22927 * <:PropertyList:>
22928 * <:ReturnStatement:>
22929 * <:Serialization:>
22930 * <:StandardMLGotchas:>
22931 * <:StyleGuide:>
22932 * <:TipsForWritingConciseSML:>
22933 * <:UniversalType:>
22934
22935 == Programming in SML ==
22936
22937 * <:Emacs:>
22938 * <:Enscript:>
22939 * <:Pygments:>
22940
22941 == Notes ==
22942
22943 * <:StandardMLHistory: History of SML>
22944 * <:Regions:>
22945
22946 == Related Languages ==
22947
22948 * <:Alice:>
22949 * <:FSharp:F#>
22950 * <:OCaml:>
22951
22952 <<<
22953
22954 :mlton-guide-page: StandardMLBooks
22955 [[StandardMLBooks]]
22956 StandardMLBooks
22957 ===============
22958
22959 == Introductory Books ==
22960
22961 * <!Cite(Ullman98, Elements of ML Programming)>
22962
22963 * <!Cite(Paulson96, ML For the Working Programmer)>
22964
22965 * <!Cite(HansenRichel99, Introduction to Programming using SML)>
22966
22967 * <!Cite(FelleisenFreidman98, The Little MLer)>
22968
22969 == Applications ==
22970
22971 * <!Cite(Shipman02, Unix System Programming with Standard ML)>
22972
22973 == Reference Books ==
22974
22975 * <!Cite(GansnerReppy04, The Standard ML Basis Library)>
22976
22977 * <:DefinitionOfStandardML:The Definition of Standard ML (Revised)>
22978
22979 == Related Topics ==
22980
22981 * <!Cite(Reppy07, Concurrent Programming in ML)>
22982
22983 * <!Cite(Okasaki99, Purely Functional Data Structures)>
22984
22985 <<<
22986
22987 :mlton-guide-page: StandardMLGotchas
22988 [[StandardMLGotchas]]
22989 StandardMLGotchas
22990 =================
22991
22992 This page contains brief explanations of some recurring sources of
22993 confusion and problems that SML newbies encounter.
22994
22995 Many confusions about the syntax of SML seem to arise from the use of
22996 an interactive REPL (Read-Eval Print Loop) while trying to learn the
22997 basics of the language.  While writing your first SML programs, you
22998 should keep the source code of your programs in a form that is
22999 accepted by an SML compiler as a whole.
23000
23001 == The `and` keyword ==
23002
23003 It is a common mistake to misuse the `and` keyword or to not know how
23004 to introduce mutually recursive definitions.  The purpose of the `and`
23005 keyword is to introduce mutually recursive definitions of functions
23006 and datatypes.  For example,
23007
23008 [source,sml]
23009 ----
23010 fun isEven 0w0 = true
23011   | isEven 0w1 = false
23012   | isEven n = isOdd (n-0w1)
23013 and isOdd 0w0 = false
23014   | isOdd 0w1 = true
23015   | isOdd n = isEven (n-0w1)
23016 ----
23017
23018 and
23019
23020 [source,sml]
23021 ----
23022 datatype decl = VAL of id * pat * expr
23023            (* | ... *)
23024      and expr = LET of decl * expr
23025            (* | ... *)
23026 ----
23027
23028 You can also use `and` as a shorthand in a couple of other places, but
23029 it is not necessary.
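
As one illustration of the shorthand use, `and` can join `val`
bindings, in which case all right-hand sides are evaluated before any
of the new names comes into scope.  The following is a minimal sketch
(the names `a` and `b` are chosen only for illustration) that uses
this to swap two values:

[source,sml]
----
val a = 1
val b = 2

(* Both right-hand sides refer to the earlier bindings of a and b,
 * so afterwards a is 2 and b is 1. *)
val a = b
and b = a
----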
23030
23031 == Constructed patterns ==
23032
23033 It is a common mistake to forget to parenthesize constructed patterns
23034 in `fun` bindings.  Consider the following invalid definition:
23035
23036 [source,sml]
23037 ----
23038 fun length nil = 0
23039   | length h :: t = 1 + length t
23040 ----
23041
23042 The pattern `h :: t` needs to be parenthesized:
23043
23044 [source,sml]
23045 ----
23046 fun length nil = 0
23047   | length (h :: t) = 1 + length t
23048 ----
23049
23050 The parentheses are needed, because a `fun` definition may have
23051 multiple consecutive constructed patterns through currying.
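
As a sketch of what that means in practice, here is a curried
function (the name `zipWith` is just an example, not something
defined elsewhere in this guide) that takes two constructed patterns
in a row, each with its own parentheses:

[source,sml]
----
(* Two consecutive constructed patterns, each parenthesized. *)
fun zipWith f (x :: xs) (y :: ys) = f (x, y) :: zipWith f xs ys
  | zipWith _ _ _ = []

val sums = zipWith op + [1, 2, 3] [10, 20, 30]
----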
23052
23053 The same applies to nonfix constructors.  For example, the parentheses
23054 in
23055
23056 [source,sml]
23057 ----
23058 fun valOf NONE = raise Option
23059   | valOf (SOME x) = x
23060 ----
23061
23062 are required.  However, the outermost constructed pattern in a `fn` or
23063 `case` expression need not be parenthesized, because in those cases
23064 there is always just one constructed pattern.  So, both
23065
23066 [source,sml]
23067 ----
23068 val valOf = fn NONE => raise Option
23069              | SOME x => x
23070 ----
23071
23072 and
23073
23074 [source,sml]
23075 ----
23076 fun valOf x = case x of
23077                  NONE => raise Option
23078                | SOME x => x
23079 ----
23080
23081 are fine.
23082
23083 == Declarations and expressions ==
23084
23085 It is a common mistake to confuse expressions and declarations.
23086 Normally an SML source file should only contain declarations.  The
23087 following are declarations:
23088
23089 [source,sml]
23090 ----
23091 datatype dt = ...
23092 fun f ... = ...
23093 functor Fn (...) = ...
23094 infix ...
23095 infixr ...
23096 local ... in ... end
23097 nonfix ...
23098 open ...
23099 signature SIG = ...
23100 structure Struct = ...
23101 type t = ...
23102 val v = ...
23103 ----
23104
23105 Note that
23106
23107 [source,sml]
23108 ----
23109 let ... in ... end
23110 ----
23111
23112 isn't a declaration.
23113
23114 To specify a side-effecting computation in a source file, you can write:
23115
23116 [source,sml]
23117 ----
23118 val () = ...
23119 ----
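
Putting these together, a small source file made up only of
declarations might look like the following sketch (the names `greet`
and `total` are illustrative):

[source,sml]
----
fun greet name = "Hello, " ^ name ^ "!\n"

(* let is an expression, so here it appears on the right-hand side
 * of a val declaration. *)
val total =
   let
      val xs = [1, 2, 3]
   in
      foldl op + 0 xs
   end

(* A side-effecting computation wrapped in a declaration. *)
val () = print (greet "world")
----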
23120
23121
23122 == Equality types ==
23123
23124 SML has a fairly intricate built-in notion of equality.  See
23125 <:EqualityType:> and <:EqualityTypeVariable:> for a thorough
23126 discussion.
23127
23128
23129 == Nested cases ==
23130
23131 It is a common mistake to write nested case expressions without the
23132 necessary parentheses.  See <:UnresolvedBugs:> for a discussion.
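
A short sketch of the problem, using a hypothetical function
`describe`: without the parentheses around the inner `case`, the
final `NONE` rule would be parsed as a further rule of the inner
`case` rather than as a rule of the outer one.

[source,sml]
----
fun describe (x : int option, y : int option) =
   case x of
      SOME a =>
         (* The parentheses end the inner case here. *)
         (case y of
             SOME b => Int.toString (a + b)
           | NONE => Int.toString a)
    | NONE => "none"
----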
23133
23134
23135 == (op *) ==
23136
23137 It used to be a common mistake to parenthesize `op *` as `(op *)`.
23138 Before SML'97, `*)` was considered a comment terminator in SML and
23139 caused a syntax error.  At the time of writing, <:SMLNJ:SML/NJ> still
23140 rejects the code.  An extra space may be used for portability:
23141 `(op * )`. However, parenthesizing `op` is redundant, even though it
23142 is a widely used convention.
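
Both styles are shown in the following sketch; the names `product`,
`times`, and `six` are only for illustration:

[source,sml]
----
(* op turns the infix operator into an ordinary function value. *)
val product = foldl op * 1 [1, 2, 3, 4]

(* The parenthesized form, with a space before the closing parenthesis. *)
val times = (op * )
val six = times (2, 3)
----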
23143
23144
23145 == Overloading ==
23146
23147 A number of standard operators (`+`, `-`, `~`, `*`, `<`, `>`, ...) and
23148 numeric constants are overloaded for some of the numeric types (`int`,
23149 `real`, `word`).  It is a common surprise that definitions using
23150 overloaded operators such as
23151
23152 [source,sml]
23153 ----
23154 fun min (x, y) = if y < x then y else x
23155 ----
23156
23157 are not overloaded themselves.  SML doesn't really support
23158 (user-defined) overloading or other forms of ad hoc polymorphism.  In
23159 cases such as the above where the context doesn't resolve the
23160 overloading, expressions using overloaded operators or constants get
23161 assigned a default type.  The above definition gets the type
23162
23163 [source,sml]
23164 ----
23165 val min : int * int -> int
23166 ----
23167
23168 See <:Overloading:> and <:TypeIndexedValues:> for further discussion.
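
The following sketch contrasts the defaulted definition with two
common ways of getting a version at another type: a type annotation,
and passing the ordering as an argument.  The names `minReal` and
`minBy` are illustrative and not part of any library.

[source,sml]
----
(* No annotation: the overloaded < defaults to int. *)
fun min (x, y) = if y < x then y else x

(* An annotation on one argument resolves < at type real. *)
fun minReal (x : real, y) = if y < x then y else x

(* Passing the ordering explicitly gives a function usable at any type. *)
fun minBy lt (x, y) = if lt (y, x) then y else x

val i = min (3, 2)
val r = minReal (1.5, 0.5)
val s = minBy String.< ("pear", "apple")
----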
23169
23170
23171 == Semicolons ==
23172
23173 It is a common mistake to use redundant semicolons in SML code.  This
23174 is probably caused by the fact that in an SML REPL, a semicolon (and
23175 enter) is used to signal the REPL that it should evaluate the
23176 preceding chunk of code as a unit.  In SML source files, semicolons
23177 are really needed in only two places.  Namely, in expressions of the
23178 form
23179
23180 [source,sml]
23181 ----
23182 (exp ; ... ; exp)
23183 ----
23184
23185 and
23186
23187 [source,sml]
23188 ----
23189 let ... in exp ; ... ; exp end
23190 ----
23191
23192 Note that semicolons act as expression (or declaration) separators
23193 rather than as terminators.
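
As a concrete sketch of the two forms (the names `n` and `m` are
arbitrary):

[source,sml]
----
(* Sequence expression: evaluates each part in order and yields the last. *)
val n = (print "first\n"; print "second\n"; 42)

(* The body of let may likewise be a sequence; no semicolon follows end. *)
val m =
   let
      val x = 1
   in
      print "in let\n";
      x + 1
   end
----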
23194
23195
23196 == Stale bindings ==
23197
23198 {empty}
23199
23200
23201 == Unresolved records ==
23202
23203 {empty}
23204
23205
23206 == Value restriction ==
23207
23208 See <:ValueRestriction:>.
23209
23210
23211 == Type Variable Scope ==
23212
23213 See <:TypeVariableScope:>.
23214
23215 <<<
23216
23217 :mlton-guide-page: StandardMLHistory
23218 [[StandardMLHistory]]
23219 StandardMLHistory
23220 =================
23221
23222 <:StandardML:Standard ML> grew out of <:ML:> in the early 1980s.
23223
23224 For an excellent overview of SML's history, see Appendix F of the
23225 <:DefinitionOfStandardML:Definition>.
23226
For an overview of its history before 1982, see <!Cite(Milner82, How
ML Evolved)>.
23229
23230 <<<
23231
23232 :mlton-guide-page: StandardMLImplementations
23233 [[StandardMLImplementations]]
23234 StandardMLImplementations
23235 =========================
23236
23237 There are a number of implementations of <:StandardML:Standard ML>,
23238 from interpreters, to byte-code compilers, to incremental compilers,
23239 to whole-program compilers.
23240
23241 * <:Alice:Alice ML>
23242 * <:HaMLet:HaMLet>
23243 * <:MLKit:ML Kit>
23244 * <:Home:MLton>
23245 * <:MoscowML:Moscow ML>
23246 * <:PolyML:Poly/ML>
23247 * <:SMLSharp:SML#>
23248 * <:SMLNJ:SML/NJ>
23249 * <:SMLNET:SML.NET>
23250 * <:TILT:TILT>
23251
23252 == Not Actively Maintained ==
23253
23254 * http://www.dcs.ed.ac.uk/home/edml/[Edinburgh ML]
23255 * <:MLj:MLj>
23256 * MLWorks
23257 * <:Poplog:>
23258 * http://www.cs.cornell.edu/Info/People/jgm/til.tar.Z[TIL]
23259
23260 <<<
23261
23262 :mlton-guide-page: StandardMLPortability
23263 [[StandardMLPortability]]
23264 StandardMLPortability
23265 =====================
23266
23267 Technically, SML'97 as defined in the
23268 <:DefinitionOfStandardML:Definition>
23269 requires only a minimal initial basis, which, while including the
23270 types `int`, `real`, `char`, and `string`, need have
23271 no operations on those base types.  Hence, the only observable output
23272 of an SML'97 program is termination or raising an exception.  Most SML
23273 compilers should agree there, to the degree each agrees with the
23274 Definition.  See <:UnresolvedBugs:> for MLton's very few corner cases.
23275
23276 Realistically, a program needs to make use of the
23277 <:BasisLibrary:Basis Library>.
23278 Within the Basis Library, there are numerous places where the behavior
23279 is implementation dependent.  For a trivial example:
23280
23281 [source,sml]
23282 ----
23283 val _ = valOf (Int.maxInt)
23284 ----
23285
23286
may either raise the `Option` exception (if
`Int.maxInt = NONE`) or may terminate normally.  The default
23289 Int/Real/Word sizes are the biggest implementation dependent aspect;
23290 so, one implementation may raise `Overflow` while another can
23291 accommodate the result.  Also, maximum array and vector lengths are
23292 implementation dependent.  Interfacing with the operating system is a
23293 bit murky, and implementations surely differ in handling of errors
23294 there.
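
Code that is meant to run under several implementations can avoid
baking in such assumptions.  The following is only a sketch of one
defensive style, with the hypothetical names `intRange` and
`checkedAdd`; it inspects `Int.maxInt` instead of assuming that it is
`SOME`, and turns `Overflow` into an option:

[source,sml]
----
(* Describe the default int without assuming that it is bounded. *)
val intRange =
   case Int.maxInt of
      NONE => "unbounded"
    | SOME n => "bounded by " ^ Int.toString n

(* Addition that reports overflow instead of propagating the exception.
 * With an unbounded default int the handler is simply never entered. *)
fun checkedAdd (x, y) = SOME (x + y) handle Overflow => NONE

val small = checkedAdd (1, 2)
----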
23295
23296 <<<
23297
23298 :mlton-guide-page: StandardMLTutorials
23299 [[StandardMLTutorials]]
23300 StandardMLTutorials
23301 ===================
23302
23303 * http://www.dcs.napier.ac.uk/course-notes/sml/manual.html[A Gentle Introduction to ML].
23304 Andrew Cummings.
23305
23306 * http://www.dcs.ed.ac.uk/home/stg/NOTES/[Programming in Standard ML '97: An Online Tutorial].
23307 Stephen Gilmore.
23308
23309 * <!Cite(Harper11, Programming in Standard ML)>.
23310 Robert Harper.
23311
23312 * <!Cite(Tofte96, Essentials of Standard ML Modules)>.
23313 Mads Tofte.
23314
23315 * <!Cite(Tofte09, Tips for Computer Scientists on Standard ML (Revised))>.
23316 Mads Tofte.
23317
23318 <<<
23319
23320 :mlton-guide-page: StaticSum
23321 [[StaticSum]]
23322 StaticSum
23323 =========
23324
23325 While SML makes it impossible to write functions whose types would
23326 depend on the values of their arguments, or so called dependently
23327 typed functions, it is possible, and arguably commonplace, to write
23328 functions whose types depend on the types of their arguments.  Indeed,
23329 the types of parametrically polymorphic functions like `map` and
23330 `foldl` can be said to depend on the types of their arguments.  What
23331 is less commonplace, however, is to write functions whose behavior
23332 would depend on the types of their arguments.  Nevertheless, there are
23333 several techniques for writing such functions.
23334 <:TypeIndexedValues:Type-indexed values> and <:Fold:fold> are two such
23335 techniques.  This page presents another such technique dubbed static
23336 sums.
23337
23338
23339 == Ordinary Sums ==
23340
23341 Consider the sum type as defined below:
23342 [source,sml]
23343 ----
23344 structure Sum = struct
23345    datatype ('a, 'b) t = INL of 'a | INR of 'b
23346 end
23347 ----
23348
23349 While a generic sum type such as defined above is very useful, it has
23350 a number of limitations.  As an example, we could write the function
23351 `out` to extract the value from a sum as follows:
23352 [source,sml]
23353 ----
23354 fun out (s : ('a, 'a) Sum.t) : 'a =
23355     case s
23356      of Sum.INL a => a
23357       | Sum.INR a => a
23358 ----
23359
23360 As can be seen from the type of `out`, it is limited in the sense that
23361 it requires both variants of the sum to have the same type.  So, `out`
23362 cannot be used to extract the value of a sum of two different types,
23363 such as the type `(int, real) Sum.t`.  As another example of a
23364 limitation, consider the following attempt at a `succ` function:
23365 [source,sml]
23366 ----
23367 fun succ (s : (int, real) Sum.t) : ??? =
23368     case s
23369      of Sum.INL i => i + 1
23370       | Sum.INR r => Real.nextAfter (r, Real.posInf)
23371 ----
23372
23373 The above definition of `succ` cannot be typed, because there is no
23374 type for the codomain within SML.
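
For comparison, the usual workaround with ordinary sums is to return
a sum as well, which types but forces every caller to deconstruct the
result again.  The following sketch (with the made-up name `succSum`)
shows this:

[source,sml]
----
structure Sum = struct
   datatype ('a, 'b) t = INL of 'a | INR of 'b
end

(* Typable, but the caller gets back a sum and has to match again. *)
fun succSum (s : (int, real) Sum.t) : (int, real) Sum.t =
    case s
     of Sum.INL i => Sum.INL (i + 1)
      | Sum.INR r => Sum.INR (Real.nextAfter (r, Real.posInf))

val two = case succSum (Sum.INL 1) of Sum.INL i => i | Sum.INR _ => 0
----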
23375
23376
23377 == Static Sums ==
23378
23379 Interestingly, it is possible to define values `inL`, `inR`, and
23380 `match` that satisfy the laws
23381 ----
23382 match (inL x) (f, g) = f x
23383 match (inR x) (f, g) = g x
23384 ----
and do not suffer from the same limitations.  The definitions are
23386 actually quite trivial:
23387 [source,sml]
23388 ----
23389 structure StaticSum = struct
23390    fun inL x (f, _) = f x
23391    fun inR x (_, g) = g x
23392    fun match x = x
23393 end
23394 ----
23395
23396 Now, given the `succ` function defined as
23397 [source,sml]
23398 ----
23399 fun succ s =
23400     StaticSum.match s
23401        (fn i => i + 1,
23402         fn r => Real.nextAfter (r, Real.posInf))
23403 ----
23404 we get
23405 [source,sml]
23406 ----
23407 succ (StaticSum.inL 1) = 2
23408 succ (StaticSum.inR Real.maxFinite) = Real.posInf
23409 ----
23410
23411 To better understand how this works, consider the following signature
23412 for static sums:
23413 [source,sml]
23414 ----
23415 structure StaticSum :> sig
23416    type ('dL, 'cL, 'dR, 'cR, 'c) t
23417    val inL : 'dL -> ('dL, 'cL, 'dR, 'cR, 'cL) t
23418    val inR : 'dR -> ('dL, 'cL, 'dR, 'cR, 'cR) t
23419    val match : ('dL, 'cL, 'dR, 'cR, 'c) t -> ('dL -> 'cL) * ('dR -> 'cR) -> 'c
23420 end = struct
23421    type ('dL, 'cL, 'dR, 'cR, 'c) t = ('dL -> 'cL) * ('dR -> 'cR) -> 'c
23422    open StaticSum
23423 end
23424 ----
23425
23426 Above, `'d` stands for domain and `'c` for codomain.  The key
23427 difference between an ordinary sum type, like `(int, real) Sum.t`, and
23428 a static sum type, like `(int, real, real, int, real) StaticSum.t`, is
23429 that the ordinary sum type says nothing about the type of the result
23430 of deconstructing a sum while the static sum type specifies the type.
23431
23432 With the sealed static sum module, we get the type
23433 [source,sml]
23434 ----
23435 val succ : (int, int, real, real, 'a) StaticSum.t -> 'a
23436 ----
23437 for the previously defined `succ` function.  The type specifies that
23438 `succ` maps a left `int` to an `int` and a right `real` to a `real`.
23439 For example, the type of `StaticSum.inL 1` is
23440 `(int, 'cL, 'dR, 'cR, 'cL) StaticSum.t`.  Unifying this with the
23441 argument type of `succ` gives the type `(int, int, real, real, int)
23442 StaticSum.t -> int`.
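
The two codomains need not agree.  The following self-contained
sketch repeats the unsealed definitions from above and passes the
same pair of functions to a left and a right injection; the names `n`
and `b` are illustrative.

[source,sml]
----
structure StaticSum = struct
   fun inL x (f, _) = f x
   fun inR x (_, g) = g x
   fun match x = x
end

(* Left injection: the result has the left function's codomain, int. *)
val n : int =
   StaticSum.match (StaticSum.inL 1) (fn i => i + 1, fn r => r > 2.0)

(* Right injection: the same pair of functions now yields a bool. *)
val b : bool =
   StaticSum.match (StaticSum.inR 2.5) (fn i => i + 1, fn r => r > 2.0)
----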
23443
23444 The `out` function is quite useful on its own.  Here is how it can be
23445 defined:
23446 [source,sml]
23447 ----
23448 structure StaticSum = struct
23449    open StaticSum
23450    val out : ('a, 'a, 'b, 'b, 'c) t -> 'c =
23451     fn s => match s (fn x => x, fn x => x)
23452 end
23453 ----
23454
23455 Due to the value restriction, lack of first class polymorphism and
23456 polymorphic recursion, the usefulness and convenience of static sums
23457 is somewhat limited in SML.  So, don't throw away the ordinary sum
23458 type just yet.  Static sums can nevertheless be quite useful.
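
To illustrate the value restriction issue: a binding such as
`val one = StaticSum.inL 1` is a function application, so its type is
not generalized and the sum could be deconstructed at only one result
type.  One possible workaround, sketched below under the assumption
that the unsealed definitions are used (it does not type with the
sealed module, where `t` is abstract), is to eta-expand the binding
into a function:

[source,sml]
----
structure StaticSum = struct
   fun inL x (f, _) = f x
   fun inR x (_, g) = g x
   fun match x = x
end

(* Written as a function, so its type is generalized:
 *   one : (int -> 'a) * 'b -> 'a *)
fun one k = StaticSum.inL 1 k

(* The same static sum deconstructed at two different result types. *)
val asInt : int = StaticSum.match one (fn i => i + 1, fn () => 0)
val asString : string = StaticSum.match one (Int.toString, fn () => "")
----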
23459
23460
23461 === Example: Send and Receive with Argument Type Dependent Result Types ===
23462
23463 In some situations it would seem useful to define functions whose
23464 result type would depend on some of the arguments.  Traditionally such
23465 functions have been thought to be impossible in SML and the solution
23466 has been to define multiple functions.  For example, the
23467 http://www.standardml.org/Basis/socket.html[`Socket` structure] of the
23468 Basis library defines 16 `send` and 16 `recv` functions.  In contrast,
23469 the Net structure
23470 (<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sig)>) of the
23471 Basic library designed by Stephen Weeks defines only a single `send`
23472 and a single `receive` and the result types of the functions depend on
23473 their arguments.  The implementation
23474 (<!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/net.sml)>) uses
static sums (with a slightly different signature:
23476 <!ViewGitFile(mltonlib,master,com/sweeks/basic/unstable/static-sum.sig)>).
23477
23478
23479 === Example: Picking Monad Results ===
23480
23481 Suppose that we need to write a parser that accepts a pair of integers
23482 and returns their sum given a monadic parsing combinator library.  A
part of the signature of such a library could look like this:
23484 [source,sml]
23485 ----
23486 signature PARSING = sig
23487    include MONAD
23488    val int : int t
23489    val lparen : unit t
23490    val rparen : unit t
23491    val comma : unit t
23492    (* ... *)
23493 end
23494 ----
23495 where the `MONAD` signature could be defined as
23496 [source,sml]
23497 ----
23498 signature MONAD = sig
23499    type 'a t
23500    val return : 'a -> 'a t
23501    val >>= : 'a t * ('a -> 'b t) -> 'b t
23502 end
23503 infix >>=
23504 ----
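
To make the signature concrete, here is a minimal sketch of one
structure with these operations, based on `option`.  The name
`OptionMonad` is hypothetical and is not part of the parsing library
discussed here; it only shows how `return` and `>>=` fit together.

[source,sml]
----
infix >>=

(* A hypothetical instance: computations that may fail. *)
structure OptionMonad = struct
   type 'a t = 'a option
   val return = SOME
   fun m >>= f =
       case m
        of NONE => NONE
         | SOME x => f x
end

local
   open OptionMonad
in
   (* Both steps succeed, so the results are combined. *)
   val three = SOME 1 >>= (fn x => SOME 2 >>= (fn y => return (x + y)))
   (* The first step fails, so the continuation is never run. *)
   val none = NONE >>= (fn x => return (x + 1))
end
----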
23505
23506 The straightforward, but tedious, way to write the desired parser is:
23507 [source,sml]
23508 ----
23509 val p = lparen >>= (fn _ =>
23510         int    >>= (fn x =>
23511         comma  >>= (fn _ =>
23512         int    >>= (fn y =>
23513         rparen >>= (fn _ =>
23514         return (x + y))))))
23515 ----
23516
23517 In Haskell, the parser could be written using the `do` notation
23518 considerably less verbosely as:
23519 [source,haskell]
23520 ----
23521 p = do { lparen ; x <- int ; comma ; y <- int ; rparen ; return $ x + y }
23522 ----
23523
23524 SML doesn't provide a `do` notation, so we need another solution.
23525
Suppose we had a "pick" notation for monads that would allow
us to write the parser as
23528 [source,sml]
23529 ----
23530 val p = `lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
23531 ----
23532 using four auxiliary combinators: +&grave;+, `\`, `^`, and `@`.
23533
23534 Roughly speaking
23535
23536 * +&grave;p+ means that the result of `p` is dropped,
23537 * `\p` means that the result of `p` is taken,
23538 * `p ^ q` means that results of `p` and `q` are taken as a product, and
23539 * `p @ a` means that the results of `p` are passed to the function `a` and that result is returned.
23540
23541 The difficulty is in implementing the concatenation combinator `^`.
23542 The type of the result of the concatenation depends on the types of
23543 the arguments.
23544
23545 Using static sums and the <:ProductType:product type>, the pick
23546 notation for monads can be implemented as follows:
23547 [source,sml]
23548 ----
23549 functor MkMonadPick (include MONAD) = let
23550    open StaticSum
23551 in
23552    struct
23553       fun `a = inL (a >>= (fn _ => return ()))
23554       val \ = inR
23555       fun a @ f = out a >>= (return o f)
23556       fun a ^ b =
23557           (match b o match a)
23558              (fn a =>
23559                  (fn b => inL (a >>= (fn _ => b)),
23560                   fn b => inR (a >>= (fn _ => b))),
23561               fn a =>
23562                  (fn b => inR (a >>= (fn a => b >>= (fn _ => return a))),
23563                   fn b => inR (a >>= (fn a => b >>= (fn b => return (a & b))))))
23564    end
23565 end
23566 ----
23567
23568 The above implementation is inefficient, however.  It uses many more
23569 bind operations, `>>=`, than necessary.  That can be solved with an
23570 additional level of abstraction:
23571 [source,sml]
23572 ----
23573 functor MkMonadPick (include MONAD) = let
23574    open StaticSum
23575 in
23576    struct
23577       fun `a = inL (fn b => a >>= (fn _ => b ()))
23578       fun \a = inR (fn b => a >>= b)
23579       fun a @ f = out a (return o f)
23580       fun a ^ b =
23581           (match b o match a)
23582              (fn a => (fn b => inL (fn c => a (fn () => b c)),
23583                        fn b => inR (fn c => a (fn () => b c))),
23584               fn a => (fn b => inR (fn c => a (fn a => b (fn () => c a))),
23585                        fn b => inR (fn c => a (fn a => b (fn b => c (a & b))))))
23586    end
23587 end
23588 ----
23589
23590 After instantiating and opening either of the above monad pick
23591 implementations, the previously given definition of `p` can be
23592 compiled and results in a parser whose result is of type `int`.  Here
23593 is a functor to test the theory:
23594 [source,sml]
23595 ----
23596 functor Test (Arg : PARSING) = struct
23597    local
23598       structure Pick = MkMonadPick (Arg)
23599       open Pick Arg
23600    in
23601       val p : int t =
23602           `lparen ^ \int ^ `comma ^ \int ^ `rparen @ (fn x & y => x + y)
23603    end
23604 end
23605 ----
23606
23607
23608 == Also see ==
23609
23610 There are a number of related techniques.  Here are some of them.
23611
23612 * <:Fold:>
23613 * <:TypeIndexedValues:>
23614
23615 <<<
23616
23617 :mlton-guide-page: StephenWeeks
23618 [[StephenWeeks]]
23619 StephenWeeks
23620 ============
23621
23622 I live in the New York City area and work at http://janestcapital.com[Jane Street Capital].
23623
23624 My http://sweeks.com/[home page].
23625
23626 You can email me at sweeks@sweeks.com.
23627
23628 <<<
23629
23630 :mlton-guide-page: StyleGuide
23631 [[StyleGuide]]
23632 StyleGuide
23633 ==========
23634
23635 These conventions are chosen so that inertia is towards modularity, code reuse and finding bugs early, _not_ to save typing.
23636
23637 * <:SyntacticConventions:>
23638
23639 <<<
23640
23641 :mlton-guide-page: Subversion
23642 [[Subversion]]
23643 Subversion
23644 ==========
23645
23646 http://subversion.apache.org/[Subversion] is a version control system.
23647 The MLton project used Subversion to maintain its
23648 <:Sources:source code>, but switched to <:Git:> on 20130308.
23649
23650 Here are some online Subversion resources.
23651
23652 * http://svnbook.red-bean.com[Version Control with Subversion]
23653
23654 <<<
23655
23656 :mlton-guide-page: SuccessorML
23657 [[SuccessorML]]
23658 SuccessorML
23659 ===========
23660
23661 The purpose of http://sml-family.org/successor-ml/[successor ML], or
23662 sML for short, is to provide a vehicle for the continued evolution of
23663 ML, using Standard ML as a starting point. The intention is for
23664 successor ML to be a living, evolving dialect of ML that is responsive
23665 to community needs and advances in language design, implementation,
23666 and semantics.
23667
23668 == SuccessorML Features in MLton ==
23669
23670 The following SuccessorML features have been implemented in MLton.
23671 The features are disabled by default, and may be enabled utilizing the
23672 feature's corresponding <:MLBasisAnnotations:ML Basis annotation>
23673 which is listed directly after the feature name.  In addition, the
23674 +allowSuccessorML {false|true}+ annotation can be used to
23675 simultaneously enable all of the features.
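
As a sketch of how such an annotation is typically applied, an ML
Basis file can wrap the affected sources in an `ann` declaration.
The file name `main.sml` below is a placeholder:

----
$(SML_LIB)/basis/basis.mlb

ann "allowSuccessorML true" in
   main.sml
end
----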
23676
23677 * <!Anchor(DoDecls)>
23678 `do` Declarations: +allowDoDecls {false|true}+
23679 +
23680 Allow a +do _exp_+ declaration form, which evaluates _exp_ for its
23681 side effects.  The following example uses a `do` declaration:
23682 +
23683 [source,sml]
23684 ----
23685 do print "Hello world.\n"
23686 ----
23687 +
23688 and is equivalent to:
23689 +
23690 [source,sml]
23691 ----
23692 val () = print "Hello world.\n"
23693 ----
23694
23695 * <!Anchor(ExtendedConsts)>
23696 Extended Constants: +allowExtendedConsts {false|true}+
23697 +
23698 --
23699 Allow or disallow all of the extended constants features.  This is a
23700 proxy for all of the following annotations.
23701
23702 ** <!Anchor(ExtendedNumConsts)>
23703 Extended Numeric Constants: +allowExtendedNumConsts {false|true}+
23704 +
23705 Allow underscores as a separator in numeric constants and allow binary
23706 integer and word constants.
23707 +
Underscores in a numeric constant must occur between digits;
consecutive underscores are allowed.
23710 +
23711 Binary integer constants use the prefix +0b+ and binary word constants
23712 use the prefix +0wb+.
23713 +
23714 The following example uses extended numeric constants (although it may
23715 be incorrectly syntax highlighted):
23716 +
23717 [source,sml]
23718 ----
23719 val pb = 0b10101
23720 val nb = ~0b10_10_10
23721 val wb = 0wb1010
23722 val i = 4__327__829
23723 val r = 6.022_140_9e23
23724 ----
23725
23726 ** <!Anchor(ExtendedTextConsts)> Extended Text Constants: +allowExtendedTextConsts {false|true}+
23727 +
23728 Allow characters with integer codes &ge; 128 and &le; 247 that
23729 correspond to syntactically well-formed UTF-8 byte sequences in text
23730 constants.
23731 +
23732 ////
23733 and allow `\Uxxxxxxxx` numeric escapes in text constants.
23734 ////
23735 +
23736 Any 1, 2, 3, or 4 byte sequence that can be properly decoded to a
23737 binary number according to the UTF-8 encoding/decoding scheme is
23738 allowed in a text constant (but invalid sequences are not explicitly
23739 rejected) and denotes the corresponding sequence of characters with
23740 integer codes &ge; 128 and &le; 247.  This feature enables "UTF-8
23741 convenience" (but not comprehensive Unicode support); in particular,
23742 it allows one to copy text from a browser and paste it into a string
constant in an editor and, furthermore, if the string is printed to a
terminal, then it will (typically) appear as the original text.  The
23745 following example uses UTF-8 byte sequences:
23746 +
23747 [source,sml]
23748 ----
23749 val s1 : String.string = "\240\159\130\161"
23750 val s2 : String.string = "🂡"
23751 val _ = print ("s1 --> " ^ s1 ^ "\n")
23752 val _ = print ("s2 --> " ^ s2 ^ "\n")
23753 val _ = print ("String.size s1 --> " ^ Int.toString (String.size s1) ^ "\n")
23754 val _ = print ("String.size s2 --> " ^ Int.toString (String.size s2) ^ "\n")
23755 val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
23756 ----
23757 +
23758 and, when compiled and executed, will display:
23759 +
23760 ----
23761 s1 --> 🂡
23762 s2 --> 🂡
23763 String.size s1 --> 4
23764 String.size s2 --> 4
23765 s1 = s2 --> true
23766 ----
23767 +
23768 Note that the `String.string` type corresponds to any sequence of
23769 8-bit values, including invalid UTF-8 sequences; hence the string
23770 constant `"\192"` (a UTF-8 leading byte with no UTF-8 continuation
23771 byte) is valid.  Similarly, the `Char.char` type corresponds to a
23772 single 8-bit value; hence the char constant `#"α"` is not valid, as
23773 the text constant `"α"` denotes a sequence of two 8-bit values.
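+
As a small sketch of this point (assuming the
+allowExtendedTextConsts+ annotation is enabled and the source file is
saved as UTF-8), the following inspects the two bytes that `"α"`
denotes; the names `alpha`, `codes`, and `sameBytes` are illustrative
only:
+
[source,sml]
----
val alpha : String.string = "α"
(* The UTF-8 encoding of U+03B1 is the byte sequence 206, 177. *)
val codes = List.map Char.ord (String.explode alpha)
(* The same string written with \ddd escapes, which needs no annotation. *)
val sameBytes = alpha = "\206\177"
val _ = print ("String.size alpha --> " ^ Int.toString (String.size alpha) ^ "\n")
----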
23774 +
23775 ////
23776 A `\Uxxxxxxxx` numeric escape denotes a single character with the
23777 hexadecimal integer code `xxxxxxxx`.  Such numeric escapes are not
23778 necessary for the `String.string` and `Char.char` types, since
23779 characters in such text constants must have integer codes &le; 255 and
23780 the `\ddd` and `\uxxxx` numeric escapes suffice.  However, the
23781 `\Uxxxxxxxx` numeric escapes are useful for the `WideString.string`
23782 and `WideChar.char` types, since characters in such text constants may
23783 have integer codes &le; 2^32^-1.  The following uses a `\Uxxxxxxxx`
23784 numeric escape (although it may be incorrectly syntax highlighted):
23785 +
23786 [source,sml]
23787 ----
23788 val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
23789 val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
23790 ----
23791 +
23792 and, when compiled and executed, will display:
23793 +
23794 ----
23795 WideString.size s1 --> 1
23796 ----
23797 +
23798 Note that the `WideString.string` type corresponds to any sequence of
23799 32-bit values, including invalid Unicode code points; hence, the
23800 string constants `"\U001F0000"` and `"\U40000000"` are valid (but the
23801 corresponding integer codes are not valid Unicode code points).
23802 Similarly, the `WideChar.char` type corresponds to a single 32-bit
23803 value.
23804 +
23805 Finally, note that a UTF-8 byte sequence in a `WideString.string` or
23806 `WideChar.char` text constant does not denote a single 32-bit value,
23807 but rather a sequence of 32-bit values &ge; 128 and &le; 247.  The
23808 following example uses both UTF-8 byte sequences and `\Uxxxxxxxx`
23809 numeric escapes (although it may be incorrectly syntax highlighted):
23810 +
23811 [source,sml]
23812 ----
23813 val s1 : WideString.string = "\U0001F0A1" (* 'PLAYING CARD ACE OF SPADES' (U+1F0A1) *)
23814 val s2 : WideString.string = "🂡"
23815 val s3 : WideString.string = "\U000000F0\U0000009F\U00000082\U000000A1"
23816 val _ = print ("WideString.size s1 --> " ^ Int.toString (WideString.size s1) ^ "\n")
23817 val _ = print ("WideString.size s2 --> " ^ Int.toString (WideString.size s2) ^ "\n")
23818 val _ = print ("WideString.size s3 --> " ^ Int.toString (WideString.size s3) ^ "\n")
23819 val _ = print ("s1 = s2 --> " ^ Bool.toString (s1 = s2) ^ "\n")
23820 val _ = print ("s2 = s3 --> " ^ Bool.toString (s2 = s3) ^ "\n")
23821 ----
23822 +
23823 and, when compiled and executed, will display:
23824 +
23825 ----
23826 WideString.size s1 --> 1
23827 WideString.size s2 --> 4
23828 WideString.size s3 --> 4
23829 s1 = s2 --> false
23830 s2 = s3 --> true
23831 ----
23832 ////
23833 --
23834
23835 * <!Anchor(LineComments)>
23836 Line Comments: +allowLineComments {false|true}+
23837 +
23838 Allow line comments beginning with the token ++(*)++.  The following
23839 example uses a line comment:
23840 +
23841 [source,sml]
23842 ----
23843 (*) This is a line comment
23844 ----
23845 +
23846 Line comments properly nest within block comments.  The following
23847 example uses line comments nested within block comments:
23848 +
23849 [source,sml]
23850 ----
23851 (*
23852 val x = 4 (*) This is a line comment
23853 *)
23854
23855 (*
23856 val y = 5 (*) This is a line comment *)
23857 *)
23858 ----
23859
23860 * <!Anchor(OptBar)>
23861 Optional Pattern Bars: +allowOptBar {false|true}+
23862 +
23863 Allow a bar to appear before the first match rule of a `case`, `fn`,
23864 or `handle` expression, allow a bar to appear before the first
23865 function-value binding of a `fun` declaration, and allow a bar to
23866 appear before the first constructor binding or description of a
23867 `datatype` declaration or specification.  The following example uses
23868 leading bars in a `datatype` declaration, a `fun` declaration, and a
23869 `case` expression:
23870 +
23871 [source,sml]
23872 ----
23873 datatype t =
23874   | C
23875   | B
23876   | A
23877
23878 fun
23879   | f NONE = 0
23880   | f (SOME t) =
23881      (case t of
23882         | A => 1
23883         | B => 2
23884         | C => 3)
23885 ----
23886 +
23887 By eliminating the special case of the first element, this feature
23888 allows for simpler refactoring (e.g., sorting the lines of the
23889 `datatype` declaration's constructor bindings to put the constructors
23890 in alphabetical order).
23891
23892 * <!Anchor(OptSemicolon)>
23893 Optional Semicolons: +allowOptSemicolon {false|true}+
23894 +
23895 Allow a semicolon to appear after the last expression in a sequence or
23896 `let`-body expression.  The following example uses a trailing
23897 semicolon in the body of a `let` expression:
23898 +
23899 [source,sml]
23900 ----
23901 fun h z =
23902   let
23903     val x = 3 * z
23904   in
23905      f x ;
23906      g x ;
23907   end
23908 ----
23909 +
23910 By eliminating the special case of the last element, this feature
23911 allows for simpler refactoring.
23912
23913 * <!Anchor(OrPats)>
23914 Disjunctive (Or) Patterns: +allowOrPats {false|true}+
23915 +
23916 Allow disjunctive (a.k.a., "or") patterns of the form +_pat~1~_ |
23917 _pat~2~_+, which matches a value that matches either +_pat~1~_+ or
23918 +_pat~2~_+.  Disjunctive patterns have lower precedence than `as`
23919 patterns and constraint patterns, much as `orelse` expressions have
23920 lower precedence than `andalso` expressions and constraint
23921 expressions.  Both sub-patterns of a disjunctive pattern must bind the
23922 same variables with the same types.  The following example uses
23923 disjunctive patterns:
23924 +
23925 [source,sml]
23926 ----
23927 datatype t = A of int | B of int | C of int | D of int * int | E of int * int
23928
23929 fun f t =
23930   case t of
23931      A x | B x | C x => x + 1
23932    | D (x, _) | E (_, x) => x * 2
23933 ----
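+
For comparison, here is a sketch of the same function without
disjunctive patterns, with each alternative written as its own match
rule; it needs no annotation:
+
[source,sml]
----
datatype t = A of int | B of int | C of int | D of int * int | E of int * int

(* Each alternative of a disjunctive pattern becomes a separate rule
 * with a copy of the right-hand side. *)
fun f t =
  case t of
     A x => x + 1
   | B x => x + 1
   | C x => x + 1
   | D (x, _) => x * 2
   | E (_, x) => x * 2
----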
23934
23935 * <!Anchor(RecordPunExps)>
23936 Record Punning Expressions: +allowRecordPunExps {false|true}+
23937 +
23938 Allow record punning expressions, whereby an identifier +_vid_+ as an
23939 expression row in a record expression denotes the expression row
23940 +_vid_ = _vid_+ (i.e., treating a label as a variable).  The following
23941 example uses record punning expressions (and also record punning
23942 patterns):
23943 +
23944 [source,sml]
23945 ----
23946 fun incB r =
23947   case r of {a, b, c} => {a, b = b + 1, c}
23948 ----
23949 +
23950 and is equivalent to:
23951 +
23952 [source,sml]
23953 ----
23954 fun incB r =
23955   case r of {a = a, b = b, c = c} => {a = a, b = b + 1, c = c}
23956 ----
23957
23958 * <!Anchor(SigWithtype)>
23959 `withtype` in Signatures: +allowSigWithtype {false|true}+
23960 +
23961 Allow `withtype` to modify a `datatype` specification in a signature.
23962 The following example uses `withtype` in a signature (and also
23963 `withtype` in a declaration):
23964 +
23965 [source,sml]
23966 ----
23967 signature STREAM =
23968   sig
23969     datatype 'a u = Nil | Cons of 'a * 'a t
23970     withtype 'a t = unit -> 'a u
23971   end
23972 structure Stream : STREAM =
23973   struct
23974     datatype 'a u = Nil | Cons of 'a * 'a t
23975     withtype 'a t = unit -> 'a u
23976   end
23977 ----
23978 +
23979 and is equivalent to:
23980 +
23981 [source,sml]
23982 ----
23983 signature STREAM =
23984   sig
23985     datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
23986     type 'a t = unit -> 'a u
23987   end
23988 structure Stream : STREAM =
23989   struct
23990     datatype 'a u = Nil | Cons of 'a * (unit -> 'a u)
23991     type 'a t = unit -> 'a u
23992   end
23993 ----
23994
23995 * <!Anchor(VectorExpsAndPats)>
23996 Vector Expressions and Patterns: +allowVectorExpsAndPats {false|true}+
23997 +
23998 --
23999 Allow or disallow vector expressions and vector patterns.  This is a
24000 proxy for all of the following annotations.
24001
24002 ** <!Anchor(VectorExps)>
24003 Vector Expressions: +allowVectorExps {false|true}+
24004 +
24005 Allow vector expressions of the form +#[_exp~0~_, _exp~1~_, ..., _exp~n-1~_]+ (where _n ≥ 0_).  The expression has type +_τ_ vector+ when each expression _exp~i~_ has type +_τ_+.
24006
24007 ** <!Anchor(VectorPats)>
24008 Vector Patterns: +allowVectorPats {false|true}+
24009 +
24010 Allow vector patterns of the form +#[_pat~0~_, _pat~1~_, ..., _pat~n-1~_]+ (where _n ≥ 0_).  The pattern matches values of type +_τ_ vector+ when each pattern _pat~i~_ matches values of type +_τ_+.
24011 --
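+
The following is a minimal sketch that uses both forms, assuming the
+allowVectorExpsAndPats+ annotation is enabled; the names `v`, `sum3`,
`s`, and `t` are illustrative only:
+
[source,sml]
----
val v : int vector = #[1, 2, 3]

(* A vector pattern with three sub-patterns matches only vectors of
 * length three; any other vector falls through to the wildcard. *)
fun sum3 (w : int vector) : int option =
  case w of
     #[x, y, z] => SOME (x + y + z)
   | _ => NONE

val s = sum3 v      (* three elements, so the first rule applies *)
val t = sum3 #[]    (* no elements, so the wildcard rule applies *)
----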
24012
24013 <<<
24014
24015 :mlton-guide-page: SureshJagannathan
24016 [[SureshJagannathan]]
24017 SureshJagannathan
24018 =================
24019
24020 I am an Associate Professor at the http://www.cs.purdue.edu/[Department of Computer Science] at Purdue University.
24021 My research focus is in programming language design and implementation, concurrency,
and distributed systems.  I am interested in various aspects of MLton, mostly related to (in no particular order): (1) control-flow analysis, (2) representation
24023 strategies (e.g., flattening), (3) IR formats, and (4) extensions for distributed programming.
24024
24025
24026 Please see my http://www.cs.purdue.edu/homes/suresh/index.html[Home page] for more details.
24027
24028 <<<
24029
24030 :mlton-guide-page: Swerve
24031 [[Swerve]]
24032 Swerve
24033 ======
24034
24035 http://ftp.sun.ac.za/ftp/mirrorsites/ocaml/Systems_programming/book/c3253.html[Swerve]
24036 is an HTTP server written in SML, originally developed with SML/NJ.
24037 <:RayRacine:> ported Swerve to MLton in January 2005.
24038
24039 <!Attachment(Swerve,swerve.tar.bz2,Download)> the port.
24040
24041 Excerpt from the included `README`:
24042 ____
24043 Total testing of this port consisted of a successful compile, startup,
24044 and serving one html page with one gif image.  Given that the original
24045 code was throughly designed and implemented in a thoughtful manner and
24046 I expect it is quite usable modulo a few minor bugs introduced by my
24047 porting effort.
24048 ____
24049
24050 Swerve is described in <!Cite(Shipman02)>.
24051
24052 <<<
24053
24054 :mlton-guide-page: SXML
24055 [[SXML]]
24056 SXML
24057 ====
24058
24059 <:SXML:> is an <:IntermediateLanguage:>, translated from <:XML:> by
24060 <:Monomorphise:>, optimized by <:SXMLSimplify:>, and translated by
24061 <:ClosureConvert:> to <:SSA:>.
24062
24063 == Description ==
24064
24065 SXML is a simply-typed version of <:XML:>.
24066
24067 == Implementation ==
24068
24069 * <!ViewGitFile(mlton,master,mlton/xml/sxml.sig)>
24070 * <!ViewGitFile(mlton,master,mlton/xml/sxml.fun)>
24071 * <!ViewGitFile(mlton,master,mlton/xml/sxml-tree.sig)>
24072
24073 == Type Checking ==
24074
24075 <:SXML:> shares the type checker for <:XML:>.
24076
24077 == Details and Notes ==
24078
24079 There are only two differences between <:XML:> and <:SXML:>.  First,
24080 <:SXML:> `val`, `fun`, and `datatype` declarations always have an
24081 empty list of type variables.  Second, <:SXML:> variable references
always have an empty list of type arguments.  Constructor uses can
24083 only have a nonempty list of type arguments if the constructor is a
24084 primitive.
24085
24086 Although we could rely on the type system to enforce these constraints
24087 by parameterizing the <:XML:> signature, <:StephenWeeks:> did so in a
24088 previous version of the compiler, and the software engineering gains
24089 were not worth the effort.
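
As a rough source-level picture only (SXML is an intermediate
language, not SML source, and the names `idInt` and `idBool` are
hypothetical), the effect of having no type variables and no type
arguments can be sketched as one copy of a polymorphic function per
type at which it is used:

[source,sml]
----
(* Polymorphic program: id is declared once and used at two types. *)
fun id x = x
val a = id 13
val b = id true

(* Monomorphic picture: each declaration has an empty list of type
 * variables, and each variable reference needs no type arguments. *)
fun idInt (x: int) = x
fun idBool (x: bool) = x
val a' = idInt 13
val b' = idBool true
----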
24090
24091 <<<
24092
24093 :mlton-guide-page: SXMLShrink
24094 [[SXMLShrink]]
24095 SXMLShrink
24096 ==========
24097
24098 SXMLShrink is an optimization pass for the <:SXML:>
24099 <:IntermediateLanguage:>, invoked from <:SXMLSimplify:>.
24100
24101 == Description ==
24102
24103 This pass performs optimizations based on a reduction system.
24104
24105 == Implementation ==
24106
24107 * <!ViewGitFile(mlton,master,mlton/xml/shrink.sig)>
24108 * <!ViewGitFile(mlton,master,mlton/xml/shrink.fun)>
24109
24110 == Details and Notes ==
24111
24112 <:SXML:> shares the <:XMLShrink:> simplifier.
24113
24114 <<<
24115
24116 :mlton-guide-page: SXMLSimplify
24117 [[SXMLSimplify]]
24118 SXMLSimplify
24119 ============
24120
24121 The optimization passes for the <:SXML:> <:IntermediateLanguage:> are
24122 collected and controlled by the `SxmlSimplify` functor
24123 (<!ViewGitFile(mlton,master,mlton/xml/sxml-simplify.sig)>,
24124 <!ViewGitFile(mlton,master,mlton/xml/sxml-simplify.fun)>).
24125
24126 The following optimization passes are implemented:
24127
24128 * <:Polyvariance:>
24129 * <:SXMLShrink:>
24130
24131 The following implementation passes are implemented:
24132
24133 * <:ImplementExceptions:>
24134 * <:ImplementSuffix:>
24135
24136 The following optimization passes are not implemented, but might prove useful:
24137
24138 * <:Uncurry:>
24139 * <:LambdaLift:>
24140
24141 The optimization passes can be controlled from the command-line by the options
24142
24143 * `-diag-pass <pass>` -- keep diagnostic info for pass
24144 * `-disable-pass <pass>` -- skip optimization pass (if normally performed)
24145 * `-enable-pass <pass>` -- perform optimization pass (if normally skipped)
24146 * `-keep-pass <pass>` -- keep the results of pass
24147 * `-sxml-passes <passes>` -- sxml optimization passes
24148
24149 <<<
24150
24151 :mlton-guide-page: SyntacticConventions
24152 [[SyntacticConventions]]
24153 SyntacticConventions
24154 ====================
24155
24156 Here are a number of syntactic conventions useful for programming in
24157 SML.
24158
24159
24160 == General ==
24161
24162 * A line of code never exceeds 80 columns.
24163
24164 * Only split a syntactic entity across multiple lines if it doesn't fit on one line within 80 columns.
24165
24166 * Use alphabetical order wherever possible.
24167
24168 * Avoid redundant parentheses.
24169
24170 * When using `:`, there is no space before the colon, and a single space after it.
24171
24172
24173 == Identifiers ==
24174
24175 * Variables, record labels and type constructors begin with and use
24176 small letters, using capital letters to separate words.
24177 +
24178 [source,sml]
24179 ----
24180 cost
24181 maxValue
24182 ----
24183
24184 * Variables that represent collections of objects (lists, arrays,
24185 vectors, ...) are often suffixed with an `s`.
24186 +
24187 [source,sml]
24188 ----
24189 xs
24190 employees
24191 ----
24192
24193 * Constructors, structure identifiers, and functor identifiers begin
24194 with a capital letter.
24195 +
24196 [source,sml]
24197 ----
24198 Queue
24199 LinkedList
24200 ----
24201
24202 * Signature identifiers are in all capitals, using `_` to separate
24203 words.
24204 +
24205 [source,sml]
24206 ----
24207 LIST
24208 BINARY_HEAP
24209 ----
24210
24211
24212 == Types ==
24213
24214 * Alphabetize record labels.  In a record type, there are spaces after
24215 colons and commas, but not before colons or commas, or at the
24216 delimiters `{` and `}`.
24217 +
24218 [source,sml]
24219 ----
24220 {bar: int, foo: int}
24221 ----
24222
24223 * Only split a record type across multiple lines if it doesn't fit on
24224 one line. If a record type must be split over multiple lines, put one
24225 field per line.
24226 +
24227 [source,sml]
24228 ----
24229 {bar: int,
24230  foo: real * real,
24231  zoo: bool}
24232 ----
24233
24234
24235 * In a tuple type, there are spaces before and after each `*`.
24236 +
24237 [source,sml]
24238 ----
24239 int * bool * real
24240 ----
24241
24242 * Only split a tuple type across multiple lines if it doesn't fit on
24243 one line.  In a tuple type split over multiple lines, there is one
24244 type per line, and the `*`-s go at the beginning of the lines.
24245 +
24246 [source,sml]
24247 ----
24248 int
24249 * bool
24250 * real
24251 ----
24252 +
24253 It may also be useful to parenthesize to make the grouping more
24254 apparent.
24255 +
24256 [source,sml]
24257 ----
24258 (int
24259  * bool
24260  * real)
24261 ----
24262
24263 * In an arrow type split over multiple lines, put the arrow at the
24264 beginning of its line.
24265 +
24266 [source,sml]
24267 ----
24268 int * real
24269 -> bool
24270 ----
24271 +
24272 It may also be useful to parenthesize to make the grouping more
24273 apparent.
24274 +
24275 [source,sml]
24276 ----
24277 (int * real
24278  -> bool)
24279 ----
24280
24281 * Avoid redundant parentheses.
24282
24283 * Arrow types associate to the right, so write
24284 +
24285 [source,sml]
24286 ----
24287 a -> b -> c
24288 ----
24289 +
24290 not
24291 +
24292 [source,sml]
24293 ----
24294 a -> (b -> c)
24295 ----
24296
24297 * Type constructor application associates to the left, so write
24298 +
24299 [source,sml]
24300 ----
24301 int ref list
24302 ----
24303 +
24304 not
24305 +
24306 [source,sml]
24307 ----
24308 (int ref) list
24309 ----
24310
24311 * Type constructor application binds more tightly than a tuple type,
24312 so write
24313 +
24314 [source,sml]
24315 ----
24316 int list * bool list
24317 ----
24318 +
24319 not
24320 +
24321 [source,sml]
24322 ----
24323 (int list) * (bool list)
24324 ----
24325
24326 * Tuple types bind more tightly than arrow types, so write
24327 +
24328 [source,sml]
24329 ----
24330 int * bool -> real
24331 ----
24332 +
24333 not
24334 +
24335 [source,sml]
24336 ----
24337 (int * bool) -> real
24338 ----
24339
24340
24341 == Core ==
24342
24343 * A core expression or declaration split over multiple lines does not
24344 contain any blank lines.
24345
24346 * A record field selector has no space between the `#` and the record
24347 label.  So, write
24348 +
24349 [source,sml]
24350 ----
24351 #foo
24352 ----
24353 +
24354 not
24355 +
24356 [source,sml]
24357 ----
24358 # foo
24359 ----
24361
24362 * A tuple has a space after each comma, but not before, and not at the
24363 delimiters `(` and `)`.
24364 +
24365 [source,sml]
24366 ----
24367 (e1, e2, e3)
24368 ----
24369
24370 * A tuple split over multiple lines has one element per line, and the
24371 commas go at the end of the lines.
24372 +
24373 [source,sml]
24374 ----
24375 (e1,
24376  e2,
24377  e3)
24378 ----
24379
24380 * A list has a space after each comma, but not before, and not at the
24381 delimiters `[` and `]`.
24382 +
24383 [source,sml]
24384 ----
24385 [e1, e2, e3]
24386 ----
24387
24388 * A list split over multiple lines has one element per line, and the
24389 commas at the end of the lines.
24390 +
24391 [source,sml]
24392 ----
24393 [e1,
24394  e2,
24395  e3]
24396 ----
24397
24398 * A record has spaces before and after `=`, a space after each comma,
24399 but not before, and not at the delimiters `{` and `}`.  Field names
24400 appear in alphabetical order.
24401 +
24402 [source,sml]
24403 ----
24404 {bar = 13, foo = true}
24405 ----
24406
24407 * A sequence expression has a space after each semicolon, but not before.
24408 +
24409 [source,sml]
24410 ----
24411 (e1; e2; e3)
24412 ----
24413
24414 * A sequence expression split over multiple lines has one expression
24415 per line, and the semicolons at the beginning of lines.  Lisp and
24416 Scheme programmers may find this hard to read at first.
24417 +
24418 [source,sml]
24419 ----
24420 (e1
24421  ; e2
24422  ; e3)
24423 ----
24424 +
24425 _Rationale_: this makes it easy to visually spot the beginning of each
24426 expression, which becomes more valuable as the expressions themselves
24427 are split across multiple lines.
24428
24429 * An application expression has a space between the function and the
24430 argument.  There are no parens unless the argument is a tuple (in
24431 which case the parens are really part of the tuple, not the
24432 application).
24433 +
24434 [source,sml]
24435 ----
24436 f a
24437 f (a1, a2, a3)
24438 ----
24439
* Avoid redundant parentheses.  Application associates to the left, so
24441 write
24442 +
24443 [source,sml]
24444 ----
24445 f a1 a2 a3
24446 ----
24447 +
24448 not
24449 +
24450 [source,sml]
24451 ----
24452 ((f a1) a2) a3
24453 ----
24454
24455 * Infix operators have a space before and after the operator.
24456 +
24457 [source,sml]
24458 ----
24459 x + y
24460 x * y - z
24461 ----
24462
24463 * Avoid redundant parentheses.  Use <:OperatorPrecedence:>.  So, write
24464 +
24465 [source,sml]
24466 ----
24467 x + y * z
24468 ----
24469 +
24470 not
24471 +
24472 [source,sml]
24473 ----
24474 x + (y * z)
24475 ----
24476
24477 * An `andalso` expression split over multiple lines has the `andalso`
24478 at the beginning of subsequent lines.
24479 +
24480 [source,sml]
24481 ----
24482 e1
24483 andalso e2
24484 andalso e3
24485 ----
24486
24487 * A `case` expression is indented as follows
24488 +
24489 [source,sml]
24490 ----
24491 case e1 of
24492    p1 => e1
24493  | p2 => e2
24494  | p3 => e3
24495 ----
24496
24497 * A `datatype`'s constructors are alphabetized.
24498 +
24499 [source,sml]
24500 ----
24501 datatype t = A | B | C
24502 ----
24503
24504 * A `datatype` declaration has a space before and after each `|`.
24505 +
24506 [source,sml]
24507 ----
24508 datatype t = A | B of int | C
24509 ----
24510
24511 * A `datatype` split over multiple lines has one constructor per line,
24512 with the `|` at the beginning of lines and the constructors beginning
24513 3 columns to the right of the `datatype`.
24514 +
24515 [source,sml]
24516 ----
24517 datatype t =
24518    A
24519  | B
24520  | C
24521 ----
24522
24523 * A `fun` declaration may start its body on the subsequent line,
24524 indented 3 spaces.
24525 +
24526 [source,sml]
24527 ----
24528 fun f x y =
24529    let
      val z = x + y
24531    in
24532       z
24533    end
24534 ----
24535
24536 * An `if` expression is indented as follows.
24537 +
24538 [source,sml]
24539 ----
24540 if e1
24541    then e2
24542 else e3
24543 ----
24544
24545 * A sequence of `if`-`then`-`else`-s is indented as follows.
24546 +
24547 [source,sml]
24548 ----
24549 if e1
24550    then e2
24551 else if e3
24552    then e4
24553 else if e5
24554    then e6
24555 else e7
24556 ----
24557
24558 * A `let` expression has the `let`, `in`, and `end` on their own
24559 lines, starting in the same column.  Declarations and the body are
24560 indented 3 spaces.
24561 +
24562 [source,sml]
24563 ----
24564 let
24565    val x = 13
24566    val y = 14
24567 in
24568    x + y
24569 end
24570 ----
24571
24572 * A `local` declaration has the `local`, `in`, and `end` on their own
24573 lines, starting in the same column.  Declarations are indented 3
24574 spaces.
24575 +
24576 [source,sml]
24577 ----
24578 local
24579    val x = 13
24580 in
24581    val y = x
24582 end
24583 ----
24584
24585 * An `orelse` expression split over multiple lines has the `orelse` at
24586 the beginning of subsequent lines.
24587 +
24588 [source,sml]
24589 ----
24590 e1
24591 orelse e2
24592 orelse e3
24593 ----
24594
24595 * A `val` declaration has a space before and after the `=`.
24596 +
24597 [source,sml]
24598 ----
24599 val p = e
24600 ----
24601
24602 * A `val` declaration can start the expression on the subsequent line,
24603 indented 3 spaces.
24604 +
24605 [source,sml]
24606 ----
24607 val p =
24608    if e1 then e2 else e3
24609 ----
24610
24611
24612 == Signatures ==
24613
24614 * A `signature` declaration is indented as follows.
24615 +
24616 [source,sml]
24617 ----
24618 signature FOO =
24619    sig
24620       val x: int
24621    end
24622 ----
24623 +
24624 _Exception_: a signature declaration in a file to itself can omit the
24625 indentation to save horizontal space.
24626 +
24627 [source,sml]
24628 ----
24629 signature FOO =
24630 sig
24631
24632 val x: int
24633
24634 end
24635 ----
24636 +
24637 In this case, there should be a blank line after the `sig` and before
24638 the `end`.
24639
24640 * A `val` specification has a space after the colon, but not before.
24641 +
24642 [source,sml]
24643 ----
24644 val x: int
24645 ----
24646 +
24647 _Exception_: in the case of operators (like `+`), there is a space
24648 before the colon to avoid lexing the colon as part of the operator.
24649 +
24650 [source,sml]
24651 ----
24652 val + : t * t -> t
24653 ----
24654
24655 * Alphabetize specifications in signatures.
24656 +
24657 [source,sml]
24658 ----
24659 sig
24660    val x: int
24661    val y: bool
24662 end
24663 ----
24664
24665
24666 == Structures ==
24667
24668 * A `structure` declaration has a space on both sides of the `=`.
24669 +
24670 [source,sml]
24671 ----
24672 structure Foo = Bar
24673 ----
24674
24675 * A `structure` declaration split over multiple lines is indented as
24676 follows.
24677 +
24678 [source,sml]
24679 ----
24680 structure S =
24681    struct
24682       val x = 13
24683    end
24684 ----
24685 +
24686 _Exception_: a structure declaration in a file to itself can omit the
24687 indentation to save horizontal space.
24688 +
24689 [source,sml]
24690 ----
24691 structure S =
24692 struct
24693
24694 val x = 13
24695
24696 end
24697 ----
24698 +
24699 In this case, there should be a blank line after the `struct` and
24700 before the `end`.
24701
24702 * Declarations in a `struct` are separated by blank lines.
24703 +
24704 [source,sml]
24705 ----
24706 struct
24707    val x =
24708       let
         val y = 13
24710       in
24711          y + 1
24712       end
24713
24714    val z = 14
24715 end
24716 ----
24717
24718
24719 == Functors ==
24720
24721 * A `functor` declaration has spaces after each `:` (or `:>`) but not
24722 before, and a space before and after the `=`.  It is indented as
24723 follows.
24724 +
24725 [source,sml]
24726 ----
24727 functor Foo (S: FOO_ARG): FOO =
24728    struct
      val x = S.x
24730    end
24731 ----
24732 +
24733 _Exception_: a functor declaration in a file to itself can omit the
24734 indentation to save horizontal space.
24735 +
24736 [source,sml]
24737 ----
24738 functor Foo (S: FOO_ARG): FOO =
24739 struct
24740
24741 val x = S.x
24742
24743 end
24744 ----
24745 +
24746 In this case, there should be a blank line after the `struct`
24747 and before the `end`.
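
As a combined sketch of several of the conventions above (3-space
indentation, alphabetized specifications, spacing around `:` and `=`,
blank lines between declarations in a `struct`, and the `let` layout),
consider the following; `COUNTER` and `Counter` are hypothetical names
used only for illustration:

[source,sml]
----
signature COUNTER =
   sig
      type t
      val new: unit -> t
      val next: t -> int
   end

structure Counter: COUNTER =
   struct
      type t = int ref

      fun new () = ref 0

      fun next r =
         let
            val n = !r
         in
            (r := n + 1; n)
         end
   end
----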
24748
24749 <<<
24750
24751 :mlton-guide-page: Talk
24752 [[Talk]]
24753 Talk
24754 ====
24755
24756 == The MLton Standard ML Compiler ==
24757
24758 *Henry Cejtin, Matthew Fluet, Suresh Jagannathan, Stephen Weeks*
24759
24760 {nbsp} +
24761 {nbsp} +
24762 {nbsp} +
24763
24764 '''
24765
24766 [cols="<,>"]
24767 |====
24768 ||<:TalkStandardML: Next>
24769 |====
24770
24771 <<<
24772
24773 :mlton-guide-page: TalkDiveIn
24774 [[TalkDiveIn]]
24775 TalkDiveIn
24776 ==========
24777
24778 == Dive In ==
24779
24780  * to <:Development:>
24781  * to <:Documentation:>
24782  * to <:Download:>
24783
24784 {nbsp} +
24785 {nbsp} +
24786 {nbsp} +
24787
24788 '''
24789
24790 [cols="<,>"]
24791 |====
24792 |<:TalkMLtonHistory: Prev>|
24793 |====
24794
24795 <<<
24796
24797 :mlton-guide-page: TalkFolkLore
24798 [[TalkFolkLore]]
24799 TalkFolkLore
24800 ============
24801
24802 == Folk Lore ==
24803
24804  * Defunctorization and monomorphisation are feasible
24805  * Global control-flow analysis is feasible
24806  * Early closure conversion is feasible
24807
24808 {nbsp} +
24809 {nbsp} +
24810 {nbsp} +
24811
24812 '''
24813
24814 [cols="<,>"]
24815 |====
24816 |<:TalkWholeProgram: Prev>|<:TalkMLtonFeatures: Next>
24817 |====
24818
24819 <<<
24820
24821 :mlton-guide-page: TalkFromSMLTo
24822 [[TalkFromSMLTo]]
24823 TalkFromSMLTo
24824 =============
24825
24826 == From Standard ML to S-T F-O IL ==
24827
24828  * What issues arise when translating from Standard ML into an intermediate language?
24829
24830 {nbsp} +
24831 {nbsp} +
24832 {nbsp} +
24833
24834 '''
24835
24836 [cols="<,>"]
24837 |====
24838 |<:TalkMLtonApproach: Prev>|<:TalkHowModules: Next>
24839 |====
24840
24841 <<<
24842
24843 :mlton-guide-page: TalkHowHigherOrder
24844 [[TalkHowHigherOrder]]
24845 TalkHowHigherOrder
24846 ==================
24847
24848 == Higher-order Functions ==
24849
24850  * How does one represent SML's higher-order functions?
24851  * MLton's answer: defunctionalize
24852
24853 {nbsp} +
24854 {nbsp} +
24855
24856 See <:ClosureConvert:>.
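
As a source-level sketch of the idea only (MLton performs closure
conversion on its intermediate languages, guided by control-flow
analysis, not on SML source), each `fn` can be pictured as a
constructor carrying its free variables, with a single first-order
`apply` function dispatching on the constructor.  The names `mapFn`,
`lam`, `AddN`, `Double`, `apply`, and `mapLam` are hypothetical:

[source,sml]
----
(* Higher-order program. *)
fun mapFn f [] = []
  | mapFn f (x :: xs) = f x :: mapFn f xs
val n = 10
val ys = mapFn (fn x => x + n) [1, 2, 3]
val zs = mapFn (fn x => x * 2) [1, 2, 3]

(* First-order picture: one constructor per fn expression; AddN
 * carries the free variable n, Double has no free variables. *)
datatype lam = AddN of int | Double
fun apply (AddN k, x) = x + k
  | apply (Double, x) = x * 2
fun mapLam (_, []) = []
  | mapLam (l, x :: xs) = apply (l, x) :: mapLam (l, xs)
val ys' = mapLam (AddN n, [1, 2, 3])
val zs' = mapLam (Double, [1, 2, 3])
----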
24857
24858 {nbsp} +
24859 {nbsp} +
24860 {nbsp} +
24861
'''

[cols="<,>"]
24864 |====
|<:TalkHowPolymorphism: Prev>|<:TalkWholeProgram: Next>
24866 |====
24867
24868 <<<
24869
24870 :mlton-guide-page: TalkHowModules
24871 [[TalkHowModules]]
24872 TalkHowModules
24873 ==============
24874
24875 == Modules ==
24876
24877  * How does one represent SML's modules?
24878  * MLton's answer: defunctorize
24879
24880 {nbsp} +
24881 {nbsp} +
24882
24883 See <:Elaborate:>.
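
As a source-level sketch of the idea only (the actual work is done
during elaboration, and the names `ORD`, `Max`, `IntOrd`, `IntMax`, and
`IntMaxExpanded` are hypothetical), a functor application can be
pictured as a copy of the functor body specialized to its argument:

[source,sml]
----
signature ORD =
   sig
      type t
      val le: t * t -> bool
   end

functor Max (O: ORD) =
   struct
      fun max (x, y) = if O.le (x, y) then y else x
   end

structure IntOrd =
   struct
      type t = int

      val le = Int.<=
   end

(* Program with a functor application. *)
structure IntMax = Max (IntOrd)

(* Functor-free picture: the body of Max with O replaced by IntOrd. *)
structure IntMaxExpanded =
   struct
      fun max (x: int, y: int) = if x <= y then y else x
   end
----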
24884
24885 {nbsp} +
24886 {nbsp} +
24887 {nbsp} +
24888
24889 '''
24890
24891 [cols="<,>"]
24892 |====
24893 |<:TalkFromSMLTo: Prev>|<:TalkHowPolymorphism: Next>
24894 |====
24895
24896 <<<
24897
24898 :mlton-guide-page: TalkHowPolymorphism
24899 [[TalkHowPolymorphism]]
24900 TalkHowPolymorphism
24901 ===================
24902
24903 == Polymorphism ==
24904
24905  * How does one represent SML's polymorphism?
24906  * MLton's answer: monomorphise
24907
24908 {nbsp} +
24909 {nbsp} +
24910
24911 See <:Monomorphise:>.
24912
24913 {nbsp} +
24914 {nbsp} +
24915 {nbsp} +
24916
24917 '''
24918
24919 [cols="<,>"]
24920 |====
24921 |<:TalkHowModules: Prev>|<:TalkHowHigherOrder: Next>
24922 |====
24923
24924 <<<
24925
24926 :mlton-guide-page: TalkMLtonApproach
24927 [[TalkMLtonApproach]]
24928 TalkMLtonApproach
24929 =================
24930
24931 == MLton's Approach ==
24932
24933  * whole-program optimization using a simply-typed, first-order intermediate language
24934  * ensures programs are not penalized for exploiting abstraction and modularity
24935
24936 {nbsp} +
24937 {nbsp} +
24938 {nbsp} +
24939
24940 '''
24941
24942 [cols="<,>"]
24943 |====
24944 |<:TalkStandardML: Prev>|<:TalkFromSMLTo: Next>
24945 |====
24946
24947 <<<
24948
24949 :mlton-guide-page: TalkMLtonFeatures
24950 [[TalkMLtonFeatures]]
24951 TalkMLtonFeatures
24952 =================
24953
24954 == MLton Features ==
24955
24956  * Supports full Standard ML language and Basis Library
24957  * Generates standalone executables
24958  * Extensions
24959    ** Foreign function interface (SML to C, C to SML)
24960    ** ML Basis system for programming in the very large
24961    ** Extension libraries
24962
24963 {nbsp} +
24964 {nbsp} +
24965
24966 See <:Features:>.
24967
24968 {nbsp} +
24969 {nbsp} +
24970 {nbsp} +
24971
24972 '''
24973
24974 [cols="<,>"]
24975 |====
24976 |<:TalkFolkLore: Prev>|<:TalkMLtonHistory: Next>
24977 |====
24978
24979 <<<
24980
24981 :mlton-guide-page: TalkMLtonHistory
24982 [[TalkMLtonHistory]]
24983 TalkMLtonHistory
24984 ================
24985
24986 == MLton History ==
24987
24988 [cols="<25%,<75%"]
24989 |====
24990 | April 1997  | Stephen Weeks wrote a defunctorizer for SML/NJ
24991 | Aug. 1997   | Begin independent compiler (`smlc`)
24992 | Oct. 1997   | Monomorphiser
24993 | Nov. 1997   | Polyvariant higher-order control-flow analysis (10,000 lines)
24994 | March 1999  | First release of MLton (48,006 lines)
24995 | Jan. 2002   | MLton at 102,541 lines
24996 | Jan. 2003   | MLton at 112,204 lines
24997 | Jan. 2004   | MLton at 122,299 lines
24998 | Nov. 2004   | MLton at 141,311 lines
24999 |====
25000
25001 {nbsp} +
25002 {nbsp} +
25003
25004 See <:History:>.
25005
25006 {nbsp} +
25007 {nbsp} +
25008 {nbsp} +
25009
25010 '''
25011
25012 [cols="<,>"]
25013 |====
25014 |<:TalkMLtonFeatures: Prev>|<:TalkDiveIn: Next>
25015 |====
25016
25017 <<<
25018
25019 :mlton-guide-page: TalkStandardML
25020 [[TalkStandardML]]
25021 TalkStandardML
25022 ==============
25023
25024 == Standard ML ==
25025
25026  * a high-level language makes
25027    ** a programmer's life easier
25028    ** a compiler writer's life harder
25029
25030  * perceived overheads of features discourage their use
25031    ** higher-order functions
25032    ** polymorphic datatypes
25033    ** separate modules
25034
25035 {nbsp} +
25036 {nbsp} +
25037
25038 Also see <:StandardML:Standard ML>.
25039
25040 {nbsp} +
25041 {nbsp} +
25042 {nbsp} +
25043
25044 '''
25045
25046 [cols="<,>"]
25047 |====
25048 |<:Talk: Prev>|<:TalkMLtonApproach: Next>
25049 |====
25050
25051 <<<
25052
25053 :mlton-guide-page: TalkTemplate
25054 [[TalkTemplate]]
25055 TalkTemplate
25056 ============
25057
25058 == Title ==
25059
25060  * Bullet
25061  * Bullet
25062
25063
25064 {nbsp} +
25065 {nbsp} +
25066 {nbsp} +
25067
25068 '''
25069
25070 [cols="<,>"]
25071 |====
25072 |<:ZZZPrev: Prev>|<:ZZZNext: Next>
25073 |====
25074
25075 <<<
25076
25077 :mlton-guide-page: TalkWholeProgram
25078 [[TalkWholeProgram]]
25079 TalkWholeProgram
25080 ================
25081
25082 == Whole Program Compiler ==
25083
25084  * Each of these techniques requires whole-program analysis
25085  * But, additional benefits:
25086    ** eliminate (some) variability in programming styles
25087    ** specialize representations
   ** simplify and improve the runtime system
25089
25090 {nbsp} +
25091 {nbsp} +
25092 {nbsp} +
25093
25094 '''
25095
25096 [cols="<,>"]
25097 |====
25098 |<:TalkHowHigherOrder: Prev>|<:TalkFolkLore: Next>
25099 |====
25100
25101 <<<
25102
25103 :mlton-guide-page: TILT
25104 [[TILT]]
25105 TILT
25106 ====
25107
25108 http://www.cs.cornell.edu/home/jgm/tilt.html[TILT] is a
25109 <:StandardMLImplementations:Standard ML implementation>.
25110
25111 <<<
25112
25113 :mlton-guide-page: TipsForWritingConciseSML
25114 [[TipsForWritingConciseSML]]
25115 TipsForWritingConciseSML
25116 ========================
25117
25118 SML is a rich enough language that there are often several ways to
25119 express things.  This page contains miscellaneous tips (ideas not
25120 rules) for writing concise SML.  The metric that we are interested in
25121 here is the number of tokens or words (rather than the number of
25122 lines, for example).
25123
25124 == Datatypes in Signatures ==
25125
25126 A seemingly frequent source of repetition in SML is that of datatype
25127 definitions in signatures and structures.  Actually, it isn't
25128 repetition at all.  A datatype specification in a signature, such as,
25129
25130 [source,sml]
25131 ----
25132 signature EXP = sig
25133    datatype exp = Fn of id * exp | App of exp * exp | Var of id
25134 end
25135 ----
25136
25137 is just a specification of a datatype that may be matched by multiple
25138 (albeit identical) datatype declarations.  For example, in
25139
25140 [source,sml]
25141 ----
25142 structure AnExp : EXP = struct
25143    datatype exp = Fn of id * exp | App of exp * exp | Var of id
25144 end
25145
25146 structure AnotherExp : EXP = struct
25147    datatype exp = Fn of id * exp | App of exp * exp | Var of id
25148 end
25149 ----
25150
25151 the types `AnExp.exp` and `AnotherExp.exp` are two distinct types.  If
25152 such <:GenerativeDatatype:generativity> isn't desired or needed, you
25153 can avoid the repetition:
25154
25155 [source,sml]
25156 ----
25157 structure Exp = struct
25158    datatype exp = Fn of id * exp | App of exp * exp | Var of id
25159 end
25160
25161 signature EXP = sig
25162    datatype exp = datatype Exp.exp
25163 end
25164
25165 structure Exp : EXP = struct
25166    open Exp
25167 end
25168 ----
25169
25170 Keep in mind that this isn't semantically equivalent to the original.
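
The difference can be seen in a small sketch.  It assumes that `id` is
simply `string` and uses the hypothetical structure names `A` and `B`.
Because the signature replicates `Exp.exp`, every structure matching it
shares one and the same type, whereas the earlier `AnExp` and
`AnotherExp` each had a type of their own.

[source,sml]
----
(* Assumption: identifiers are plain strings. *)
type id = string

structure Exp = struct
   datatype exp = Fn of id * exp | App of exp * exp | Var of id
end

signature EXP = sig
   datatype exp = datatype Exp.exp
end

(* A and B are hypothetical names used only for this illustration. *)
structure A : EXP = struct open Exp end
structure B : EXP = struct open Exp end

(* A.exp and B.exp are both Exp.exp, so this binding type checks.
   With two separate datatype declarations it would be rejected. *)
val e : B.exp = A.Var "x"
----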
25171
25172
25173 == Clausal Function Definitions ==
25174
25175 The syntax of clausal function definitions is rather repetitive.  For
25176 example,
25177
25178 [source,sml]
25179 ----
25180 fun isSome NONE = false
25181   | isSome (SOME _) = true
25182 ----
25183
25184 is more verbose than
25185
25186 [source,sml]
25187 ----
25188 val isSome =
25189  fn NONE => false
25190   | SOME _ => true
25191 ----
25192
25193 For recursive functions the break-even point is one clause higher.  For example,
25194
25195 [source,sml]
25196 ----
25197 fun fib 0 = 0
25198   | fib 1 = 1
25199   | fib n = fib (n-1) + fib (n-2)
25200 ----
25201
25202 isn't less verbose than
25203
25204 [source,sml]
25205 ----
25206 val rec fib =
25207  fn 0 => 0
25208   | 1 => 1
25209   | n => fib (n-1) + fib (n-2)
25210 ----
25211
25212 It is quite often the case that a curried function primarily examines
25213 just one of its arguments.  Such functions can be written particularly
25214 concisely by making the examined argument last.  For example, instead
25215 of
25216
25217 [source,sml]
25218 ----
fun eval (Fn (v, b)) env = ...
  | eval (App (f, a)) env = ...
  | eval (Var v) env = ...
25222 ----
25223
25224 consider writing
25225
25226 [source,sml]
25227 ----
25228 fun eval env =
25229  fn Fn (v, b) => ...
25230   | App (f, a) => ...
25231   | Var v => ...
25232 ----
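
As a complete instance of the same shape, here is a minimal sketch of a
hypothetical function `occursFree`, which examines only its last
argument.  The definitions of `id` and `exp` are assumptions made so
that the example stands on its own.

[source,sml]
----
(* Assumptions: identifiers are strings; exp is as in the earlier examples. *)
type id = string
datatype exp = Fn of id * exp | App of exp * exp | Var of id

(* Does the variable x occur free in the expression?  The expression
   is the last argument, so it can be matched by a bare fn. *)
fun occursFree x =
 fn Fn (v, b) => v <> x andalso occursFree x b
  | App (f, a) => occursFree x f orelse occursFree x a
  | Var v => v = x

val yIsFree = occursFree "y" (Fn ("x", App (Var "x", Var "y")))
val xIsFree = occursFree "x" (Fn ("x", Var "x"))
----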
25233
25234
25235 == Parentheses ==
25236
25237 It is a good idea to avoid using lots of irritating superfluous
25238 parentheses.  An important rule to know is that prefix function
25239 application in SML has higher precedence than any infix operator.  For
25240 example, the outer parentheses in
25241
25242 [source,sml]
25243 ----
25244 (square (5 + 1)) + (square (5 * 2))
25245 ----
25246
25247 are superfluous.
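
A short check of the precedence rule, assuming the obvious definition of
`square`:

[source,sml]
----
(* Assumption: square is defined on integers. *)
fun square x = x * x

(* Same value as (square (5 + 1)) + (square (5 * 2)). *)
val sum = square (5 + 1) + square (5 * 2)

(* Application binds first, so this is (square 5) + 1, not square 6. *)
val notSquareOfSix = square 5 + 1
----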
25248
25249 People trained in other languages often use superfluous parentheses in
25250 a number of places.  In particular, the parentheses in the following
25251 examples are practically always superfluous and are best avoided:
25252
25253 [source,sml]
25254 ----
25255 if (condition) then ... else ...
25256 while (condition) do ...
25257 ----
25258
25259 The same basically applies to case expressions:
25260
25261 [source,sml]
25262 ----
25263 case (expression) of ...
25264 ----
25265
25266 It is not uncommon to match a tuple of two or more values:
25267
25268 [source,sml]
25269 ----
25270 case (a, b) of
25271    (A1, B1) => ...
25272  | (A2, B2) => ...
25273 ----
25274
25275 Such case expressions can be written more concisely with an
25276 <:ProductType:infix product constructor>:
25277
25278 [source,sml]
25279 ----
25280 case a & b of
25281    A1 & B1 => ...
25282  | A2 & B2 => ...
25283 ----
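
For a self-contained illustration, the following sketch declares the
product constructor locally; the declaration is assumed to match the one
on the <:ProductType:> page, and `addBoth` is a hypothetical function.

[source,sml]
----
(* Assumed to mirror the product type described under ProductType. *)
datatype ('a, 'b) product = & of 'a * 'b
infix &

(* Adds two optional integers; no tuple parentheses are needed. *)
fun addBoth (a, b) =
   case a & b of
      SOME x & SOME y => SOME (x + y)
    | _ => NONE

val three = addBoth (SOME 1, SOME 2)
val nothing = addBoth (NONE, SOME 2)
----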
25284
25285
25286 == Conditionals ==
25287
25288 Repeated sequences of conditionals such as
25289
25290 [source,sml]
25291 ----
25292 if x < y then ...
25293 else if x = y then ...
25294 else ...
25295 ----
25296
25297 can often be written more concisely as case expressions such as
25298
25299 [source,sml]
25300 ----
25301 case Int.compare (x, y) of
25302    LESS => ...
25303  | EQUAL => ...
25304  | GREATER => ...
25305 ----
25306
25307 For a custom comparison, you would then define an appropriate datatype
25308 and a reification function.  An alternative to using datatypes is to
25309 use dispatch functions
25310
25311 [source,sml]
25312 ----
25313 comparing (x, y)
25314 {lt = fn () => ...,
25315  eq = fn () => ...,
25316  gt = fn () => ...}
25317 ----
25318
25319 where
25320
25321 [source,sml]
25322 ----
25323 fun comparing (x, y) {lt, eq, gt} =
25324     (case Int.compare (x, y) of
25325         LESS => lt
25326       | EQUAL => eq
25327       | GREATER => gt) ()
25328 ----
25329
25330 An advantage is that no datatype definition is needed.  A disadvantage
25331 is that you can't combine multiple dispatch results easily.
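
For contrast, the datatype-based approach to a custom comparison might
look like the following sketch.  The datatype `position`, the
reification function of the same name, and `describe` are all
hypothetical names chosen for this example.

[source,sml]
----
(* Hypothetical three-way classification of a number against a range. *)
datatype position = BELOW | WITHIN | ABOVE

(* Reification function: turns a chain of tests into a value that a
   case expression can match on. *)
fun position {lo, hi} x =
   if x < lo then BELOW
   else if x > hi then ABOVE
   else WITHIN

fun describe x =
   case position {lo = 0, hi = 9} x of
      BELOW => "negative"
    | WITHIN => "digit"
    | ABOVE => "large"
----

Because the result is an ordinary value, several such results can be
combined, for example by matching on a tuple of them.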
25332
25333
25334 == Command-Query Fusion ==
25335
25336 Many are familiar with the
25337 http://en.wikipedia.org/wiki/Command-Query_Separation[Command-Query
25338 Separation Principle].  Adhering to the principle, a signature for an
25339 imperative stack might contain specifications
25340
25341 [source,sml]
25342 ----
25343 val isEmpty : 'a t -> bool
25344 val pop : 'a t -> 'a
25345 ----
25346
25347 and use of a stack would look like
25348
25349 [source,sml]
25350 ----
25351 if isEmpty stack
then ...
else ... pop stack ...
25354 ----
25355
25356 or, when the element needs to be named,
25357
25358 [source,sml]
25359 ----
25360 if isEmpty stack
then ...
else let val elem = pop stack in ... end
25363 ----
25364
25365 For efficiency, correctness, and conciseness, it is often better to
25366 combine the query and command and return the result as an option:
25367
25368 [source,sml]
25369 ----
25370 val pop : 'a t -> 'a option
25371 ----
25372
25373 A use of a stack would then look like this:
25374
25375 [source,sml]
25376 ----
25377 case pop stack of
25378    NONE => ...
25379  | SOME elem => ...
25380 ----
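
A minimal sketch of such a stack, assuming a list-in-a-ref
representation; the structure name `Stack` and the operations `new` and
`push` are assumptions added to make the example complete.

[source,sml]
----
structure Stack :> sig
   type 'a t
   val new : unit -> 'a t
   val push : 'a t -> 'a -> unit
   val pop : 'a t -> 'a option
end = struct
   (* Assumed representation: a mutable list with the top at the head. *)
   type 'a t = 'a list ref
   fun new () = ref []
   fun push s x = s := x :: !s
   (* Query and command fused: emptiness is reported by NONE. *)
   fun pop s =
      case !s of
         [] => NONE
       | x :: xs => (s := xs; SOME x)
end

val stack : int Stack.t = Stack.new ()
val () = Stack.push stack 1
val () = Stack.push stack 2

val top =
   case Stack.pop stack of
      NONE => "empty"
    | SOME elem => Int.toString elem
----

With this interface it is impossible to pop an empty stack by mistake,
and the stack is inspected only once per operation.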
25381
25382 <<<
25383
25384 :mlton-guide-page: ToMachine
25385 [[ToMachine]]
25386 ToMachine
25387 =========
25388
25389 <:ToMachine:> is a translation pass from the <:RSSA:>
25390 <:IntermediateLanguage:> to the <:Machine:> <:IntermediateLanguage:>.
25391
25392 == Description ==
25393
This pass converts an <:RSSA:> program into a <:Machine:> program.
25395
25396 It uses <:AllocateRegisters:>, <:Chunkify:>, and <:ParallelMove:>.
25397
25398 == Implementation ==
25399
25400 * <!ViewGitFile(mlton,master,mlton/backend/backend.sig)>
25401 * <!ViewGitFile(mlton,master,mlton/backend/backend.fun)>
25402
25403 == Details and Notes ==
25404
25405 Because the MLton runtime system is shared by all <:Codegen:codegens>, it is most
25406 convenient to decide on stack layout _before_ any <:Codegen:codegen> takes over.
25407 In particular, we compute all the stack frame info for each <:RSSA:>
25408 function, including stack size, <:GarbageCollection:garbage collector>
25409 masks for each frame, etc.  To do so, the <:Machine:>
25410 <:IntermediateLanguage:> imagines an abstract machine with an infinite
25411 number of (pseudo-)registers of every size.  A liveness analysis
25412 determines, for each variable, whether or not it is live across a
25413 point where the runtime system might take over (for example, any
25414 garbage collection point) or a non-tail call to another <:RSSA:>
25415 function.  Those that are live go on the stack, while those that
aren't live go into pseudo-registers.  From this information, we know
all we need to know about each stack frame.  On the downside, nothing
25418 further on is allowed to change this stack info; it is set in stone.
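
The placement rule can be pictured with a toy model.  This is only an
illustrative sketch, not MLton's implementation: the datatype
`location`, the function `assign`, and the numbering of slots and
registers are all assumptions, and the liveness information is taken as
given rather than computed.

[source,sml]
----
(* Toy model only; MLton's real data structures differ. *)
datatype location = Stack of int | PseudoRegister of int

(* vars pairs each variable name with a flag saying whether it is live
   across a point where the runtime system might take over or across a
   non-tail call.  Live variables get stack slots; the rest get
   pseudo-registers. *)
fun assign vars =
   let
      fun loop ([], _, _, acc) = rev acc
        | loop ((v, true) :: rest, s, r, acc) =
             loop (rest, s + 1, r, (v, Stack s) :: acc)
        | loop ((v, false) :: rest, s, r, acc) =
             loop (rest, s, r + 1, (v, PseudoRegister r) :: acc)
   in
      loop (vars, 0, 0, [])
   end

val placement = assign [("x", true), ("y", false), ("z", true)]
----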
25419
25420 <<<
25421
25422 :mlton-guide-page: TomMurphy
25423 [[TomMurphy]]
25424 TomMurphy
25425 =========
25426
Tom Murphy VII is a long-time MLton user and occasional contributor. He works on programming languages for his PhD work at Carnegie Mellon in Pittsburgh, USA. <:AdamGoode:> lives on the same floor of Wean Hall.
25428
25429 http://tom7.org[Home page]
25430
25431 <<<
25432
25433 :mlton-guide-page: ToRSSA
25434 [[ToRSSA]]
25435 ToRSSA
25436 ======
25437
25438 <:ToRSSA:> is a translation pass from the <:SSA2:>
25439 <:IntermediateLanguage:> to the <:RSSA:> <:IntermediateLanguage:>.
25440
25441 == Description ==
25442
This pass converts an <:SSA2:> program into an <:RSSA:> program.
25444
25445 It uses <:PackedRepresentation:>.
25446
25447 == Implementation ==
25448
25449 * <!ViewGitFile(mlton,master,mlton/backend/ssa-to-rssa.sig)>
25450 * <!ViewGitFile(mlton,master,mlton/backend/ssa-to-rssa.fun)>
25451
25452 == Details and Notes ==
25453
25454 {empty}
25455
25456 <<<
25457
25458 :mlton-guide-page: ToSSA2
25459 [[ToSSA2]]
25460 ToSSA2
25461 ======
25462
25463 <:ToSSA2:> is a translation pass from the <:SSA:>
25464 <:IntermediateLanguage:> to the <:SSA2:> <:IntermediateLanguage:>.
25465
25466 == Description ==
25467
This pass is a simple conversion from an <:SSA:> program into an
25469 <:SSA2:> program.
25470
25471 The only interesting portions of the translation are:
25472
25473 * an <:SSA:> `ref` type becomes an object with a single mutable field
25474 * `array`, `vector`, and `ref` are eliminated in favor of select and updates
25475 * `Case` transfers separate discrimination and constructor argument selects
25476
25477 == Implementation ==
25478
25479 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-to-ssa2.sig)>
25480 * <!ViewGitFile(mlton,master,mlton/ssa/ssa-to-ssa2.fun)>
25481
25482 == Details and Notes ==
25483
25484 {empty}
25485
25486 <<<
25487
25488 :mlton-guide-page: TypeChecking
25489 [[TypeChecking]]
25490 TypeChecking
25491 ============
25492
25493 MLton's type checker follows the <:DefinitionOfStandardML:Definition>
25494 closely, so you may find differences between MLton and other SML
25495 compilers that do not follow the Definition so closely.  In
25496 particular, SML/NJ has many deviations from the Definition -- please
25497 see <:SMLNJDeviations:> for those that we are aware of.
25498
25499 In some respects MLton's type checker is more powerful than other SML
25500 compilers, so there are programs that MLton accepts that are rejected
25501 by some other SML compilers.  These kinds of programs fall into a few
25502 simple categories.
25503
25504 * MLton resolves flexible record patterns using a larger context than
25505 many other SML compilers.  For example, MLton accepts the
25506 following.
25507 +
25508 [source,sml]
25509 ----
25510 fun f {x, ...} = x
25511 val _ = f {x = 13, y = "foo"}
25512 ----
25513
25514 * MLton uses as large a context as possible to resolve the type of
25515 variables constrained by the value restriction to be monotypes.  For
25516 example, MLton accepts the following.
25517 +
25518 [source,sml]
25519 ----
25520 structure S:
25521    sig
25522       val f: int -> int
25523    end =
25524    struct
25525       val f = (fn x => x) (fn y => y)
25526    end
25527 ----
25528
25529
25530 == Type error messages ==
25531
25532 To aid in the understanding of type errors, MLton's type checker
25533 displays type errors differently than other SML compilers.  In
25534 particular, when two types are different, it is important for the
25535 programmer to easily understand why they are different.  So, MLton
25536 displays only the differences between two types that don't match,
25537 using underscores for the parts that match.  For example, if a
25538 function expects `real * int` but gets `real * real`, the type error
25539 message would look like
25540
25541 ----
25542 expects: _ * [int]
25543 but got: _ * [real]
25544 ----
25545
25546 As another aid to spotting differences, MLton places brackets `[]`
25547 around the parts of the types that don't match.  A common situation is
25548 when a function receives a different number of arguments than it
25549 expects, in which case you might see an error like
25550
25551 ----
25552 expects: [int * real]
25553 but got: [int * real * string]
25554 ----
25555
25556 The brackets make it easy to see that the problem is that the tuples
25557 have different numbers of components -- not that the components don't
25558 match.  Contrast that with a case where a function receives the right
25559 number of arguments, but in the wrong order, in which case you might
25560 see an error like
25561
25562 ----
25563 expects: [int] * [real]
25564 but got: [real] * [int]
25565 ----
25566
25567 Here the brackets make it easy to see that the components do not match.
25568
25569 We appreciate feedback on any type error messages that you find
25570 confusing, or suggestions you may have for improvements to error
25571 messages.
25572
25573
25574 == The shortest/most-recent rule for type names ==
25575
25576 In a type error message, MLton often has a number of choices in
25577 deciding what name to use for a type.  For example, in the following
25578 type-incorrect program
25579
25580 [source,sml]
25581 ----
25582 type t = int
25583 fun f (x: t) = x
25584 val _ = f "foo"
25585 ----
25586
25587 MLton reports the error message
25588
25589 ----
25590 Error: z.sml 3.9-3.15.
25591   Function applied to incorrect argument.
25592     expects: [t]
25593     but got: [string]
25594     in: f "foo"
25595 ----
25596
25597 MLton could have reported `expects: [int]` instead of `expects: [t]`.
25598 However, MLton uses the shortest/most-recent rule in order to decide
25599 what type name to display.  This rule means that, at the point of the
25600 error, MLton first looks for the shortest name for a type in terms of
25601 number of structure identifiers (e.g. `foobar` is shorter than `A.t`).
25602 Next, if there are multiple names of the same length, then MLton uses
25603 the most recently defined name.  It is this tiebreaker that causes
25604 MLton to prefer `t` to `int` in the above example.
25605
25606 In signature matching, most recently defined is not taken to include
25607 all of the definitions introduced by the structure (since the matching
25608 takes place outside the structure and before it is defined).  For
25609 example, in the following type-incorrect program
25610
25611 [source,sml]
25612 ----
25613 structure S:
25614    sig
25615       val x: int
25616    end =
25617    struct
25618       type t = int
25619       val x = "foo"
25620    end
25621 ----
25622
25623 MLton reports the error message
25624
25625 ----
25626 Error: z.sml 2.4-4.6.
25627   Variable in structure disagrees with signature (type): x.
25628     structure: val x: [string]
25629     defn at: z.sml 7.11-7.11
25630     signature: val x: [int]
25631     spec at: z.sml 3.11-3.11
25632 ----
25633
25634 If there is a type that only exists inside the structure being
25635 matched, then the prefix `_str.` is used.  For example, in the
25636 following type-incorrect program
25637
25638 [source,sml]
25639 ----
25640 structure S:
25641    sig
25642       val x: int
25643    end =
25644    struct
25645       datatype t = T
25646       val x = T
25647    end
25648 ----
25649
25650 MLton reports the error message
25651
25652 ----
25653 Error: z.sml 2.4-4.6.
25654   Variable in structure disagrees with signature (type): x.
25655     structure: val x: [_str.t]
25656     defn at: z.sml 7.11-7.11
25657     signature: val x: [int]
25658     spec at: z.sml 3.11-3.11
25659 ----
25660
25661 in which the `[_str.t]` refers to the type defined in the structure.
25662
25663 <<<
25664
25665 :mlton-guide-page: TypeConstructor
25666 [[TypeConstructor]]
25667 TypeConstructor
25668 ===============
25669
25670 In <:StandardML:Standard ML>, a type constructor is a function from
25671 types to types.  Type constructors can be _nullary_, meaning that
25672 they take no arguments, as in `char`, `int`, and `real`.
25673 Type constructors can be _unary_, meaning that they take one
25674 argument, as in `array`, `list`, and `vector`.  A program
25675 can define a new type constructor in two ways: a `type` definition
or a `datatype` declaration.  User-defined type constructors can
take any number of arguments.
25678
25679 [source,sml]
25680 ----
25681 datatype t = T of int * real            (* 0 arguments *)
25682 type 'a t = 'a * int                    (* 1 argument *)
25683 datatype ('a, 'b) t = A | B of 'a * 'b  (* 2 arguments *)
25684 type ('a, 'b, 'c) t = 'a * ('b  -> 'c)  (* 3 arguments *)
25685 ----
25686
25687 Here are the syntax rules for type constructor application.
25688
25689  * Type constructor application is written in postfix.  So, one writes
25690  `int list`, not `list int`.
25691
25692  * Unary type constructors drop the parens, so one writes
25693  `int list`, not `(int) list`.
25694
25695  * Nullary type constructors drop the argument entirely, so one writes
25696  `int`, not `() int`.
25697
25698  * N-ary type constructors use tuple notation; for example,
25699  `(int, real) t`.
25700
25701  * Type constructor application associates to the left.  So,
25702  `int ref list` is the same as `(int ref) list`.
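
The rules can be checked with a few annotated bindings.  The type
constructors `pair` and `assoc` are hypothetical and are defined here
only for the example.

[source,sml]
----
(* Hypothetical user-defined type constructors. *)
type 'a pair = 'a * 'a                  (* unary *)
type ('a, 'b) assoc = ('a * 'b) list    (* binary *)

val xs : int list = [1, 2, 3]                  (* postfix, no parens *)
val ps : int pair list = [(1, 2), (3, 4)]      (* (int pair) list *)
val env : (string, int) assoc = [("x", 1)]     (* tuple notation *)
val cells : int ref list = [ref 0]             (* (int ref) list *)
----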
25703
25704 <<<
25705
25706 :mlton-guide-page: TypeIndexedValues
25707 [[TypeIndexedValues]]
25708 TypeIndexedValues
25709 =================
25710
25711 <:StandardML:Standard ML> does not support ad hoc polymorphism.  This
25712 presents a challenge to programmers.  The problem is that at first
25713 glance there seems to be no practical way to implement something like
25714 a function for converting a value of any type to a string or a
25715 function for computing a hash value for a value of any type.
25716 Fortunately there are ways to implement type-indexed values in SML as
25717 discussed in <!Cite(Yang98)>.  Various articles such as
25718 <!Cite(Danvy98)>, <!Cite(Ramsey11)>, <!Cite(Elsman04)>,
25719 <!Cite(Kennedy04)>, and <!Cite(Benton05)> also contain examples of
25720 type-indexed values.
25721
*NOTE:* The following example uses an early (and somewhat broken)
variation of the basic technique used in an experimental generic
programming library (see
<!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>) that can
be found in the MLton repository.  The generic programming library
25727 also includes a more advanced generic pretty printing function (see
25728 <!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/public/value/pretty.sig)>).
25729
25730 == Example: Converting any SML value to (roughly) SML syntax ==
25731
25732 Consider the problem of converting any SML value to a textual
25733 presentation that matches the syntax of SML as closely as possible.
25734 One solution is a type-indexed function that maps a given type to a
25735 function that maps any value (of the type) to its textual
25736 presentation.  A type-indexed function like this can be useful for a
25737 variety of purposes.  For example, one could use it to show debugging
25738 information.  We'll call this function "`show`".
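
To make the idea concrete before the full design, here is a
deliberately minimal sketch in which a type-index for `'a` is nothing
more than a printer for `'a`.  The structure `MiniShow` and its
combinator `pair` are assumptions made for this illustration; the
sketch is far simpler than the `SHOW` signature developed on this page
and handles neither records, datatypes, nor recursion.

[source,sml]
----
structure MiniShow = struct
   (* Assumption: a type-index for 'a is just a function 'a -> string. *)
   type 'a t = 'a -> string

   fun show (t : 'a t) : 'a -> string = t

   (* Base type-indices. *)
   val int : int t = Int.toString
   val bool : bool t = Bool.toString

   (* Type-index combinators build indices for compound types. *)
   fun list (t : 'a t) : 'a list t =
      fn xs => "[" ^ String.concatWith ", " (map t xs) ^ "]"

   fun pair (a : 'a t, b : 'b t) : ('a * 'b) t =
      fn (x, y) => "(" ^ a x ^ ", " ^ b y ^ ")"
end

val shown =
    let open MiniShow in show (list (pair (int, bool))) end
       [(1, true), (2, false)]
----

The type-index expression `list (pair (int, bool))` mirrors the type
`(int * bool) list` of the value being shown.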
25739
25740 We'll do a fairly complete implementation of `show`.  We do not
25741 distinguish infix and nonfix constructors, but that is not an
25742 intrinsic property of SML datatypes.  We also don't reconstruct a type
25743 name for the value, although it would be particularly useful for
25744 functional values.  To reconstruct type names, some changes would be
25745 needed and the reader is encouraged to consider how to do that.  A
25746 more realistic implementation would use some pretty printing
25747 combinators to compute a layout for the result.  This should be a
25748 relatively easy change (given a suitable pretty printing library).
25749 Cyclic values (through references and arrays) do not have a standard
25750 textual presentation and it is impossible to convert arbitrary
25751 functional values (within SML) to a meaningful textual presentation.
25752 Finally, it would also make sense to show sharing of references and
25753 arrays.  We'll leave these improvements to an actual library
25754 implementation.
25755
25756 The following code uses the <:Fixpoints:fixpoint framework> and other
25757 utilities from an Extended Basis library (see
25758 <!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>).
25759
25760 === Signature ===
25761
25762 Let's consider the design of the `SHOW` signature:
25763 [source,sml]
25764 ----
25765 infixr -->
25766
25767 signature SHOW = sig
25768    type 'a t       (* complete type-index *)
25769    type 'a s       (* incomplete sum *)
25770    type ('a, 'k) p (* incomplete product *)
25771    type u          (* tuple or unlabelled product *)
25772    type l          (* record or labelled product *)
25773
25774    val show : 'a t -> 'a -> string
25775
25776    (* user-defined types *)
25777    val inj : ('a -> 'b) -> 'b t -> 'a t
25778
25779    (* tuples and records *)
25780    val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
25781
25782    val U :           'a t -> ('a, u) p
25783    val L : string -> 'a t -> ('a, l) p
25784
25785    val tuple  : ('a, u) p -> 'a t
25786    val record : ('a, l) p -> 'a t
25787
25788    (* datatypes *)
25789    val + : 'a s * 'b s -> (('a, 'b) sum) s
25790
25791    val C0 : string -> unit s
25792    val C1 : string -> 'a t -> 'a s
25793
25794    val data : 'a s -> 'a t
25795
25796    val Y : 'a t Tie.t
25797
25798    (* exceptions *)
25799    val exn : exn t
25800    val regExn : (exn -> ('a * 'a s) option) -> unit
25801
25802    (* some built-in type constructors *)
25803    val refc : 'a t -> 'a ref t
25804    val array : 'a t -> 'a array t
25805    val list : 'a t -> 'a list t
25806    val vector : 'a t -> 'a vector t
25807    val --> : 'a t * 'b t -> ('a -> 'b) t
25808
25809    (* some built-in base types *)
25810    val string : string t
25811    val unit : unit t
25812    val bool : bool t
25813    val char : char t
25814    val int : int t
25815    val word : word t
25816    val real : real t
25817 end
25818 ----
25819
25820 While some details are shaped by the specific requirements of `show`,
25821 there are a number of (design) patterns that translate to other
25822 type-indexed values.  The former kind of details are mostly shaped by
25823 the syntax of SML values that `show` is designed to produce.  To this
25824 end, abstract types and phantom types are used to distinguish
25825 incomplete record, tuple, and datatype type-indices from each other
25826 and from complete type-indices.  Also, names of record labels and
25827 datatype constructors need to be provided by the user.
25828
25829 ==== Arbitrary user-defined datatypes ====
25830
25831 Perhaps the most important pattern is how the design supports
25832 arbitrary user-defined datatypes.  A number of combinators together
25833 conspire to provide the functionality.  First of all, to support new
25834 user-defined types, a combinator taking a conversion function to a
25835 previously supported type is provided:
25836 [source,sml]
25837 ----
25838 val inj : ('a -> 'b) -> 'b t -> 'a t
25839 ----
25840
25841 An injection function is sufficient in this case, but in the general
25842 case, an embedding with injection and projection functions may be
25843 needed.
25844
25845 To support products (tuples and records) a product combinator is
25846 provided:
25847 [source,sml]
25848 ----
25849 val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
25850 ----
25851 The second (phantom) type variable `'k` is there to distinguish
25852 between labelled and unlabelled products and the type `p`
25853 distinguishes incomplete products from complete type-indices of type
25854 `t`.  Most type-indexed values do not need to make such distinctions.
25855
25856 To support sums (datatypes) a sum combinator is provided:
25857 [source,sml]
25858 ----
25859 val + : 'a s * 'b s -> (('a, 'b) sum) s
25860 ----
25861 Again, the purpose of the type `s` is to distinguish incomplete sums
25862 from complete type-indices of type `t`, which usually isn't necessary.
25863
25864 Finally, to support recursive datatypes, including sets of mutually
25865 recursive datatypes, a <:Fixpoints:fixpoint tier> is provided:
25866 [source,sml]
25867 ----
25868 val Y : 'a t Tie.t
25869 ----
25870
25871 Together these combinators (with the more domain specific combinators
25872 `U`, `L`, `tuple`, `record`, `C0`, `C1`, and `data`) enable one to
25873 encode a type-index for any user-defined datatype.
25874
25875 ==== Exceptions ====
25876
25877 The `exn` type in SML is a <:UniversalType:universal type> into which
25878 all types can be embedded.  SML also allows a program to generate new
25879 exception variants at run-time.  Thus a mechanism is required to register
25880 handlers for particular variants:
25881 [source,sml]
25882 ----
25883 val exn : exn t
25884 val regExn : (exn -> ('a * 'a s) option) -> unit
25885 ----
25886
25887 The universal `exn` type-index then makes use of the registered
25888 handlers.  The above particular form of handler, which converts an
25889 exception value to a value of some type and a type-index for that type
25890 (essentially an existential type) is designed to make it convenient to
25891 write handlers.  To write a handler, one can conveniently reuse
25892 existing type-indices:
25893 [source,sml]
25894 ----
25895 exception Int of int
25896
25897 local
25898    open Show
25899 in
25900    val () = regExn (fn Int v => SOME (v, C1"Int" int)
25901                      | _     => NONE)
25902 end
25903 ----
25904
25905 Note that a single handler may actually handle an arbitrary number of
25906 different exceptions.
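
The registration mechanism itself can be sketched independently of
`Show`.  The following simplified model is an assumption made for
illustration: handlers return a string directly rather than a value
paired with its type-index, and the names `handlers`, `showExn`, and
the exception `Pair` are hypothetical.

[source,sml]
----
(* Simplified: handlers produce the text directly instead of a value
   together with a type-index for it. *)
val handlers : (exn -> string option) list ref = ref []

fun regExn h = handlers := h :: !handlers

(* Ask the registered handlers; fall back to the exception's name. *)
fun showExn e =
   case List.mapPartial (fn h => h e) (!handlers) of
      s :: _ => s
    | [] => "<unregistered " ^ exnName e ^ ">"

exception Int of int
exception Pair of int * int

(* A single handler covering two different exceptions. *)
val () = regExn (fn Int v => SOME ("Int " ^ Int.toString v)
                  | Pair (a, b) =>
                       SOME ("Pair (" ^ Int.toString a ^ ", "
                             ^ Int.toString b ^ ")")
                  | _ => NONE)

val shownInt = showExn (Int 5)
val shownPair = showExn (Pair (1, 2))
----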
25907
25908 ==== Other types ====
25909
25910 Some built-in and standard types typically require special treatment
25911 due to their special nature.  The most important of these are arrays
25912 and references, because cyclic data (ignoring closures) and observable
25913 sharing can only be constructed through them.
25914
25915 When arrow types are really supported, unlike in this case, they
25916 usually need special treatment due to the contravariance of arguments.
25917
25918 Lists and vectors require special treatment in the case of `show`,
25919 because of their special syntax.  This isn't usually the case.
25920
25921 The set of base types to support also needs to be considered unless
25922 one exports an interface for constructing type-indices for entirely
25923 new base types.
25924
25925 == Usage ==
25926
25927 Before going to the implementation, let's look at some examples.  For
25928 the following examples, we'll assume a structure binding
25929 `Show :> SHOW`.  If you want to try the examples immediately, just
25930 skip forward to the implementation.
25931
25932 To use `show`, one first needs a type-index, which is then given to
25933 `show`.  To show a list of integers, one would use the type-index
25934 `list int`, which has the type `int list Show.t`:
25935 [source,sml]
25936 ----
25937 val "[3, 1, 4]" =
25938     let open Show in show (list int) end
25939        [3, 1, 4]
25940 ----
25941
25942 Likewise, to show a list of lists of characters, one would use the
25943 type-index `list (list char)`, which has the type `char list list
25944 Show.t`:
25945 [source,sml]
25946 ----
25947 val "[[#\"a\", #\"b\", #\"c\"], []]" =
25948     let open Show in show (list (list char)) end
25949        [[#"a", #"b", #"c"], []]
25950 ----
25951
25952 Handling standard types is not particularly interesting.  It is more
25953 interesting to see how user-defined types can be handled.  Although
25954 the `option` datatype is a standard type, it requires no special
25955 support, so we can treat it as a user-defined type.  Options can be
25956 encoded easily using a sum:
25957 [source,sml]
25958 ----
25959 fun option t = let
25960    open Show
25961 in
25962    inj (fn NONE => INL ()
25963          | SOME v => INR v)
25964        (data (C0"NONE" + C1"SOME" t))
25965 end
25966
25967 val "SOME 5" =
25968     let open Show in show (option int) end
25969        (SOME 5)
25970 ----
25971
25972 Readers new to type-indexed values might want to type annotate each
25973 subexpression of the above example as an exercise.  (Use a compiler to
25974 check your annotations.)
25975
Using a product, user-specified records can also be encoded easily:
25977 [source,sml]
25978 ----
25979 val abc = let
25980    open Show
25981 in
25982    inj (fn {a, b, c} => a & b & c)
25983        (record (L"a" (option int) *
25984                 L"b" real *
25985                 L"c" bool))
25986 end
25987
25988 val "{a = SOME 1, b = 3.0, c = false}" =
25989     let open Show in show abc end
25990        {a = SOME 1, b = 3.0, c = false}
25991 ----
25992
25993 As you can see, both of the above use `inj` to inject user-defined
25994 types to the general purpose sum and product types.
25995
25996 Of particular interest is whether recursive datatypes and cyclic data
25997 can be handled.  For example, how does one write a type-index for a
25998 recursive datatype such as a cyclic graph?
25999 [source,sml]
26000 ----
26001 datatype 'a graph = VTX of 'a * 'a graph list ref
26002 fun arcs (VTX (_, r)) = r
26003 ----
26004
26005 Using the `Show` combinators, we could first write a new type-index
26006 combinator for `graph`:
26007 [source,sml]
26008 ----
26009 fun graph a = let
26010    open Tie Show
26011 in
26012    fix Y (fn graph_a =>
26013              inj (fn VTX (x, y) => x & y)
26014                  (data (C1"VTX"
26015                           (tuple (U a *
26016                                   U (refc (list graph_a)))))))
26017 end
26018 ----
26019
26020 To show a graph with integer labels
26021 [source,sml]
26022 ----
26023 val a_graph = let
26024    val a = VTX (1, ref [])
26025    val b = VTX (2, ref [])
26026    val c = VTX (3, ref [])
26027    val d = VTX (4, ref [])
26028    val e = VTX (5, ref [])
26029    val f = VTX (6, ref [])
26030 in
26031    arcs a := [b, d]
26032  ; arcs b := [c, e]
26033  ; arcs c := [a, f]
26034  ; arcs d := [f]
26035  ; arcs e := [d]
26036  ; arcs f := [e]
26037  ; a
26038 end
26039 ----
26040 we could then simply write
26041 [source,sml]
26042 ----
26043 val "VTX (1, ref [VTX (2, ref [VTX (3, ref [VTX (1, %0), \
26044     \VTX (6, ref [VTX (5, ref [VTX (4, ref [VTX (6, %3)])])] as %3)]), \
26045     \VTX (5, ref [VTX (4, ref [VTX (6, ref [VTX (5, %2)])])] as %2)]), \
26046     \VTX (4, ref [VTX (6, ref [VTX (5, ref [VTX (4, %1)])])] as %1)] as %0)" =
26047     let open Show in show (graph int) end
26048        a_graph
26049 ----
26050
26051 There is a subtle gotcha with cyclic data.  Consider the following code:
26052 [source,sml]
26053 ----
26054 exception ExnArray of exn array
26055
26056 val () = let
26057    open Show
26058 in
26059    regExn (fn ExnArray a =>
26060               SOME (a, C1"ExnArray" (array exn))
26061             | _ => NONE)
26062 end
26063
26064 val a_cycle = let
26065    val a = Array.fromList [Empty]
26066 in
26067    Array.update (a, 0, ExnArray a) ; a
26068 end
26069 ----
26070
26071 Although the above looks innocent enough, the evaluation  of
26072 [source,sml]
26073 ----
26074 val "[|ExnArray %0|] as %0" =
26075     let open Show in show (array exn) end
26076        a_cycle
26077 ----
26078 goes into an infinite loop.  To avoid this problem, the type-index
26079 `array exn` must be evaluated only once, as in the following:
26080 [source,sml]
26081 ----
26082 val array_exn = let open Show in array exn end
26083
26084 exception ExnArray of exn array
26085
26086 val () = let
26087    open Show
26088 in
26089    regExn (fn ExnArray a =>
26090               SOME (a, C1"ExnArray" array_exn)
26091             | _ => NONE)
26092 end
26093
26094 val a_cycle = let
26095    val a = Array.fromList [Empty]
26096 in
26097    Array.update (a, 0, ExnArray a) ; a
26098 end
26099
26100 val "[|ExnArray %0|] as %0" =
26101     let open Show in show array_exn end
26102        a_cycle
26103 ----
26104
26105 Cyclic data (excluding closures) in Standard ML can only be
26106 constructed imperatively through arrays and references (combined with
26107 exceptions or recursive datatypes).  Before recursing to a reference
26108 or an array, one needs to check whether that reference or array has
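already been seen before.  When `refc` or `array` is called with a
type-index, a new cyclicity checker is instantiated.  In the looping
example above, `array exn` is re-evaluated on every call of the
registered handler, so each level of the traversal gets a fresh
checker that does not recognize arrays recorded by the others.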
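
The check itself can be sketched independently of `Show`.  The
following is a minimal illustration under that assumption, not the
implementation used on this page: the function `labels` (a
hypothetical name) walks the `graph` datatype from above and relies on
equality of `ref` cells to visit each vertex only once.

[source,sml]
----
datatype 'a graph = VTX of 'a * 'a graph list ref
fun arcs (VTX (_, r)) = r

(* Collect the labels reachable from a vertex in depth-first order.
 * `seen` holds the ref cells already visited; ref cells admit
 * equality regardless of their contents, so cycles are detected. *)
fun labels g = let
   fun visit (VTX (x, r), (seen, acc)) =
       if List.exists (fn r' => r' = r) seen
          then (seen, acc)
       else foldl visit (r :: seen, x :: acc) (!r)
   val (_, acc) = visit (g, ([], []))
in
   rev acc
end

(* A two-vertex cycle: a -> b -> a *)
val a = VTX (1, ref [])
val b = VTX (2, ref [a])
val () = arcs a := [b]

val [1, 2] = labels a
----

Without the `seen` list, the traversal of `a` would never terminate,
which is the same failure mode as the `ExnArray` example above.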
26111
26112 == Implementation ==
26113
26114 [source,sml]
26115 ----
26116 structure SmlSyntax = struct
26117    local
26118       structure CV = CharVector and C = Char
26119    in
26120       val isSym = Char.contains "!%&$#+-/:<=>?@\\~`^|*"
26121
26122       fun isSymId s = 0 < size s andalso CV.all isSym s
26123
26124       fun isAlphaNumId s =
26125           0 < size s
26126           andalso C.isAlpha (CV.sub (s, 0))
26127           andalso CV.all (fn c => C.isAlphaNum c
26128                                   orelse #"'" = c
26129                                   orelse #"_" = c) s
26130
26131       fun isNumLabel s =
26132           0 < size s
26133           andalso #"0" <> CV.sub (s, 0)
26134           andalso CV.all C.isDigit s
26135
26136       fun isId s = isAlphaNumId s orelse isSymId s
26137
26138       fun isLongId s = List.all isId (String.fields (#"." <\ op =) s)
26139
26140       fun isLabel s = isId s orelse isNumLabel s
26141    end
26142 end
26143
26144 structure Show :> SHOW = struct
26145    datatype 'a t = IN of exn list * 'a -> bool * string
26146    type 'a s = 'a t
26147    type ('a, 'k) p = 'a t
26148    type u = unit
26149    type l = unit
26150
26151    fun show (IN t) x = #2 (t ([], x))
26152
26153    (* user-defined types *)
26154    fun inj inj (IN b) = IN (b o Pair.map (id, inj))
26155
26156    local
26157       fun surround pre suf (_, s) = (false, concat [pre, s, suf])
26158       fun parenthesize x = if #1 x then surround "(" ")" x else x
26159       fun construct tag =
26160           (fn (_, s) => (true, concat [tag, " ", s])) o parenthesize
26161       fun check p m s = if p s then () else raise Fail (m^s)
26162    in
26163       (* tuples and records *)
26164       fun (IN l) * (IN r) =
26165           IN (fn (rs, a & b) =>
26166                  (false, concat [#2 (l (rs, a)),
26167                                  ", ",
26168                                  #2 (r (rs, b))]))
26169
26170       val U = id
26171       fun L l = (check SmlSyntax.isLabel "Invalid label: " l
26172                ; fn IN t => IN (surround (l^" = ") "" o t))
26173
26174       fun tuple (IN t) = IN (surround "(" ")" o t)
26175       fun record (IN t) = IN (surround "{" "}" o t)
26176
26177       (* datatypes *)
26178       fun (IN l) + (IN r) = IN (fn (rs, INL a) => l (rs, a)
26179                                  | (rs, INR b) => r (rs, b))
26180
26181       fun C0 c = (check SmlSyntax.isId "Invalid constructor: " c
26182                 ; IN (const (false, c)))
26183       fun C1 c (IN t) = (check SmlSyntax.isId "Invalid constructor: " c
26184                        ; IN (construct c o t))
26185
26186       val data = id
26187
26188       fun Y ? = Tie.iso Tie.function (fn IN x => x, IN) ?
26189
26190       (* exceptions *)
26191       local
26192          val handlers = ref ([] : (exn -> unit t option) list)
26193       in
26194          val exn = IN (fn (rs, e) => let
26195                              fun lp [] =
26196                                  C0(concat ["<exn:",
26197                                             General.exnName e,
26198                                             ">"])
26199                                | lp (f::fs) =
26200                                  case f e
26201                                   of NONE => lp fs
26202                                    | SOME t => t
26203                              val IN f = lp (!handlers)
26204                           in
26205                              f (rs, ())
26206                           end)
26207          fun regExn f =
26208              handlers := (Option.map
26209                              (fn (x, IN f) =>
26210                                  IN (fn (rs, ()) =>
26211                                         f (rs, x))) o f)
26212                          :: !handlers
26213       end
26214
26215       (* some built-in type constructors *)
26216       local
26217          fun cyclic (IN t) = let
26218             exception E of ''a * bool ref
26219          in
26220             IN (fn (rs, v : ''a) => let
26221                       val idx = Int.toString o length
26222                       fun lp (E (v', c)::rs) =
26223                           if v' <> v then lp rs
26224                           else (c := false ; (false, "%"^idx rs))
26225                         | lp (_::rs) = lp rs
26226                         | lp [] = let
26227                              val c = ref true
26228                              val r = t (E (v, c)::rs, v)
26229                           in
26230                              if !c then r
26231                              else surround "" (" as %"^idx rs) r
26232                           end
26233                    in
26234                       lp rs
26235                    end)
26236          end
26237
26238          fun aggregate pre suf toList (IN t) =
26239              IN (surround pre suf o
26240                  (fn (rs, a) =>
26241                      (false,
26242                       String.concatWith
26243                          ", "
26244                          (map (#2 o curry t rs)
26245                               (toList a)))))
26246       in
26247          fun refc ? = (cyclic o inj ! o C1"ref") ?
26248          fun array ? = (cyclic o aggregate "[|" "|]" (Array.foldr op:: [])) ?
26249          fun list ? = aggregate "[" "]" id ?
26250          fun vector ? = aggregate "#[" "]" (Vector.foldr op:: []) ?
26251       end
26252
26253       fun (IN _) --> (IN _) = IN (const (false, "<fn>"))
26254
26255       (* some built-in base types *)
26256       local
26257          fun mk toS = (fn x => (false, x)) o toS o (fn (_, x) => x)
26258       in
26259          val string =
26260              IN (surround "\"" "\"" o mk (String.translate Char.toString))
26261          val unit = IN (mk (fn () => "()"))
26262          val bool = IN (mk Bool.toString)
26263          val char = IN (surround "#\"" "\"" o mk Char.toString)
26264          val int = IN (mk Int.toString)
26265          val word = IN (surround "0wx" "" o mk Word.toString)
26266          val real = IN (mk Real.toString)
26267       end
26268    end
26269 end
26270
26271 (* Handlers for standard top-level exceptions *)
26272 val () = let
26273    open Show
26274    fun E0 name = SOME ((), C0 name)
26275 in
26276    regExn (fn Bind => E0"Bind"
26277             | Chr => E0"Chr"
26278             | Div => E0"Div"
26279             | Domain => E0"Domain"
26280             | Empty => E0"Empty"
26281             | Match => E0"Match"
26282             | Option => E0"Option"
26283             | Overflow  => E0"Overflow"
26284             | Size => E0"Size"
26285             | Span => E0"Span"
26286             | Subscript => E0"Subscript"
26287             | _ => NONE)
26288  ; regExn (fn Fail s => SOME (s, C1"Fail" string)
26289             | _ => NONE)
26290 end
26291 ----
26292
26293
26294 == Also see ==
26295
26296 There are a number of related techniques.  Here are some of them.
26297
26298 * <:Fold:>
26299 * <:StaticSum:>
26300
26301 <<<
26302
26303 :mlton-guide-page: TypeVariableScope
26304 [[TypeVariableScope]]
26305 TypeVariableScope
26306 =================
26307
26308 In <:StandardML:Standard ML>, every type variable is _scoped_ (or
26309 bound) at a particular point in the program.  A type variable can be
26310 either implicitly scoped or explicitly scoped.  For example, `'a` is
26311 implicitly scoped in
26312
26313 [source,sml]
26314 ----
26315 val id: 'a -> 'a = fn x => x
26316 ----
26317
26318 and is implicitly scoped in
26319
26320 [source,sml]
26321 ----
26322 val id = fn x: 'a => x
26323 ----
26324
26325 On the other hand, `'a` is explicitly scoped in
26326
26327 [source,sml]
26328 ----
26329 val 'a id: 'a -> 'a = fn x => x
26330 ----
26331
26332 and is explicitly scoped in
26333
26334 [source,sml]
26335 ----
26336 val 'a id = fn x: 'a => x
26337 ----
26338
26339 A type variable can be scoped at a `val` or `fun` declaration.  An SML
26340 type checker performs scope inference on each top-level declaration to
26341 determine the scope of each implicitly scoped type variable.  After
26342 scope inference, every type variable is scoped at exactly one
26343 enclosing `val` or `fun` declaration.  Scope inference shows that the
first and second examples above are equivalent to the third and fourth
examples, respectively.
26346
26347 Section 4.6 of the <:DefinitionOfStandardML:Definition> specifies
26348 precisely the scope of an implicitly scoped type variable.  A free
26349 occurrence of a type variable `'a` in a declaration `d` is said to be
26350 _unguarded_ in `d` if `'a` is not part of a smaller declaration.  A
26351 type variable `'a` is implicitly scoped at `d` if `'a` is unguarded in
26352 `d` and `'a` does not occur unguarded in any declaration containing
26353 `d`.
26354
26355
26356 == Scope inference examples ==
26357
26358 * In this example,
26359 +
26360 [source,sml]
26361 ----
26362 val id: 'a -> 'a = fn x => x
26363 ----
26364 +
26365 `'a` is unguarded in `val id` and does not occur unguarded in any
26366 containing declaration.  Hence, `'a` is scoped at `val id` and the
26367 declaration is equivalent to the following.
26368 +
26369 [source,sml]
26370 ----
26371 val 'a id: 'a -> 'a = fn x => x
26372 ----
26373
26374 * In this example,
26375 +
26376 [source,sml]
26377 ----
val f = fn x => let exception E of 'a in E x end
26379 ----
26380 +
26381 `'a` is unguarded in `val f` and does not occur unguarded in any
26382 containing declaration.  Hence, `'a` is scoped at `val f` and the
26383 declaration is equivalent to the following.
26384 +
26385 [source,sml]
26386 ----
26387 val 'a f = fn x => let exception E of 'a in E x end
26388 ----
26389
26390 * In this example (taken from the <:DefinitionOfStandardML:Definition>),
26391 +
26392 [source,sml]
26393 ----
26394 val x: int -> int = let val id: 'a -> 'a = fn z => z in id id end
26395 ----
26396 +
26397 `'a` occurs unguarded in `val id`, but not in `val x`.  Hence, `'a` is
26398 implicitly scoped at `val id`, and the declaration is equivalent to
26399 the following.
26400 +
26401 [source,sml]
26402 ----
26403 val x: int -> int = let val 'a id: 'a -> 'a = fn z => z in id id end
26404 ----
26405
26406
26407 * In this example,
26408 +
26409 [source,sml]
26410 ----
26411 val f = (fn x: 'a => x) (fn y => y)
26412 ----
26413 +
26414 `'a` occurs unguarded in `val f` and does not occur unguarded in any
26415 containing declaration.  Hence, `'a` is implicitly scoped at `val f`,
26416 and the declaration is equivalent to the following.
26417 +
26418 [source,sml]
26419 ----
26420 val 'a f = (fn x: 'a => x) (fn y => y)
26421 ----
26422 +
26423 This does not type check due to the <:ValueRestriction:>.
26424
26425 * In this example,
26426 +
26427 [source,sml]
26428 ----
26429 fun f x =
26430   let
26431     fun g (y: 'a) = if true then x else y
26432   in
26433     g x
26434   end
26435 ----
26436 +
26437 `'a` occurs unguarded in `fun g`, not in `fun f`.  Hence, `'a` is
26438 implicitly scoped at `fun g`, and the declaration is equivalent to
26439 +
26440 [source,sml]
26441 ----
26442 fun f x =
26443   let
26444     fun 'a g (y: 'a) = if true then x else y
26445   in
26446     g x
26447   end
26448 ----
26449 +
26450 This fails to type check because `x` and `y` must have the same type,
but `x` occurs outside the scope of the type variable `'a`.  MLton
26452 reports the following error.
26453 +
26454 ----
26455 Error: z.sml 3.21-3.41.
26456   Then and else branches disagree.
26457     then: [???]
26458     else: ['a]
26459     in: if true then x else y
26460     note: type would escape its scope: 'a
26461     escape to: z.sml 1.1-6.5
26462 ----
26463 +
26464 This problem could be fixed either by adding an explicit type
26465 constraint, as in `fun f (x: 'a)`, or by explicitly scoping `'a`, as
26466 in `fun 'a f x = ...`.
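
The two fixes suggested for the last example can be sketched as
follows.  The names `f1` and `f2` are used here only to keep both
variants in one file; in each case `'a` ends up scoped at the outer
declaration, so `x` and `y` may share it.

[source,sml]
----
(* Fix 1: constrain x, so that 'a occurs unguarded in the outer
 * declaration and is implicitly scoped there. *)
fun f1 (x: 'a) =
  let
    fun g (y: 'a) = if true then x else y
  in
    g x
  end

(* Fix 2: explicitly scope 'a at the outer declaration. *)
fun 'a f2 x =
  let
    fun g (y: 'a) = if true then x else y
  in
    g x
  end
----

Both functions should be given the type `'a -> 'a`; the inner `g` is
no longer polymorphic in `'a`.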
26467
26468
26469 == Restrictions on type variable scope ==
26470
26471 It is not allowed to scope a type variable within a declaration in
26472 which it is already in scope (see the last restriction listed on page
26473 9 of the <:DefinitionOfStandardML:Definition>).  For example, the
26474 following program is invalid.
26475
26476 [source,sml]
26477 ----
26478 fun 'a f (x: 'a) =
26479    let
26480       fun 'a g (y: 'a) = y
26481    in
26482       ()
26483    end
26484 ----
26485
26486 MLton reports the following error.
26487
26488 ----
26489 Error: z.sml 3.11-3.12.
26490   Type variable scoped at an outer declaration: 'a.
26491     scoped at: z.sml 1.1-6.6
26492 ----
26493
26494 This is an error even if the scoping is implicit.  That is, the
26495 following program is invalid as well.
26496
26497 [source,sml]
26498 ----
26499 fun f (x: 'a) =
26500    let
26501       fun 'a g (y: 'a) = y
26502    in
26503       ()
26504    end
26505 ----
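
One way to repair both programs, sketched here as an illustration, is
to give the inner declaration a type variable of its own.  The name
`'b` is arbitrary, and the body applies `g` only to show that the
result type checks.

[source,sml]
----
fun 'a f (x: 'a) =
   let
      (* 'b is scoped at g and is distinct from the outer 'a *)
      fun 'b g (y: 'b) = y
   in
      g x
   end
----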
26506
26507 <<<
26508
26509 :mlton-guide-page: Unicode
26510 [[Unicode]]
26511 Unicode
26512 =======
26513
26514 == Support in The Definition of Standard ML ==
26515
26516 There is no real support for Unicode in the
26517 <:DefinitionOfStandardML:Definition>; there are only a few throw-away
26518 sentences along the lines of "the characters with numbers 0 to 127
26519 coincide with the ASCII character set."
26520
26521 == Support in The Standard ML Basis Library ==
26522
26523 Neither is there real support for Unicode in the <:BasisLibrary:Basis
26524 Library>.  The general consensus (which includes the opinions of the
26525 editors of the Basis Library) is that the `WideChar` and `WideString`
26526 structures are insufficient for the purposes of Unicode.  There is no
26527 `LargeChar` structure, which in itself is a deficiency, since a
programmer cannot program against the largest supported character
26529 size.
26530
26531 == Current Support in MLton ==
26532
26533 MLton, as a minor extension over the Definition, supports UTF-8 byte
26534 sequences in text constants.  This feature enables "UTF-8 convenience"
26535 (but not comprehensive Unicode support); in particular, it allows one
26536 to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text.  See the
26539 <:SuccessorML#ExtendedTextConsts:extended text constants feature of
26540 Successor ML> for more details.
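
The difference between "UTF-8 convenience" and comprehensive Unicode
support can be illustrated with a small sketch.  It assumes nothing
beyond the Basis Library: the string is written with decimal escapes
rather than pasted text, and the helper names `isContinuation` and
`codePoints` are hypothetical.  A `string` holds bytes, so `size`
counts bytes rather than code points.

[source,sml]
----
(* The UTF-8 encoding of U+03BB (Greek small letter lambda),
 * written as two byte escapes. *)
val lambda = "\206\187"

(* A UTF-8 continuation byte has the bit pattern 10xxxxxx. *)
fun isContinuation c =
   Word8.andb (Byte.charToByte c, 0wxC0) = 0wx80

(* Count code points by counting the non-continuation bytes.
 * This assumes that the input is well-formed UTF-8. *)
fun codePoints s =
   CharVector.foldl
      (fn (c, n) => if isContinuation c then n else n + 1)
      0 s

val 2 = size lambda
val 1 = codePoints lambda
----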
26541
26542 MLton, also as a minor extension over the Definition, supports
26543 `\Uxxxxxxxx` numeric escapes in text constants and has preliminary
26544 internal support for 16- and 32-bit characters and strings.
26545
26546 MLton provides `WideChar` and `WideString` structures, corresponding
26547 to 32-bit characters and strings, respectively.
26548
26549 == Questions and Discussions ==
26550
26551 There are periodic flurries of questions and discussion about Unicode
26552 in MLton/SML.  In December 2004, there was a discussion that led to
26553 some seemingly sound design decisions.  The discussion started at:
26554
26555  * http://www.mlton.org/pipermail/mlton/2004-December/026396.html
26556
26557 There is a good summary of points at:
26558
26559  * http://www.mlton.org/pipermail/mlton/2004-December/026440.html
26560
26561 In November 2005, there was a followup discussion and the beginning of
26562 some coding.
26563
26564  * http://www.mlton.org/pipermail/mlton/2005-November/028300.html
26565
26566 == Also see ==
26567
26568 The <:fxp:> XML parser has some support for dealing with Unicode
26569 documents.
26570
26571 <<<
26572
26573 :mlton-guide-page: UniversalType
26574 [[UniversalType]]
26575 UniversalType
26576 =============
26577
26578 A universal type is a type into which all other types can be embedded.
26579 Here's a <:StandardML:Standard ML> signature for a universal type.
26580
26581 [source,sml]
26582 ----
26583 signature UNIVERSAL_TYPE =
26584    sig
26585       type t
26586
26587       val embed: unit -> ('a -> t) * (t -> 'a option)
26588    end
26589 ----
26590
26591 The idea is that `type t` is the universal type and that each call to
26592 `embed` returns a new pair of functions `(inject, project)`, where
26593 `inject` embeds a value into the universal type and `project` extracts
26594 the value from the universal type.  A pair `(inject, project)`
26595 returned by `embed` works together in that `project u` will return
26596 `SOME v` if and only if `u` was created by `inject v`.  If `u` was
26597 created by a different function `inject'`, then `project` returns
26598 `NONE`.
26599
26600 Here's an example embedding integers and reals into a universal type.
26601
26602 [source,sml]
26603 ----
26604 functor Test (U: UNIVERSAL_TYPE): sig end =
26605    struct
26606       val (intIn: int -> U.t, intOut) = U.embed ()
26607       val r: U.t ref = ref (intIn 13)
26608       val s1 =
26609          case intOut (!r) of
26610             NONE => "NONE"
26611           | SOME i => Int.toString i
26612       val (realIn: real -> U.t, realOut) = U.embed ()
26613       val () = r := realIn 13.0
26614       val s2 =
26615          case intOut (!r) of
26616             NONE => "NONE"
26617           | SOME i => Int.toString i
26618       val s3 =
26619          case realOut (!r) of
26620             NONE => "NONE"
26621           | SOME x => Real.toString x
26622       val () = print (concat [s1, " ", s2, " ", s3, "\n"])
26623    end
26624 ----
26625
26626 Applying `Test` to an appropriate implementation will print
26627
26628 ----
26629 13 NONE 13.0
26630 ----
26631
26632 Note that two different calls to embed on the same type return
26633 different embeddings.
26634
26635 Standard ML does not have explicit support for universal types;
26636 however, there are at least two ways to implement them.
26637
26638
26639 == Implementation Using Exceptions ==
26640
26641 While the intended use of SML exceptions is for exception handling, an
26642 accidental feature of their design is that the `exn` type is a
26643 universal type.  The implementation relies on being able to declare
26644 exceptions locally to a function and on the fact that exceptions are
26645 <:GenerativeException:generative>.
26646
26647 [source,sml]
26648 ----
26649 structure U:> UNIVERSAL_TYPE =
26650    struct
26651       type t = exn
26652
26653       fun 'a embed () =
26654          let
26655             exception E of 'a
26656             fun project (e: t): 'a option =
26657                case e of
26658                   E a => SOME a
26659                 | _ => NONE
26660          in
26661             (E, project)
26662          end
26663    end
26664 ----
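
The following sketch shows the generativity at work.  It repeats the
implementation above under the hypothetical name `ExnU`, without the
opaque signature ascription, so that it can be compiled on its own.
Two calls to `embed` at the same type yield embeddings that do not
recognize each other's values.

[source,sml]
----
structure ExnU =
   struct
      type t = exn

      fun 'a embed () =
         let
            exception E of 'a
            fun project (e: t): 'a option =
               case e of
                  E a => SOME a
                | _ => NONE
         in
            (E, project)
         end
   end

(* Two embeddings of int; each call declares a fresh exception. *)
val (in1: int -> ExnU.t, out1) = ExnU.embed ()
val (in2: int -> ExnU.t, out2) = ExnU.embed ()

val u = in1 7
val r1 = out1 u   (* SOME 7 *)
val r2 = out2 u   (* NONE: u was not created by in2 *)
----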
26665
26666
26667 == Implementation Using Functions and References ==
26668
26669 [source,sml]
26670 ----
26671 structure U:> UNIVERSAL_TYPE =
26672    struct
26673       datatype t = T of {clear: unit -> unit,
26674                          store: unit -> unit}
26675
26676       fun 'a embed () =
26677          let
26678             val r: 'a option ref = ref NONE
26679             fun inject (a: 'a): t =
26680                T {clear = fn () => r := NONE,
26681                   store = fn () => r := SOME a}
26682             fun project (T {clear, store}): 'a option =
26683                let
26684                   val () = store ()
26685                   val res = !r
26686                   val () = clear ()
26687                in
26688                   res
26689                end
26690          in
26691             (inject, project)
26692          end
26693    end
26694 ----
26695
26696 Note that due to the use of a shared ref cell, the above
26697 implementation is not thread safe.
26698
26699 One could try to simplify the above implementation by eliminating the
26700 `clear` function, making `type t = unit -> unit`.
26701
26702 [source,sml]
26703 ----
26704 structure U:> UNIVERSAL_TYPE =
26705    struct
26706       type t = unit -> unit
26707
26708       fun 'a embed () =
26709          let
26710             val r: 'a option ref = ref NONE
26711             fun inject (a: 'a): t = fn () => r := SOME a
26712             fun project (f: t): 'a option = (r := NONE; f (); !r)
26713          in
26714             (inject, project)
26715          end
26716    end
26717 ----
26718
26719 While correct, this approach keeps the contents of the ref cell alive
26720 longer than necessary, which could cause a space leak.  The problem is
26721 in `project`, where the call to `f` stores some value in some ref cell
26722 `r'`.  Perhaps `r'` is the same ref cell as `r`, but perhaps not.  If
26723 we do not clear `r'` before returning from `project`, then `r'` will
26724 keep the value alive, even though it is useless.
26725
26726
26727 == Also see ==
26728
26729 * <:PropertyList:>: Lisp-style property lists implemented with a universal type
26730
26731 <<<
26732
26733 :mlton-guide-page: UnresolvedBugs
26734 [[UnresolvedBugs]]
26735 UnresolvedBugs
26736 ==============
26737
26738 Here are the places where MLton deviates from
26739 <:DefinitionOfStandardML:The Definition of Standard ML (Revised)> and
26740 the <:BasisLibrary:Basis Library>.  In general, MLton complies with
26741 the <:DefinitionOfStandardML:Definition> quite closely, typically much
26742 more closely than other SML compilers (see, e.g., our list of
26743 <:SMLNJDeviations:SML/NJ's deviations>).  In fact, the four deviations
26744 listed here are the only known deviations, and we have no immediate
26745 plans to fix them.  If you find a deviation not listed here, please
26746 report a <:Bug:>.
26747
26748 We don't plan to fix these bugs because the first (parsing nested
26749 cases) has historically never been accepted by any SML compiler, the
26750 second clearly indicates a problem in the
26751 <:DefinitionOfStandardML:Definition>, and the remaining are difficult
to resolve in the context of MLton's implementation of Standard ML (and
26753 unlikely to be problematic in practice).
26754
26755 * MLton does not correctly parse case expressions nested within other
26756 matches. For example, the following fails.
26757 +
26758 [source,sml]
26759 ----
26760 fun f 0 y =
      case y of
26762          1 => 2
26763        | _ => 3
26764   | f _ y = 4
26765 ----
26766 +
26767 To do this in a program, simply parenthesize the case expression.
26768 +
26769 Allowing such expressions, although compliant with the Definition,
26770 would be a mistake, since using parentheses is clearer and no SML
26771 compiler has ever allowed them.  Furthermore, implementing this would
26772 require serious yacc grammar rewriting followed by postprocessing.
26773
26774 * MLton does not raise the `Bind` exception at run time when
26775 evaluating `val rec` (and `fun`) declarations that redefine
26776 identifiers that previously had constructor status.  (By default,
26777 MLton does warn at compile time about `val rec` (and `fun`)
26778 declarations that redefine identifiers that previously had
constructor status; see the `valrecConstr` <:MLBasisAnnotations:ML
Basis annotation>.)  For example, the Definition requires the
following program to type check, but also (bizarrely) requires it to
raise the `Bind` exception.
26783 +
26784 [source,sml]
26785 ----
26786 val rec NONE = fn () => ()
26787 ----
26788 +
26789 The Definition's behavior is obviously an error, a mismatch between
26790 the static semantics (rule 26) and the dynamic semantics (rule 126).
26791 Given the comments on rule 26 in the Definition, it seems clear that
26792 the authors meant for `val rec` to allow an identifier's constructor
26793 status to be overridden both statically and dynamically.  Hence, MLton
26794 and most SML compilers follow rule 26, but do not follow rule 126.
26795
26796 * MLton does not hide the equality aspect of types declared in
26797 `abstype` declarations. So, MLton accepts programs like the following,
26798 while the Definition rejects them.
26799 +
26800 [source,sml]
26801 ----
26802 abstype t = T with end
26803 val _ = fn (t1, t2 : t) => t1 = t2
26804
26805 abstype t = T with val a = T end
26806 val _ = a = a
26807 ----
26808 +
26809 One consequence of this choice is that MLton accepts the following
26810 program, in accordance with the Definition.
26811 +
26812 [source,sml]
26813 ----
26814 abstype t = T with val eq = op = end
26815 val _ = fn (t1, t2 : t) => eq (t1, t2)
26816 ----
26817 +
26818 Other implementations will typically reject this program, because they
26819 make an early choice for the type of `eq` to be `''a * ''a -> bool`
26820 instead of `t * t -> bool`.  The choice is understandable, since the
26821 Definition accepts the following program.
26822 +
26823 [source,sml]
26824 ----
26825 abstype t = T with val eq = op = end
26826 val _ = eq (1, 2)
26827 ----
26828 +
26829
26830 * MLton (re-)type checks each functor definition at every
26831 corresponding functor application (the compilation technique of
26832 defunctorization).  One consequence of this implementation is that
26833 MLton accepts the following program, while the Definition rejects
26834 it.
26835 +
26836 [source,sml]
26837 ----
26838 functor F (X: sig type t end) = struct
26839     val f = id id
26840 end
26841 structure A = F (struct type t = int end)
26842 structure B = F (struct type t = bool end)
26843 val _ = A.f 10
26844 val _ = B.f "dude"
26845 ----
26846 +
26847 On the other hand, other implementations will typically reject the
26848 following program, while MLton and the Definition accept it.
26849 +
26850 [source,sml]
26851 ----
26852 functor F (X: sig type t end) = struct
26853     val f = id id
26854 end
26855 structure A = F (struct type t = int end)
26856 structure B = F (struct type t = bool end)
26857 val _ = A.f 10
26858 val _ = B.f false
26859 ----
26860 +
26861 See <!Cite(DreyerBlume07)> for more details.
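
For the first deviation above, the parenthesized form that MLton
accepts can be sketched as follows.  The scrutinee `y` and the result
values are placeholders chosen only to make the fragment
self-contained.

[source,sml]
----
fun f 0 y =
      (case y of
          1 => 2
        | _ => 3)
  | f _ y = 4
----

With the parentheses, the second clause of `f` can no longer be read
as a further rule of the `case` match.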
26862
26863 <<<
26864
26865 :mlton-guide-page: UnsafeStructure
26866 [[UnsafeStructure]]
26867 UnsafeStructure
26868 ===============
26869
26870 This module is a subset of the `Unsafe` module provided by SML/NJ,
with a few extra operations for `PackWord` and `PackReal`.
26872
26873 [source,sml]
26874 ----
26875 signature UNSAFE_MONO_ARRAY =
26876    sig
26877       type array
26878       type elem
26879
26880       val create: int -> array
26881       val sub: array * int -> elem
26882       val update: array * int * elem -> unit
26883    end
26884
26885 signature UNSAFE_MONO_VECTOR =
26886    sig
26887       type elem
26888       type vector
26889
26890       val sub: vector * int -> elem
26891    end
26892
26893 signature UNSAFE =
26894    sig
26895       structure Array:
26896          sig
26897             val create: int * 'a -> 'a array
26898             val sub: 'a array * int -> 'a
26899             val update: 'a array * int * 'a -> unit
26900          end
26901       structure CharArray: UNSAFE_MONO_ARRAY
26902       structure CharVector: UNSAFE_MONO_VECTOR
26903       structure IntArray: UNSAFE_MONO_ARRAY
26904       structure IntVector: UNSAFE_MONO_VECTOR
26905       structure Int8Array: UNSAFE_MONO_ARRAY
26906       structure Int8Vector: UNSAFE_MONO_VECTOR
26907       structure Int16Array: UNSAFE_MONO_ARRAY
26908       structure Int16Vector: UNSAFE_MONO_VECTOR
26909       structure Int32Array: UNSAFE_MONO_ARRAY
26910       structure Int32Vector: UNSAFE_MONO_VECTOR
26911       structure Int64Array: UNSAFE_MONO_ARRAY
26912       structure Int64Vector: UNSAFE_MONO_VECTOR
26913       structure IntInfArray: UNSAFE_MONO_ARRAY
26914       structure IntInfVector: UNSAFE_MONO_VECTOR
26915       structure LargeIntArray: UNSAFE_MONO_ARRAY
26916       structure LargeIntVector: UNSAFE_MONO_VECTOR
26917       structure LargeRealArray: UNSAFE_MONO_ARRAY
26918       structure LargeRealVector: UNSAFE_MONO_VECTOR
26919       structure LargeWordArray: UNSAFE_MONO_ARRAY
26920       structure LargeWordVector: UNSAFE_MONO_VECTOR
26921       structure RealArray: UNSAFE_MONO_ARRAY
26922       structure RealVector: UNSAFE_MONO_VECTOR
26923       structure Real32Array: UNSAFE_MONO_ARRAY
26924       structure Real32Vector: UNSAFE_MONO_VECTOR
26925       structure Real64Array: UNSAFE_MONO_ARRAY
26926       structure Vector:
26927          sig
26928             val sub: 'a vector * int -> 'a
26929          end
26930       structure Word8Array: UNSAFE_MONO_ARRAY
26931       structure Word8Vector: UNSAFE_MONO_VECTOR
26932       structure Word16Array: UNSAFE_MONO_ARRAY
26933       structure Word16Vector: UNSAFE_MONO_VECTOR
26934       structure Word32Array: UNSAFE_MONO_ARRAY
26935       structure Word32Vector: UNSAFE_MONO_VECTOR
26936       structure Word64Array: UNSAFE_MONO_ARRAY
26937       structure Word64Vector: UNSAFE_MONO_VECTOR
26938
26939       structure PackReal32Big : PACK_REAL
26940       structure PackReal32Little : PACK_REAL
26941       structure PackReal64Big : PACK_REAL
26942       structure PackReal64Little : PACK_REAL
26943       structure PackRealBig : PACK_REAL
26944       structure PackRealLittle : PACK_REAL
26945       structure PackWord16Big : PACK_WORD
26946       structure PackWord16Little : PACK_WORD
26947       structure PackWord32Big : PACK_WORD
26948       structure PackWord32Little : PACK_WORD
26949       structure PackWord64Big : PACK_WORD
26950       structure PackWord64Little : PACK_WORD
26951    end
26952 ----
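
As a minimal sketch of how these operations might be used, the
following function sums an array with unchecked indexing.  It assumes
that the program's MLB file includes `$(SML_LIB)/basis/unsafe.mlb` so
that the `Unsafe` structure is in scope; the name `sum` is only
illustrative.  Since the point of `Unsafe.Array.sub` is to skip the
bounds check, the loop itself has to keep every index within range.

[source,sml]
----
(* Sum the elements of an int array without bounds checks.
 * The guard i >= n is what keeps every index valid. *)
fun sum (a: int array): int =
   let
      val n = Array.length a
      fun loop (i, acc) =
         if i >= n
            then acc
         else loop (i + 1, acc + Unsafe.Array.sub (a, i))
   in
      loop (0, 0)
   end

val six = sum (Array.fromList [1, 2, 3])
----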
26953
26954 <<<
26955
26956 :mlton-guide-page: Useless
26957 [[Useless]]
26958 Useless
26959 =======
26960
26961 <:Useless:> is an optimization pass for the <:SSA:>
26962 <:IntermediateLanguage:>, invoked from <:SSASimplify:>.
26963
26964 == Description ==
26965
26966 This pass:
26967
26968 * removes components of tuples that are constants (use unification)
26969 * removes function arguments that are constants
* builds a dependence graph in which
26971 ** a value of ground type is useful if it is an arg to a primitive
26972 ** a tuple is useful if it contains a useful component
26973 ** a constructor is useful if it contains a useful component or is used in a `Case` transfer
26974
26975 If a useful tuple is coerced to another useful tuple, then all of
26976 their components must agree (exactly).  It is trivial to convert a
26977 useful value to a useless one.
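
As a rough source-level intuition only (the pass operates on SSA
programs, not on SML source, and whether it applies to a given program
depends on the rest of the optimizer), consider a loop that threads an
argument through every call without ever passing it to a primitive.
In the terms used above, nothing makes such an argument useful.  The
names `loop` and `tag` are hypothetical.

[source,sml]
----
(* tag is passed along on every iteration but never inspected,
 * so nothing in the program depends on its value. *)
fun loop (i: int, tag: string, acc: int): int =
   if i = 0
      then acc
   else loop (i - 1, tag, acc + i)

val r = loop (10, "never inspected", 0)
----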
26978
26979 == Implementation ==
26980
26981 * <!ViewGitFile(mlton,master,mlton/ssa/useless.fun)>
26982
26983 == Details and Notes ==
26984
26985 It is also trivial to convert a useful tuple to one of its useful
26986 components -- but this seems hard.
26987
26988 Suppose that you have a `ref`/`array`/`vector` that is useful, but the
26989 components aren't -- then the components are converted to type `unit`,
26990 and any primitive args must be as well.
26991
26992 Unify all handler arguments so that `raise`/`handle` has a consistent
26993 calling convention.
26994
26995 <<<
26996
26997 :mlton-guide-page: Users
26998 [[Users]]
26999 Users
27000 =====
27001
27002 Here is a list of companies, projects, and courses that use or have
27003 used MLton.  If you use MLton and are not here, please add your
27004 project with a brief description and a link.  Thanks.
27005
27006 == Companies ==
27007
27008 * http://www.hardcoreprocessing.com/[Hardcore Processing] uses MLton as a http://www.hardcoreprocessing.com/Freeware/MLTonWin32.html[crosscompiler from Linux to Windows] for graphics and game software.
27009 ** http://www.cex3d.net/[CEX3D Converter], a conversion program for 3D objects.
27010 ** http://www.hardcoreprocessing.com/company/showreel/index.html[Interactive Showreel], which contains a crossplatform GUI-toolkit and a realtime renderer for a subset of RenderMan written in Standard ML.
27011 ** various http://www.hardcoreprocessing.com/entertainment/index.html[games]
27012 * http://www.mathworks.com/products/polyspace/[MathWorks/PolySpace Technologies] builds their product that detects runtime errors in embedded systems based on abstract interpretation.
27013 // * http://www.sourcelight.com/[Sourcelight Technologies] uses MLton internally for prototyping and for processing databases as part of their system that makes personalized movie recommen
27014 * http://www.reactive-systems.com/[Reactive Systems] uses MLton to build Reactis, a model-based testing and validation package used in the automotive and aerospace industries.
27015
27016 == Projects ==
27017
* http://www-ia.hiof.no/%7Erolando/adate_intro.html[ADATE], Automatic Design of Algorithms Through Evolution, a system for automatic programming, i.e., inductive inference of algorithms. ADATE can automatically generate non-trivial and novel algorithms written in Standard ML.
27019 * http://types.bu.edu/reports/Dim+Wes+Mul+Tur+Wel+Con:TIC-2000-LNCS.html[CIL], a compiler for SML based on intersection and union types.
27020 * http://www.cs.cmu.edu/%7Econcert/[ConCert], a project investigating certified code for grid computing.
27021 * http://hcoop.sourceforge.net/[Cooperative Internet hosting tools]
27022 // * http://www.eecs.harvard.edu/%7Estein/[DesynchFS], a programming model and distributed file system for large clusters
27023 * http://www.fantasy-coders.de/projects/gh/[Guugelhupf], a simple search engine.
27024 * http://www.mpi-sws.org/%7Erossberg/hamlet/[HaMLet], a model implementation of Standard ML.
27025 * http://code.google.com/p/kepler-code/[KeplerCode], independent verification of the computational aspects of proofs of the Kepler conjecture and the Dodecahedral conjecture.
27026 * http://www.gilith.com/research/metis/[Metis], a first-order prover (used in the http://hol.sourceforge.net/[HOL4 theorem prover] and the http://isabelle.in.tum.de/[Isabelle theorem prover]).
27027 * http://tom7misc.cvs.sourceforge.net/viewvc/tom7misc/net/mlftpd/[mlftpd], an ftp daemon written in SML.  <:TomMurphy:> is also working on http://tom7misc.cvs.sourceforge.net/viewvc/tom7misc/net/[replacements for standard network services] in SML.  He also uses MLton to build his entries (http://www.cs.cmu.edu/%7Etom7/icfp2001/[2001], http://www.cs.cmu.edu/%7Etom7/icfp2002/[2002], http://www.cs.cmu.edu/%7Etom7/icfp2004/[2004], http://www.cs.cmu.edu/%7Etom7/icfp2005/[2005]) in the annual ICFP programming contest.
27028 * http://www.informatik.uni-freiburg.de/proglang/research/software/mlope/[MLOPE], an offline partial evaluator for Standard ML.
* http://www.ida.liu.se/%7Epelab/rml/[RML], a system for developing, compiling, debugging, and teaching structural operational semantics (SOS) and natural semantics specifications.
27030 * http://www.macs.hw.ac.uk/ultra/skalpel/index.html[Skalpel], a type-error slicer for SML
27031 // * http://alleystoughton.us/smlnjtrans/[SMLNJtrans], a program for generating SML/NJ transcripts in LaTeX.
27032 * http://www.cs.cmu.edu/%7Etom7/ssapre/[SSA PRE], an implementation of Partial Redundancy Elimination for MLton.
27033 * <:Stabilizers:>, a modular checkpointing abstraction for concurrent functional programs.
27034 * http://ttic.uchicago.edu/%7Epl/sa-sml/[Self-Adjusting SML], self-adjusting computation, a model of computing where programs can automatically adjust to changes to their data.
27035 * http://faculty.ist.unomaha.edu/winter/ShiftLab/TL_web/TL_index.html[TL System], providing general-purpose support for rewrite-based transformation over elements belonging to a (user-defined) domain language.
27036 * http://projects.laas.fr/tina/[Tina] (Time Petri net Analyzer)
* http://www.twelf.org/[Twelf], an implementation of the LF logical framework.
* http://www.cs.indiana.edu/%7Errnewton/wavescope/[WaveScript/WaveScope], a sensor network project; the WaveScript compiler can generate SML (MLton) code.
27039
27040 == Courses ==
27041
27042 * http://www.eecs.harvard.edu/%7Enr/cs152/[Harvard CS-152], undergraduate programming languages.
27043 * http://www.ia-stud.hiof.no/%7Erolando/PL/[Høgskolen i Østfold IAI30202], programming languages.
27044
27045 <<<
27046
27047 :mlton-guide-page: Utilities
27048 [[Utilities]]
27049 Utilities
27050 =========
27051
27052 This page is a collection of basic utilities used in the examples on
27053 various pages.  See
27054
27055  * <:InfixingOperators:>, and
27056  * <:ProductType:>
27057
27058 for longer discussions on some of these utilities.
27059
27060 [source,sml]
27061 ----
27062 (* Operator precedence table *)
27063 infix   8  * / div mod        (* +1 from Basis Library *)
27064 infix   7  + - ^              (* +1 from Basis Library *)
27065 infixr  6  :: @               (* +1 from Basis Library *)
27066 infix   5  = <> > >= < <=     (* +1 from Basis Library *)
27067 infix   4  <\ \>
27068 infixr  4  </ />
27069 infix   3  o
27070 infix   2  >|
27071 infixr  2  |<
27072 infix   1  :=                 (* -2 from Basis Library *)
27073 infix   0  before &
27074
27075 (* Some basic combinators *)
27076 fun const x _ = x
27077 fun cross (f, g) (x, y) = (f x, g y)
27078 fun curry f x y = f (x, y)
27079 fun fail e _ = raise e
27080 fun id x = x
27081
27082 (* Product type *)
27083 datatype ('a, 'b) product = & of 'a * 'b
27084
27085 (* Sum type *)
27086 datatype ('a, 'b) sum = INL of 'a | INR of 'b
27087
27088 (* Some type shorthands *)
27089 type 'a uop = 'a -> 'a
27090 type 'a fix = 'a uop -> 'a
27091 type 'a thunk = unit -> 'a
27092 type 'a effect = 'a -> unit
27093 type ('a, 'b) emb = ('a -> 'b) * ('b -> 'a)
27094
27095 (* Infixing, sectioning, and application operators *)
27096 fun x <\ f = fn y => f (x, y)
27097 fun f \> y = f y
27098 fun f /> y = fn x => f (x, y)
27099 fun x </ f = f x
27100
27101 (* Piping operators *)
27102 val op>| = op</
27103 val op|< = op\>
27104 ----
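
The following sketch shows how the infixing, sectioning, and piping
operators might be used.  The operator declarations are repeated from
the listing above so that the example stands alone; the helper `minus`
and the names of the results are hypothetical.

[source,sml]
----
(* Declarations repeated from the listing above. *)
infix   4  <\ \>
infixr  4  </ />
infix   2  >|
infixr  2  |<
fun x <\ f = fn y => f (x, y)
fun f \> y = f y
fun f /> y = fn x => f (x, y)
fun x </ f = f x
val op>| = op</
val op|< = op\>

(* Hypothetical helper used only for illustration. *)
fun minus (x, y) = x - y

val a = 10 <\minus\> 3        (* left-associative: minus (10, 3) *)
val b = 10 </minus/> 3        (* right-associative: minus (10, 3) *)
val subtract3 = minus /> 3    (* right section: fn x => minus (x, 3) *)
val c = subtract3 10

(* Piping: pass a value through a sequence of functions. *)
val d = [1, 2, 3] >| map (fn x => x * 2) >| length
----

Both infixed forms apply `minus` to the same pair of arguments; they
differ only in how longer chains of such applications associate.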
27105
27106 <<<
27107
27108 :mlton-guide-page: ValueRestriction
27109 [[ValueRestriction]]
27110 ValueRestriction
27111 ================
27112
27113 The value restriction is a rule that governs when type inference is
27114 allowed to polymorphically generalize a value declaration.  In short,
the value restriction says that generalization can only occur if the
right-hand side of the declaration is syntactically a value.  For
27117 example, in
27118
27119 [source,sml]
27120 ----
27121 val f = fn x => x
27122 val _ = (f "foo"; f 13)
27123 ----
27124
27125 the expression `fn x => x` is syntactically a value, so `f` has
27126 polymorphic type `'a -> 'a` and both calls to `f` type check.  On the
27127 other hand, in
27128
27129 [source,sml]
27130 ----
27131 val f = let in fn x => x end
27132 val _ = (f "foo"; f 13)
27133 ----
27134
the expression `let in fn x => x end` is not syntactically a value
27136 and so `f` can either have type `int -> int` or `string -> string`,
27137 but not `'a -> 'a`.  Hence, the program does not type check.
27138
27139 <:DefinitionOfStandardML:The Definition of Standard ML> spells out
27140 precisely which expressions are syntactic values (it refers to such
27141 expressions as _non-expansive_).  An expression is a value if it is of
27142 one of the following forms.
27143
27144 * a constant (`13`, `"foo"`, `13.0`, ...)
27145 * a variable (`x`, `y`, ...)
27146 * a function (`fn x => e`)
27147 * the application of a constructor other than `ref` to a value (`Foo v`)
27148 * a type constrained value (`v: t`)
27149 * a tuple in which each field is a value `(v1, v2, ...)`
27150 * a record in which each field is a value `{l1 = v1, l2 = v2, ...}`
27151 * a list in which each element is a value `[v1, v2, ...]`
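
As a small sketch of these forms (all names are illustrative), each of
the first three bindings below has a syntactic value on its right-hand
side and is therefore generalized, so it can be used at more than one
type.  The last binding shows the contrast: `ref []` is an application
of `ref`, so `cell` stays monomorphic and its type is fixed by its
first use.

[source,sml]
----
val empty = []                  (* nullary constructor nil: 'a list *)
val pair = (NONE, fn x => x)    (* tuple whose fields are values *)
val wrapped = SOME []           (* constructor applied to a value *)

(* empty is polymorphic, so both uses type check. *)
val ints = 1 :: empty
val strs = "one" :: empty

(* cell is not generalized; the assignment makes it an
 * int list ref. *)
val n =
   let
      val cell = ref []
      val () = cell := [1, 2, 3]
   in
      length (!cell)
   end
----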
27152
27153
27154 == Why the value restriction exists ==
27155
27156 The value restriction prevents a ref cell (or an array) from holding
27157 values of different types, which would allow a value of one type to be
27158 cast to another and hence would break type safety.  If the restriction
27159 were not in place, the following program would type check.
27160
27161 [source,sml]
27162 ----
27163 val r: 'a option ref = ref NONE
27164 val r1: string option ref = r
27165 val r2: int option ref = r
27166 val () = r1 := SOME "foo"
27167 val v: int = valOf (!r2)
27168 ----
27169
27170 The first line violates the value restriction because `ref NONE` is
27171 not a value.  All other lines are type correct.  By its last line, the
27172 program has cast the string `"foo"` to an integer.  This breaks type
27173 safety, because now we can add a string to an integer with an
27174 expression like `v + 13`.  We could even be more devious, by adding
the following two lines, which allow us to treat the string `"foo"`
27176 as a function.
27177
27178 [source,sml]
27179 ----
27180 val r3: (int -> int) option ref = r
27181 val v: int -> int = valOf (!r3)
27182 ----
27183
27184 Eliminating the explicit `ref` does nothing to fix the problem.  For
27185 example, we could replace the declaration of `r` with the following.
27186
27187 [source,sml]
27188 ----
27189 val f: unit -> 'a option ref = fn () => ref NONE
27190 val r: 'a option ref = f ()
27191 ----
27192
27193 The declaration of `f` is well typed, while the declaration of `r`
27194 violates the value restriction because `f ()` is not a value.
27195
27196
27197 == Unnecessarily rejected programs ==
27198
27199 Unfortunately, the value restriction rejects some programs that could
27200 be accepted.
27201
27202 [source,sml]
27203 ----
27204 val id: 'a -> 'a = fn x => x
27205 val f: 'a -> 'a = id id
27206 ----
27207
27208 The type constraint on `f` requires `f` to be polymorphic, which is
27209 disallowed because `id id` is not a value.  MLton reports the
27210 following type error.
27211
27212 ----
27213 Error: z.sml 2.5-2.5.
27214   Type of variable cannot be generalized in expansive declaration: f.
27215     type: ['a] -> ['a]
27216     in: val 'a f: ('a -> 'a) = id id
27217 ----
27218
27219 MLton indicates the inability to make `f` polymorphic by saying that
the type of `f` cannot be generalized (made polymorphic) because its
27221 declaration is expansive (not a value).  MLton doesn't explicitly
27222 mention the value restriction, but that is the reason.  If we leave
27223 the type constraint off of `f`
27224
27225 [source,sml]
27226 ----
27227 val id: 'a -> 'a = fn x => x
27228 val f = id id
27229 ----
27230
27231 then the program succeeds; however, MLton gives us the following
27232 warning.
27233
27234 ----
27235 Warning: z.sml 2.5-2.5.
27236   Type of variable was not inferred and could not be generalized: f.
27237     type: ??? -> ???
27238     in: val f = id id
27239 ----
27240
27241 This warning indicates that MLton couldn't polymorphically generalize
27242 `f`, nor was there enough context using `f` to determine its type.
This in itself is not a type error, but it is a hint that something
27244 is wrong with our program.  Using `f` provides enough context to
27245 eliminate the warning.
27246
27247 [source,sml]
27248 ----
27249 val id: 'a -> 'a = fn x => x
27250 val f = id id
27251 val _ = f 13
27252 ----
27253
27254 But attempting to use `f` as a polymorphic function will fail.
27255
27256 [source,sml]
27257 ----
27258 val id: 'a -> 'a = fn x => x
27259 val f = id id
27260 val _ = f 13
27261 val _ = f "foo"
27262 ----
27263
27264 ----
27265 Error: z.sml 4.9-4.15.
27266   Function applied to incorrect argument.
27267     expects: [int]
27268     but got: [string]
27269     in: f "foo"
27270 ----
27271
27272
27273 == Alternatives to the value restriction ==
27274
27275 There would be nothing wrong with treating `f` as polymorphic in
27276
27277 [source,sml]
27278 ----
27279 val id: 'a -> 'a = fn x => x
27280 val f = id id
27281 ----
27282
27283 One might think that the value restriction could be relaxed, and that
27284 only types involving `ref` should be disallowed.  Unfortunately, the
27285 following example shows that even the type `'a -> 'a` can cause
27286 problems.  If this program were allowed, then we could cast an integer
27287 to a string (or any other type).
27288
27289 [source,sml]
27290 ----
27291 val f: 'a -> 'a =
27292    let
27293       val r: 'a option ref = ref NONE
27294    in
27295       fn x =>
27296       let
27297          val y = !r
27298          val () = r := SOME x
27299       in
27300          case y of
27301             NONE => x
27302           | SOME y => y
27303       end
27304    end
27305 val _ = f 13
27306 val _ = f "foo"
27307 ----
27308
27309 The previous version of Standard ML took a different approach
27310 (<!Cite(MilnerEtAl90)>, <!Cite(Tofte90)>, <:ImperativeTypeVariable:>)
27311 than the value restriction.  It encoded information in the type system
27312 about when ref cells would be created, and used this to prevent a ref
27313 cell from holding multiple types.  Although it allowed more programs
27314 to be type checked, this approach had significant drawbacks.  First,
27315 it was significantly more complex, both for implementers and for
programmers.  Second, it had an unfortunate interaction with
modularity, because information about ref usage was exposed in module
27318 signatures.  This either prevented the use of references for
27319 implementing a signature, or required information that one would like
27320 to keep hidden to propagate across modules.
27321
27322 In the early nineties, Andrew Wright studied about 250,000 lines of
27323 existing SML code and discovered that it did not make significant use
27324 of the extended typing ability, and proposed the value restriction as
27325 a simpler alternative (<!Cite(Wright95)>).  This was adopted in the
27326 revised <:DefinitionOfStandardML:Definition>.
27327
27328
27329 == Working with the value restriction ==
27330
27331 One technique that works with the value restriction is
27332 <:EtaExpansion:>.  We can use eta expansion to make our `id id`
example type check as follows.
27334
27335 [source,sml]
27336 ----
27337 val id: 'a -> 'a = fn x => x
27338 val f: 'a -> 'a = fn z => (id id) z
27339 ----
27340
27341 This solution means that the computation (in this case `id id`) will
27342 be performed each time `f` is applied, instead of just once when `f`
27343 is declared.  In this case, that is not a problem, but it could be if
27344 the declaration of `f` performs substantial computation or creates a
27345 shared data structure.
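
The repeated evaluation can be made visible with a side effect.  In
the following sketch, the hypothetical function `make` counts how
often it runs before returning the identity function; after eta
expansion it runs once per application of `f` rather than once at the
declaration of `f`.

[source,sml]
----
val runs = ref 0

(* Counts each evaluation, then returns the identity function. *)
fun make () = (runs := !runs + 1; fn x => x)

(* Eta-expanded, so the right-hand side is a syntactic value and
 * f is polymorphic, but make () is evaluated on every call. *)
val f: 'a -> 'a = fn z => make () z

val _ = f 13
val _ = f "foo"
val n = !runs   (* one evaluation of make () per call of f *)
----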
27346
27347 Another technique that sometimes works is to move a monomorphic
27348 computation prior to a (would-be) polymorphic declaration so that the
27349 expression is a value.  Consider the following program, which fails
27350 due to the value restriction.
27351
27352 [source,sml]
27353 ----
27354 datatype 'a t = A of string | B of 'a
27355 val x: 'a t = A (if true then "yes" else "no")
27356 ----
27357
27358 It is easy to rewrite this program as
27359
27360 [source,sml]
27361 ----
27362 datatype 'a t = A of string | B of 'a
27363 local
27364    val s = if true then "yes" else "no"
27365 in
27366    val x: 'a t = A s
27367 end
27368 ----
27369
27370 The following example (taken from <!Cite(Wright95)>) creates a ref
27371 cell to count the number of times a function is called.
27372
27373 [source,sml]
27374 ----
27375 val count: ('a -> 'a) -> ('a -> 'a) * (unit -> int) =
27376    fn f =>
27377    let
27378       val r = ref 0
27379    in
27380       (fn x => (r := 1 + !r; f x), fn () => !r)
27381    end
27382 val id: 'a -> 'a = fn x => x
27383 val (countId: 'a -> 'a, numCalls) = count id
27384 ----
27385
27386 The example does not type check, due to the value restriction.
27387 However, it is easy to rewrite the program, staging the ref cell
27388 creation before the polymorphic code.
27389
27390 [source,sml]
27391 ----
27392 datatype t = T of int ref
27393 val count1: unit -> t = fn () => T (ref 0)
27394 val count2: t * ('a -> 'a) -> (unit -> int) * ('a -> 'a) =
27395    fn (T r, f) => (fn () => !r, fn x => (r := 1 + !r; f x))
27396 val id: 'a -> 'a = fn x => x
27397 val t = count1 ()
27398 val countId: 'a -> 'a = fn z => #2 (count2 (t, id)) z
27399 val numCalls = #1 (count2 (t, id))
27400 ----
27401
27402 Of course, one can hide the constructor `T` inside a `local` or behind
27403 a signature.
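
As one possible sketch of the signature-based variant, the ref cell
can be kept behind an abstract type.  The signature `COUNTER`, the
structure `Counter`, and its operations `new`, `wrap`, and `calls` are
hypothetical names chosen for this example; they are not part of any
library.

[source,sml]
----
signature COUNTER =
   sig
      type t
      val new: unit -> t
      val wrap: t * ('a -> 'a) -> 'a -> 'a
      val calls: t -> int
   end

structure Counter :> COUNTER =
   struct
      (* The constructor T is not visible outside the structure. *)
      datatype t = T of int ref
      fun new () = T (ref 0)
      fun wrap (T r, f) = fn x => (r := 1 + !r; f x)
      fun calls (T r) = !r
   end

val id: 'a -> 'a = fn x => x
val t = Counter.new ()          (* monomorphic: creates the cell *)
val countId: 'a -> 'a = fn z => Counter.wrap (t, id) z
val _ = countId 13
val _ = countId "foo"
val numCalls = Counter.calls t
----

As before, the cell is created by a monomorphic declaration, and the
polymorphic declaration of `countId` is a syntactic value.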
27404
27405
27406 == Also see ==
27407
27408 * <:ImperativeTypeVariable:>
27409
27410 <<<
27411
27412 :mlton-guide-page: VariableArityPolymorphism
27413 [[VariableArityPolymorphism]]
27414 VariableArityPolymorphism
27415 =========================
27416
27417 <:StandardML:Standard ML> programmers often face the problem of how to
27418 provide a variable-arity polymorphic function.  For example, suppose
27419 one is defining a combinator library, e.g. for parsing or pickling.
27420 The signature for such a library might look something like the
27421 following.
27422
27423 [source,sml]
27424 ----
27425 signature COMBINATOR =
27426    sig
27427       type 'a t
27428
27429       val int: int t
27430       val real: real t
27431       val string: string t
27432       val unit: unit t
27433       val tuple2: 'a1 t * 'a2 t -> ('a1 * 'a2) t
27434       val tuple3: 'a1 t * 'a2 t * 'a3 t -> ('a1 * 'a2 * 'a3) t
27435       val tuple4: 'a1 t * 'a2 t * 'a3 t * 'a4 t
27436                   -> ('a1 * 'a2 * 'a3 * 'a4) t
27437       ...
27438    end
27439 ----
27440
27441 The question is how to define a variable-arity tuple combinator.
27442 Traditionally, the only way to take a variable number of arguments in
27443 SML is to put the arguments in a list (or vector) and pass that.  So,
27444 one might define a tuple combinator with the following signature.
27445 [source,sml]
27446 ----
27447 val tupleN: 'a list -> 'a list t
27448 ----
27449
27450 The problem with this approach is that as soon as one places values in
27451 a list, they must all have the same type.  So, programmers often take
27452 an alternative approach, and define a family of `tuple<N>` functions,
27453 as we see in the `COMBINATOR` signature above.
27454
27455 The family-of-functions approach is ugly for many reasons.  First, it
27456 clutters the signature with a number of functions when there should
27457 really only be one.  Second, it is _closed_, in that there are a fixed
27458 number of tuple combinators in the interface, and should a client need
27459 a combinator for a large tuple, he is out of luck.  Third, this
27460 approach often requires a lot of duplicate code in the implementation
27461 of the combinators.
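
To make the duplication concrete, here is a sketch of what two members
of such a family might look like, assuming (as in the pickling module
later on this page) that a combinator of type `'a t` is a function
from `'a` to `string`.  The definitions are illustrative only; every
further arity would need another near copy.

[source,sml]
----
type 'a t = 'a -> string

(* Each arity needs its own, nearly identical, definition. *)
fun tuple2 (p1: 'a1 t, p2: 'a2 t): ('a1 * 'a2) t =
   fn (x1, x2) => concat [p1 x1, ",", p2 x2]

fun tuple3 (p1: 'a1 t, p2: 'a2 t, p3: 'a3 t): ('a1 * 'a2 * 'a3) t =
   fn (x1, x2, x3) => concat [p1 x1, ",", p2 x2, ",", p3 x3]

val s2 = tuple2 (Int.toString, fn s => s) (1, "two")
val s3 = tuple3 (Int.toString, fn s => s, Bool.toString) (1, "two", true)
----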
27462
27463 Fortunately, using <:Fold01N:> and <:ProductType:products>, one can
27464 provide an interface and implementation that solves all these
27465 problems.  Here is a simple pickling module that converts values to
27466 strings.
27467 [source,sml]
27468 ----
27469 structure Pickler =
27470    struct
27471       type 'a t = 'a -> string
27472
27473       val unit = fn () => ""
27474
27475       val int = Int.toString
27476
27477       val real = Real.toString
27478
27479       val string = id
27480
27481       type 'a accum = 'a * string list -> string list
27482
27483       val tuple =
27484          fn z =>
27485          Fold01N.fold
27486          {finish = fn ps => fn x => concat (rev (ps (x, []))),
27487           start = fn p => fn (x, l) => p x :: l,
27488           zero = unit}
27489          z
27490
27491       val ` =
27492          fn z =>
27493          Fold01N.step1
27494          {combine = (fn (p, p') => fn (x & x', l) => p' x' :: "," :: p (x, l))}
27495          z
27496    end
27497 ----
27498
27499 If one has `n` picklers of types
27500 [source,sml]
27501 ----
27502 val p1: a1 Pickler.t
27503 val p2: a2 Pickler.t
27504 ...
27505 val pn: an Pickler.t
27506 ----
27507 then one can construct a pickler for n-ary products as follows.
27508 [source,sml]
27509 ----
27510 tuple `p1 `p2 ... `pn $ : (a1 & a2 & ... & an) Pickler.t
27511 ----
27512
27513 For example, with `Pickler` in scope, one can prove the following
27514 equations.
27515 [source,sml]
27516 ----
27517 "" = tuple $ ()
27518 "1" = tuple `int $ 1
27519 "1,2.0" = tuple `int `real $ (1 & 2.0)
27520 "1,2.0,three" = tuple `int `real `string $ (1 & 2.0 & "three")
27521 ----
27522
27523 Here is the signature for `Pickler`.  It shows why the `accum` type is
27524 useful.
27525 [source,sml]
27526 ----
27527 signature PICKLER =
27528    sig
27529       type 'a t
27530
27531       val int: int t
27532       val real: real t
27533       val string: string t
27534       val unit: unit t
27535
27536       type 'a accum
27537       val ` : ('a accum, 'b t, ('a, 'b) prod accum,
27538                'z1, 'z2, 'z3, 'z4, 'z5, 'z6, 'z7) Fold01N.step1
27539       val tuple: ('a t, 'a accum, 'b accum, 'b t, unit t,
27540                   'z1, 'z2, 'z3, 'z4, 'z5) Fold01N.t
27541    end
27542
27543 structure Pickler: PICKLER = Pickler
27544 ----
27545
27546 <<<
27547
27548 :mlton-guide-page: Variant
27549 [[Variant]]
27550 Variant
27551 =======
27552
27553 A _variant_ is an arm of a datatype declaration.  For example, the
27554 datatype
27555
27556 [source,sml]
27557 ----
27558 datatype t = A | B of int | C of real
27559 ----
27560
27561 has three variants: `A`, `B`, and `C`.
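
Code that consumes such a datatype typically has one `case` rule per
variant.  A minimal sketch, in which the function `name` is a
hypothetical example:

[source,sml]
----
datatype t = A | B of int | C of real

(* One rule per variant. *)
fun name (v: t): string =
   case v of
      A => "A"
    | B i => "B " ^ Int.toString i
    | C _ => "C"

val s = name (B 7)
----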
27562
27563 <<<
27564
27565 :mlton-guide-page: VesaKarvonen
27566 [[VesaKarvonen]]
27567 VesaKarvonen
27568 ============
27569
27570 Vesa Karvonen is a student at the http://www.cs.helsinki.fi/index.en.html[University of Helsinki].
His interests lie in programming techniques that allow complex programs to be expressed
clearly and concisely, and in the design and implementation of programming languages.
27573
27574 image::VesaKarvonen.attachments/vesa-in-mlton-t-shirt.jpg[align="center"]
27575
27576 Things he'd like to see for SML and hopes to be able to contribute towards:
27577
27578 * A practical tool for documenting libraries. Preferably one that is
27579 based on extracting the documentation from source code comments.
27580
27581 * A good IDE. Possibly an enhanced SML mode (`esml-mode`) for Emacs.
27582 Google for http://www.google.com/search?&q=SLIME+video[SLIME video] to
27583 get an idea of what he'd like to see. Some specific notes:
27584 +
27585 --
27586   * show type at point
27587   * robust, consistent indentation
27588   * show documentation
27589   * jump to definition (see <:EmacsDefUseMode:>)
27590 --
27591 +
27592 <:EmacsBgBuildMode:> has also been written for working with MLton.
27593
27594 * Documented and cataloged libraries. Perhaps something like
27595 http://www.boost.org[Boost], but for SML libraries.  Here is a partial
27596 list of libraries, tools, and frameworks Vesa is or has been working
27597 on:
27598 +
27599 --
27600   * Asynchronous Programming Library (<!ViewGitFile(mltonlib,master,com/ssh/async/unstable/README)>)
27601   * Extended Basis Library (<!ViewGitFile(mltonlib,master,com/ssh/extended-basis/unstable/README)>)
27602   * Generic Programming Library (<!ViewGitFile(mltonlib,master,com/ssh/generic/unstable/README)>)
27603   * Pretty Printing Library (<!ViewGitFile(mltonlib,master,com/ssh/prettier/unstable/README)>)
27604   * Random Generator Library (<!ViewGitFile(mltonlib,master,com/ssh/random/unstable/README)>)
27605   * RPC (Remote Procedure Call) Library (<!ViewGitFile(mltonlib,master,org/mlton/vesak/rpc-lib/unstable/README)>)
27606   * http://www.libsdl.org/[SDL] Binding (<!ViewGitFile(mltonlib,master,org/mlton/vesak/sdl/unstable/README)>)
27607   * Unit Testing Library (<!ViewGitFile(mltonlib,master,com/ssh/unit-test/unstable/README)>)
27608   * Use Library (<!ViewGitFile(mltonlib,master,org/mlton/vesak/use-lib/unstable/README)>)
27609   * Windows Library (<!ViewGitFile(mltonlib,master,com/ssh/windows/unstable/README)>)
27610 --
27611 Note that most of these libraries have been ported to several <:StandardMLImplementations:SML implementations>.
27612
27613 <<<
27614
27615 :mlton-guide-page: WarnUnusedAnomalies
27616 [[WarnUnusedAnomalies]]
27617 WarnUnusedAnomalies
27618 ===================
27619
27620 The `warnUnused` <:MLBasisAnnotations:MLBasis annotation> can be used
27621 to report unused identifiers.  This can be useful for catching bugs
27622 and for code maintenance (e.g., eliminating dead code).  However, the
27623 `warnUnused` annotation can sometimes behave in counter-intuitive
27624 ways.  This page gives some of the anomalies that have been reported.
27625
27626 * Functions whose only uses are recursive uses within their bodies are
27627 not warned as unused:
27628 +
27629 [source,sml]
27630 ----
27631 local
27632 fun foo () = foo () : unit
27633 val bar = let fun baz () = baz () : unit in baz end
27634 in
27635 end
27636 ----
27637 +
27638 ----
27639 Warning: z.sml 3.5.
27640   Unused variable: bar.
27641 ----
27642
27643 * Components of actual functor argument that are necessary to match
27644 the functor argument signature but are unused in the body of the
27645 functor are warned as unused:
27646 +
27647 [source,sml]
27648 ----
27649 functor Warning (type t val x : t) = struct
27650    val y = x
27651 end
27652 structure X = Warning (type t = int val x = 1)
27653 ----
27654 +
27655 ----
27656 Warning: z.sml 4.29.
27657   Unused type: t.
27658 ----
27659
27660
27661 * No component of a functor result is warned as unused.  In the
27662 following, the only uses of `f2` are to match the functor argument
27663 signatures of `functor G` and `functor H` and there are no uses of
27664 `z`:
27665 +
27666 [source,sml]
27667 ----
27668 functor F(structure X : sig type t end) = struct
27669    type t = X.t
27670    fun f1 (_ : X.t) = ()
27671    fun f2 (_ : X.t) = ()
27672    val z = ()
27673 end
27674 functor G(structure Y : sig
27675                            type t
27676                            val f1 : t -> unit
27677                            val f2 : t -> unit
27678                            val z : unit
27679                         end) = struct
27680    fun g (x : Y.t) = Y.f1 x
27681 end
27682 functor H(structure Y : sig
27683                            type t
27684                            val f1 : t -> unit
27685                            val f2 : t -> unit
27686                            val z : unit
27687                         end) = struct
27688    fun h (x : Y.t) = Y.f1 x
27689 end
27690 functor Z() = struct
27691    structure S = F(structure X = struct type t = unit end)
27692    structure SG = G(structure Y = S)
27693    structure SH = H(structure Y = S)
27694 end
27695 structure U = Z()
27696 val _ = U.SG.g ()
27697 val _ = U.SH.h ()
27698 ----
27699 +
27700 ----
27701 ----
27702
27703 <<<
27704
27705 :mlton-guide-page: WesleyTerpstra
27706 [[WesleyTerpstra]]
27707 WesleyTerpstra
27708 ==============
27709
Wesley W. Terpstra is a PhD student at the Technische Universität Darmstadt (Germany).
27711
27712 Research interests
27713
27714 * Distributed systems (P2P)
27715 * Number theory (Error-correcting codes)
27716
My interest in SML is centered on the fact that the language is able to directly express ideas from number theory which are important for my work. Modules and Functors seem to be a very natural basis for implementing many algebraic structures. MLton provides an ideal platform for actual implementation as it is fast and has unboxed words.
27718
27719 Things I would like from MLton in the future:
27720
27721 * Some better optimization of mathematical expressions
27722 * IPv6 and multicast support
27723 * A complete GUI toolkit like mGTK
27724 * More supported platforms so that applications written under MLton have a wider audience
27725
27726 <<<
27727
27728 :mlton-guide-page: WholeProgramOptimization
27729 [[WholeProgramOptimization]]
27730 WholeProgramOptimization
27731 ========================
27732
27733 Whole-program optimization is a compilation technique in which
27734 optimizations operate over the entire program.  This allows the
27735 compiler many optimization opportunities that are not available when
27736 analyzing modules separately (as with separate compilation).
27737
27738 Most of MLton's optimizations are whole-program optimizations.
27739 Because MLton compiles the whole program at once, it can perform
27740 optimization across module boundaries.  As a consequence, MLton often
27741 reduces or eliminates the run-time penalty that arises with separate
27742 compilation of SML features such as functors, modules, polymorphism,
27743 and higher-order functions.  MLton takes advantage of having the
27744 entire program to perform transformations such as: defunctorization,
27745 monomorphisation, higher-order control-flow analysis, inlining,
27746 unboxing, argument flattening, redundant-argument removal, constant
27747 folding, and representation selection.  Whole-program compilation is
27748 an integral part of the design of MLton and is not likely to change.
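
As a rough illustration of the first two transformations named above, consider the small program below.  The names `ORD`, `MaxFn`, `IntMax`, and `pair` are invented for this sketch, and the closing comment describes the intended effect only informally; it is not MLton output.

[source,sml]
----
(* A functor and a polymorphic function, as written by the programmer. *)
signature ORD =
   sig
      type t
      val le : t * t -> bool
   end

functor MaxFn (O : ORD) =
   struct
      fun max (x : O.t, y : O.t) : O.t = if O.le (x, y) then y else x
   end

structure IntMax = MaxFn (struct type t = int val le = Int.<= end)

fun pair x = (x, x)

val m = IntMax.max (3, 7)
val p = pair 1
val q = pair "a"

(* With the whole program in view, a compiler can, conceptually:
 *  - defunctorize: replace the application of MaxFn with a copy of its
 *    body specialized to int, so that max calls Int.<= directly;
 *  - monomorphise: replace pair with one copy at int and one at string,
 *    since those are the only instances this program uses.
 *)
----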
27749
27750 <<<
27751
27752 :mlton-guide-page: WishList
27753 [[WishList]]
27754 WishList
27755 ========
27756
27757 This page is mainly for recording recurring feature requests.  If you
27758 have a new feature request, you probably want to query interest on one
27759 of the <:Contact:mailing lists> first.
27760
27761 Please be aware of MLton's policy on
27762 <:LanguageChanges:language changes>.  Nonetheless, we hope to provide
27763 support for some of the "immediate" <:SuccessorML:> proposals in a
27764 future release.
27765
27766
27767 == Support for link options in ML Basis files ==
27768
27769 Introduce a mechanism to specify link options in <:MLBasis:ML Basis>
files.  For example, generalizing a bit, an ML Basis declaration of the
27771 form
27772
27773 ----
27774 option "option"
27775 ----
27776
27777 could be introduced whose semantics would be the same (as closely as
27778 possible) as if the option string were specified on the compiler
27779 command line.
27780
27781 The main motivation for this is that a MLton library that would
27782 introduce bindings (through <:ForeignFunctionInterface:FFI>) to an
27783 external library could be packaged conveniently as a single MLB file.
27784 For example, to link with library `foo` the MLB file would simply
27785 contain:
27786
27787 ----
27788 option "-link-opt -lfoo"
27789 ----
27790
27791 Similar feature requests have been discussed previously on the mailing lists:
27792
27793 * http://www.mlton.org/pipermail/mlton/2004-July/025553.html
27794 * http://www.mlton.org/pipermail/mlton/2005-January/026648.html
27795
27796 <<<
27797
27798 :mlton-guide-page: XML
27799 [[XML]]
27800 XML
27801 ===
27802
27803 <:XML:> is an <:IntermediateLanguage:>, translated from <:CoreML:> by
27804 <:Defunctorize:>, optimized by <:XMLSimplify:>, and translated by
27805 <:Monomorphise:> to <:SXML:>.
27806
27807 == Description ==
27808
27809 <:XML:> is polymorphic, higher-order, with flat patterns.  Every
27810 <:XML:> expression is annotated with its type.  Polymorphic
27811 generalization is made explicit through type variables annotating
27812 `val` and `fun` declarations.  Polymorphic instantiation is made
27813 explicit by specifying type arguments at variable references.  <:XML:>
patterns cannot be nested and cannot contain wildcards, constraints,
27815 flexible records, or layering.
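
To make the idea of explicit generalization and instantiation concrete, here is a small source program followed by a schematic rendering.  The bracket notation in the comment is invented for this illustration and is not MLton's concrete syntax for <:XML:>.

[source,sml]
----
(* Source program: the type variable of id is implicit. *)
fun id x = x
val n = id 1
val s = id "one"

(* Schematically, with generalization and instantiation made explicit:
 *
 *   fun ['a] id (x : 'a) : 'a = x
 *   val n : int    = id [int] 1
 *   val s : string = id [string] "one"
 *
 * The type variable 'a is bound at the fun declaration, and each
 * reference to id names the type argument it is instantiated at.
 *)
----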
27816
27817 == Implementation ==
27818
27819 * <!ViewGitFile(mlton,master,mlton/xml/xml.sig)>
27820 * <!ViewGitFile(mlton,master,mlton/xml/xml.fun)>
27821 * <!ViewGitFile(mlton,master,mlton/xml/xml-tree.sig)>
27822 * <!ViewGitFile(mlton,master,mlton/xml/xml-tree.fun)>
27823
27824 == Type Checking ==
27825
27826 <:XML:> also has a type checker, used for debugging.  At present, the
27827 type checker is also the best specification of the type system of
27828 <:XML:>.  If you need more details, the type checker
27829 (<!ViewGitFile(mlton,master,mlton/xml/type-check.sig)>,
27830 <!ViewGitFile(mlton,master,mlton/xml/type-check.fun)>), is pretty short.
27831
27832 Since the type checker does not affect the output of the compiler
27833 (unless it reports an error), it can be turned off.  The type checker
27834 recursively descends the program, checking that the type annotating
27835 each node is the same as the type synthesized from the types of the
expression's subnodes.
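
The recursive descent described above can be sketched on a toy annotated language.  This is a minimal illustration, not the code in `type-check.fun`: the datatypes `ty` and `exp`, the exception `TypeError`, and the use of an association-list environment are all assumptions made for the example.

[source,sml]
----
datatype ty = Int | Bool | Arrow of ty * ty

(* Every expression node carries the type it is annotated with
 * (the last component of each constructor argument). *)
datatype exp =
   Const of int * ty
 | Var of string * ty
 | Lam of string * ty * exp * ty     (* parameter, its type, body *)
 | App of exp * exp * ty

exception TypeError of string

fun annotation e =
   case e of
      Const (_, t) => t
    | Var (_, t) => t
    | Lam (_, _, _, t) => t
    | App (_, _, t) => t

(* check env e returns () or raises TypeError. *)
fun check (env : (string * ty) list) (e : exp) : unit =
   let
      (* Synthesize a type from the (already checked) subnodes. *)
      val synthesized =
         case e of
            Const _ => Int
          | Var (x, _) =>
               (case List.find (fn (y, _) => x = y) env of
                   SOME (_, t) => t
                 | NONE => raise TypeError ("unbound variable " ^ x))
          | Lam (x, tx, body, _) =>
               (check ((x, tx) :: env) body
                ; Arrow (tx, annotation body))
          | App (f, a, _) =>
               (check env f
                ; check env a
                ; case annotation f of
                     Arrow (dom, rng) =>
                        if dom = annotation a
                           then rng
                        else raise TypeError "argument type mismatch"
                   | _ => raise TypeError "applying a non-function")
   in
      if synthesized = annotation e
         then ()
      else raise TypeError "annotation differs from synthesized type"
   end

val idInt = Lam ("x", Int, Var ("x", Int), Arrow (Int, Int))
val good = App (idInt, Const (1, Int), Int)
val bad = App (idInt, Const (1, Int), Bool)   (* wrong annotation *)

val () = check [] good
val rejected = (check [] bad; false) handle TypeError _ => true
----

Because the checker only returns `()` or raises, it has no influence on the program being compiled, which is why such a check can be skipped without changing the output.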
27837
27838 == Details and Notes ==
27839
27840 <:XML:> uses the same atoms as <:CoreML:>, hence all identifiers
27841 (constructors, variables, etc.) are unique and can have properties
27842 attached to them.  Finally, <:XML:> has a simplifier (<:XMLShrink:>),
27843 which implements a reduction system.
27844
27845 === Types ===
27846
27847 <:XML:> types are either type variables or applications of n-ary type
27848 constructors.  There are many utility functions for constructing and
27849 destructing types involving built-in type constructors.
27850
A type scheme binds a list of type variables in a type.  The only
27852 interesting operation on type schemes is the application of a type
27853 scheme to a list of types, which performs a simultaneous substitution
27854 of the type arguments for the bound type variables of the scheme.  For
27855 the purposes of type checking, it is necessary to know the type scheme
27856 of variables, constructors, and primitives.  This is done by
27857 associating the scheme with the identifier using its property list.
27858 This approach is used instead of the more traditional environment
27859 approach for reasons of speed.
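
The following sketch shows what applying a type scheme to a list of types means.  It is a simplified model: the datatype `ty`, the record type `scheme`, and the function `apply` are hypothetical names for this example, type variables are represented as strings, and the property-list mechanism mentioned above is not modeled.

[source,sml]
----
datatype ty =
   Var of string                 (* type variable, e.g. "'a" *)
 | Con of string * ty list       (* n-ary type constructor application *)

type scheme = {tyvars : string list, body : ty}

(* apply (s, args) substitutes args for the bound variables of s,
 * all at once.  ListPair.zipEq raises UnequalLengths if the number of
 * arguments does not match the number of bound variables. *)
fun apply ({tyvars, body} : scheme, args : ty list) : ty =
   let
      val subst = ListPair.zipEq (tyvars, args)
      fun go t =
         case t of
            Var a =>
               (case List.find (fn (b, _) => a = b) subst of
                   SOME (_, t') => t'
                 | NONE => t)
          | Con (c, ts) => Con (c, List.map go ts)
   in
      go body
   end

(* The scheme of a pairing function: forall 'a 'b. 'a -> 'b -> 'a * 'b *)
val pairScheme : scheme =
   {tyvars = ["'a", "'b"],
    body = Con ("->", [Var "'a",
                       Con ("->", [Var "'b",
                                   Con ("*", [Var "'a", Var "'b"])])])}

val intTy = Con ("int", [])

(* 'a := 'b and 'b := int, simultaneously: 'b -> int -> 'b * int *)
val instance = apply (pairScheme, [Var "'b", intTy])
----

The example instantiates `'a` with `'b` on purpose: because the substitution is simultaneous, the `'b` that replaces `'a` is not itself replaced by `int`.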
27860
27861 === XmlTree ===
27862
27863 Before defining `XML`, the signature for language <:XML:>, we need to
27864 define an auxiliary signature `XML_TREE`, that contains the datatype
27865 declarations for the expression trees of <:XML:>.  This is done solely
27866 for the purpose of modularity -- it allows the simplifier and type
27867 checker to be defined by separate functors (which take a structure
matching `XML_TREE`).  Then, `XML` is defined as the signature for a
27869 module containing the expression trees, the simplifier, and the type
27870 checker.
27871
27872 Both constructors and variables can have type schemes, hence both
27873 constructor and variable references specify the instance of the scheme
at the point of reference.  An instance is specified with a vector of
27875 types, which corresponds to the type variables in the scheme.
27876
27877 <:XML:> patterns are flat (i.e. not nested).  A pattern is a
27878 constructor with an optional argument variable.  Patterns only occur
27879 in `case` expressions.  To evaluate a case expression, compare the
27880 test value sequentially against each pattern.  For the first pattern
27881 that matches, destruct the value if necessary to bind the pattern
27882 variables and evaluate the corresponding expression.  If no pattern
27883 matches, evaluate the default.  All patterns of a case statement are
27884 of the same variant of `Pat.t`, although this is not enforced by ML's
27885 type system.  The type checker, however, does enforce this.  Because
27886 tuple patterns are irrefutable, there will only ever be one tuple
27887 pattern in a case expression and there will be no default.
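
The evaluation rule for constructor patterns can be sketched with a toy interpreter.  Everything here is an assumption made for illustration (the `value` datatype, the representation of a rule as a pattern paired with a function from bindings to a result, and the helper `lookup`); tuple patterns are left out.

[source,sml]
----
datatype value =
   Int of int
 | ConApp of string * value option     (* constructor, optional argument *)

(* A flat pattern: a constructor with an optional argument variable. *)
type pat = {con : string, arg : string option}

type env = (string * value) list

(* Compare the test value against each pattern in order; evaluate the
 * first rule that matches, or the default if none does. *)
fun evalCase (test : value,
              rules : (pat * (env -> value)) list,
              default : (unit -> value) option) : value =
   case (test, rules) of
      (ConApp (c, v), ({con, arg}, body) :: rest) =>
         if c = con
            then
               (case (arg, v) of
                   (SOME x, SOME v') => body [(x, v')]   (* bind the variable *)
                 | _ => body [])
         else evalCase (test, rest, default)
    | _ =>
         (case default of
             SOME d => d ()
           | NONE => raise Match)

fun lookup (e : env, x) =
   case List.find (fn (y, _) => x = y) e of
      SOME (_, v) => v
    | NONE => raise Fail ("unbound " ^ x)

(* Models: case SOME 3 of NONE => 0 | SOME n => n *)
val result =
   evalCase (ConApp ("SOME", SOME (Int 3)),
             [({con = "NONE", arg = NONE}, fn _ => Int 0),
              ({con = "SOME", arg = SOME "n"}, fn e => lookup (e, "n"))],
             NONE)
----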
27888
27889 <:XML:> contains value, exception, and mutually recursive function
27890 declarations.  There are no free type variables in <:XML:>.  All type
27891 variables are explicitly bound at either a value or function
27892 declaration.  At some point in the future, exception declarations may
27893 go away, and exceptions may be represented with a single datatype
27894 containing a `unit ref` component to implement genericity.
27895
27896 <:XML:> expressions are like those of <:CoreML:>, with the following
exceptions.  There are no record expressions.  After type inference,
27898 all records (some of which may have originally been tuples in the
27899 source) are converted to tuples, because once flexible record patterns
27900 have been resolved, tuple labels are superfluous.  Tuple components
27901 are ordered based on the field ordering relation.  <:XML:> eta expands
primitives and constructors so that they are always fully applied.
27903 Hence, the only kind of value of arrow type is a lambda.  This
27904 property is useful for flow analysis and later in code generation.
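
A source-level picture of eta expansion, using constructors only (the text says primitives are treated the same way).  The primed bindings show by hand what the expanded form would look like; they are an illustration, not compiler output.

[source,sml]
----
(* Source: the constructors SOME and :: are passed as first-class
 * values without being applied. *)
val xs = List.map SOME [1, 2, 3]
val ys = List.foldr op :: [] [1, 2, 3]

(* Eta-expanded: each constructor is wrapped in a lambda, so every
 * occurrence of SOME and :: is fully applied and the only values of
 * arrow type are lambdas. *)
val xs' = List.map (fn x => SOME x) [1, 2, 3]
val ys' = List.foldr (fn (h, t) => h :: t) [] [1, 2, 3]
----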
27905
27906 An <:XML:> program is a list of toplevel datatype declarations and a
27907 body expression.  Because datatype declarations are not generative,
27908 the defunctorizer can safely move them to toplevel.
27909
27910 <<<
27911
27912 :mlton-guide-page: XMLShrink
27913 [[XMLShrink]]
27914 XMLShrink
27915 =========
27916
27917 XMLShrink is an optimization pass for the <:XML:>
27918 <:IntermediateLanguage:>, invoked from <:XMLSimplify:>.
27919
27920 == Description ==
27921
27922 This pass performs optimizations based on a reduction system.
27923
27924 == Implementation ==
27925
27926 * <!ViewGitFile(mlton,master,mlton/xml/shrink.sig)>
27927 * <!ViewGitFile(mlton,master,mlton/xml/shrink.fun)>
27928
27929 == Details and Notes ==
27930
27931 The simplifier is based on <!Cite(AppelJim97, Shrinking Lambda
27932 Expressions in Linear Time)>.
27933
27934 The source program may contain functions that are only called once, or
27935 not even called at all.  Match compilation introduces many such
27936 functions.  In order to reduce the program size, speed up later
phases, and improve the flow analysis, a source-to-source simplifier
27938 is run on <:XML:> after type inference and match compilation.
27939
27940 The simplifier implements the reductions shown below.  The reductions
27941 eliminate unnecessary declarations (see the side constraint in the
27942 figure), applications where the function is immediate, and case
27943 statements where the test is immediate.  Declarations can be
27944 eliminated only when the expression is nonexpansive (see Section 4.7
27945 of the <:DefinitionOfStandardML: Definition>), which is a syntactic
27946 condition that ensures that the expression has no effects
27947 (assignments, raises, or nontermination).  The reductions on case
27948 statements do not show the other irrelevant cases that may exist.  The
27949 reductions were chosen so that they were strongly normalizing and so
27950 that they never increased tree size.
27951
27952 * {empty}
27953 +
27954 --
27955 [source,sml]
27956 ----
27957 let x = e1 in e2
27958 ----
27959
27960 reduces to
27961
27962 [source,sml]
27963 ----
27964 e2 [x -> e1]
27965 ----
27966
27967 if `e1` is a constant or variable or if `e1` is nonexpansive and `x` occurs zero or one time in `e2`
27968 --
27969
27970 * {empty}
27971 +
27972 --
27973 [source,sml]
27974 ----
27975 (fn x => e1) e2
27976 ----
27977
27978 reduces to
27979
27980 [source,sml]
27981 ----
27982 let x = e2 in e1
27983 ----
27984 --
27985
27986 * {empty}
27987 +
27988 --
27989 [source,sml]
27990 ----
27991 e1 handle e2
27992 ----
27993
27994 reduces to
27995
27996 [source,sml]
27997 ----
27998 e1
27999 ----
28000
28001 if `e1` is nonexpansive
28002 --
28003
28004 * {empty}
28005 +
28006 --
28007 [source,sml]
28008 ----
28009 case let d in e end of p1 => e1 ...
28010 ----
28011
28012 reduces to
28013
28014 [source,sml]
28015 ----
28016 let d in case e of p1 => e1 ... end
28017 ----
28018 --
28019
28020 * {empty}
28021 +
28022 --
28023 [source,sml]
28024 ----
28025 case C e1 of C x => e2
28026 ----
28027
28028 reduces to
28029
28030 [source,sml]
28031 ----
28032 let x = e1 in e2
28033 ----
28034 --
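
The first two reductions can be sketched on a toy lambda language.  This is not the implementation in `shrink.fun` and not the linear-time algorithm of the cited paper: it is a naive recursive rewriter, the datatype `exp` and all function names are invented for the example, the `handle` and `case` rules are omitted, and substitution is only safe under the assumption (stated for <:XML:> above) that all bound variables are unique.

[source,sml]
----
datatype exp =
   Const of int
 | Var of string
 | Lam of string * exp
 | App of exp * exp
 | Let of string * exp * exp

(* Number of occurrences of x in e (binders are assumed unique). *)
fun count (x, e) =
   case e of
      Const _ => 0
    | Var y => if x = y then 1 else 0
    | Lam (_, b) => count (x, b)
    | App (f, a) => count (x, f) + count (x, a)
    | Let (_, e1, e2) => count (x, e1) + count (x, e2)

(* e [x -> v]; no capture check is needed when binders are unique. *)
fun subst (x, v, e) =
   case e of
      Const _ => e
    | Var y => if x = y then v else e
    | Lam (y, b) => Lam (y, subst (x, v, b))
    | App (f, a) => App (subst (x, v, f), subst (x, v, a))
    | Let (y, e1, e2) => Let (y, subst (x, v, e1), subst (x, v, e2))

(* A syntactic approximation: constants, variables, and lambdas. *)
fun nonexpansive e =
   case e of
      Const _ => true
    | Var _ => true
    | Lam _ => true
    | _ => false

fun shrink e =
   case e of
      App (f, a) =>
         (case (shrink f, shrink a) of
             (* (fn x => e1) e2  reduces to  let x = e2 in e1 *)
             (Lam (x, b), a') => shrink (Let (x, a', b))
           | (f', a') => App (f', a'))
    | Let (x, e1, e2) =>
         let
            val e1' = shrink e1
            val e2' = shrink e2
            val atomic =
               case e1' of Const _ => true | Var _ => true | _ => false
         in
            (* let x = e1 in e2  reduces to  e2 [x -> e1] *)
            if atomic orelse (nonexpansive e1' andalso count (x, e2') <= 1)
               then shrink (subst (x, e1', e2'))
            else Let (x, e1', e2')
         end
    | Lam (x, b) => Lam (x, shrink b)
    | _ => e

(* (fn x => let y = x in y) 5 *)
val example = App (Lam ("x", Let ("y", Var "x", Var "y")), Const 5)
val reduced = shrink example
----

On `example`, the inner `let` is eliminated first because its bound expression is a variable, the application then becomes a `let` binding `x` to the constant `5`, and that `let` is eliminated in turn, leaving `Const 5`.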
28035
28036 <<<
28037
28038 :mlton-guide-page: XMLSimplify
28039 [[XMLSimplify]]
28040 XMLSimplify
28041 ===========
28042
28043 The optimization passes for the <:XML:> <:IntermediateLanguage:> are
28044 collected and controlled by the `XmlSimplify` functor
28045 (<!ViewGitFile(mlton,master,mlton/xml/xml-simplify.sig)>,
28046 <!ViewGitFile(mlton,master,mlton/xml/xml-simplify.fun)>).
28047
28048 The following optimization passes are implemented:
28049
28050 * <:XMLSimplifyTypes:>
28051 * <:XMLShrink:>
28052
28053 The optimization passes can be controlled from the command-line by the options
28054
28055 * `-diag-pass <pass>` -- keep diagnostic info for pass
28056 * `-disable-pass <pass>` -- skip optimization pass (if normally performed)
28057 * `-enable-pass <pass>` -- perform optimization pass (if normally skipped)
28058 * `-keep-pass <pass>` -- keep the results of pass
28059 * `-xml-passes <passes>` -- xml optimization passes
28060
28061 <<<
28062
28063 :mlton-guide-page: XMLSimplifyTypes
28064 [[XMLSimplifyTypes]]
28065 XMLSimplifyTypes
28066 ================
28067
28068 <:XMLSimplifyTypes:> is an optimization pass for the <:XML:>
28069 <:IntermediateLanguage:>, invoked from <:XMLSimplify:>.
28070
28071 == Description ==
28072
28073 This pass simplifies types in an <:XML:> program, eliminating all
28074 unused type arguments.
28075
28076 == Implementation ==
28077
28078 * <!ViewGitFile(mlton,master,mlton/xml/simplify-types.sig)>
28079 * <!ViewGitFile(mlton,master,mlton/xml/simplify-types.fun)>
28080
28081 == Details and Notes ==
28082
28083 It first computes a simple fixpoint on all the `datatype` declarations
28084 to determine which `datatype` `tycon` args are actually used.  Then it
28085 does a single pass over the program to determine which polymorphic
28086 declaration type variables are used, and rewrites types to eliminate
28087 unused type arguments.
28088
28089 This pass should eliminate any spurious duplication that the
28090 <:Monomorphise:> pass might perform due to phantom types.
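
A small program with a phantom type shows the kind of duplication in question.  The names `token`, `Token`, and `number` are invented for this sketch, and the comments describe the expected effect informally rather than quoting compiler output.

[source,sml]
----
(* 'a is a phantom type argument: no constructor argument mentions it. *)
datatype 'a token = Token of int

val a : string token = Token 0
val b : bool token = Token 1

fun number (Token n : 'a token) : int = n

val sum = number a + number b

(* The program uses token at two instances, string token and bool token.
 * If the unused argument were kept, monomorphisation could produce two
 * copies of the datatype and two copies of number that differ only in
 * that argument.  Once the argument is eliminated, there is a single
 * token type and a single number function to monomorphise. *)
----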
28091
28092 <<<
28093
28094 :mlton-guide-page: Zone
28095 [[Zone]]
28096 Zone
28097 ====
28098
28099 <:Zone:> is an optimization pass for the <:SSA2:>
28100 <:IntermediateLanguage:>, invoked from <:SSA2Simplify:>.
28101
28102 == Description ==
28103
28104 This pass breaks large <:SSA2:> functions into zones, which are
28105 connected subgraphs of the dominator tree.  For each zone, at the node
28106 that dominates the zone (the "zone root"), it places a tuple
28107 collecting all of the live variables at that node.  It replaces any
28108 variables used in that zone with offsets from the tuple.  The goal is
28109 to decrease the liveness information in large <:SSA:> functions.
28110
28111 == Implementation ==
28112
28113 * <!ViewGitFile(mlton,master,mlton/ssa/zone.fun)>
28114
28115 == Details and Notes ==
28116
The pass computes strongly-connected components to avoid putting tuple
constructions in loops.
28119
There are two (expert) flags that govern the use of this pass:
28121
28122 * `-max-function-size <n>`
28123 * `-zone-cut-depth <n>`
28124
Zone splitting is applied only when the number of basic blocks in a
function is greater than the `n` given to `-max-function-size`.  The
`n` used to cut the dominator tree is set by `-zone-cut-depth`.
28128
28129 There is currently no attempt to be safe-for-space.  That is, the
28130 tuples are not restricted to containing only "small" values.
28131
28132 In the `HOL` program, the particular problem is the main function,
28133 which has 161,783 blocks and 257,519 variables -- the product of those
two numbers being about 41 billion.  Now, we are unlikely to need
that much space since we use a sparse representation.  But even
28136 1/100th would really hurt.  And of course this rules out bit vectors.
28137
28138 <<<
28139
28140 :mlton-guide-page: ZZZOrphanedPages
28141 [[ZZZOrphanedPages]]
28142 ZZZOrphanedPages
28143 ================
28144
28145 The contents of these pages have been moved to other pages.
28146
28147 These templates are used by other pages.
28148
28149  * <:CompilerPassTemplate:>
28150  * <:TalkTemplate:>
28151
28152 <<<