doc/hacker-guide/mlton.tex

   1 \chap{MLton}{mlton}
   2
   3 This chapter describes the compiler proper, which is found in the {\tt mlton}
   4 directory.
   5
   6 \section{Sources}
   7
   8 \begin{description}
   9
  10 \place{ast}
  11 Abstract syntax trees produced by the front end.
  12
  13 \place{atoms}
  14 Common atomic pieces of syntax trees used throughout the compiler, like
  15 constants, primitives, variables, and types.
  16
  17 \place{backend}
  18 The backend translates from the {\tt Cps} IL to a machine independent IL called
  19 {\tt Machine}.  It decides data representations, stack frame layouts, and creates
  20 runtime system information like limit checks and bitmasks.
  21
  22 \place{call-main.sml}
  23 A one-line file that is the last line of the compiler sources.  It calls the
  24 main function.
  25
  26 \place{closure-convert}
  27 The closure converter, which converts from {\tt Sxml}, the higher-order
  28 simply-typed IL, to {\tt Cps}, the first-order simply-typed IL.
  29
  30 \place{cm}
  31 Support for {\smlnj}-style compilation manager (CM) files.
  32
  33 \place{codegen}
  34 Both the C and the native X86 code generator.
  35
  36 \place{control}
  37 Compiler switches used throughout the rest of the compiler.
  38
  39 \place{core-ml}
  40 The implicitly typed IL that results from defunctorization.  Contains a
  41 pass of dead code elimination for eliminating basis library code.  Also contains
  42 the pass that replaces constants defined by {\tt \_prim} with their values.
  43
  44 \place{elaborate}
  45 The elaborator, which matches variable uses with bindings in the AST IL and
  46 defunctorizes to produce a {\tt CoreML} program.  It does not do type checking
  47 yet, but will someday.
  48
  49 \place{front-end}
  50 The lexer and parser, which turn files into ASTs.
  51
  52 \place{main}
  53 The two main structures in the compiler, one ({\tt Main}) for handling all the
  54 command line switches and one ({\tt Compile}) which is a high-level view of the
  55 the compiler passes, from front end to code generation.
  56
  57 \place{Makefile}
  58 To make the compiler.
  59
  60 \place{mlton.cm}
  61 An automatically generated file ({\tt make mlton.cm}) that lists all of the
  62 files (in order) that make up the compiler.
  63
  64 \place{mlton.sml}
  65 An automatically generated file ({\tt make mlton.sml}) that contains all of the
  66 compiler sources concatenated together.
  67
  68 \place{rcps}
  69 An experimental IL, similar to CPS, but with more expressive types for
  70 describing representations (hence the ``r'').  Not yet in use.
  71
  72 \place{sources.cm}
  73 For compiling with {\smlnj}.
  74
  75 \place{ssa}
  76 Static-Single-Assignment form, the first-order simply-typed IL on which most
  77 optimization is performed.  There are roughly 20 different optimization passes
  78 (some of which run several times).
  79
  80 \place{type-inference}
  81 The type inference pass, which translates from {\tt CoreML} to {\tt Xml}.
  82
  83 \place{xml}
  84 The {\tt Xml} and {\tt Sxml} intermediate languages.  Also, the passes that
  85 monomorphise, do polvariance, and implement exceptions.
  86
  87 \end{description}
  88
  89 \section{Compiler Overview}
  90
  91 \figref{structure} shows the overall structure of the compiler.  Intermediate
  92 languages (ILs) are shown in ovals.  The names of compiler passes adorn arrows
  93 between ILs.  In this section I give a brief description of each pass and a
  94 pointer to a later section that covers the pass in detail.  Each IL also has a
  95 separate section devoted to it.
  96 \figBegin
  97 \centerline{\epsfig{file=structure.ps,width=5.0in}}
  98 \figEnd{Compiler structure}
  99        {structure}
 100
 101 The front end (\chapref{front-end}) takes SML source code (a complete
 102 program) and performs lexing and parsing, producing an abstract syntax
 103 tree (\chapref{ast}).  The lexer is produced by
 104 ml-lex\cite{AppelEtAl94} and the parser is produced by
 105 ml-yacc\cite{TarditiAppel94}.  The specifications for
 106 the lexer and parser were originally taken from \smlnj 109.32.  The
 107 lexer is unchanged.  I have substantially modified the actions in the
 108 grammar to produce my own version of abstract syntax trees (similar
 109 to, but different from {\smlnj}).
 110
 111 Defunctorization (\chapref{defunctorization}), translates abstract
 112 syntax trees to a small implicitly typed core language, called Core ML
 113 (\chapref{core-ml}).  Its primary task is to eliminate all uses of the
 114 module system (signatures, structures, functors).  It does this by
 115 applying all functors and flattening all structures, moving
 116 declarations to the top level.  This phase also performs precedence
 117 parsing of infix expressions and patterns (the code to do this was
 118 taken from \smlnj).  Finally, it does some amount of "macro
 119 expansion", so that the core language is smaller.
 120
 121 Type inference (\chapref{type-inference}) translates implicitly typed
 122 Core ML to an explicitly typed core language, XML (\chapref{xml}),
 123 with explicit type abstraction and application.  XML is based on the
 124 language ``Core-XML'' described in \cite{HarperMitchell93}.  Type
 125 inference consists of two passes.  The first pass determines the
 126 binding sites of type variables that are not explicitly bound (section
 127 4.6 of the Definition).  The second pass is a
 128 pretty standard unification based Hindley-Milner type
 129 inference\cite{DamasMilner82}.  The type inference pass also performs
 130 overloading resolution and resolution of flexible record patterns.
 131 This pass also performs match compilation, by which I mean the
 132 translation of case statements with nested patterns to (nested) case
 133 statements with flat patterns.
 134
 135 Monomorphisation (\chapref{monomorphisation}) translates XML to its
 136 simply-typed subset, called SXML (\chapref{sxml}), by duplicating all
 137 polymorphic functions and datatypes for each type at which they are
 138 instantiated.  Monomorphisation is only possible because SML has
 139 ``let-style'' polymorphism, in which all uses of a polymorphic value
 140 are syntactically apparent (after functors are eliminated).