| 1 | \chap{MLton}{mlton} |
| 2 | |
| 3 | This chapter describes the compiler proper, which is found in the {\tt mlton} |
| 4 | directory. |
| 5 | |
| 6 | \section{Sources} |
| 7 | |
| 8 | \begin{description} |
| 9 | |
| 10 | \place{ast} |
| 11 | Abstract syntax trees produced by the front end. |
| 12 | |
| 13 | \place{atoms} |
| 14 | Common atomic pieces of syntax trees used throughout the compiler, like |
| 15 | constants, primitives, variables, and types. |
| 16 | |
| 17 | \place{backend} |
| 18 | The backend translates from the {\tt Cps} IL to a machine independent IL called |
| 19 | {\tt Machine}. It decides data representations, stack frame layouts, and creates |
| 20 | runtime system information like limit checks and bitmasks. |
| 21 | |
| 22 | \place{call-main.sml} |
| 23 | A one-line file that is the last line of the compiler sources. It calls the |
| 24 | main function. |
| 25 | |
| 26 | \place{closure-convert} |
| 27 | The closure converter, which converts from {\tt Sxml}, the higher-order |
| 28 | simply-typed IL, to {\tt Cps}, the first-order simply-typed IL. |
| 29 | |
| 30 | \place{cm} |
| 31 | Support for {\smlnj}-style compilation manager (CM) files. |
| 32 | |
| 33 | \place{codegen} |
| 34 | Both the C and the native X86 code generator. |
| 35 | |
| 36 | \place{control} |
| 37 | Compiler switches used throughout the rest of the compiler. |
| 38 | |
| 39 | \place{core-ml} |
| 40 | The implicitly typed IL that results from defunctorization. Contains a |
| 41 | pass of dead code elimination for eliminating basis library code. Also contains |
| 42 | the pass that replaces constants defined by {\tt \_prim} with their values. |
| 43 | |
| 44 | \place{elaborate} |
| 45 | The elaborator, which matches variable uses with bindings in the AST IL and |
| 46 | defunctorizes to produce a {\tt CoreML} program. It does not do type checking |
| 47 | yet, but will someday. |
| 48 | |
| 49 | \place{front-end} |
| 50 | The lexer and parser, which turn files into ASTs. |
| 51 | |
| 52 | \place{main} |
| 53 | The two main structures in the compiler, one ({\tt Main}) for handling all the |
| 54 | command line switches and one ({\tt Compile}) which is a high-level view of the |
| 55 | the compiler passes, from front end to code generation. |
| 56 | |
| 57 | \place{Makefile} |
| 58 | To make the compiler. |
| 59 | |
| 60 | \place{mlton.cm} |
| 61 | An automatically generated file ({\tt make mlton.cm}) that lists all of the |
| 62 | files (in order) that make up the compiler. |
| 63 | |
| 64 | \place{mlton.sml} |
| 65 | An automatically generated file ({\tt make mlton.sml}) that contains all of the |
| 66 | compiler sources concatenated together. |
| 67 | |
| 68 | \place{rcps} |
| 69 | An experimental IL, similar to CPS, but with more expressive types for |
| 70 | describing representations (hence the ``r''). Not yet in use. |
| 71 | |
| 72 | \place{sources.cm} |
| 73 | For compiling with {\smlnj}. |
| 74 | |
| 75 | \place{ssa} |
| 76 | Static-Single-Assignment form, the first-order simply-typed IL on which most |
| 77 | optimization is performed. There are roughly 20 different optimization passes |
| 78 | (some of which run several times). |
| 79 | |
| 80 | \place{type-inference} |
| 81 | The type inference pass, which translates from {\tt CoreML} to {\tt Xml}. |
| 82 | |
| 83 | \place{xml} |
| 84 | The {\tt Xml} and {\tt Sxml} intermediate languages. Also, the passes that |
| 85 | monomorphise, do polvariance, and implement exceptions. |
| 86 | |
| 87 | \end{description} |
| 88 | |
| 89 | \section{Compiler Overview} |
| 90 | |
| 91 | \figref{structure} shows the overall structure of the compiler. Intermediate |
| 92 | languages (ILs) are shown in ovals. The names of compiler passes adorn arrows |
| 93 | between ILs. In this section I give a brief description of each pass and a |
| 94 | pointer to a later section that covers the pass in detail. Each IL also has a |
| 95 | separate section devoted to it. |
| 96 | \figBegin |
| 97 | \centerline{\epsfig{file=structure.ps,width=5.0in}} |
| 98 | \figEnd{Compiler structure} |
| 99 | {structure} |
| 100 | |
| 101 | The front end (\chapref{front-end}) takes SML source code (a complete |
| 102 | program) and performs lexing and parsing, producing an abstract syntax |
| 103 | tree (\chapref{ast}). The lexer is produced by |
| 104 | ml-lex\cite{AppelEtAl94} and the parser is produced by |
| 105 | ml-yacc\cite{TarditiAppel94}. The specifications for |
| 106 | the lexer and parser were originally taken from \smlnj 109.32. The |
| 107 | lexer is unchanged. I have substantially modified the actions in the |
| 108 | grammar to produce my own version of abstract syntax trees (similar |
| 109 | to, but different from {\smlnj}). |
| 110 | |
| 111 | Defunctorization (\chapref{defunctorization}), translates abstract |
| 112 | syntax trees to a small implicitly typed core language, called Core ML |
| 113 | (\chapref{core-ml}). Its primary task is to eliminate all uses of the |
| 114 | module system (signatures, structures, functors). It does this by |
| 115 | applying all functors and flattening all structures, moving |
| 116 | declarations to the top level. This phase also performs precedence |
| 117 | parsing of infix expressions and patterns (the code to do this was |
| 118 | taken from \smlnj). Finally, it does some amount of "macro |
| 119 | expansion", so that the core language is smaller. |
| 120 | |
| 121 | Type inference (\chapref{type-inference}) translates implicitly typed |
| 122 | Core ML to an explicitly typed core language, XML (\chapref{xml}), |
| 123 | with explicit type abstraction and application. XML is based on the |
| 124 | language ``Core-XML'' described in \cite{HarperMitchell93}. Type |
| 125 | inference consists of two passes. The first pass determines the |
| 126 | binding sites of type variables that are not explicitly bound (section |
| 127 | 4.6 of the Definition). The second pass is a |
| 128 | pretty standard unification based Hindley-Milner type |
| 129 | inference\cite{DamasMilner82}. The type inference pass also performs |
| 130 | overloading resolution and resolution of flexible record patterns. |
| 131 | This pass also performs match compilation, by which I mean the |
| 132 | translation of case statements with nested patterns to (nested) case |
| 133 | statements with flat patterns. |
| 134 | |
| 135 | Monomorphisation (\chapref{monomorphisation}) translates XML to its |
| 136 | simply-typed subset, called SXML (\chapref{sxml}), by duplicating all |
| 137 | polymorphic functions and datatypes for each type at which they are |
| 138 | instantiated. Monomorphisation is only possible because SML has |
| 139 | ``let-style'' polymorphism, in which all uses of a polymorphic value |
| 140 | are syntactically apparent (after functors are eliminated). |