Commit | Line | Data |
---|---|---|
7f918cf1 CE |
1 | \chap{MLton}{mlton} |
2 | ||
3 | This chapter describes the compiler proper, which is found in the {\tt mlton} | |
4 | directory. | |
5 | ||
6 | \section{Sources} | |
7 | ||
8 | \begin{description} | |
9 | ||
10 | \place{ast} | |
11 | Abstract syntax trees produced by the front end. | |
12 | ||
13 | \place{atoms} | |
14 | Common atomic pieces of syntax trees used throughout the compiler, like | |
15 | constants, primitives, variables, and types. | |
16 | ||
17 | \place{backend} | |
18 | The backend translates from the {\tt Cps} IL to a machine independent IL called | |
19 | {\tt Machine}. It decides data representations, stack frame layouts, and creates | |
20 | runtime system information like limit checks and bitmasks. | |
21 | ||
22 | \place{call-main.sml} | |
23 | A one-line file that is the last line of the compiler sources. It calls the | |
24 | main function. | |
25 | ||
26 | \place{closure-convert} | |
27 | The closure converter, which converts from {\tt Sxml}, the higher-order | |
28 | simply-typed IL, to {\tt Cps}, the first-order simply-typed IL. | |
29 | ||
30 | \place{cm} | |
31 | Support for {\smlnj}-style compilation manager (CM) files. | |
32 | ||
33 | \place{codegen} | |
34 | Both the C and the native X86 code generator. | |
35 | ||
36 | \place{control} | |
37 | Compiler switches used throughout the rest of the compiler. | |
38 | ||
39 | \place{core-ml} | |
40 | The implicitly typed IL that results from defunctorization. Contains a | |
41 | pass of dead code elimination for eliminating basis library code. Also contains | |
42 | the pass that replaces constants defined by {\tt \_prim} with their values. | |
43 | ||
44 | \place{elaborate} | |
45 | The elaborator, which matches variable uses with bindings in the AST IL and | |
46 | defunctorizes to produce a {\tt CoreML} program. It does not do type checking | |
47 | yet, but will someday. | |
48 | ||
49 | \place{front-end} | |
50 | The lexer and parser, which turn files into ASTs. | |
51 | ||
52 | \place{main} | |
53 | The two main structures in the compiler, one ({\tt Main}) for handling all the | |
54 | command line switches and one ({\tt Compile}) which is a high-level view of the | |
55 | the compiler passes, from front end to code generation. | |
56 | ||
57 | \place{Makefile} | |
58 | To make the compiler. | |
59 | ||
60 | \place{mlton.cm} | |
61 | An automatically generated file ({\tt make mlton.cm}) that lists all of the | |
62 | files (in order) that make up the compiler. | |
63 | ||
64 | \place{mlton.sml} | |
65 | An automatically generated file ({\tt make mlton.sml}) that contains all of the | |
66 | compiler sources concatenated together. | |
67 | ||
68 | \place{rcps} | |
69 | An experimental IL, similar to CPS, but with more expressive types for | |
70 | describing representations (hence the ``r''). Not yet in use. | |
71 | ||
72 | \place{sources.cm} | |
73 | For compiling with {\smlnj}. | |
74 | ||
75 | \place{ssa} | |
76 | Static-Single-Assignment form, the first-order simply-typed IL on which most | |
77 | optimization is performed. There are roughly 20 different optimization passes | |
78 | (some of which run several times). | |
79 | ||
80 | \place{type-inference} | |
81 | The type inference pass, which translates from {\tt CoreML} to {\tt Xml}. | |
82 | ||
83 | \place{xml} | |
84 | The {\tt Xml} and {\tt Sxml} intermediate languages. Also, the passes that | |
85 | monomorphise, do polvariance, and implement exceptions. | |
86 | ||
87 | \end{description} | |
88 | ||
89 | \section{Compiler Overview} | |
90 | ||
91 | \figref{structure} shows the overall structure of the compiler. Intermediate | |
92 | languages (ILs) are shown in ovals. The names of compiler passes adorn arrows | |
93 | between ILs. In this section I give a brief description of each pass and a | |
94 | pointer to a later section that covers the pass in detail. Each IL also has a | |
95 | separate section devoted to it. | |
96 | \figBegin | |
97 | \centerline{\epsfig{file=structure.ps,width=5.0in}} | |
98 | \figEnd{Compiler structure} | |
99 | {structure} | |
100 | ||
101 | The front end (\chapref{front-end}) takes SML source code (a complete | |
102 | program) and performs lexing and parsing, producing an abstract syntax | |
103 | tree (\chapref{ast}). The lexer is produced by | |
104 | ml-lex\cite{AppelEtAl94} and the parser is produced by | |
105 | ml-yacc\cite{TarditiAppel94}. The specifications for | |
106 | the lexer and parser were originally taken from \smlnj 109.32. The | |
107 | lexer is unchanged. I have substantially modified the actions in the | |
108 | grammar to produce my own version of abstract syntax trees (similar | |
109 | to, but different from {\smlnj}). | |
110 | ||
111 | Defunctorization (\chapref{defunctorization}), translates abstract | |
112 | syntax trees to a small implicitly typed core language, called Core ML | |
113 | (\chapref{core-ml}). Its primary task is to eliminate all uses of the | |
114 | module system (signatures, structures, functors). It does this by | |
115 | applying all functors and flattening all structures, moving | |
116 | declarations to the top level. This phase also performs precedence | |
117 | parsing of infix expressions and patterns (the code to do this was | |
118 | taken from \smlnj). Finally, it does some amount of "macro | |
119 | expansion", so that the core language is smaller. | |
120 | ||
121 | Type inference (\chapref{type-inference}) translates implicitly typed | |
122 | Core ML to an explicitly typed core language, XML (\chapref{xml}), | |
123 | with explicit type abstraction and application. XML is based on the | |
124 | language ``Core-XML'' described in \cite{HarperMitchell93}. Type | |
125 | inference consists of two passes. The first pass determines the | |
126 | binding sites of type variables that are not explicitly bound (section | |
127 | 4.6 of the Definition). The second pass is a | |
128 | pretty standard unification based Hindley-Milner type | |
129 | inference\cite{DamasMilner82}. The type inference pass also performs | |
130 | overloading resolution and resolution of flexible record patterns. | |
131 | This pass also performs match compilation, by which I mean the | |
132 | translation of case statements with nested patterns to (nested) case | |
133 | statements with flat patterns. | |
134 | ||
135 | Monomorphisation (\chapref{monomorphisation}) translates XML to its | |
136 | simply-typed subset, called SXML (\chapref{sxml}), by duplicating all | |
137 | polymorphic functions and datatypes for each type at which they are | |
138 | instantiated. Monomorphisation is only possible because SML has | |
139 | ``let-style'' polymorphism, in which all uses of a polymorphic value | |
140 | are syntactically apparent (after functors are eliminated). |