\chap{MLton}{mlton}

This chapter describes the compiler proper, which is found in the {\tt mlton}
directory.

\section{Sources}

\begin{description}

\place{ast}
Abstract syntax trees produced by the front end.

\place{atoms}
Common atomic pieces of syntax trees used throughout the compiler, like
constants, primitives, variables, and types.

\place{backend}
The backend translates from the {\tt Cps} IL to a machine-independent IL called
{\tt Machine}. It decides data representations and stack frame layouts, and
creates runtime system information such as limit checks and bitmasks.

\place{call-main.sml}
A one-line file that is the last line of the compiler sources. It calls the
main function.

\place{closure-convert}
The closure converter, which converts from {\tt Sxml}, the higher-order
simply-typed IL, to {\tt Cps}, the first-order simply-typed IL.

\place{cm}
Support for {\smlnj}-style compilation manager (CM) files.

\place{codegen}
The C code generator and the native x86 code generator.

\place{control}
Compiler switches used throughout the rest of the compiler.

\place{core-ml}
The implicitly typed IL that results from defunctorization. Contains a
pass of dead code elimination for eliminating basis library code. Also contains
the pass that replaces constants defined by {\tt \_prim} with their values.

\place{elaborate}
The elaborator, which matches variable uses with bindings in the AST IL and
defunctorizes to produce a {\tt CoreML} program. It does not do type checking
yet, but will someday.

\place{front-end}
The lexer and parser, which turn files into ASTs.

\place{main}
The two main structures in the compiler, one ({\tt Main}) for handling all the
command line switches and one ({\tt Compile}) which is a high-level view of
the compiler passes, from front end to code generation.

\place{Makefile}
To make the compiler.

\place{mlton.cm}
An automatically generated file ({\tt make mlton.cm}) that lists all of the
files (in order) that make up the compiler.

\place{mlton.sml}
An automatically generated file ({\tt make mlton.sml}) that contains all of the
compiler sources concatenated together.

\place{rcps}
An experimental IL, similar to CPS, but with more expressive types for
describing representations (hence the ``r''). Not yet in use.

\place{sources.cm}
For compiling with {\smlnj}.

\place{ssa}
Static-Single-Assignment form, the first-order simply-typed IL on which most
optimization is performed. There are roughly 20 different optimization passes
(some of which run several times).

\place{type-inference}
The type inference pass, which translates from {\tt CoreML} to {\tt Xml}.

\place{xml}
The {\tt Xml} and {\tt Sxml} intermediate languages. Also, the passes that
monomorphise, do polyvariance, and implement exceptions.

\end{description}
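
As a rough illustration of the job done by the closure converter listed
above, here is a generic sketch of turning higher-order code into
first-order code. It is written as SML source rather than in {\tt Sxml} or
{\tt Cps}, the names {\tt closure}, {\tt Adder}, and {\tt apply} are made up
for the example, and it is not meant to show the representation that the
compiler actually chooses.

\begin{verbatim}
(* Before: higher-order code. The anonymous function captures n. *)
fun adder (n : int) : int -> int = fn x => x + n
val r = adder 2 5

(* After, roughly: the function value becomes a data structure holding
   its free variable, and calls go through a first-order apply function.
   closure, Adder, and apply are hypothetical names. *)
datatype closure = Adder of int          (* environment: the captured n *)
fun apply (Adder n, x : int) : int = x + n
fun adder' (n : int) : closure = Adder n
val r' = apply (adder' 2, 5)
\end{verbatim}

Both {\tt r} and {\tt r'} are computed by the same addition; only the way
the function value is represented and called has changed.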

\section{Compiler Overview}

\figref{structure} shows the overall structure of the compiler. Intermediate
languages (ILs) are shown in ovals. The names of compiler passes adorn arrows
between ILs. In this section I give a brief description of each pass and a
pointer to a later section that covers the pass in detail. Each IL also has a
separate section devoted to it.
\figBegin
\centerline{\epsfig{file=structure.ps,width=5.0in}}
\figEnd{Compiler structure}
 {structure}

The front end (\chapref{front-end}) takes SML source code (a complete
program) and performs lexing and parsing, producing an abstract syntax
tree (\chapref{ast}). The lexer is produced by
ml-lex~\cite{AppelEtAl94} and the parser is produced by
ml-yacc~\cite{TarditiAppel94}. The specifications for
the lexer and parser were originally taken from {\smlnj} 109.32. The
lexer is unchanged. I have substantially modified the actions in the
grammar to produce my own version of abstract syntax trees (similar
to, but different from, those of {\smlnj}).

Defunctorization (\chapref{defunctorization}) translates abstract
syntax trees to a small implicitly typed core language, called Core ML
(\chapref{core-ml}). Its primary task is to eliminate all uses of the
module system (signatures, structures, functors). It does this by
applying all functors and flattening all structures, moving
declarations to the top level. This phase also performs precedence
parsing of infix expressions and patterns (the code to do this was
taken from {\smlnj}). Finally, it does some amount of ``macro
expansion'', so that the core language is smaller.
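
To make the effect of defunctorization concrete, the following sketch shows
a small program before and after. It is written as SML source rather than
as actual Core ML, and the renamed identifiers {\tt le\_0} and {\tt max\_0}
are hypothetical; the names that the compiler generates will differ.

\begin{verbatim}
(* Before: the program uses a signature, a functor, and structures. *)
signature ORD = sig type t  val le : t * t -> bool end

functor Max (O : ORD) =
   struct
      fun max (x, y) = if O.le (x, y) then y else x
   end

structure IntOrd = struct type t = int  fun le (x : int, y) = x <= y end
structure IntMax = Max (IntOrd)

val m = IntMax.max (3, 4)

(* After, roughly: the functor has been applied, the structures have
   been flattened, and every declaration sits at the top level. *)
fun le_0 (x : int, y) = x <= y
fun max_0 (x, y) = if le_0 (x, y) then y else x
val m = max_0 (3, 4)
\end{verbatim}

The signature leaves no trace in the output, and the functor body appears
once for each application, here exactly once.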

Type inference (\chapref{type-inference}) translates implicitly typed
Core ML to an explicitly typed core language, XML (\chapref{xml}),
with explicit type abstraction and application. XML is based on the
language ``Core-XML'' described in \cite{HarperMitchell93}. Type
inference consists of two passes. The first pass determines the
binding sites of type variables that are not explicitly bound (section
4.6 of the Definition). The second pass is a
fairly standard unification-based Hindley-Milner type
inference~\cite{DamasMilner82}. The type inference pass also performs
overloading resolution and resolution of flexible record patterns.
It also performs match compilation, by which I mean the
translation of case expressions with nested patterns to (nested) case
expressions with flat patterns.
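
The match compilation step can be pictured with the following sketch,
written as SML source rather than as XML. The function is invented for the
example, the intermediate variables {\tt xs} and {\tt b} are hypothetical,
and the nesting that the compiler actually produces may differ.

\begin{verbatim}
(* Before: a single case with nested patterns. *)
fun f (p : int list * bool) : int =
   case p of
      (x :: _, true) => x
    | (_, _) => 0

(* After, roughly: nested cases, each of which examines only one
   level of constructor. *)
fun f' (p : int list * bool) : int =
   case p of
      (xs, b) =>
         (case xs of
             x :: _ => (case b of
                           true => x
                         | false => 0)
           | nil => 0)
\end{verbatim}

Each case in the second version has flat patterns: a tuple of variables, or
a single constructor applied to variables or wildcards.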

Monomorphisation (\chapref{monomorphisation}) translates XML to its
simply-typed subset, called SXML (\chapref{sxml}), by duplicating all
polymorphic functions and datatypes for each type at which they are
instantiated. Monomorphisation is only possible because SML has
``let-style'' polymorphism, in which all uses of a polymorphic value
are syntactically apparent (after functors are eliminated).
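
As a small illustration of this duplication, consider the sketch below,
written as SML source rather than as XML and SXML. The names {\tt id\_int}
and {\tt id\_string} are hypothetical; the compiler picks its own names for
the copies.

\begin{verbatim}
(* Before: one polymorphic function, used at two types. *)
fun id x = x
val a = id 1
val b = id "one"

(* After, roughly: one monomorphic copy per instantiating type. *)
fun id_int (x : int) : int = x
fun id_string (x : string) : string = x
val a = id_int 1
val b = id_string "one"
\end{verbatim}

Because both uses of {\tt id} are visible in the program text, the set of
types at which it is instantiated is known, and the set of copies is finite.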