[hcoop/debian/mlton.git] / doc / hacker-guide / mlton.tex

\chap{MLton}{mlton}

This chapter describes the compiler proper, which is found in the {\tt mlton}
directory.

\section{Sources}

\begin{description}

\place{ast}
Abstract syntax trees produced by the front end.

\place{atoms}
Common atomic pieces of syntax trees used throughout the compiler, like
constants, primitives, variables, and types.

\place{backend}
The backend translates from the {\tt Cps} IL to a machine independent IL called
{\tt Machine}.  It decides data representations, stack frame layouts, and creates
runtime system information like limit checks and bitmasks.

\place{call-main.sml}
A one-line file that is the last line of the compiler sources.  It calls the
main function.

\place{closure-convert}
The closure converter, which converts from {\tt Sxml}, the higher-order
simply-typed IL, to {\tt Cps}, the first-order simply-typed IL.

\place{cm}
Support for {\smlnj}-style compilation manager (CM) files.

\place{codegen}
Both the C and the native X86 code generator.

\place{control}
Compiler switches used throughout the rest of the compiler.

\place{core-ml}
The implicitly typed IL that results from defunctorization.  Contains a
pass of dead code elimination for eliminating basis library code.  Also contains
the pass that replaces constants defined by {\tt \_prim} with their values.

\place{elaborate}
The elaborator, which matches variable uses with bindings in the AST IL and
defunctorizes to produce a {\tt CoreML} program.  It does not do type checking
yet, but will someday. 

\place{front-end}
The lexer and parser, which turn files into ASTs.

\place{main}
The two main structures in the compiler, one ({\tt Main}) for handling all the
command line switches and one ({\tt Compile}) which is a high-level view of the
the compiler passes, from front end to code generation.

\place{Makefile}
To make the compiler.

\place{mlton.cm}
An automatically generated file ({\tt make mlton.cm}) that lists all of the
files (in order) that make up the compiler.

\place{mlton.sml}
An automatically generated file ({\tt make mlton.sml}) that contains all of the
compiler sources concatenated together.

\place{rcps}
An experimental IL, similar to CPS, but with more expressive types for
describing representations (hence the ``r'').  Not yet in use.

\place{sources.cm}
For compiling with {\smlnj}.

\place{ssa}
Static-Single-Assignment form, the first-order simply-typed IL on which most
optimization is performed.  There are roughly 20 different optimization passes
(some of which run several times).

\place{type-inference}
The type inference pass, which translates from {\tt CoreML} to {\tt Xml}.

\place{xml}
The {\tt Xml} and {\tt Sxml} intermediate languages.  Also, the passes that
monomorphise, do polvariance, and implement exceptions.

\end{description}

\section{Compiler Overview}

\figref{structure} shows the overall structure of the compiler.  Intermediate
languages (ILs) are shown in ovals.  The names of compiler passes adorn arrows
between ILs.  In this section I give a brief description of each pass and a
pointer to a later section that covers the pass in detail.  Each IL also has a
separate section devoted to it.
\figBegin
\centerline{\epsfig{file=structure.ps,width=5.0in}}
\figEnd{Compiler structure}
       {structure}

The front end (\chapref{front-end}) takes SML source code (a complete
program) and performs lexing and parsing, producing an abstract syntax
tree (\chapref{ast}).  The lexer is produced by
ml-lex\cite{AppelEtAl94} and the parser is produced by
ml-yacc\cite{TarditiAppel94}.  The specifications for 
the lexer and parser were originally taken from \smlnj 109.32.  The
lexer is unchanged.  I have substantially modified the actions in the
grammar to produce my own version of abstract syntax trees (similar
to, but different from {\smlnj}).

Defunctorization (\chapref{defunctorization}), translates abstract
syntax trees to a small implicitly typed core language, called Core ML
(\chapref{core-ml}).  Its primary task is to eliminate all uses of the
module system (signatures, structures, functors).  It does this by
applying all functors and flattening all structures, moving
declarations to the top level.  This phase also performs precedence
parsing of infix expressions and patterns (the code to do this was
taken from \smlnj).  Finally, it does some amount of "macro
expansion", so that the core language is smaller.

Type inference (\chapref{type-inference}) translates implicitly typed
Core ML to an explicitly typed core language, XML (\chapref{xml}),
with explicit type abstraction and application.  XML is based on the
language ``Core-XML'' described in \cite{HarperMitchell93}.  Type
inference consists of two passes.  The first pass determines the
binding sites of type variables that are not explicitly bound (section
4.6 of the Definition).  The second pass is a
pretty standard unification based Hindley-Milner type
inference\cite{DamasMilner82}.  The type inference pass also performs
overloading resolution and resolution of flexible record patterns.
This pass also performs match compilation, by which I mean the
translation of case statements with nested patterns to (nested) case
statements with flat patterns.

Monomorphisation (\chapref{monomorphisation}) translates XML to its
simply-typed subset, called SXML (\chapref{sxml}), by duplicating all
polymorphic functions and datatypes for each type at which they are
instantiated.  Monomorphisation is only possible because SML has
``let-style'' polymorphism, in which all uses of a polymorphic value
are syntactically apparent (after functors are eliminated).
Commit	Line	Data
7f918cf1 CE	1	\chap{MLton}{mlton}
	2
	3	This chapter describes the compiler proper, which is found in the {\tt mlton}
	4	directory.
	5
	6	\section{Sources}
	7
	8	\begin{description}
	9
	10	\place{ast}
	11	Abstract syntax trees produced by the front end.
	12
	13	\place{atoms}
	14	Common atomic pieces of syntax trees used throughout the compiler, like
	15	constants, primitives, variables, and types.
	16
	17	\place{backend}
	18	The backend translates from the {\tt Cps} IL to a machine independent IL called
	19	{\tt Machine}. It decides data representations, stack frame layouts, and creates
	20	runtime system information like limit checks and bitmasks.
	21
	22	\place{call-main.sml}
	23	A one-line file that is the last line of the compiler sources. It calls the
	24	main function.
	25
	26	\place{closure-convert}
	27	The closure converter, which converts from {\tt Sxml}, the higher-order
	28	simply-typed IL, to {\tt Cps}, the first-order simply-typed IL.
	29
	30	\place{cm}
	31	Support for {\smlnj}-style compilation manager (CM) files.
	32
	33	\place{codegen}
	34	Both the C and the native X86 code generator.
	35
	36	\place{control}
	37	Compiler switches used throughout the rest of the compiler.
	38
	39	\place{core-ml}
	40	The implicitly typed IL that results from defunctorization. Contains a
	41	pass of dead code elimination for eliminating basis library code. Also contains
	42	the pass that replaces constants defined by {\tt \_prim} with their values.
	43
	44	\place{elaborate}
	45	The elaborator, which matches variable uses with bindings in the AST IL and
	46	defunctorizes to produce a {\tt CoreML} program. It does not do type checking
	47	yet, but will someday.
	48
	49	\place{front-end}
	50	The lexer and parser, which turn files into ASTs.
	51
	52	\place{main}
	53	The two main structures in the compiler, one ({\tt Main}) for handling all the
	54	command line switches and one ({\tt Compile}) which is a high-level view of the
	55	the compiler passes, from front end to code generation.
	56
	57	\place{Makefile}
	58	To make the compiler.
	59
	60	\place{mlton.cm}
	61	An automatically generated file ({\tt make mlton.cm}) that lists all of the
	62	files (in order) that make up the compiler.
	63
	64	\place{mlton.sml}
65	An automatically generated file ({\tt make mlton.sml}) that contains all of the
66	compiler sources concatenated together.
67
68	\place{rcps}
69	An experimental IL, similar to CPS, but with more expressive types for
70	describing representations (hence the ``r''). Not yet in use.
71
72	\place{sources.cm}
73	For compiling with {\smlnj}.
74
75	\place{ssa}
76	Static-Single-Assignment form, the first-order simply-typed IL on which most
77	optimization is performed. There are roughly 20 different optimization passes
78	(some of which run several times).
79
80	\place{type-inference}
81	The type inference pass, which translates from {\tt CoreML} to {\tt Xml}.
82
83	\place{xml}
84	The {\tt Xml} and {\tt Sxml} intermediate languages. Also, the passes that
85	monomorphise, do polvariance, and implement exceptions.
86
87	\end{description}
88
89	\section{Compiler Overview}
90
91	\figref{structure} shows the overall structure of the compiler. Intermediate
92	languages (ILs) are shown in ovals. The names of compiler passes adorn arrows
93	between ILs. In this section I give a brief description of each pass and a
94	pointer to a later section that covers the pass in detail. Each IL also has a
95	separate section devoted to it.
96	\figBegin
97	\centerline{\epsfig{file=structure.ps,width=5.0in}}
98	\figEnd{Compiler structure}
99	{structure}
100
101	The front end (\chapref{front-end}) takes SML source code (a complete
102	program) and performs lexing and parsing, producing an abstract syntax
103	tree (\chapref{ast}). The lexer is produced by
104	ml-lex\cite{AppelEtAl94} and the parser is produced by
105	ml-yacc\cite{TarditiAppel94}. The specifications for
106	the lexer and parser were originally taken from \smlnj 109.32. The
107	lexer is unchanged. I have substantially modified the actions in the
108	grammar to produce my own version of abstract syntax trees (similar
109	to, but different from {\smlnj}).
110
111	Defunctorization (\chapref{defunctorization}), translates abstract
112	syntax trees to a small implicitly typed core language, called Core ML
113	(\chapref{core-ml}). Its primary task is to eliminate all uses of the
114	module system (signatures, structures, functors). It does this by
115	applying all functors and flattening all structures, moving
116	declarations to the top level. This phase also performs precedence
117	parsing of infix expressions and patterns (the code to do this was
118	taken from \smlnj). Finally, it does some amount of "macro
119	expansion", so that the core language is smaller.
120
121	Type inference (\chapref{type-inference}) translates implicitly typed
122	Core ML to an explicitly typed core language, XML (\chapref{xml}),
123	with explicit type abstraction and application. XML is based on the
124	language ``Core-XML'' described in \cite{HarperMitchell93}. Type
125	inference consists of two passes. The first pass determines the
126	binding sites of type variables that are not explicitly bound (section
127	4.6 of the Definition). The second pass is a
128	pretty standard unification based Hindley-Milner type
129	inference\cite{DamasMilner82}. The type inference pass also performs
130	overloading resolution and resolution of flexible record patterns.
131	This pass also performs match compilation, by which I mean the
132	translation of case statements with nested patterns to (nested) case
133	statements with flat patterns.
134
135	Monomorphisation (\chapref{monomorphisation}) translates XML to its
136	simply-typed subset, called SXML (\chapref{sxml}), by duplicating all
137	polymorphic functions and datatypes for each type at which they are
138	instantiated. Monomorphisation is only possible because SML has
139	``let-style'' polymorphism, in which all uses of a polymorphic value
140	are syntactically apparent (after functors are eliminated).