Import Upstream version 20180207
[hcoop/debian/mlton.git] / doc / style-guide / main.tex
CommitLineData
7f918cf1
CE
1\documentclass[12pt]{article}
2\usepackage{alltt,epsfig,html,latexsym,longtable,makeidx,moreverb}
3
4\setlength\topmargin{-0.5in}
5\setlength\textheight{8.5in}
6\setlength\textwidth{7.0in}
7\setlength\oddsidemargin{-0.3in}
8\setlength\evensidemargin{-0.3in}
9\hyphenation{}
10\title{{\mlton} SML Style Guide}
11\author{Stephen Weeks}
12\date{\today}
13\include{macros}
14\makeindex
15
16\begin{document}
17
18\maketitle
19\input{abstract}
20
21% conventions chosen so that inertia is towards modularity and reuse
22% not to type fewer characters
23
24\sec{High-level structure}{high-level-structure}
25
26Code is structured in {\mlton} so that signatures are closed. Thus, in
27{\mlton}, one would never write the following.
28\begin{verbatim}
29signature SIG =
30 sig
31 val f: Foo.t -> int
32 end
33\end{verbatim}
34Instead, one would write the following.
35\begin{verbatim}
36signature SIG =
37 sig
38 structure Foo: FOO
39
40 val f: Foo.t -> int
41 end
42\end{verbatim}
43The benefit of this approach is that one can first understand the
44specifications (i.e. signatures) of all of the modules in {\mlton} before having
45to look at any implementations (i.e. structures or functors). That is, the
46signatures are self-contained.
47
48We deviate from this only in allowing references to top level types (like {\tt
49int}), basis library modules, and {\mlton} library modules. So, the following
50signature is fine, because structure {\tt Regexp} is part of the {\mlton}
51library.
52\begin{verbatim}
53signature SIG =
54 sig
55 val f: Regexp.t -> int
56 end
57\end{verbatim}
58
59We also use signatures to express (some of) the dependencies between modules.
60For every module {\tt Foo}, we write two signatures in a file named {\tt
61foo.sig}. The signature {\tt FOO} specifies what is implemented by {\tt Foo}.
62The signature {\tt FOO\_STRUCTS} specifies the modules that are needed in order
63to specify {\tt Foo}, but that are not implemented by {\tt Foo}. As an example,
64consider {\mlton}'s closure conversion pass (in {\tt mlton/closure-convert}),
65which converts from {\tt Sxml}, {\mlton}'s higher-order simply-typed
66intermediate language, to {\tt Cps}, {\mlton}'s first-order simply-typed
67intermediate language. The file {\tt closure-convert.sig} contains the
68following.
69\begin{verbatim}
70signature CLOSURE_CONVERT_STRUCTS =
71 sig
72 structure Sxml: SXML
73 structure Cps: CPS
74 sharing Sxml.Atoms = Cps.Atoms
75 end
76
77signature CLOSURE_CONVERT =
78 sig
79 include CLOSURE_CONVERT_STRUCTS
80
81 val closureConvert: Sxml.Program.t -> Cps.Program.t
82 end
83\end{verbatim}
84These signatures say that the {\tt ClosureConvert} module implements a function
85{\tt closureConvert} that transforms an {\tt Sxml} program into a {\tt Cps}
86program. They also say that {\tt ClosureConvert} does not implement {\tt Sxml}
87or {\tt Cps}. Rather, it expects some other modules to implement these and for
88them to be provided to {\tt ClosureConvert}. The sharing constraint expresses
89that the ILs must share some basic atoms, like constants, variables, and
90primitives.
91
92Given the two signatures that specify a module, the module definition always has
93the same structure. A module {\tt Foo} is implemented in a file named {\tt
94foo.fun}, which defines a functor named {\tt Foo} that takes as an argument a
95structure matching {\tt FOO\_STRUCTS} and returns as a result a structure
96matching {\tt FOO}. For example, {\tt closure-convert.fun} contains the
97following.
98\begin{verbatim}
99functor ClosureConvert (S: CLOSURE_CONVERT_STRUCTS): CLOSURE_CONVERT =
100struct
101
102open S
103
104fun closureConvert ...
105
106end
107\end{verbatim}
108Although the signatures for {\tt ClosureConvert} express the dependence
109on the {\tt Sxml} and {\tt Cps} ILs, they do not express the
110dependence on other modules that are only used internally to closure
111conversion. For example, closure conversion uses an auxiliary module {\tt
112AbstractValue} as part of its higher-order control-flow analysis. Because {\tt
113AbstractValue} is only used internally to closure conversion, it does not appear
114in the signatures that specify closure conversion. So, helper functors (like
115{\tt AbstractValue}) are analogous to helper functions in that they are not
116visible to clients.
117
118We do not put helper functors lexically in scope because SML only allows top
119level functor definitions and, more importantly, because files would become
120unmanageably large. Instead, helper functors get their own {\tt .sig} and {\tt
121.fun} file, which follow exactly the convention above.
122
123\section{General conventions}
124
125\begin{itemize}
126\item A line of code never exceeds 80 columns.
127\item Use alphabetical order wherever possible.
128\begin{itemize}
129\item record field names
130\item datatype constructors
131\item value specs in signatures
132\item file lists in CM files
133\item export lists in CM files
134\end{itemize}
135\end{itemize}
136
137%------------------------------------------------------
138% Signature conventions
139%------------------------------------------------------
140
141\sec{Signatures}{signature-conventions}
142
143We now enumerate the conventions we follow in writing signatures.
144
145\begin{enumerate}
146
147\item
148Signature identifiers are in all capitals, using ``\_'' to
149separate words.
150
151\item
152A signature typically contains a single type specification that defines a type
153constructor {\tt t}, which is the type of interest in the specification. For
154oexample, here are signature fragments for integers, lists, and maps.
155\begin{verbatim}
156signature INTEGER =
157 sig
158 type t
159
160 val + : t * t -> t
161 ...
162 end
163
164signature LIST =
165 sig
166 type 'a t
167
168 val map: 'a t * ('a -> 'b) -> 'b t
169 ...
170 end
171
172signature MAP
173 sig
174 type ('a, 'b) t
175
176 val extend: ('a, 'b) t * 'a * 'b -> ('a, 'b) t
177 ...
178 end
179\end{verbatim}
180Although at first it might appear confusing to name every type {\tt t}, in fact
181there is never ambiguity, because at any point in the program there is at most
182one unqualified {\tt t} in scope, and all other types will be named with long
183identifiers (like {\tt Int.t} or {\tt Int.t List.t}). For example, the code for
184a function {\tt foo} within the {\tt Map} module might look like the following.
185\begin{verbatim}
186fun foo (l: 'a List.t, n: Int.t): ('a, Int.t) t = ...
187\end{verbatim}
188
189In practice, for pervasive types like {\tt int}, {\tt 'a list}, we often use the
190standard pervasive name instead of the {\tt t} name.
191
192\item Signatures should not contain free types or structures, other than
193pervasives, basis library modules, or {\mlton} library modules. This was
194explained in \secref{high-level-structure}.
195
196\item
197If additional abstract types (other than pervasive types) are needed to specify
198operations, they are included as substructures of the signature, and have a
199signature in their own right. For example, the following signature is good.
200
201\begin{verbatim}
202signature FOO =
203 sig
204 structure Var: VAR
205
206 type t
207 val fromVar: Var.t -> t
208 val toVar: t -> Var.t
209 end
210\end{verbatim}
211
212\item
213Signatures do not use substructures or multiple structures to group different
214operations on the same type. This makes you waste energy remembering where the
215operations are. For exmample, the following signature is bad.
216
217\begin{verbatim}
218signature REAL =
219 sig
220 type t
221
222 val + : t * t -> t
223
224 structure Trig:
225 sig
226 val sin: t -> t
227 val cos: t -> t
228 end
229 end
230\end{verbatim}
231
232\item
233Signatures usually should not contain datatypes. This exposes the
234implementation of what should be an abstract type. For example, the following
235signature is bad.
236\begin{verbatim}
237signature COMPLEX =
238 sig
239 datatype t = T of real * real
240 end
241\end{verbatim}
242A common exception to this rule is abstract syntax trees.
243
244\item
245Use structure sharing to express type sharing. For example, in {\tt
246closure-convert.sig}, a single structure sharing equation expresses a number of
247type sharing equations.
248
249\end{enumerate}
250
251%------------------------------------------------------
252% Value specifications
253%------------------------------------------------------
254
255\subsec{Value specifications}{val-specs}
256
257Here are the conventions that we use for individual value specifications in
258signatures. Of course, many of these conventions directly impact the way in
259which we write the core language expressions that implement the specifications.
260
261\begin{enumerate}
262
263\item
264In a datatype specification, if there is a single constructor, then that
265constructor is called {\tt T}.
266\begin{verbatim}
267datatype t = T of int
268\end{verbatim}
269
270\item
271In a datatype specification, if a constructor carries multiple values of the
272same type, use a record to name them to avoid confusion.
273\begin{verbatim}
274datatype t = T of {length: int, start: int}
275\end{verbatim}
276
277\item
278Identifiers begin with and use small letters, using capital letters to separate
279words.
280\begin{verbatim}
281val helloWorld: unit -> unit
282\end{verbatim}
283
284\item
285There is no space before the colon, and a single space after it. In the case of
286operators (like {\tt +}), there is a space before the colon to avoid lexing the
287colon as part of the operator.
288
289\item
290Pass multiple arguments as tuple, not curried.
291\begin{verbatim}
292val eval: Exp.t * Env.t -> Val.t
293\end{verbatim}
294
295\item
296Currying is only used when there staging of a computation, i.e., if
297precomputation is done on one of the arguments.
298\begin{verbatim}
299val match: Regexp.t -> string -> bool
300\end{verbatim}
301
302\item
303Functions which take a single element of the abstract type of a signature take
304the element as the first argument, and auxiliary arguments after.
305\begin{verbatim}
306val push: t * int -> unit
307val map: 'a t * ('a -> 'b) -> 'b t
308\end{verbatim}
309
310\item
311$n$-ary operations take the $n$ elements first, and auxilary arguments after.
312\begin{verbatim}
313val merge: 'a t * 'a t * ('a * 'a -> 'b) -> 'b t
314\end{verbatim}
315
316\item
317If two arguments to a function are of the same type, and the operation is not
318commutative, pass them using a record. This names the arguments and ensures
319they are not confused. Exceptions are the standard numerical and algebraic
320operators.
321\begin{verbatim}
322val fromTo: {start: int, step: int, stop: int} -> int list
323val substring: t * {length: int, start: int} -> t
324val - : t * t -> t
325\end{verbatim}
326
327\item
328Field names in record types are written in alphabetical order.
329
330\item
331Return multiple results as a tuple, or as a record if there is the potential for
332confusion.
333\begin{verbatim}
334val parse: string -> t * string
335val quotRem: t * t -> t * t
336val partition: 'a t * ('a -> bool) -> {no: 'a t, yes: 'a t}
337\end{verbatim}
338
339\item
340If a function returns multiple results, at least two of which are of the same
341type, and the name of the function does not clearly indicate which result is
342which, use a record to name the results.
343\begin{verbatim}
344val vars: t -> {frees : Vars.t, bound : Vars.t}
345val partition: 'a t * ('a -> bool) -> {yes : 'a t, no : 'a t}
346\end{verbatim}
347
348\item
349Use the same names and argument orders for similar functions in different
350signatures. This is especially common in the {\mlton} library.
351\begin{verbatim}
352val < : t * t -> bool
353val equals: t * t -> bool
354val forall: 'a t * ('a -> bool) -> bool
355\end{verbatim}
356
357\item
358Use {\tt is}, {\tt are}, {\tt can}, etc. to name predicates. One exception is
359{\tt equals}.
360\begin{verbatim}
361val isEven: int -> bool
362val canRead: t -> bool
363\end{verbatim}
364
365\end{enumerate}
366
367%------------------------------------------------------
368% Signature example
369%------------------------------------------------------
370
371\subsection{Example}
372
373Here is the complete specification of a simple interpreter. This demonstrates
374the {\tt t}-convention, the closed-signature convention, and the use of sharing
375constraints.
376
377\begin{verbatim}
378signature VAR =
379 sig
380 type t
381 end
382
383signature EXP =
384 sig
385 structure Var: VAR
386
387 datatype t =
388 Var of Var.t
389 | Lam of Var.t * t
390 | App of t * t
391 end
392
393signature VAL =
394 sig
395 structure Var: VAR
396
397 type t
398
399 val var: Var.t -> t
400 val lam: Var.t * t -> t
401 val app: t * t -> t
402 end
403
404signature INTERP =
405 sig
406 structure Exp: EXP
407 structure Val: VAL
408 sharing Exp.Var = Val.Var
409
410 val eval: Exp.t -> Val.t
411 end
412
413signature ENV =
414 sig
415 structure Var: VAR
416
417 type 'a t
418
419 val lookup: 'a t * Var.t -> 'a
420 val extend: 'a t * Var.t * 'a -> 'a t
421 end
422\end{verbatim}
423
424%------------------------------------------------------
425% Functors and structures
426%------------------------------------------------------
427
428\section{Functors and structures}
429We now enumerate the conventions we follow in writing functors and structures.
430There is some repetition with \secref{high-level-structure}.
431
432\begin{enumerate}
433
434\item
435Functor identifiers begin with capital letters, use mixed case, and use capital
436letters to separate words.
437
438\item
439Functor definitions look like the following.
440\begin{verbatim}
441functor Foo (S: FOO_STRUCTS): FOO =
442struct
443
444open S
445
446...
447
448end
449\end{verbatim}
450
451\item
452The name of the functor is the same as the name of the signature describing the
453structure it produces.
454
455\item
456The functor result is constrained by a signature.
457
458\item
459A functor takes as arguments any structures that occur in the signature of the
460result that it does not implement.
461
462\item
463Structure identifiers begin with capital letters, and use capital letters to
464separate words.
465
466\item
467The name of the structure is the same as the name of the functor that produces
468it.
469
470\item
471A structure definition looks like one of the following.
472\begin{verbatim}
473structure Foo = Foo (S)
474
475structure Foo =
476 struct
477 ...
478 end
479\end{verbatim}
480
481\item
482Avoid the use of {\tt open} except within tightly constrained scopes. The use
483of {\tt open} makes it hard to look at code later and understand where things
484come from.
485
486\end{enumerate}
487
488%------------------------------------------------------
489% Core expressions
490%------------------------------------------------------
491
492\section{Core expressions}
493
494We now enumerate the conventions we follow in writing core expressions. We do
495not repeat the conventions of \secref{val-spec}, although many of them apply
496here.
497\begin{enumerate}
498
499\item
500Tuples are written with spaces after commas, like {\tt (a, b, c)}.
501
502\item
503Records are written with spaces on both sides of equals and with spaces after
504commas, like {\tt \{bar = 1, foo = 2\}}.
505
506\item
507Record field names are written in alphabetical order, both in expressions and
508types.
509
510\item
511Function application is written with a space between the function and the
512argument. If there is one untupled argument, it looks like {\tt f x}. If there
513is a tupleg argument, it looks like {\tt f (x, y, z)}.
514
515\item
516When you want to mix declarations with side-effecting statements, use a
517declaration like {\tt val \_ = sideEffectingProcedure()}.
518
519\item
520In sequence expressions {\tt (e1; e2)} that span multiple lines, place the
521semicolon at the beginning of lines.
522\begin{verbatim}
523(e1
524 ; e2
525 ; e3)
526\end{verbatim}
527
528\item
529Never write nonexhaustive matches. Always handle the default case and raise an
530error message. Your error message will be better than the compiler's. Also, if
531you have lots of uncaught cases, then you are probably not using the type system
532in a strong enough way - your types are not expressing as much as they could.
533
534\item
535Never use the syntax for declaring functions that repeats the function name.
536Use {\tt case} or {\tt fn} instead. That is, do not write the following.
537\begin{verbatim}
538fun f 0 = 1
539 | f n = n + 1
540\end{verbatim}
541Instead, write the following.
542\begin{verbatim}
543val f =
544 fn 0 => 1
545 | n => n + 1
546\end{verbatim}
547Or, write the following.
548\begin{verbatim}
549fun f n =
550 case n of
551 0 => 1
552 | _ => n + 1
553\end{verbatim}
554
555\end{enumerate}
556
557\bibliographystyle{alpha}
558\bibliography{bib}
559\end{document}