Finish CPS documentation
[bpt/guile.git] / doc / ref / compiler.texi
CommitLineData
8680d53b
AW
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
bdad1340 3@c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013, 2014
8680d53b
AW
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@node Compiling to the Virtual Machine
8@section Compiling to the Virtual Machine
9
67915ab0
AW
10Compilers! The word itself inspires excitement and awe, even among
11experienced practitioners. But a compiler is just a program: an
12eminently hackable thing. This section aims to to describe Guile's
13compiler in such a way that interested Scheme hackers can feel
14comfortable reading and extending it.
00ce5125 15
e3ba263d 16@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
98850fd7 17know how to compile your @code{.scm} file.
00ce5125
AW
18
19@menu
20* Compiler Tower::
21* The Scheme Compiler::
81fd3152 22* Tree-IL::
67915ab0
AW
23* Continuation-Passing Style::
24* Bytecode::
e63d888e 25* Writing New High-Level Languages::
e3ba263d 26* Extending the Compiler::
00ce5125
AW
27@end menu
28
29@node Compiler Tower
30@subsection Compiler Tower
31
d297e544
AW
32Guile's compiler is quite simple -- its @emph{compilers}, to put it more
33accurately. Guile defines a tower of languages, starting at Scheme and
34progressively simplifying down to languages that resemble the VM
35instruction set (@pxref{Instruction Set}).
00ce5125
AW
36
37Each language knows how to compile to the next, so each step is simple
67915ab0
AW
38and understandable. Furthermore, this set of languages is not hardcoded
39into Guile, so it is possible for the user to add new high-level
40languages, new passes, or even different compilation targets.
00ce5125 41
e3ba263d
AW
42Languages are registered in the module, @code{(system base language)}:
43
44@example
45(use-modules (system base language))
46@end example
47
48They are registered with the @code{define-language} form.
49
50@deffn {Scheme Syntax} define-language @
994d87be
BT
51 [#:name] [#:title] [#:reader] [#:printer] @
52 [#:parser=#f] [#:compilers='()] @
53 [#:decompilers='()] [#:evaluator=#f] @
54 [#:joiner=#f] [#:for-humans?=#t] @
55 [#:make-default-environment=make-fresh-user-module]
e3ba263d
AW
56Define a language.
57
67915ab0
AW
58This syntax defines a @code{<language>} object, bound to @var{name} in
59the current environment. In addition, the language will be added to the
60global language set. For example, this is the language definition for
61Scheme:
e3ba263d
AW
62
63@example
64(define-language scheme
41e64dd7
AW
65 #:title "Scheme"
66 #:reader (lambda (port env) ...)
98850fd7 67 #:compilers `((tree-il . ,compile-tree-il))
81fd3152 68 #:decompilers `((tree-il . ,decompile-tree-il))
41e64dd7
AW
69 #:evaluator (lambda (x module) (primitive-eval x))
70 #:printer write
71 #:make-default-environment (lambda () ...))
e3ba263d 72@end example
e3ba263d
AW
73@end deffn
74
75The interesting thing about having languages defined this way is that
67915ab0 76they present a uniform interface to the read-eval-print loop. This
e3ba263d
AW
77allows the user to change the current language of the REPL:
78
79@example
81fd3152 80scheme@@(guile-user)> ,language tree-il
41e64dd7
AW
81Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
82tree-il@@(guile-user)> ,L scheme
83Happy hacking with Scheme! To switch back, type `,L tree-il'.
84scheme@@(guile-user)>
e3ba263d
AW
85@end example
86
87Languages can be looked up by name, as they were above.
88
89@deffn {Scheme Procedure} lookup-language name
90Looks up a language named @var{name}, autoloading it if necessary.
91
92Languages are autoloaded by looking for a variable named @var{name} in
93a module named @code{(language @var{name} spec)}.
94
95The language object will be returned, or @code{#f} if there does not
96exist a language with that name.
97@end deffn
98
99Defining languages this way allows us to programmatically determine
100the necessary steps for compiling code from one language to another.
101
102@deffn {Scheme Procedure} lookup-compilation-order from to
103Recursively traverses the set of languages to which @var{from} can
104compile, depth-first, and return the first path that can transform
105@var{from} to @var{to}. Returns @code{#f} if no path is found.
106
107This function memoizes its results in a cache that is invalidated by
108subsequent calls to @code{define-language}, so it should be quite
109fast.
110@end deffn
111
5745de91
AW
112There is a notion of a ``current language'', which is maintained in the
113@code{current-language} parameter, defined in the core @code{(guile)}
67915ab0
AW
114module. This language is normally Scheme, and may be rebound by the
115user. The run-time compilation interfaces
e3ba263d
AW
116(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
117and target languages.
118
119The normal tower of languages when compiling Scheme goes like this:
120
121@itemize
41e64dd7 122@item Scheme
81fd3152 123@item Tree Intermediate Language (Tree-IL)
67915ab0 124@item Continuation-Passing Style (CPS)
81fd3152 125@item Bytecode
e3ba263d
AW
126@end itemize
127
67915ab0
AW
128As discussed before (@pxref{Object File Format}), bytecode is in ELF
129format, ready to be serialized to disk. But when compiling Scheme at
130run time, you want a Scheme value: for example, a compiled procedure.
131For this reason, so as not to break the abstraction, Guile defines a
132fake language at the bottom of the tower:
81fd3152
AW
133
134@itemize
135@item Value
136@end itemize
137
d297e544
AW
138Compiling to @code{value} loads the bytecode into a procedure, turning
139cold bytes into warm code.
e3ba263d
AW
140
141Perhaps this strangeness can be explained by example:
d297e544 142@code{compile-file} defaults to compiling to bytecode, because it
e3ba263d 143produces object code that has to live in the barren world outside the
67915ab0
AW
144Guile runtime; but @code{compile} defaults to compiling to @code{value},
145as its product re-enters the Guile world.
146
147@c FIXME: This doesn't work anymore :( Should we add some kind of
148@c special GC pass, or disclaim this kind of code, or what?
e3ba263d
AW
149
150Indeed, the process of compilation can circulate through these
151different worlds indefinitely, as shown by the following quine:
152
153@example
00ce5125 154((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
e3ba263d 155@end example
00ce5125
AW
156
157@node The Scheme Compiler
158@subsection The Scheme Compiler
159
7081d4f9 160The job of the Scheme compiler is to expand all macros and all of Scheme
67915ab0
AW
161to its most primitive expressions. The definition of ``primitive
162expression'' is given by the inventory of constructs provided by
163Tree-IL, the target language of the Scheme compiler: procedure calls,
164conditionals, lexical references, and so on. This is described more
165fully in the next section.
81fd3152
AW
166
167The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
67915ab0 168that it is completely implemented by the macro expander. Since the
81fd3152
AW
169macro expander has to run over all of the source code already in order
170to expand macros, it might as well do the analysis at the same time,
171producing Tree-IL expressions directly.
172
67915ab0
AW
173Because this compiler is actually the macro expander, it is extensible.
174Any macro which the user writes becomes part of the compiler.
81fd3152
AW
175
176The Scheme-to-Tree-IL expander may be invoked using the generic
177@code{compile} procedure:
178
179@lisp
180(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
181@result{}
67915ab0 182#<tree-il (call (toplevel +) (const 1) (const 2))>
81fd3152
AW
183@end lisp
184
67915ab0
AW
185@code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
186equivalent to calling the macro expander as @code{(macroexpand @var{foo}
187'c '(compile load eval))}. @xref{Macro Expansion}.
188@code{compile-tree-il}, the procedure dispatched by @code{compile} to
189@code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
190it conform to the general form of compiler procedures in Guile's
191language tower.
81fd3152 192
98850fd7
AW
193Compiler procedures take three arguments: an expression, an
194environment, and a keyword list of options. They return three values:
195the compiled expression, the corresponding environment for the target
196language, and a ``continuation environment''. The compiled expression
197and environment will serve as input to the next language's compiler.
198The ``continuation environment'' can be used to compile another
199expression from the same source language within the same module.
81fd3152
AW
200
201For example, you might compile the expression, @code{(define-module
202(foo))}. This will result in a Tree-IL expression and environment. But
203if you compiled a second expression, you would want to take into
204account the compile-time effect of compiling the previous expression,
205which puts the user in the @code{(foo)} module. That is purpose of the
206``continuation environment''; you would pass it as the environment
207when compiling the subsequent expression.
208
41e64dd7
AW
209For Scheme, an environment is a module. By default, the @code{compile}
210and @code{compile-file} procedures compile in a fresh module, such
211that bindings and macros introduced by the expression being compiled
212are isolated:
1ebe6a63
LC
213
214@example
215(eq? (current-module) (compile '(current-module)))
216@result{} #f
217
218(compile '(define hello 'world))
219(defined? 'hello)
220@result{} #f
221
222(define / *)
223(eq? (compile '/) /)
224@result{} #f
225@end example
226
227Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
228@code{current-reader}}) are isolated:
229
230@example
231(compile '(fluid-set! current-reader (lambda args 'fail)))
232(fluid-ref current-reader)
233@result{} #f
234@end example
235
236Nevertheless, having the compiler and @dfn{compilee} share the same name
237space can be achieved by explicitly passing @code{(current-module)} as
238the compilation environment:
239
240@example
241(define hello 'world)
242(compile 'hello #:env (current-module))
243@result{} world
244@end example
245
81fd3152
AW
246@node Tree-IL
247@subsection Tree-IL
00ce5125 248
81fd3152 249Tree Intermediate Language (Tree-IL) is a structured intermediate
c850030f
AW
250language that is close in expressive power to Scheme. It is an
251expanded, pre-analyzed Scheme.
252
81fd3152
AW
253Tree-IL is ``structured'' in the sense that its representation is
254based on records, not S-expressions. This gives a rigidity to the
255language that ensures that compiling to a lower-level language only
41e64dd7
AW
256requires a limited set of transformations. For example, the Tree-IL
257type @code{<const>} is a record type with two fields, @code{src} and
258@code{exp}. Instances of this type are created via @code{make-const}.
259Fields of this type are accessed via the @code{const-src} and
260@code{const-exp} procedures. There is also a predicate, @code{const?}.
261@xref{Records}, for more information on records.
81fd3152
AW
262
263@c alpha renaming
264
265All Tree-IL types have a @code{src} slot, which holds source location
266information for the expression. This information, if present, will be
267residualized into the compiled object code, allowing backtraces to
268show source information. The format of @code{src} is the same as that
269returned by Guile's @code{source-properties} function. @xref{Source
270Properties}, for more information.
271
272Although Tree-IL objects are represented internally using records,
273there is also an equivalent S-expression external representation for
ecb87335 274each kind of Tree-IL. For example, the S-expression representation
81fd3152 275of @code{#<const src: #f exp: 3>} expression would be:
c850030f
AW
276
277@example
81fd3152 278(const 3)
c850030f
AW
279@end example
280
81fd3152 281Users may program with this format directly at the REPL:
c850030f
AW
282
283@example
81fd3152 284scheme@@(guile-user)> ,language tree-il
41e64dd7 285Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
67915ab0 286tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
c850030f
AW
287@result{} 42
288@end example
289
81fd3152
AW
290The @code{src} fields are left out of the external representation.
291
98850fd7
AW
292One may create Tree-IL objects from their external representations via
293calling @code{parse-tree-il}, the reader for Tree-IL. If any source
294information is attached to the input S-expression, it will be
295propagated to the resulting Tree-IL expressions. This is probably the
296easiest way to compile to Tree-IL: just make the appropriate external
297representations in S-expression format, and let @code{parse-tree-il}
298take care of the rest.
299
81fd3152
AW
300@deftp {Scheme Variable} <void> src
301@deftpx {External Representation} (void)
67915ab0 302An empty expression. In practice, equivalent to Scheme's @code{(if #f
81fd3152
AW
303#f)}.
304@end deftp
d297e544 305
81fd3152
AW
306@deftp {Scheme Variable} <const> src exp
307@deftpx {External Representation} (const @var{exp})
308A constant.
309@end deftp
d297e544 310
81fd3152
AW
311@deftp {Scheme Variable} <primitive-ref> src name
312@deftpx {External Representation} (primitive @var{name})
67915ab0
AW
313A reference to a ``primitive''. A primitive is a procedure that, when
314compiled, may be open-coded. For example, @code{cons} is usually
81fd3152
AW
315recognized as a primitive, so that it compiles down to a single
316instruction.
317
318Compilation of Tree-IL usually begins with a pass that resolves some
319@code{<module-ref>} and @code{<toplevel-ref>} expressions to
67915ab0 320@code{<primitive-ref>} expressions. The actual compilation pass has
7081d4f9
AW
321special cases for calls to certain primitives, like @code{apply} or
322@code{cons}.
81fd3152 323@end deftp
d297e544 324
81fd3152
AW
325@deftp {Scheme Variable} <lexical-ref> src name gensym
326@deftpx {External Representation} (lexical @var{name} @var{gensym})
67915ab0 327A reference to a lexically-bound variable. The @var{name} is the
81fd3152
AW
328original name of the variable in the source program. @var{gensym} is a
329unique identifier for this variable.
330@end deftp
d297e544 331
81fd3152
AW
332@deftp {Scheme Variable} <lexical-set> src name gensym exp
333@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
334Sets a lexically-bound variable.
335@end deftp
d297e544 336
81fd3152
AW
337@deftp {Scheme Variable} <module-ref> src mod name public?
338@deftpx {External Representation} (@@ @var{mod} @var{name})
339@deftpx {External Representation} (@@@@ @var{mod} @var{name})
340A reference to a variable in a specific module. @var{mod} should be
679cceed 341the name of the module, e.g.@: @code{(guile-user)}.
81fd3152
AW
342
343If @var{public?} is true, the variable named @var{name} will be looked
344up in @var{mod}'s public interface, and serialized with @code{@@};
345otherwise it will be looked up among the module's private bindings,
346and is serialized with @code{@@@@}.
347@end deftp
d297e544 348
81fd3152
AW
349@deftp {Scheme Variable} <module-set> src mod name public? exp
350@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
351@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
352Sets a variable in a specific module.
353@end deftp
d297e544 354
81fd3152
AW
355@deftp {Scheme Variable} <toplevel-ref> src name
356@deftpx {External Representation} (toplevel @var{name})
357References a variable from the current procedure's module.
358@end deftp
d297e544 359
81fd3152
AW
360@deftp {Scheme Variable} <toplevel-set> src name exp
361@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
362Sets a variable in the current procedure's module.
363@end deftp
d297e544 364
81fd3152
AW
365@deftp {Scheme Variable} <toplevel-define> src name exp
366@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
367Defines a new top-level variable in the current procedure's module.
368@end deftp
d297e544 369
81fd3152
AW
370@deftp {Scheme Variable} <conditional> src test then else
371@deftpx {External Representation} (if @var{test} @var{then} @var{else})
ca445ba5 372A conditional. Note that @var{else} is not optional.
c850030f 373@end deftp
d297e544 374
7081d4f9
AW
375@deftp {Scheme Variable} <call> src proc args
376@deftpx {External Representation} (call @var{proc} . @var{args})
ca445ba5 377A procedure call.
c850030f 378@end deftp
d297e544 379
a881a4ae
AW
380@deftp {Scheme Variable} <primcall> src name args
381@deftpx {External Representation} (primcall @var{name} . @var{args})
382A call to a primitive. Equivalent to @code{(call (primitive @var{name})
383. @var{args})}. This construct is often more convenient to generate and
384analyze than @code{<call>}.
385
386As part of the compilation process, instances of @code{(call (primitive
387@var{name}) . @var{args})} are transformed into primcalls.
388@end deftp
d297e544 389
67915ab0
AW
390@deftp {Scheme Variable} <seq> src head tail
391@deftpx {External Representation} (seq @var{head} @var{tail})
392A sequence. The semantics is that @var{head} is evaluated first, and
393any resulting values are ignored. Then @var{tail} is evaluated, in tail
394position.
c850030f 395@end deftp
d297e544 396
41e64dd7
AW
397@deftp {Scheme Variable} <lambda> src meta body
398@deftpx {External Representation} (lambda @var{meta} @var{body})
67915ab0
AW
399A closure. @var{meta} is an association list of properties for the
400procedure. @var{body} is a single Tree-IL expression of type
401@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
41e64dd7
AW
402an alternate clause, this makes Tree-IL's @code{<lambda>} have the
403expressiveness of Scheme's @code{case-lambda}.
404@end deftp
d297e544 405
41e64dd7
AW
406@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
407@deftpx {External Representation} @
408 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
409 @var{body})@
410 [@var{alternate}])
67915ab0 411One clause of a @code{case-lambda}. A @code{lambda} expression in
41e64dd7
AW
412Scheme is treated as a @code{case-lambda} with one clause.
413
414@var{req} is a list of the procedure's required arguments, as symbols.
415@var{opt} is a list of the optional arguments, or @code{#f} if there
416are no optional arguments. @var{rest} is the name of the rest
417argument, or @code{#f}.
418
419@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
420(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
421keyword corresponding to the argument named @var{name}, and whose
67915ab0
AW
422corresponding gensym is @var{var}. @var{inits} are tree-il expressions
423corresponding to all of the optional and keyword arguments, evaluated to
424bind variables whose value is not supplied by the procedure caller.
41e64dd7
AW
425Each @var{init} expression is evaluated in the lexical context of
426previously bound variables, from left to right.
427
428@var{gensyms} is a list of gensyms corresponding to all arguments:
429first all of the required arguments, then the optional arguments if
430any, then the rest argument if any, then all of the keyword arguments.
431
67915ab0 432@var{body} is the body of the clause. If the procedure is called with
41e64dd7 433an appropriate number of arguments, @var{body} is evaluated in tail
67915ab0 434position. Otherwise, if there is an @var{alternate}, it should be a
41e64dd7 435@code{<lambda-case>} expression, representing the next clause to try.
64de6db5 436If there is no @var{alternate}, a wrong-number-of-arguments error is
41e64dd7
AW
437signaled.
438@end deftp
d297e544 439
41e64dd7
AW
440@deftp {Scheme Variable} <let> src names gensyms vals exp
441@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
67915ab0
AW
442Lexical binding, like Scheme's @code{let}. @var{names} are the original
443binding names, @var{gensyms} are gensyms corresponding to the
81fd3152
AW
444@var{names}, and @var{vals} are Tree-IL expressions for the values.
445@var{exp} is a single Tree-IL expression.
446@end deftp
d297e544 447
935c7aca 448@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
172988ee
AW
449@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
450@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 451A version of @code{<let>} that creates recursive bindings, like
935c7aca 452Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
81fd3152 453@end deftp
d297e544 454
67915ab0
AW
455@deftp {Scheme Variable} <prompt> escape-only? tag body handler
456@deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
457A dynamic prompt. Instates a prompt named @var{tag}, an expression,
41e64dd7 458during the dynamic extent of the execution of @var{body}, also an
67915ab0
AW
459expression. If an abort occurs to this prompt, control will be passed
460to @var{handler}, also an expression, which should be a procedure. The
461first argument to the handler procedure will be the captured
462continuation, followed by all of the values passed to the abort. If
463@var{escape-only?} is true, the handler should be a @code{<lambda>} with
464a single @code{<lambda-case>} body expression with no optional or
465keyword arguments, and no alternate, and whose first argument is
466unreferenced. @xref{Prompts}, for more information.
41e64dd7 467@end deftp
d297e544 468
41e64dd7
AW
469@deftp {Scheme Variable} <abort> tag args tail
470@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
471An abort to the nearest prompt with the name @var{tag}, an expression.
472@var{args} should be a list of expressions to pass to the prompt's
473handler, and @var{tail} should be an expression that will evaluate to
67915ab0 474a list of additional arguments. An abort will save the partial
41e64dd7
AW
475continuation, which may later be reinstated, resulting in the
476@code{<abort>} expression evaluating to some number of values.
477@end deftp
81fd3152 478
98850fd7
AW
479There are two Tree-IL constructs that are not normally produced by
480higher-level compilers, but instead are generated during the
481source-to-source optimization and analysis passes that the Tree-IL
67915ab0
AW
482compiler does. Users should not generate these expressions directly,
483unless they feel very clever, as the default analysis pass will generate
484them as necessary.
98850fd7 485
41e64dd7
AW
486@deftp {Scheme Variable} <let-values> src names gensyms exp body
487@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
98850fd7
AW
488Like Scheme's @code{receive} -- binds the values returned by
489evaluating @code{exp} to the @code{lambda}-like bindings described by
67915ab0 490@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
98850fd7 491
7081d4f9 492@code{<let-values>} is an optimization of a @code{<call>} to the
98850fd7
AW
493primitive, @code{call-with-values}.
494@end deftp
d297e544 495
41e64dd7
AW
496@deftp {Scheme Variable} <fix> src names gensyms vals body
497@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
98850fd7
AW
498Like @code{<letrec>}, but only for @var{vals} that are unset
499@code{lambda} expressions.
500
501@code{fix} is an optimization of @code{letrec} (and @code{let}).
502@end deftp
81fd3152 503
67915ab0
AW
504Tree-IL is a convenient compilation target from source languages. It
505can be convenient as a medium for optimization, though CPS is usually
506better. The strength of Tree-IL is that it does not fix order of
507evaluation, so it makes some code motion a bit easier.
00ce5125 508
67915ab0 509Optimization passes performed on Tree-IL currently include:
00ce5125 510
73643339 511@itemize
67915ab0
AW
512@item Open-coding (turning toplevel-refs into primitive-refs,
513and calls to primitives to primcalls)
514@item Partial evaluation (comprising inlining, copy propagation, and
515constant folding)
516@item Common subexpression elimination (CSE)
73643339
AW
517@end itemize
518
67915ab0
AW
519In the future, we will move the CSE pass to operate over the lower-level
520CPS language.
73643339 521
67915ab0
AW
522@node Continuation-Passing Style
523@subsection Continuation-Passing Style
73643339 524
67915ab0 525@cindex CPS
58b23156
AW
526Continuation-passing style (CPS) is Guile's principal intermediate
527language, bridging the gap between languages for people and languages
528for machines. CPS gives a name to every part of a program: every
529control point, and every intermediate value. This makes it an excellent
530medium for reasoning about programs, which is the principal job of a
531compiler.
532
533@menu
534* An Introduction to CPS::
535* CPS in Guile::
bdad1340 536* Building CPS::
58b23156
AW
537@end menu
538
539@node An Introduction to CPS
540@subsubsection An Introduction to CPS
541
542As an example, consider the following Scheme expression:
543
544@lisp
545(begin
546 (display "The sum of 32 and 10 is: ")
547 (display 42)
548 (newline))
549@end lisp
550
551Let us identify all of the sub-expressions in this expression. We give
552them unique labels, like @var{k1}, and annotate the original source
553code:
554
555@lisp
556(begin
557 (display "The sum of 32 and 10 is: ")
558 |k1 k2
559 k0
560 (display 42)
561 |k4 k5
562 k3
563 (newline))
564 |k7
565 k6
566@end lisp
567
568These labels also identify continuations. For example, the continuation
569of @code{k7} is @code{k6}. This is because after evaluating the value
570of @code{newline}, performed by the expression labelled @code{k7}, we
bdad1340 571continue to apply it in @code{k6}.
58b23156
AW
572
573Which label has @code{k0} as its continuation? It is either @code{k1}
574or @code{k2}. Scheme does not have a fixed order of evaluation of
575arguments, although it does guarantee that they are evaluated in some
576order. However, continuation-passing style makes evaluation order
577explicit. In Guile, this choice is made by the higher-level language
578compilers.
579
580Let us assume a left-to-right evaluation order. In that case the
581continuation of @code{k1} is @code{k2}, and the continuation of
582@code{k2} is @code{k0}.
583
584With this example established, we are ready to give an example of CPS in
585Scheme:
586
587@lisp
588(lambda (ktail)
589 (let ((k1 (lambda ()
590 (let ((k2 (lambda (proc)
591 (let ((k0 (lambda (arg0)
592 (proc k4 arg0))))
593 (k0 "The sum of 32 and 10 is: ")))))
594 (k2 display))))
595 (k4 (lambda _
596 (let ((k5 (lambda (proc)
597 (let ((k3 (lambda (arg0)
598 (proc k7 arg0))))
599 (k3 42)))))
600 (k5 display))))
601 (k7 (lambda _
602 (let ((k6 (lambda (proc)
603 (proc ktail))))
604 (k6 newline)))))
605 (k1))
606@end lisp
607
608Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
609is by nature much more verbose than ``direct-style'' intermediate
610languages like Tree-IL. At the same time, CPS is more simple than full
611Scheme, in the same way that a Turing machine is more simple than
612Scheme, although they are semantically equivalent.
613
614In the original program, the expression labelled @code{k0} is in effect
615context. Any values it returns are ignored. This is reflected in CPS
616by noting that its continuation, @code{k4}, takes any number of values
617and ignores them. Compare this to @code{k2}, which takes a single
618value; in this way we can say that @code{k1} is in a ``value'' context.
619Likewise @code{k6} is in tail context with respect to the expression as
620a whole, because its continuation is the tail continuation,
621@code{ktail}. CPS makes these details manifest, and gives them names.
622
bdad1340
AW
623@subsubheading Compiling CPS
624
625In CPS, there are no nested expressions. Indeed, CPS even removes the
626concept of a stack. All applications in CPS are in tail context. For
627that reason, applications in CPS are jumps, not calls. The @code{(k1)}
628above is nothing more than a @code{goto}. @code{(k3 42)} is a
629@code{goto} with a value. In this way, CPS bridges the gap between the
630lambda calculus and machine instruction sequences.
631
632On the side of machine instructions, Guile does still have a stack, and
633the @code{lambda} forms shown above do not actually result in one
634closure being allocated per subexpression at run-time. Lambda
635expressions introduced by a CPS transformation can always be allocated
636as labels or basic blocks within a function. In fact, we make a
637syntactic distinction between closures and continuations in the CPS
638language, and attempt to transform closures to continuations (basic
639blocks) where possible, via the @dfn{contification} optimization pass.
640
641Values bound by continuations are allocated to stack slots in a
642function's frame. The compiler from CPS only allocates slots to values
643that are actually live; it's possible to have a value in scope but not
644allocated to a slot.
645
58b23156
AW
646@node CPS in Guile
647@subsubsection CPS in Guile
648
bdad1340
AW
649Guile's CPS language is composed of @dfn{terms}, @dfn{expressions},
650and @dfn{continuations}.
58b23156 651
bdad1340
AW
652A term can either evaluate an expression and pass the resulting values
653to some continuation, or it can declare local continuations and contain
654a sub-term in the scope of those continuations.
655
656@deftp {CPS Term} $continue k src exp
657Evaluate the expression @var{exp} and pass the resulting values (if any)
658to the continuation labelled @var{k}. The source information associated
659with the expression may be found in @var{src}, which is either an alist
660as in @code{source-properties} or is @code{#f} if there is no associated
661source.
58b23156
AW
662@end deftp
663
bdad1340
AW
664@deftp {CPS Term} $letk conts body
665Bind @var{conts}, a list of continuations (@code{$cont} instances), in
666the scope of the sub-term @var{body}. The continuations are mutually
667recursive.
58b23156
AW
668@end deftp
669
bdad1340
AW
670Additionally, the early stages of CPS allow for a set of mutually
671recursive functions to be declared as a term. This @code{$letrec} type
672is like Tree-IL's @code{<fix>}. The contification pass will attempt to
673transform the functions declared in a @code{$letrec} into local
674continuations. Any remaining functions are later lowered to @code{$fun}
675expressions.
58b23156 676
bdad1340
AW
677@deftp {CPS Term} $letrec names syms funs body
678Declare the mutually recursive set of functions denoted by @var{names},
679@var{syms}, and @var{funs} within the sub-term @var{body}. @var{names}
680and @var{syms} are lists of symbols, and @var{funs} is a list of
681@var{$fun} values. @var{syms} are globally unique.
58b23156
AW
682@end deftp
683
bdad1340
AW
684Here is an inventory of the kinds of expressions in Guile's CPS
685language. Recall that all expressions are wrapped in a @code{$continue}
686term which specifies their continuation.
58b23156 687
bdad1340
AW
688@deftp {CPS Expression} $void
689Continue with the unspecified value.
58b23156 690@end deftp
bdad1340
AW
691
692@deftp {CPS Expression} $const val
693Continue with the constant value @var{val}.
58b23156 694@end deftp
bdad1340
AW
695
696@deftp {CPS Expression} $prim name
697Continue with the procedure that implements the primitive operation
698named by @var{name}.
58b23156 699@end deftp
bdad1340
AW
700
701@deftp {CPS Expression} $fun src meta free body
702Continue with a procedure. @var{src} identifies the source information
703for the procedure declaration, and @var{meta} is the metadata alist as
704described above in Tree-IL's @code{<lambda>}. @var{free} is a list of
705free variables accessed by the procedure. Early CPS uses an empty list
706for @var{free}; only after closure conversion is it correctly populated.
707Finally, @var{body} is the @code{$kentry} @code{$cont} of the procedure
708entry.
58b23156 709@end deftp
bdad1340
AW
710
711@deftp {CPS Expression} $call proc args
712Call @var{proc} with the arguments @var{args}, and pass all values to
713the continuation. @var{proc} and the elements of the @var{args} list
714should all be variable names. The continuation identified by the term's
715@var{k} should be a @code{$kreceive} or a @code{$ktail} instance.
58b23156 716@end deftp
bdad1340
AW
717
718@deftp {CPS Expression} $primcall name args
719Perform the primitive operation identified by @code{name}, a well-known
720symbol, passing it the arguments @var{args}, and pass all resulting
721values to the continuation. The set of available primitives includes
722all primitives known to Tree-IL and then some more; see the source code
723for details.
58b23156
AW
724@end deftp
725
bdad1340
AW
726@deftp {CPS Expression} $values args
727Pass the values named by the list @var{args} to the continuation.
728@end deftp
58b23156 729
bdad1340
AW
730@deftp {CPS Expression} $prompt escape? tag handler
731Push a prompt on the stack identified by the variable name @var{tag},
732which may be escape-only if @var{escape?} is true, and continue with
733zero values. If the body aborts to this prompt, control will proceed at
734the continuation labelled @var{handler}, which should be a
735@code{$kreceive} continuation. Prompts are later popped by
736@code{pop-prompt} primcalls.
737@end deftp
58b23156 738
bdad1340
AW
739The remaining element of the CPS language in Guile is the continuation.
740In CPS, all continuations have unique labels. Since this aspect is
741common to all continuation types, all continuations are contained in a
742@code{$cont} instance:
58b23156 743
bdad1340
AW
744@deftp {CPS Continuation Wrapper} $cont k cont
745Declare a continuation labelled @var{k}. All references to the
746continuation will use this label.
747@end deftp
58b23156 748
bdad1340
AW
749The most common kind of continuation binds some number of values, and
750then evaluates a sub-term. @code{$kargs} is this kind of simple
751@code{lambda}.
58b23156 752
bdad1340
AW
753@deftp {CPS Continuation} $kargs names syms body
754Bind the incoming values to the variables @var{syms}, with original
755names @var{names}, and then evaluate the sub-term @var{body}.
756@end deftp
58b23156 757
bdad1340
AW
758Variable names (the names in the @var{syms} of a @code{$kargs}) should
759be globally unique, and also disjoint from continuation labels. To bind
760a value to a variable and then evaluate some term, you would continue
761with the value to a @code{$kargs} that declares one variable. The bound
762value would then be available for use within the body of the
763@code{$kargs}.
58b23156 764
bdad1340
AW
765@deftp {CPS Continuation} $kif kt kf
766Receive one value. If it is true for the purposes of Scheme, branch to
767the continuation labelled @var{kt}, passing no values; otherwise, branch
768to @var{kf}.
769@end deftp
770
771For internal reasons, only certain terms may continue to a @code{$kif}.
772Compiling @code{$kif} avoids allocating space for the test variable, so
773it needs to be preceded by expressions that can test-and-branch without
774temporary values. In practice this condition is true for
775@code{$primcall}s to @code{null?}, @code{=}, and similar primitives that
776have corresponding @code{br-if-@var{foo}} VM operations; see the source
777code for full details. When in doubt, bind the test expression to a
778variable, and continue to the @code{$kif} with a @code{$values}
779expression. The optimizer should elide the @code{$values} if it is not
780needed.
58b23156 781
bdad1340
AW
782Calls out to other functions need to be wrapped in a @code{$kreceive}
783continuation in order to adapt the returned values to their uses in the
784calling function, if any.
785
786@deftp {CPS Continuation} $kreceive arity k
787Receive values on the stack. Parse them according to @var{arity}, and
788then proceed with the parsed values to the @var{$kargs} continuation
789labelled @var{k}. As a limitation specific to @code{$kreceive},
790@var{arity} may only contain required and rest arguments.
791@end deftp
792
793@code{$arity} is a helper data structure used by @code{$kreceive} and
794also by @code{$kclause}, described below.
795
796@deftp {CPS Data} $arity req opt rest kw allow-other-keys?
797A data type declaring an arity. @var{req} and @var{opt} are lists of
798source names of required and optional arguments, respectively.
799@var{rest} is either the source name of the rest variable, or @code{#f}
800if this arity does not accept additional values. @var{kw} is a list of
801the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing
802the keyword arguments. @var{allow-other-keys?} is true if other keyword
803arguments are allowed and false otherwise.
804
805Note that all of these names with the exception of the @var{var}s in the
806@var{kw} list are source names, not unique variable names.
807@end deftp
808
809Additionally, there are three specific kinds of continuations that can
810only be declared at function entries.
811
812@deftp {CPS Continuation} $kentry self tail clauses
813Declare a function entry. @var{self} is a variable bound to the
814procedure being called, and which may be used for self-references.
815@var{tail} declares the @code{$cont} wrapping the @code{$ktail} for this
816function, corresponding to the function's tail continuation.
817@var{clauses} is a list of @code{$kclause} @code{$cont} instances.
818@end deftp
819
820@deftp {CPS Continuation} $ktail
821A tail continuation.
822@end deftp
823
824@deftp {CPS Continuation} $kclause arity cont
825A clause of a function with a given arity. Applications of a function
826with a compatible set of actual arguments will continue to @var{cont}, a
827@code{$kargs} @code{$cont} instance representing the clause body.
828@end deftp
829
830
831@node Building CPS
832@subsubsection Building CPS
833
834Unlike Tree-IL, the CPS language is built to be constructed and
835deconstructed with abstract macros instead of via procedural
836constructors or accessors, or instead of S-expression matching.
837
838Deconstruction and matching is handled adequately by the @code{match}
839form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction
840is handled by a set of mutually recursive builder macros:
841@code{build-cps-term}, @code{build-cps-cont}, and @code{build-cps-exp}.
842
843In the following interface definitions, consider variables containing
844@code{cont} to be recursively build by @code{build-cps-cont}, and
845likewise for @code{term} and @code{exp}. Consider any other name to be
846evaluated as a Scheme expression. Many of these forms recognize
847@code{unquote} in some contexts, to splice in a previously-built value;
848see the specifications below for full details.
849
850@deffn {Scheme Syntax} build-cps-term ,val
851@deffnx {Scheme Syntax} build-cps-term ($letk (cont ...) term)
852@deffnx {Scheme Syntax} build-cps-term ($letrec names syms funs term)
853@deffnx {Scheme Syntax} build-cps-term ($continue k src exp)
854@deffnx {Scheme Syntax} build-cps-exp ,val
855@deffnx {Scheme Syntax} build-cps-exp ($void)
856@deffnx {Scheme Syntax} build-cps-exp ($const val)
857@deffnx {Scheme Syntax} build-cps-exp ($prim name)
858@deffnx {Scheme Syntax} build-cps-exp ($fun src meta free body)
859@deffnx {Scheme Syntax} build-cps-exp ($call proc (arg ...))
860@deffnx {Scheme Syntax} build-cps-exp ($call proc args)
861@deffnx {Scheme Syntax} build-cps-exp ($primcall name (arg ...))
862@deffnx {Scheme Syntax} build-cps-exp ($primcall name args)
863@deffnx {Scheme Syntax} build-cps-exp ($values (arg ...))
864@deffnx {Scheme Syntax} build-cps-exp ($values args)
865@deffnx {Scheme Syntax} build-cps-exp ($prompt escape? tag handler)
866@deffnx {Scheme Syntax} build-cps-cont ,val
867@deffnx {Scheme Syntax} build-cps-cont (k ($kargs (name ...) (sym ...) term))
868@deffnx {Scheme Syntax} build-cps-cont (k ($kargs names syms term))
869@deffnx {Scheme Syntax} build-cps-cont (k ($kif kt kf))
870@deffnx {Scheme Syntax} build-cps-cont (k ($kreceive req rest kargs))
871@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont ,clauses))
872@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont (cont ...)))
873@deffnx {Scheme Syntax} build-cps-cont (k ($kclause ,arity cont))
874@deffnx {Scheme Syntax} build-cps-cont (k ($kclause (req opt rest kw aok?) cont))
875Construct a CPS term, expression, or continuation.
876@end deffn
58b23156 877
bdad1340 878There are a few more miscellaneous interfaces as well.
58b23156 879
bdad1340
AW
880@deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords?
881A procedural constructor for @code{$arity} objects.
882@end deffn
58b23156 883
bdad1340
AW
884@deffn {Scheme Syntax} let-gensyms (sym ...) body ...
885Bind @var{sym...} to fresh names, and evaluate @var{body...}.
886@end deffn
58b23156 887
bdad1340
AW
888@deffn {Scheme Syntax} rewrite-cps-term val (pat term) ...
889@deffnx {Scheme Syntax} rewrite-cps-exp val (pat exp) ...
890@deffnx {Scheme Syntax} rewrite-cps-cont val (pat cont) ...
891Match @var{val} against the series of patterns @var{pat...}, using
892@code{match}. The body of the matching clause should be a template in
893the syntax of @code{build-cps-term}, @code{build-cps-exp}, or
894@code{build-cps-cont}, respectively.
895@end deffn
73643339 896
67915ab0
AW
897@node Bytecode
898@subsection Bytecode
73643339 899
18e11135 900@xref{Object File Format}.
73643339 901
18e11135 902TODO: document (system vm loader)
e3ba263d 903
b8bc86bc
AW
904@deffn {Scheme Variable} load-thunk-from-file file
905@deffnx {C Function} scm_load_thunk_from_file (file)
ff73ae34
AW
906Load object code from a file named @var{file}. The file will be mapped
907into memory via @code{mmap}, so this is a very fast operation.
e3ba263d 908
b8bc86bc
AW
909On disk, object code is embedded in ELF, a flexible container format
910created for use in UNIX systems. Guile has its own ELF linker and
911loader, so it uses the ELF format on all systems.
73643339
AW
912@end deffn
913
18e11135 914TODO: document load-thunk-from-memory
ff73ae34
AW
915
916Compiling object code to the fake language, @code{value}, is performed
917via loading objcode into a program, then executing that thunk with
918respect to the compilation environment. Normally the environment
919propagates through the compiler transparently, but users may specify
41e64dd7 920the compilation environment manually as well, as a module.
ff73ae34 921
c850030f 922
e63d888e
DK
923@node Writing New High-Level Languages
924@subsection Writing New High-Level Languages
925
926In order to integrate a new language @var{lang} into Guile's compiler
927system, one has to create the module @code{(language @var{lang} spec)}
928containing the language definition and referencing the parser,
929compiler and other routines processing it. The module hierarchy in
930@code{(language brainfuck)} defines a very basic Brainfuck
931implementation meant to serve as easy-to-understand example on how to
4e432dab
AW
932do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
933for more information about the Brainfuck language itself.
934
e63d888e 935
ff73ae34
AW
936@node Extending the Compiler
937@subsection Extending the Compiler
e3ba263d 938
dd6e37d0
NJ
939At this point we take a detour from the impersonal tone of the rest of
940the manual. Admit it: if you've read this far into the compiler
941internals manual, you are a junkie. Perhaps a course at your university
942left you unsated, or perhaps you've always harbored a desire to hack the
943holy of computer science holies: a compiler. Well you're in good
944company, and in a good position. Guile's compiler needs your help.
ff73ae34
AW
945
946There are many possible avenues for improving Guile's compiler.
947Probably the most important improvement, speed-wise, will be some form
948of native compilation, both just-in-time and ahead-of-time. This could
949be done in many ways. Probably the easiest strategy would be to extend
950the compiled procedure structure to include a pointer to a native code
86872cc3 951vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
952procedure is called a certain number of times.
953
954The name of the game is a profiling-based harvest of the low-hanging
955fruit, running programs of interest under a system-level profiler and
956determining which improvements would give the most bang for the buck.
98850fd7
AW
957It's really getting to the point though that native compilation is the
958next step.
ff73ae34
AW
959
960The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
961it knows to also understand R6RS, and adding new high-level compilers.
962We have JavaScript and Emacs Lisp mostly complete, but they could use
ecb87335 963some love; Lua would be nice as well, but whatever language it is
98850fd7
AW
964that strikes your fancy would be welcome too.
965
966Compilers are for hacking, not for admiring or for complaining about.
967Get to it!