Beginnings of CPS section in manual
[bpt/guile.git] / doc / ref / compiler.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
9
10 Compilers! The word itself inspires excitement and awe, even among
11 experienced practitioners. But a compiler is just a program: an
12 eminently hackable thing. This section aims to to describe Guile's
13 compiler in such a way that interested Scheme hackers can feel
14 comfortable reading and extending it.
15
16 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
17 know how to compile your @code{.scm} file.
18
19 @menu
20 * Compiler Tower::
21 * The Scheme Compiler::
22 * Tree-IL::
23 * Continuation-Passing Style::
24 * Bytecode::
25 * Writing New High-Level Languages::
26 * Extending the Compiler::
27 @end menu
28
29 @node Compiler Tower
30 @subsection Compiler Tower
31
32 Guile's compiler is quite simple -- its @emph{compilers}, to put it more
33 accurately. Guile defines a tower of languages, starting at Scheme and
34 progressively simplifying down to languages that resemble the VM
35 instruction set (@pxref{Instruction Set}).
36
37 Each language knows how to compile to the next, so each step is simple
38 and understandable. Furthermore, this set of languages is not hardcoded
39 into Guile, so it is possible for the user to add new high-level
40 languages, new passes, or even different compilation targets.
41
42 Languages are registered in the module, @code{(system base language)}:
43
44 @example
45 (use-modules (system base language))
46 @end example
47
48 They are registered with the @code{define-language} form.
49
50 @deffn {Scheme Syntax} define-language @
51 [#:name] [#:title] [#:reader] [#:printer] @
52 [#:parser=#f] [#:compilers='()] @
53 [#:decompilers='()] [#:evaluator=#f] @
54 [#:joiner=#f] [#:for-humans?=#t] @
55 [#:make-default-environment=make-fresh-user-module]
56 Define a language.
57
58 This syntax defines a @code{<language>} object, bound to @var{name} in
59 the current environment. In addition, the language will be added to the
60 global language set. For example, this is the language definition for
61 Scheme:
62
63 @example
64 (define-language scheme
65 #:title "Scheme"
66 #:reader (lambda (port env) ...)
67 #:compilers `((tree-il . ,compile-tree-il))
68 #:decompilers `((tree-il . ,decompile-tree-il))
69 #:evaluator (lambda (x module) (primitive-eval x))
70 #:printer write
71 #:make-default-environment (lambda () ...))
72 @end example
73 @end deffn
74
75 The interesting thing about having languages defined this way is that
76 they present a uniform interface to the read-eval-print loop. This
77 allows the user to change the current language of the REPL:
78
79 @example
80 scheme@@(guile-user)> ,language tree-il
81 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
82 tree-il@@(guile-user)> ,L scheme
83 Happy hacking with Scheme! To switch back, type `,L tree-il'.
84 scheme@@(guile-user)>
85 @end example
86
87 Languages can be looked up by name, as they were above.
88
89 @deffn {Scheme Procedure} lookup-language name
90 Looks up a language named @var{name}, autoloading it if necessary.
91
92 Languages are autoloaded by looking for a variable named @var{name} in
93 a module named @code{(language @var{name} spec)}.
94
95 The language object will be returned, or @code{#f} if there does not
96 exist a language with that name.
97 @end deffn
98
99 Defining languages this way allows us to programmatically determine
100 the necessary steps for compiling code from one language to another.
101
102 @deffn {Scheme Procedure} lookup-compilation-order from to
103 Recursively traverses the set of languages to which @var{from} can
104 compile, depth-first, and return the first path that can transform
105 @var{from} to @var{to}. Returns @code{#f} if no path is found.
106
107 This function memoizes its results in a cache that is invalidated by
108 subsequent calls to @code{define-language}, so it should be quite
109 fast.
110 @end deffn
111
112 There is a notion of a ``current language'', which is maintained in the
113 @code{current-language} parameter, defined in the core @code{(guile)}
114 module. This language is normally Scheme, and may be rebound by the
115 user. The run-time compilation interfaces
116 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
117 and target languages.
118
119 The normal tower of languages when compiling Scheme goes like this:
120
121 @itemize
122 @item Scheme
123 @item Tree Intermediate Language (Tree-IL)
124 @item Continuation-Passing Style (CPS)
125 @item Bytecode
126 @end itemize
127
128 As discussed before (@pxref{Object File Format}), bytecode is in ELF
129 format, ready to be serialized to disk. But when compiling Scheme at
130 run time, you want a Scheme value: for example, a compiled procedure.
131 For this reason, so as not to break the abstraction, Guile defines a
132 fake language at the bottom of the tower:
133
134 @itemize
135 @item Value
136 @end itemize
137
138 Compiling to @code{value} loads the bytecode into a procedure, turning
139 cold bytes into warm code.
140
141 Perhaps this strangeness can be explained by example:
142 @code{compile-file} defaults to compiling to bytecode, because it
143 produces object code that has to live in the barren world outside the
144 Guile runtime; but @code{compile} defaults to compiling to @code{value},
145 as its product re-enters the Guile world.
146
147 @c FIXME: This doesn't work anymore :( Should we add some kind of
148 @c special GC pass, or disclaim this kind of code, or what?
149
150 Indeed, the process of compilation can circulate through these
151 different worlds indefinitely, as shown by the following quine:
152
153 @example
154 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
155 @end example
156
157 @node The Scheme Compiler
158 @subsection The Scheme Compiler
159
160 The job of the Scheme compiler is to expand all macros and all of Scheme
161 to its most primitive expressions. The definition of ``primitive
162 expression'' is given by the inventory of constructs provided by
163 Tree-IL, the target language of the Scheme compiler: procedure calls,
164 conditionals, lexical references, and so on. This is described more
165 fully in the next section.
166
167 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
168 that it is completely implemented by the macro expander. Since the
169 macro expander has to run over all of the source code already in order
170 to expand macros, it might as well do the analysis at the same time,
171 producing Tree-IL expressions directly.
172
173 Because this compiler is actually the macro expander, it is extensible.
174 Any macro which the user writes becomes part of the compiler.
175
176 The Scheme-to-Tree-IL expander may be invoked using the generic
177 @code{compile} procedure:
178
179 @lisp
180 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
181 @result{}
182 #<tree-il (call (toplevel +) (const 1) (const 2))>
183 @end lisp
184
185 @code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
186 equivalent to calling the macro expander as @code{(macroexpand @var{foo}
187 'c '(compile load eval))}. @xref{Macro Expansion}.
188 @code{compile-tree-il}, the procedure dispatched by @code{compile} to
189 @code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
190 it conform to the general form of compiler procedures in Guile's
191 language tower.
192
193 Compiler procedures take three arguments: an expression, an
194 environment, and a keyword list of options. They return three values:
195 the compiled expression, the corresponding environment for the target
196 language, and a ``continuation environment''. The compiled expression
197 and environment will serve as input to the next language's compiler.
198 The ``continuation environment'' can be used to compile another
199 expression from the same source language within the same module.
200
201 For example, you might compile the expression, @code{(define-module
202 (foo))}. This will result in a Tree-IL expression and environment. But
203 if you compiled a second expression, you would want to take into
204 account the compile-time effect of compiling the previous expression,
205 which puts the user in the @code{(foo)} module. That is purpose of the
206 ``continuation environment''; you would pass it as the environment
207 when compiling the subsequent expression.
208
209 For Scheme, an environment is a module. By default, the @code{compile}
210 and @code{compile-file} procedures compile in a fresh module, such
211 that bindings and macros introduced by the expression being compiled
212 are isolated:
213
214 @example
215 (eq? (current-module) (compile '(current-module)))
216 @result{} #f
217
218 (compile '(define hello 'world))
219 (defined? 'hello)
220 @result{} #f
221
222 (define / *)
223 (eq? (compile '/) /)
224 @result{} #f
225 @end example
226
227 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
228 @code{current-reader}}) are isolated:
229
230 @example
231 (compile '(fluid-set! current-reader (lambda args 'fail)))
232 (fluid-ref current-reader)
233 @result{} #f
234 @end example
235
236 Nevertheless, having the compiler and @dfn{compilee} share the same name
237 space can be achieved by explicitly passing @code{(current-module)} as
238 the compilation environment:
239
240 @example
241 (define hello 'world)
242 (compile 'hello #:env (current-module))
243 @result{} world
244 @end example
245
246 @node Tree-IL
247 @subsection Tree-IL
248
249 Tree Intermediate Language (Tree-IL) is a structured intermediate
250 language that is close in expressive power to Scheme. It is an
251 expanded, pre-analyzed Scheme.
252
253 Tree-IL is ``structured'' in the sense that its representation is
254 based on records, not S-expressions. This gives a rigidity to the
255 language that ensures that compiling to a lower-level language only
256 requires a limited set of transformations. For example, the Tree-IL
257 type @code{<const>} is a record type with two fields, @code{src} and
258 @code{exp}. Instances of this type are created via @code{make-const}.
259 Fields of this type are accessed via the @code{const-src} and
260 @code{const-exp} procedures. There is also a predicate, @code{const?}.
261 @xref{Records}, for more information on records.
262
263 @c alpha renaming
264
265 All Tree-IL types have a @code{src} slot, which holds source location
266 information for the expression. This information, if present, will be
267 residualized into the compiled object code, allowing backtraces to
268 show source information. The format of @code{src} is the same as that
269 returned by Guile's @code{source-properties} function. @xref{Source
270 Properties}, for more information.
271
272 Although Tree-IL objects are represented internally using records,
273 there is also an equivalent S-expression external representation for
274 each kind of Tree-IL. For example, the S-expression representation
275 of @code{#<const src: #f exp: 3>} expression would be:
276
277 @example
278 (const 3)
279 @end example
280
281 Users may program with this format directly at the REPL:
282
283 @example
284 scheme@@(guile-user)> ,language tree-il
285 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
286 tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
287 @result{} 42
288 @end example
289
290 The @code{src} fields are left out of the external representation.
291
292 One may create Tree-IL objects from their external representations via
293 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
294 information is attached to the input S-expression, it will be
295 propagated to the resulting Tree-IL expressions. This is probably the
296 easiest way to compile to Tree-IL: just make the appropriate external
297 representations in S-expression format, and let @code{parse-tree-il}
298 take care of the rest.
299
300 @deftp {Scheme Variable} <void> src
301 @deftpx {External Representation} (void)
302 An empty expression. In practice, equivalent to Scheme's @code{(if #f
303 #f)}.
304 @end deftp
305
306 @deftp {Scheme Variable} <const> src exp
307 @deftpx {External Representation} (const @var{exp})
308 A constant.
309 @end deftp
310
311 @deftp {Scheme Variable} <primitive-ref> src name
312 @deftpx {External Representation} (primitive @var{name})
313 A reference to a ``primitive''. A primitive is a procedure that, when
314 compiled, may be open-coded. For example, @code{cons} is usually
315 recognized as a primitive, so that it compiles down to a single
316 instruction.
317
318 Compilation of Tree-IL usually begins with a pass that resolves some
319 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
320 @code{<primitive-ref>} expressions. The actual compilation pass has
321 special cases for calls to certain primitives, like @code{apply} or
322 @code{cons}.
323 @end deftp
324
325 @deftp {Scheme Variable} <lexical-ref> src name gensym
326 @deftpx {External Representation} (lexical @var{name} @var{gensym})
327 A reference to a lexically-bound variable. The @var{name} is the
328 original name of the variable in the source program. @var{gensym} is a
329 unique identifier for this variable.
330 @end deftp
331
332 @deftp {Scheme Variable} <lexical-set> src name gensym exp
333 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
334 Sets a lexically-bound variable.
335 @end deftp
336
337 @deftp {Scheme Variable} <module-ref> src mod name public?
338 @deftpx {External Representation} (@@ @var{mod} @var{name})
339 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
340 A reference to a variable in a specific module. @var{mod} should be
341 the name of the module, e.g.@: @code{(guile-user)}.
342
343 If @var{public?} is true, the variable named @var{name} will be looked
344 up in @var{mod}'s public interface, and serialized with @code{@@};
345 otherwise it will be looked up among the module's private bindings,
346 and is serialized with @code{@@@@}.
347 @end deftp
348
349 @deftp {Scheme Variable} <module-set> src mod name public? exp
350 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
351 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
352 Sets a variable in a specific module.
353 @end deftp
354
355 @deftp {Scheme Variable} <toplevel-ref> src name
356 @deftpx {External Representation} (toplevel @var{name})
357 References a variable from the current procedure's module.
358 @end deftp
359
360 @deftp {Scheme Variable} <toplevel-set> src name exp
361 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
362 Sets a variable in the current procedure's module.
363 @end deftp
364
365 @deftp {Scheme Variable} <toplevel-define> src name exp
366 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
367 Defines a new top-level variable in the current procedure's module.
368 @end deftp
369
370 @deftp {Scheme Variable} <conditional> src test then else
371 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
372 A conditional. Note that @var{else} is not optional.
373 @end deftp
374
375 @deftp {Scheme Variable} <call> src proc args
376 @deftpx {External Representation} (call @var{proc} . @var{args})
377 A procedure call.
378 @end deftp
379
380 @deftp {Scheme Variable} <primcall> src name args
381 @deftpx {External Representation} (primcall @var{name} . @var{args})
382 A call to a primitive. Equivalent to @code{(call (primitive @var{name})
383 . @var{args})}. This construct is often more convenient to generate and
384 analyze than @code{<call>}.
385
386 As part of the compilation process, instances of @code{(call (primitive
387 @var{name}) . @var{args})} are transformed into primcalls.
388 @end deftp
389
390 @deftp {Scheme Variable} <seq> src head tail
391 @deftpx {External Representation} (seq @var{head} @var{tail})
392 A sequence. The semantics is that @var{head} is evaluated first, and
393 any resulting values are ignored. Then @var{tail} is evaluated, in tail
394 position.
395 @end deftp
396
397 @deftp {Scheme Variable} <lambda> src meta body
398 @deftpx {External Representation} (lambda @var{meta} @var{body})
399 A closure. @var{meta} is an association list of properties for the
400 procedure. @var{body} is a single Tree-IL expression of type
401 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
402 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
403 expressiveness of Scheme's @code{case-lambda}.
404 @end deftp
405
406 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
407 @deftpx {External Representation} @
408 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
409 @var{body})@
410 [@var{alternate}])
411 One clause of a @code{case-lambda}. A @code{lambda} expression in
412 Scheme is treated as a @code{case-lambda} with one clause.
413
414 @var{req} is a list of the procedure's required arguments, as symbols.
415 @var{opt} is a list of the optional arguments, or @code{#f} if there
416 are no optional arguments. @var{rest} is the name of the rest
417 argument, or @code{#f}.
418
419 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
420 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
421 keyword corresponding to the argument named @var{name}, and whose
422 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
423 corresponding to all of the optional and keyword arguments, evaluated to
424 bind variables whose value is not supplied by the procedure caller.
425 Each @var{init} expression is evaluated in the lexical context of
426 previously bound variables, from left to right.
427
428 @var{gensyms} is a list of gensyms corresponding to all arguments:
429 first all of the required arguments, then the optional arguments if
430 any, then the rest argument if any, then all of the keyword arguments.
431
432 @var{body} is the body of the clause. If the procedure is called with
433 an appropriate number of arguments, @var{body} is evaluated in tail
434 position. Otherwise, if there is an @var{alternate}, it should be a
435 @code{<lambda-case>} expression, representing the next clause to try.
436 If there is no @var{alternate}, a wrong-number-of-arguments error is
437 signaled.
438 @end deftp
439
440 @deftp {Scheme Variable} <let> src names gensyms vals exp
441 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
442 Lexical binding, like Scheme's @code{let}. @var{names} are the original
443 binding names, @var{gensyms} are gensyms corresponding to the
444 @var{names}, and @var{vals} are Tree-IL expressions for the values.
445 @var{exp} is a single Tree-IL expression.
446 @end deftp
447
448 @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
449 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
450 @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
451 A version of @code{<let>} that creates recursive bindings, like
452 Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
453 @end deftp
454
455 @deftp {Scheme Variable} <prompt> escape-only? tag body handler
456 @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
457 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
458 during the dynamic extent of the execution of @var{body}, also an
459 expression. If an abort occurs to this prompt, control will be passed
460 to @var{handler}, also an expression, which should be a procedure. The
461 first argument to the handler procedure will be the captured
462 continuation, followed by all of the values passed to the abort. If
463 @var{escape-only?} is true, the handler should be a @code{<lambda>} with
464 a single @code{<lambda-case>} body expression with no optional or
465 keyword arguments, and no alternate, and whose first argument is
466 unreferenced. @xref{Prompts}, for more information.
467 @end deftp
468
469 @deftp {Scheme Variable} <abort> tag args tail
470 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
471 An abort to the nearest prompt with the name @var{tag}, an expression.
472 @var{args} should be a list of expressions to pass to the prompt's
473 handler, and @var{tail} should be an expression that will evaluate to
474 a list of additional arguments. An abort will save the partial
475 continuation, which may later be reinstated, resulting in the
476 @code{<abort>} expression evaluating to some number of values.
477 @end deftp
478
479 There are two Tree-IL constructs that are not normally produced by
480 higher-level compilers, but instead are generated during the
481 source-to-source optimization and analysis passes that the Tree-IL
482 compiler does. Users should not generate these expressions directly,
483 unless they feel very clever, as the default analysis pass will generate
484 them as necessary.
485
486 @deftp {Scheme Variable} <let-values> src names gensyms exp body
487 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
488 Like Scheme's @code{receive} -- binds the values returned by
489 evaluating @code{exp} to the @code{lambda}-like bindings described by
490 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
491
492 @code{<let-values>} is an optimization of a @code{<call>} to the
493 primitive, @code{call-with-values}.
494 @end deftp
495
496 @deftp {Scheme Variable} <fix> src names gensyms vals body
497 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
498 Like @code{<letrec>}, but only for @var{vals} that are unset
499 @code{lambda} expressions.
500
501 @code{fix} is an optimization of @code{letrec} (and @code{let}).
502 @end deftp
503
504 Tree-IL is a convenient compilation target from source languages. It
505 can be convenient as a medium for optimization, though CPS is usually
506 better. The strength of Tree-IL is that it does not fix order of
507 evaluation, so it makes some code motion a bit easier.
508
509 Optimization passes performed on Tree-IL currently include:
510
511 @itemize
512 @item Open-coding (turning toplevel-refs into primitive-refs,
513 and calls to primitives to primcalls)
514 @item Partial evaluation (comprising inlining, copy propagation, and
515 constant folding)
516 @item Common subexpression elimination (CSE)
517 @end itemize
518
519 In the future, we will move the CSE pass to operate over the lower-level
520 CPS language.
521
522 @node Continuation-Passing Style
523 @subsection Continuation-Passing Style
524
525 @cindex CPS
526 Continuation-passing style (CPS) is Guile's principal intermediate
527 language, bridging the gap between languages for people and languages
528 for machines. CPS gives a name to every part of a program: every
529 control point, and every intermediate value. This makes it an excellent
530 medium for reasoning about programs, which is the principal job of a
531 compiler.
532
533 @menu
534 * An Introduction to CPS::
535 * CPS in Guile::
536 * Compiling CPS::
537 @end menu
538
539 @node An Introduction to CPS
540 @subsubsection An Introduction to CPS
541
542 As an example, consider the following Scheme expression:
543
544 @lisp
545 (begin
546 (display "The sum of 32 and 10 is: ")
547 (display 42)
548 (newline))
549 @end lisp
550
551 Let us identify all of the sub-expressions in this expression. We give
552 them unique labels, like @var{k1}, and annotate the original source
553 code:
554
555 @lisp
556 (begin
557 (display "The sum of 32 and 10 is: ")
558 |k1 k2
559 k0
560 (display 42)
561 |k4 k5
562 k3
563 (newline))
564 |k7
565 k6
566 @end lisp
567
568 These labels also identify continuations. For example, the continuation
569 of @code{k7} is @code{k6}. This is because after evaluating the value
570 of @code{newline}, performed by the expression labelled @code{k7}, we
571 continue to apply it in @var{k6}.
572
573 Which label has @code{k0} as its continuation? It is either @code{k1}
574 or @code{k2}. Scheme does not have a fixed order of evaluation of
575 arguments, although it does guarantee that they are evaluated in some
576 order. However, continuation-passing style makes evaluation order
577 explicit. In Guile, this choice is made by the higher-level language
578 compilers.
579
580 Let us assume a left-to-right evaluation order. In that case the
581 continuation of @code{k1} is @code{k2}, and the continuation of
582 @code{k2} is @code{k0}.
583
584 With this example established, we are ready to give an example of CPS in
585 Scheme:
586
587 @lisp
588 (lambda (ktail)
589 (let ((k1 (lambda ()
590 (let ((k2 (lambda (proc)
591 (let ((k0 (lambda (arg0)
592 (proc k4 arg0))))
593 (k0 "The sum of 32 and 10 is: ")))))
594 (k2 display))))
595 (k4 (lambda _
596 (let ((k5 (lambda (proc)
597 (let ((k3 (lambda (arg0)
598 (proc k7 arg0))))
599 (k3 42)))))
600 (k5 display))))
601 (k7 (lambda _
602 (let ((k6 (lambda (proc)
603 (proc ktail))))
604 (k6 newline)))))
605 (k1))
606 @end lisp
607
608 Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
609 is by nature much more verbose than ``direct-style'' intermediate
610 languages like Tree-IL. At the same time, CPS is more simple than full
611 Scheme, in the same way that a Turing machine is more simple than
612 Scheme, although they are semantically equivalent.
613
614 In the original program, the expression labelled @code{k0} is in effect
615 context. Any values it returns are ignored. This is reflected in CPS
616 by noting that its continuation, @code{k4}, takes any number of values
617 and ignores them. Compare this to @code{k2}, which takes a single
618 value; in this way we can say that @code{k1} is in a ``value'' context.
619 Likewise @code{k6} is in tail context with respect to the expression as
620 a whole, because its continuation is the tail continuation,
621 @code{ktail}. CPS makes these details manifest, and gives them names.
622
623 @node CPS in Guile
624 @subsubsection CPS in Guile
625
626 Like Tree-IL, CPS is also a structured language, implemented with
627 records not S-expressions.
628
629 @deftp {Scheme Variable} <prompt> escape-only? tag body handler
630 @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
631 @end deftp
632
633 @deftp {Scheme Variable} $arity req opt rest kw allow-other-keys?
634 @end deftp
635
636
637 @deftp {Scheme Variable} $letk conts body
638 @deftpx {External Representation} (letk @var{conts} @var{body})
639 @end deftp
640 @deftp {Scheme Variable} $continue k src exp
641 @deftpx {External Representation} (continue @var{k} @var{src} @var{exp})
642 @end deftp
643 @deftp {Scheme Variable} $letrec names syms funs body
644 @deftpx {External Representation} (letrec @var{names} @var{syms} @var{funs} @var{body})
645 @end deftp
646
647 ;; Continuations
648 @deftp {Scheme Variable} $cont k cont
649 @deftpx {External Representation} (cont k cont)
650 @end deftp
651 @deftp {Scheme Variable} $kif kt kf
652 @deftpx {External Representation} (kif kt kf)
653 @end deftp
654 @deftp {Scheme Variable} $ktrunc arity k
655 @deftpx {External Representation} (ktrunc arity k)
656 @end deftp
657 @deftp {Scheme Variable} $kargs names syms body
658 @deftpx {External Representation} (kargs names syms body)
659 @end deftp
660 @deftp {Scheme Variable} $kentry self tail clauses
661 @deftpx {External Representation} (kentry self tail clauses)
662 @end deftp
663 @deftp {Scheme Variable} $ktail
664 @deftpx {External Representation} (ktail)
665 @end deftp
666 @deftp {Scheme Variable} $kclause arity cont
667 @deftpx {External Representation} (kclause arity cont)
668 @end deftp
669
670 ;; Expressions.
671 @deftp {Scheme Variable} $void
672 @deftpx {External Representation} (void)
673 @end deftp
674 @deftp {Scheme Variable} $const val
675 @deftpx {External Representation} (const val)
676 @end deftp
677 @deftp {Scheme Variable} $prim name
678 @deftpx {External Representation} (prim name)
679 @end deftp
680 @deftp {Scheme Variable} $fun src meta free body
681 @deftpx {External Representation} (fun src meta free body)
682 @end deftp
683 @deftp {Scheme Variable} $call proc args
684 @deftpx {External Representation} (call proc args)
685 @end deftp
686 @deftp {Scheme Variable} $primcall name args
687 @deftpx {External Representation} (primcall name args)
688 @end deftp
689 @deftp {Scheme Variable} $values args
690 @deftpx {External Representation} (values args)
691 @end deftp
692 @deftp {Scheme Variable} $prompt escape? tag handler pop
693 @deftpx {External Representation} (prompt escape? tag handler pop)
694 @end deftp
695
696 ;; Helper.
697 $arity
698 make-$arity
699
700 ;; Terms.
701 $letk $continue $letrec
702
703 ;; Continuations.
704 $cont
705
706 ;; Continuation bodies.
707 $kif $ktrunc $kargs $kentry $ktail $kclause
708
709 ;; Expressions.
710 $void $const $prim $fun $call $primcall $values $prompt
711
712 ;; Building macros.
713 let-gensyms
714 build-cps-term build-cps-cont build-cps-exp
715 rewrite-cps-term rewrite-cps-cont rewrite-cps-exp
716
717 ;; Misc.
718 parse-cps unparse-cps
719 fold-conts fold-local-conts
720
721 cwcc
722
723 records, unlike early cps (rabbit, orbit)
724
725 @node Compiling CPS
726 @subsubsection Compiling CPS
727
728 In CPS, there are no nested expressions. Indeed, CPS even removes the
729 concept of a stack. All applications in CPS are in tail context. For
730 that reason, applications in CPS are jumps, not calls. The @code{(k1)}
731 above is nothing more than a @code{goto}. @code{(k3 42)} is a
732 @code{goto} with a value. In this way, CPS bridges the gap between the
733 lambda calculus and machine instruction sequences.
734
735 On the side of machine instructions, Guile does still have a stack, and
736 the @code{lambda} forms shown above do not actually result in one
737 closure being allocated per subexpression at run-time. Lambda
738 expressions introduced by a CPS transformation can always be allocated
739 as labels or basic blocks within a function. In fact, we make a
740 syntactic distinction between closures and continuations in the CPS
741 language, and attempt to transform closures to continuations (basic
742 blocks) where possible, via the @dfn{contification} optimization pass.
743
744 Values bound by continuations are allocated to stack slots in a
745 function's frame. The compiler from CPS only allocates slots to values
746 that are actually live; it's possible to have a value in scope but not
747 allocated to a slot.
748
749
750 @node Bytecode
751 @subsection Bytecode
752
753 Blah blah ...
754
755 @xref{Object File Format}
756
757 (system vm loader)
758
759 @deffn {Scheme Variable} load-thunk-from-file file
760 @deffnx {C Function} scm_load_thunk_from_file (file)
761 Load object code from a file named @var{file}. The file will be mapped
762 into memory via @code{mmap}, so this is a very fast operation.
763
764 On disk, object code is embedded in ELF, a flexible container format
765 created for use in UNIX systems. Guile has its own ELF linker and
766 loader, so it uses the ELF format on all systems.
767 @end deffn
768
769 likewise load-thunk-from-memory
770
771 Compiling object code to the fake language, @code{value}, is performed
772 via loading objcode into a program, then executing that thunk with
773 respect to the compilation environment. Normally the environment
774 propagates through the compiler transparently, but users may specify
775 the compilation environment manually as well, as a module.
776
777
778 @node Writing New High-Level Languages
779 @subsection Writing New High-Level Languages
780
781 In order to integrate a new language @var{lang} into Guile's compiler
782 system, one has to create the module @code{(language @var{lang} spec)}
783 containing the language definition and referencing the parser,
784 compiler and other routines processing it. The module hierarchy in
785 @code{(language brainfuck)} defines a very basic Brainfuck
786 implementation meant to serve as easy-to-understand example on how to
787 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
788 for more information about the Brainfuck language itself.
789
790
791 @node Extending the Compiler
792 @subsection Extending the Compiler
793
794 At this point we take a detour from the impersonal tone of the rest of
795 the manual. Admit it: if you've read this far into the compiler
796 internals manual, you are a junkie. Perhaps a course at your university
797 left you unsated, or perhaps you've always harbored a desire to hack the
798 holy of computer science holies: a compiler. Well you're in good
799 company, and in a good position. Guile's compiler needs your help.
800
801 There are many possible avenues for improving Guile's compiler.
802 Probably the most important improvement, speed-wise, will be some form
803 of native compilation, both just-in-time and ahead-of-time. This could
804 be done in many ways. Probably the easiest strategy would be to extend
805 the compiled procedure structure to include a pointer to a native code
806 vector, and compile from bytecode to native code at run-time after a
807 procedure is called a certain number of times.
808
809 The name of the game is a profiling-based harvest of the low-hanging
810 fruit, running programs of interest under a system-level profiler and
811 determining which improvements would give the most bang for the buck.
812 It's really getting to the point though that native compilation is the
813 next step.
814
815 The compiler also needs help at the top end, enhancing the Scheme that
816 it knows to also understand R6RS, and adding new high-level compilers.
817 We have JavaScript and Emacs Lisp mostly complete, but they could use
818 some love; Lua would be nice as well, but whatever language it is
819 that strikes your fancy would be welcome too.
820
821 Compilers are for hacking, not for admiring or for complaining about.
822 Get to it!