2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
10 Compilers! The word itself inspires excitement and awe, even among
11 experienced practitioners. But a compiler is just a program: an
12 eminently hackable thing. This section aims to to describe Guile's
13 compiler in such a way that interested Scheme hackers can feel
14 comfortable reading and extending it.
16 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
17 know how to compile your @code{.scm} file.
21 * The Scheme Compiler::
23 * Continuation-Passing Style::
25 * Writing New High-Level Languages::
26 * Extending the Compiler::
30 @subsection Compiler Tower
32 Guile's compiler is quite simple -- its @emph{compilers}, to put it more
33 accurately. Guile defines a tower of languages, starting at Scheme and
34 progressively simplifying down to languages that resemble the VM
35 instruction set (@pxref{Instruction Set}).
37 Each language knows how to compile to the next, so each step is simple
38 and understandable. Furthermore, this set of languages is not hardcoded
39 into Guile, so it is possible for the user to add new high-level
40 languages, new passes, or even different compilation targets.
42 Languages are registered in the module, @code{(system base language)}:
45 (use-modules (system base language))
48 They are registered with the @code{define-language} form.
50 @deffn {Scheme Syntax} define-language @
51 [#:name] [#:title] [#:reader] [#:printer] @
52 [#:parser=#f] [#:compilers='()] @
53 [#:decompilers='()] [#:evaluator=#f] @
54 [#:joiner=#f] [#:for-humans?=#t] @
55 [#:make-default-environment=make-fresh-user-module]
58 This syntax defines a @code{<language>} object, bound to @var{name} in
59 the current environment. In addition, the language will be added to the
60 global language set. For example, this is the language definition for
64 (define-language scheme
66 #:reader (lambda (port env) ...)
67 #:compilers `((tree-il . ,compile-tree-il))
68 #:decompilers `((tree-il . ,decompile-tree-il))
69 #:evaluator (lambda (x module) (primitive-eval x))
71 #:make-default-environment (lambda () ...))
75 The interesting thing about having languages defined this way is that
76 they present a uniform interface to the read-eval-print loop. This
77 allows the user to change the current language of the REPL:
80 scheme@@(guile-user)> ,language tree-il
81 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
82 tree-il@@(guile-user)> ,L scheme
83 Happy hacking with Scheme! To switch back, type `,L tree-il'.
87 Languages can be looked up by name, as they were above.
89 @deffn {Scheme Procedure} lookup-language name
90 Looks up a language named @var{name}, autoloading it if necessary.
92 Languages are autoloaded by looking for a variable named @var{name} in
93 a module named @code{(language @var{name} spec)}.
95 The language object will be returned, or @code{#f} if there does not
96 exist a language with that name.
99 Defining languages this way allows us to programmatically determine
100 the necessary steps for compiling code from one language to another.
102 @deffn {Scheme Procedure} lookup-compilation-order from to
103 Recursively traverses the set of languages to which @var{from} can
104 compile, depth-first, and return the first path that can transform
105 @var{from} to @var{to}. Returns @code{#f} if no path is found.
107 This function memoizes its results in a cache that is invalidated by
108 subsequent calls to @code{define-language}, so it should be quite
112 There is a notion of a ``current language'', which is maintained in the
113 @code{current-language} parameter, defined in the core @code{(guile)}
114 module. This language is normally Scheme, and may be rebound by the
115 user. The run-time compilation interfaces
116 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
117 and target languages.
119 The normal tower of languages when compiling Scheme goes like this:
123 @item Tree Intermediate Language (Tree-IL)
124 @item Continuation-Passing Style (CPS)
128 As discussed before (@pxref{Object File Format}), bytecode is in ELF
129 format, ready to be serialized to disk. But when compiling Scheme at
130 run time, you want a Scheme value: for example, a compiled procedure.
131 For this reason, so as not to break the abstraction, Guile defines a
132 fake language at the bottom of the tower:
138 Compiling to @code{value} loads the bytecode into a procedure, turning
139 cold bytes into warm code.
141 Perhaps this strangeness can be explained by example:
142 @code{compile-file} defaults to compiling to bytecode, because it
143 produces object code that has to live in the barren world outside the
144 Guile runtime; but @code{compile} defaults to compiling to @code{value},
145 as its product re-enters the Guile world.
147 @c FIXME: This doesn't work anymore :( Should we add some kind of
148 @c special GC pass, or disclaim this kind of code, or what?
150 Indeed, the process of compilation can circulate through these
151 different worlds indefinitely, as shown by the following quine:
154 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
157 @node The Scheme Compiler
158 @subsection The Scheme Compiler
160 The job of the Scheme compiler is to expand all macros and all of Scheme
161 to its most primitive expressions. The definition of ``primitive
162 expression'' is given by the inventory of constructs provided by
163 Tree-IL, the target language of the Scheme compiler: procedure calls,
164 conditionals, lexical references, and so on. This is described more
165 fully in the next section.
167 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
168 that it is completely implemented by the macro expander. Since the
169 macro expander has to run over all of the source code already in order
170 to expand macros, it might as well do the analysis at the same time,
171 producing Tree-IL expressions directly.
173 Because this compiler is actually the macro expander, it is extensible.
174 Any macro which the user writes becomes part of the compiler.
176 The Scheme-to-Tree-IL expander may be invoked using the generic
177 @code{compile} procedure:
180 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
182 #<tree-il (call (toplevel +) (const 1) (const 2))>
185 @code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
186 equivalent to calling the macro expander as @code{(macroexpand @var{foo}
187 'c '(compile load eval))}. @xref{Macro Expansion}.
188 @code{compile-tree-il}, the procedure dispatched by @code{compile} to
189 @code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
190 it conform to the general form of compiler procedures in Guile's
193 Compiler procedures take three arguments: an expression, an
194 environment, and a keyword list of options. They return three values:
195 the compiled expression, the corresponding environment for the target
196 language, and a ``continuation environment''. The compiled expression
197 and environment will serve as input to the next language's compiler.
198 The ``continuation environment'' can be used to compile another
199 expression from the same source language within the same module.
201 For example, you might compile the expression, @code{(define-module
202 (foo))}. This will result in a Tree-IL expression and environment. But
203 if you compiled a second expression, you would want to take into
204 account the compile-time effect of compiling the previous expression,
205 which puts the user in the @code{(foo)} module. That is purpose of the
206 ``continuation environment''; you would pass it as the environment
207 when compiling the subsequent expression.
209 For Scheme, an environment is a module. By default, the @code{compile}
210 and @code{compile-file} procedures compile in a fresh module, such
211 that bindings and macros introduced by the expression being compiled
215 (eq? (current-module) (compile '(current-module)))
218 (compile '(define hello 'world))
227 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
228 @code{current-reader}}) are isolated:
231 (compile '(fluid-set! current-reader (lambda args 'fail)))
232 (fluid-ref current-reader)
236 Nevertheless, having the compiler and @dfn{compilee} share the same name
237 space can be achieved by explicitly passing @code{(current-module)} as
238 the compilation environment:
241 (define hello 'world)
242 (compile 'hello #:env (current-module))
249 Tree Intermediate Language (Tree-IL) is a structured intermediate
250 language that is close in expressive power to Scheme. It is an
251 expanded, pre-analyzed Scheme.
253 Tree-IL is ``structured'' in the sense that its representation is
254 based on records, not S-expressions. This gives a rigidity to the
255 language that ensures that compiling to a lower-level language only
256 requires a limited set of transformations. For example, the Tree-IL
257 type @code{<const>} is a record type with two fields, @code{src} and
258 @code{exp}. Instances of this type are created via @code{make-const}.
259 Fields of this type are accessed via the @code{const-src} and
260 @code{const-exp} procedures. There is also a predicate, @code{const?}.
261 @xref{Records}, for more information on records.
265 All Tree-IL types have a @code{src} slot, which holds source location
266 information for the expression. This information, if present, will be
267 residualized into the compiled object code, allowing backtraces to
268 show source information. The format of @code{src} is the same as that
269 returned by Guile's @code{source-properties} function. @xref{Source
270 Properties}, for more information.
272 Although Tree-IL objects are represented internally using records,
273 there is also an equivalent S-expression external representation for
274 each kind of Tree-IL. For example, the S-expression representation
275 of @code{#<const src: #f exp: 3>} expression would be:
281 Users may program with this format directly at the REPL:
284 scheme@@(guile-user)> ,language tree-il
285 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
286 tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
290 The @code{src} fields are left out of the external representation.
292 One may create Tree-IL objects from their external representations via
293 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
294 information is attached to the input S-expression, it will be
295 propagated to the resulting Tree-IL expressions. This is probably the
296 easiest way to compile to Tree-IL: just make the appropriate external
297 representations in S-expression format, and let @code{parse-tree-il}
298 take care of the rest.
300 @deftp {Scheme Variable} <void> src
301 @deftpx {External Representation} (void)
302 An empty expression. In practice, equivalent to Scheme's @code{(if #f
306 @deftp {Scheme Variable} <const> src exp
307 @deftpx {External Representation} (const @var{exp})
311 @deftp {Scheme Variable} <primitive-ref> src name
312 @deftpx {External Representation} (primitive @var{name})
313 A reference to a ``primitive''. A primitive is a procedure that, when
314 compiled, may be open-coded. For example, @code{cons} is usually
315 recognized as a primitive, so that it compiles down to a single
318 Compilation of Tree-IL usually begins with a pass that resolves some
319 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
320 @code{<primitive-ref>} expressions. The actual compilation pass has
321 special cases for calls to certain primitives, like @code{apply} or
325 @deftp {Scheme Variable} <lexical-ref> src name gensym
326 @deftpx {External Representation} (lexical @var{name} @var{gensym})
327 A reference to a lexically-bound variable. The @var{name} is the
328 original name of the variable in the source program. @var{gensym} is a
329 unique identifier for this variable.
332 @deftp {Scheme Variable} <lexical-set> src name gensym exp
333 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
334 Sets a lexically-bound variable.
337 @deftp {Scheme Variable} <module-ref> src mod name public?
338 @deftpx {External Representation} (@@ @var{mod} @var{name})
339 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
340 A reference to a variable in a specific module. @var{mod} should be
341 the name of the module, e.g.@: @code{(guile-user)}.
343 If @var{public?} is true, the variable named @var{name} will be looked
344 up in @var{mod}'s public interface, and serialized with @code{@@};
345 otherwise it will be looked up among the module's private bindings,
346 and is serialized with @code{@@@@}.
349 @deftp {Scheme Variable} <module-set> src mod name public? exp
350 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
351 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
352 Sets a variable in a specific module.
355 @deftp {Scheme Variable} <toplevel-ref> src name
356 @deftpx {External Representation} (toplevel @var{name})
357 References a variable from the current procedure's module.
360 @deftp {Scheme Variable} <toplevel-set> src name exp
361 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
362 Sets a variable in the current procedure's module.
365 @deftp {Scheme Variable} <toplevel-define> src name exp
366 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
367 Defines a new top-level variable in the current procedure's module.
370 @deftp {Scheme Variable} <conditional> src test then else
371 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
372 A conditional. Note that @var{else} is not optional.
375 @deftp {Scheme Variable} <call> src proc args
376 @deftpx {External Representation} (call @var{proc} . @var{args})
380 @deftp {Scheme Variable} <primcall> src name args
381 @deftpx {External Representation} (primcall @var{name} . @var{args})
382 A call to a primitive. Equivalent to @code{(call (primitive @var{name})
383 . @var{args})}. This construct is often more convenient to generate and
384 analyze than @code{<call>}.
386 As part of the compilation process, instances of @code{(call (primitive
387 @var{name}) . @var{args})} are transformed into primcalls.
390 @deftp {Scheme Variable} <seq> src head tail
391 @deftpx {External Representation} (seq @var{head} @var{tail})
392 A sequence. The semantics is that @var{head} is evaluated first, and
393 any resulting values are ignored. Then @var{tail} is evaluated, in tail
397 @deftp {Scheme Variable} <lambda> src meta body
398 @deftpx {External Representation} (lambda @var{meta} @var{body})
399 A closure. @var{meta} is an association list of properties for the
400 procedure. @var{body} is a single Tree-IL expression of type
401 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
402 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
403 expressiveness of Scheme's @code{case-lambda}.
406 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
407 @deftpx {External Representation} @
408 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
411 One clause of a @code{case-lambda}. A @code{lambda} expression in
412 Scheme is treated as a @code{case-lambda} with one clause.
414 @var{req} is a list of the procedure's required arguments, as symbols.
415 @var{opt} is a list of the optional arguments, or @code{#f} if there
416 are no optional arguments. @var{rest} is the name of the rest
417 argument, or @code{#f}.
419 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
420 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
421 keyword corresponding to the argument named @var{name}, and whose
422 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
423 corresponding to all of the optional and keyword arguments, evaluated to
424 bind variables whose value is not supplied by the procedure caller.
425 Each @var{init} expression is evaluated in the lexical context of
426 previously bound variables, from left to right.
428 @var{gensyms} is a list of gensyms corresponding to all arguments:
429 first all of the required arguments, then the optional arguments if
430 any, then the rest argument if any, then all of the keyword arguments.
432 @var{body} is the body of the clause. If the procedure is called with
433 an appropriate number of arguments, @var{body} is evaluated in tail
434 position. Otherwise, if there is an @var{alternate}, it should be a
435 @code{<lambda-case>} expression, representing the next clause to try.
436 If there is no @var{alternate}, a wrong-number-of-arguments error is
440 @deftp {Scheme Variable} <let> src names gensyms vals exp
441 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
442 Lexical binding, like Scheme's @code{let}. @var{names} are the original
443 binding names, @var{gensyms} are gensyms corresponding to the
444 @var{names}, and @var{vals} are Tree-IL expressions for the values.
445 @var{exp} is a single Tree-IL expression.
448 @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
449 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
450 @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
451 A version of @code{<let>} that creates recursive bindings, like
452 Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
455 @deftp {Scheme Variable} <prompt> escape-only? tag body handler
456 @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
457 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
458 during the dynamic extent of the execution of @var{body}, also an
459 expression. If an abort occurs to this prompt, control will be passed
460 to @var{handler}, also an expression, which should be a procedure. The
461 first argument to the handler procedure will be the captured
462 continuation, followed by all of the values passed to the abort. If
463 @var{escape-only?} is true, the handler should be a @code{<lambda>} with
464 a single @code{<lambda-case>} body expression with no optional or
465 keyword arguments, and no alternate, and whose first argument is
466 unreferenced. @xref{Prompts}, for more information.
469 @deftp {Scheme Variable} <abort> tag args tail
470 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
471 An abort to the nearest prompt with the name @var{tag}, an expression.
472 @var{args} should be a list of expressions to pass to the prompt's
473 handler, and @var{tail} should be an expression that will evaluate to
474 a list of additional arguments. An abort will save the partial
475 continuation, which may later be reinstated, resulting in the
476 @code{<abort>} expression evaluating to some number of values.
479 There are two Tree-IL constructs that are not normally produced by
480 higher-level compilers, but instead are generated during the
481 source-to-source optimization and analysis passes that the Tree-IL
482 compiler does. Users should not generate these expressions directly,
483 unless they feel very clever, as the default analysis pass will generate
486 @deftp {Scheme Variable} <let-values> src names gensyms exp body
487 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
488 Like Scheme's @code{receive} -- binds the values returned by
489 evaluating @code{exp} to the @code{lambda}-like bindings described by
490 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
492 @code{<let-values>} is an optimization of a @code{<call>} to the
493 primitive, @code{call-with-values}.
496 @deftp {Scheme Variable} <fix> src names gensyms vals body
497 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
498 Like @code{<letrec>}, but only for @var{vals} that are unset
499 @code{lambda} expressions.
501 @code{fix} is an optimization of @code{letrec} (and @code{let}).
504 Tree-IL is a convenient compilation target from source languages. It
505 can be convenient as a medium for optimization, though CPS is usually
506 better. The strength of Tree-IL is that it does not fix order of
507 evaluation, so it makes some code motion a bit easier.
509 Optimization passes performed on Tree-IL currently include:
512 @item Open-coding (turning toplevel-refs into primitive-refs,
513 and calls to primitives to primcalls)
514 @item Partial evaluation (comprising inlining, copy propagation, and
516 @item Common subexpression elimination (CSE)
519 In the future, we will move the CSE pass to operate over the lower-level
522 @node Continuation-Passing Style
523 @subsection Continuation-Passing Style
526 Continuation-passing style (CPS) is Guile's principal intermediate
527 language, bridging the gap between languages for people and languages
528 for machines. CPS gives a name to every part of a program: every
529 control point, and every intermediate value. This makes it an excellent
530 medium for reasoning about programs, which is the principal job of a
534 * An Introduction to CPS::
539 @node An Introduction to CPS
540 @subsubsection An Introduction to CPS
542 As an example, consider the following Scheme expression:
546 (display "The sum of 32 and 10 is: ")
551 Let us identify all of the sub-expressions in this expression. We give
552 them unique labels, like @var{k1}, and annotate the original source
557 (display "The sum of 32 and 10 is: ")
568 These labels also identify continuations. For example, the continuation
569 of @code{k7} is @code{k6}. This is because after evaluating the value
570 of @code{newline}, performed by the expression labelled @code{k7}, we
571 continue to apply it in @var{k6}.
573 Which label has @code{k0} as its continuation? It is either @code{k1}
574 or @code{k2}. Scheme does not have a fixed order of evaluation of
575 arguments, although it does guarantee that they are evaluated in some
576 order. However, continuation-passing style makes evaluation order
577 explicit. In Guile, this choice is made by the higher-level language
580 Let us assume a left-to-right evaluation order. In that case the
581 continuation of @code{k1} is @code{k2}, and the continuation of
582 @code{k2} is @code{k0}.
584 With this example established, we are ready to give an example of CPS in
590 (let ((k2 (lambda (proc)
591 (let ((k0 (lambda (arg0)
593 (k0 "The sum of 32 and 10 is: ")))))
596 (let ((k5 (lambda (proc)
597 (let ((k3 (lambda (arg0)
602 (let ((k6 (lambda (proc)
608 Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
609 is by nature much more verbose than ``direct-style'' intermediate
610 languages like Tree-IL. At the same time, CPS is more simple than full
611 Scheme, in the same way that a Turing machine is more simple than
612 Scheme, although they are semantically equivalent.
614 In the original program, the expression labelled @code{k0} is in effect
615 context. Any values it returns are ignored. This is reflected in CPS
616 by noting that its continuation, @code{k4}, takes any number of values
617 and ignores them. Compare this to @code{k2}, which takes a single
618 value; in this way we can say that @code{k1} is in a ``value'' context.
619 Likewise @code{k6} is in tail context with respect to the expression as
620 a whole, because its continuation is the tail continuation,
621 @code{ktail}. CPS makes these details manifest, and gives them names.
624 @subsubsection CPS in Guile
626 Like Tree-IL, CPS is also a structured language, implemented with
627 records not S-expressions.
629 @deftp {Scheme Variable} <prompt> escape-only? tag body handler
630 @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
633 @deftp {Scheme Variable} $arity req opt rest kw allow-other-keys?
637 @deftp {Scheme Variable} $letk conts body
638 @deftpx {External Representation} (letk @var{conts} @var{body})
640 @deftp {Scheme Variable} $continue k src exp
641 @deftpx {External Representation} (continue @var{k} @var{src} @var{exp})
643 @deftp {Scheme Variable} $letrec names syms funs body
644 @deftpx {External Representation} (letrec @var{names} @var{syms} @var{funs} @var{body})
648 @deftp {Scheme Variable} $cont k cont
649 @deftpx {External Representation} (cont k cont)
651 @deftp {Scheme Variable} $kif kt kf
652 @deftpx {External Representation} (kif kt kf)
654 @deftp {Scheme Variable} $ktrunc arity k
655 @deftpx {External Representation} (ktrunc arity k)
657 @deftp {Scheme Variable} $kargs names syms body
658 @deftpx {External Representation} (kargs names syms body)
660 @deftp {Scheme Variable} $kentry self tail clauses
661 @deftpx {External Representation} (kentry self tail clauses)
663 @deftp {Scheme Variable} $ktail
664 @deftpx {External Representation} (ktail)
666 @deftp {Scheme Variable} $kclause arity cont
667 @deftpx {External Representation} (kclause arity cont)
671 @deftp {Scheme Variable} $void
672 @deftpx {External Representation} (void)
674 @deftp {Scheme Variable} $const val
675 @deftpx {External Representation} (const val)
677 @deftp {Scheme Variable} $prim name
678 @deftpx {External Representation} (prim name)
680 @deftp {Scheme Variable} $fun src meta free body
681 @deftpx {External Representation} (fun src meta free body)
683 @deftp {Scheme Variable} $call proc args
684 @deftpx {External Representation} (call proc args)
686 @deftp {Scheme Variable} $primcall name args
687 @deftpx {External Representation} (primcall name args)
689 @deftp {Scheme Variable} $values args
690 @deftpx {External Representation} (values args)
692 @deftp {Scheme Variable} $prompt escape? tag handler pop
693 @deftpx {External Representation} (prompt escape? tag handler pop)
701 $letk $continue $letrec
706 ;; Continuation bodies.
707 $kif $ktrunc $kargs $kentry $ktail $kclause
710 $void $const $prim $fun $call $primcall $values $prompt
714 build-cps-term build-cps-cont build-cps-exp
715 rewrite-cps-term rewrite-cps-cont rewrite-cps-exp
718 parse-cps unparse-cps
719 fold-conts fold-local-conts
723 records, unlike early cps (rabbit, orbit)
726 @subsubsection Compiling CPS
728 In CPS, there are no nested expressions. Indeed, CPS even removes the
729 concept of a stack. All applications in CPS are in tail context. For
730 that reason, applications in CPS are jumps, not calls. The @code{(k1)}
731 above is nothing more than a @code{goto}. @code{(k3 42)} is a
732 @code{goto} with a value. In this way, CPS bridges the gap between the
733 lambda calculus and machine instruction sequences.
735 On the side of machine instructions, Guile does still have a stack, and
736 the @code{lambda} forms shown above do not actually result in one
737 closure being allocated per subexpression at run-time. Lambda
738 expressions introduced by a CPS transformation can always be allocated
739 as labels or basic blocks within a function. In fact, we make a
740 syntactic distinction between closures and continuations in the CPS
741 language, and attempt to transform closures to continuations (basic
742 blocks) where possible, via the @dfn{contification} optimization pass.
744 Values bound by continuations are allocated to stack slots in a
745 function's frame. The compiler from CPS only allocates slots to values
746 that are actually live; it's possible to have a value in scope but not
755 @xref{Object File Format}
759 @deffn {Scheme Variable} load-thunk-from-file file
760 @deffnx {C Function} scm_load_thunk_from_file (file)
761 Load object code from a file named @var{file}. The file will be mapped
762 into memory via @code{mmap}, so this is a very fast operation.
764 On disk, object code is embedded in ELF, a flexible container format
765 created for use in UNIX systems. Guile has its own ELF linker and
766 loader, so it uses the ELF format on all systems.
769 likewise load-thunk-from-memory
771 Compiling object code to the fake language, @code{value}, is performed
772 via loading objcode into a program, then executing that thunk with
773 respect to the compilation environment. Normally the environment
774 propagates through the compiler transparently, but users may specify
775 the compilation environment manually as well, as a module.
778 @node Writing New High-Level Languages
779 @subsection Writing New High-Level Languages
781 In order to integrate a new language @var{lang} into Guile's compiler
782 system, one has to create the module @code{(language @var{lang} spec)}
783 containing the language definition and referencing the parser,
784 compiler and other routines processing it. The module hierarchy in
785 @code{(language brainfuck)} defines a very basic Brainfuck
786 implementation meant to serve as easy-to-understand example on how to
787 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
788 for more information about the Brainfuck language itself.
791 @node Extending the Compiler
792 @subsection Extending the Compiler
794 At this point we take a detour from the impersonal tone of the rest of
795 the manual. Admit it: if you've read this far into the compiler
796 internals manual, you are a junkie. Perhaps a course at your university
797 left you unsated, or perhaps you've always harbored a desire to hack the
798 holy of computer science holies: a compiler. Well you're in good
799 company, and in a good position. Guile's compiler needs your help.
801 There are many possible avenues for improving Guile's compiler.
802 Probably the most important improvement, speed-wise, will be some form
803 of native compilation, both just-in-time and ahead-of-time. This could
804 be done in many ways. Probably the easiest strategy would be to extend
805 the compiled procedure structure to include a pointer to a native code
806 vector, and compile from bytecode to native code at run-time after a
807 procedure is called a certain number of times.
809 The name of the game is a profiling-based harvest of the low-hanging
810 fruit, running programs of interest under a system-level profiler and
811 determining which improvements would give the most bang for the buck.
812 It's really getting to the point though that native compilation is the
815 The compiler also needs help at the top end, enhancing the Scheme that
816 it knows to also understand R6RS, and adding new high-level compilers.
817 We have JavaScript and Emacs Lisp mostly complete, but they could use
818 some love; Lua would be nice as well, but whatever language it is
819 that strikes your fancy would be welcome too.
821 Compilers are for hacking, not for admiring or for complaining about.