2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
10 Compilers have a mystique about them that is attractive and
11 off-putting at the same time. They are attractive because they are
12 magical -- they transform inert text into live results, like throwing
13 the switch on Frankenstein's monster. However, this magic is perceived
14 by many to be impenetrable.
16 This section aims to pay attention to the small man behind the
19 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
20 know how to compile your @code{.scm} file.
24 * The Scheme Compiler::
28 * Bytecode and Objcode::
29 * Writing New High-Level Languages::
30 * Extending the Compiler::
34 @subsection Compiler Tower
36 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37 put it more accurately. Guile defines a tower of languages, starting
38 at Scheme and progressively simplifying down to languages that
39 resemble the VM instruction set (@pxref{Instruction Set}).
41 Each language knows how to compile to the next, so each step is simple
42 and understandable. Furthermore, this set of languages is not
43 hardcoded into Guile, so it is possible for the user to add new
44 high-level languages, new passes, or even different compilation
47 Languages are registered in the module, @code{(system base language)}:
50 (use-modules (system base language))
53 They are registered with the @code{define-language} form.
55 @deffn {Scheme Syntax} define-language @
56 name title reader printer @
57 [parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
58 [joiner=#f] [make-default-environment=make-fresh-user-module]
61 This syntax defines a @code{#<language>} object, bound to @var{name}
62 in the current environment. In addition, the language will be added to
63 the global language set. For example, this is the language definition
67 (define-language scheme
69 #:reader (lambda (port env) ...)
70 #:compilers `((tree-il . ,compile-tree-il))
71 #:decompilers `((tree-il . ,decompile-tree-il))
72 #:evaluator (lambda (x module) (primitive-eval x))
74 #:make-default-environment (lambda () ...))
78 The interesting thing about having languages defined this way is that
79 they present a uniform interface to the read-eval-print loop. This
80 allows the user to change the current language of the REPL:
83 scheme@@(guile-user)> ,language tree-il
84 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
85 tree-il@@(guile-user)> ,L scheme
86 Happy hacking with Scheme! To switch back, type `,L tree-il'.
90 Languages can be looked up by name, as they were above.
92 @deffn {Scheme Procedure} lookup-language name
93 Looks up a language named @var{name}, autoloading it if necessary.
95 Languages are autoloaded by looking for a variable named @var{name} in
96 a module named @code{(language @var{name} spec)}.
98 The language object will be returned, or @code{#f} if there does not
99 exist a language with that name.
102 Defining languages this way allows us to programmatically determine
103 the necessary steps for compiling code from one language to another.
105 @deffn {Scheme Procedure} lookup-compilation-order from to
106 Recursively traverses the set of languages to which @var{from} can
107 compile, depth-first, and return the first path that can transform
108 @var{from} to @var{to}. Returns @code{#f} if no path is found.
110 This function memoizes its results in a cache that is invalidated by
111 subsequent calls to @code{define-language}, so it should be quite
115 There is a notion of a ``current language'', which is maintained in
116 the @code{*current-language*} fluid. This language is normally Scheme,
117 and may be rebound by the user. The run-time compilation interfaces
118 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
119 and target languages.
121 The normal tower of languages when compiling Scheme goes like this:
125 @item Tree Intermediate Language (Tree-IL)
126 @item Guile Lowlevel Intermediate Language (GLIL)
132 Object code may be serialized to disk directly, though it has a cookie
133 and version prepended to the front. But when compiling Scheme at run
134 time, you want a Scheme value: for example, a compiled procedure. For
135 this reason, so as not to break the abstraction, Guile defines a fake
136 language at the bottom of the tower:
142 Compiling to @code{value} loads the object code into a procedure, and
143 wakes the sleeping giant.
145 Perhaps this strangeness can be explained by example:
146 @code{compile-file} defaults to compiling to object code, because it
147 produces object code that has to live in the barren world outside the
148 Guile runtime; but @code{compile} defaults to compiling to
149 @code{value}, as its product re-enters the Guile world.
151 Indeed, the process of compilation can circulate through these
152 different worlds indefinitely, as shown by the following quine:
155 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
158 @node The Scheme Compiler
159 @subsection The Scheme Compiler
161 The job of the Scheme compiler is to expand all macros and all of
162 Scheme to its most primitive expressions. The definition of
163 ``primitive'' is given by the inventory of constructs provided by
164 Tree-IL, the target language of the Scheme compiler: procedure
165 applications, conditionals, lexical references, etc. This is described
166 more fully in the next section.
168 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
169 that it is completely implemented by the macro expander. Since the
170 macro expander has to run over all of the source code already in order
171 to expand macros, it might as well do the analysis at the same time,
172 producing Tree-IL expressions directly.
174 Because this compiler is actually the macro expander, it is
175 extensible. Any macro which the user writes becomes part of the
178 The Scheme-to-Tree-IL expander may be invoked using the generic
179 @code{compile} procedure:
182 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
184 #<<application> src: #f
185 proc: #<<toplevel-ref> src: #f name: +>
186 args: (#<<const> src: #f exp: 1>
187 #<<const> src: #f exp: 2>)>
190 Or, since Tree-IL is so close to Scheme, it is often useful to expand
191 Scheme to Tree-IL, then translate back to Scheme. For that reason the
192 expander provides two interfaces. The former is equivalent to calling
193 @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
194 ``compile''. With @code{'e} (the default), the result is translated
198 (macroexpand '(+ 1 2))
200 (macroexpand '(let ((x 10)) (* x x)))
201 @result{} (let ((x84 10)) (* x84 x84))
204 The second example shows that as part of its job, the macro expander
205 renames lexically-bound variables. The original names are preserved
206 when compiling to Tree-IL, but can't be represented in Scheme: a
207 lexical binding only has one name. It is for this reason that the
208 @emph{native} output of the expander is @emph{not} Scheme. There's too
209 much information we would lose if we translated to Scheme directly:
210 lexical variable names, source locations, and module hygiene.
212 Note however that @code{macroexpand} does not have the same signature
213 as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
214 around @code{macroexpand}, to make it conform to the general form of
215 compiler procedures in Guile's language tower.
217 Compiler procedures take three arguments: an expression, an
218 environment, and a keyword list of options. They return three values:
219 the compiled expression, the corresponding environment for the target
220 language, and a ``continuation environment''. The compiled expression
221 and environment will serve as input to the next language's compiler.
222 The ``continuation environment'' can be used to compile another
223 expression from the same source language within the same module.
225 For example, you might compile the expression, @code{(define-module
226 (foo))}. This will result in a Tree-IL expression and environment. But
227 if you compiled a second expression, you would want to take into
228 account the compile-time effect of compiling the previous expression,
229 which puts the user in the @code{(foo)} module. That is purpose of the
230 ``continuation environment''; you would pass it as the environment
231 when compiling the subsequent expression.
233 For Scheme, an environment is a module. By default, the @code{compile}
234 and @code{compile-file} procedures compile in a fresh module, such
235 that bindings and macros introduced by the expression being compiled
239 (eq? (current-module) (compile '(current-module)))
242 (compile '(define hello 'world))
251 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
252 @code{current-reader}}) are isolated:
255 (compile '(fluid-set! current-reader (lambda args 'fail)))
256 (fluid-ref current-reader)
260 Nevertheless, having the compiler and @dfn{compilee} share the same name
261 space can be achieved by explicitly passing @code{(current-module)} as
262 the compilation environment:
265 (define hello 'world)
266 (compile 'hello #:env (current-module))
273 Tree Intermediate Language (Tree-IL) is a structured intermediate
274 language that is close in expressive power to Scheme. It is an
275 expanded, pre-analyzed Scheme.
277 Tree-IL is ``structured'' in the sense that its representation is
278 based on records, not S-expressions. This gives a rigidity to the
279 language that ensures that compiling to a lower-level language only
280 requires a limited set of transformations. For example, the Tree-IL
281 type @code{<const>} is a record type with two fields, @code{src} and
282 @code{exp}. Instances of this type are created via @code{make-const}.
283 Fields of this type are accessed via the @code{const-src} and
284 @code{const-exp} procedures. There is also a predicate, @code{const?}.
285 @xref{Records}, for more information on records.
289 All Tree-IL types have a @code{src} slot, which holds source location
290 information for the expression. This information, if present, will be
291 residualized into the compiled object code, allowing backtraces to
292 show source information. The format of @code{src} is the same as that
293 returned by Guile's @code{source-properties} function. @xref{Source
294 Properties}, for more information.
296 Although Tree-IL objects are represented internally using records,
297 there is also an equivalent S-expression external representation for
298 each kind of Tree-IL. For example, an the S-expression representation
299 of @code{#<const src: #f exp: 3>} expression would be:
305 Users may program with this format directly at the REPL:
308 scheme@@(guile-user)> ,language tree-il
309 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
310 tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
314 The @code{src} fields are left out of the external representation.
316 One may create Tree-IL objects from their external representations via
317 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
318 information is attached to the input S-expression, it will be
319 propagated to the resulting Tree-IL expressions. This is probably the
320 easiest way to compile to Tree-IL: just make the appropriate external
321 representations in S-expression format, and let @code{parse-tree-il}
322 take care of the rest.
324 @deftp {Scheme Variable} <void> src
325 @deftpx {External Representation} (void)
326 An empty expression. In practice, equivalent to Scheme's @code{(if #f
329 @deftp {Scheme Variable} <const> src exp
330 @deftpx {External Representation} (const @var{exp})
333 @deftp {Scheme Variable} <primitive-ref> src name
334 @deftpx {External Representation} (primitive @var{name})
335 A reference to a ``primitive''. A primitive is a procedure that, when
336 compiled, may be open-coded. For example, @code{cons} is usually
337 recognized as a primitive, so that it compiles down to a single
340 Compilation of Tree-IL usually begins with a pass that resolves some
341 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
342 @code{<primitive-ref>} expressions. The actual compilation pass
343 has special cases for applications of certain primitives, like
344 @code{apply} or @code{cons}.
346 @deftp {Scheme Variable} <lexical-ref> src name gensym
347 @deftpx {External Representation} (lexical @var{name} @var{gensym})
348 A reference to a lexically-bound variable. The @var{name} is the
349 original name of the variable in the source program. @var{gensym} is a
350 unique identifier for this variable.
352 @deftp {Scheme Variable} <lexical-set> src name gensym exp
353 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
354 Sets a lexically-bound variable.
356 @deftp {Scheme Variable} <module-ref> src mod name public?
357 @deftpx {External Representation} (@@ @var{mod} @var{name})
358 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
359 A reference to a variable in a specific module. @var{mod} should be
360 the name of the module, e.g. @code{(guile-user)}.
362 If @var{public?} is true, the variable named @var{name} will be looked
363 up in @var{mod}'s public interface, and serialized with @code{@@};
364 otherwise it will be looked up among the module's private bindings,
365 and is serialized with @code{@@@@}.
367 @deftp {Scheme Variable} <module-set> src mod name public? exp
368 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
369 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
370 Sets a variable in a specific module.
372 @deftp {Scheme Variable} <toplevel-ref> src name
373 @deftpx {External Representation} (toplevel @var{name})
374 References a variable from the current procedure's module.
376 @deftp {Scheme Variable} <toplevel-set> src name exp
377 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
378 Sets a variable in the current procedure's module.
380 @deftp {Scheme Variable} <toplevel-define> src name exp
381 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
382 Defines a new top-level variable in the current procedure's module.
384 @deftp {Scheme Variable} <conditional> src test then else
385 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
386 A conditional. Note that @var{else} is not optional.
388 @deftp {Scheme Variable} <application> src proc args
389 @deftpx {External Representation} (apply @var{proc} . @var{args})
392 @deftp {Scheme Variable} <sequence> src exps
393 @deftpx {External Representation} (begin . @var{exps})
394 Like Scheme's @code{begin}.
396 @deftp {Scheme Variable} <lambda> src meta body
397 @deftpx {External Representation} (lambda @var{meta} @var{body})
398 A closure. @var{meta} is an association list of properties for the
399 procedure. @var{body} is a single Tree-IL expression of type
400 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
401 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
402 expressiveness of Scheme's @code{case-lambda}.
404 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
405 @deftpx {External Representation} @
406 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
409 One clause of a @code{case-lambda}. A @code{lambda} expression in
410 Scheme is treated as a @code{case-lambda} with one clause.
412 @var{req} is a list of the procedure's required arguments, as symbols.
413 @var{opt} is a list of the optional arguments, or @code{#f} if there
414 are no optional arguments. @var{rest} is the name of the rest
415 argument, or @code{#f}.
417 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
418 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
419 keyword corresponding to the argument named @var{name}, and whose
420 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
421 corresponding to all of the optional and keyword argumens, evaluated
422 to bind variables whose value is not supplied by the procedure caller.
423 Each @var{init} expression is evaluated in the lexical context of
424 previously bound variables, from left to right.
426 @var{gensyms} is a list of gensyms corresponding to all arguments:
427 first all of the required arguments, then the optional arguments if
428 any, then the rest argument if any, then all of the keyword arguments.
430 @var{body} is the body of the clause. If the procedure is called with
431 an appropriate number of arguments, @var{body} is evaluated in tail
432 position. Otherwise, if there is a @var{consequent}, it should be a
433 @code{<lambda-case>} expression, representing the next clause to try.
434 If there is no @var{consequent}, a wrong-number-of-arguments error is
437 @deftp {Scheme Variable} <let> src names gensyms vals exp
438 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
439 Lexical binding, like Scheme's @code{let}. @var{names} are the
440 original binding names, @var{gensyms} are gensyms corresponding to the
441 @var{names}, and @var{vals} are Tree-IL expressions for the values.
442 @var{exp} is a single Tree-IL expression.
444 @deftp {Scheme Variable} <letrec> src names gensyms vals exp
445 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
446 A version of @code{<let>} that creates recursive bindings, like
447 Scheme's @code{letrec}.
449 @deftp {Scheme Variable} <dynlet> fluids vals body
450 @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
451 Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
452 @var{fluids} should be a list of Tree-IL expressions that will
453 evaluate to fluids, and @var{vals} a corresponding list of expressions
454 to bind to the fluids during the dynamic extent of the evaluation of
457 @deftp {Scheme Variable} <dynref> fluid
458 @deftpx {External Representation} (dynref @var{fluid})
459 A dynamic variable reference. @var{fluid} should be a Tree-IL
460 expression evaluating to a fluid.
462 @deftp {Scheme Variable} <dynset> fluid exp
463 @deftpx {External Representation} (dynset @var{fluid} @var{exp})
464 A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
465 to a fluid, will be set to the result of evaluating @var{exp}.
467 @deftp {Scheme Variable} <dynwind> winder body unwinder
468 @deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
469 A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
470 evaluate to thunks. Ensure that the winder and the unwinder are called
471 before entering and after leaving @var{body}. Note that @var{body} is
472 an expression, without a thunk wrapper.
474 @deftp {Scheme Variable} <prompt> tag body handler
475 @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
476 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
477 during the dynamic extent of the execution of @var{body}, also an
478 expression. If an abort occurs to this prompt, control will be passed
479 to @var{handler}, a @code{<lambda-case>} expression with no optional
480 or keyword arguments, and no alternate. The first argument to the
481 @code{<lambda-case>} will be the captured continuation, and then all
482 of the values passed to the abort. @xref{Prompts}, for more
485 @deftp {Scheme Variable} <abort> tag args tail
486 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
487 An abort to the nearest prompt with the name @var{tag}, an expression.
488 @var{args} should be a list of expressions to pass to the prompt's
489 handler, and @var{tail} should be an expression that will evaluate to
490 a list of additional arguments. An abort will save the partial
491 continuation, which may later be reinstated, resulting in the
492 @code{<abort>} expression evaluating to some number of values.
495 There are two Tree-IL constructs that are not normally produced by
496 higher-level compilers, but instead are generated during the
497 source-to-source optimization and analysis passes that the Tree-IL
498 compiler does. Users should not generate these expressions directly,
499 unless they feel very clever, as the default analysis pass will
500 generate them as necessary.
502 @deftp {Scheme Variable} <let-values> src names gensyms exp body
503 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
504 Like Scheme's @code{receive} -- binds the values returned by
505 evaluating @code{exp} to the @code{lambda}-like bindings described by
506 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
508 @code{<let-values>} is an optimization of @code{<application>} of the
509 primitive, @code{call-with-values}.
511 @deftp {Scheme Variable} <fix> src names gensyms vals body
512 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
513 Like @code{<letrec>}, but only for @var{vals} that are unset
514 @code{lambda} expressions.
516 @code{fix} is an optimization of @code{letrec} (and @code{let}).
519 Tree-IL implements a compiler to GLIL that recursively traverses
520 Tree-IL expressions, writing out GLIL expressions into a linear list.
521 The compiler also keeps some state as to whether the current
522 expression is in tail context, and whether its value will be used in
523 future computations. This state allows the compiler not to emit code
524 for constant expressions that will not be used (e.g. docstrings), and
525 to perform tail calls when in tail position.
527 Most optimization, such as it currently is, is performed on Tree-IL
528 expressions as source-to-source transformations. There will be more
529 optimizations added in the future.
531 Interested readers are encouraged to read the implementation in
532 @code{(language tree-il compile-glil)} for more details.
537 Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
538 language whose expressions more closely approximate Guile's VM
539 instruction set. Its expression types are defined in @code{(language
542 @deftp {Scheme Variable} <glil-program> meta . body
543 A unit of code that at run-time will correspond to a compiled
544 procedure. @var{meta} should be an alist of properties, as in
545 Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
548 @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
549 A prologue for a function with no optional, keyword, or rest
550 arguments. @var{nreq} is the number of required arguments. @var{nlocs}
551 the total number of local variables, including the arguments. If the
552 procedure was not given exactly @var{nreq} arguments, control will
553 jump to @var{else-label}, if given, or otherwise signal an error.
555 @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
556 A prologue for a function with optional or rest arguments. Like
557 @code{<glil-std-prelude>}, with the addition that @var{nopt} is the
558 number of optional arguments (possibly zero) and @var{rest} is an
559 index of a local variable at which to bind a rest argument, or
560 @code{#f} if there is no rest argument.
562 @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
563 A prologue for a function with keyword arguments. Like
564 @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
565 of keyword arguments, and @var{allow-other-keys?} is a flag indicating
566 whether to allow unknown keys. @xref{Function Prologue Instructions,
567 @code{bind-kwargs}}, for details on the format of @var{kw}.
569 @deftp {Scheme Variable} <glil-bind> . vars
570 An advisory expression that notes a liveness extent for a set of
571 variables. @var{vars} is a list of @code{(@var{name} @var{type}
572 @var{index})}, where @var{type} should be either @code{argument},
573 @code{local}, or @code{external}.
575 @code{<glil-bind>} expressions end up being serialized as part of a
576 program's metadata and do not form part of a program's code path.
578 @deftp {Scheme Variable} <glil-mv-bind> vars rest
579 A multiple-value binding of the values on the stack to @var{vars}. Iff
580 @var{rest} is true, the last element of @var{vars} will be treated as
583 In addition to pushing a binding annotation on the stack, like
584 @code{<glil-bind>}, an expression is emitted at compilation time to
585 make sure that there are enough values available to bind. See the
586 notes on @code{truncate-values} in @ref{Procedure Call and Return
587 Instructions}, for more information.
589 @deftp {Scheme Variable} <glil-unbind>
590 Closes the liveness extent of the most recently encountered
591 @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
592 expressions are compiled, a parallel stack of live bindings is
593 maintained; this expression pops off the top element from that stack.
595 Bindings are written into the program's metadata so that debuggers and
596 other tools can determine the set of live local variables at a given
597 offset within a VM program.
599 @deftp {Scheme Variable} <glil-source> loc
600 Records source information for the preceding expression. @var{loc}
601 should be an association list of containing @code{line} @code{column},
602 and @code{filename} keys, e.g. as returned by
603 @code{source-properties}.
605 @deftp {Scheme Variable} <glil-void>
606 Pushes ``the unspecified value'' on the stack.
608 @deftp {Scheme Variable} <glil-const> obj
609 Pushes a constant value onto the stack. @var{obj} must be a number,
610 string, symbol, keyword, boolean, character, uniform array, the empty
611 list, or a pair or vector of constants.
613 @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
614 Accesses a lexically bound variable. If the variable is not
615 @var{local?} it is free. All variables may have @code{ref},
616 @code{set}, and @code{bound?} as their @var{op}. Boxed variables may
617 also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
618 which correspond in semantics to the VM instructions @code{box},
619 @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
622 @deftp {Scheme Variable} <glil-toplevel> op name
623 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
626 @deftp {Scheme Variable} <glil-module> op mod name public?
627 Accesses a variable within a specific module. See Tree-IL's
628 @code{<module-ref>}, for more information.
630 @deftp {Scheme Variable} <glil-label> label
631 Creates a new label. @var{label} can be any Scheme value, and should
634 @deftp {Scheme Variable} <glil-branch> inst label
635 Branch to a label. @var{label} should be a @code{<ghil-label>}.
636 @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
638 @deftp {Scheme Variable} <glil-call> inst nargs
639 This expression is probably misnamed, as it does not correspond to
640 function calls. @code{<glil-call>} invokes the VM instruction named
641 @var{inst}, noting that it is called with @var{nargs} stack arguments.
642 The arguments should be pushed on the stack already. What happens to
643 the stack afterwards depends on the instruction.
645 @deftp {Scheme Variable} <glil-mv-call> nargs ra
646 Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
647 corresponding to the multiple-value return address for the call. See
648 the notes on @code{mv-call} in @ref{Procedure Call and Return
649 Instructions}, for more information.
651 @deftp {Scheme Variable} <glil-prompt> label escape-only?
652 Push a dynamic prompt into the stack, with a handler at @var{label}.
653 @var{escape-only?} is a flag that is propagated to the prompt,
654 allowing an abort to avoid capturing a continuation in some cases.
655 @xref{Prompts}, for more information.
658 Users may enter in GLIL at the REPL as well, though there is a bit
659 more bookkeeping to do:
662 scheme@@(guile-user)> ,language glil
663 Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
664 To switch back, type `,L scheme'.
665 glil@@(guile-user)> (program () (std-prelude 0 0 #f)
666 (const 3) (call return 1))
670 Just as in all of Guile's compilers, an environment is passed to the
671 GLIL-to-object code compiler, and one is returned as well, along with
677 Assembly is an S-expression-based, human-readable representation of
678 the actual bytecodes that will be emitted for the VM. As such, it is a
679 useful intermediate language both for compilation and for
682 Besides the fact that it is not a record-based language, assembly
683 differs from GLIL in four main ways:
686 @item Labels have been resolved to byte offsets in the program.
687 @item Constants inside procedures have either been expressed as inline
688 instructions or cached in object arrays.
689 @item Procedures with metadata (source location information, liveness
690 extents, procedure names, generic properties, etc) have had their
691 metadata serialized out to thunks.
692 @item All expressions correspond directly to VM instructions -- i.e.,
693 there is no @code{<glil-lexical>} which can be a ref or a set.
696 Assembly is isomorphic to the bytecode that it compiles to. You can
697 compile to bytecode, then decompile back to assembly, and you have the
700 The general form of assembly instructions is the following:
703 (@var{inst} @var{arg} ...)
706 The @var{inst} names a VM instruction, and its @var{arg}s will be
707 embedded in the instruction stream. The easiest way to see assembly is
708 to play around with it at the REPL, as can be seen in this annotated
712 scheme@@(guile-user)> (pp (compile '(+ 32 10) #:to 'assembly))
714 ((:LCASE16 . 2)) ; Labels, unused in this case.
715 8 ; Length of the thunk that was compiled.
716 (load-program ; Metadata thunk.
719 #f ; No metadata thunk for the metadata thunk.
722 (make-int8 2) ; Liveness extents, source info, and arities,
723 (make-int8 8) ; in a format that Guile knows how to parse.
729 (assert-nargs-ee/locals 0) ; Prologue.
730 (make-int8 32) ; Actual code starts here.
736 Of course you can switch the REPL to assembly and enter in assembly
737 S-expressions directly, like with other languages, though it is more
738 difficult, given that the length fields have to be correct.
740 @node Bytecode and Objcode
741 @subsection Bytecode and Objcode
743 Finally, the raw bytes. There are actually two different ``languages''
744 here, corresponding to two different ways to represent the bytes.
746 ``Bytecode'' represents code as uniform byte vectors, useful for
747 structuring and destructuring code on the Scheme level. Bytecode is
748 the next step down from assembly:
751 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
752 @result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
754 10 32 10 10 148 66 17 ; Actual code.
755 0 0 0 0 0 0 0 9 ; Metadata thunk.
756 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
759 ``Objcode'' is bytecode, but mapped directly to a C structure,
760 @code{struct scm_objcode}:
763 struct scm_objcode @{
765 scm_t_uint32 metalen;
770 As one might imagine, objcode imposes a minimum length on the
771 bytecode. Also, the @code{len} and @code{metalen} fields are in native
772 endianness, which makes objcode (and bytecode) system-dependent.
774 Objcode also has a couple of important efficiency hacks. First,
775 objcode may be mapped directly from disk, allowing compiled code to be
776 loaded quickly, often from the system's disk cache, and shared among
777 multiple processes. Secondly, objcode may be embedded in other
778 objcode, allowing procedures to have the text of other procedures
779 inlined into their bodies, without the need for separate allocation of
780 the code. Of course, the objcode object itself does need to be
783 Procedures related to objcode are defined in the @code{(system vm
786 @deffn {Scheme Procedure} objcode? obj
787 @deffnx {C Function} scm_objcode_p (obj)
788 Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
791 @deffn {Scheme Procedure} bytecode->objcode bytecode
792 @deffnx {C Function} scm_bytecode_to_objcode (bytecode)
793 Makes a bytecode object from @var{bytecode}, which should be a
794 bytevector. @xref{Bytevectors}.
797 @deffn {Scheme Variable} load-objcode file
798 @deffnx {C Function} scm_load_objcode (file)
799 Load object code from a file named @var{file}. The file will be mapped
800 into memory via @code{mmap}, so this is a very fast operation.
802 On disk, object code has an sixteen-byte cookie prepended to it, to
803 prevent accidental loading of arbitrary garbage.
806 @deffn {Scheme Variable} write-objcode objcode file
807 @deffnx {C Function} scm_write_objcode (objcode)
808 Write object code out to a file, prepending the sixteen-byte cookie.
811 @deffn {Scheme Variable} objcode->bytecode objcode
812 @deffnx {C Function} scm_objcode_to_bytecode (objcode)
813 Copy object code out to a bytevector for analysis by Scheme.
816 The following procedure is actually in @code{(system vm program)}, but
817 we'll mention it here:
819 @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
820 @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
821 Load up object code into a Scheme program. The resulting program will
822 have @var{objtable} as its object table, which should be a vector or
823 @code{#f}, and will capture the free variables from @var{free-vars}.
826 Object code from a file may be disassembled at the REPL via the
827 meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
828 Programs may be disassembled via @code{,disassemble}, abbreviated as
831 Compiling object code to the fake language, @code{value}, is performed
832 via loading objcode into a program, then executing that thunk with
833 respect to the compilation environment. Normally the environment
834 propagates through the compiler transparently, but users may specify
835 the compilation environment manually as well, as a module.
838 @node Writing New High-Level Languages
839 @subsection Writing New High-Level Languages
841 In order to integrate a new language @var{lang} into Guile's compiler
842 system, one has to create the module @code{(language @var{lang} spec)}
843 containing the language definition and referencing the parser,
844 compiler and other routines processing it. The module hierarchy in
845 @code{(language brainfuck)} defines a very basic Brainfuck
846 implementation meant to serve as easy-to-understand example on how to
847 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
848 for more information about the Brainfuck language itself.
851 @node Extending the Compiler
852 @subsection Extending the Compiler
854 At this point, we break with the impersonal tone of the rest of the
855 manual, and make an intervention. Admit it: if you've read this far
856 into the compiler internals manual, you are a junkie. Perhaps a course
857 at your university left you unsated, or perhaps you've always harbored
858 a sublimated desire to hack the holy of computer science holies: a
859 compiler. Well you're in good company, and in a good position. Guile's
860 compiler needs your help.
862 There are many possible avenues for improving Guile's compiler.
863 Probably the most important improvement, speed-wise, will be some form
864 of native compilation, both just-in-time and ahead-of-time. This could
865 be done in many ways. Probably the easiest strategy would be to extend
866 the compiled procedure structure to include a pointer to a native code
867 vector, and compile from bytecode to native code at run-time after a
868 procedure is called a certain number of times.
870 The name of the game is a profiling-based harvest of the low-hanging
871 fruit, running programs of interest under a system-level profiler and
872 determining which improvements would give the most bang for the buck.
873 It's really getting to the point though that native compilation is the
876 The compiler also needs help at the top end, enhancing the Scheme that
877 it knows to also understand R6RS, and adding new high-level compilers.
878 We have JavaScript and Emacs Lisp mostly complete, but they could use
879 some love; Lua would be nice as well, butq whatever language it is
880 that strikes your fancy would be welcome too.
882 Compilers are for hacking, not for admiring or for complaining about.