Make consistent the usage of variable names in the function definitions found in...
[bpt/guile.git] / doc / ref / compiler.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
9
10 Compilers have a mystique about them that is attractive and
11 off-putting at the same time. They are attractive because they are
12 magical -- they transform inert text into live results, like throwing
13 the switch on Frankenstein's monster. However, this magic is perceived
14 by many to be impenetrable.
15
16 This section aims to pay attention to the small man behind the
17 curtain.
18
19 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
20 know how to compile your @code{.scm} file.
21
22 @menu
23 * Compiler Tower::
24 * The Scheme Compiler::
25 * Tree-IL::
26 * GLIL::
27 * Assembly::
28 * Bytecode and Objcode::
29 * Writing New High-Level Languages::
30 * Extending the Compiler::
31 @end menu
32
33 @node Compiler Tower
34 @subsection Compiler Tower
35
36 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37 put it more accurately. Guile defines a tower of languages, starting
38 at Scheme and progressively simplifying down to languages that
39 resemble the VM instruction set (@pxref{Instruction Set}).
40
41 Each language knows how to compile to the next, so each step is simple
42 and understandable. Furthermore, this set of languages is not
43 hardcoded into Guile, so it is possible for the user to add new
44 high-level languages, new passes, or even different compilation
45 targets.
46
47 Languages are registered in the module, @code{(system base language)}:
48
49 @example
50 (use-modules (system base language))
51 @end example
52
53 They are registered with the @code{define-language} form.
54
55 @deffn {Scheme Syntax} define-language @
56 name title reader printer @
57 [parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
58 [joiner=#f] [make-default-environment=make-fresh-user-module]
59 Define a language.
60
61 This syntax defines a @code{#<language>} object, bound to @var{name}
62 in the current environment. In addition, the language will be added to
63 the global language set. For example, this is the language definition
64 for Scheme:
65
66 @example
67 (define-language scheme
68 #:title "Scheme"
69 #:reader (lambda (port env) ...)
70 #:compilers `((tree-il . ,compile-tree-il))
71 #:decompilers `((tree-il . ,decompile-tree-il))
72 #:evaluator (lambda (x module) (primitive-eval x))
73 #:printer write
74 #:make-default-environment (lambda () ...))
75 @end example
76 @end deffn
77
78 The interesting thing about having languages defined this way is that
79 they present a uniform interface to the read-eval-print loop. This
80 allows the user to change the current language of the REPL:
81
82 @example
83 scheme@@(guile-user)> ,language tree-il
84 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
85 tree-il@@(guile-user)> ,L scheme
86 Happy hacking with Scheme! To switch back, type `,L tree-il'.
87 scheme@@(guile-user)>
88 @end example
89
90 Languages can be looked up by name, as they were above.
91
92 @deffn {Scheme Procedure} lookup-language name
93 Looks up a language named @var{name}, autoloading it if necessary.
94
95 Languages are autoloaded by looking for a variable named @var{name} in
96 a module named @code{(language @var{name} spec)}.
97
98 The language object will be returned, or @code{#f} if there does not
99 exist a language with that name.
100 @end deffn
101
102 Defining languages this way allows us to programmatically determine
103 the necessary steps for compiling code from one language to another.
104
105 @deffn {Scheme Procedure} lookup-compilation-order from to
106 Recursively traverses the set of languages to which @var{from} can
107 compile, depth-first, and return the first path that can transform
108 @var{from} to @var{to}. Returns @code{#f} if no path is found.
109
110 This function memoizes its results in a cache that is invalidated by
111 subsequent calls to @code{define-language}, so it should be quite
112 fast.
113 @end deffn
114
115 There is a notion of a ``current language'', which is maintained in
116 the @code{*current-language*} fluid. This language is normally Scheme,
117 and may be rebound by the user. The run-time compilation interfaces
118 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
119 and target languages.
120
121 The normal tower of languages when compiling Scheme goes like this:
122
123 @itemize
124 @item Scheme
125 @item Tree Intermediate Language (Tree-IL)
126 @item Guile Lowlevel Intermediate Language (GLIL)
127 @item Assembly
128 @item Bytecode
129 @item Objcode
130 @end itemize
131
132 Object code may be serialized to disk directly, though it has a cookie
133 and version prepended to the front. But when compiling Scheme at run
134 time, you want a Scheme value: for example, a compiled procedure. For
135 this reason, so as not to break the abstraction, Guile defines a fake
136 language at the bottom of the tower:
137
138 @itemize
139 @item Value
140 @end itemize
141
142 Compiling to @code{value} loads the object code into a procedure, and
143 wakes the sleeping giant.
144
145 Perhaps this strangeness can be explained by example:
146 @code{compile-file} defaults to compiling to object code, because it
147 produces object code that has to live in the barren world outside the
148 Guile runtime; but @code{compile} defaults to compiling to
149 @code{value}, as its product re-enters the Guile world.
150
151 Indeed, the process of compilation can circulate through these
152 different worlds indefinitely, as shown by the following quine:
153
154 @example
155 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
156 @end example
157
158 @node The Scheme Compiler
159 @subsection The Scheme Compiler
160
161 The job of the Scheme compiler is to expand all macros and all of
162 Scheme to its most primitive expressions. The definition of
163 ``primitive'' is given by the inventory of constructs provided by
164 Tree-IL, the target language of the Scheme compiler: procedure
165 applications, conditionals, lexical references, etc. This is described
166 more fully in the next section.
167
168 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
169 that it is completely implemented by the macro expander. Since the
170 macro expander has to run over all of the source code already in order
171 to expand macros, it might as well do the analysis at the same time,
172 producing Tree-IL expressions directly.
173
174 Because this compiler is actually the macro expander, it is
175 extensible. Any macro which the user writes becomes part of the
176 compiler.
177
178 The Scheme-to-Tree-IL expander may be invoked using the generic
179 @code{compile} procedure:
180
181 @lisp
182 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
183 @result{}
184 #<<application> src: #f
185 proc: #<<toplevel-ref> src: #f name: +>
186 args: (#<<const> src: #f exp: 1>
187 #<<const> src: #f exp: 2>)>
188 @end lisp
189
190 Or, since Tree-IL is so close to Scheme, it is often useful to expand
191 Scheme to Tree-IL, then translate back to Scheme. For that reason the
192 expander provides two interfaces. The former is equivalent to calling
193 @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
194 ``compile''. With @code{'e} (the default), the result is translated
195 back to Scheme:
196
197 @lisp
198 (macroexpand '(+ 1 2))
199 @result{} (+ 1 2)
200 (macroexpand '(let ((x 10)) (* x x)))
201 @result{} (let ((x84 10)) (* x84 x84))
202 @end lisp
203
204 The second example shows that as part of its job, the macro expander
205 renames lexically-bound variables. The original names are preserved
206 when compiling to Tree-IL, but can't be represented in Scheme: a
207 lexical binding only has one name. It is for this reason that the
208 @emph{native} output of the expander is @emph{not} Scheme. There's too
209 much information we would lose if we translated to Scheme directly:
210 lexical variable names, source locations, and module hygiene.
211
212 Note however that @code{macroexpand} does not have the same signature
213 as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
214 around @code{macroexpand}, to make it conform to the general form of
215 compiler procedures in Guile's language tower.
216
217 Compiler procedures take three arguments: an expression, an
218 environment, and a keyword list of options. They return three values:
219 the compiled expression, the corresponding environment for the target
220 language, and a ``continuation environment''. The compiled expression
221 and environment will serve as input to the next language's compiler.
222 The ``continuation environment'' can be used to compile another
223 expression from the same source language within the same module.
224
225 For example, you might compile the expression, @code{(define-module
226 (foo))}. This will result in a Tree-IL expression and environment. But
227 if you compiled a second expression, you would want to take into
228 account the compile-time effect of compiling the previous expression,
229 which puts the user in the @code{(foo)} module. That is purpose of the
230 ``continuation environment''; you would pass it as the environment
231 when compiling the subsequent expression.
232
233 For Scheme, an environment is a module. By default, the @code{compile}
234 and @code{compile-file} procedures compile in a fresh module, such
235 that bindings and macros introduced by the expression being compiled
236 are isolated:
237
238 @example
239 (eq? (current-module) (compile '(current-module)))
240 @result{} #f
241
242 (compile '(define hello 'world))
243 (defined? 'hello)
244 @result{} #f
245
246 (define / *)
247 (eq? (compile '/) /)
248 @result{} #f
249 @end example
250
251 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
252 @code{current-reader}}) are isolated:
253
254 @example
255 (compile '(fluid-set! current-reader (lambda args 'fail)))
256 (fluid-ref current-reader)
257 @result{} #f
258 @end example
259
260 Nevertheless, having the compiler and @dfn{compilee} share the same name
261 space can be achieved by explicitly passing @code{(current-module)} as
262 the compilation environment:
263
264 @example
265 (define hello 'world)
266 (compile 'hello #:env (current-module))
267 @result{} world
268 @end example
269
270 @node Tree-IL
271 @subsection Tree-IL
272
273 Tree Intermediate Language (Tree-IL) is a structured intermediate
274 language that is close in expressive power to Scheme. It is an
275 expanded, pre-analyzed Scheme.
276
277 Tree-IL is ``structured'' in the sense that its representation is
278 based on records, not S-expressions. This gives a rigidity to the
279 language that ensures that compiling to a lower-level language only
280 requires a limited set of transformations. For example, the Tree-IL
281 type @code{<const>} is a record type with two fields, @code{src} and
282 @code{exp}. Instances of this type are created via @code{make-const}.
283 Fields of this type are accessed via the @code{const-src} and
284 @code{const-exp} procedures. There is also a predicate, @code{const?}.
285 @xref{Records}, for more information on records.
286
287 @c alpha renaming
288
289 All Tree-IL types have a @code{src} slot, which holds source location
290 information for the expression. This information, if present, will be
291 residualized into the compiled object code, allowing backtraces to
292 show source information. The format of @code{src} is the same as that
293 returned by Guile's @code{source-properties} function. @xref{Source
294 Properties}, for more information.
295
296 Although Tree-IL objects are represented internally using records,
297 there is also an equivalent S-expression external representation for
298 each kind of Tree-IL. For example, the S-expression representation
299 of @code{#<const src: #f exp: 3>} expression would be:
300
301 @example
302 (const 3)
303 @end example
304
305 Users may program with this format directly at the REPL:
306
307 @example
308 scheme@@(guile-user)> ,language tree-il
309 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
310 tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
311 @result{} 42
312 @end example
313
314 The @code{src} fields are left out of the external representation.
315
316 One may create Tree-IL objects from their external representations via
317 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
318 information is attached to the input S-expression, it will be
319 propagated to the resulting Tree-IL expressions. This is probably the
320 easiest way to compile to Tree-IL: just make the appropriate external
321 representations in S-expression format, and let @code{parse-tree-il}
322 take care of the rest.
323
324 @deftp {Scheme Variable} <void> src
325 @deftpx {External Representation} (void)
326 An empty expression. In practice, equivalent to Scheme's @code{(if #f
327 #f)}.
328 @end deftp
329 @deftp {Scheme Variable} <const> src exp
330 @deftpx {External Representation} (const @var{exp})
331 A constant.
332 @end deftp
333 @deftp {Scheme Variable} <primitive-ref> src name
334 @deftpx {External Representation} (primitive @var{name})
335 A reference to a ``primitive''. A primitive is a procedure that, when
336 compiled, may be open-coded. For example, @code{cons} is usually
337 recognized as a primitive, so that it compiles down to a single
338 instruction.
339
340 Compilation of Tree-IL usually begins with a pass that resolves some
341 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
342 @code{<primitive-ref>} expressions. The actual compilation pass
343 has special cases for applications of certain primitives, like
344 @code{apply} or @code{cons}.
345 @end deftp
346 @deftp {Scheme Variable} <lexical-ref> src name gensym
347 @deftpx {External Representation} (lexical @var{name} @var{gensym})
348 A reference to a lexically-bound variable. The @var{name} is the
349 original name of the variable in the source program. @var{gensym} is a
350 unique identifier for this variable.
351 @end deftp
352 @deftp {Scheme Variable} <lexical-set> src name gensym exp
353 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
354 Sets a lexically-bound variable.
355 @end deftp
356 @deftp {Scheme Variable} <module-ref> src mod name public?
357 @deftpx {External Representation} (@@ @var{mod} @var{name})
358 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
359 A reference to a variable in a specific module. @var{mod} should be
360 the name of the module, e.g.@: @code{(guile-user)}.
361
362 If @var{public?} is true, the variable named @var{name} will be looked
363 up in @var{mod}'s public interface, and serialized with @code{@@};
364 otherwise it will be looked up among the module's private bindings,
365 and is serialized with @code{@@@@}.
366 @end deftp
367 @deftp {Scheme Variable} <module-set> src mod name public? exp
368 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
369 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
370 Sets a variable in a specific module.
371 @end deftp
372 @deftp {Scheme Variable} <toplevel-ref> src name
373 @deftpx {External Representation} (toplevel @var{name})
374 References a variable from the current procedure's module.
375 @end deftp
376 @deftp {Scheme Variable} <toplevel-set> src name exp
377 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
378 Sets a variable in the current procedure's module.
379 @end deftp
380 @deftp {Scheme Variable} <toplevel-define> src name exp
381 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
382 Defines a new top-level variable in the current procedure's module.
383 @end deftp
384 @deftp {Scheme Variable} <conditional> src test then else
385 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
386 A conditional. Note that @var{else} is not optional.
387 @end deftp
388 @deftp {Scheme Variable} <application> src proc args
389 @deftpx {External Representation} (apply @var{proc} . @var{args})
390 A procedure call.
391 @end deftp
392 @deftp {Scheme Variable} <sequence> src exps
393 @deftpx {External Representation} (begin . @var{exps})
394 Like Scheme's @code{begin}.
395 @end deftp
396 @deftp {Scheme Variable} <lambda> src meta body
397 @deftpx {External Representation} (lambda @var{meta} @var{body})
398 A closure. @var{meta} is an association list of properties for the
399 procedure. @var{body} is a single Tree-IL expression of type
400 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
401 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
402 expressiveness of Scheme's @code{case-lambda}.
403 @end deftp
404 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
405 @deftpx {External Representation} @
406 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
407 @var{body})@
408 [@var{alternate}])
409 One clause of a @code{case-lambda}. A @code{lambda} expression in
410 Scheme is treated as a @code{case-lambda} with one clause.
411
412 @var{req} is a list of the procedure's required arguments, as symbols.
413 @var{opt} is a list of the optional arguments, or @code{#f} if there
414 are no optional arguments. @var{rest} is the name of the rest
415 argument, or @code{#f}.
416
417 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
418 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
419 keyword corresponding to the argument named @var{name}, and whose
420 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
421 corresponding to all of the optional and keyword arguments, evaluated
422 to bind variables whose value is not supplied by the procedure caller.
423 Each @var{init} expression is evaluated in the lexical context of
424 previously bound variables, from left to right.
425
426 @var{gensyms} is a list of gensyms corresponding to all arguments:
427 first all of the required arguments, then the optional arguments if
428 any, then the rest argument if any, then all of the keyword arguments.
429
430 @var{body} is the body of the clause. If the procedure is called with
431 an appropriate number of arguments, @var{body} is evaluated in tail
432 position. Otherwise, if there is an @var{alternate}, it should be a
433 @code{<lambda-case>} expression, representing the next clause to try.
434 If there is no @var{alternate}, a wrong-number-of-arguments error is
435 signaled.
436 @end deftp
437 @deftp {Scheme Variable} <let> src names gensyms vals exp
438 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
439 Lexical binding, like Scheme's @code{let}. @var{names} are the
440 original binding names, @var{gensyms} are gensyms corresponding to the
441 @var{names}, and @var{vals} are Tree-IL expressions for the values.
442 @var{exp} is a single Tree-IL expression.
443 @end deftp
444 @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
445 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
446 @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
447 A version of @code{<let>} that creates recursive bindings, like
448 Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
449 @end deftp
450 @deftp {Scheme Variable} <dynlet> fluids vals body
451 @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
452 Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
453 @var{fluids} should be a list of Tree-IL expressions that will
454 evaluate to fluids, and @var{vals} a corresponding list of expressions
455 to bind to the fluids during the dynamic extent of the evaluation of
456 @var{body}.
457 @end deftp
458 @deftp {Scheme Variable} <dynref> fluid
459 @deftpx {External Representation} (dynref @var{fluid})
460 A dynamic variable reference. @var{fluid} should be a Tree-IL
461 expression evaluating to a fluid.
462 @end deftp
463 @deftp {Scheme Variable} <dynset> fluid exp
464 @deftpx {External Representation} (dynset @var{fluid} @var{exp})
465 A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
466 to a fluid, will be set to the result of evaluating @var{exp}.
467 @end deftp
468 @deftp {Scheme Variable} <dynwind> winder body unwinder
469 @deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
470 A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
471 evaluate to thunks. Ensure that the winder and the unwinder are called
472 before entering and after leaving @var{body}. Note that @var{body} is
473 an expression, without a thunk wrapper.
474 @end deftp
475 @deftp {Scheme Variable} <prompt> tag body handler
476 @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
477 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
478 during the dynamic extent of the execution of @var{body}, also an
479 expression. If an abort occurs to this prompt, control will be passed
480 to @var{handler}, a @code{<lambda-case>} expression with no optional
481 or keyword arguments, and no alternate. The first argument to the
482 @code{<lambda-case>} will be the captured continuation, and then all
483 of the values passed to the abort. @xref{Prompts}, for more
484 information.
485 @end deftp
486 @deftp {Scheme Variable} <abort> tag args tail
487 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
488 An abort to the nearest prompt with the name @var{tag}, an expression.
489 @var{args} should be a list of expressions to pass to the prompt's
490 handler, and @var{tail} should be an expression that will evaluate to
491 a list of additional arguments. An abort will save the partial
492 continuation, which may later be reinstated, resulting in the
493 @code{<abort>} expression evaluating to some number of values.
494 @end deftp
495
496 There are two Tree-IL constructs that are not normally produced by
497 higher-level compilers, but instead are generated during the
498 source-to-source optimization and analysis passes that the Tree-IL
499 compiler does. Users should not generate these expressions directly,
500 unless they feel very clever, as the default analysis pass will
501 generate them as necessary.
502
503 @deftp {Scheme Variable} <let-values> src names gensyms exp body
504 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
505 Like Scheme's @code{receive} -- binds the values returned by
506 evaluating @code{exp} to the @code{lambda}-like bindings described by
507 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
508
509 @code{<let-values>} is an optimization of @code{<application>} of the
510 primitive, @code{call-with-values}.
511 @end deftp
512 @deftp {Scheme Variable} <fix> src names gensyms vals body
513 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
514 Like @code{<letrec>}, but only for @var{vals} that are unset
515 @code{lambda} expressions.
516
517 @code{fix} is an optimization of @code{letrec} (and @code{let}).
518 @end deftp
519
520 Tree-IL implements a compiler to GLIL that recursively traverses
521 Tree-IL expressions, writing out GLIL expressions into a linear list.
522 The compiler also keeps some state as to whether the current
523 expression is in tail context, and whether its value will be used in
524 future computations. This state allows the compiler not to emit code
525 for constant expressions that will not be used (e.g.@: docstrings), and
526 to perform tail calls when in tail position.
527
528 Most optimization, such as it currently is, is performed on Tree-IL
529 expressions as source-to-source transformations. There will be more
530 optimizations added in the future.
531
532 Interested readers are encouraged to read the implementation in
533 @code{(language tree-il compile-glil)} for more details.
534
535 @node GLIL
536 @subsection GLIL
537
538 Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
539 language whose expressions more closely approximate Guile's VM
540 instruction set. Its expression types are defined in @code{(language
541 glil)}.
542
543 @deftp {Scheme Variable} <glil-program> meta . body
544 A unit of code that at run-time will correspond to a compiled
545 procedure. @var{meta} should be an alist of properties, as in
546 Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
547 expressions.
548 @end deftp
549 @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
550 A prologue for a function with no optional, keyword, or rest
551 arguments. @var{nreq} is the number of required arguments. @var{nlocs}
552 the total number of local variables, including the arguments. If the
553 procedure was not given exactly @var{nreq} arguments, control will
554 jump to @var{else-label}, if given, or otherwise signal an error.
555 @end deftp
556 @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
557 A prologue for a function with optional or rest arguments. Like
558 @code{<glil-std-prelude>}, with the addition that @var{nopt} is the
559 number of optional arguments (possibly zero) and @var{rest} is an
560 index of a local variable at which to bind a rest argument, or
561 @code{#f} if there is no rest argument.
562 @end deftp
563 @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
564 A prologue for a function with keyword arguments. Like
565 @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
566 of keyword arguments, and @var{allow-other-keys?} is a flag indicating
567 whether to allow unknown keys. @xref{Function Prologue Instructions,
568 @code{bind-kwargs}}, for details on the format of @var{kw}.
569 @end deftp
570 @deftp {Scheme Variable} <glil-bind> . vars
571 An advisory expression that notes a liveness extent for a set of
572 variables. @var{vars} is a list of @code{(@var{name} @var{type}
573 @var{index})}, where @var{type} should be either @code{argument},
574 @code{local}, or @code{external}.
575
576 @code{<glil-bind>} expressions end up being serialized as part of a
577 program's metadata and do not form part of a program's code path.
578 @end deftp
579 @deftp {Scheme Variable} <glil-mv-bind> vars rest
580 A multiple-value binding of the values on the stack to @var{vars}. Iff
581 @var{rest} is true, the last element of @var{vars} will be treated as
582 a rest argument.
583
584 In addition to pushing a binding annotation on the stack, like
585 @code{<glil-bind>}, an expression is emitted at compilation time to
586 make sure that there are enough values available to bind. See the
587 notes on @code{truncate-values} in @ref{Procedure Call and Return
588 Instructions}, for more information.
589 @end deftp
590 @deftp {Scheme Variable} <glil-unbind>
591 Closes the liveness extent of the most recently encountered
592 @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
593 expressions are compiled, a parallel stack of live bindings is
594 maintained; this expression pops off the top element from that stack.
595
596 Bindings are written into the program's metadata so that debuggers and
597 other tools can determine the set of live local variables at a given
598 offset within a VM program.
599 @end deftp
600 @deftp {Scheme Variable} <glil-source> loc
601 Records source information for the preceding expression. @var{loc}
602 should be an association list of containing @code{line} @code{column},
603 and @code{filename} keys, e.g.@: as returned by
604 @code{source-properties}.
605 @end deftp
606 @deftp {Scheme Variable} <glil-void>
607 Pushes ``the unspecified value'' on the stack.
608 @end deftp
609 @deftp {Scheme Variable} <glil-const> obj
610 Pushes a constant value onto the stack. @var{obj} must be a number,
611 string, symbol, keyword, boolean, character, uniform array, the empty
612 list, or a pair or vector of constants.
613 @end deftp
614 @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
615 Accesses a lexically bound variable. If the variable is not
616 @var{local?} it is free. All variables may have @code{ref},
617 @code{set}, and @code{bound?} as their @var{op}. Boxed variables may
618 also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
619 which correspond in semantics to the VM instructions @code{box},
620 @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
621 more information.
622 @end deftp
623 @deftp {Scheme Variable} <glil-toplevel> op name
624 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
625 or @code{define}.
626 @end deftp
627 @deftp {Scheme Variable} <glil-module> op mod name public?
628 Accesses a variable within a specific module. See Tree-IL's
629 @code{<module-ref>}, for more information.
630 @end deftp
631 @deftp {Scheme Variable} <glil-label> label
632 Creates a new label. @var{label} can be any Scheme value, and should
633 be unique.
634 @end deftp
635 @deftp {Scheme Variable} <glil-branch> inst label
636 Branch to a label. @var{label} should be a @code{<ghil-label>}.
637 @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
638 @end deftp
639 @deftp {Scheme Variable} <glil-call> inst nargs
640 This expression is probably misnamed, as it does not correspond to
641 function calls. @code{<glil-call>} invokes the VM instruction named
642 @var{inst}, noting that it is called with @var{nargs} stack arguments.
643 The arguments should be pushed on the stack already. What happens to
644 the stack afterwards depends on the instruction.
645 @end deftp
646 @deftp {Scheme Variable} <glil-mv-call> nargs ra
647 Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
648 corresponding to the multiple-value return address for the call. See
649 the notes on @code{mv-call} in @ref{Procedure Call and Return
650 Instructions}, for more information.
651 @end deftp
652 @deftp {Scheme Variable} <glil-prompt> label escape-only?
653 Push a dynamic prompt into the stack, with a handler at @var{label}.
654 @var{escape-only?} is a flag that is propagated to the prompt,
655 allowing an abort to avoid capturing a continuation in some cases.
656 @xref{Prompts}, for more information.
657 @end deftp
658
659 Users may enter in GLIL at the REPL as well, though there is a bit
660 more bookkeeping to do:
661
662 @example
663 scheme@@(guile-user)> ,language glil
664 Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
665 To switch back, type `,L scheme'.
666 glil@@(guile-user)> (program () (std-prelude 0 0 #f)
667 (const 3) (call return 1))
668 @result{} 3
669 @end example
670
671 Just as in all of Guile's compilers, an environment is passed to the
672 GLIL-to-object code compiler, and one is returned as well, along with
673 the object code.
674
675 @node Assembly
676 @subsection Assembly
677
678 Assembly is an S-expression-based, human-readable representation of
679 the actual bytecodes that will be emitted for the VM. As such, it is a
680 useful intermediate language both for compilation and for
681 decompilation.
682
683 Besides the fact that it is not a record-based language, assembly
684 differs from GLIL in four main ways:
685
686 @itemize
687 @item Labels have been resolved to byte offsets in the program.
688 @item Constants inside procedures have either been expressed as inline
689 instructions or cached in object arrays.
690 @item Procedures with metadata (source location information, liveness
691 extents, procedure names, generic properties, etc) have had their
692 metadata serialized out to thunks.
693 @item All expressions correspond directly to VM instructions -- i.e.,
694 there is no @code{<glil-lexical>} which can be a ref or a set.
695 @end itemize
696
697 Assembly is isomorphic to the bytecode that it compiles to. You can
698 compile to bytecode, then decompile back to assembly, and you have the
699 same assembly code.
700
701 The general form of assembly instructions is the following:
702
703 @lisp
704 (@var{inst} @var{arg} ...)
705 @end lisp
706
707 The @var{inst} names a VM instruction, and its @var{arg}s will be
708 embedded in the instruction stream. The easiest way to see assembly is
709 to play around with it at the REPL, as can be seen in this annotated
710 example:
711
712 @example
713 scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
714 (load-program
715 ((:LCASE16 . 2)) ; Labels, unused in this case.
716 8 ; Length of the thunk that was compiled.
717 (load-program ; Metadata thunk.
718 ()
719 17
720 #f ; No metadata thunk for the metadata thunk.
721 (make-eol)
722 (make-eol)
723 (make-int8 2) ; Liveness extents, source info, and arities,
724 (make-int8 8) ; in a format that Guile knows how to parse.
725 (make-int8:0)
726 (list 0 3)
727 (list 0 1)
728 (list 0 3)
729 (return))
730 (assert-nargs-ee/locals 0) ; Prologue.
731 (make-int8 32) ; Actual code starts here.
732 (make-int8 10)
733 (add)
734 (return))
735 @end example
736
737 Of course you can switch the REPL to assembly and enter in assembly
738 S-expressions directly, like with other languages, though it is more
739 difficult, given that the length fields have to be correct.
740
741 @node Bytecode and Objcode
742 @subsection Bytecode and Objcode
743
744 Finally, the raw bytes. There are actually two different ``languages''
745 here, corresponding to two different ways to represent the bytes.
746
747 ``Bytecode'' represents code as uniform byte vectors, useful for
748 structuring and destructuring code on the Scheme level. Bytecode is
749 the next step down from assembly:
750
751 @example
752 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
753 @result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
754 95 0 ; Prologue.
755 10 32 10 10 148 66 17 ; Actual code.
756 0 0 0 0 0 0 0 9 ; Metadata thunk.
757 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
758 @end example
759
760 ``Objcode'' is bytecode, but mapped directly to a C structure,
761 @code{struct scm_objcode}:
762
763 @example
764 struct scm_objcode @{
765 scm_t_uint32 len;
766 scm_t_uint32 metalen;
767 scm_t_uint8 base[0];
768 @};
769 @end example
770
771 As one might imagine, objcode imposes a minimum length on the
772 bytecode. Also, the @code{len} and @code{metalen} fields are in native
773 endianness, which makes objcode (and bytecode) system-dependent.
774
775 Objcode also has a couple of important efficiency hacks. First,
776 objcode may be mapped directly from disk, allowing compiled code to be
777 loaded quickly, often from the system's disk cache, and shared among
778 multiple processes. Secondly, objcode may be embedded in other
779 objcode, allowing procedures to have the text of other procedures
780 inlined into their bodies, without the need for separate allocation of
781 the code. Of course, the objcode object itself does need to be
782 allocated.
783
784 Procedures related to objcode are defined in the @code{(system vm
785 objcode)} module.
786
787 @deffn {Scheme Procedure} objcode? obj
788 @deffnx {C Function} scm_objcode_p (obj)
789 Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
790 @end deffn
791
792 @deffn {Scheme Procedure} bytecode->objcode bytecode
793 @deffnx {C Function} scm_bytecode_to_objcode (bytecode)
794 Makes a bytecode object from @var{bytecode}, which should be a
795 bytevector. @xref{Bytevectors}.
796 @end deffn
797
798 @deffn {Scheme Variable} load-objcode file
799 @deffnx {C Function} scm_load_objcode (file)
800 Load object code from a file named @var{file}. The file will be mapped
801 into memory via @code{mmap}, so this is a very fast operation.
802
803 On disk, object code has an sixteen-byte cookie prepended to it, to
804 prevent accidental loading of arbitrary garbage.
805 @end deffn
806
807 @deffn {Scheme Variable} write-objcode objcode file
808 @deffnx {C Function} scm_write_objcode (objcode)
809 Write object code out to a file, prepending the sixteen-byte cookie.
810 @end deffn
811
812 @deffn {Scheme Variable} objcode->bytecode objcode
813 @deffnx {C Function} scm_objcode_to_bytecode (objcode)
814 Copy object code out to a bytevector for analysis by Scheme.
815 @end deffn
816
817 The following procedure is actually in @code{(system vm program)}, but
818 we'll mention it here:
819
820 @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
821 @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
822 Load up object code into a Scheme program. The resulting program will
823 have @var{objtable} as its object table, which should be a vector or
824 @code{#f}, and will capture the free variables from @var{free-vars}.
825 @end deffn
826
827 Object code from a file may be disassembled at the REPL via the
828 meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
829 Programs may be disassembled via @code{,disassemble}, abbreviated as
830 @code{,x}.
831
832 Compiling object code to the fake language, @code{value}, is performed
833 via loading objcode into a program, then executing that thunk with
834 respect to the compilation environment. Normally the environment
835 propagates through the compiler transparently, but users may specify
836 the compilation environment manually as well, as a module.
837
838
839 @node Writing New High-Level Languages
840 @subsection Writing New High-Level Languages
841
842 In order to integrate a new language @var{lang} into Guile's compiler
843 system, one has to create the module @code{(language @var{lang} spec)}
844 containing the language definition and referencing the parser,
845 compiler and other routines processing it. The module hierarchy in
846 @code{(language brainfuck)} defines a very basic Brainfuck
847 implementation meant to serve as easy-to-understand example on how to
848 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
849 for more information about the Brainfuck language itself.
850
851
852 @node Extending the Compiler
853 @subsection Extending the Compiler
854
855 At this point we take a detour from the impersonal tone of the rest of
856 the manual. Admit it: if you've read this far into the compiler
857 internals manual, you are a junkie. Perhaps a course at your university
858 left you unsated, or perhaps you've always harbored a desire to hack the
859 holy of computer science holies: a compiler. Well you're in good
860 company, and in a good position. Guile's compiler needs your help.
861
862 There are many possible avenues for improving Guile's compiler.
863 Probably the most important improvement, speed-wise, will be some form
864 of native compilation, both just-in-time and ahead-of-time. This could
865 be done in many ways. Probably the easiest strategy would be to extend
866 the compiled procedure structure to include a pointer to a native code
867 vector, and compile from bytecode to native code at run-time after a
868 procedure is called a certain number of times.
869
870 The name of the game is a profiling-based harvest of the low-hanging
871 fruit, running programs of interest under a system-level profiler and
872 determining which improvements would give the most bang for the buck.
873 It's really getting to the point though that native compilation is the
874 next step.
875
876 The compiler also needs help at the top end, enhancing the Scheme that
877 it knows to also understand R6RS, and adding new high-level compilers.
878 We have JavaScript and Emacs Lisp mostly complete, but they could use
879 some love; Lua would be nice as well, but whatever language it is
880 that strikes your fancy would be welcome too.
881
882 Compilers are for hacking, not for admiring or for complaining about.
883 Get to it!