Merge remote-tracking branch 'local-2.0/stable-2.0'
[bpt/guile.git] / doc / ref / compiler.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010, 2011
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
9
10 Compilers have a mystique about them that is attractive and
11 off-putting at the same time. They are attractive because they are
12 magical -- they transform inert text into live results, like throwing
13 the switch on Frankenstein's monster. However, this magic is perceived
14 by many to be impenetrable.
15
16 This section aims to pay attention to the small man behind the
17 curtain.
18
19 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
20 know how to compile your @code{.scm} file.
21
22 @menu
23 * Compiler Tower::
24 * The Scheme Compiler::
25 * Tree-IL::
26 * GLIL::
27 * Assembly::
28 * Bytecode and Objcode::
29 * Writing New High-Level Languages::
30 * Extending the Compiler::
31 @end menu
32
33 @node Compiler Tower
34 @subsection Compiler Tower
35
36 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37 put it more accurately. Guile defines a tower of languages, starting
38 at Scheme and progressively simplifying down to languages that
39 resemble the VM instruction set (@pxref{Instruction Set}).
40
41 Each language knows how to compile to the next, so each step is simple
42 and understandable. Furthermore, this set of languages is not
43 hardcoded into Guile, so it is possible for the user to add new
44 high-level languages, new passes, or even different compilation
45 targets.
46
47 Languages are registered in the module, @code{(system base language)}:
48
49 @example
50 (use-modules (system base language))
51 @end example
52
53 They are registered with the @code{define-language} form.
54
55 @deffn {Scheme Syntax} define-language @
56 name title reader printer @
57 [parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
58 [joiner=#f] [make-default-environment=make-fresh-user-module]
59 Define a language.
60
61 This syntax defines a @code{#<language>} object, bound to @var{name}
62 in the current environment. In addition, the language will be added to
63 the global language set. For example, this is the language definition
64 for Scheme:
65
66 @example
67 (define-language scheme
68 #:title "Scheme"
69 #:reader (lambda (port env) ...)
70 #:compilers `((tree-il . ,compile-tree-il))
71 #:decompilers `((tree-il . ,decompile-tree-il))
72 #:evaluator (lambda (x module) (primitive-eval x))
73 #:printer write
74 #:make-default-environment (lambda () ...))
75 @end example
76 @end deffn
77
78 The interesting thing about having languages defined this way is that
79 they present a uniform interface to the read-eval-print loop. This
80 allows the user to change the current language of the REPL:
81
82 @example
83 scheme@@(guile-user)> ,language tree-il
84 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
85 tree-il@@(guile-user)> ,L scheme
86 Happy hacking with Scheme! To switch back, type `,L tree-il'.
87 scheme@@(guile-user)>
88 @end example
89
90 Languages can be looked up by name, as they were above.
91
92 @deffn {Scheme Procedure} lookup-language name
93 Looks up a language named @var{name}, autoloading it if necessary.
94
95 Languages are autoloaded by looking for a variable named @var{name} in
96 a module named @code{(language @var{name} spec)}.
97
98 The language object will be returned, or @code{#f} if there does not
99 exist a language with that name.
100 @end deffn
101
102 Defining languages this way allows us to programmatically determine
103 the necessary steps for compiling code from one language to another.
104
105 @deffn {Scheme Procedure} lookup-compilation-order from to
106 Recursively traverses the set of languages to which @var{from} can
107 compile, depth-first, and return the first path that can transform
108 @var{from} to @var{to}. Returns @code{#f} if no path is found.
109
110 This function memoizes its results in a cache that is invalidated by
111 subsequent calls to @code{define-language}, so it should be quite
112 fast.
113 @end deffn
114
115 There is a notion of a ``current language'', which is maintained in
116 the @code{*current-language*} fluid. This language is normally Scheme,
117 and may be rebound by the user. The run-time compilation interfaces
118 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
119 and target languages.
120
121 The normal tower of languages when compiling Scheme goes like this:
122
123 @itemize
124 @item Scheme
125 @item Tree Intermediate Language (Tree-IL)
126 @item Guile Lowlevel Intermediate Language (GLIL)
127 @item Assembly
128 @item Bytecode
129 @item Objcode
130 @end itemize
131
132 Object code may be serialized to disk directly, though it has a cookie
133 and version prepended to the front. But when compiling Scheme at run
134 time, you want a Scheme value: for example, a compiled procedure. For
135 this reason, so as not to break the abstraction, Guile defines a fake
136 language at the bottom of the tower:
137
138 @itemize
139 @item Value
140 @end itemize
141
142 Compiling to @code{value} loads the object code into a procedure, and
143 wakes the sleeping giant.
144
145 Perhaps this strangeness can be explained by example:
146 @code{compile-file} defaults to compiling to object code, because it
147 produces object code that has to live in the barren world outside the
148 Guile runtime; but @code{compile} defaults to compiling to
149 @code{value}, as its product re-enters the Guile world.
150
151 Indeed, the process of compilation can circulate through these
152 different worlds indefinitely, as shown by the following quine:
153
154 @example
155 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
156 @end example
157
158 @node The Scheme Compiler
159 @subsection The Scheme Compiler
160
161 The job of the Scheme compiler is to expand all macros and all of Scheme
162 to its most primitive expressions. The definition of ``primitive'' is
163 given by the inventory of constructs provided by Tree-IL, the target
164 language of the Scheme compiler: procedure calls, conditionals, lexical
165 references, etc. This is described more fully in the next section.
166
167 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
168 that it is completely implemented by the macro expander. Since the
169 macro expander has to run over all of the source code already in order
170 to expand macros, it might as well do the analysis at the same time,
171 producing Tree-IL expressions directly.
172
173 Because this compiler is actually the macro expander, it is
174 extensible. Any macro which the user writes becomes part of the
175 compiler.
176
177 The Scheme-to-Tree-IL expander may be invoked using the generic
178 @code{compile} procedure:
179
180 @lisp
181 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
182 @result{}
183 #<<call> src: #f
184 proc: #<<toplevel-ref> src: #f name: +>
185 args: (#<<const> src: #f exp: 1>
186 #<<const> src: #f exp: 2>)>
187 @end lisp
188
189 Or, since Tree-IL is so close to Scheme, it is often useful to expand
190 Scheme to Tree-IL, then translate back to Scheme. For that reason the
191 expander provides two interfaces. The former is equivalent to calling
192 @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
193 ``compile''. With @code{'e} (the default), the result is translated
194 back to Scheme:
195
196 @lisp
197 (macroexpand '(+ 1 2))
198 @result{} (+ 1 2)
199 (macroexpand '(let ((x 10)) (* x x)))
200 @result{} (let ((x84 10)) (* x84 x84))
201 @end lisp
202
203 The second example shows that as part of its job, the macro expander
204 renames lexically-bound variables. The original names are preserved
205 when compiling to Tree-IL, but can't be represented in Scheme: a
206 lexical binding only has one name. It is for this reason that the
207 @emph{native} output of the expander is @emph{not} Scheme. There's too
208 much information we would lose if we translated to Scheme directly:
209 lexical variable names, source locations, and module hygiene.
210
211 Note however that @code{macroexpand} does not have the same signature
212 as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
213 around @code{macroexpand}, to make it conform to the general form of
214 compiler procedures in Guile's language tower.
215
216 Compiler procedures take three arguments: an expression, an
217 environment, and a keyword list of options. They return three values:
218 the compiled expression, the corresponding environment for the target
219 language, and a ``continuation environment''. The compiled expression
220 and environment will serve as input to the next language's compiler.
221 The ``continuation environment'' can be used to compile another
222 expression from the same source language within the same module.
223
224 For example, you might compile the expression, @code{(define-module
225 (foo))}. This will result in a Tree-IL expression and environment. But
226 if you compiled a second expression, you would want to take into
227 account the compile-time effect of compiling the previous expression,
228 which puts the user in the @code{(foo)} module. That is purpose of the
229 ``continuation environment''; you would pass it as the environment
230 when compiling the subsequent expression.
231
232 For Scheme, an environment is a module. By default, the @code{compile}
233 and @code{compile-file} procedures compile in a fresh module, such
234 that bindings and macros introduced by the expression being compiled
235 are isolated:
236
237 @example
238 (eq? (current-module) (compile '(current-module)))
239 @result{} #f
240
241 (compile '(define hello 'world))
242 (defined? 'hello)
243 @result{} #f
244
245 (define / *)
246 (eq? (compile '/) /)
247 @result{} #f
248 @end example
249
250 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
251 @code{current-reader}}) are isolated:
252
253 @example
254 (compile '(fluid-set! current-reader (lambda args 'fail)))
255 (fluid-ref current-reader)
256 @result{} #f
257 @end example
258
259 Nevertheless, having the compiler and @dfn{compilee} share the same name
260 space can be achieved by explicitly passing @code{(current-module)} as
261 the compilation environment:
262
263 @example
264 (define hello 'world)
265 (compile 'hello #:env (current-module))
266 @result{} world
267 @end example
268
269 @node Tree-IL
270 @subsection Tree-IL
271
272 Tree Intermediate Language (Tree-IL) is a structured intermediate
273 language that is close in expressive power to Scheme. It is an
274 expanded, pre-analyzed Scheme.
275
276 Tree-IL is ``structured'' in the sense that its representation is
277 based on records, not S-expressions. This gives a rigidity to the
278 language that ensures that compiling to a lower-level language only
279 requires a limited set of transformations. For example, the Tree-IL
280 type @code{<const>} is a record type with two fields, @code{src} and
281 @code{exp}. Instances of this type are created via @code{make-const}.
282 Fields of this type are accessed via the @code{const-src} and
283 @code{const-exp} procedures. There is also a predicate, @code{const?}.
284 @xref{Records}, for more information on records.
285
286 @c alpha renaming
287
288 All Tree-IL types have a @code{src} slot, which holds source location
289 information for the expression. This information, if present, will be
290 residualized into the compiled object code, allowing backtraces to
291 show source information. The format of @code{src} is the same as that
292 returned by Guile's @code{source-properties} function. @xref{Source
293 Properties}, for more information.
294
295 Although Tree-IL objects are represented internally using records,
296 there is also an equivalent S-expression external representation for
297 each kind of Tree-IL. For example, the S-expression representation
298 of @code{#<const src: #f exp: 3>} expression would be:
299
300 @example
301 (const 3)
302 @end example
303
304 Users may program with this format directly at the REPL:
305
306 @example
307 scheme@@(guile-user)> ,language tree-il
308 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
309 tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
310 @result{} 42
311 @end example
312
313 The @code{src} fields are left out of the external representation.
314
315 One may create Tree-IL objects from their external representations via
316 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
317 information is attached to the input S-expression, it will be
318 propagated to the resulting Tree-IL expressions. This is probably the
319 easiest way to compile to Tree-IL: just make the appropriate external
320 representations in S-expression format, and let @code{parse-tree-il}
321 take care of the rest.
322
323 @deftp {Scheme Variable} <void> src
324 @deftpx {External Representation} (void)
325 An empty expression. In practice, equivalent to Scheme's @code{(if #f
326 #f)}.
327 @end deftp
328 @deftp {Scheme Variable} <const> src exp
329 @deftpx {External Representation} (const @var{exp})
330 A constant.
331 @end deftp
332 @deftp {Scheme Variable} <primitive-ref> src name
333 @deftpx {External Representation} (primitive @var{name})
334 A reference to a ``primitive''. A primitive is a procedure that, when
335 compiled, may be open-coded. For example, @code{cons} is usually
336 recognized as a primitive, so that it compiles down to a single
337 instruction.
338
339 Compilation of Tree-IL usually begins with a pass that resolves some
340 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
341 @code{<primitive-ref>} expressions. The actual compilation pass has
342 special cases for calls to certain primitives, like @code{apply} or
343 @code{cons}.
344 @end deftp
345 @deftp {Scheme Variable} <lexical-ref> src name gensym
346 @deftpx {External Representation} (lexical @var{name} @var{gensym})
347 A reference to a lexically-bound variable. The @var{name} is the
348 original name of the variable in the source program. @var{gensym} is a
349 unique identifier for this variable.
350 @end deftp
351 @deftp {Scheme Variable} <lexical-set> src name gensym exp
352 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
353 Sets a lexically-bound variable.
354 @end deftp
355 @deftp {Scheme Variable} <module-ref> src mod name public?
356 @deftpx {External Representation} (@@ @var{mod} @var{name})
357 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
358 A reference to a variable in a specific module. @var{mod} should be
359 the name of the module, e.g.@: @code{(guile-user)}.
360
361 If @var{public?} is true, the variable named @var{name} will be looked
362 up in @var{mod}'s public interface, and serialized with @code{@@};
363 otherwise it will be looked up among the module's private bindings,
364 and is serialized with @code{@@@@}.
365 @end deftp
366 @deftp {Scheme Variable} <module-set> src mod name public? exp
367 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
368 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
369 Sets a variable in a specific module.
370 @end deftp
371 @deftp {Scheme Variable} <toplevel-ref> src name
372 @deftpx {External Representation} (toplevel @var{name})
373 References a variable from the current procedure's module.
374 @end deftp
375 @deftp {Scheme Variable} <toplevel-set> src name exp
376 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
377 Sets a variable in the current procedure's module.
378 @end deftp
379 @deftp {Scheme Variable} <toplevel-define> src name exp
380 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
381 Defines a new top-level variable in the current procedure's module.
382 @end deftp
383 @deftp {Scheme Variable} <conditional> src test then else
384 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
385 A conditional. Note that @var{else} is not optional.
386 @end deftp
387 @deftp {Scheme Variable} <call> src proc args
388 @deftpx {External Representation} (call @var{proc} . @var{args})
389 A procedure call.
390 @end deftp
391 @deftp {Scheme Variable} <primcall> src name args
392 @deftpx {External Representation} (primcall @var{name} . @var{args})
393 A call to a primitive. Equivalent to @code{(call (primitive @var{name})
394 . @var{args})}. This construct is often more convenient to generate and
395 analyze than @code{<call>}.
396
397 As part of the compilation process, instances of @code{(call (primitive
398 @var{name}) . @var{args})} are transformed into primcalls.
399 @end deftp
400 @deftp {Scheme Variable} <sequence> src exps
401 @deftpx {External Representation} (begin . @var{exps})
402 Like Scheme's @code{begin}.
403 @end deftp
404 @deftp {Scheme Variable} <lambda> src meta body
405 @deftpx {External Representation} (lambda @var{meta} @var{body})
406 A closure. @var{meta} is an association list of properties for the
407 procedure. @var{body} is a single Tree-IL expression of type
408 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
409 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
410 expressiveness of Scheme's @code{case-lambda}.
411 @end deftp
412 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
413 @deftpx {External Representation} @
414 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
415 @var{body})@
416 [@var{alternate}])
417 One clause of a @code{case-lambda}. A @code{lambda} expression in
418 Scheme is treated as a @code{case-lambda} with one clause.
419
420 @var{req} is a list of the procedure's required arguments, as symbols.
421 @var{opt} is a list of the optional arguments, or @code{#f} if there
422 are no optional arguments. @var{rest} is the name of the rest
423 argument, or @code{#f}.
424
425 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
426 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
427 keyword corresponding to the argument named @var{name}, and whose
428 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
429 corresponding to all of the optional and keyword arguments, evaluated
430 to bind variables whose value is not supplied by the procedure caller.
431 Each @var{init} expression is evaluated in the lexical context of
432 previously bound variables, from left to right.
433
434 @var{gensyms} is a list of gensyms corresponding to all arguments:
435 first all of the required arguments, then the optional arguments if
436 any, then the rest argument if any, then all of the keyword arguments.
437
438 @var{body} is the body of the clause. If the procedure is called with
439 an appropriate number of arguments, @var{body} is evaluated in tail
440 position. Otherwise, if there is an @var{alternate}, it should be a
441 @code{<lambda-case>} expression, representing the next clause to try.
442 If there is no @var{alternate}, a wrong-number-of-arguments error is
443 signaled.
444 @end deftp
445 @deftp {Scheme Variable} <let> src names gensyms vals exp
446 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
447 Lexical binding, like Scheme's @code{let}. @var{names} are the
448 original binding names, @var{gensyms} are gensyms corresponding to the
449 @var{names}, and @var{vals} are Tree-IL expressions for the values.
450 @var{exp} is a single Tree-IL expression.
451 @end deftp
452 @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
453 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
454 @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
455 A version of @code{<let>} that creates recursive bindings, like
456 Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
457 @end deftp
458 @deftp {Scheme Variable} <dynlet> fluids vals body
459 @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
460 Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
461 @var{fluids} should be a list of Tree-IL expressions that will
462 evaluate to fluids, and @var{vals} a corresponding list of expressions
463 to bind to the fluids during the dynamic extent of the evaluation of
464 @var{body}.
465 @end deftp
466 @deftp {Scheme Variable} <dynref> fluid
467 @deftpx {External Representation} (dynref @var{fluid})
468 A dynamic variable reference. @var{fluid} should be a Tree-IL
469 expression evaluating to a fluid.
470 @end deftp
471 @deftp {Scheme Variable} <dynset> fluid exp
472 @deftpx {External Representation} (dynset @var{fluid} @var{exp})
473 A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
474 to a fluid, will be set to the result of evaluating @var{exp}.
475 @end deftp
476 @deftp {Scheme Variable} <dynwind> winder pre body post unwinder
477 @deftpx {External Representation} (dynwind @var{winder} @var{pre} @var{body} @var{post} @var{unwinder})
478 A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
479 evaluate to thunks. Ensure that the winder and the unwinder are called
480 before entering and after leaving @var{body}. Note that @var{body} is
481 an expression, without a thunk wrapper. Guile actually inlines the
482 bodies of @var{winder} and @var{unwinder} for the case of normal control
483 flow, compiling the expressions in @var{pre} and @var{post},
484 respectively.
485 @end deftp
486 @deftp {Scheme Variable} <prompt> tag body handler
487 @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
488 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
489 during the dynamic extent of the execution of @var{body}, also an
490 expression. If an abort occurs to this prompt, control will be passed
491 to @var{handler}, a @code{<lambda-case>} expression with no optional
492 or keyword arguments, and no alternate. The first argument to the
493 @code{<lambda-case>} will be the captured continuation, and then all
494 of the values passed to the abort. @xref{Prompts}, for more
495 information.
496 @end deftp
497 @deftp {Scheme Variable} <abort> tag args tail
498 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
499 An abort to the nearest prompt with the name @var{tag}, an expression.
500 @var{args} should be a list of expressions to pass to the prompt's
501 handler, and @var{tail} should be an expression that will evaluate to
502 a list of additional arguments. An abort will save the partial
503 continuation, which may later be reinstated, resulting in the
504 @code{<abort>} expression evaluating to some number of values.
505 @end deftp
506
507 There are two Tree-IL constructs that are not normally produced by
508 higher-level compilers, but instead are generated during the
509 source-to-source optimization and analysis passes that the Tree-IL
510 compiler does. Users should not generate these expressions directly,
511 unless they feel very clever, as the default analysis pass will
512 generate them as necessary.
513
514 @deftp {Scheme Variable} <let-values> src names gensyms exp body
515 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
516 Like Scheme's @code{receive} -- binds the values returned by
517 evaluating @code{exp} to the @code{lambda}-like bindings described by
518 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
519
520 @code{<let-values>} is an optimization of a @code{<call>} to the
521 primitive, @code{call-with-values}.
522 @end deftp
523 @deftp {Scheme Variable} <fix> src names gensyms vals body
524 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
525 Like @code{<letrec>}, but only for @var{vals} that are unset
526 @code{lambda} expressions.
527
528 @code{fix} is an optimization of @code{letrec} (and @code{let}).
529 @end deftp
530
531 Tree-IL implements a compiler to GLIL that recursively traverses
532 Tree-IL expressions, writing out GLIL expressions into a linear list.
533 The compiler also keeps some state as to whether the current
534 expression is in tail context, and whether its value will be used in
535 future computations. This state allows the compiler not to emit code
536 for constant expressions that will not be used (e.g.@: docstrings), and
537 to perform tail calls when in tail position.
538
539 Most optimization, such as it currently is, is performed on Tree-IL
540 expressions as source-to-source transformations. There will be more
541 optimizations added in the future.
542
543 Interested readers are encouraged to read the implementation in
544 @code{(language tree-il compile-glil)} for more details.
545
546 @node GLIL
547 @subsection GLIL
548
549 Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
550 language whose expressions more closely approximate Guile's VM
551 instruction set. Its expression types are defined in @code{(language
552 glil)}.
553
554 @deftp {Scheme Variable} <glil-program> meta . body
555 A unit of code that at run-time will correspond to a compiled
556 procedure. @var{meta} should be an alist of properties, as in
557 Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
558 expressions.
559 @end deftp
560 @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
561 A prologue for a function with no optional, keyword, or rest
562 arguments. @var{nreq} is the number of required arguments. @var{nlocs}
563 the total number of local variables, including the arguments. If the
564 procedure was not given exactly @var{nreq} arguments, control will
565 jump to @var{else-label}, if given, or otherwise signal an error.
566 @end deftp
567 @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
568 A prologue for a function with optional or rest arguments. Like
569 @code{<glil-std-prelude>}, with the addition that @var{nopt} is the
570 number of optional arguments (possibly zero) and @var{rest} is an
571 index of a local variable at which to bind a rest argument, or
572 @code{#f} if there is no rest argument.
573 @end deftp
574 @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
575 A prologue for a function with keyword arguments. Like
576 @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
577 of keyword arguments, and @var{allow-other-keys?} is a flag indicating
578 whether to allow unknown keys. @xref{Function Prologue Instructions,
579 @code{bind-kwargs}}, for details on the format of @var{kw}.
580 @end deftp
581 @deftp {Scheme Variable} <glil-bind> . vars
582 An advisory expression that notes a liveness extent for a set of
583 variables. @var{vars} is a list of @code{(@var{name} @var{type}
584 @var{index})}, where @var{type} should be either @code{argument},
585 @code{local}, or @code{external}.
586
587 @code{<glil-bind>} expressions end up being serialized as part of a
588 program's metadata and do not form part of a program's code path.
589 @end deftp
590 @deftp {Scheme Variable} <glil-mv-bind> vars rest
591 A multiple-value binding of the values on the stack to @var{vars}. Iff
592 @var{rest} is true, the last element of @var{vars} will be treated as
593 a rest argument.
594
595 In addition to pushing a binding annotation on the stack, like
596 @code{<glil-bind>}, an expression is emitted at compilation time to
597 make sure that there are enough values available to bind. See the
598 notes on @code{truncate-values} in @ref{Procedure Call and Return
599 Instructions}, for more information.
600 @end deftp
601 @deftp {Scheme Variable} <glil-unbind>
602 Closes the liveness extent of the most recently encountered
603 @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
604 expressions are compiled, a parallel stack of live bindings is
605 maintained; this expression pops off the top element from that stack.
606
607 Bindings are written into the program's metadata so that debuggers and
608 other tools can determine the set of live local variables at a given
609 offset within a VM program.
610 @end deftp
611 @deftp {Scheme Variable} <glil-source> loc
612 Records source information for the preceding expression. @var{loc}
613 should be an association list of containing @code{line} @code{column},
614 and @code{filename} keys, e.g.@: as returned by
615 @code{source-properties}.
616 @end deftp
617 @deftp {Scheme Variable} <glil-void>
618 Pushes ``the unspecified value'' on the stack.
619 @end deftp
620 @deftp {Scheme Variable} <glil-const> obj
621 Pushes a constant value onto the stack. @var{obj} must be a number,
622 string, symbol, keyword, boolean, character, uniform array, the empty
623 list, or a pair or vector of constants.
624 @end deftp
625 @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
626 Accesses a lexically bound variable. If the variable is not
627 @var{local?} it is free. All variables may have @code{ref},
628 @code{set}, and @code{bound?} as their @var{op}. Boxed variables may
629 also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
630 which correspond in semantics to the VM instructions @code{box},
631 @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
632 more information.
633 @end deftp
634 @deftp {Scheme Variable} <glil-toplevel> op name
635 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
636 or @code{define}.
637 @end deftp
638 @deftp {Scheme Variable} <glil-module> op mod name public?
639 Accesses a variable within a specific module. See Tree-IL's
640 @code{<module-ref>}, for more information.
641 @end deftp
642 @deftp {Scheme Variable} <glil-label> label
643 Creates a new label. @var{label} can be any Scheme value, and should
644 be unique.
645 @end deftp
646 @deftp {Scheme Variable} <glil-branch> inst label
647 Branch to a label. @var{label} should be a @code{<ghil-label>}.
648 @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
649 @end deftp
650 @deftp {Scheme Variable} <glil-call> inst nargs
651 This expression is probably misnamed, as it does not correspond to
652 function calls. @code{<glil-call>} invokes the VM instruction named
653 @var{inst}, noting that it is called with @var{nargs} stack arguments.
654 The arguments should be pushed on the stack already. What happens to
655 the stack afterwards depends on the instruction.
656 @end deftp
657 @deftp {Scheme Variable} <glil-mv-call> nargs ra
658 Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
659 corresponding to the multiple-value return address for the call. See
660 the notes on @code{mv-call} in @ref{Procedure Call and Return
661 Instructions}, for more information.
662 @end deftp
663 @deftp {Scheme Variable} <glil-prompt> label escape-only?
664 Push a dynamic prompt into the stack, with a handler at @var{label}.
665 @var{escape-only?} is a flag that is propagated to the prompt,
666 allowing an abort to avoid capturing a continuation in some cases.
667 @xref{Prompts}, for more information.
668 @end deftp
669
670 Users may enter in GLIL at the REPL as well, though there is a bit
671 more bookkeeping to do:
672
673 @example
674 scheme@@(guile-user)> ,language glil
675 Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
676 To switch back, type `,L scheme'.
677 glil@@(guile-user)> (program () (std-prelude 0 0 #f)
678 (const 3) (call return 1))
679 @result{} 3
680 @end example
681
682 Just as in all of Guile's compilers, an environment is passed to the
683 GLIL-to-object code compiler, and one is returned as well, along with
684 the object code.
685
686 @node Assembly
687 @subsection Assembly
688
689 Assembly is an S-expression-based, human-readable representation of
690 the actual bytecodes that will be emitted for the VM. As such, it is a
691 useful intermediate language both for compilation and for
692 decompilation.
693
694 Besides the fact that it is not a record-based language, assembly
695 differs from GLIL in four main ways:
696
697 @itemize
698 @item Labels have been resolved to byte offsets in the program.
699 @item Constants inside procedures have either been expressed as inline
700 instructions or cached in object arrays.
701 @item Procedures with metadata (source location information, liveness
702 extents, procedure names, generic properties, etc) have had their
703 metadata serialized out to thunks.
704 @item All expressions correspond directly to VM instructions -- i.e.,
705 there is no @code{<glil-lexical>} which can be a ref or a set.
706 @end itemize
707
708 Assembly is isomorphic to the bytecode that it compiles to. You can
709 compile to bytecode, then decompile back to assembly, and you have the
710 same assembly code.
711
712 The general form of assembly instructions is the following:
713
714 @lisp
715 (@var{inst} @var{arg} ...)
716 @end lisp
717
718 The @var{inst} names a VM instruction, and its @var{arg}s will be
719 embedded in the instruction stream. The easiest way to see assembly is
720 to play around with it at the REPL, as can be seen in this annotated
721 example:
722
723 @example
724 scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
725 (load-program
726 ((:LCASE16 . 2)) ; Labels, unused in this case.
727 8 ; Length of the thunk that was compiled.
728 (load-program ; Metadata thunk.
729 ()
730 17
731 #f ; No metadata thunk for the metadata thunk.
732 (make-eol)
733 (make-eol)
734 (make-int8 2) ; Liveness extents, source info, and arities,
735 (make-int8 8) ; in a format that Guile knows how to parse.
736 (make-int8:0)
737 (list 0 3)
738 (list 0 1)
739 (list 0 3)
740 (return))
741 (assert-nargs-ee/locals 0) ; Prologue.
742 (make-int8 32) ; Actual code starts here.
743 (make-int8 10)
744 (add)
745 (return))
746 @end example
747
748 Of course you can switch the REPL to assembly and enter in assembly
749 S-expressions directly, like with other languages, though it is more
750 difficult, given that the length fields have to be correct.
751
752 @node Bytecode and Objcode
753 @subsection Bytecode and Objcode
754
755 Finally, the raw bytes. There are actually two different ``languages''
756 here, corresponding to two different ways to represent the bytes.
757
758 ``Bytecode'' represents code as uniform byte vectors, useful for
759 structuring and destructuring code on the Scheme level. Bytecode is
760 the next step down from assembly:
761
762 @example
763 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
764 @result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
765 95 0 ; Prologue.
766 10 32 10 10 148 66 17 ; Actual code.
767 0 0 0 0 0 0 0 9 ; Metadata thunk.
768 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
769 @end example
770
771 ``Objcode'' is bytecode, but mapped directly to a C structure,
772 @code{struct scm_objcode}:
773
774 @example
775 struct scm_objcode @{
776 scm_t_uint32 len;
777 scm_t_uint32 metalen;
778 scm_t_uint8 base[0];
779 @};
780 @end example
781
782 As one might imagine, objcode imposes a minimum length on the
783 bytecode. Also, the @code{len} and @code{metalen} fields are in native
784 endianness, which makes objcode (and bytecode) system-dependent.
785
786 Objcode also has a couple of important efficiency hacks. First,
787 objcode may be mapped directly from disk, allowing compiled code to be
788 loaded quickly, often from the system's disk cache, and shared among
789 multiple processes. Secondly, objcode may be embedded in other
790 objcode, allowing procedures to have the text of other procedures
791 inlined into their bodies, without the need for separate allocation of
792 the code. Of course, the objcode object itself does need to be
793 allocated.
794
795 Procedures related to objcode are defined in the @code{(system vm
796 objcode)} module.
797
798 @deffn {Scheme Procedure} objcode? obj
799 @deffnx {C Function} scm_objcode_p (obj)
800 Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
801 @end deffn
802
803 @deffn {Scheme Procedure} bytecode->objcode bytecode
804 @deffnx {C Function} scm_bytecode_to_objcode (bytecode)
805 Makes a bytecode object from @var{bytecode}, which should be a
806 bytevector. @xref{Bytevectors}.
807 @end deffn
808
809 @deffn {Scheme Variable} load-objcode file
810 @deffnx {C Function} scm_load_objcode (file)
811 Load object code from a file named @var{file}. The file will be mapped
812 into memory via @code{mmap}, so this is a very fast operation.
813
814 On disk, object code has an sixteen-byte cookie prepended to it, to
815 prevent accidental loading of arbitrary garbage.
816 @end deffn
817
818 @deffn {Scheme Variable} write-objcode objcode file
819 @deffnx {C Function} scm_write_objcode (objcode)
820 Write object code out to a file, prepending the sixteen-byte cookie.
821 @end deffn
822
823 @deffn {Scheme Variable} objcode->bytecode objcode
824 @deffnx {C Function} scm_objcode_to_bytecode (objcode)
825 Copy object code out to a bytevector for analysis by Scheme.
826 @end deffn
827
828 The following procedure is actually in @code{(system vm program)}, but
829 we'll mention it here:
830
831 @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
832 @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
833 Load up object code into a Scheme program. The resulting program will
834 have @var{objtable} as its object table, which should be a vector or
835 @code{#f}, and will capture the free variables from @var{free-vars}.
836 @end deffn
837
838 Object code from a file may be disassembled at the REPL via the
839 meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
840 Programs may be disassembled via @code{,disassemble}, abbreviated as
841 @code{,x}.
842
843 Compiling object code to the fake language, @code{value}, is performed
844 via loading objcode into a program, then executing that thunk with
845 respect to the compilation environment. Normally the environment
846 propagates through the compiler transparently, but users may specify
847 the compilation environment manually as well, as a module.
848
849
850 @node Writing New High-Level Languages
851 @subsection Writing New High-Level Languages
852
853 In order to integrate a new language @var{lang} into Guile's compiler
854 system, one has to create the module @code{(language @var{lang} spec)}
855 containing the language definition and referencing the parser,
856 compiler and other routines processing it. The module hierarchy in
857 @code{(language brainfuck)} defines a very basic Brainfuck
858 implementation meant to serve as easy-to-understand example on how to
859 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
860 for more information about the Brainfuck language itself.
861
862
863 @node Extending the Compiler
864 @subsection Extending the Compiler
865
866 At this point we take a detour from the impersonal tone of the rest of
867 the manual. Admit it: if you've read this far into the compiler
868 internals manual, you are a junkie. Perhaps a course at your university
869 left you unsated, or perhaps you've always harbored a desire to hack the
870 holy of computer science holies: a compiler. Well you're in good
871 company, and in a good position. Guile's compiler needs your help.
872
873 There are many possible avenues for improving Guile's compiler.
874 Probably the most important improvement, speed-wise, will be some form
875 of native compilation, both just-in-time and ahead-of-time. This could
876 be done in many ways. Probably the easiest strategy would be to extend
877 the compiled procedure structure to include a pointer to a native code
878 vector, and compile from bytecode to native code at run-time after a
879 procedure is called a certain number of times.
880
881 The name of the game is a profiling-based harvest of the low-hanging
882 fruit, running programs of interest under a system-level profiler and
883 determining which improvements would give the most bang for the buck.
884 It's really getting to the point though that native compilation is the
885 next step.
886
887 The compiler also needs help at the top end, enhancing the Scheme that
888 it knows to also understand R6RS, and adding new high-level compilers.
889 We have JavaScript and Emacs Lisp mostly complete, but they could use
890 some love; Lua would be nice as well, but whatever language it is
891 that strikes your fancy would be welcome too.
892
893 Compilers are for hacking, not for admiring or for complaining about.
894 Get to it!