Critical sections in guardians do not need to block asyncs
[bpt/guile.git] / doc / ref / compiler.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008, 2009, 2010, 2013
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Compiling to the Virtual Machine
8 @section Compiling to the Virtual Machine
9
10 Compilers have a mystique about them that is attractive and
11 off-putting at the same time. They are attractive because they are
12 magical -- they transform inert text into live results, like throwing
13 the switch on Frankenstein's monster. However, this magic is perceived
14 by many to be impenetrable.
15
16 This section aims to pay attention to the small man behind the
17 curtain.
18
19 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
20 know how to compile your @code{.scm} file.
21
22 @menu
23 * Compiler Tower::
24 * The Scheme Compiler::
25 * Tree-IL::
26 * GLIL::
27 * Assembly::
28 * Bytecode and Objcode::
29 * Writing New High-Level Languages::
30 * Extending the Compiler::
31 @end menu
32
33 @node Compiler Tower
34 @subsection Compiler Tower
35
36 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37 put it more accurately. Guile defines a tower of languages, starting
38 at Scheme and progressively simplifying down to languages that
39 resemble the VM instruction set (@pxref{Instruction Set}).
40
41 Each language knows how to compile to the next, so each step is simple
42 and understandable. Furthermore, this set of languages is not
43 hardcoded into Guile, so it is possible for the user to add new
44 high-level languages, new passes, or even different compilation
45 targets.
46
47 Languages are registered in the module, @code{(system base language)}:
48
49 @example
50 (use-modules (system base language))
51 @end example
52
53 They are registered with the @code{define-language} form.
54
55 @deffn {Scheme Syntax} define-language @
56 [#:name] [#:title] [#:reader] [#:printer] @
57 [#:parser=#f] [#:compilers='()] @
58 [#:decompilers='()] [#:evaluator=#f] @
59 [#:joiner=#f] [#:for-humans?=#t] @
60 [#:make-default-environment=make-fresh-user-module]
61 Define a language.
62
63 This syntax defines a @code{#<language>} object, bound to @var{name}
64 in the current environment. In addition, the language will be added to
65 the global language set. For example, this is the language definition
66 for Scheme:
67
68 @example
69 (define-language scheme
70 #:title "Scheme"
71 #:reader (lambda (port env) ...)
72 #:compilers `((tree-il . ,compile-tree-il))
73 #:decompilers `((tree-il . ,decompile-tree-il))
74 #:evaluator (lambda (x module) (primitive-eval x))
75 #:printer write
76 #:make-default-environment (lambda () ...))
77 @end example
78 @end deffn
79
80 The interesting thing about having languages defined this way is that
81 they present a uniform interface to the read-eval-print loop. This
82 allows the user to change the current language of the REPL:
83
84 @example
85 scheme@@(guile-user)> ,language tree-il
86 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
87 tree-il@@(guile-user)> ,L scheme
88 Happy hacking with Scheme! To switch back, type `,L tree-il'.
89 scheme@@(guile-user)>
90 @end example
91
92 Languages can be looked up by name, as they were above.
93
94 @deffn {Scheme Procedure} lookup-language name
95 Looks up a language named @var{name}, autoloading it if necessary.
96
97 Languages are autoloaded by looking for a variable named @var{name} in
98 a module named @code{(language @var{name} spec)}.
99
100 The language object will be returned, or @code{#f} if there does not
101 exist a language with that name.
102 @end deffn
103
104 Defining languages this way allows us to programmatically determine
105 the necessary steps for compiling code from one language to another.
106
107 @deffn {Scheme Procedure} lookup-compilation-order from to
108 Recursively traverses the set of languages to which @var{from} can
109 compile, depth-first, and return the first path that can transform
110 @var{from} to @var{to}. Returns @code{#f} if no path is found.
111
112 This function memoizes its results in a cache that is invalidated by
113 subsequent calls to @code{define-language}, so it should be quite
114 fast.
115 @end deffn
116
117 There is a notion of a ``current language'', which is maintained in the
118 @code{current-language} parameter, defined in the core @code{(guile)}
119 module. This language is normally Scheme, and may be rebound by the
120 user. The run-time compilation interfaces
121 (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
122 and target languages.
123
124 The normal tower of languages when compiling Scheme goes like this:
125
126 @itemize
127 @item Scheme
128 @item Tree Intermediate Language (Tree-IL)
129 @item Guile Lowlevel Intermediate Language (GLIL)
130 @item Assembly
131 @item Bytecode
132 @item Objcode
133 @end itemize
134
135 Object code may be serialized to disk directly, though it has a cookie
136 and version prepended to the front. But when compiling Scheme at run
137 time, you want a Scheme value: for example, a compiled procedure. For
138 this reason, so as not to break the abstraction, Guile defines a fake
139 language at the bottom of the tower:
140
141 @itemize
142 @item Value
143 @end itemize
144
145 Compiling to @code{value} loads the object code into a procedure, and
146 wakes the sleeping giant.
147
148 Perhaps this strangeness can be explained by example:
149 @code{compile-file} defaults to compiling to object code, because it
150 produces object code that has to live in the barren world outside the
151 Guile runtime; but @code{compile} defaults to compiling to
152 @code{value}, as its product re-enters the Guile world.
153
154 Indeed, the process of compilation can circulate through these
155 different worlds indefinitely, as shown by the following quine:
156
157 @example
158 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
159 @end example
160
161 @node The Scheme Compiler
162 @subsection The Scheme Compiler
163
164 The job of the Scheme compiler is to expand all macros and all of
165 Scheme to its most primitive expressions. The definition of
166 ``primitive'' is given by the inventory of constructs provided by
167 Tree-IL, the target language of the Scheme compiler: procedure
168 applications, conditionals, lexical references, etc. This is described
169 more fully in the next section.
170
171 The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
172 that it is completely implemented by the macro expander. Since the
173 macro expander has to run over all of the source code already in order
174 to expand macros, it might as well do the analysis at the same time,
175 producing Tree-IL expressions directly.
176
177 Because this compiler is actually the macro expander, it is
178 extensible. Any macro which the user writes becomes part of the
179 compiler.
180
181 The Scheme-to-Tree-IL expander may be invoked using the generic
182 @code{compile} procedure:
183
184 @lisp
185 (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
186 @result{}
187 #<<application> src: #f
188 proc: #<<toplevel-ref> src: #f name: +>
189 args: (#<<const> src: #f exp: 1>
190 #<<const> src: #f exp: 2>)>
191 @end lisp
192
193 Or, since Tree-IL is so close to Scheme, it is often useful to expand
194 Scheme to Tree-IL, then translate back to Scheme. For that reason the
195 expander provides two interfaces. The former is equivalent to calling
196 @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
197 ``compile''. With @code{'e} (the default), the result is translated
198 back to Scheme:
199
200 @lisp
201 (macroexpand '(+ 1 2))
202 @result{} (+ 1 2)
203 (macroexpand '(let ((x 10)) (* x x)))
204 @result{} (let ((x84 10)) (* x84 x84))
205 @end lisp
206
207 The second example shows that as part of its job, the macro expander
208 renames lexically-bound variables. The original names are preserved
209 when compiling to Tree-IL, but can't be represented in Scheme: a
210 lexical binding only has one name. It is for this reason that the
211 @emph{native} output of the expander is @emph{not} Scheme. There's too
212 much information we would lose if we translated to Scheme directly:
213 lexical variable names, source locations, and module hygiene.
214
215 Note however that @code{macroexpand} does not have the same signature
216 as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
217 around @code{macroexpand}, to make it conform to the general form of
218 compiler procedures in Guile's language tower.
219
220 Compiler procedures take three arguments: an expression, an
221 environment, and a keyword list of options. They return three values:
222 the compiled expression, the corresponding environment for the target
223 language, and a ``continuation environment''. The compiled expression
224 and environment will serve as input to the next language's compiler.
225 The ``continuation environment'' can be used to compile another
226 expression from the same source language within the same module.
227
228 For example, you might compile the expression, @code{(define-module
229 (foo))}. This will result in a Tree-IL expression and environment. But
230 if you compiled a second expression, you would want to take into
231 account the compile-time effect of compiling the previous expression,
232 which puts the user in the @code{(foo)} module. That is purpose of the
233 ``continuation environment''; you would pass it as the environment
234 when compiling the subsequent expression.
235
236 For Scheme, an environment is a module. By default, the @code{compile}
237 and @code{compile-file} procedures compile in a fresh module, such
238 that bindings and macros introduced by the expression being compiled
239 are isolated:
240
241 @example
242 (eq? (current-module) (compile '(current-module)))
243 @result{} #f
244
245 (compile '(define hello 'world))
246 (defined? 'hello)
247 @result{} #f
248
249 (define / *)
250 (eq? (compile '/) /)
251 @result{} #f
252 @end example
253
254 Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
255 @code{current-reader}}) are isolated:
256
257 @example
258 (compile '(fluid-set! current-reader (lambda args 'fail)))
259 (fluid-ref current-reader)
260 @result{} #f
261 @end example
262
263 Nevertheless, having the compiler and @dfn{compilee} share the same name
264 space can be achieved by explicitly passing @code{(current-module)} as
265 the compilation environment:
266
267 @example
268 (define hello 'world)
269 (compile 'hello #:env (current-module))
270 @result{} world
271 @end example
272
273 @node Tree-IL
274 @subsection Tree-IL
275
276 Tree Intermediate Language (Tree-IL) is a structured intermediate
277 language that is close in expressive power to Scheme. It is an
278 expanded, pre-analyzed Scheme.
279
280 Tree-IL is ``structured'' in the sense that its representation is
281 based on records, not S-expressions. This gives a rigidity to the
282 language that ensures that compiling to a lower-level language only
283 requires a limited set of transformations. For example, the Tree-IL
284 type @code{<const>} is a record type with two fields, @code{src} and
285 @code{exp}. Instances of this type are created via @code{make-const}.
286 Fields of this type are accessed via the @code{const-src} and
287 @code{const-exp} procedures. There is also a predicate, @code{const?}.
288 @xref{Records}, for more information on records.
289
290 @c alpha renaming
291
292 All Tree-IL types have a @code{src} slot, which holds source location
293 information for the expression. This information, if present, will be
294 residualized into the compiled object code, allowing backtraces to
295 show source information. The format of @code{src} is the same as that
296 returned by Guile's @code{source-properties} function. @xref{Source
297 Properties}, for more information.
298
299 Although Tree-IL objects are represented internally using records,
300 there is also an equivalent S-expression external representation for
301 each kind of Tree-IL. For example, the S-expression representation
302 of @code{#<const src: #f exp: 3>} expression would be:
303
304 @example
305 (const 3)
306 @end example
307
308 Users may program with this format directly at the REPL:
309
310 @example
311 scheme@@(guile-user)> ,language tree-il
312 Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
313 tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
314 @result{} 42
315 @end example
316
317 The @code{src} fields are left out of the external representation.
318
319 One may create Tree-IL objects from their external representations via
320 calling @code{parse-tree-il}, the reader for Tree-IL. If any source
321 information is attached to the input S-expression, it will be
322 propagated to the resulting Tree-IL expressions. This is probably the
323 easiest way to compile to Tree-IL: just make the appropriate external
324 representations in S-expression format, and let @code{parse-tree-il}
325 take care of the rest.
326
327 @deftp {Scheme Variable} <void> src
328 @deftpx {External Representation} (void)
329 An empty expression. In practice, equivalent to Scheme's @code{(if #f
330 #f)}.
331 @end deftp
332 @deftp {Scheme Variable} <const> src exp
333 @deftpx {External Representation} (const @var{exp})
334 A constant.
335 @end deftp
336 @deftp {Scheme Variable} <primitive-ref> src name
337 @deftpx {External Representation} (primitive @var{name})
338 A reference to a ``primitive''. A primitive is a procedure that, when
339 compiled, may be open-coded. For example, @code{cons} is usually
340 recognized as a primitive, so that it compiles down to a single
341 instruction.
342
343 Compilation of Tree-IL usually begins with a pass that resolves some
344 @code{<module-ref>} and @code{<toplevel-ref>} expressions to
345 @code{<primitive-ref>} expressions. The actual compilation pass
346 has special cases for applications of certain primitives, like
347 @code{apply} or @code{cons}.
348 @end deftp
349 @deftp {Scheme Variable} <lexical-ref> src name gensym
350 @deftpx {External Representation} (lexical @var{name} @var{gensym})
351 A reference to a lexically-bound variable. The @var{name} is the
352 original name of the variable in the source program. @var{gensym} is a
353 unique identifier for this variable.
354 @end deftp
355 @deftp {Scheme Variable} <lexical-set> src name gensym exp
356 @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
357 Sets a lexically-bound variable.
358 @end deftp
359 @deftp {Scheme Variable} <module-ref> src mod name public?
360 @deftpx {External Representation} (@@ @var{mod} @var{name})
361 @deftpx {External Representation} (@@@@ @var{mod} @var{name})
362 A reference to a variable in a specific module. @var{mod} should be
363 the name of the module, e.g.@: @code{(guile-user)}.
364
365 If @var{public?} is true, the variable named @var{name} will be looked
366 up in @var{mod}'s public interface, and serialized with @code{@@};
367 otherwise it will be looked up among the module's private bindings,
368 and is serialized with @code{@@@@}.
369 @end deftp
370 @deftp {Scheme Variable} <module-set> src mod name public? exp
371 @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
372 @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
373 Sets a variable in a specific module.
374 @end deftp
375 @deftp {Scheme Variable} <toplevel-ref> src name
376 @deftpx {External Representation} (toplevel @var{name})
377 References a variable from the current procedure's module.
378 @end deftp
379 @deftp {Scheme Variable} <toplevel-set> src name exp
380 @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
381 Sets a variable in the current procedure's module.
382 @end deftp
383 @deftp {Scheme Variable} <toplevel-define> src name exp
384 @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
385 Defines a new top-level variable in the current procedure's module.
386 @end deftp
387 @deftp {Scheme Variable} <conditional> src test then else
388 @deftpx {External Representation} (if @var{test} @var{then} @var{else})
389 A conditional. Note that @var{else} is not optional.
390 @end deftp
391 @deftp {Scheme Variable} <application> src proc args
392 @deftpx {External Representation} (apply @var{proc} . @var{args})
393 A procedure call.
394 @end deftp
395 @deftp {Scheme Variable} <sequence> src exps
396 @deftpx {External Representation} (begin . @var{exps})
397 Like Scheme's @code{begin}.
398 @end deftp
399 @deftp {Scheme Variable} <lambda> src meta body
400 @deftpx {External Representation} (lambda @var{meta} @var{body})
401 A closure. @var{meta} is an association list of properties for the
402 procedure. @var{body} is a single Tree-IL expression of type
403 @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
404 an alternate clause, this makes Tree-IL's @code{<lambda>} have the
405 expressiveness of Scheme's @code{case-lambda}.
406 @end deftp
407 @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
408 @deftpx {External Representation} @
409 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
410 @var{body})@
411 [@var{alternate}])
412 One clause of a @code{case-lambda}. A @code{lambda} expression in
413 Scheme is treated as a @code{case-lambda} with one clause.
414
415 @var{req} is a list of the procedure's required arguments, as symbols.
416 @var{opt} is a list of the optional arguments, or @code{#f} if there
417 are no optional arguments. @var{rest} is the name of the rest
418 argument, or @code{#f}.
419
420 @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
421 (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
422 keyword corresponding to the argument named @var{name}, and whose
423 corresponding gensym is @var{var}. @var{inits} are tree-il expressions
424 corresponding to all of the optional and keyword arguments, evaluated
425 to bind variables whose value is not supplied by the procedure caller.
426 Each @var{init} expression is evaluated in the lexical context of
427 previously bound variables, from left to right.
428
429 @var{gensyms} is a list of gensyms corresponding to all arguments:
430 first all of the required arguments, then the optional arguments if
431 any, then the rest argument if any, then all of the keyword arguments.
432
433 @var{body} is the body of the clause. If the procedure is called with
434 an appropriate number of arguments, @var{body} is evaluated in tail
435 position. Otherwise, if there is an @var{alternate}, it should be a
436 @code{<lambda-case>} expression, representing the next clause to try.
437 If there is no @var{alternate}, a wrong-number-of-arguments error is
438 signaled.
439 @end deftp
440 @deftp {Scheme Variable} <let> src names gensyms vals exp
441 @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
442 Lexical binding, like Scheme's @code{let}. @var{names} are the
443 original binding names, @var{gensyms} are gensyms corresponding to the
444 @var{names}, and @var{vals} are Tree-IL expressions for the values.
445 @var{exp} is a single Tree-IL expression.
446 @end deftp
447 @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
448 @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
449 @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
450 A version of @code{<let>} that creates recursive bindings, like
451 Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
452 @end deftp
453 @deftp {Scheme Variable} <dynlet> fluids vals body
454 @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
455 Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
456 @var{fluids} should be a list of Tree-IL expressions that will
457 evaluate to fluids, and @var{vals} a corresponding list of expressions
458 to bind to the fluids during the dynamic extent of the evaluation of
459 @var{body}.
460 @end deftp
461 @deftp {Scheme Variable} <dynref> fluid
462 @deftpx {External Representation} (dynref @var{fluid})
463 A dynamic variable reference. @var{fluid} should be a Tree-IL
464 expression evaluating to a fluid.
465 @end deftp
466 @deftp {Scheme Variable} <dynset> fluid exp
467 @deftpx {External Representation} (dynset @var{fluid} @var{exp})
468 A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
469 to a fluid, will be set to the result of evaluating @var{exp}.
470 @end deftp
471 @deftp {Scheme Variable} <dynwind> winder body unwinder
472 @deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
473 A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
474 evaluate to thunks. Ensure that the winder and the unwinder are called
475 before entering and after leaving @var{body}. Note that @var{body} is
476 an expression, without a thunk wrapper.
477 @end deftp
478 @deftp {Scheme Variable} <prompt> tag body handler
479 @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
480 A dynamic prompt. Instates a prompt named @var{tag}, an expression,
481 during the dynamic extent of the execution of @var{body}, also an
482 expression. If an abort occurs to this prompt, control will be passed
483 to @var{handler}, a @code{<lambda-case>} expression with no optional
484 or keyword arguments, and no alternate. The first argument to the
485 @code{<lambda-case>} will be the captured continuation, and then all
486 of the values passed to the abort. @xref{Prompts}, for more
487 information.
488 @end deftp
489 @deftp {Scheme Variable} <abort> tag args tail
490 @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
491 An abort to the nearest prompt with the name @var{tag}, an expression.
492 @var{args} should be a list of expressions to pass to the prompt's
493 handler, and @var{tail} should be an expression that will evaluate to
494 a list of additional arguments. An abort will save the partial
495 continuation, which may later be reinstated, resulting in the
496 @code{<abort>} expression evaluating to some number of values.
497 @end deftp
498
499 There are two Tree-IL constructs that are not normally produced by
500 higher-level compilers, but instead are generated during the
501 source-to-source optimization and analysis passes that the Tree-IL
502 compiler does. Users should not generate these expressions directly,
503 unless they feel very clever, as the default analysis pass will
504 generate them as necessary.
505
506 @deftp {Scheme Variable} <let-values> src names gensyms exp body
507 @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
508 Like Scheme's @code{receive} -- binds the values returned by
509 evaluating @code{exp} to the @code{lambda}-like bindings described by
510 @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
511
512 @code{<let-values>} is an optimization of @code{<application>} of the
513 primitive, @code{call-with-values}.
514 @end deftp
515 @deftp {Scheme Variable} <fix> src names gensyms vals body
516 @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
517 Like @code{<letrec>}, but only for @var{vals} that are unset
518 @code{lambda} expressions.
519
520 @code{fix} is an optimization of @code{letrec} (and @code{let}).
521 @end deftp
522
523 Tree-IL implements a compiler to GLIL that recursively traverses
524 Tree-IL expressions, writing out GLIL expressions into a linear list.
525 The compiler also keeps some state as to whether the current
526 expression is in tail context, and whether its value will be used in
527 future computations. This state allows the compiler not to emit code
528 for constant expressions that will not be used (e.g.@: docstrings), and
529 to perform tail calls when in tail position.
530
531 Most optimization, such as it currently is, is performed on Tree-IL
532 expressions as source-to-source transformations. There will be more
533 optimizations added in the future.
534
535 Interested readers are encouraged to read the implementation in
536 @code{(language tree-il compile-glil)} for more details.
537
538 @node GLIL
539 @subsection GLIL
540
541 Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
542 language whose expressions more closely approximate Guile's VM
543 instruction set. Its expression types are defined in @code{(language
544 glil)}.
545
546 @deftp {Scheme Variable} <glil-program> meta . body
547 A unit of code that at run-time will correspond to a compiled
548 procedure. @var{meta} should be an alist of properties, as in
549 Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
550 expressions.
551 @end deftp
552 @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
553 A prologue for a function with no optional, keyword, or rest
554 arguments. @var{nreq} is the number of required arguments. @var{nlocs}
555 the total number of local variables, including the arguments. If the
556 procedure was not given exactly @var{nreq} arguments, control will
557 jump to @var{else-label}, if given, or otherwise signal an error.
558 @end deftp
559 @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
560 A prologue for a function with optional or rest arguments. Like
561 @code{<glil-std-prelude>}, with the addition that @var{nopt} is the
562 number of optional arguments (possibly zero) and @var{rest} is an
563 index of a local variable at which to bind a rest argument, or
564 @code{#f} if there is no rest argument.
565 @end deftp
566 @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
567 A prologue for a function with keyword arguments. Like
568 @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
569 of keyword arguments, and @var{allow-other-keys?} is a flag indicating
570 whether to allow unknown keys. @xref{Function Prologue Instructions,
571 @code{bind-kwargs}}, for details on the format of @var{kw}.
572 @end deftp
573 @deftp {Scheme Variable} <glil-bind> . vars
574 An advisory expression that notes a liveness extent for a set of
575 variables. @var{vars} is a list of @code{(@var{name} @var{type}
576 @var{index})}, where @var{type} should be either @code{argument},
577 @code{local}, or @code{external}.
578
579 @code{<glil-bind>} expressions end up being serialized as part of a
580 program's metadata and do not form part of a program's code path.
581 @end deftp
582 @deftp {Scheme Variable} <glil-mv-bind> vars rest
583 A multiple-value binding of the values on the stack to @var{vars}. If
584 @var{rest} is true, the last element of @var{vars} will be treated as a
585 rest argument.
586
587 In addition to pushing a binding annotation on the stack, like
588 @code{<glil-bind>}, an expression is emitted at compilation time to
589 make sure that there are enough values available to bind. See the
590 notes on @code{truncate-values} in @ref{Procedure Call and Return
591 Instructions}, for more information.
592 @end deftp
593 @deftp {Scheme Variable} <glil-unbind>
594 Closes the liveness extent of the most recently encountered
595 @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
596 expressions are compiled, a parallel stack of live bindings is
597 maintained; this expression pops off the top element from that stack.
598
599 Bindings are written into the program's metadata so that debuggers and
600 other tools can determine the set of live local variables at a given
601 offset within a VM program.
602 @end deftp
603 @deftp {Scheme Variable} <glil-source> loc
604 Records source information for the preceding expression. @var{loc}
605 should be an association list of containing @code{line} @code{column},
606 and @code{filename} keys, e.g.@: as returned by
607 @code{source-properties}.
608 @end deftp
609 @deftp {Scheme Variable} <glil-void>
610 Pushes ``the unspecified value'' on the stack.
611 @end deftp
612 @deftp {Scheme Variable} <glil-const> obj
613 Pushes a constant value onto the stack. @var{obj} must be a number,
614 string, symbol, keyword, boolean, character, uniform array, the empty
615 list, or a pair or vector of constants.
616 @end deftp
617 @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
618 Accesses a lexically bound variable. If the variable is not
619 @var{local?} it is free. All variables may have @code{ref},
620 @code{set}, and @code{bound?} as their @var{op}. Boxed variables may
621 also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
622 which correspond in semantics to the VM instructions @code{box},
623 @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
624 more information.
625 @end deftp
626 @deftp {Scheme Variable} <glil-toplevel> op name
627 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
628 or @code{define}.
629 @end deftp
630 @deftp {Scheme Variable} <glil-module> op mod name public?
631 Accesses a variable within a specific module. See Tree-IL's
632 @code{<module-ref>}, for more information.
633 @end deftp
634 @deftp {Scheme Variable} <glil-label> label
635 Creates a new label. @var{label} can be any Scheme value, and should
636 be unique.
637 @end deftp
638 @deftp {Scheme Variable} <glil-branch> inst label
639 Branch to a label. @var{label} should be a @code{<ghil-label>}.
640 @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
641 @end deftp
642 @deftp {Scheme Variable} <glil-call> inst nargs
643 This expression is probably misnamed, as it does not correspond to
644 function calls. @code{<glil-call>} invokes the VM instruction named
645 @var{inst}, noting that it is called with @var{nargs} stack arguments.
646 The arguments should be pushed on the stack already. What happens to
647 the stack afterwards depends on the instruction.
648 @end deftp
649 @deftp {Scheme Variable} <glil-mv-call> nargs ra
650 Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
651 corresponding to the multiple-value return address for the call. See
652 the notes on @code{mv-call} in @ref{Procedure Call and Return
653 Instructions}, for more information.
654 @end deftp
655 @deftp {Scheme Variable} <glil-prompt> label escape-only?
656 Push a dynamic prompt into the stack, with a handler at @var{label}.
657 @var{escape-only?} is a flag that is propagated to the prompt,
658 allowing an abort to avoid capturing a continuation in some cases.
659 @xref{Prompts}, for more information.
660 @end deftp
661
662 Users may enter in GLIL at the REPL as well, though there is a bit
663 more bookkeeping to do:
664
665 @example
666 scheme@@(guile-user)> ,language glil
667 Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
668 To switch back, type `,L scheme'.
669 glil@@(guile-user)> (program () (std-prelude 0 0 #f)
670 (const 3) (call return 1))
671 @result{} 3
672 @end example
673
674 Just as in all of Guile's compilers, an environment is passed to the
675 GLIL-to-object code compiler, and one is returned as well, along with
676 the object code.
677
678 @node Assembly
679 @subsection Assembly
680
681 Assembly is an S-expression-based, human-readable representation of
682 the actual bytecodes that will be emitted for the VM. As such, it is a
683 useful intermediate language both for compilation and for
684 decompilation.
685
686 Besides the fact that it is not a record-based language, assembly
687 differs from GLIL in four main ways:
688
689 @itemize
690 @item Labels have been resolved to byte offsets in the program.
691 @item Constants inside procedures have either been expressed as inline
692 instructions or cached in object arrays.
693 @item Procedures with metadata (source location information, liveness
694 extents, procedure names, generic properties, etc) have had their
695 metadata serialized out to thunks.
696 @item All expressions correspond directly to VM instructions -- i.e.,
697 there is no @code{<glil-lexical>} which can be a ref or a set.
698 @end itemize
699
700 Assembly is isomorphic to the bytecode that it compiles to. You can
701 compile to bytecode, then decompile back to assembly, and you have the
702 same assembly code.
703
704 The general form of assembly instructions is the following:
705
706 @lisp
707 (@var{inst} @var{arg} ...)
708 @end lisp
709
710 The @var{inst} names a VM instruction, and its @var{arg}s will be
711 embedded in the instruction stream. The easiest way to see assembly is
712 to play around with it at the REPL, as can be seen in this annotated
713 example:
714
715 @example
716 scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
717 (load-program
718 ((:LCASE16 . 2)) ; Labels, unused in this case.
719 8 ; Length of the thunk that was compiled.
720 (load-program ; Metadata thunk.
721 ()
722 17
723 #f ; No metadata thunk for the metadata thunk.
724 (make-eol)
725 (make-eol)
726 (make-int8 2) ; Liveness extents, source info, and arities,
727 (make-int8 8) ; in a format that Guile knows how to parse.
728 (make-int8:0)
729 (list 0 3)
730 (list 0 1)
731 (list 0 3)
732 (return))
733 (assert-nargs-ee/locals 0) ; Prologue.
734 (make-int8 32) ; Actual code starts here.
735 (make-int8 10)
736 (add)
737 (return))
738 @end example
739
740 Of course you can switch the REPL to assembly and enter in assembly
741 S-expressions directly, like with other languages, though it is more
742 difficult, given that the length fields have to be correct.
743
744 @node Bytecode and Objcode
745 @subsection Bytecode and Objcode
746
747 Finally, the raw bytes. There are actually two different ``languages''
748 here, corresponding to two different ways to represent the bytes.
749
750 ``Bytecode'' represents code as uniform byte vectors, useful for
751 structuring and destructuring code on the Scheme level. Bytecode is
752 the next step down from assembly:
753
754 @example
755 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
756 @result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
757 95 0 ; Prologue.
758 10 32 10 10 148 66 17 ; Actual code.
759 0 0 0 0 0 0 0 9 ; Metadata thunk.
760 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
761 @end example
762
763 ``Objcode'' is bytecode, but mapped directly to a C structure,
764 @code{struct scm_objcode}:
765
766 @example
767 struct scm_objcode @{
768 scm_t_uint32 len;
769 scm_t_uint32 metalen;
770 scm_t_uint8 base[0];
771 @};
772 @end example
773
774 As one might imagine, objcode imposes a minimum length on the
775 bytecode. Also, the @code{len} and @code{metalen} fields are in native
776 endianness, which makes objcode (and bytecode) system-dependent.
777
778 Objcode also has a couple of important efficiency hacks. First,
779 objcode may be mapped directly from disk, allowing compiled code to be
780 loaded quickly, often from the system's disk cache, and shared among
781 multiple processes. Secondly, objcode may be embedded in other
782 objcode, allowing procedures to have the text of other procedures
783 inlined into their bodies, without the need for separate allocation of
784 the code. Of course, the objcode object itself does need to be
785 allocated.
786
787 Procedures related to objcode are defined in the @code{(system vm
788 objcode)} module.
789
790 @deffn {Scheme Procedure} objcode? obj
791 @deffnx {C Function} scm_objcode_p (obj)
792 Returns @code{#f} if @var{obj} is object code, @code{#f} otherwise.
793 @end deffn
794
795 @deffn {Scheme Procedure} bytecode->objcode bytecode
796 @deffnx {C Function} scm_bytecode_to_objcode (bytecode)
797 Makes a bytecode object from @var{bytecode}, which should be a
798 bytevector. @xref{Bytevectors}.
799 @end deffn
800
801 @deffn {Scheme Variable} load-objcode file
802 @deffnx {C Function} scm_load_objcode (file)
803 Load object code from a file named @var{file}. The file will be mapped
804 into memory via @code{mmap}, so this is a very fast operation.
805
806 On disk, object code has an sixteen-byte cookie prepended to it, to
807 prevent accidental loading of arbitrary garbage.
808 @end deffn
809
810 @deffn {Scheme Variable} write-objcode objcode file
811 @deffnx {C Function} scm_write_objcode (objcode)
812 Write object code out to a file, prepending the sixteen-byte cookie.
813 @end deffn
814
815 @deffn {Scheme Variable} objcode->bytecode objcode
816 @deffnx {C Function} scm_objcode_to_bytecode (objcode)
817 Copy object code out to a bytevector for analysis by Scheme.
818 @end deffn
819
820 The following procedure is actually in @code{(system vm program)}, but
821 we'll mention it here:
822
823 @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
824 @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
825 Load up object code into a Scheme program. The resulting program will
826 have @var{objtable} as its object table, which should be a vector or
827 @code{#f}, and will capture the free variables from @var{free-vars}.
828 @end deffn
829
830 Object code from a file may be disassembled at the REPL via the
831 meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
832 Programs may be disassembled via @code{,disassemble}, abbreviated as
833 @code{,x}.
834
835 Compiling object code to the fake language, @code{value}, is performed
836 via loading objcode into a program, then executing that thunk with
837 respect to the compilation environment. Normally the environment
838 propagates through the compiler transparently, but users may specify
839 the compilation environment manually as well, as a module.
840
841
842 @node Writing New High-Level Languages
843 @subsection Writing New High-Level Languages
844
845 In order to integrate a new language @var{lang} into Guile's compiler
846 system, one has to create the module @code{(language @var{lang} spec)}
847 containing the language definition and referencing the parser,
848 compiler and other routines processing it. The module hierarchy in
849 @code{(language brainfuck)} defines a very basic Brainfuck
850 implementation meant to serve as easy-to-understand example on how to
851 do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
852 for more information about the Brainfuck language itself.
853
854
855 @node Extending the Compiler
856 @subsection Extending the Compiler
857
858 At this point we take a detour from the impersonal tone of the rest of
859 the manual. Admit it: if you've read this far into the compiler
860 internals manual, you are a junkie. Perhaps a course at your university
861 left you unsated, or perhaps you've always harbored a desire to hack the
862 holy of computer science holies: a compiler. Well you're in good
863 company, and in a good position. Guile's compiler needs your help.
864
865 There are many possible avenues for improving Guile's compiler.
866 Probably the most important improvement, speed-wise, will be some form
867 of native compilation, both just-in-time and ahead-of-time. This could
868 be done in many ways. Probably the easiest strategy would be to extend
869 the compiled procedure structure to include a pointer to a native code
870 vector, and compile from bytecode to native code at run-time after a
871 procedure is called a certain number of times.
872
873 The name of the game is a profiling-based harvest of the low-hanging
874 fruit, running programs of interest under a system-level profiler and
875 determining which improvements would give the most bang for the buck.
876 It's really getting to the point though that native compilation is the
877 next step.
878
879 The compiler also needs help at the top end, enhancing the Scheme that
880 it knows to also understand R6RS, and adding new high-level compilers.
881 We have JavaScript and Emacs Lisp mostly complete, but they could use
882 some love; Lua would be nice as well, but whatever language it is
883 that strikes your fancy would be welcome too.
884
885 Compilers are for hacking, not for admiring or for complaining about.
886 Get to it!