Merge remote-tracking branch 'local-2.0/stable-2.0'
[bpt/guile.git] / doc / ref / compiler.texi
CommitLineData
8680d53b
AW
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
7081d4f9 3@c Copyright (C) 2008, 2009, 2010, 2011
8680d53b
AW
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@node Compiling to the Virtual Machine
8@section Compiling to the Virtual Machine
9
00ce5125
AW
10Compilers have a mystique about them that is attractive and
11off-putting at the same time. They are attractive because they are
12magical -- they transform inert text into live results, like throwing
e33e3aee
AW
13the switch on Frankenstein's monster. However, this magic is perceived
14by many to be impenetrable.
00ce5125 15
0b8f3ac5
AW
16This section aims to pay attention to the small man behind the
17curtain.
00ce5125 18
e3ba263d 19@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
98850fd7 20know how to compile your @code{.scm} file.
00ce5125
AW
21
22@menu
23* Compiler Tower::
24* The Scheme Compiler::
81fd3152 25* Tree-IL::
00ce5125 26* GLIL::
81fd3152 27* Assembly::
73643339 28* Bytecode and Objcode::
e63d888e 29* Writing New High-Level Languages::
e3ba263d 30* Extending the Compiler::
00ce5125
AW
31@end menu
32
33@node Compiler Tower
34@subsection Compiler Tower
35
36Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37put it more accurately. Guile defines a tower of languages, starting
38at Scheme and progressively simplifying down to languages that
e3ba263d 39resemble the VM instruction set (@pxref{Instruction Set}).
00ce5125
AW
40
41Each language knows how to compile to the next, so each step is simple
42and understandable. Furthermore, this set of languages is not
43hardcoded into Guile, so it is possible for the user to add new
44high-level languages, new passes, or even different compilation
45targets.
46
e3ba263d
AW
47Languages are registered in the module, @code{(system base language)}:
48
49@example
50(use-modules (system base language))
51@end example
52
53They are registered with the @code{define-language} form.
54
55@deffn {Scheme Syntax} define-language @
41e64dd7
AW
56name title reader printer @
57[parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
58[joiner=#f] [make-default-environment=make-fresh-user-module]
e3ba263d
AW
59Define a language.
60
61This syntax defines a @code{#<language>} object, bound to @var{name}
62in the current environment. In addition, the language will be added to
63the global language set. For example, this is the language definition
64for Scheme:
65
66@example
67(define-language scheme
41e64dd7
AW
68 #:title "Scheme"
69 #:reader (lambda (port env) ...)
98850fd7 70 #:compilers `((tree-il . ,compile-tree-il))
81fd3152 71 #:decompilers `((tree-il . ,decompile-tree-il))
41e64dd7
AW
72 #:evaluator (lambda (x module) (primitive-eval x))
73 #:printer write
74 #:make-default-environment (lambda () ...))
e3ba263d 75@end example
e3ba263d
AW
76@end deffn
77
78The interesting thing about having languages defined this way is that
79they present a uniform interface to the read-eval-print loop. This
80allows the user to change the current language of the REPL:
81
82@example
81fd3152 83scheme@@(guile-user)> ,language tree-il
41e64dd7
AW
84Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
85tree-il@@(guile-user)> ,L scheme
86Happy hacking with Scheme! To switch back, type `,L tree-il'.
87scheme@@(guile-user)>
e3ba263d
AW
88@end example
89
90Languages can be looked up by name, as they were above.
91
92@deffn {Scheme Procedure} lookup-language name
93Looks up a language named @var{name}, autoloading it if necessary.
94
95Languages are autoloaded by looking for a variable named @var{name} in
96a module named @code{(language @var{name} spec)}.
97
98The language object will be returned, or @code{#f} if there does not
99exist a language with that name.
100@end deffn
101
102Defining languages this way allows us to programmatically determine
103the necessary steps for compiling code from one language to another.
104
105@deffn {Scheme Procedure} lookup-compilation-order from to
106Recursively traverses the set of languages to which @var{from} can
107compile, depth-first, and return the first path that can transform
108@var{from} to @var{to}. Returns @code{#f} if no path is found.
109
110This function memoizes its results in a cache that is invalidated by
111subsequent calls to @code{define-language}, so it should be quite
112fast.
113@end deffn
114
115There is a notion of a ``current language'', which is maintained in
116the @code{*current-language*} fluid. This language is normally Scheme,
86872cc3 117and may be rebound by the user. The run-time compilation interfaces
e3ba263d
AW
118(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
119and target languages.
120
121The normal tower of languages when compiling Scheme goes like this:
122
123@itemize
41e64dd7 124@item Scheme
81fd3152 125@item Tree Intermediate Language (Tree-IL)
41e64dd7 126@item Guile Lowlevel Intermediate Language (GLIL)
81fd3152
AW
127@item Assembly
128@item Bytecode
73643339 129@item Objcode
e3ba263d
AW
130@end itemize
131
132Object code may be serialized to disk directly, though it has a cookie
73643339
AW
133and version prepended to the front. But when compiling Scheme at run
134time, you want a Scheme value: for example, a compiled procedure. For
135this reason, so as not to break the abstraction, Guile defines a fake
81fd3152
AW
136language at the bottom of the tower:
137
138@itemize
139@item Value
140@end itemize
141
142Compiling to @code{value} loads the object code into a procedure, and
143wakes the sleeping giant.
e3ba263d
AW
144
145Perhaps this strangeness can be explained by example:
146@code{compile-file} defaults to compiling to object code, because it
147produces object code that has to live in the barren world outside the
148Guile runtime; but @code{compile} defaults to compiling to
149@code{value}, as its product re-enters the Guile world.
150
151Indeed, the process of compilation can circulate through these
152different worlds indefinitely, as shown by the following quine:
153
154@example
00ce5125 155((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
e3ba263d 156@end example
00ce5125
AW
157
158@node The Scheme Compiler
159@subsection The Scheme Compiler
160
7081d4f9
AW
161The job of the Scheme compiler is to expand all macros and all of Scheme
162to its most primitive expressions. The definition of ``primitive'' is
163given by the inventory of constructs provided by Tree-IL, the target
164language of the Scheme compiler: procedure calls, conditionals, lexical
165references, etc. This is described more fully in the next section.
81fd3152
AW
166
167The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
168that it is completely implemented by the macro expander. Since the
169macro expander has to run over all of the source code already in order
170to expand macros, it might as well do the analysis at the same time,
171producing Tree-IL expressions directly.
172
173Because this compiler is actually the macro expander, it is
174extensible. Any macro which the user writes becomes part of the
175compiler.
176
177The Scheme-to-Tree-IL expander may be invoked using the generic
178@code{compile} procedure:
179
180@lisp
181(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
182@result{}
7081d4f9
AW
183 #<<call> src: #f
184 proc: #<<toplevel-ref> src: #f name: +>
185 args: (#<<const> src: #f exp: 1>
186 #<<const> src: #f exp: 2>)>
81fd3152
AW
187@end lisp
188
189Or, since Tree-IL is so close to Scheme, it is often useful to expand
190Scheme to Tree-IL, then translate back to Scheme. For that reason the
191expander provides two interfaces. The former is equivalent to calling
41e64dd7 192@code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
81fd3152
AW
193``compile''. With @code{'e} (the default), the result is translated
194back to Scheme:
195
196@lisp
41e64dd7 197(macroexpand '(+ 1 2))
81fd3152 198@result{} (+ 1 2)
41e64dd7 199(macroexpand '(let ((x 10)) (* x x)))
81fd3152
AW
200@result{} (let ((x84 10)) (* x84 x84))
201@end lisp
202
203The second example shows that as part of its job, the macro expander
204renames lexically-bound variables. The original names are preserved
205when compiling to Tree-IL, but can't be represented in Scheme: a
206lexical binding only has one name. It is for this reason that the
207@emph{native} output of the expander is @emph{not} Scheme. There's too
208much information we would lose if we translated to Scheme directly:
209lexical variable names, source locations, and module hygiene.
210
41e64dd7
AW
211Note however that @code{macroexpand} does not have the same signature
212as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
213around @code{macroexpand}, to make it conform to the general form of
81fd3152
AW
214compiler procedures in Guile's language tower.
215
98850fd7
AW
216Compiler procedures take three arguments: an expression, an
217environment, and a keyword list of options. They return three values:
218the compiled expression, the corresponding environment for the target
219language, and a ``continuation environment''. The compiled expression
220and environment will serve as input to the next language's compiler.
221The ``continuation environment'' can be used to compile another
222expression from the same source language within the same module.
81fd3152
AW
223
224For example, you might compile the expression, @code{(define-module
225(foo))}. This will result in a Tree-IL expression and environment. But
226if you compiled a second expression, you would want to take into
227account the compile-time effect of compiling the previous expression,
228which puts the user in the @code{(foo)} module. That is purpose of the
229``continuation environment''; you would pass it as the environment
230when compiling the subsequent expression.
231
41e64dd7
AW
232For Scheme, an environment is a module. By default, the @code{compile}
233and @code{compile-file} procedures compile in a fresh module, such
234that bindings and macros introduced by the expression being compiled
235are isolated:
1ebe6a63
LC
236
237@example
238(eq? (current-module) (compile '(current-module)))
239@result{} #f
240
241(compile '(define hello 'world))
242(defined? 'hello)
243@result{} #f
244
245(define / *)
246(eq? (compile '/) /)
247@result{} #f
248@end example
249
250Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
251@code{current-reader}}) are isolated:
252
253@example
254(compile '(fluid-set! current-reader (lambda args 'fail)))
255(fluid-ref current-reader)
256@result{} #f
257@end example
258
259Nevertheless, having the compiler and @dfn{compilee} share the same name
260space can be achieved by explicitly passing @code{(current-module)} as
261the compilation environment:
262
263@example
264(define hello 'world)
265(compile 'hello #:env (current-module))
266@result{} world
267@end example
268
81fd3152
AW
269@node Tree-IL
270@subsection Tree-IL
00ce5125 271
81fd3152 272Tree Intermediate Language (Tree-IL) is a structured intermediate
c850030f
AW
273language that is close in expressive power to Scheme. It is an
274expanded, pre-analyzed Scheme.
275
81fd3152
AW
276Tree-IL is ``structured'' in the sense that its representation is
277based on records, not S-expressions. This gives a rigidity to the
278language that ensures that compiling to a lower-level language only
41e64dd7
AW
279requires a limited set of transformations. For example, the Tree-IL
280type @code{<const>} is a record type with two fields, @code{src} and
281@code{exp}. Instances of this type are created via @code{make-const}.
282Fields of this type are accessed via the @code{const-src} and
283@code{const-exp} procedures. There is also a predicate, @code{const?}.
284@xref{Records}, for more information on records.
81fd3152
AW
285
286@c alpha renaming
287
288All Tree-IL types have a @code{src} slot, which holds source location
289information for the expression. This information, if present, will be
290residualized into the compiled object code, allowing backtraces to
291show source information. The format of @code{src} is the same as that
292returned by Guile's @code{source-properties} function. @xref{Source
293Properties}, for more information.
294
295Although Tree-IL objects are represented internally using records,
296there is also an equivalent S-expression external representation for
ecb87335 297each kind of Tree-IL. For example, the S-expression representation
81fd3152 298of @code{#<const src: #f exp: 3>} expression would be:
c850030f
AW
299
300@example
81fd3152 301(const 3)
c850030f
AW
302@end example
303
81fd3152 304Users may program with this format directly at the REPL:
c850030f
AW
305
306@example
81fd3152 307scheme@@(guile-user)> ,language tree-il
41e64dd7 308Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
81fd3152 309tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
c850030f
AW
310@result{} 42
311@end example
312
81fd3152
AW
313The @code{src} fields are left out of the external representation.
314
98850fd7
AW
315One may create Tree-IL objects from their external representations via
316calling @code{parse-tree-il}, the reader for Tree-IL. If any source
317information is attached to the input S-expression, it will be
318propagated to the resulting Tree-IL expressions. This is probably the
319easiest way to compile to Tree-IL: just make the appropriate external
320representations in S-expression format, and let @code{parse-tree-il}
321take care of the rest.
322
81fd3152
AW
323@deftp {Scheme Variable} <void> src
324@deftpx {External Representation} (void)
325An empty expression. In practice, equivalent to Scheme's @code{(if #f
326#f)}.
327@end deftp
328@deftp {Scheme Variable} <const> src exp
329@deftpx {External Representation} (const @var{exp})
330A constant.
331@end deftp
332@deftp {Scheme Variable} <primitive-ref> src name
333@deftpx {External Representation} (primitive @var{name})
334A reference to a ``primitive''. A primitive is a procedure that, when
335compiled, may be open-coded. For example, @code{cons} is usually
336recognized as a primitive, so that it compiles down to a single
337instruction.
338
339Compilation of Tree-IL usually begins with a pass that resolves some
340@code{<module-ref>} and @code{<toplevel-ref>} expressions to
7081d4f9
AW
341@code{<primitive-ref>} expressions. The actual compilation pass has
342special cases for calls to certain primitives, like @code{apply} or
343@code{cons}.
81fd3152
AW
344@end deftp
345@deftp {Scheme Variable} <lexical-ref> src name gensym
346@deftpx {External Representation} (lexical @var{name} @var{gensym})
347A reference to a lexically-bound variable. The @var{name} is the
348original name of the variable in the source program. @var{gensym} is a
349unique identifier for this variable.
350@end deftp
351@deftp {Scheme Variable} <lexical-set> src name gensym exp
352@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
353Sets a lexically-bound variable.
354@end deftp
355@deftp {Scheme Variable} <module-ref> src mod name public?
356@deftpx {External Representation} (@@ @var{mod} @var{name})
357@deftpx {External Representation} (@@@@ @var{mod} @var{name})
358A reference to a variable in a specific module. @var{mod} should be
679cceed 359the name of the module, e.g.@: @code{(guile-user)}.
81fd3152
AW
360
361If @var{public?} is true, the variable named @var{name} will be looked
362up in @var{mod}'s public interface, and serialized with @code{@@};
363otherwise it will be looked up among the module's private bindings,
364and is serialized with @code{@@@@}.
365@end deftp
366@deftp {Scheme Variable} <module-set> src mod name public? exp
367@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
368@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
369Sets a variable in a specific module.
370@end deftp
371@deftp {Scheme Variable} <toplevel-ref> src name
372@deftpx {External Representation} (toplevel @var{name})
373References a variable from the current procedure's module.
374@end deftp
375@deftp {Scheme Variable} <toplevel-set> src name exp
376@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
377Sets a variable in the current procedure's module.
378@end deftp
379@deftp {Scheme Variable} <toplevel-define> src name exp
380@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
381Defines a new top-level variable in the current procedure's module.
382@end deftp
383@deftp {Scheme Variable} <conditional> src test then else
384@deftpx {External Representation} (if @var{test} @var{then} @var{else})
ca445ba5 385A conditional. Note that @var{else} is not optional.
c850030f 386@end deftp
7081d4f9
AW
387@deftp {Scheme Variable} <call> src proc args
388@deftpx {External Representation} (call @var{proc} . @var{args})
ca445ba5 389A procedure call.
c850030f 390@end deftp
a881a4ae
AW
391@deftp {Scheme Variable} <primcall> src name args
392@deftpx {External Representation} (primcall @var{name} . @var{args})
393A call to a primitive. Equivalent to @code{(call (primitive @var{name})
394. @var{args})}. This construct is often more convenient to generate and
395analyze than @code{<call>}.
396
397As part of the compilation process, instances of @code{(call (primitive
398@var{name}) . @var{args})} are transformed into primcalls.
399@end deftp
81fd3152
AW
400@deftp {Scheme Variable} <sequence> src exps
401@deftpx {External Representation} (begin . @var{exps})
402Like Scheme's @code{begin}.
c850030f 403@end deftp
41e64dd7
AW
404@deftp {Scheme Variable} <lambda> src meta body
405@deftpx {External Representation} (lambda @var{meta} @var{body})
406A closure. @var{meta} is an association list of properties for the
407procedure. @var{body} is a single Tree-IL expression of type
408@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
409an alternate clause, this makes Tree-IL's @code{<lambda>} have the
410expressiveness of Scheme's @code{case-lambda}.
411@end deftp
412@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
413@deftpx {External Representation} @
414 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
415 @var{body})@
416 [@var{alternate}])
417One clause of a @code{case-lambda}. A @code{lambda} expression in
418Scheme is treated as a @code{case-lambda} with one clause.
419
420@var{req} is a list of the procedure's required arguments, as symbols.
421@var{opt} is a list of the optional arguments, or @code{#f} if there
422are no optional arguments. @var{rest} is the name of the rest
423argument, or @code{#f}.
424
425@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
426(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
427keyword corresponding to the argument named @var{name}, and whose
428corresponding gensym is @var{var}. @var{inits} are tree-il expressions
ecb87335 429corresponding to all of the optional and keyword arguments, evaluated
41e64dd7
AW
430to bind variables whose value is not supplied by the procedure caller.
431Each @var{init} expression is evaluated in the lexical context of
432previously bound variables, from left to right.
433
434@var{gensyms} is a list of gensyms corresponding to all arguments:
435first all of the required arguments, then the optional arguments if
436any, then the rest argument if any, then all of the keyword arguments.
437
438@var{body} is the body of the clause. If the procedure is called with
439an appropriate number of arguments, @var{body} is evaluated in tail
64de6db5 440position. Otherwise, if there is an @var{alternate}, it should be a
41e64dd7 441@code{<lambda-case>} expression, representing the next clause to try.
64de6db5 442If there is no @var{alternate}, a wrong-number-of-arguments error is
41e64dd7
AW
443signaled.
444@end deftp
445@deftp {Scheme Variable} <let> src names gensyms vals exp
446@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 447Lexical binding, like Scheme's @code{let}. @var{names} are the
41e64dd7 448original binding names, @var{gensyms} are gensyms corresponding to the
81fd3152
AW
449@var{names}, and @var{vals} are Tree-IL expressions for the values.
450@var{exp} is a single Tree-IL expression.
451@end deftp
935c7aca 452@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
172988ee
AW
453@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
454@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 455A version of @code{<let>} that creates recursive bindings, like
935c7aca 456Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
81fd3152 457@end deftp
41e64dd7
AW
458@deftp {Scheme Variable} <dynlet> fluids vals body
459@deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
460Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
461@var{fluids} should be a list of Tree-IL expressions that will
462evaluate to fluids, and @var{vals} a corresponding list of expressions
463to bind to the fluids during the dynamic extent of the evaluation of
464@var{body}.
465@end deftp
466@deftp {Scheme Variable} <dynref> fluid
467@deftpx {External Representation} (dynref @var{fluid})
468A dynamic variable reference. @var{fluid} should be a Tree-IL
469expression evaluating to a fluid.
470@end deftp
471@deftp {Scheme Variable} <dynset> fluid exp
472@deftpx {External Representation} (dynset @var{fluid} @var{exp})
473A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
474to a fluid, will be set to the result of evaluating @var{exp}.
475@end deftp
880e7948
AW
476@deftp {Scheme Variable} <dynwind> winder pre body post unwinder
477@deftpx {External Representation} (dynwind @var{winder} @var{pre} @var{body} @var{post} @var{unwinder})
41e64dd7 478A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
880e7948
AW
479evaluate to thunks. Ensure that the winder and the unwinder are called
480before entering and after leaving @var{body}. Note that @var{body} is
481an expression, without a thunk wrapper. Guile actually inlines the
482bodies of @var{winder} and @var{unwinder} for the case of normal control
483flow, compiling the expressions in @var{pre} and @var{post},
484respectively.
41e64dd7
AW
485@end deftp
486@deftp {Scheme Variable} <prompt> tag body handler
487@deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
488A dynamic prompt. Instates a prompt named @var{tag}, an expression,
489during the dynamic extent of the execution of @var{body}, also an
490expression. If an abort occurs to this prompt, control will be passed
491to @var{handler}, a @code{<lambda-case>} expression with no optional
492or keyword arguments, and no alternate. The first argument to the
493@code{<lambda-case>} will be the captured continuation, and then all
494of the values passed to the abort. @xref{Prompts}, for more
495information.
496@end deftp
497@deftp {Scheme Variable} <abort> tag args tail
498@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
499An abort to the nearest prompt with the name @var{tag}, an expression.
500@var{args} should be a list of expressions to pass to the prompt's
501handler, and @var{tail} should be an expression that will evaluate to
502a list of additional arguments. An abort will save the partial
503continuation, which may later be reinstated, resulting in the
504@code{<abort>} expression evaluating to some number of values.
505@end deftp
81fd3152 506
98850fd7
AW
507There are two Tree-IL constructs that are not normally produced by
508higher-level compilers, but instead are generated during the
509source-to-source optimization and analysis passes that the Tree-IL
510compiler does. Users should not generate these expressions directly,
511unless they feel very clever, as the default analysis pass will
512generate them as necessary.
513
41e64dd7
AW
514@deftp {Scheme Variable} <let-values> src names gensyms exp body
515@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
98850fd7
AW
516Like Scheme's @code{receive} -- binds the values returned by
517evaluating @code{exp} to the @code{lambda}-like bindings described by
41e64dd7 518@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
98850fd7 519
7081d4f9 520@code{<let-values>} is an optimization of a @code{<call>} to the
98850fd7
AW
521primitive, @code{call-with-values}.
522@end deftp
41e64dd7
AW
523@deftp {Scheme Variable} <fix> src names gensyms vals body
524@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
98850fd7
AW
525Like @code{<letrec>}, but only for @var{vals} that are unset
526@code{lambda} expressions.
527
528@code{fix} is an optimization of @code{letrec} (and @code{let}).
529@end deftp
81fd3152
AW
530
531Tree-IL implements a compiler to GLIL that recursively traverses
532Tree-IL expressions, writing out GLIL expressions into a linear list.
533The compiler also keeps some state as to whether the current
534expression is in tail context, and whether its value will be used in
535future computations. This state allows the compiler not to emit code
679cceed 536for constant expressions that will not be used (e.g.@: docstrings), and
81fd3152
AW
537to perform tail calls when in tail position.
538
98850fd7
AW
539Most optimization, such as it currently is, is performed on Tree-IL
540expressions as source-to-source transformations. There will be more
541optimizations added in the future.
c850030f
AW
542
543Interested readers are encouraged to read the implementation in
81fd3152 544@code{(language tree-il compile-glil)} for more details.
00ce5125
AW
545
546@node GLIL
547@subsection GLIL
548
41e64dd7 549Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
81fd3152 550language whose expressions more closely approximate Guile's VM
98850fd7
AW
551instruction set. Its expression types are defined in @code{(language
552glil)}.
c850030f 553
41e64dd7 554@deftp {Scheme Variable} <glil-program> meta . body
86872cc3 555A unit of code that at run-time will correspond to a compiled
41e64dd7 556procedure. @var{meta} should be an alist of properties, as in
98850fd7
AW
557Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
558expressions.
c850030f 559@end deftp
41e64dd7
AW
560@deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
561A prologue for a function with no optional, keyword, or rest
562arguments. @var{nreq} is the number of required arguments. @var{nlocs}
563the total number of local variables, including the arguments. If the
564procedure was not given exactly @var{nreq} arguments, control will
565jump to @var{else-label}, if given, or otherwise signal an error.
566@end deftp
567@deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
568A prologue for a function with optional or rest arguments. Like
569@code{<glil-std-prelude>}, with the addition that @var{nopt} is the
570number of optional arguments (possibly zero) and @var{rest} is an
571index of a local variable at which to bind a rest argument, or
572@code{#f} if there is no rest argument.
573@end deftp
574@deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
575A prologue for a function with keyword arguments. Like
576@code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
577of keyword arguments, and @var{allow-other-keys?} is a flag indicating
578whether to allow unknown keys. @xref{Function Prologue Instructions,
579@code{bind-kwargs}}, for details on the format of @var{kw}.
580@end deftp
c850030f 581@deftp {Scheme Variable} <glil-bind> . vars
ff73ae34
AW
582An advisory expression that notes a liveness extent for a set of
583variables. @var{vars} is a list of @code{(@var{name} @var{type}
584@var{index})}, where @var{type} should be either @code{argument},
585@code{local}, or @code{external}.
586
587@code{<glil-bind>} expressions end up being serialized as part of a
588program's metadata and do not form part of a program's code path.
c850030f
AW
589@end deftp
590@deftp {Scheme Variable} <glil-mv-bind> vars rest
ff73ae34
AW
591A multiple-value binding of the values on the stack to @var{vars}. Iff
592@var{rest} is true, the last element of @var{vars} will be treated as
593a rest argument.
594
595In addition to pushing a binding annotation on the stack, like
596@code{<glil-bind>}, an expression is emitted at compilation time to
597make sure that there are enough values available to bind. See the
acc51c3e
AW
598notes on @code{truncate-values} in @ref{Procedure Call and Return
599Instructions}, for more information.
c850030f
AW
600@end deftp
601@deftp {Scheme Variable} <glil-unbind>
ff73ae34
AW
602Closes the liveness extent of the most recently encountered
603@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
604expressions are compiled, a parallel stack of live bindings is
605maintained; this expression pops off the top element from that stack.
606
607Bindings are written into the program's metadata so that debuggers and
608other tools can determine the set of live local variables at a given
609offset within a VM program.
c850030f
AW
610@end deftp
611@deftp {Scheme Variable} <glil-source> loc
ff73ae34 612Records source information for the preceding expression. @var{loc}
73643339 613should be an association list of containing @code{line} @code{column},
679cceed 614and @code{filename} keys, e.g.@: as returned by
73643339 615@code{source-properties}.
c850030f
AW
616@end deftp
617@deftp {Scheme Variable} <glil-void>
98850fd7 618Pushes ``the unspecified value'' on the stack.
c850030f
AW
619@end deftp
620@deftp {Scheme Variable} <glil-const> obj
ff73ae34 621Pushes a constant value onto the stack. @var{obj} must be a number,
98850fd7
AW
622string, symbol, keyword, boolean, character, uniform array, the empty
623list, or a pair or vector of constants.
c850030f 624@end deftp
98850fd7
AW
625@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
626Accesses a lexically bound variable. If the variable is not
41e64dd7
AW
627@var{local?} it is free. All variables may have @code{ref},
628@code{set}, and @code{bound?} as their @var{op}. Boxed variables may
629also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
630which correspond in semantics to the VM instructions @code{box},
98850fd7
AW
631@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
632more information.
c850030f
AW
633@end deftp
634@deftp {Scheme Variable} <glil-toplevel> op name
ff73ae34
AW
635Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
636or @code{define}.
c850030f
AW
637@end deftp
638@deftp {Scheme Variable} <glil-module> op mod name public?
73643339
AW
639Accesses a variable within a specific module. See Tree-IL's
640@code{<module-ref>}, for more information.
c850030f
AW
641@end deftp
642@deftp {Scheme Variable} <glil-label> label
ff73ae34
AW
643Creates a new label. @var{label} can be any Scheme value, and should
644be unique.
c850030f
AW
645@end deftp
646@deftp {Scheme Variable} <glil-branch> inst label
ff73ae34 647Branch to a label. @var{label} should be a @code{<ghil-label>}.
c850030f
AW
648@code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
649@end deftp
650@deftp {Scheme Variable} <glil-call> inst nargs
ff73ae34 651This expression is probably misnamed, as it does not correspond to
c850030f
AW
652function calls. @code{<glil-call>} invokes the VM instruction named
653@var{inst}, noting that it is called with @var{nargs} stack arguments.
ff73ae34
AW
654The arguments should be pushed on the stack already. What happens to
655the stack afterwards depends on the instruction.
c850030f
AW
656@end deftp
657@deftp {Scheme Variable} <glil-mv-call> nargs ra
ff73ae34
AW
658Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
659corresponding to the multiple-value return address for the call. See
acc51c3e
AW
660the notes on @code{mv-call} in @ref{Procedure Call and Return
661Instructions}, for more information.
c850030f 662@end deftp
41e64dd7
AW
663@deftp {Scheme Variable} <glil-prompt> label escape-only?
664Push a dynamic prompt into the stack, with a handler at @var{label}.
665@var{escape-only?} is a flag that is propagated to the prompt,
666allowing an abort to avoid capturing a continuation in some cases.
667@xref{Prompts}, for more information.
668@end deftp
c850030f 669
ff73ae34 670Users may enter in GLIL at the REPL as well, though there is a bit
41e64dd7 671more bookkeeping to do:
00ce5125 672
ff73ae34
AW
673@example
674scheme@@(guile-user)> ,language glil
41e64dd7
AW
675Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
676To switch back, type `,L scheme'.
677glil@@(guile-user)> (program () (std-prelude 0 0 #f)
678 (const 3) (call return 1))
ff73ae34
AW
679@result{} 3
680@end example
00ce5125 681
ff73ae34
AW
682Just as in all of Guile's compilers, an environment is passed to the
683GLIL-to-object code compiler, and one is returned as well, along with
684the object code.
00ce5125 685
81fd3152
AW
686@node Assembly
687@subsection Assembly
688
73643339
AW
689Assembly is an S-expression-based, human-readable representation of
690the actual bytecodes that will be emitted for the VM. As such, it is a
691useful intermediate language both for compilation and for
692decompilation.
81fd3152 693
73643339
AW
694Besides the fact that it is not a record-based language, assembly
695differs from GLIL in four main ways:
00ce5125 696
73643339
AW
697@itemize
698@item Labels have been resolved to byte offsets in the program.
699@item Constants inside procedures have either been expressed as inline
98850fd7 700instructions or cached in object arrays.
73643339
AW
701@item Procedures with metadata (source location information, liveness
702extents, procedure names, generic properties, etc) have had their
703metadata serialized out to thunks.
704@item All expressions correspond directly to VM instructions -- i.e.,
98850fd7 705there is no @code{<glil-lexical>} which can be a ref or a set.
73643339
AW
706@end itemize
707
708Assembly is isomorphic to the bytecode that it compiles to. You can
709compile to bytecode, then decompile back to assembly, and you have the
710same assembly code.
711
712The general form of assembly instructions is the following:
713
714@lisp
715(@var{inst} @var{arg} ...)
716@end lisp
717
718The @var{inst} names a VM instruction, and its @var{arg}s will be
719embedded in the instruction stream. The easiest way to see assembly is
720to play around with it at the REPL, as can be seen in this annotated
721example:
722
723@example
dc3b2661 724scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
41e64dd7 725(load-program
0a715b9a
AW
726 ((:LCASE16 . 2)) ; Labels, unused in this case.
727 8 ; Length of the thunk that was compiled.
41e64dd7 728 (load-program ; Metadata thunk.
73643339 729 ()
41e64dd7
AW
730 17
731 #f ; No metadata thunk for the metadata thunk.
732 (make-eol)
733 (make-eol)
0a715b9a
AW
734 (make-int8 2) ; Liveness extents, source info, and arities,
735 (make-int8 8) ; in a format that Guile knows how to parse.
736 (make-int8:0)
41e64dd7
AW
737 (list 0 3)
738 (list 0 1)
739 (list 0 3)
740 (return))
0a715b9a 741 (assert-nargs-ee/locals 0) ; Prologue.
41e64dd7
AW
742 (make-int8 32) ; Actual code starts here.
743 (make-int8 10)
744 (add)
0a715b9a 745 (return))
73643339
AW
746@end example
747
748Of course you can switch the REPL to assembly and enter in assembly
749S-expressions directly, like with other languages, though it is more
750difficult, given that the length fields have to be correct.
751
752@node Bytecode and Objcode
753@subsection Bytecode and Objcode
754
755Finally, the raw bytes. There are actually two different ``languages''
756here, corresponding to two different ways to represent the bytes.
757
758``Bytecode'' represents code as uniform byte vectors, useful for
759structuring and destructuring code on the Scheme level. Bytecode is
760the next step down from assembly:
761
762@example
73643339 763scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
0a715b9a
AW
764@result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
765 95 0 ; Prologue.
766 10 32 10 10 148 66 17 ; Actual code.
767 0 0 0 0 0 0 0 9 ; Metadata thunk.
768 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
73643339
AW
769@end example
770
771``Objcode'' is bytecode, but mapped directly to a C structure,
772@code{struct scm_objcode}:
773
774@example
775struct scm_objcode @{
73643339
AW
776 scm_t_uint32 len;
777 scm_t_uint32 metalen;
778 scm_t_uint8 base[0];
779@};
780@end example
781
782As one might imagine, objcode imposes a minimum length on the
41e64dd7
AW
783bytecode. Also, the @code{len} and @code{metalen} fields are in native
784endianness, which makes objcode (and bytecode) system-dependent.
73643339
AW
785
786Objcode also has a couple of important efficiency hacks. First,
787objcode may be mapped directly from disk, allowing compiled code to be
788loaded quickly, often from the system's disk cache, and shared among
789multiple processes. Secondly, objcode may be embedded in other
790objcode, allowing procedures to have the text of other procedures
791inlined into their bodies, without the need for separate allocation of
792the code. Of course, the objcode object itself does need to be
793allocated.
794
795Procedures related to objcode are defined in the @code{(system vm
796objcode)} module.
00ce5125 797
ff73ae34
AW
798@deffn {Scheme Procedure} objcode? obj
799@deffnx {C Function} scm_objcode_p (obj)
800Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
801@end deffn
00ce5125 802
73643339 803@deffn {Scheme Procedure} bytecode->objcode bytecode
42a438e8 804@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
ff73ae34 805Makes a bytecode object from @var{bytecode}, which should be a
41e64dd7 806bytevector. @xref{Bytevectors}.
ff73ae34 807@end deffn
e3ba263d 808
ff73ae34
AW
809@deffn {Scheme Variable} load-objcode file
810@deffnx {C Function} scm_load_objcode (file)
811Load object code from a file named @var{file}. The file will be mapped
812into memory via @code{mmap}, so this is a very fast operation.
e3ba263d 813
98850fd7 814On disk, object code has an sixteen-byte cookie prepended to it, to
73643339
AW
815prevent accidental loading of arbitrary garbage.
816@end deffn
817
818@deffn {Scheme Variable} write-objcode objcode file
819@deffnx {C Function} scm_write_objcode (objcode)
41e64dd7 820Write object code out to a file, prepending the sixteen-byte cookie.
ff73ae34 821@end deffn
e3ba263d 822
41e64dd7
AW
823@deffn {Scheme Variable} objcode->bytecode objcode
824@deffnx {C Function} scm_objcode_to_bytecode (objcode)
825Copy object code out to a bytevector for analysis by Scheme.
ff73ae34 826@end deffn
e3ba263d 827
73643339
AW
828The following procedure is actually in @code{(system vm program)}, but
829we'll mention it here:
830
98850fd7
AW
831@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
832@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
ff73ae34 833Load up object code into a Scheme program. The resulting program will
73643339 834have @var{objtable} as its object table, which should be a vector or
98850fd7 835@code{#f}, and will capture the free variables from @var{free-vars}.
ff73ae34 836@end deffn
c850030f 837
ff73ae34
AW
838Object code from a file may be disassembled at the REPL via the
839meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
840Programs may be disassembled via @code{,disassemble}, abbreviated as
841@code{,x}.
842
843Compiling object code to the fake language, @code{value}, is performed
844via loading objcode into a program, then executing that thunk with
845respect to the compilation environment. Normally the environment
846propagates through the compiler transparently, but users may specify
41e64dd7 847the compilation environment manually as well, as a module.
ff73ae34 848
c850030f 849
e63d888e
DK
850@node Writing New High-Level Languages
851@subsection Writing New High-Level Languages
852
853In order to integrate a new language @var{lang} into Guile's compiler
854system, one has to create the module @code{(language @var{lang} spec)}
855containing the language definition and referencing the parser,
856compiler and other routines processing it. The module hierarchy in
857@code{(language brainfuck)} defines a very basic Brainfuck
858implementation meant to serve as easy-to-understand example on how to
4e432dab
AW
859do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
860for more information about the Brainfuck language itself.
861
e63d888e 862
ff73ae34
AW
863@node Extending the Compiler
864@subsection Extending the Compiler
e3ba263d 865
dd6e37d0
NJ
866At this point we take a detour from the impersonal tone of the rest of
867the manual. Admit it: if you've read this far into the compiler
868internals manual, you are a junkie. Perhaps a course at your university
869left you unsated, or perhaps you've always harbored a desire to hack the
870holy of computer science holies: a compiler. Well you're in good
871company, and in a good position. Guile's compiler needs your help.
ff73ae34
AW
872
873There are many possible avenues for improving Guile's compiler.
874Probably the most important improvement, speed-wise, will be some form
875of native compilation, both just-in-time and ahead-of-time. This could
876be done in many ways. Probably the easiest strategy would be to extend
877the compiled procedure structure to include a pointer to a native code
86872cc3 878vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
879procedure is called a certain number of times.
880
881The name of the game is a profiling-based harvest of the low-hanging
882fruit, running programs of interest under a system-level profiler and
883determining which improvements would give the most bang for the buck.
98850fd7
AW
884It's really getting to the point though that native compilation is the
885next step.
ff73ae34
AW
886
887The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
888it knows to also understand R6RS, and adding new high-level compilers.
889We have JavaScript and Emacs Lisp mostly complete, but they could use
ecb87335 890some love; Lua would be nice as well, but whatever language it is
98850fd7
AW
891that strikes your fancy would be welcome too.
892
893Compilers are for hacking, not for admiring or for complaining about.
894Get to it!