@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2008, 2009, 2010
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein's monster. However, this magic is perceived
by many to be impenetrable.

This section aims to pay attention to the small man behind the
curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your @code{.scm} file.

@menu
* Compiler Tower::
* The Scheme Compiler::
* Tree-IL::
* GLIL::
* Assembly::
* Bytecode and Objcode::
* Writing New High-Level Languages::
* Extending the Compiler::
@end menu

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module, @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title reader printer @
[parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
[joiner=#f] [make-default-environment=make-fresh-user-module]
Define a language.

This syntax defines a @code{#<language>} object, bound to @var{name}
in the current environment. In addition, the language will be added to
the global language set. For example, this is the language definition
for Scheme:

@example
(define-language scheme
  #:title       "Scheme"
  #:reader      (lambda (port env) ...)
  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write
  #:make-default-environment (lambda () ...))
@end example
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> ,L scheme
Happy hacking with Scheme! To switch back, type `,L tree-il'.
scheme@@(guile-user)>
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

The language object will be returned, or @code{#f} if there does not
exist a language with that name.
@end deffn
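
As a brief illustration (a sketch to try at the REPL), the object
returned by @code{lookup-language} can be inspected with accessors.
The accessor @code{language-name} is not described in this section; it
is assumed here to be exported by @code{(system base language)}
alongside @code{lookup-language}.

@lisp
(use-modules (system base language))

;; Look up the Scheme language object by name.
(define scheme-lang (lookup-language 'scheme))

;; The name under which the language was defined.
(language-name scheme-lang)
@result{} scheme
@end lisp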

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and returns the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
@end deffn
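
For instance, the following sketch asks for the path from Scheme to
GLIL. The printed form of the result is not shown, because this
section does not specify how a path is represented; the variable name
@code{path} is only illustrative.

@lisp
(use-modules (system base language))

;; Find the chain of compilers leading from Scheme to GLIL.
;; Languages are named by symbol, as with lookup-language.
(define path (lookup-compilation-order 'scheme 'glil))
@end lisp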

There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The run-time compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme
@item Tree Intermediate Language (Tree-IL)
@item Guile Lowlevel Intermediate Language (GLIL)
@item Assembly
@item Bytecode
@item Objcode
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at run
time, you want a Scheme value: for example, a compiled procedure. For
this reason, so as not to break the abstraction, Guile defines a fake
language at the bottom of the tower:

@itemize
@item Value
@end itemize

Compiling to @code{value} loads the object code into a procedure, and
wakes the sleeping giant.

Perhaps this strangeness can be explained by example:
@code{compile-file} defaults to compiling to object code, because it
produces object code that has to live in the barren world outside the
Guile runtime; but @code{compile} defaults to compiling to
@code{value}, as its product re-enters the Guile world.
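
To make the default concrete, here is a minimal sketch. The
@code{#:from} and @code{#:to} keywords, which are used again later in
this section, spell out what @code{compile} does when they are left
off.

@lisp
;; With no keywords, compile goes from Scheme to a value.
(compile '(+ 1 2))
@result{} 3

;; The same, with source and target languages made explicit.
(compile '(+ 1 2) #:from 'scheme #:to 'value)
@result{} 3
@end lisp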

Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
@end example

@node The Scheme Compiler
@subsection The Scheme Compiler

The job of the Scheme compiler is to expand all macros and all of
Scheme to its most primitive expressions. The definition of
``primitive'' is given by the inventory of constructs provided by
Tree-IL, the target language of the Scheme compiler: procedure
applications, conditionals, lexical references, etc. This is described
more fully in the next section.

The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
that it is completely implemented by the macro expander. Since the
macro expander has to run over all of the source code already in order
to expand macros, it might as well do the analysis at the same time,
producing Tree-IL expressions directly.

Because this compiler is actually the macro expander, it is
extensible. Any macro which the user writes becomes part of the
compiler.

The Scheme-to-Tree-IL expander may be invoked using the generic
@code{compile} procedure:

@lisp
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
@result{}
 #<<application> src: #f
                 proc: #<<toplevel-ref> src: #f name: +>
                 args: (#<<const> src: #f exp: 1>
                        #<<const> src: #f exp: 2>)>
@end lisp

Or, since Tree-IL is so close to Scheme, it is often useful to expand
Scheme to Tree-IL, then translate back to Scheme. For that reason the
expander provides two interfaces. The first, shown above, is equivalent
to calling @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
``compile''. With @code{'e} (the default), the result is translated
back to Scheme:

@lisp
(macroexpand '(+ 1 2))
@result{} (+ 1 2)
(macroexpand '(let ((x 10)) (* x x)))
@result{} (let ((x84 10)) (* x84 x84))
@end lisp

The second example shows that as part of its job, the macro expander
renames lexically-bound variables. The original names are preserved
when compiling to Tree-IL, but can't be represented in Scheme: a
lexical binding only has one name. It is for this reason that the
@emph{native} output of the expander is @emph{not} Scheme. There's too
much information we would lose if we translated to Scheme directly:
lexical variable names, source locations, and module hygiene.

Note however that @code{macroexpand} does not have the same signature
as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
around @code{macroexpand}, to make it conform to the general form of
compiler procedures in Guile's language tower.

Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options. They return three values:
the compiled expression, the corresponding environment for the target
language, and a ``continuation environment''. The compiled expression
and environment will serve as input to the next language's compiler.
The ``continuation environment'' can be used to compile another
expression from the same source language within the same module.

For example, you might compile the expression, @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But
if you compiled a second expression, you would want to take into
account the compile-time effect of compiling the previous expression,
which puts the user in the @code{(foo)} module. That is the purpose of
the ``continuation environment''; you would pass it as the environment
when compiling the subsequent expression.
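
The following sketch shows this protocol for the Scheme compiler. It
assumes that @code{compile-tree-il} is exported by the module
@code{(language scheme compile-tree-il)}, a module name that is not
given in this section, and that the three values come back in the
order just described. The names @code{exp}, @code{env} and @code{cenv}
are only illustrative.

@lisp
(use-modules (language scheme compile-tree-il))

(call-with-values
    (lambda ()
      ;; Expression, environment (here a module), and an empty
      ;; option list.
      (compile-tree-il '(+ 1 2) (current-module) '()))
  (lambda (exp env cenv)
    ;; exp:  the compiled (Tree-IL) expression
    ;; env:  the environment for the next compiler in the tower
    ;; cenv: the continuation environment, passed along as the
    ;;       environment when compiling the next Scheme expression
    (compile-tree-il '(* 3 4) cenv '())))
@end lisp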

For Scheme, an environment is a module. By default, the @code{compile}
and @code{compile-file} procedures compile in a fresh module, such
that bindings and macros introduced by the expression being compiled
are isolated:

@example
(eq? (current-module) (compile '(current-module)))
@result{} #f

(compile '(define hello 'world))
(defined? 'hello)
@result{} #f

(define / *)
(eq? (compile '/) /)
@result{} #f
@end example

Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
@code{current-reader}}) are isolated:

@example
(compile '(fluid-set! current-reader (lambda args 'fail)))
(fluid-ref current-reader)
@result{} #f
@end example

Nevertheless, having the compiler and @dfn{compilee} share the same name
space can be achieved by explicitly passing @code{(current-module)} as
the compilation environment:

@example
(define hello 'world)
(compile 'hello #:env (current-module))
@result{} world
@end example

@node Tree-IL
@subsection Tree-IL

Tree Intermediate Language (Tree-IL) is a structured intermediate
language that is close in expressive power to Scheme. It is an
expanded, pre-analyzed Scheme.

Tree-IL is ``structured'' in the sense that its representation is
based on records, not S-expressions. This gives a rigidity to the
language that ensures that compiling to a lower-level language only
requires a limited set of transformations. For example, the Tree-IL
type @code{<const>} is a record type with two fields, @code{src} and
@code{exp}. Instances of this type are created via @code{make-const}.
Fields of this type are accessed via the @code{const-src} and
@code{const-exp} procedures. There is also a predicate, @code{const?}.
@xref{Records}, for more information on records.
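
As a small sketch of this record interface, assuming the procedures
named above are exported by the @code{(language tree-il)} module and
that @code{make-const} takes its fields in the order @code{src},
@code{exp}:

@lisp
(use-modules (language tree-il))

;; Build a constant with no source information.
(define c (make-const #f 3))

(const? c)
@result{} #t
(const-exp c)
@result{} 3
(const-src c)
@result{} #f
@end lisp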

@c alpha renaming

All Tree-IL types have a @code{src} slot, which holds source location
information for the expression. This information, if present, will be
residualized into the compiled object code, allowing backtraces to
show source information. The format of @code{src} is the same as that
returned by Guile's @code{source-properties} function. @xref{Source
Properties}, for more information.

Although Tree-IL objects are represented internally using records,
there is also an equivalent S-expression external representation for
each kind of Tree-IL. For example, the S-expression representation
of the @code{#<const src: #f exp: 3>} expression would be:

@example
(const 3)
@end example

Users may program with this format directly at the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
@result{} 42
@end example

The @code{src} fields are left out of the external representation.

One may create Tree-IL objects from their external representations by
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.
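
A minimal sketch of this round trip follows. It assumes that
@code{parse-tree-il} and its inverse, @code{unparse-tree-il} (a
procedure not otherwise described in this section), are both exported
by @code{(language tree-il)}.

@lisp
(use-modules (language tree-il))

;; Parse the external representation into a <const> record.
(define parsed (parse-tree-il '(const 3)))

(const-exp parsed)
@result{} 3

;; Convert the record back to its external representation.
(unparse-tree-il parsed)
@result{} (const 3)
@end lisp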

@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f
#f)}.
@end deftp
@deftp {Scheme Variable} <const> src exp
@deftpx {External Representation} (const @var{exp})
A constant.
@end deftp
@deftp {Scheme Variable} <primitive-ref> src name
@deftpx {External Representation} (primitive @var{name})
A reference to a ``primitive''. A primitive is a procedure that, when
compiled, may be open-coded. For example, @code{cons} is usually
recognized as a primitive, so that it compiles down to a single
instruction.

Compilation of Tree-IL usually begins with a pass that resolves some
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
@code{<primitive-ref>} expressions. The actual compilation pass
has special cases for applications of certain primitives, like
@code{apply} or @code{cons}.
@end deftp
@deftp {Scheme Variable} <lexical-ref> src name gensym
@deftpx {External Representation} (lexical @var{name} @var{gensym})
A reference to a lexically-bound variable. The @var{name} is the
original name of the variable in the source program. @var{gensym} is a
unique identifier for this variable.
@end deftp
@deftp {Scheme Variable} <lexical-set> src name gensym exp
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
Sets a lexically-bound variable.
@end deftp
@deftp {Scheme Variable} <module-ref> src mod name public?
@deftpx {External Representation} (@@ @var{mod} @var{name})
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
A reference to a variable in a specific module. @var{mod} should be
the name of the module, e.g.@: @code{(guile-user)}.

If @var{public?} is true, the variable named @var{name} will be looked
up in @var{mod}'s public interface, and serialized with @code{@@};
otherwise it will be looked up among the module's private bindings,
and is serialized with @code{@@@@}.
@end deftp
@deftp {Scheme Variable} <module-set> src mod name public? exp
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
Sets a variable in a specific module.
@end deftp
@deftp {Scheme Variable} <toplevel-ref> src name
@deftpx {External Representation} (toplevel @var{name})
References a variable from the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-set> src name exp
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
Sets a variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-define> src name exp
@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
Defines a new top-level variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <conditional> src test then else
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
A conditional. Note that @var{else} is not optional.
@end deftp
@deftp {Scheme Variable} <application> src proc args
@deftpx {External Representation} (apply @var{proc} . @var{args})
A procedure call.
@end deftp
@deftp {Scheme Variable} <sequence> src exps
@deftpx {External Representation} (begin . @var{exps})
Like Scheme's @code{begin}.
@end deftp
@deftp {Scheme Variable} <lambda> src meta body
@deftpx {External Representation} (lambda @var{meta} @var{body})
A closure. @var{meta} is an association list of properties for the
procedure. @var{body} is a single Tree-IL expression of type
@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
an alternate clause, this makes Tree-IL's @code{<lambda>} have the
expressiveness of Scheme's @code{case-lambda}.
@end deftp
@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
@deftpx {External Representation} @
  (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
                @var{body})@
               [@var{alternate}])
One clause of a @code{case-lambda}. A @code{lambda} expression in
Scheme is treated as a @code{case-lambda} with one clause.

@var{req} is a list of the procedure's required arguments, as symbols.
@var{opt} is a list of the optional arguments, or @code{#f} if there
are no optional arguments. @var{rest} is the name of the rest
argument, or @code{#f}.

@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
keyword corresponding to the argument named @var{name}, and whose
corresponding gensym is @var{var}. @var{inits} are Tree-IL expressions
corresponding to all of the optional and keyword arguments, evaluated
to bind variables whose value is not supplied by the procedure caller.
Each @var{init} expression is evaluated in the lexical context of
previously bound variables, from left to right.

@var{gensyms} is a list of gensyms corresponding to all arguments:
first all of the required arguments, then the optional arguments if
any, then the rest argument if any, then all of the keyword arguments.

@var{body} is the body of the clause. If the procedure is called with
an appropriate number of arguments, @var{body} is evaluated in tail
position. Otherwise, if there is an @var{alternate}, it should be a
@code{<lambda-case>} expression, representing the next clause to try.
If there is no @var{alternate}, a wrong-number-of-arguments error is
signaled.
@end deftp
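
To make the layout concrete, here is a sketch of the external
representation of a one-argument identity procedure, built from the
grammar given above. The gensym @code{x-1} is a name invented for
illustration; any symbol unique within the expression would do. The
accessors @code{lambda-body} and @code{lambda-case-req} are assumed to
follow the same naming pattern as @code{const-exp}.

@lisp
(use-modules (language tree-il))

;; (lambda (x) x): one required argument named x, with no optional,
;; rest, or keyword arguments, and therefore no inits.
(define identity-il
  (parse-tree-il
   '(lambda ()
      (lambda-case (((x) #f #f #f () (x-1))
                    (lexical x x-1))))))

;; The body of a <lambda> is its first <lambda-case> clause.
(lambda-case-req (lambda-body identity-il))
@result{} (x)
@end lisp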
@deftp {Scheme Variable} <let> src names gensyms vals exp
@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
Lexical binding, like Scheme's @code{let}. @var{names} are the
original binding names, @var{gensyms} are gensyms corresponding to the
@var{names}, and @var{vals} are Tree-IL expressions for the values.
@var{exp} is a single Tree-IL expression.
@end deftp
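
For instance, the Scheme expression @code{(let ((x 10)) (* x x))}
could be written in the external representation as sketched below.
The gensym @code{x-1} is invented for illustration, and the accessors
@code{let-names} and @code{let-gensyms} are assumed to follow the same
naming pattern as @code{const-exp}.

@lisp
(use-modules (language tree-il))

;; Names, gensyms and values are three parallel lists.
(define square-il
  (parse-tree-il
   '(let (x) (x-1) ((const 10))
      (apply (primitive *) (lexical x x-1) (lexical x x-1)))))

(let-names square-il)
@result{} (x)
(let-gensyms square-il)
@result{} (x-1)
@end lisp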
@deftp {Scheme Variable} <letrec> src in-order? names gensyms vals exp
@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
@end deftp
@deftp {Scheme Variable} <dynlet> fluids vals body
@deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
@var{fluids} should be a list of Tree-IL expressions that will
evaluate to fluids, and @var{vals} a corresponding list of expressions
to bind to the fluids during the dynamic extent of the evaluation of
@var{body}.
@end deftp
@deftp {Scheme Variable} <dynref> fluid
@deftpx {External Representation} (dynref @var{fluid})
A dynamic variable reference. @var{fluid} should be a Tree-IL
expression evaluating to a fluid.
@end deftp
@deftp {Scheme Variable} <dynset> fluid exp
@deftpx {External Representation} (dynset @var{fluid} @var{exp})
A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
to a fluid, will be set to the result of evaluating @var{exp}.
@end deftp
@deftp {Scheme Variable} <dynwind> winder body unwinder
@deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
evaluate to thunks. Ensures that the winder and the unwinder are called
before entering and after leaving @var{body}. Note that @var{body} is
an expression, without a thunk wrapper.
@end deftp
@deftp {Scheme Variable} <prompt> tag body handler
@deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
A dynamic prompt. Instates a prompt named @var{tag}, an expression,
during the dynamic extent of the execution of @var{body}, also an
expression. If an abort occurs to this prompt, control will be passed
to @var{handler}, a @code{<lambda-case>} expression with no optional
or keyword arguments, and no alternate. The first argument to the
@code{<lambda-case>} will be the captured continuation, followed by all
of the values passed to the abort. @xref{Prompts}, for more
information.
@end deftp
@deftp {Scheme Variable} <abort> tag args tail
@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
An abort to the nearest prompt with the name @var{tag}, an expression.
@var{args} should be a list of expressions to pass to the prompt's
handler, and @var{tail} should be an expression that will evaluate to
a list of additional arguments. An abort will save the partial
continuation, which may later be reinstated, resulting in the
@code{<abort>} expression evaluating to some number of values.
@end deftp

There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does. Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will
generate them as necessary.

@deftp {Scheme Variable} <let-values> src names gensyms exp body
@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @var{exp} to the @code{lambda}-like bindings described by
@var{gensyms}. That is to say, @var{gensyms} may be an improper list.

@code{<let-values>} is an optimization of @code{<application>} of the
primitive, @code{call-with-values}.
@end deftp
@deftp {Scheme Variable} <fix> src names gensyms vals body
@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.

@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp

Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list.
The compiler also keeps some state as to whether the current
expression is in tail context, and whether its value will be used in
future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g.@: docstrings), and
to perform tail calls when in tail position.

Most optimization, such as it currently is, is performed on Tree-IL
expressions as source-to-source transformations. There will be more
optimizations added in the future.

Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.
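
To watch this pass at work without reading the source, one can ask
@code{compile} to stop at GLIL. This is only a sketch: it assumes the
language is registered under the name @code{glil}, and the output is
omitted because its printed form is not described in this section.

@lisp
;; Compile Scheme through Tree-IL and stop at GLIL, rather than
;; continuing down to a value.
(define glil-code
  (compile '(lambda (x) x) #:from 'scheme #:to 'glil))
@end lisp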

@node GLIL
@subsection GLIL

Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM
instruction set. Its expression types are defined in @code{(language
glil)}.

@deftp {Scheme Variable} <glil-program> meta . body
A unit of code that at run-time will correspond to a compiled
procedure. @var{meta} should be an alist of properties, as in
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
expressions.
@end deftp
@deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
A prologue for a function with no optional, keyword, or rest
arguments. @var{nreq} is the number of required arguments. @var{nlocs}
is the total number of local variables, including the arguments. If the
procedure was not given exactly @var{nreq} arguments, control will
jump to @var{else-label}, if given, or otherwise signal an error.
@end deftp
@deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
A prologue for a function with optional or rest arguments. Like
@code{<glil-std-prelude>}, with the addition that @var{nopt} is the
number of optional arguments (possibly zero) and @var{rest} is an
index of a local variable at which to bind a rest argument, or
@code{#f} if there is no rest argument.
@end deftp
@deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
A prologue for a function with keyword arguments. Like
@code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
of keyword arguments, and @var{allow-other-keys?} is a flag indicating
whether to allow unknown keys. @xref{Function Prologue Instructions,
@code{bind-kwargs}}, for details on the format of @var{kw}.
@end deftp
@deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of
variables. @var{vars} is a list of @code{(@var{name} @var{type}
@var{index})}, where @var{type} should be either @code{argument},
@code{local}, or @code{external}.

@code{<glil-bind>} expressions end up being serialized as part of a
program's metadata and do not form part of a program's code path.
@end deftp
@deftp {Scheme Variable} <glil-mv-bind> vars rest
A multiple-value binding of the values on the stack to @var{vars}. Iff
@var{rest} is true, the last element of @var{vars} will be treated as
a rest argument.

In addition to pushing a binding annotation on the stack, like
@code{<glil-bind>}, an expression is emitted at compilation time to
make sure that there are enough values available to bind. See the
notes on @code{truncate-values} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-unbind>
Closes the liveness extent of the most recently encountered
@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
expressions are compiled, a parallel stack of live bindings is
maintained; this expression pops off the top element from that stack.

Bindings are written into the program's metadata so that debuggers and
other tools can determine the set of live local variables at a given
offset within a VM program.
@end deftp
@deftp {Scheme Variable} <glil-source> loc
Records source information for the preceding expression. @var{loc}
should be an association list containing @code{line}, @code{column},
and @code{filename} keys, e.g.@: as returned by
@code{source-properties}.
@end deftp
@deftp {Scheme Variable} <glil-void>
Pushes ``the unspecified value'' on the stack.
@end deftp
@deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, uniform array, the empty
list, or a pair or vector of constants.
@end deftp
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable. If the variable is not
@var{local?} it is free. All variables may have @code{ref},
@code{set}, and @code{bound?} as their @var{op}. Boxed variables may
also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
which correspond in semantics to the VM instructions @code{box},
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
more information.
@end deftp
@deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
or @code{define}.
@end deftp
@deftp {Scheme Variable} <glil-module> op mod name public?
Accesses a variable within a specific module. See Tree-IL's
@code{<module-ref>}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-label> label
Creates a new label. @var{label} can be any Scheme value, and should
be unique.
@end deftp
@deftp {Scheme Variable} <glil-branch> inst label
Branches to a label. @var{label} should be a @code{<glil-label>}.
@var{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
@end deftp
@deftp {Scheme Variable} <glil-call> inst nargs
This expression is probably misnamed, as it does not correspond to
function calls. @code{<glil-call>} invokes the VM instruction named
@var{inst}, noting that it is called with @var{nargs} stack arguments.
The arguments should be pushed on the stack already. What happens to
the stack afterwards depends on the instruction.
@end deftp
@deftp {Scheme Variable} <glil-mv-call> nargs ra
Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
corresponding to the multiple-value return address for the call. See
the notes on @code{mv-call} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-prompt> label escape-only?
Pushes a dynamic prompt onto the stack, with a handler at @var{label}.
@var{escape-only?} is a flag that is propagated to the prompt,
allowing an abort to avoid capturing a continuation in some cases.
@xref{Prompts}, for more information.
@end deftp

Users may enter GLIL at the REPL as well, though there is a bit
more bookkeeping to do:

@example
scheme@@(guile-user)> ,language glil
Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
To switch back, type `,L scheme'.
glil@@(guile-user)> (program () (std-prelude 0 0 #f)
                       (const 3) (call return 1))
@result{} 3
@end example

Just as in all of Guile's compilers, an environment is passed to the
GLIL-to-object code compiler, and one is returned as well, along with
the object code.

675@node Assembly
676@subsection Assembly
677
73643339
AW
678Assembly is an S-expression-based, human-readable representation of
679the actual bytecodes that will be emitted for the VM. As such, it is a
680useful intermediate language both for compilation and for
681decompilation.
81fd3152 682
73643339
AW
683Besides the fact that it is not a record-based language, assembly
684differs from GLIL in four main ways:
00ce5125 685
73643339
AW
686@itemize
687@item Labels have been resolved to byte offsets in the program.
688@item Constants inside procedures have either been expressed as inline
98850fd7 689instructions or cached in object arrays.
73643339
AW
690@item Procedures with metadata (source location information, liveness
691extents, procedure names, generic properties, etc.) have had their
692metadata serialized out to thunks.
693@item All expressions correspond directly to VM instructions -- i.e.,
98850fd7 694there is no @code{<glil-lexical>} which can be a ref or a set.
73643339
AW
695@end itemize
696
697Assembly is isomorphic to the bytecode that it compiles to. You can
698compile to bytecode, then decompile back to assembly, and you have the
699same assembly code.
700
701The general form of assembly instructions is the following:
702
703@lisp
704(@var{inst} @var{arg} ...)
705@end lisp
706
707The @var{inst} names a VM instruction, and its @var{arg}s will be
708embedded in the instruction stream. The easiest way to see assembly is
709to play around with it at the REPL, as can be seen in this annotated
710example:
711
712@example
dc3b2661 713scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
41e64dd7 714(load-program
0a715b9a
AW
715 ((:LCASE16 . 2)) ; Labels, unused in this case.
716 8 ; Length of the thunk that was compiled.
41e64dd7 717 (load-program ; Metadata thunk.
73643339 718 ()
41e64dd7
AW
719 17
720 #f ; No metadata thunk for the metadata thunk.
721 (make-eol)
722 (make-eol)
0a715b9a
AW
723 (make-int8 2) ; Liveness extents, source info, and arities,
724 (make-int8 8) ; in a format that Guile knows how to parse.
725 (make-int8:0)
41e64dd7
AW
726 (list 0 3)
727 (list 0 1)
728 (list 0 3)
729 (return))
0a715b9a 730 (assert-nargs-ee/locals 0) ; Prologue.
41e64dd7
AW
731 (make-int8 32) ; Actual code starts here.
732 (make-int8 10)
733 (add)
0a715b9a 734 (return))
73643339
AW
735@end example
736
737Of course, you can switch the REPL to assembly and enter assembly
738S-expressions directly, like with other languages, though it is more
739difficult, given that the length fields have to be correct.
740
741@node Bytecode and Objcode
742@subsection Bytecode and Objcode
743
744Finally, the raw bytes. There are actually two different ``languages''
745here, corresponding to two different ways to represent the bytes.
746
747``Bytecode'' represents code as uniform byte vectors, useful for
748structuring and destructuring code on the Scheme level. Bytecode is
749the next step down from assembly:
750
751@example
73643339 752scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
0a715b9a
AW
753@result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
754 95 0 ; Prologue.
755 10 32 10 10 148 66 17 ; Actual code.
756 0 0 0 0 0 0 0 9 ; Metadata thunk.
757 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
73643339
AW
758@end example
759
760``Objcode'' is bytecode, but mapped directly to a C structure,
761@code{struct scm_objcode}:
762
763@example
764struct scm_objcode @{
73643339
AW
765 scm_t_uint32 len;
766 scm_t_uint32 metalen;
767 scm_t_uint8 base[0];
768@};
769@end example
770
771As one might imagine, objcode imposes a minimum length on the
41e64dd7
AW
772bytecode. Also, the @code{len} and @code{metalen} fields are in native
773endianness, which makes objcode (and bytecode) system-dependent.
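
To make the header layout concrete, the following sketch reads the two
@code{scm_t_uint32} fields back out of the bytecode shown above. It
uses only @code{(rnrs bytevectors)}. It assumes that the example bytes
were produced on a little-endian machine, so it asks for little-endian
decoding explicitly instead of relying on the native byte order. The
names @code{bv}, @code{len} and @code{metalen} are chosen here for
illustration.

@lisp
(use-modules (rnrs bytevectors))

;; The bytevector from the bytecode example above, copied verbatim.
(define bv
  #vu8(8 0 0 0 25 0 0 0
       95 0
       10 32 10 10 148 66 17
       0 0 0 0 0 0 0 9
       9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66))

;; The first two 32-bit words are the len and metalen fields.
;; Assumption: the example was generated on a little-endian host.
(define len     (bytevector-u32-ref bv 0 (endianness little)))
(define metalen (bytevector-u32-ref bv 4 (endianness little)))

(list len metalen (bytevector-length bv))
@result{} (8 25 41)
@end lisp

The 8-byte header plus @code{len} plus @code{metalen} accounts for all
41 bytes of the bytevector.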
73643339
AW
774
775Objcode also has a couple of important efficiency hacks. First,
776objcode may be mapped directly from disk, allowing compiled code to be
777loaded quickly, often from the system's disk cache, and shared among
778multiple processes. Secondly, objcode may be embedded in other
779objcode, allowing procedures to have the text of other procedures
780inlined into their bodies, without the need for separate allocation of
781the code. Of course, the objcode object itself does need to be
782allocated.
783
784Procedures related to objcode are defined in the @code{(system vm
785objcode)} module.
00ce5125 786
ff73ae34
AW
787@deffn {Scheme Procedure} objcode? obj
788@deffnx {C Function} scm_objcode_p (obj)
789Returns @code{#t} iff @var{obj} is object code, @code{#f} otherwise.
790@end deffn
00ce5125 791
73643339 792@deffn {Scheme Procedure} bytecode->objcode bytecode
42a438e8 793@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
ff73ae34 794Makes an objcode object from @var{bytecode}, which should be a
41e64dd7 795bytevector. @xref{Bytevectors}.
ff73ae34 796@end deffn
e3ba263d 797
ff73ae34
AW
798@deffn {Scheme Procedure} load-objcode file
799@deffnx {C Function} scm_load_objcode (file)
800Load object code from a file named @var{file}. The file will be mapped
801into memory via @code{mmap}, so this is a very fast operation.
e3ba263d 802
98850fd7 803On disk, object code has a sixteen-byte cookie prepended to it, to
73643339
AW
804prevent accidental loading of arbitrary garbage.
805@end deffn
806
807@deffn {Scheme Procedure} write-objcode objcode file
808@deffnx {C Function} scm_write_objcode (objcode, file)
41e64dd7 809Write object code out to a file, prepending the sixteen-byte cookie.
ff73ae34 810@end deffn
e3ba263d 811
41e64dd7
AW
812@deffn {Scheme Procedure} objcode->bytecode objcode
813@deffnx {C Function} scm_objcode_to_bytecode (objcode)
814Copy object code out to a bytevector for analysis by Scheme.
ff73ae34 815@end deffn
e3ba263d 816
73643339
AW
817The following procedure is actually in @code{(system vm program)}, but
818we'll mention it here:
819
98850fd7
AW
820@deffn {Scheme Procedure} make-program objcode objtable [free-vars=#f]
821@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
ff73ae34 822Load up object code into a Scheme program. The resulting program will
73643339 823have @var{objtable} as its object table, which should be a vector or
98850fd7 824@code{#f}, and will capture the free variables from @var{free-vars}.
ff73ae34 825@end deffn
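
As a sketch of how these procedures fit together, the following
session compiles an expression to bytecode, wraps it as objcode, and
loads it into a program. It assumes a Guile build in which
@code{(system vm objcode)} and @code{(system vm program)} export the
procedures documented above. It passes @code{#f} for both the object
table and the free variables, on the assumption that such a small
thunk needs neither.

@lisp
(use-modules (system base compile)
             (system vm objcode)
             (system vm program)
             (rnrs bytevectors))

;; Compile down to a bytevector, then wrap it as objcode.
(define bv (compile '(+ 32 10) #:to 'bytecode))
(define oc (bytecode->objcode bv))

;; Should hold for anything that bytecode->objcode returns.
(objcode? oc)

;; Copying the objcode back out should give an equal bytevector.
(bytevector=? (objcode->bytecode oc) bv)

;; Load the objcode into a thunk and call it.
(define thunk (make-program oc #f #f))
(thunk)
@end lisp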
c850030f 826
ff73ae34
AW
827Object code from a file may be disassembled at the REPL via the
828meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
829Programs may be disassembled via @code{,disassemble}, abbreviated as
830@code{,x}.
831
832Compiling object code to the fake language, @code{value}, is performed
833via loading objcode into a program, then executing that thunk with
834respect to the compilation environment. Normally the environment
835propagates through the compiler transparently, but users may specify
41e64dd7 836the compilation environment manually as well, as a module.
ff73ae34 837
c850030f 838
e63d888e
DK
839@node Writing New High-Level Languages
840@subsection Writing New High-Level Languages
841
842In order to integrate a new language @var{lang} into Guile's compiler
843system, one has to create the module @code{(language @var{lang} spec)}
844containing the language definition and referencing the parser,
845compiler and other routines processing it. The module hierarchy in
846@code{(language brainfuck)} defines a very basic Brainfuck
847implementation meant to serve as an easy-to-understand example of how to
4e432dab
AW
848do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
849for more information about the Brainfuck language itself.
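
For orientation, here is a minimal sketch of what such a spec module
might contain. The language name @code{passthrough}, its title, and
the do-nothing compiler are hypothetical, invented for illustration.
The sketch assumes that @code{define-language} from @code{(system base
language)} accepts the @code{#:title}, @code{#:reader},
@code{#:compilers} and @code{#:printer} keywords, and that a compiler
procedure takes an expression, an environment and an options list and
returns three values.

@lisp
(define-module (language passthrough spec)
  #:use-module (system base language)
  #:export (passthrough))

;; Hypothetical compiler: the source language is already Scheme, so
;; hand the expression on unchanged.  The three return values are the
;; compiled expression, the environment, and the continuation
;; environment.
(define (compile-scheme exp env opts)
  (values exp env env))

(define-language passthrough
  #:title "Passthrough (illustrative)"
  #:reader (lambda (port env) (read port))
  #:compilers `((scheme . ,compile-scheme))
  #:printer write)
@end lisp

Because the only compiler listed targets @code{scheme}, the rest of
the tower would be expected to take over from there. A real language
would instead supply its own reader and a compiler to Scheme or
Tree-IL.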
850
e63d888e 851
ff73ae34
AW
852@node Extending the Compiler
853@subsection Extending the Compiler
e3ba263d 854
8fa6525e
NJ
855At this point we take a detour from the impersonal tone of the rest of
856the manual. Admit it: if you've read this far into the compiler
857internals manual, you are a junkie. Perhaps a course at your university
858left you unsated, or perhaps you've always harbored a desire to hack the
859holy of computer science holies: a compiler. Well, you're in good
860company, and in a good position. Guile's compiler needs your help.
ff73ae34
AW
861
862There are many possible avenues for improving Guile's compiler.
863Probably the most important improvement, speed-wise, will be some form
864of native compilation, both just-in-time and ahead-of-time. This could
865be done in many ways. Probably the easiest strategy would be to extend
866the compiled procedure structure to include a pointer to a native code
86872cc3 867vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
868procedure is called a certain number of times.
869
870The name of the game is a profiling-based harvest of the low-hanging
871fruit, running programs of interest under a system-level profiler and
872determining which improvements would give the most bang for the buck.
98850fd7
AW
873It's really getting to the point, though, that native compilation is the
874next step.
ff73ae34
AW
875
876The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
877it knows to also understand R6RS, and adding new high-level compilers.
878We have JavaScript and Emacs Lisp mostly complete, but they could use
ecb87335 879some love; Lua would be nice as well, but whatever language it is
98850fd7
AW
880that strikes your fancy would be welcome too.
881
882Compilers are for hacking, not for admiring or for complaining about.
883Get to it!