@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2008, 2009, 2010
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein's monster. However, this magic is perceived
by many to be impenetrable.

This section aims to pay attention to the small man behind the
curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your @code{.scm} file.

@menu
* Compiler Tower::
* The Scheme Compiler::
* Tree-IL::
* GLIL::
* Assembly::
* Bytecode and Objcode::
* Writing New High-Level Languages::
* Extending the Compiler::
@end menu

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module, @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title reader printer @
[parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
[joiner=#f] [make-default-environment=make-fresh-user-module]
Define a language.

This syntax defines a @code{#<language>} object, bound to @var{name}
in the current environment. In addition, the language will be added to
the global language set. For example, this is the language definition
for Scheme:

@example
(define-language scheme
  #:title "Scheme"
  #:reader (lambda (port env) ...)
  #:compilers `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator (lambda (x module) (primitive-eval x))
  #:printer write
  #:make-default-environment (lambda () ...))
@end example
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> ,L scheme
Happy hacking with Scheme! To switch back, type `,L tree-il'.
scheme@@(guile-user)>
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

The language object will be returned, or @code{#f} if there does not
exist a language with that name.
@end deffn
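
As a minimal sketch of such a lookup, the following assumes that
@code{(system base language)} also exports the accessors
@code{language-name} and @code{language-title}. These accessors are
not documented in this section, so treat them as an assumption about
the module rather than as part of the interface described here.

@lisp
(use-modules (system base language))

;; Look up (and, if necessary, autoload) the Scheme language object.
(define scheme-language (lookup-language 'scheme))

;; Assumed accessors for the fields passed to define-language.
(language-name scheme-language)   ; @result{} scheme
(language-title scheme-language)  ; @result{} "Scheme"
@end lisp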

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and returns the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
@end deffn
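
The following sketch asks for a path from Scheme down to
@code{value}. It assumes that languages may be named by symbol, as
they are in the @code{compile} examples, and that a successful result
is a list. The shape of the individual elements of the path is not
described in this section, so the example only counts them.

@lisp
(use-modules (system base language))

;; Assumed to be either a list describing the compilation steps, or #f.
(define path (lookup-compilation-order 'scheme 'value))

(if path
    (format #t "~a step(s) from scheme to value~%" (length path))
    (format #t "no path from scheme to value~%"))
@end lisp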

There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The run-time compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme
@item Tree Intermediate Language (Tree-IL)
@item Guile Lowlevel Intermediate Language (GLIL)
@item Assembly
@item Bytecode
@item Objcode
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at run
time, you want a Scheme value: for example, a compiled procedure. For
this reason, so as not to break the abstraction, Guile defines a fake
language at the bottom of the tower:

@itemize
@item Value
@end itemize

Compiling to @code{value} loads the object code into a procedure, and
wakes the sleeping giant.

Perhaps this strangeness can be explained by example:
@code{compile-file} defaults to compiling to object code, because it
produces object code that has to live in the barren world outside the
Guile runtime; but @code{compile} defaults to compiling to
@code{value}, as its product re-enters the Guile world.

Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
@end example

@node The Scheme Compiler
@subsection The Scheme Compiler

The job of the Scheme compiler is to expand all macros and all of
Scheme to its most primitive expressions. The definition of
``primitive'' is given by the inventory of constructs provided by
Tree-IL, the target language of the Scheme compiler: procedure
applications, conditionals, lexical references, etc. This is described
more fully in the next section.

The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
that it is completely implemented by the macro expander. Since the
macro expander has to run over all of the source code already in order
to expand macros, it might as well do the analysis at the same time,
producing Tree-IL expressions directly.

Because this compiler is actually the macro expander, it is
extensible. Any macro which the user writes becomes part of the
compiler.

The Scheme-to-Tree-IL expander may be invoked using the generic
@code{compile} procedure:

@lisp
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
@result{}
 #<<application> src: #f
                 proc: #<<toplevel-ref> src: #f name: +>
                 args: (#<<const> src: #f exp: 1>
                        #<<const> src: #f exp: 2>)>
@end lisp

Or, since Tree-IL is so close to Scheme, it is often useful to expand
Scheme to Tree-IL, then translate back to Scheme. For that reason the
expander provides two interfaces. The former is equivalent to calling
@code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
``compile''. With @code{'e} (the default), the result is translated
back to Scheme:

@lisp
(macroexpand '(+ 1 2))
@result{} (+ 1 2)
(macroexpand '(let ((x 10)) (* x x)))
@result{} (let ((x84 10)) (* x84 x84))
@end lisp

The second example shows that as part of its job, the macro expander
renames lexically-bound variables. The original names are preserved
when compiling to Tree-IL, but can't be represented in Scheme: a
lexical binding only has one name. It is for this reason that the
@emph{native} output of the expander is @emph{not} Scheme. There's too
much information we would lose if we translated to Scheme directly:
lexical variable names, source locations, and module hygiene.

Note however that @code{macroexpand} does not have the same signature
as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
around @code{macroexpand}, to make it conform to the general form of
compiler procedures in Guile's language tower.

Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options. They return three values:
the compiled expression, the corresponding environment for the target
language, and a ``continuation environment''. The compiled expression
and environment will serve as input to the next language's compiler.
The ``continuation environment'' can be used to compile another
expression from the same source language within the same module.

For example, you might compile the expression, @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But
if you compiled a second expression, you would want to take into
account the compile-time effect of compiling the previous expression,
which puts the user in the @code{(foo)} module. That is the purpose of
the ``continuation environment''; you would pass it as the environment
when compiling the subsequent expression.
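
To make the three-argument, three-value convention concrete, here is a
minimal sketch of a compiler procedure. The name
@code{compile-identity} is hypothetical and is not part of Guile; the
procedure performs no translation at all and merely shows the shape of
the interface.

@lisp
;; Hypothetical compiler procedure: it leaves the expression untouched.
(define (compile-identity exp env opts)
  (values exp    ; the "compiled" expression
          env    ; the environment for the target language
          env))  ; the continuation environment

;; Receive the three values and inspect them.
(call-with-values
    (lambda () (compile-identity '(+ 1 2) (current-module) '()))
  (lambda (compiled target-env continuation-env)
    (list compiled (eq? target-env continuation-env))))
;; @result{} ((+ 1 2) #t)
@end lisp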

For Scheme, an environment is a module. By default, the @code{compile}
and @code{compile-file} procedures compile in a fresh module, such
that bindings and macros introduced by the expression being compiled
are isolated:

@example
(eq? (current-module) (compile '(current-module)))
@result{} #f

(compile '(define hello 'world))
(defined? 'hello)
@result{} #f

(define / *)
(eq? (compile '/) /)
@result{} #f
@end example

Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
@code{current-reader}}) are isolated:

@example
(compile '(fluid-set! current-reader (lambda args 'fail)))
(fluid-ref current-reader)
@result{} #f
@end example

Nevertheless, having the compiler and @dfn{compilee} share the same name
space can be achieved by explicitly passing @code{(current-module)} as
the compilation environment:

@example
(define hello 'world)
(compile 'hello #:env (current-module))
@result{} world
@end example

@node Tree-IL
@subsection Tree-IL

Tree Intermediate Language (Tree-IL) is a structured intermediate
language that is close in expressive power to Scheme. It is an
expanded, pre-analyzed Scheme.

Tree-IL is ``structured'' in the sense that its representation is
based on records, not S-expressions. This gives a rigidity to the
language that ensures that compiling to a lower-level language only
requires a limited set of transformations. For example, the Tree-IL
type @code{<const>} is a record type with two fields, @code{src} and
@code{exp}. Instances of this type are created via @code{make-const}.
Fields of this type are accessed via the @code{const-src} and
@code{const-exp} procedures. There is also a predicate, @code{const?}.
@xref{Records}, for more information on records.
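
A short sketch of the @code{<const>} record interface named above,
assuming these procedures are exported by the @code{(language
tree-il)} module and that @code{make-const} takes its fields in the
order @code{src}, @code{exp}:

@lisp
(use-modules (language tree-il))

;; A constant with no source location information.
(define c (make-const #f 3))

(const? c)     ; @result{} #t
(const-src c)  ; @result{} #f
(const-exp c)  ; @result{} 3
@end lisp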

@c alpha renaming

All Tree-IL types have a @code{src} slot, which holds source location
information for the expression. This information, if present, will be
residualized into the compiled object code, allowing backtraces to
show source information. The format of @code{src} is the same as that
returned by Guile's @code{source-properties} function. @xref{Source
Properties}, for more information.

Although Tree-IL objects are represented internally using records,
there is also an equivalent S-expression external representation for
each kind of Tree-IL. For example, the S-expression representation
of the @code{#<const src: #f exp: 3>} expression would be:

@example
(const 3)
@end example

Users may program with this format directly at the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
@result{} 42
@end example

The @code{src} fields are left out of the external representation.

One may create Tree-IL objects from their external representations by
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.
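
As a sketch of that workflow, the following parses an external
representation and hands the resulting record to @code{compile}. It
assumes that @code{parse-tree-il} and the predicate
@code{conditional?} are exported by @code{(language tree-il)}, and
that @code{compile} comes from @code{(system base compile)}; the
predicate in particular is not documented in this section.

@lisp
(use-modules (language tree-il)
             (system base compile))

;; Build a Tree-IL record from its S-expression form.
(define exp
  (parse-tree-il '(if (const #f) (const 1) (const 2))))

(conditional? exp)                         ; @result{} #t
(compile exp #:from 'tree-il #:to 'value)  ; @result{} 2
@end lisp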

@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f
#f)}.
@end deftp
@deftp {Scheme Variable} <const> src exp
@deftpx {External Representation} (const @var{exp})
A constant.
@end deftp
@deftp {Scheme Variable} <primitive-ref> src name
@deftpx {External Representation} (primitive @var{name})
A reference to a ``primitive''. A primitive is a procedure that, when
compiled, may be open-coded. For example, @code{cons} is usually
recognized as a primitive, so that it compiles down to a single
instruction.

Compilation of Tree-IL usually begins with a pass that resolves some
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
@code{<primitive-ref>} expressions. The actual compilation pass
has special cases for applications of certain primitives, like
@code{apply} or @code{cons}.
@end deftp
@deftp {Scheme Variable} <lexical-ref> src name gensym
@deftpx {External Representation} (lexical @var{name} @var{gensym})
A reference to a lexically-bound variable. The @var{name} is the
original name of the variable in the source program. @var{gensym} is a
unique identifier for this variable.
@end deftp
@deftp {Scheme Variable} <lexical-set> src name gensym exp
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
Sets a lexically-bound variable.
@end deftp
@deftp {Scheme Variable} <module-ref> src mod name public?
@deftpx {External Representation} (@@ @var{mod} @var{name})
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
A reference to a variable in a specific module. @var{mod} should be
the name of the module, e.g. @code{(guile-user)}.

If @var{public?} is true, the variable named @var{name} will be looked
up in @var{mod}'s public interface, and serialized with @code{@@};
otherwise it will be looked up among the module's private bindings,
and is serialized with @code{@@@@}.
@end deftp
@deftp {Scheme Variable} <module-set> src mod name public? exp
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
Sets a variable in a specific module.
@end deftp
@deftp {Scheme Variable} <toplevel-ref> src name
@deftpx {External Representation} (toplevel @var{name})
References a variable from the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-set> src name exp
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
Sets a variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-define> src name exp
@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
Defines a new top-level variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <conditional> src test then else
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
A conditional. Note that @var{else} is not optional.
@end deftp
@deftp {Scheme Variable} <application> src proc args
@deftpx {External Representation} (apply @var{proc} . @var{args})
A procedure call.
@end deftp
@deftp {Scheme Variable} <sequence> src exps
@deftpx {External Representation} (begin . @var{exps})
Like Scheme's @code{begin}.
@end deftp
@deftp {Scheme Variable} <lambda> src meta body
@deftpx {External Representation} (lambda @var{meta} @var{body})
A closure. @var{meta} is an association list of properties for the
procedure. @var{body} is a single Tree-IL expression of type
@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
an alternate clause, this makes Tree-IL's @code{<lambda>} have the
expressiveness of Scheme's @code{case-lambda}.
@end deftp
@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
@deftpx {External Representation} @
  (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
                @var{body})@
               [@var{alternate}])
One clause of a @code{case-lambda}. A @code{lambda} expression in
Scheme is treated as a @code{case-lambda} with one clause.

@var{req} is a list of the procedure's required arguments, as symbols.
@var{opt} is a list of the optional arguments, or @code{#f} if there
are no optional arguments. @var{rest} is the name of the rest
argument, or @code{#f}.

@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
keyword corresponding to the argument named @var{name}, and whose
corresponding gensym is @var{var}. @var{inits} are Tree-IL expressions
corresponding to all of the optional and keyword arguments, evaluated
to bind variables whose value is not supplied by the procedure caller.
Each @var{init} expression is evaluated in the lexical context of
previously bound variables, from left to right.

@var{gensyms} is a list of gensyms corresponding to all arguments:
first all of the required arguments, then the optional arguments if
any, then the rest argument if any, then all of the keyword arguments.

@var{body} is the body of the clause. If the procedure is called with
an appropriate number of arguments, @var{body} is evaluated in tail
position. Otherwise, if there is an @var{alternate}, it should be a
@code{<lambda-case>} expression, representing the next clause to try.
If there is no @var{alternate}, a wrong-number-of-arguments error is
signaled.
@end deftp
@deftp {Scheme Variable} <let> src names gensyms vals exp
@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
Lexical binding, like Scheme's @code{let}. @var{names} are the
original binding names, @var{gensyms} are gensyms corresponding to the
@var{names}, and @var{vals} are Tree-IL expressions for the values.
@var{exp} is a single Tree-IL expression.
@end deftp
@deftp {Scheme Variable} <letrec> src names gensyms vals exp
@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}.
@end deftp
@deftp {Scheme Variable} <dynlet> fluids vals body
@deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
@var{fluids} should be a list of Tree-IL expressions that will
evaluate to fluids, and @var{vals} a corresponding list of expressions
to bind to the fluids during the dynamic extent of the evaluation of
@var{body}.
@end deftp
@deftp {Scheme Variable} <dynref> fluid
@deftpx {External Representation} (dynref @var{fluid})
A dynamic variable reference. @var{fluid} should be a Tree-IL
expression evaluating to a fluid.
@end deftp
@deftp {Scheme Variable} <dynset> fluid exp
@deftpx {External Representation} (dynset @var{fluid} @var{exp})
A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
to a fluid, will be set to the result of evaluating @var{exp}.
@end deftp
@deftp {Scheme Variable} <dynwind> winder body unwinder
@deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
evaluate to thunks. Ensure that the winder and the unwinder are called
before entering and after leaving @var{body}. Note that @var{body} is
an expression, without a thunk wrapper.
@end deftp
@deftp {Scheme Variable} <prompt> tag body handler
@deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
A dynamic prompt. Instates a prompt named @var{tag}, an expression,
during the dynamic extent of the execution of @var{body}, also an
expression. If an abort occurs to this prompt, control will be passed
to @var{handler}, a @code{<lambda-case>} expression with no optional
or keyword arguments, and no alternate. The first argument to the
@code{<lambda-case>} will be the captured continuation, and then all
of the values passed to the abort. @xref{Prompts}, for more
information.
@end deftp
@deftp {Scheme Variable} <abort> tag args tail
@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
An abort to the nearest prompt with the name @var{tag}, an expression.
@var{args} should be a list of expressions to pass to the prompt's
handler, and @var{tail} should be an expression that will evaluate to
a list of additional arguments. An abort will save the partial
continuation, which may later be reinstated, resulting in the
@code{<abort>} expression evaluating to some number of values.
@end deftp

There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does. Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will
generate them as necessary.

@deftp {Scheme Variable} <let-values> src names gensyms exp body
@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @var{exp} to the @code{lambda}-like bindings described by
@var{gensyms}. That is to say, @var{gensyms} may be an improper list.

@code{<let-values>} is an optimization of @code{<application>} of the
primitive, @code{call-with-values}.
@end deftp
@deftp {Scheme Variable} <fix> src names gensyms vals body
@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.

@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp

Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list.
The compiler also keeps some state as to whether the current
expression is in tail context, and whether its value will be used in
future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g. docstrings), and
to perform tail calls when in tail position.

Most optimization, such as it currently is, is performed on Tree-IL
expressions as source-to-source transformations. There will be more
optimizations added in the future.

Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.

@node GLIL
@subsection GLIL

Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM
instruction set. Its expression types are defined in @code{(language
glil)}.

@deftp {Scheme Variable} <glil-program> meta . body
A unit of code that at run-time will correspond to a compiled
procedure. @var{meta} should be an alist of properties, as in
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
expressions.
@end deftp
@deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
A prologue for a function with no optional, keyword, or rest
arguments. @var{nreq} is the number of required arguments. @var{nlocs}
is the total number of local variables, including the arguments. If the
procedure was not given exactly @var{nreq} arguments, control will
jump to @var{else-label}, if given, or otherwise signal an error.
@end deftp
@deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
A prologue for a function with optional or rest arguments. Like
@code{<glil-std-prelude>}, with the addition that @var{nopt} is the
number of optional arguments (possibly zero) and @var{rest} is an
index of a local variable at which to bind a rest argument, or
@code{#f} if there is no rest argument.
@end deftp
@deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
A prologue for a function with keyword arguments. Like
@code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
of keyword arguments, and @var{allow-other-keys?} is a flag indicating
whether to allow unknown keys. @xref{Function Prologue Instructions,
@code{bind-kwargs}}, for details on the format of @var{kw}.
@end deftp
@deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of
variables. @var{vars} is a list of @code{(@var{name} @var{type}
@var{index})}, where @var{type} should be either @code{argument},
@code{local}, or @code{external}.

@code{<glil-bind>} expressions end up being serialized as part of a
program's metadata and do not form part of a program's code path.
@end deftp
@deftp {Scheme Variable} <glil-mv-bind> vars rest
A multiple-value binding of the values on the stack to @var{vars}. Iff
@var{rest} is true, the last element of @var{vars} will be treated as
a rest argument.

In addition to pushing a binding annotation on the stack, like
@code{<glil-bind>}, an expression is emitted at compilation time to
make sure that there are enough values available to bind. See the
notes on @code{truncate-values} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-unbind>
Closes the liveness extent of the most recently encountered
@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
expressions are compiled, a parallel stack of live bindings is
maintained; this expression pops off the top element from that stack.

Bindings are written into the program's metadata so that debuggers and
other tools can determine the set of live local variables at a given
offset within a VM program.
@end deftp
@deftp {Scheme Variable} <glil-source> loc
Records source information for the preceding expression. @var{loc}
should be an association list containing @code{line}, @code{column},
and @code{filename} keys, e.g. as returned by
@code{source-properties}.
@end deftp
@deftp {Scheme Variable} <glil-void>
Pushes ``the unspecified value'' on the stack.
@end deftp
@deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, uniform array, the empty
list, or a pair or vector of constants.
@end deftp
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable. If the variable is not
@var{local?} it is free. All variables may have @code{ref},
@code{set}, and @code{bound?} as their @var{op}. Boxed variables may
also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
which correspond in semantics to the VM instructions @code{box},
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
more information.
@end deftp
@deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
or @code{define}.
@end deftp
@deftp {Scheme Variable} <glil-module> op mod name public?
Accesses a variable within a specific module. See Tree-IL's
@code{<module-ref>}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-label> label
Creates a new label. @var{label} can be any Scheme value, and should
be unique.
@end deftp
@deftp {Scheme Variable} <glil-branch> inst label
Branch to a label. @var{label} should be a @code{<glil-label>}.
@var{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
@end deftp
@deftp {Scheme Variable} <glil-call> inst nargs
This expression is probably misnamed, as it does not correspond to
function calls. @code{<glil-call>} invokes the VM instruction named
@var{inst}, noting that it is called with @var{nargs} stack arguments.
The arguments should be pushed on the stack already. What happens to
the stack afterwards depends on the instruction.
@end deftp
@deftp {Scheme Variable} <glil-mv-call> nargs ra
Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
corresponding to the multiple-value return address for the call. See
the notes on @code{mv-call} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-prompt> label escape-only?
Push a dynamic prompt onto the stack, with a handler at @var{label}.
@var{escape-only?} is a flag that is propagated to the prompt,
allowing an abort to avoid capturing a continuation in some cases.
@xref{Prompts}, for more information.
@end deftp
c850030f 657
ff73ae34 658Users may enter in GLIL at the REPL as well, though there is a bit
41e64dd7 659more bookkeeping to do:
00ce5125 660
ff73ae34
AW
661@example
662scheme@@(guile-user)> ,language glil
41e64dd7
AW
663Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
664To switch back, type `,L scheme'.
665glil@@(guile-user)> (program () (std-prelude 0 0 #f)
666 (const 3) (call return 1))
ff73ae34
AW
667@result{} 3
668@end example
00ce5125 669
ff73ae34
AW
670Just as in all of Guile's compilers, an environment is passed to the
671GLIL-to-object code compiler, and one is returned as well, along with
672the object code.
00ce5125 673
81fd3152
AW
674@node Assembly
675@subsection Assembly
676
73643339
AW
677Assembly is an S-expression-based, human-readable representation of
678the actual bytecodes that will be emitted for the VM. As such, it is a
679useful intermediate language both for compilation and for
680decompilation.
81fd3152 681
73643339
AW
682Besides the fact that it is not a record-based language, assembly
683differs from GLIL in four main ways:
00ce5125 684
73643339
AW
685@itemize
686@item Labels have been resolved to byte offsets in the program.
687@item Constants inside procedures have either been expressed as inline
98850fd7 688instructions or cached in object arrays.
73643339
AW
689@item Procedures with metadata (source location information, liveness
690extents, procedure names, generic properties, etc.) have had their
691metadata serialized out to thunks.
692@item All expressions correspond directly to VM instructions -- i.e.,
98850fd7 693there is no @code{<glil-lexical>} which can be a ref or a set.
73643339
AW
694@end itemize
695
696Assembly is isomorphic to the bytecode that it compiles to. You can
697compile to bytecode, then decompile back to assembly, and you have the
698same assembly code.
699
700The general form of assembly instructions is the following:
701
702@lisp
703(@var{inst} @var{arg} ...)
704@end lisp
705
706The @var{inst} names a VM instruction, and its @var{arg}s will be
707embedded in the instruction stream. The easiest way to see assembly is
708to play around with it at the REPL, as can be seen in this annotated
709example:
710
711@example
0a715b9a 712scheme@@(guile-user)> (pp (compile '(+ 32 10) #:to 'assembly))
41e64dd7 713(load-program
0a715b9a
AW
714 ((:LCASE16 . 2)) ; Labels, unused in this case.
715 8 ; Length of the thunk that was compiled.
41e64dd7 716 (load-program ; Metadata thunk.
73643339 717 ()
41e64dd7
AW
718 17
719 #f ; No metadata thunk for the metadata thunk.
720 (make-eol)
721 (make-eol)
0a715b9a
AW
722 (make-int8 2) ; Liveness extents, source info, and arities,
723 (make-int8 8) ; in a format that Guile knows how to parse.
724 (make-int8:0)
41e64dd7
AW
725 (list 0 3)
726 (list 0 1)
727 (list 0 3)
728 (return))
0a715b9a 729 (assert-nargs-ee/locals 0) ; Prologue.
41e64dd7
AW
730 (make-int8 32) ; Actual code starts here.
731 (make-int8 10)
732 (add)
0a715b9a 733 (return))
73643339
AW
734@end example
735
736Of course, you can switch the REPL to assembly and enter assembly
737S-expressions directly, like with other languages, though it is more
738difficult, given that the length fields have to be correct.
739
740@node Bytecode and Objcode
741@subsection Bytecode and Objcode
742
743Finally, the raw bytes. There are actually two different ``languages''
744here, corresponding to two different ways to represent the bytes.
745
746``Bytecode'' represents code as uniform byte vectors, useful for
747structuring and destructuring code on the Scheme level. Bytecode is
748the next step down from assembly:
749
750@example
73643339 751scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
0a715b9a
AW
752@result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
753 95 0 ; Prologue.
754 10 32 10 10 148 66 ; Actual code.
755 17 0 0 0 0 0 0 0 ; Metadata thunk.
756 9 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
73643339
AW
757@end example
758
759``Objcode'' is bytecode, but mapped directly to a C structure,
760@code{struct scm_objcode}:
761
762@example
763struct scm_objcode @{
73643339
AW
764 scm_t_uint32 len;
765 scm_t_uint32 metalen;
766 scm_t_uint8 base[0];
767@};
768@end example
769
770As one might imagine, objcode imposes a minimum length on the
41e64dd7
AW
771bytecode. Also, the @code{len} and @code{metalen} fields are in native
772endianness, which makes objcode (and bytecode) system-dependent.
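
As a rough illustration of this layout, the two header fields can be
read back out of the bytevector printed in the bytecode example. This
is only a sketch: @code{objcode-header} is a hypothetical helper, not
part of Guile, and it assumes the bytes were produced on a
little-endian machine. For bytecode produced on the running machine,
@code{(native-endianness)} would be the appropriate choice.

@lisp
(use-modules (rnrs bytevectors))

;; The bytes printed by the bytecode example.
(define bv
  #vu8(8 0 0 0 25 0 0 0
       95 0 10 32 10 10 148 66
       17 0 0 0 0 0 0 0
       9 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66))

;; Hypothetical helper: return (len metalen), the two 32-bit
;; fields at the start of the bytecode.  Little-endian byte
;; order is an assumption about the machine that printed it.
(define (objcode-header bv)
  (list (bytevector-u32-ref bv 0 (endianness little))
        (bytevector-u32-ref bv 4 (endianness little))))

(objcode-header bv)
@result{} (8 25)

;; Eight header bytes, then len bytes of code, then metalen
;; bytes of metadata, account for the whole bytevector.
(= (bytevector-length bv) (+ 8 8 25))
@result{} #t
@end lisp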
73643339
AW
773
774Objcode also has a couple of important efficiency hacks. First,
775objcode may be mapped directly from disk, allowing compiled code to be
776loaded quickly, often from the system's disk cache, and shared among
777multiple processes. Secondly, objcode may be embedded in other
778objcode, allowing procedures to have the text of other procedures
779inlined into their bodies, without the need for separate allocation of
780the code. Of course, the objcode object itself does need to be
781allocated.
782
783Procedures related to objcode are defined in the @code{(system vm
784objcode)} module.
00ce5125 785
ff73ae34
AW
786@deffn {Scheme Procedure} objcode? obj
787@deffnx {C Function} scm_objcode_p (obj)
788Returns @code{#t} iff @var{obj} is object code, @code{#f} otherwise.
789@end deffn
00ce5125 790
73643339 791@deffn {Scheme Procedure} bytecode->objcode bytecode
42a438e8 792@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
ff73ae34 793Makes object code from @var{bytecode}, which should be a
41e64dd7 794bytevector. @xref{Bytevectors}.
ff73ae34 795@end deffn
e3ba263d 796
ff73ae34
AW
797@deffn {Scheme Procedure} load-objcode file
798@deffnx {C Function} scm_load_objcode (file)
799Load object code from a file named @var{file}. The file will be mapped
800into memory via @code{mmap}, so this is a very fast operation.
e3ba263d 801
98850fd7 802On disk, object code has a sixteen-byte cookie prepended to it, to
73643339
AW
803prevent accidental loading of arbitrary garbage.
804@end deffn
805
806@deffn {Scheme Procedure} write-objcode objcode file
807@deffnx {C Function} scm_write_objcode (objcode, file)
41e64dd7 808Write object code out to a file, prepending the sixteen-byte cookie.
ff73ae34 809@end deffn
e3ba263d 810
41e64dd7
AW
811@deffn {Scheme Procedure} objcode->bytecode objcode
812@deffnx {C Function} scm_objcode_to_bytecode (objcode)
813Copy object code out to a bytevector for analysis by Scheme.
ff73ae34 814@end deffn
e3ba263d 815
73643339
AW
816The following procedure is actually in @code{(system vm program)}, but
817we'll mention it here:
818
98850fd7
AW
819@deffn {Scheme Procedure} make-program objcode objtable [free-vars=#f]
820@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
ff73ae34 821Load up object code into a Scheme program. The resulting program will
73643339 822have @var{objtable} as its object table, which should be a vector or
98850fd7 823@code{#f}, and will capture the free variables from @var{free-vars}.
ff73ae34 824@end deffn
c850030f 825
ff73ae34
AW
826Object code from a file may be disassembled at the REPL via the
827meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
828Programs may be disassembled via @code{,disassemble}, abbreviated as
829@code{,x}.
830
831Compiling object code to the fake language, @code{value}, is performed
832via loading objcode into a program, then executing that thunk with
833respect to the compilation environment. Normally the environment
834propagates through the compiler transparently, but users may specify
41e64dd7 835the compilation environment manually as well, as a module.
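
For instance, the following sketch compiles an expression all the way
down to @code{value}, naming the compilation environment explicitly.
Using the current module as the environment is just a choice made for
this illustration.

@lisp
(use-modules (system base compile))

;; Compile and run the expression, passing the current module
;; as the compilation environment.
(compile '(+ 32 10) #:to 'value #:env (current-module))
@result{} 42
@end lisp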
ff73ae34 836
c850030f 837
e63d888e
DK
838@node Writing New High-Level Languages
839@subsection Writing New High-Level Languages
840
841In order to integrate a new language @var{lang} into Guile's compiler
842system, one has to create the module @code{(language @var{lang} spec)}
843containing the language definition and referencing the parser,
844compiler and other routines processing it. The module hierarchy in
845@code{(language brainfuck)} defines a very basic Brainfuck
846implementation meant to serve as an easy-to-understand example of how to
4e432dab
AW
847do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
848for more information about the Brainfuck language itself.
849
e63d888e 850
ff73ae34
AW
851@node Extending the Compiler
852@subsection Extending the Compiler
e3ba263d 853
ff73ae34
AW
854At this point, we break with the impersonal tone of the rest of the
855manual, and make an intervention. Admit it: if you've read this far
856into the compiler internals manual, you are a junkie. Perhaps a course
857at your university left you unsated, or perhaps you've always harbored
858a sublimated desire to hack the holy of computer science holies: a
859compiler. Well, you're in good company, and in a good position. Guile's
860compiler needs your help.
861
862There are many possible avenues for improving Guile's compiler.
863Probably the most important improvement, speed-wise, will be some form
864of native compilation, both just-in-time and ahead-of-time. This could
865be done in many ways. Probably the easiest strategy would be to extend
866the compiled procedure structure to include a pointer to a native code
86872cc3 867vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
868procedure is called a certain number of times.
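
The call-counting idea can be sketched in plain Scheme. Nothing here
exists in Guile: @code{compile-native} is a hypothetical stand-in for
a bytecode-to-native compiler (it returns its argument unchanged), and
@code{make-tiered} is an illustrative wrapper, not a proposal for the
actual procedure structure.

@lisp
;; Hypothetical stand-in for a native compiler.  A real one
;; would return a faster, equivalent procedure.
(define (compile-native proc) proc)

;; Wrap PROC so that, once it has been called THRESHOLD times,
;; later calls go through the "natively compiled" version.
(define (make-tiered proc threshold)
  (let ((calls 0)
        (impl proc))
    (lambda args
      (set! calls (+ calls 1))
      (when (= calls threshold)
        (set! impl (compile-native proc)))
      (apply impl args))))

(define tiered-add (make-tiered + 3))
(tiered-add 1 2)
@result{} 3
@end lisp

In a real implementation the counter and the native code pointer
would presumably live in the compiled procedure structure itself, as
described above, rather than in a closure.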
869
870The name of the game is a profiling-based harvest of the low-hanging
871fruit, running programs of interest under a system-level profiler and
872determining which improvements would give the most bang for the buck.
98850fd7
AW
873It's really getting to the point though that native compilation is the
874next step.
ff73ae34
AW
875
876The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
877it knows to also understand R6RS, and adding new high-level compilers.
878We have JavaScript and Emacs Lisp mostly complete, but they could use
879some love; Lua would be nice as well, but whatever language it is
880that strikes your fancy would be welcome too.
881
882Compilers are for hacking, not for admiring or for complaining about.
883Get to it!