Make snarfing tools more robust to varied C preprocessor behavior.
[bpt/guile.git] / doc / ref / compiler.texi
CommitLineData
8680d53b
AW
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
5745de91 3@c Copyright (C) 2008, 2009, 2010, 2013
8680d53b
AW
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@node Compiling to the Virtual Machine
8@section Compiling to the Virtual Machine
9
00ce5125
AW
10Compilers have a mystique about them that is attractive and
11off-putting at the same time. They are attractive because they are
12magical -- they transform inert text into live results, like throwing
e33e3aee
AW
13the switch on Frankenstein's monster. However, this magic is perceived
14by many to be impenetrable.
00ce5125 15
0b8f3ac5
AW
16This section aims to pay attention to the small man behind the
17curtain.
00ce5125 18
e3ba263d 19@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
98850fd7 20know how to compile your @code{.scm} file.
00ce5125
AW
21
22@menu
23* Compiler Tower::
24* The Scheme Compiler::
81fd3152 25* Tree-IL::
00ce5125 26* GLIL::
81fd3152 27* Assembly::
73643339 28* Bytecode and Objcode::
e63d888e 29* Writing New High-Level Languages::
e3ba263d 30* Extending the Compiler::
00ce5125
AW
31@end menu
32
33@node Compiler Tower
34@subsection Compiler Tower
35
36Guile's compiler is quite simple, actually -- its @emph{compilers}, to
37put it more accurately. Guile defines a tower of languages, starting
38at Scheme and progressively simplifying down to languages that
e3ba263d 39resemble the VM instruction set (@pxref{Instruction Set}).
00ce5125
AW
40
41Each language knows how to compile to the next, so each step is simple
42and understandable. Furthermore, this set of languages is not
43hardcoded into Guile, so it is possible for the user to add new
44high-level languages, new passes, or even different compilation
45targets.
46
e3ba263d
AW
47Languages are registered in the module, @code{(system base language)}:
48
49@example
50(use-modules (system base language))
51@end example
52
53They are registered with the @code{define-language} form.
54
55@deffn {Scheme Syntax} define-language @
994d87be
BT
56 [#:name] [#:title] [#:reader] [#:printer] @
57 [#:parser=#f] [#:compilers='()] @
58 [#:decompilers='()] [#:evaluator=#f] @
59 [#:joiner=#f] [#:for-humans?=#t] @
60 [#:make-default-environment=make-fresh-user-module]
e3ba263d
AW
61Define a language.
62
63This syntax defines a @code{#<language>} object, bound to @var{name}
64in the current environment. In addition, the language will be added to
65the global language set. For example, this is the language definition
66for Scheme:
67
68@example
69(define-language scheme
41e64dd7
AW
70 #:title "Scheme"
71 #:reader (lambda (port env) ...)
98850fd7 72 #:compilers `((tree-il . ,compile-tree-il))
81fd3152 73 #:decompilers `((tree-il . ,decompile-tree-il))
41e64dd7
AW
74 #:evaluator (lambda (x module) (primitive-eval x))
75 #:printer write
76 #:make-default-environment (lambda () ...))
e3ba263d 77@end example
e3ba263d
AW
78@end deffn
79
80The interesting thing about having languages defined this way is that
81they present a uniform interface to the read-eval-print loop. This
82allows the user to change the current language of the REPL:
83
84@example
81fd3152 85scheme@@(guile-user)> ,language tree-il
41e64dd7
AW
86Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
87tree-il@@(guile-user)> ,L scheme
88Happy hacking with Scheme! To switch back, type `,L tree-il'.
89scheme@@(guile-user)>
e3ba263d
AW
90@end example
91
92Languages can be looked up by name, as they were above.
93
94@deffn {Scheme Procedure} lookup-language name
95Looks up a language named @var{name}, autoloading it if necessary.
96
97Languages are autoloaded by looking for a variable named @var{name} in
98a module named @code{(language @var{name} spec)}.
99
100The language object will be returned, or @code{#f} if there does not
101exist a language with that name.
102@end deffn
103
104Defining languages this way allows us to programmatically determine
105the necessary steps for compiling code from one language to another.
106
107@deffn {Scheme Procedure} lookup-compilation-order from to
108Recursively traverses the set of languages to which @var{from} can
109compile, depth-first, and return the first path that can transform
110@var{from} to @var{to}. Returns @code{#f} if no path is found.
111
112This function memoizes its results in a cache that is invalidated by
113subsequent calls to @code{define-language}, so it should be quite
114fast.
115@end deffn
116
5745de91
AW
117There is a notion of a ``current language'', which is maintained in the
118@code{current-language} parameter, defined in the core @code{(guile)}
119module. This language is normally Scheme, and may be rebound by the
120user. The run-time compilation interfaces
e3ba263d
AW
121(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
122and target languages.
123
124The normal tower of languages when compiling Scheme goes like this:
125
126@itemize
41e64dd7 127@item Scheme
81fd3152 128@item Tree Intermediate Language (Tree-IL)
41e64dd7 129@item Guile Lowlevel Intermediate Language (GLIL)
81fd3152
AW
130@item Assembly
131@item Bytecode
73643339 132@item Objcode
e3ba263d
AW
133@end itemize
134
135Object code may be serialized to disk directly, though it has a cookie
73643339
AW
136and version prepended to the front. But when compiling Scheme at run
137time, you want a Scheme value: for example, a compiled procedure. For
138this reason, so as not to break the abstraction, Guile defines a fake
81fd3152
AW
139language at the bottom of the tower:
140
141@itemize
142@item Value
143@end itemize
144
145Compiling to @code{value} loads the object code into a procedure, and
146wakes the sleeping giant.
e3ba263d
AW
147
148Perhaps this strangeness can be explained by example:
149@code{compile-file} defaults to compiling to object code, because it
150produces object code that has to live in the barren world outside the
151Guile runtime; but @code{compile} defaults to compiling to
152@code{value}, as its product re-enters the Guile world.
153
154Indeed, the process of compilation can circulate through these
155different worlds indefinitely, as shown by the following quine:
156
157@example
00ce5125 158((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
e3ba263d 159@end example
00ce5125
AW
160
161@node The Scheme Compiler
162@subsection The Scheme Compiler
163
81fd3152
AW
164The job of the Scheme compiler is to expand all macros and all of
165Scheme to its most primitive expressions. The definition of
166``primitive'' is given by the inventory of constructs provided by
167Tree-IL, the target language of the Scheme compiler: procedure
168applications, conditionals, lexical references, etc. This is described
169more fully in the next section.
170
171The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
172that it is completely implemented by the macro expander. Since the
173macro expander has to run over all of the source code already in order
174to expand macros, it might as well do the analysis at the same time,
175producing Tree-IL expressions directly.
176
177Because this compiler is actually the macro expander, it is
178extensible. Any macro which the user writes becomes part of the
179compiler.
180
181The Scheme-to-Tree-IL expander may be invoked using the generic
182@code{compile} procedure:
183
184@lisp
185(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
186@result{}
187 #<<application> src: #f
188 proc: #<<toplevel-ref> src: #f name: +>
189 args: (#<<const> src: #f exp: 1>
190 #<<const> src: #f exp: 2>)>
191@end lisp
192
193Or, since Tree-IL is so close to Scheme, it is often useful to expand
194Scheme to Tree-IL, then translate back to Scheme. For that reason the
195expander provides two interfaces. The former is equivalent to calling
41e64dd7 196@code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
81fd3152
AW
197``compile''. With @code{'e} (the default), the result is translated
198back to Scheme:
199
200@lisp
41e64dd7 201(macroexpand '(+ 1 2))
81fd3152 202@result{} (+ 1 2)
41e64dd7 203(macroexpand '(let ((x 10)) (* x x)))
81fd3152
AW
204@result{} (let ((x84 10)) (* x84 x84))
205@end lisp
206
207The second example shows that as part of its job, the macro expander
208renames lexically-bound variables. The original names are preserved
209when compiling to Tree-IL, but can't be represented in Scheme: a
210lexical binding only has one name. It is for this reason that the
211@emph{native} output of the expander is @emph{not} Scheme. There's too
212much information we would lose if we translated to Scheme directly:
213lexical variable names, source locations, and module hygiene.
214
41e64dd7
AW
215Note however that @code{macroexpand} does not have the same signature
216as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
217around @code{macroexpand}, to make it conform to the general form of
81fd3152
AW
218compiler procedures in Guile's language tower.
219
98850fd7
AW
220Compiler procedures take three arguments: an expression, an
221environment, and a keyword list of options. They return three values:
222the compiled expression, the corresponding environment for the target
223language, and a ``continuation environment''. The compiled expression
224and environment will serve as input to the next language's compiler.
225The ``continuation environment'' can be used to compile another
226expression from the same source language within the same module.
81fd3152
AW
227
228For example, you might compile the expression, @code{(define-module
229(foo))}. This will result in a Tree-IL expression and environment. But
230if you compiled a second expression, you would want to take into
231account the compile-time effect of compiling the previous expression,
232which puts the user in the @code{(foo)} module. That is purpose of the
233``continuation environment''; you would pass it as the environment
234when compiling the subsequent expression.
235
41e64dd7
AW
236For Scheme, an environment is a module. By default, the @code{compile}
237and @code{compile-file} procedures compile in a fresh module, such
238that bindings and macros introduced by the expression being compiled
239are isolated:
1ebe6a63
LC
240
241@example
242(eq? (current-module) (compile '(current-module)))
243@result{} #f
244
245(compile '(define hello 'world))
246(defined? 'hello)
247@result{} #f
248
249(define / *)
250(eq? (compile '/) /)
251@result{} #f
252@end example
253
254Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
255@code{current-reader}}) are isolated:
256
257@example
258(compile '(fluid-set! current-reader (lambda args 'fail)))
259(fluid-ref current-reader)
260@result{} #f
261@end example
262
263Nevertheless, having the compiler and @dfn{compilee} share the same name
264space can be achieved by explicitly passing @code{(current-module)} as
265the compilation environment:
266
267@example
268(define hello 'world)
269(compile 'hello #:env (current-module))
270@result{} world
271@end example
272
81fd3152
AW
273@node Tree-IL
274@subsection Tree-IL
00ce5125 275
81fd3152 276Tree Intermediate Language (Tree-IL) is a structured intermediate
c850030f
AW
277language that is close in expressive power to Scheme. It is an
278expanded, pre-analyzed Scheme.
279
81fd3152
AW
280Tree-IL is ``structured'' in the sense that its representation is
281based on records, not S-expressions. This gives a rigidity to the
282language that ensures that compiling to a lower-level language only
41e64dd7
AW
283requires a limited set of transformations. For example, the Tree-IL
284type @code{<const>} is a record type with two fields, @code{src} and
285@code{exp}. Instances of this type are created via @code{make-const}.
286Fields of this type are accessed via the @code{const-src} and
287@code{const-exp} procedures. There is also a predicate, @code{const?}.
288@xref{Records}, for more information on records.
81fd3152
AW
289
290@c alpha renaming
291
292All Tree-IL types have a @code{src} slot, which holds source location
293information for the expression. This information, if present, will be
294residualized into the compiled object code, allowing backtraces to
295show source information. The format of @code{src} is the same as that
296returned by Guile's @code{source-properties} function. @xref{Source
297Properties}, for more information.
298
299Although Tree-IL objects are represented internally using records,
300there is also an equivalent S-expression external representation for
ecb87335 301each kind of Tree-IL. For example, the S-expression representation
81fd3152 302of @code{#<const src: #f exp: 3>} expression would be:
c850030f
AW
303
304@example
81fd3152 305(const 3)
c850030f
AW
306@end example
307
81fd3152 308Users may program with this format directly at the REPL:
c850030f
AW
309
310@example
81fd3152 311scheme@@(guile-user)> ,language tree-il
41e64dd7 312Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
81fd3152 313tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
c850030f
AW
314@result{} 42
315@end example
316
81fd3152
AW
317The @code{src} fields are left out of the external representation.
318
98850fd7
AW
319One may create Tree-IL objects from their external representations via
320calling @code{parse-tree-il}, the reader for Tree-IL. If any source
321information is attached to the input S-expression, it will be
322propagated to the resulting Tree-IL expressions. This is probably the
323easiest way to compile to Tree-IL: just make the appropriate external
324representations in S-expression format, and let @code{parse-tree-il}
325take care of the rest.
326
81fd3152
AW
327@deftp {Scheme Variable} <void> src
328@deftpx {External Representation} (void)
329An empty expression. In practice, equivalent to Scheme's @code{(if #f
330#f)}.
331@end deftp
332@deftp {Scheme Variable} <const> src exp
333@deftpx {External Representation} (const @var{exp})
334A constant.
335@end deftp
336@deftp {Scheme Variable} <primitive-ref> src name
337@deftpx {External Representation} (primitive @var{name})
338A reference to a ``primitive''. A primitive is a procedure that, when
339compiled, may be open-coded. For example, @code{cons} is usually
340recognized as a primitive, so that it compiles down to a single
341instruction.
342
343Compilation of Tree-IL usually begins with a pass that resolves some
344@code{<module-ref>} and @code{<toplevel-ref>} expressions to
345@code{<primitive-ref>} expressions. The actual compilation pass
346has special cases for applications of certain primitives, like
347@code{apply} or @code{cons}.
348@end deftp
349@deftp {Scheme Variable} <lexical-ref> src name gensym
350@deftpx {External Representation} (lexical @var{name} @var{gensym})
351A reference to a lexically-bound variable. The @var{name} is the
352original name of the variable in the source program. @var{gensym} is a
353unique identifier for this variable.
354@end deftp
355@deftp {Scheme Variable} <lexical-set> src name gensym exp
356@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
357Sets a lexically-bound variable.
358@end deftp
359@deftp {Scheme Variable} <module-ref> src mod name public?
360@deftpx {External Representation} (@@ @var{mod} @var{name})
361@deftpx {External Representation} (@@@@ @var{mod} @var{name})
362A reference to a variable in a specific module. @var{mod} should be
679cceed 363the name of the module, e.g.@: @code{(guile-user)}.
81fd3152
AW
364
365If @var{public?} is true, the variable named @var{name} will be looked
366up in @var{mod}'s public interface, and serialized with @code{@@};
367otherwise it will be looked up among the module's private bindings,
368and is serialized with @code{@@@@}.
369@end deftp
370@deftp {Scheme Variable} <module-set> src mod name public? exp
371@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
372@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
373Sets a variable in a specific module.
374@end deftp
375@deftp {Scheme Variable} <toplevel-ref> src name
376@deftpx {External Representation} (toplevel @var{name})
377References a variable from the current procedure's module.
378@end deftp
379@deftp {Scheme Variable} <toplevel-set> src name exp
380@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
381Sets a variable in the current procedure's module.
382@end deftp
383@deftp {Scheme Variable} <toplevel-define> src name exp
384@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
385Defines a new top-level variable in the current procedure's module.
386@end deftp
387@deftp {Scheme Variable} <conditional> src test then else
388@deftpx {External Representation} (if @var{test} @var{then} @var{else})
ca445ba5 389A conditional. Note that @var{else} is not optional.
c850030f 390@end deftp
81fd3152
AW
391@deftp {Scheme Variable} <application> src proc args
392@deftpx {External Representation} (apply @var{proc} . @var{args})
ca445ba5 393A procedure call.
c850030f 394@end deftp
81fd3152
AW
395@deftp {Scheme Variable} <sequence> src exps
396@deftpx {External Representation} (begin . @var{exps})
397Like Scheme's @code{begin}.
c850030f 398@end deftp
41e64dd7
AW
399@deftp {Scheme Variable} <lambda> src meta body
400@deftpx {External Representation} (lambda @var{meta} @var{body})
401A closure. @var{meta} is an association list of properties for the
402procedure. @var{body} is a single Tree-IL expression of type
403@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
404an alternate clause, this makes Tree-IL's @code{<lambda>} have the
405expressiveness of Scheme's @code{case-lambda}.
406@end deftp
407@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
408@deftpx {External Representation} @
409 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
410 @var{body})@
411 [@var{alternate}])
412One clause of a @code{case-lambda}. A @code{lambda} expression in
413Scheme is treated as a @code{case-lambda} with one clause.
414
415@var{req} is a list of the procedure's required arguments, as symbols.
416@var{opt} is a list of the optional arguments, or @code{#f} if there
417are no optional arguments. @var{rest} is the name of the rest
418argument, or @code{#f}.
419
420@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
421(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
422keyword corresponding to the argument named @var{name}, and whose
423corresponding gensym is @var{var}. @var{inits} are tree-il expressions
ecb87335 424corresponding to all of the optional and keyword arguments, evaluated
41e64dd7
AW
425to bind variables whose value is not supplied by the procedure caller.
426Each @var{init} expression is evaluated in the lexical context of
427previously bound variables, from left to right.
428
429@var{gensyms} is a list of gensyms corresponding to all arguments:
430first all of the required arguments, then the optional arguments if
431any, then the rest argument if any, then all of the keyword arguments.
432
433@var{body} is the body of the clause. If the procedure is called with
434an appropriate number of arguments, @var{body} is evaluated in tail
64de6db5 435position. Otherwise, if there is an @var{alternate}, it should be a
41e64dd7 436@code{<lambda-case>} expression, representing the next clause to try.
64de6db5 437If there is no @var{alternate}, a wrong-number-of-arguments error is
41e64dd7
AW
438signaled.
439@end deftp
440@deftp {Scheme Variable} <let> src names gensyms vals exp
441@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 442Lexical binding, like Scheme's @code{let}. @var{names} are the
41e64dd7 443original binding names, @var{gensyms} are gensyms corresponding to the
81fd3152
AW
444@var{names}, and @var{vals} are Tree-IL expressions for the values.
445@var{exp} is a single Tree-IL expression.
446@end deftp
935c7aca 447@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
172988ee
AW
448@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
449@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 450A version of @code{<let>} that creates recursive bindings, like
935c7aca 451Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
81fd3152 452@end deftp
41e64dd7
AW
453@deftp {Scheme Variable} <dynlet> fluids vals body
454@deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
455Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
456@var{fluids} should be a list of Tree-IL expressions that will
457evaluate to fluids, and @var{vals} a corresponding list of expressions
458to bind to the fluids during the dynamic extent of the evaluation of
459@var{body}.
460@end deftp
461@deftp {Scheme Variable} <dynref> fluid
462@deftpx {External Representation} (dynref @var{fluid})
463A dynamic variable reference. @var{fluid} should be a Tree-IL
464expression evaluating to a fluid.
465@end deftp
466@deftp {Scheme Variable} <dynset> fluid exp
467@deftpx {External Representation} (dynset @var{fluid} @var{exp})
468A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
469to a fluid, will be set to the result of evaluating @var{exp}.
470@end deftp
471@deftp {Scheme Variable} <dynwind> winder body unwinder
472@deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder})
473A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
474evaluate to thunks. Ensure that the winder and the unwinder are called
475before entering and after leaving @var{body}. Note that @var{body} is
476an expression, without a thunk wrapper.
477@end deftp
478@deftp {Scheme Variable} <prompt> tag body handler
479@deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
480A dynamic prompt. Instates a prompt named @var{tag}, an expression,
481during the dynamic extent of the execution of @var{body}, also an
482expression. If an abort occurs to this prompt, control will be passed
483to @var{handler}, a @code{<lambda-case>} expression with no optional
484or keyword arguments, and no alternate. The first argument to the
485@code{<lambda-case>} will be the captured continuation, and then all
486of the values passed to the abort. @xref{Prompts}, for more
487information.
488@end deftp
489@deftp {Scheme Variable} <abort> tag args tail
490@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
491An abort to the nearest prompt with the name @var{tag}, an expression.
492@var{args} should be a list of expressions to pass to the prompt's
493handler, and @var{tail} should be an expression that will evaluate to
494a list of additional arguments. An abort will save the partial
495continuation, which may later be reinstated, resulting in the
496@code{<abort>} expression evaluating to some number of values.
497@end deftp
81fd3152 498
98850fd7
AW
499There are two Tree-IL constructs that are not normally produced by
500higher-level compilers, but instead are generated during the
501source-to-source optimization and analysis passes that the Tree-IL
502compiler does. Users should not generate these expressions directly,
503unless they feel very clever, as the default analysis pass will
504generate them as necessary.
505
41e64dd7
AW
506@deftp {Scheme Variable} <let-values> src names gensyms exp body
507@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
98850fd7
AW
508Like Scheme's @code{receive} -- binds the values returned by
509evaluating @code{exp} to the @code{lambda}-like bindings described by
41e64dd7 510@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
98850fd7
AW
511
512@code{<let-values>} is an optimization of @code{<application>} of the
513primitive, @code{call-with-values}.
514@end deftp
41e64dd7
AW
515@deftp {Scheme Variable} <fix> src names gensyms vals body
516@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
98850fd7
AW
517Like @code{<letrec>}, but only for @var{vals} that are unset
518@code{lambda} expressions.
519
520@code{fix} is an optimization of @code{letrec} (and @code{let}).
521@end deftp
81fd3152
AW
522
523Tree-IL implements a compiler to GLIL that recursively traverses
524Tree-IL expressions, writing out GLIL expressions into a linear list.
525The compiler also keeps some state as to whether the current
526expression is in tail context, and whether its value will be used in
527future computations. This state allows the compiler not to emit code
679cceed 528for constant expressions that will not be used (e.g.@: docstrings), and
81fd3152
AW
529to perform tail calls when in tail position.
530
98850fd7
AW
531Most optimization, such as it currently is, is performed on Tree-IL
532expressions as source-to-source transformations. There will be more
533optimizations added in the future.
c850030f
AW
534
535Interested readers are encouraged to read the implementation in
81fd3152 536@code{(language tree-il compile-glil)} for more details.
00ce5125
AW
537
538@node GLIL
539@subsection GLIL
540
41e64dd7 541Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
81fd3152 542language whose expressions more closely approximate Guile's VM
98850fd7
AW
543instruction set. Its expression types are defined in @code{(language
544glil)}.
c850030f 545
41e64dd7 546@deftp {Scheme Variable} <glil-program> meta . body
86872cc3 547A unit of code that at run-time will correspond to a compiled
41e64dd7 548procedure. @var{meta} should be an alist of properties, as in
98850fd7
AW
549Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
550expressions.
c850030f 551@end deftp
41e64dd7
AW
552@deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
553A prologue for a function with no optional, keyword, or rest
554arguments. @var{nreq} is the number of required arguments. @var{nlocs}
555the total number of local variables, including the arguments. If the
556procedure was not given exactly @var{nreq} arguments, control will
557jump to @var{else-label}, if given, or otherwise signal an error.
558@end deftp
559@deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
560A prologue for a function with optional or rest arguments. Like
561@code{<glil-std-prelude>}, with the addition that @var{nopt} is the
562number of optional arguments (possibly zero) and @var{rest} is an
563index of a local variable at which to bind a rest argument, or
564@code{#f} if there is no rest argument.
565@end deftp
566@deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
567A prologue for a function with keyword arguments. Like
568@code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
569of keyword arguments, and @var{allow-other-keys?} is a flag indicating
570whether to allow unknown keys. @xref{Function Prologue Instructions,
571@code{bind-kwargs}}, for details on the format of @var{kw}.
572@end deftp
c850030f 573@deftp {Scheme Variable} <glil-bind> . vars
ff73ae34
AW
574An advisory expression that notes a liveness extent for a set of
575variables. @var{vars} is a list of @code{(@var{name} @var{type}
576@var{index})}, where @var{type} should be either @code{argument},
577@code{local}, or @code{external}.
578
579@code{<glil-bind>} expressions end up being serialized as part of a
580program's metadata and do not form part of a program's code path.
c850030f
AW
581@end deftp
582@deftp {Scheme Variable} <glil-mv-bind> vars rest
a4b4fbbd
JE
583A multiple-value binding of the values on the stack to @var{vars}. If
584@var{rest} is true, the last element of @var{vars} will be treated as a
585rest argument.
ff73ae34
AW
586
587In addition to pushing a binding annotation on the stack, like
588@code{<glil-bind>}, an expression is emitted at compilation time to
589make sure that there are enough values available to bind. See the
acc51c3e
AW
590notes on @code{truncate-values} in @ref{Procedure Call and Return
591Instructions}, for more information.
c850030f
AW
592@end deftp
593@deftp {Scheme Variable} <glil-unbind>
ff73ae34
AW
594Closes the liveness extent of the most recently encountered
595@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
596expressions are compiled, a parallel stack of live bindings is
597maintained; this expression pops off the top element from that stack.
598
599Bindings are written into the program's metadata so that debuggers and
600other tools can determine the set of live local variables at a given
601offset within a VM program.
c850030f
AW
602@end deftp
603@deftp {Scheme Variable} <glil-source> loc
ff73ae34 604Records source information for the preceding expression. @var{loc}
73643339 605should be an association list of containing @code{line} @code{column},
679cceed 606and @code{filename} keys, e.g.@: as returned by
73643339 607@code{source-properties}.
c850030f
AW
608@end deftp
609@deftp {Scheme Variable} <glil-void>
98850fd7 610Pushes ``the unspecified value'' on the stack.
c850030f
AW
611@end deftp
612@deftp {Scheme Variable} <glil-const> obj
ff73ae34 613Pushes a constant value onto the stack. @var{obj} must be a number,
98850fd7
AW
614string, symbol, keyword, boolean, character, uniform array, the empty
615list, or a pair or vector of constants.
c850030f 616@end deftp
98850fd7
AW
617@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
618Accesses a lexically bound variable. If the variable is not
41e64dd7
AW
619@var{local?} it is free. All variables may have @code{ref},
620@code{set}, and @code{bound?} as their @var{op}. Boxed variables may
621also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
622which correspond in semantics to the VM instructions @code{box},
98850fd7
AW
623@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
624more information.
c850030f
AW
625@end deftp
626@deftp {Scheme Variable} <glil-toplevel> op name
ff73ae34
AW
627Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
628or @code{define}.
c850030f
AW
629@end deftp
630@deftp {Scheme Variable} <glil-module> op mod name public?
73643339
AW
631Accesses a variable within a specific module. See Tree-IL's
632@code{<module-ref>}, for more information.
c850030f
AW
633@end deftp
634@deftp {Scheme Variable} <glil-label> label
ff73ae34
AW
635Creates a new label. @var{label} can be any Scheme value, and should
636be unique.
c850030f
AW
637@end deftp
638@deftp {Scheme Variable} <glil-branch> inst label
ff73ae34 639Branch to a label. @var{label} should be a @code{<ghil-label>}.
c850030f
AW
640@code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
641@end deftp
642@deftp {Scheme Variable} <glil-call> inst nargs
ff73ae34 643This expression is probably misnamed, as it does not correspond to
c850030f
AW
644function calls. @code{<glil-call>} invokes the VM instruction named
645@var{inst}, noting that it is called with @var{nargs} stack arguments.
ff73ae34
AW
646The arguments should be pushed on the stack already. What happens to
647the stack afterwards depends on the instruction.
c850030f
AW
648@end deftp
649@deftp {Scheme Variable} <glil-mv-call> nargs ra
ff73ae34
AW
650Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
651corresponding to the multiple-value return address for the call. See
acc51c3e
AW
652the notes on @code{mv-call} in @ref{Procedure Call and Return
653Instructions}, for more information.
c850030f 654@end deftp
41e64dd7
AW
655@deftp {Scheme Variable} <glil-prompt> label escape-only?
656Push a dynamic prompt into the stack, with a handler at @var{label}.
657@var{escape-only?} is a flag that is propagated to the prompt,
658allowing an abort to avoid capturing a continuation in some cases.
659@xref{Prompts}, for more information.
660@end deftp
c850030f 661
ff73ae34 662Users may enter in GLIL at the REPL as well, though there is a bit
41e64dd7 663more bookkeeping to do:
00ce5125 664
ff73ae34
AW
665@example
666scheme@@(guile-user)> ,language glil
41e64dd7
AW
667Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
668To switch back, type `,L scheme'.
669glil@@(guile-user)> (program () (std-prelude 0 0 #f)
670 (const 3) (call return 1))
ff73ae34
AW
671@result{} 3
672@end example
00ce5125 673
ff73ae34
AW
674Just as in all of Guile's compilers, an environment is passed to the
675GLIL-to-object code compiler, and one is returned as well, along with
676the object code.
00ce5125 677
81fd3152
AW
678@node Assembly
679@subsection Assembly
680
73643339
AW
681Assembly is an S-expression-based, human-readable representation of
682the actual bytecodes that will be emitted for the VM. As such, it is a
683useful intermediate language both for compilation and for
684decompilation.
81fd3152 685
73643339
AW
686Besides the fact that it is not a record-based language, assembly
687differs from GLIL in four main ways:
00ce5125 688
73643339
AW
689@itemize
690@item Labels have been resolved to byte offsets in the program.
691@item Constants inside procedures have either been expressed as inline
98850fd7 692instructions or cached in object arrays.
73643339
AW
693@item Procedures with metadata (source location information, liveness
694extents, procedure names, generic properties, etc) have had their
695metadata serialized out to thunks.
696@item All expressions correspond directly to VM instructions -- i.e.,
98850fd7 697there is no @code{<glil-lexical>} which can be a ref or a set.
73643339
AW
698@end itemize
699
700Assembly is isomorphic to the bytecode that it compiles to. You can
701compile to bytecode, then decompile back to assembly, and you have the
702same assembly code.
703
704The general form of assembly instructions is the following:
705
706@lisp
707(@var{inst} @var{arg} ...)
708@end lisp
709
710The @var{inst} names a VM instruction, and its @var{arg}s will be
711embedded in the instruction stream. The easiest way to see assembly is
712to play around with it at the REPL, as can be seen in this annotated
713example:
714
715@example
dc3b2661 716scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
41e64dd7 717(load-program
0a715b9a
AW
718 ((:LCASE16 . 2)) ; Labels, unused in this case.
719 8 ; Length of the thunk that was compiled.
41e64dd7 720 (load-program ; Metadata thunk.
73643339 721 ()
41e64dd7
AW
722 17
723 #f ; No metadata thunk for the metadata thunk.
724 (make-eol)
725 (make-eol)
0a715b9a
AW
726 (make-int8 2) ; Liveness extents, source info, and arities,
727 (make-int8 8) ; in a format that Guile knows how to parse.
728 (make-int8:0)
41e64dd7
AW
729 (list 0 3)
730 (list 0 1)
731 (list 0 3)
732 (return))
0a715b9a 733 (assert-nargs-ee/locals 0) ; Prologue.
41e64dd7
AW
734 (make-int8 32) ; Actual code starts here.
735 (make-int8 10)
736 (add)
0a715b9a 737 (return))
73643339
AW
738@end example
739
740Of course you can switch the REPL to assembly and enter in assembly
741S-expressions directly, like with other languages, though it is more
742difficult, given that the length fields have to be correct.
743
744@node Bytecode and Objcode
745@subsection Bytecode and Objcode
746
747Finally, the raw bytes. There are actually two different ``languages''
748here, corresponding to two different ways to represent the bytes.
749
750``Bytecode'' represents code as uniform byte vectors, useful for
751structuring and destructuring code on the Scheme level. Bytecode is
752the next step down from assembly:
753
754@example
73643339 755scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
0a715b9a
AW
756@result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
757 95 0 ; Prologue.
758 10 32 10 10 148 66 17 ; Actual code.
759 0 0 0 0 0 0 0 9 ; Metadata thunk.
760 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
73643339
AW
761@end example
762
763``Objcode'' is bytecode, but mapped directly to a C structure,
764@code{struct scm_objcode}:
765
766@example
767struct scm_objcode @{
73643339
AW
768 scm_t_uint32 len;
769 scm_t_uint32 metalen;
770 scm_t_uint8 base[0];
771@};
772@end example
773
774As one might imagine, objcode imposes a minimum length on the
41e64dd7
AW
775bytecode. Also, the @code{len} and @code{metalen} fields are in native
776endianness, which makes objcode (and bytecode) system-dependent.
73643339
AW
777
778Objcode also has a couple of important efficiency hacks. First,
779objcode may be mapped directly from disk, allowing compiled code to be
780loaded quickly, often from the system's disk cache, and shared among
781multiple processes. Secondly, objcode may be embedded in other
782objcode, allowing procedures to have the text of other procedures
783inlined into their bodies, without the need for separate allocation of
784the code. Of course, the objcode object itself does need to be
785allocated.
786
787Procedures related to objcode are defined in the @code{(system vm
788objcode)} module.
00ce5125 789
ff73ae34
AW
790@deffn {Scheme Procedure} objcode? obj
791@deffnx {C Function} scm_objcode_p (obj)
a4b4fbbd 792Returns @code{#f} if @var{obj} is object code, @code{#f} otherwise.
ff73ae34 793@end deffn
00ce5125 794
73643339 795@deffn {Scheme Procedure} bytecode->objcode bytecode
42a438e8 796@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
ff73ae34 797Makes a bytecode object from @var{bytecode}, which should be a
41e64dd7 798bytevector. @xref{Bytevectors}.
ff73ae34 799@end deffn
e3ba263d 800
ff73ae34
AW
801@deffn {Scheme Variable} load-objcode file
802@deffnx {C Function} scm_load_objcode (file)
803Load object code from a file named @var{file}. The file will be mapped
804into memory via @code{mmap}, so this is a very fast operation.
e3ba263d 805
98850fd7 806On disk, object code has an sixteen-byte cookie prepended to it, to
73643339
AW
807prevent accidental loading of arbitrary garbage.
808@end deffn
809
810@deffn {Scheme Variable} write-objcode objcode file
811@deffnx {C Function} scm_write_objcode (objcode)
41e64dd7 812Write object code out to a file, prepending the sixteen-byte cookie.
ff73ae34 813@end deffn
e3ba263d 814
41e64dd7
AW
815@deffn {Scheme Variable} objcode->bytecode objcode
816@deffnx {C Function} scm_objcode_to_bytecode (objcode)
817Copy object code out to a bytevector for analysis by Scheme.
ff73ae34 818@end deffn
e3ba263d 819
73643339
AW
820The following procedure is actually in @code{(system vm program)}, but
821we'll mention it here:
822
98850fd7
AW
823@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
824@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
ff73ae34 825Load up object code into a Scheme program. The resulting program will
73643339 826have @var{objtable} as its object table, which should be a vector or
98850fd7 827@code{#f}, and will capture the free variables from @var{free-vars}.
ff73ae34 828@end deffn
c850030f 829
ff73ae34
AW
830Object code from a file may be disassembled at the REPL via the
831meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
832Programs may be disassembled via @code{,disassemble}, abbreviated as
833@code{,x}.
834
835Compiling object code to the fake language, @code{value}, is performed
836via loading objcode into a program, then executing that thunk with
837respect to the compilation environment. Normally the environment
838propagates through the compiler transparently, but users may specify
41e64dd7 839the compilation environment manually as well, as a module.
ff73ae34 840
c850030f 841
e63d888e
DK
842@node Writing New High-Level Languages
843@subsection Writing New High-Level Languages
844
845In order to integrate a new language @var{lang} into Guile's compiler
846system, one has to create the module @code{(language @var{lang} spec)}
847containing the language definition and referencing the parser,
848compiler and other routines processing it. The module hierarchy in
849@code{(language brainfuck)} defines a very basic Brainfuck
850implementation meant to serve as easy-to-understand example on how to
4e432dab
AW
851do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
852for more information about the Brainfuck language itself.
853
e63d888e 854
ff73ae34
AW
855@node Extending the Compiler
856@subsection Extending the Compiler
e3ba263d 857
8fa6525e
NJ
858At this point we take a detour from the impersonal tone of the rest of
859the manual. Admit it: if you've read this far into the compiler
860internals manual, you are a junkie. Perhaps a course at your university
861left you unsated, or perhaps you've always harbored a desire to hack the
862holy of computer science holies: a compiler. Well you're in good
863company, and in a good position. Guile's compiler needs your help.
ff73ae34
AW
864
865There are many possible avenues for improving Guile's compiler.
866Probably the most important improvement, speed-wise, will be some form
867of native compilation, both just-in-time and ahead-of-time. This could
868be done in many ways. Probably the easiest strategy would be to extend
869the compiled procedure structure to include a pointer to a native code
86872cc3 870vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
871procedure is called a certain number of times.
872
873The name of the game is a profiling-based harvest of the low-hanging
874fruit, running programs of interest under a system-level profiler and
875determining which improvements would give the most bang for the buck.
98850fd7
AW
876It's really getting to the point though that native compilation is the
877next step.
ff73ae34
AW
878
879The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
880it knows to also understand R6RS, and adding new high-level compilers.
881We have JavaScript and Emacs Lisp mostly complete, but they could use
ecb87335 882some love; Lua would be nice as well, but whatever language it is
98850fd7
AW
883that strikes your fancy would be welcome too.
884
885Compilers are for hacking, not for admiring or for complaining about.
886Get to it!