@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2008, 2009
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein's monster. However, this magic is perceived
by many to be impenetrable.

This section aims to pay attention to the small man behind the
curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your @code{.scm} file.

@menu
* Compiler Tower::
* The Scheme Compiler::
* Tree-IL::
* GLIL::
* Assembly::
* Bytecode and Objcode::
* Writing New High-Level Languages::
* Extending the Compiler::
@end menu

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module, @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title version reader printer @
[parser=#f] [compilers='()] [decompilers='()] [evaluator=#f]
Define a language.

This syntax defines a @code{#<language>} object, bound to @var{name}
in the current environment. In addition, the language will be added to
the global language set. For example, this is the language definition
for Scheme:

@example
(define-language scheme
  #:title       "Guile Scheme"
  #:version     "0.5"
  #:reader      read
  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write)
@end example
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
$ guile
Guile Scheme interpreter 0.5 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
scheme@@(guile-user)> ,language tree-il
Tree Intermediate Language interpreter 1.0 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
tree-il@@(guile-user)>
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

The language object will be returned, or @code{#f} if there does not
exist a language with that name.
@end deffn

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and returns the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
@end deffn

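As a rough illustration of the depth-first search that
@code{lookup-compilation-order} is described as performing, the
following sketch models the tower as a plain association list. The
names @code{toy-compilers} and @code{toy-compilation-order} are
hypothetical and are not part of @code{(system base language)}; the
real procedure works on language objects and caches its results, which
this sketch does not attempt.

@lisp
;; Hypothetical table: each language maps to the languages it can
;; compile to directly.  This is NOT Guile's real language registry.
(define toy-compilers
  '((scheme   . (tree-il))
    (tree-il  . (glil))
    (glil     . (assembly))
    (assembly . (bytecode))
    (bytecode . (objcode))
    (objcode  . (value))
    (value    . ())))

;; Depth-first search.  Returns the list of languages from FROM to TO,
;; or #f if TO cannot be reached.  Assumes the table has no cycles.
(define (toy-compilation-order from to)
  (if (eq? from to)
      (list to)
      (let loop ((targets (or (assq-ref toy-compilers from) '())))
        (cond ((null? targets) #f)
              ((toy-compilation-order (car targets) to)
               => (lambda (path) (cons from path)))
              (else (loop (cdr targets)))))))

(toy-compilation-order 'scheme 'assembly)
;; @result{} (scheme tree-il glil assembly)
(toy-compilation-order 'glil 'scheme)
;; @result{} #f
@end lisp
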
There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The run-time compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme, which we know and love
@item Tree Intermediate Language (Tree-IL)
@item Guile Low Intermediate Language (GLIL)
@item Assembly
@item Bytecode
@item Objcode
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at run
time, you want a Scheme value: for example, a compiled procedure. For
this reason, so as not to break the abstraction, Guile defines a fake
language at the bottom of the tower:

@itemize
@item Value
@end itemize

Compiling to @code{value} loads the object code into a procedure, and
wakes the sleeping giant.

Perhaps this strangeness can be explained by example:
@code{compile-file} defaults to compiling to object code, because it
produces object code that has to live in the barren world outside the
Guile runtime; but @code{compile} defaults to compiling to
@code{value}, as its product re-enters the Guile world.

Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
@end example

@node The Scheme Compiler
@subsection The Scheme Compiler

The job of the Scheme compiler is to expand all macros and all of
Scheme to its most primitive expressions. The definition of
``primitive'' is given by the inventory of constructs provided by
Tree-IL, the target language of the Scheme compiler: procedure
applications, conditionals, lexical references, etc. This is described
more fully in the next section.

The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
that it is completely implemented by the macro expander. Since the
macro expander has to run over all of the source code already in order
to expand macros, it might as well do the analysis at the same time,
producing Tree-IL expressions directly.

Because this compiler is actually the macro expander, it is
extensible. Any macro which the user writes becomes part of the
compiler.

The Scheme-to-Tree-IL expander may be invoked using the generic
@code{compile} procedure:

@lisp
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
@result{}
 #<<application> src: #f
                 proc: #<<toplevel-ref> src: #f name: +>
                 args: (#<<const> src: #f exp: 1>
                        #<<const> src: #f exp: 2>)>
@end lisp

Or, since Tree-IL is so close to Scheme, it is often useful to expand
Scheme to Tree-IL, then translate back to Scheme. For that reason the
expander provides two interfaces. The @code{compile} call above is
equivalent to calling @code{(sc-expand '(+ 1 2) 'c)}, where the
@code{'c} is for ``compile''. With @code{'e} (the default), the result
is translated back to Scheme:

@lisp
(sc-expand '(+ 1 2))
@result{} (+ 1 2)
(sc-expand '(let ((x 10)) (* x x)))
@result{} (let ((x84 10)) (* x84 x84))
@end lisp

The second example shows that as part of its job, the macro expander
renames lexically-bound variables. The original names are preserved
when compiling to Tree-IL, but can't be represented in Scheme: a
lexical binding only has one name. It is for this reason that the
@emph{native} output of the expander is @emph{not} Scheme. There's too
much information we would lose if we translated to Scheme directly:
lexical variable names, source locations, and module hygiene.

Note however that @code{sc-expand} does not have the same signature as
@code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
around @code{sc-expand}, to make it conform to the general form of
compiler procedures in Guile's language tower.

Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options. They return three values:
the compiled expression, the corresponding environment for the target
language, and a ``continuation environment''. The compiled expression
and environment will serve as input to the next language's compiler.
The ``continuation environment'' can be used to compile another
expression from the same source language within the same module.

For example, you might compile the expression, @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But
if you compiled a second expression, you would want to take into
account the compile-time effect of compiling the previous expression,
which puts the user in the @code{(foo)} module. That is the purpose of
the ``continuation environment''; you would pass it as the environment
when compiling the subsequent expression.

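The three-argument, three-value protocol can be sketched with a toy
compiler procedure. This is only a model under stated assumptions:
@code{toy-compile} and @code{toy-compile-sequence} are hypothetical
names, nothing is actually compiled, and environments are represented
as module names rather than as real module objects.

@lisp
;; A toy compiler procedure: takes an expression, an environment and
;; an options list, and returns three values.  The "compiled
;; expression" is just a tagged copy of the input.
(define (toy-compile exp env opts)
  (let ((cenv (if (and (pair? exp) (eq? (car exp) 'define-module))
                  (cadr exp)   ; later expressions land in this module
                  env)))
    (values (list 'compiled exp) cenv cenv)))

;; Compile EXPS in order, passing each continuation environment on
;; as the environment of the next expression.  Returns the list of
;; environments in which each expression was compiled.
(define (toy-compile-sequence exps env)
  (let loop ((exps exps) (env env) (seen '()))
    (if (null? exps)
        (reverse seen)
        (call-with-values
            (lambda () (toy-compile (car exps) env '()))
          (lambda (compiled target-env cenv)
            (loop (cdr exps) cenv (cons env seen)))))))

(toy-compile-sequence '((define-module (foo)) (define x 1))
                      '(guile-user))
;; @result{} ((guile-user) (foo))
@end lisp
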
For Scheme, an environment may be one of two things:

@itemize
@item @code{#f}, in which case compilation is performed in the context
of the current module; or
@item a module, which specifies the context of the compilation.
@end itemize

By default, the @code{compile} and @code{compile-file} procedures
compile in a fresh module, such that bindings and macros introduced by
the expression being compiled are isolated:

@example
(eq? (current-module) (compile '(current-module)))
@result{} #f

(compile '(define hello 'world))
(defined? 'hello)
@result{} #f

(define / *)
(eq? (compile '/) /)
@result{} #f
@end example

Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
@code{current-reader}}) are isolated:

@example
(compile '(fluid-set! current-reader (lambda args 'fail)))
(fluid-ref current-reader)
@result{} #f
@end example

Nevertheless, having the compiler and @dfn{compilee} share the same name
space can be achieved by explicitly passing @code{(current-module)} as
the compilation environment:

@example
(define hello 'world)
(compile 'hello #:env (current-module))
@result{} world
@end example

@node Tree-IL
@subsection Tree-IL

Tree Intermediate Language (Tree-IL) is a structured intermediate
language that is close in expressive power to Scheme. It is an
expanded, pre-analyzed Scheme.

Tree-IL is ``structured'' in the sense that its representation is
based on records, not S-expressions. This gives a rigidity to the
language that ensures that compiling to a lower-level language only
requires a limited set of transformations. Practically speaking,
consider the Tree-IL type, @code{<const>}, which has two fields,
@code{src} and @code{exp}. Instances of this type are records created
via @code{make-const}, and whose fields are accessed as
@code{const-src} and @code{const-exp}. There is also a predicate,
@code{const?}. @xref{Records}, for more information on records.

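For illustration, a record type with the same shape as @code{<const>}
could be defined with SRFI-9 as follows. The @code{toy-} names are
hypothetical, chosen so as not to clash with the real Tree-IL
bindings; how Tree-IL actually defines its record types is not shown
here and may differ.

@lisp
(use-modules (srfi srfi-9))

;; A stand-in for <const>: a constructor, a predicate, and one
;; accessor per field.
(define-record-type <toy-const>
  (make-toy-const src exp)
  toy-const?
  (src toy-const-src)
  (exp toy-const-exp))

(define c (make-toy-const #f 3))
(toy-const? c)
;; @result{} #t
(toy-const-exp c)
;; @result{} 3
(toy-const-src c)
;; @result{} #f
@end lisp
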
@c alpha renaming

All Tree-IL types have a @code{src} slot, which holds source location
information for the expression. This information, if present, will be
residualized into the compiled object code, allowing backtraces to
show source information. The format of @code{src} is the same as that
returned by Guile's @code{source-properties} function. @xref{Source
Properties}, for more information.

Although Tree-IL objects are represented internally using records,
there is also an equivalent S-expression external representation for
each kind of Tree-IL. For example, the S-expression representation
of the @code{#<const src: #f exp: 3>} expression would be:

@example
(const 3)
@end example

Users may program with this format directly at the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Tree Intermediate Language interpreter 1.0 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
@result{} 42
@end example

The @code{src} fields are left out of the external representation.

One may create Tree-IL objects from their external representations via
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.

@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f
#f)}.
@end deftp
@deftp {Scheme Variable} <const> src exp
@deftpx {External Representation} (const @var{exp})
A constant.
@end deftp
@deftp {Scheme Variable} <primitive-ref> src name
@deftpx {External Representation} (primitive @var{name})
A reference to a ``primitive''. A primitive is a procedure that, when
compiled, may be open-coded. For example, @code{cons} is usually
recognized as a primitive, so that it compiles down to a single
instruction.

Compilation of Tree-IL usually begins with a pass that resolves some
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
@code{<primitive-ref>} expressions. The actual compilation pass
has special cases for applications of certain primitives, like
@code{apply} or @code{cons}.
@end deftp
@deftp {Scheme Variable} <lexical-ref> src name gensym
@deftpx {External Representation} (lexical @var{name} @var{gensym})
A reference to a lexically-bound variable. The @var{name} is the
original name of the variable in the source program. @var{gensym} is a
unique identifier for this variable.
@end deftp
@deftp {Scheme Variable} <lexical-set> src name gensym exp
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
Sets a lexically-bound variable.
@end deftp
@deftp {Scheme Variable} <module-ref> src mod name public?
@deftpx {External Representation} (@@ @var{mod} @var{name})
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
A reference to a variable in a specific module. @var{mod} should be
the name of the module, e.g. @code{(guile-user)}.

If @var{public?} is true, the variable named @var{name} will be looked
up in @var{mod}'s public interface, and serialized with @code{@@};
otherwise it will be looked up among the module's private bindings,
and is serialized with @code{@@@@}.
@end deftp
@deftp {Scheme Variable} <module-set> src mod name public? exp
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
Sets a variable in a specific module.
@end deftp
@deftp {Scheme Variable} <toplevel-ref> src name
@deftpx {External Representation} (toplevel @var{name})
References a variable from the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-set> src name exp
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
Sets a variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <toplevel-define> src name exp
@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
Defines a new top-level variable in the current procedure's module.
@end deftp
@deftp {Scheme Variable} <conditional> src test then else
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
A conditional. Note that @var{else} is not optional.
@end deftp
@deftp {Scheme Variable} <application> src proc args
@deftpx {External Representation} (apply @var{proc} . @var{args})
A procedure call.
@end deftp
@deftp {Scheme Variable} <sequence> src exps
@deftpx {External Representation} (begin . @var{exps})
Like Scheme's @code{begin}.
@end deftp
@deftp {Scheme Variable} <lambda> src names vars meta body
@deftpx {External Representation} (lambda @var{names} @var{vars} @var{meta} @var{body})
A closure. @var{names} is the original binding form, as given in the
source code, which may be an improper list. @var{vars} are gensyms
corresponding to the @var{names}. @var{meta} is an association list of
properties. The actual @var{body} is a single Tree-IL expression.
@end deftp
@deftp {Scheme Variable} <let> src names vars vals exp
@deftpx {External Representation} (let @var{names} @var{vars} @var{vals} @var{exp})
Lexical binding, like Scheme's @code{let}. @var{names} are the
original binding names, @var{vars} are gensyms corresponding to the
@var{names}, and @var{vals} are Tree-IL expressions for the values.
@var{exp} is a single Tree-IL expression.
@end deftp
@deftp {Scheme Variable} <letrec> src names vars vals exp
@deftpx {External Representation} (letrec @var{names} @var{vars} @var{vals} @var{exp})
A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}.
@end deftp

There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does. Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will
generate them as necessary.

@deftp {Scheme Variable} <let-values> src names vars exp body
@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @var{exp} to the @code{lambda}-like bindings described by
@var{vars}. That is to say, @var{vars} may be an improper list.

@code{<let-values>} is an optimization of @code{<application>} of the
primitive, @code{call-with-values}.
@end deftp
@deftp {Scheme Variable} <fix> src names vars vals body
@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.

@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp

Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list.
The compiler also keeps some state as to whether the current
expression is in tail context, and whether its value will be used in
future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g. docstrings), and
to perform tail calls when in tail position.

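The role of that state can be sketched with a toy flattener over a
three-form subset of the external representation. This is a
simplified model, not the code in @code{(language tree-il
compile-glil)}: the procedure @code{toy-flatten}, its @code{context}
argument, and the output forms @code{call}, @code{tail-call},
@code{drop} and @code{return} are all assumptions made for the
example.

@lisp
(use-modules (srfi srfi-1))   ; for append-map

;; EXP is (const X), (begin E1 E2 ...) with at least one form, or
;; (apply OP E ...).  CONTEXT is 'drop when the value is unused,
;; 'push when it is needed, and 'tail in tail position.
(define (toy-flatten exp context)
  (case (car exp)
    ((const)
     (case context
       ((drop) '())                         ; unused: emit nothing
       ((push) (list exp))
       ((tail) (list exp '(return)))))
    ((begin)
     (let loop ((exps (cdr exp)))
       (if (null? (cdr exps))
           (toy-flatten (car exps) context) ; last form inherits context
           (append (toy-flatten (car exps) 'drop)
                   (loop (cdr exps))))))
    ((apply)
     (let ((args (append-map (lambda (e) (toy-flatten e 'push))
                             (cddr exp)))
           (n (length (cddr exp))))
       (case context
         ((drop) (append args `((call ,(cadr exp) ,n) (drop))))
         ((push) (append args `((call ,(cadr exp) ,n))))
         ((tail) (append args `((tail-call ,(cadr exp) ,n)))))))
    (else (error "unknown toy expression" exp))))

(toy-flatten '(begin (const "a docstring")
                     (apply f (const 1) (const 2)))
             'tail)
;; @result{} ((const 1) (const 2) (tail-call f 2))
@end lisp
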
Most optimization, such as it currently is, is performed on Tree-IL
expressions as source-to-source transformations. There will be more
optimizations added in the future.

Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.

@node GLIL
@subsection GLIL

Guile Low Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM
instruction set. Its expression types are defined in @code{(language
glil)}.

@deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
A unit of code that at run-time will correspond to a compiled
procedure. @var{nargs}, @var{nrest}, and @var{nlocs} collectively
define the program's arity; see @ref{Compiled Procedures}, for more
information. @var{meta} should be an alist of properties, as in
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
expressions.
@end deftp
@deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of
variables. @var{vars} is a list of @code{(@var{name} @var{type}
@var{index})}, where @var{type} should be either @code{argument},
@code{local}, or @code{external}.

@code{<glil-bind>} expressions end up being serialized as part of a
program's metadata and do not form part of a program's code path.
@end deftp
@deftp {Scheme Variable} <glil-mv-bind> vars rest
A multiple-value binding of the values on the stack to @var{vars}. Iff
@var{rest} is true, the last element of @var{vars} will be treated as
a rest argument.

In addition to pushing a binding annotation on the stack, like
@code{<glil-bind>}, an expression is emitted at compilation time to
make sure that there are enough values available to bind. See the
notes on @code{truncate-values} in @ref{Procedural Instructions}, for
more information.
@end deftp
@deftp {Scheme Variable} <glil-unbind>
Closes the liveness extent of the most recently encountered
@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
expressions are compiled, a parallel stack of live bindings is
maintained; this expression pops off the top element from that stack.

Bindings are written into the program's metadata so that debuggers and
other tools can determine the set of live local variables at a given
offset within a VM program.
@end deftp
@deftp {Scheme Variable} <glil-source> loc
Records source information for the preceding expression. @var{loc}
should be an association list containing @code{line}, @code{column},
and @code{filename} keys, e.g. as returned by
@code{source-properties}.
@end deftp
@deftp {Scheme Variable} <glil-void>
Pushes ``the unspecified value'' on the stack.
@end deftp
@deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, uniform array, the empty
list, or a pair or vector of constants.
@end deftp
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable. If the variable is not
@var{local?} it is free. All variables may have @code{ref} and
@code{set} as their @var{op}. Boxed variables may also have the
@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
correspond in semantics to the VM instructions @code{box},
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
more information.
@end deftp
@deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
or @code{define}.
@end deftp
@deftp {Scheme Variable} <glil-module> op mod name public?
Accesses a variable within a specific module. See Tree-IL's
@code{<module-ref>}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-label> label
Creates a new label. @var{label} can be any Scheme value, and should
be unique.
@end deftp
@deftp {Scheme Variable} <glil-branch> inst label
Branch to a label. @var{label} should be a @code{<glil-label>}.
@var{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
@end deftp
@deftp {Scheme Variable} <glil-call> inst nargs
This expression is probably misnamed, as it does not correspond to
function calls. @code{<glil-call>} invokes the VM instruction named
@var{inst}, noting that it is called with @var{nargs} stack arguments.
The arguments should be pushed on the stack already. What happens to
the stack afterwards depends on the instruction.
@end deftp
@deftp {Scheme Variable} <glil-mv-call> nargs ra
Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
corresponding to the multiple-value return address for the call. See
the notes on @code{mv-call} in @ref{Procedural Instructions}, for more
information.
@end deftp

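The parallel stack of live bindings kept for @code{<glil-bind>} and
@code{<glil-unbind>} can be modelled roughly as follows. This is a
sketch under simplifying assumptions: the forms are plain lists rather
than GLIL records, every non-binding form is counted as exactly one
instruction, binds and unbinds are assumed to be balanced, and
@code{toy-liveness} is a hypothetical name.

@lisp
;; FORMS is a list of toy forms.  (bind NAME ...) opens a liveness
;; extent, (unbind) closes the most recent one, and any other form
;; counts as one instruction.
(define (toy-liveness forms)
  (let loop ((forms forms) (offset 0) (stack '()) (extents '()))
    (cond
     ((null? forms) (reverse extents))
     ((eq? (caar forms) 'bind)
      ;; Push the bound names together with the current offset.
      (loop (cdr forms) offset
            (cons (cons (cdar forms) offset) stack)
            extents))
     ((eq? (caar forms) 'unbind)
      ;; Pop the top binding and record (NAMES START END).
      (let ((top (car stack)))
        (loop (cdr forms) offset (cdr stack)
              (cons (list (car top) (cdr top) offset) extents))))
     (else
      (loop (cdr forms) (+ offset 1) stack extents)))))

(toy-liveness '((bind x) (const 1)
                (bind y) (const 2) (unbind)
                (call add 2) (unbind)))
;; @result{} (((y) 1 2) ((x) 0 3))
@end lisp
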
Users may enter in GLIL at the REPL as well, though there is a bit
more bookkeeping to do. Since GLIL needs the set of variables to be
declared explicitly in a @code{<glil-program>}, GLIL expressions must
be wrapped in a thunk that declares the arity of the expression:

@example
scheme@@(guile-user)> ,language glil
Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on
  Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
@result{} 3
@end example

Just as in all of Guile's compilers, an environment is passed to the
GLIL-to-object code compiler, and one is returned as well, along with
the object code.

@node Assembly
@subsection Assembly

Assembly is an S-expression-based, human-readable representation of
the actual bytecodes that will be emitted for the VM. As such, it is a
useful intermediate language both for compilation and for
decompilation.

Besides the fact that it is not a record-based language, assembly
differs from GLIL in four main ways:

@itemize
@item Labels have been resolved to byte offsets in the program.
@item Constants inside procedures have either been expressed as inline
instructions or cached in object arrays.
@item Procedures with metadata (source location information, liveness
extents, procedure names, generic properties, etc.) have had their
metadata serialized out to thunks.
@item All expressions correspond directly to VM instructions -- i.e.,
there is no @code{<glil-lexical>} which can be a ref or a set.
@end itemize

Assembly is isomorphic to the bytecode that it compiles to. You can
compile to bytecode, then decompile back to assembly, and you have the
same assembly code.

The general form of assembly instructions is the following:

@lisp
(@var{inst} @var{arg} ...)
@end lisp

The @var{inst} names a VM instruction, and its @var{arg}s will be
embedded in the instruction stream. The easiest way to see assembly is
to play around with it at the REPL, as can be seen in this annotated
example:

@example
scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(load-program 0 0 0
  ()   ; Labels
  70   ; Length
  #f   ; Metadata
  (make-false)
  (make-false) ; object table for the returned lambda
  (nop)
  (nop) ; Alignment. Since assembly has already resolved its labels
  (nop) ; to offsets, and programs must be 8-byte aligned since their
  (nop) ; object code is mmap'd directly to structures, assembly
  (nop) ; has to have the alignment embedded in it.
  (nop)
  (load-program
    1
    0
    ()
    8
    (load-program 0 0 0 () 21 #f
      (load-symbol "x")  ; Name and liveness extent for @code{x}.
      (make-false)
      (make-int8:0) ; Some instruction+arg combinations
      (make-int8:0) ; have abbreviations.
      (make-int8 6)
      (list 0 5)
      (list 0 1)
      (make-eol)
      (list 0 2)
      (return))
    ; And here, the actual code.
    (local-ref 0)
    (local-ref 0)
    (add)
    (return)
    (nop)
    (nop))
  ; Return our new procedure.
  (return))
@end example

Of course you can switch the REPL to assembly and enter in assembly
S-expressions directly, like with other languages, though it is more
difficult, given that the length fields have to be correct.

@node Bytecode and Objcode
@subsection Bytecode and Objcode

Finally, the raw bytes. There are actually two different ``languages''
here, corresponding to two different ways to represent the bytes.

``Bytecode'' represents code as uniform byte vectors, useful for
structuring and destructuring code on the Scheme level. Bytecode is
the next step down from assembly:

@example
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
@result{} (load-program 0 0 0 () 6 #f
       (make-int8 32) (make-int8 10) (add) (return))
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
@end example

``Objcode'' is bytecode, but mapped directly to a C structure,
@code{struct scm_objcode}:

@example
struct scm_objcode @{
  scm_t_uint8 nargs;
  scm_t_uint8 nrest;
  scm_t_uint16 nlocs;
  scm_t_uint32 len;
  scm_t_uint32 metalen;
  scm_t_uint8 base[0];
@};
@end example

As one might imagine, objcode imposes a minimum length on the
bytecode. Also, the multibyte fields are in native endianness, which
makes objcode (and bytecode) system-dependent. Indeed, in the short
example above, all but the last 6 bytes were the program's header.

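To make the header layout concrete, the following sketch reads the
fixed-size fields of @code{struct scm_objcode} out of the bytecode
shown in the example above, using the @code{(rnrs bytevectors)}
module. It assumes that the example was produced on a little-endian
machine and that the fields are laid out without padding, at offsets
0, 1, 2, 4 and 8; the procedure name @code{toy-objcode-header} is
hypothetical.

@lisp
(use-modules (rnrs bytevectors))

;; The bytes printed by the (compile ... #:to 'bytecode) example.
(define example-bytecode
  (u8-list->bytevector
   '(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)))

;; Decode the leading fields into an association list.
(define (toy-objcode-header bv)
  `((nargs   . ,(bytevector-u8-ref bv 0))
    (nrest   . ,(bytevector-u8-ref bv 1))
    (nlocs   . ,(bytevector-u16-ref bv 2 (endianness little)))
    (len     . ,(bytevector-u32-ref bv 4 (endianness little)))
    (metalen . ,(bytevector-u32-ref bv 8 (endianness little)))))

(toy-objcode-header example-bytecode)
;; @result{} ((nargs . 0) (nrest . 0) (nlocs . 0) (len . 6) (metalen . 0))
@end lisp
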
Objcode also has a couple of important efficiency hacks. First,
objcode may be mapped directly from disk, allowing compiled code to be
loaded quickly, often from the system's disk cache, and shared among
multiple processes. Secondly, objcode may be embedded in other
objcode, allowing procedures to have the text of other procedures
inlined into their bodies, without the need for separate allocation of
the code. Of course, the objcode object itself does need to be
allocated.

Procedures related to objcode are defined in the @code{(system vm
objcode)} module.

720@deffn {Scheme Procedure} objcode? obj
721@deffnx {C Function} scm_objcode_p (obj)
722Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
723@end deffn
00ce5125 724
73643339
AW
@deffn {Scheme Procedure} bytecode->objcode bytecode
@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
Makes an objcode object from @var{bytecode}, which should be a
@code{u8vector}.
@end deffn

@deffn {Scheme Procedure} load-objcode file
@deffnx {C Function} scm_load_objcode (file)
Load object code from a file named @var{file}. The file will be mapped
into memory via @code{mmap}, so this is a very fast operation.

On disk, object code has a sixteen-byte cookie prepended to it, to
prevent accidental loading of arbitrary garbage.
@end deffn

@deffn {Scheme Procedure} write-objcode objcode file
@deffnx {C Function} scm_write_objcode (objcode, file)
Write object code out to a file, prepending the sixteen-byte cookie.
@end deffn

@deffn {Scheme Procedure} objcode->u8vector objcode
@deffnx {C Function} scm_objcode_to_u8vector (objcode)
Copy object code out to a @code{u8vector} for analysis by Scheme.
@end deffn

The following procedure is actually in @code{(system vm program)}, but
we'll mention it here:

@deffn {Scheme Procedure} make-program objcode objtable [free-vars=#f]
@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
Load up object code into a Scheme program. The resulting program will
have @var{objtable} as its object table, which should be a vector or
@code{#f}, and will capture the free variables from @var{free-vars}.
@end deffn

Object code from a file may be disassembled at the REPL via the
meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
Programs may be disassembled via @code{,disassemble}, abbreviated as
@code{,x}.

Compiling object code to the fake language, @code{value}, is performed
via loading objcode into a program, then executing that thunk with
respect to the compilation environment. Normally the environment
propagates through the compiler transparently, but users may specify
the compilation environment manually as well:

@deffn {Scheme Procedure} make-objcode-env module free-vars
Make an object code environment. @var{module} should be a Scheme
module, and @var{free-vars} should be a vector of free variables.
@code{#f} is also a valid object code environment.
@end deffn
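
As a small illustration of supplying the environment by hand, the
sketch below starts from Scheme source rather than from object code,
so the environment is simply a module. It assumes that @code{compile}
accepts an @code{#:env} keyword; @code{x} and @code{answer} are
made-up names.

@example
(use-modules (system base compile))

;; A top-level binding in the current module (hypothetical name).
(define x 32)

;; Compile down to the fake language `value', which runs the code.
;; The free variable `x' is looked up in the module given as #:env.
(define answer
  (compile '(+ x 10) #:to 'value #:env (current-module)))
;; answer is now 42
@end example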

@node Writing New High-Level Languages
@subsection Writing New High-Level Languages

In order to integrate a new language @var{lang} into Guile's compiler
system, one has to create the module @code{(language @var{lang} spec)}
containing the language definition and referencing the parser,
compiler and other routines processing it. The module hierarchy in
@code{(language brainfuck)} defines a very basic Brainfuck
implementation meant to serve as an easy-to-understand example of how
to do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
for more information about the Brainfuck language itself.


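For orientation, a language definition module might look roughly like
the sketch below. The language @code{toy} is hypothetical: it reads
ordinary S-expressions and ``compiles'' to Scheme by passing the
expression through unchanged. The sketch assumes that
@code{define-language} from @code{(system base language)} accepts
these keywords, that the reader is called with a port and an
environment, and that a compiler procedure takes an expression, an
environment and an options list and returns three values. Check these
assumptions against the Brainfuck modules of the Guile version at
hand.

@example
;; File language/toy/spec.scm, somewhere on the load path.
(define-module (language toy spec)
  #:use-module (system base language)
  #:export (toy))

;; Identity "compiler" from toy to Scheme: returns the expression
;; unchanged, the environment, and the continuation environment.
(define (compile-scheme exp env opts)
  (values exp env env))

(define-language toy
  #:title "Toy"
  #:reader (lambda (port env) (read port))
  #:compilers `((scheme . ,compile-scheme))
  #:printer write)
@end example

With such a module in place, an expression compiled with
@code{#:from 'toy} would be expected to pass through this compiler and
then continue down the tower from Scheme as usual.
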
@node Extending the Compiler
@subsection Extending the Compiler

At this point, we break with the impersonal tone of the rest of the
manual, and make an intervention. Admit it: if you've read this far
into the compiler internals manual, you are a junkie. Perhaps a course
at your university left you unsated, or perhaps you've always harbored
a sublimated desire to hack the holy of computer science holies: a
compiler. Well, you're in good company, and in a good position. Guile's
compiler needs your help.

There are many possible avenues for improving Guile's compiler.
Probably the most important improvement, speed-wise, will be some form
of native compilation, both just-in-time and ahead-of-time. This could
be done in many ways. Probably the easiest strategy would be to extend
the compiled procedure structure to include a pointer to a native code
vector, and compile from bytecode to native code at run-time after a
procedure is called a certain number of times.

The name of the game is a profiling-based harvest of the low-hanging
fruit, running programs of interest under a system-level profiler and
determining which improvements would give the most bang for the buck.
It's really getting to the point though that native compilation is the
next step.

The compiler also needs help at the top end, enhancing the Scheme that
it knows to also understand R6RS, and adding new high-level compilers.
We have JavaScript and Emacs Lisp mostly complete, but they could use
some love; Lua would be nice as well, but whatever language it is
that strikes your fancy would be welcome too.

Compilers are for hacking, not for admiring or for complaining about.
Get to it!