guile>
@end lisp
+@anchor{Memoization}
+@cindex Memoization
(For anyone wondering why the first @code{(do-main 4)} call above
generates lots more trace lines than the subsequent calls: these
examples also demonstrate how the Guile evaluator ``memoizes'' code.
to be impenetrable.
This section aims to pull back the veil from over Guile's compiler
-implementation, some reference to the wizard of oz FIXME.
+implementation, and pay attention to the small man behind the curtain.
-REFFIXME, if you're lost and you just wanted to know how to compile
-your .scm file.
+@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
+know how to compile your .scm file.
@menu
* Compiler Tower::
* GHIL::
* GLIL::
* Object Code::
+* Extending the Compiler::
@end menu
+FIXME: document the new repl somewhere?
+
@node Compiler Tower
@subsection Compiler Tower
Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
-resemble the VM instruction set (REFFIXME).
+resemble the VM instruction set (@pxref{Instruction Set}).
Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
high-level languages, new passes, or even different compilation
targets.
-lookup-language
-(lang xxx spec)
-
-(system-base-language)
-
-describe:
-
-(define-record <language>
- name
- title
- version
- reader
- printer
- (parser #f)
- (read-file #f)
- (compilers '())
- (evaluator #f))
-
-(define-macro (define-language name . spec)
-
-(lookup-compilation-order from to)
-
-language definition
-
-compiling from here to there
-
-the normal tower: scheme, ghil, glil, object code
-maybe from there serialized to disk
-or if at repl, brought back to life by compiling to ``value''
-
-compile-file defaults to compiling to objcode
-compile defaults to compiling to value
-
+Languages are registered in the @code{(system base language)} module:
+
+@example
+(use-modules (system base language))
+@end example
+
+They are registered with the @code{define-language} form.
+
+@deffn {Scheme Syntax} define-language @
+name title version reader printer @
+[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
+Define a language.
+
+This syntax defines a @code{#<language>} object, bound to @var{name}
+in the current environment. In addition, the language will be added to
+the global language set. For example, this is the language definition
+for Scheme:
+
+@example
+(define-language scheme
+ #:title "Guile Scheme"
+ #:version "0.5"
+ #:reader read
+ #:read-file read-file
+ #:compilers `((,ghil . ,translate))
+ #:evaluator (lambda (x module) (primitive-eval x))
+ #:printer write)
+@end example
+
+In this example, from @code{(language scheme spec)}, @code{read-file}
+reads expressions from a port and wraps them in a @code{begin} block.
+@end deffn
+
+The interesting thing about having languages defined this way is that
+they present a uniform interface to the read-eval-print loop. This
+allows the user to change the current language of the REPL:
+
+@example
+$ guile
+Guile Scheme interpreter 0.5 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
+
+Enter `,help' for help.
+scheme@@(guile-user)> ,language ghil
+Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
+
+Enter `,help' for help.
+ghil@@(guile-user)>
+@end example
+
+Languages can be looked up by name, as they were above.
+
+@deffn {Scheme Procedure} lookup-language name
+Looks up a language named @var{name}, autoloading it if necessary.
+
+Languages are autoloaded by looking for a variable named @var{name} in
+a module named @code{(language @var{name} spec)}.
+
+The language object will be returned, or @code{#f} if there does not
+exist a language with that name.
+@end deffn
+
+Defining languages this way allows us to programmatically determine
+the necessary steps for compiling code from one language to another.
+
+@deffn {Scheme Procedure} lookup-compilation-order from to
+Recursively traverses the set of languages to which @var{from} can
+compile, depth-first, and returns the first path that can transform
+@var{from} to @var{to}. Returns @code{#f} if no path is found.
+
+This function memoizes its results in a cache that is invalidated by
+subsequent calls to @code{define-language}, so it should be quite
+fast.
+@end deffn
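+
+The depth-first search described above can be sketched in a few lines
+of plain Scheme. This is only an illustration under stated
+assumptions, not Guile's implementation: the @code{edges} table and
+the @code{find-path} procedure are hypothetical names, languages are
+represented by bare symbols rather than @code{#<language>} objects,
+and the table is assumed to contain no cycles.
+
+@example
+;; Hypothetical table: each entry lists a language followed by the
+;; languages it can compile to directly.
+(define edges
+  '((scheme ghil)
+    (ghil glil)
+    (glil objcode)
+    (objcode value)
+    (value)))
+
+;; Return the first path from FROM to TO found depth-first, as a
+;; list of languages, or #f if no path exists.
+(define (find-path from to)
+  (if (eq? from to)
+      (list from)
+      (let loop ((targets (cdr (or (assq from edges) (list from)))))
+        (cond ((null? targets) #f)
+              ((find-path (car targets) to)
+               => (lambda (path) (cons from path)))
+              (else (loop (cdr targets)))))))
+
+(find-path 'scheme 'objcode)  ; => (scheme ghil glil objcode)
+(find-path 'glil 'scheme)     ; => #f
+@end example
+
+The sketch omits the result cache; a real implementation would also
+need to guard against cycles in the language graph.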
+
+There is a notion of a ``current language'', which is maintained in
+the @code{*current-language*} fluid. This language is normally Scheme,
+and may be rebound by the user. The runtime compilation interfaces
+(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
+and target languages.
+
+The normal tower of languages when compiling Scheme goes like this:
+
+@itemize
+@item Scheme, which we know and love
+@item Guile High Intermediate Language (GHIL)
+@item Guile Low Intermediate Language (GLIL)
+@item Object code
+@end itemize
+
+Object code may be serialized to disk directly, though it has a cookie
+and version prepended to the front. But when compiling Scheme at
+runtime, you want a Scheme value, e.g. a compiled procedure. For this
+reason, so as not to break the abstraction, Guile defines a fake
+language, @code{value}. Compiling to @code{value} loads the object
+code into a procedure, and wakes the sleeping giant.
+
+Perhaps this strangeness can be explained by example:
+@code{compile-file} defaults to compiling to object code, because it
+produces object code that has to live in the barren world outside the
+Guile runtime; but @code{compile} defaults to compiling to
+@code{value}, as its product re-enters the Guile world.
+
+Indeed, the process of compilation can circulate through these
+different worlds indefinitely, as shown by the following quine:
+
+@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
-quine
+@end example
@node The Scheme Compiler
@subsection The Scheme Compiler
Compiled code will effectively be a thunk, of no arguments, but
optionally closing over some number of variables (which should be
-captured via `make-closure' REFFIXME.
+captured via @code{make-closure}, @pxref{Loading Instructions}).
@node Object Code
@subsection Object Code
those externals. so you can recompile a closure at runtime, a trick
that goops uses.
+@node Extending the Compiler
+@subsection Extending the Compiler
+
+JIT compilation
+
+AOT compilation
+
+link to what dybvig did
+
+profiling
+
+startup time
of Scheme source code.
But while the evaluator is highly optimized and hand-tuned, and
-contains some extensive speed trickery (REFFIXME memoization), it
-still performs many needless computations during the course of
-evaluating an expression. For example, application of a function to
-arguments needlessly conses up the arguments in a list. Evaluation of
-an expression always has to figure out what the car of the expression
-is -- a procedure, a memoized form, or something else. All values have
-to be allocated on the heap. Et cetera.
+contains some extensive speed trickery (@pxref{Memoization}), it still
+performs many needless computations during the course of evaluating an
+expression. For example, application of a function to arguments
+needlessly conses up the arguments in a list. Evaluation of an
+expression always has to figure out what the car of the expression is
+-- a procedure, a memoized form, or something else. All values have to
+be allocated on the heap. Et cetera.
The solution to this problem is to compile the higher-level language,
Scheme, into a lower-level language for which all of the checks and
Note that this decision to implement a bytecode compiler does not
preclude native compilation. We can compile from bytecode to native
code at runtime, or even do ahead of time compilation. More
-possibilities are discussed in REFFIXME.
+possibilities are discussed in @ref{Extending the Compiler}.
@node VM Concepts
@subsection VM Concepts
In other architectures, the instruction pointer is sometimes called
the ``program counter'' (pc). This set of registers is pretty typical
for stack machines; their exact meanings in the context of Guile's VM
-is described below REFFIXME.
+are described in the next section.
A virtual machine executes by loading a compiled procedure, and
executing the object code associated with that procedure. Of course,
@item External link
This field is a reference to the list of heap-allocated variables
-associated with this frame. A discussion of heap versus stack
-allocation can be found in REFFIXME.
+associated with this frame. For a discussion of heap versus stack
+allocation, see @ref{Variables and the VM}.
@item Local variable @var{n}
Lambda-local variables that are allocated on the stack are all
thereafter only referenced on the heap.
@item Program
-This is the program being applied. Programs are discussed in REFFIXME!
+This is the program being applied. For more information on how
+programs are implemented, see @ref{VM Programs}.
@end table
@node Variables and the VM
a reference to any captured lexical variables, an object array, and
some metadata such as the procedure's arity, name, and documentation.
You can pick apart these pieces with the accessors in @code{(system vm
-program)}. REFFIXME, for a full API reference.
+program)}. @xref{Compiled Procedures}, for a full API reference.
@cindex object table
The object array of a compiled procedure, also known as the
@dfn{object table}, holds all Scheme objects whose values are known
not to change across invocations of the procedure: constant strings,
symbols, etc. The object table of a program is initialized right
-before a program is loaded with @code{load-program} REFFIXME.
+before a program is loaded with @code{load-program}.
+@xref{Loading Instructions}, for more information.
Variable objects are one such type of constant object: when a global
binding is defined, a variable object is associated to it and that
are resolved relative to the module that was current when the
procedure was created. This lookup occurs lazily, at the first time
the variable is actually referenced, and the location of the lookup is
-cached so that future references are very cheap. REFFIXME xref
-toplevel-ref, for more details.
+cached so that future references are very cheap. @xref{Environment
+Control Instructions}, for more details.
Then we see a reference to an external variable, corresponding to
@code{a}. The disassembler doesn't have enough information to give a
symbols and pairs associated with the metadata are only created if the
user asks for them.
-The format of the thunk's return value is specified in REFFIXME.
+For information on the format of the thunk's return value,
+see @ref{Compiled Procedures}.
@item Optionally, the program's object table, as a vector.
A program that does not reference toplevel bindings and does not use
@code{scm_apply}.
For compiled procedures, this instruction sets up a new stack frame,
-as described in REFFIXME, and then dispatches to the first instruction
-in the called procedure, relying on the called procedure to return one
-value to the newly-created continuation.
+as described in @ref{Stack Layout}, and then dispatches to the first
+instruction in the called procedure, relying on the called procedure
+to return one value to the newly-created continuation.
@end deffn
@deffn Instruction goto/args nargs
in addition to a single-value continuation.
The offset (a two-byte value) is an offset within the instruction
-stream; the multiple-value return address in the new frame (see
-frames REFFIXME) will be set to the normal return address plus this
-offset. Instructions at that offset will expect the top value of the
-stack to be the number of values, and below that values themselves,
-pushed separately.
+stream; the multiple-value return address in the new frame
+(@pxref{Stack Layout}) will be set to the normal return address plus
+this offset. Instructions at that offset will expect the top value of
+the stack to be the number of values, and below that values
+themselves, pushed separately.
@end deffn
@deffn Instruction return/values nvalues
this instruction.
If multiple values have been returned, the SCM value will be a
-multiple-values object (REFFIXME scm_values).
+multiple-values object (@pxref{Multiple Values}).
@end deffn
@deffn Instruction break