Merge commit '1e3fd6a0c81bb3e9900a93a9d1923cc788de0f99'
[bpt/guile.git] / doc / ref / compiler.texi
CommitLineData
8680d53b
AW
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
bdad1340 3@c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013, 2014
8680d53b
AW
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@node Compiling to the Virtual Machine
8@section Compiling to the Virtual Machine
9
67915ab0
AW
10Compilers! The word itself inspires excitement and awe, even among
11experienced practitioners. But a compiler is just a program: an
12eminently hackable thing. This section aims to to describe Guile's
13compiler in such a way that interested Scheme hackers can feel
14comfortable reading and extending it.
00ce5125 15
e3ba263d 16@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
98850fd7 17know how to compile your @code{.scm} file.
00ce5125
AW
18
19@menu
20* Compiler Tower::
21* The Scheme Compiler::
81fd3152 22* Tree-IL::
67915ab0
AW
23* Continuation-Passing Style::
24* Bytecode::
e63d888e 25* Writing New High-Level Languages::
e3ba263d 26* Extending the Compiler::
00ce5125
AW
27@end menu
28
29@node Compiler Tower
30@subsection Compiler Tower
31
d297e544
AW
32Guile's compiler is quite simple -- its @emph{compilers}, to put it more
33accurately. Guile defines a tower of languages, starting at Scheme and
34progressively simplifying down to languages that resemble the VM
35instruction set (@pxref{Instruction Set}).
00ce5125
AW
36
37Each language knows how to compile to the next, so each step is simple
67915ab0
AW
38and understandable. Furthermore, this set of languages is not hardcoded
39into Guile, so it is possible for the user to add new high-level
40languages, new passes, or even different compilation targets.
00ce5125 41
e3ba263d
AW
42Languages are registered in the module, @code{(system base language)}:
43
44@example
45(use-modules (system base language))
46@end example
47
48They are registered with the @code{define-language} form.
49
50@deffn {Scheme Syntax} define-language @
994d87be
BT
51 [#:name] [#:title] [#:reader] [#:printer] @
52 [#:parser=#f] [#:compilers='()] @
53 [#:decompilers='()] [#:evaluator=#f] @
54 [#:joiner=#f] [#:for-humans?=#t] @
55 [#:make-default-environment=make-fresh-user-module]
e3ba263d
AW
56Define a language.
57
67915ab0
AW
58This syntax defines a @code{<language>} object, bound to @var{name} in
59the current environment. In addition, the language will be added to the
60global language set. For example, this is the language definition for
61Scheme:
e3ba263d
AW
62
63@example
64(define-language scheme
41e64dd7
AW
65 #:title "Scheme"
66 #:reader (lambda (port env) ...)
98850fd7 67 #:compilers `((tree-il . ,compile-tree-il))
81fd3152 68 #:decompilers `((tree-il . ,decompile-tree-il))
41e64dd7
AW
69 #:evaluator (lambda (x module) (primitive-eval x))
70 #:printer write
71 #:make-default-environment (lambda () ...))
e3ba263d 72@end example
e3ba263d
AW
73@end deffn
74
75The interesting thing about having languages defined this way is that
67915ab0 76they present a uniform interface to the read-eval-print loop. This
e3ba263d
AW
77allows the user to change the current language of the REPL:
78
79@example
81fd3152 80scheme@@(guile-user)> ,language tree-il
41e64dd7
AW
81Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
82tree-il@@(guile-user)> ,L scheme
83Happy hacking with Scheme! To switch back, type `,L tree-il'.
84scheme@@(guile-user)>
e3ba263d
AW
85@end example
86
87Languages can be looked up by name, as they were above.
88
89@deffn {Scheme Procedure} lookup-language name
90Looks up a language named @var{name}, autoloading it if necessary.
91
92Languages are autoloaded by looking for a variable named @var{name} in
93a module named @code{(language @var{name} spec)}.
94
95The language object will be returned, or @code{#f} if there does not
96exist a language with that name.
97@end deffn
98
99Defining languages this way allows us to programmatically determine
100the necessary steps for compiling code from one language to another.
101
102@deffn {Scheme Procedure} lookup-compilation-order from to
103Recursively traverses the set of languages to which @var{from} can
104compile, depth-first, and return the first path that can transform
105@var{from} to @var{to}. Returns @code{#f} if no path is found.
106
107This function memoizes its results in a cache that is invalidated by
108subsequent calls to @code{define-language}, so it should be quite
109fast.
110@end deffn
111
5745de91
AW
112There is a notion of a ``current language'', which is maintained in the
113@code{current-language} parameter, defined in the core @code{(guile)}
67915ab0
AW
114module. This language is normally Scheme, and may be rebound by the
115user. The run-time compilation interfaces
e3ba263d
AW
116(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
117and target languages.
118
119The normal tower of languages when compiling Scheme goes like this:
120
121@itemize
41e64dd7 122@item Scheme
81fd3152 123@item Tree Intermediate Language (Tree-IL)
67915ab0 124@item Continuation-Passing Style (CPS)
81fd3152 125@item Bytecode
e3ba263d
AW
126@end itemize
127
67915ab0
AW
128As discussed before (@pxref{Object File Format}), bytecode is in ELF
129format, ready to be serialized to disk. But when compiling Scheme at
130run time, you want a Scheme value: for example, a compiled procedure.
131For this reason, so as not to break the abstraction, Guile defines a
132fake language at the bottom of the tower:
81fd3152
AW
133
134@itemize
135@item Value
136@end itemize
137
d297e544
AW
138Compiling to @code{value} loads the bytecode into a procedure, turning
139cold bytes into warm code.
e3ba263d
AW
140
141Perhaps this strangeness can be explained by example:
d297e544 142@code{compile-file} defaults to compiling to bytecode, because it
e3ba263d 143produces object code that has to live in the barren world outside the
67915ab0
AW
144Guile runtime; but @code{compile} defaults to compiling to @code{value},
145as its product re-enters the Guile world.
146
147@c FIXME: This doesn't work anymore :( Should we add some kind of
148@c special GC pass, or disclaim this kind of code, or what?
e3ba263d
AW
149
150Indeed, the process of compilation can circulate through these
151different worlds indefinitely, as shown by the following quine:
152
153@example
00ce5125 154((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
e3ba263d 155@end example
00ce5125
AW
156
157@node The Scheme Compiler
158@subsection The Scheme Compiler
159
7081d4f9 160The job of the Scheme compiler is to expand all macros and all of Scheme
67915ab0
AW
161to its most primitive expressions. The definition of ``primitive
162expression'' is given by the inventory of constructs provided by
163Tree-IL, the target language of the Scheme compiler: procedure calls,
164conditionals, lexical references, and so on. This is described more
165fully in the next section.
81fd3152
AW
166
167The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
67915ab0 168that it is completely implemented by the macro expander. Since the
81fd3152
AW
169macro expander has to run over all of the source code already in order
170to expand macros, it might as well do the analysis at the same time,
171producing Tree-IL expressions directly.
172
67915ab0
AW
173Because this compiler is actually the macro expander, it is extensible.
174Any macro which the user writes becomes part of the compiler.
81fd3152
AW
175
176The Scheme-to-Tree-IL expander may be invoked using the generic
177@code{compile} procedure:
178
179@lisp
180(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
181@result{}
67915ab0 182#<tree-il (call (toplevel +) (const 1) (const 2))>
81fd3152
AW
183@end lisp
184
67915ab0
AW
185@code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
186equivalent to calling the macro expander as @code{(macroexpand @var{foo}
187'c '(compile load eval))}. @xref{Macro Expansion}.
188@code{compile-tree-il}, the procedure dispatched by @code{compile} to
189@code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
190it conform to the general form of compiler procedures in Guile's
191language tower.
81fd3152 192
98850fd7
AW
193Compiler procedures take three arguments: an expression, an
194environment, and a keyword list of options. They return three values:
195the compiled expression, the corresponding environment for the target
196language, and a ``continuation environment''. The compiled expression
197and environment will serve as input to the next language's compiler.
198The ``continuation environment'' can be used to compile another
199expression from the same source language within the same module.
81fd3152
AW
200
201For example, you might compile the expression, @code{(define-module
202(foo))}. This will result in a Tree-IL expression and environment. But
203if you compiled a second expression, you would want to take into
204account the compile-time effect of compiling the previous expression,
205which puts the user in the @code{(foo)} module. That is purpose of the
206``continuation environment''; you would pass it as the environment
207when compiling the subsequent expression.
208
41e64dd7
AW
209For Scheme, an environment is a module. By default, the @code{compile}
210and @code{compile-file} procedures compile in a fresh module, such
211that bindings and macros introduced by the expression being compiled
212are isolated:
1ebe6a63
LC
213
214@example
215(eq? (current-module) (compile '(current-module)))
216@result{} #f
217
218(compile '(define hello 'world))
219(defined? 'hello)
220@result{} #f
221
222(define / *)
223(eq? (compile '/) /)
224@result{} #f
225@end example
226
227Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
228@code{current-reader}}) are isolated:
229
230@example
231(compile '(fluid-set! current-reader (lambda args 'fail)))
232(fluid-ref current-reader)
233@result{} #f
234@end example
235
236Nevertheless, having the compiler and @dfn{compilee} share the same name
237space can be achieved by explicitly passing @code{(current-module)} as
238the compilation environment:
239
240@example
241(define hello 'world)
242(compile 'hello #:env (current-module))
243@result{} world
244@end example
245
81fd3152
AW
246@node Tree-IL
247@subsection Tree-IL
00ce5125 248
81fd3152 249Tree Intermediate Language (Tree-IL) is a structured intermediate
c850030f
AW
250language that is close in expressive power to Scheme. It is an
251expanded, pre-analyzed Scheme.
252
81fd3152
AW
253Tree-IL is ``structured'' in the sense that its representation is
254based on records, not S-expressions. This gives a rigidity to the
255language that ensures that compiling to a lower-level language only
41e64dd7
AW
256requires a limited set of transformations. For example, the Tree-IL
257type @code{<const>} is a record type with two fields, @code{src} and
258@code{exp}. Instances of this type are created via @code{make-const}.
259Fields of this type are accessed via the @code{const-src} and
260@code{const-exp} procedures. There is also a predicate, @code{const?}.
261@xref{Records}, for more information on records.
81fd3152
AW
262
263@c alpha renaming
264
265All Tree-IL types have a @code{src} slot, which holds source location
266information for the expression. This information, if present, will be
267residualized into the compiled object code, allowing backtraces to
268show source information. The format of @code{src} is the same as that
269returned by Guile's @code{source-properties} function. @xref{Source
270Properties}, for more information.
271
272Although Tree-IL objects are represented internally using records,
273there is also an equivalent S-expression external representation for
ecb87335 274each kind of Tree-IL. For example, the S-expression representation
81fd3152 275of @code{#<const src: #f exp: 3>} expression would be:
c850030f
AW
276
277@example
81fd3152 278(const 3)
c850030f
AW
279@end example
280
81fd3152 281Users may program with this format directly at the REPL:
c850030f
AW
282
283@example
81fd3152 284scheme@@(guile-user)> ,language tree-il
41e64dd7 285Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
67915ab0 286tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
c850030f
AW
287@result{} 42
288@end example
289
81fd3152
AW
290The @code{src} fields are left out of the external representation.
291
98850fd7
AW
292One may create Tree-IL objects from their external representations via
293calling @code{parse-tree-il}, the reader for Tree-IL. If any source
294information is attached to the input S-expression, it will be
295propagated to the resulting Tree-IL expressions. This is probably the
296easiest way to compile to Tree-IL: just make the appropriate external
297representations in S-expression format, and let @code{parse-tree-il}
298take care of the rest.
299
81fd3152
AW
300@deftp {Scheme Variable} <void> src
301@deftpx {External Representation} (void)
67915ab0 302An empty expression. In practice, equivalent to Scheme's @code{(if #f
81fd3152
AW
303#f)}.
304@end deftp
d297e544 305
81fd3152
AW
306@deftp {Scheme Variable} <const> src exp
307@deftpx {External Representation} (const @var{exp})
308A constant.
309@end deftp
d297e544 310
81fd3152
AW
311@deftp {Scheme Variable} <primitive-ref> src name
312@deftpx {External Representation} (primitive @var{name})
67915ab0
AW
313A reference to a ``primitive''. A primitive is a procedure that, when
314compiled, may be open-coded. For example, @code{cons} is usually
81fd3152
AW
315recognized as a primitive, so that it compiles down to a single
316instruction.
317
318Compilation of Tree-IL usually begins with a pass that resolves some
319@code{<module-ref>} and @code{<toplevel-ref>} expressions to
67915ab0 320@code{<primitive-ref>} expressions. The actual compilation pass has
7081d4f9
AW
321special cases for calls to certain primitives, like @code{apply} or
322@code{cons}.
81fd3152 323@end deftp
d297e544 324
81fd3152
AW
325@deftp {Scheme Variable} <lexical-ref> src name gensym
326@deftpx {External Representation} (lexical @var{name} @var{gensym})
67915ab0 327A reference to a lexically-bound variable. The @var{name} is the
81fd3152
AW
328original name of the variable in the source program. @var{gensym} is a
329unique identifier for this variable.
330@end deftp
d297e544 331
81fd3152
AW
332@deftp {Scheme Variable} <lexical-set> src name gensym exp
333@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
334Sets a lexically-bound variable.
335@end deftp
d297e544 336
81fd3152
AW
337@deftp {Scheme Variable} <module-ref> src mod name public?
338@deftpx {External Representation} (@@ @var{mod} @var{name})
339@deftpx {External Representation} (@@@@ @var{mod} @var{name})
340A reference to a variable in a specific module. @var{mod} should be
679cceed 341the name of the module, e.g.@: @code{(guile-user)}.
81fd3152
AW
342
343If @var{public?} is true, the variable named @var{name} will be looked
344up in @var{mod}'s public interface, and serialized with @code{@@};
345otherwise it will be looked up among the module's private bindings,
346and is serialized with @code{@@@@}.
347@end deftp
d297e544 348
81fd3152
AW
349@deftp {Scheme Variable} <module-set> src mod name public? exp
350@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
351@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
352Sets a variable in a specific module.
353@end deftp
d297e544 354
81fd3152
AW
355@deftp {Scheme Variable} <toplevel-ref> src name
356@deftpx {External Representation} (toplevel @var{name})
357References a variable from the current procedure's module.
358@end deftp
d297e544 359
81fd3152
AW
360@deftp {Scheme Variable} <toplevel-set> src name exp
361@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
362Sets a variable in the current procedure's module.
363@end deftp
d297e544 364
81fd3152
AW
365@deftp {Scheme Variable} <toplevel-define> src name exp
366@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
367Defines a new top-level variable in the current procedure's module.
368@end deftp
d297e544 369
81fd3152
AW
370@deftp {Scheme Variable} <conditional> src test then else
371@deftpx {External Representation} (if @var{test} @var{then} @var{else})
ca445ba5 372A conditional. Note that @var{else} is not optional.
c850030f 373@end deftp
d297e544 374
7081d4f9
AW
375@deftp {Scheme Variable} <call> src proc args
376@deftpx {External Representation} (call @var{proc} . @var{args})
ca445ba5 377A procedure call.
c850030f 378@end deftp
d297e544 379
a881a4ae
AW
380@deftp {Scheme Variable} <primcall> src name args
381@deftpx {External Representation} (primcall @var{name} . @var{args})
382A call to a primitive. Equivalent to @code{(call (primitive @var{name})
383. @var{args})}. This construct is often more convenient to generate and
384analyze than @code{<call>}.
385
386As part of the compilation process, instances of @code{(call (primitive
387@var{name}) . @var{args})} are transformed into primcalls.
388@end deftp
d297e544 389
67915ab0
AW
390@deftp {Scheme Variable} <seq> src head tail
391@deftpx {External Representation} (seq @var{head} @var{tail})
392A sequence. The semantics is that @var{head} is evaluated first, and
393any resulting values are ignored. Then @var{tail} is evaluated, in tail
394position.
c850030f 395@end deftp
d297e544 396
41e64dd7
AW
397@deftp {Scheme Variable} <lambda> src meta body
398@deftpx {External Representation} (lambda @var{meta} @var{body})
67915ab0
AW
399A closure. @var{meta} is an association list of properties for the
400procedure. @var{body} is a single Tree-IL expression of type
401@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
41e64dd7
AW
402an alternate clause, this makes Tree-IL's @code{<lambda>} have the
403expressiveness of Scheme's @code{case-lambda}.
404@end deftp
d297e544 405
41e64dd7
AW
406@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
407@deftpx {External Representation} @
408 (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
409 @var{body})@
410 [@var{alternate}])
67915ab0 411One clause of a @code{case-lambda}. A @code{lambda} expression in
41e64dd7
AW
412Scheme is treated as a @code{case-lambda} with one clause.
413
414@var{req} is a list of the procedure's required arguments, as symbols.
415@var{opt} is a list of the optional arguments, or @code{#f} if there
416are no optional arguments. @var{rest} is the name of the rest
417argument, or @code{#f}.
418
419@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
420(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
421keyword corresponding to the argument named @var{name}, and whose
67915ab0
AW
422corresponding gensym is @var{var}. @var{inits} are tree-il expressions
423corresponding to all of the optional and keyword arguments, evaluated to
424bind variables whose value is not supplied by the procedure caller.
41e64dd7
AW
425Each @var{init} expression is evaluated in the lexical context of
426previously bound variables, from left to right.
427
428@var{gensyms} is a list of gensyms corresponding to all arguments:
429first all of the required arguments, then the optional arguments if
430any, then the rest argument if any, then all of the keyword arguments.
431
67915ab0 432@var{body} is the body of the clause. If the procedure is called with
41e64dd7 433an appropriate number of arguments, @var{body} is evaluated in tail
67915ab0 434position. Otherwise, if there is an @var{alternate}, it should be a
41e64dd7 435@code{<lambda-case>} expression, representing the next clause to try.
64de6db5 436If there is no @var{alternate}, a wrong-number-of-arguments error is
41e64dd7
AW
437signaled.
438@end deftp
d297e544 439
41e64dd7
AW
440@deftp {Scheme Variable} <let> src names gensyms vals exp
441@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
67915ab0
AW
442Lexical binding, like Scheme's @code{let}. @var{names} are the original
443binding names, @var{gensyms} are gensyms corresponding to the
81fd3152
AW
444@var{names}, and @var{vals} are Tree-IL expressions for the values.
445@var{exp} is a single Tree-IL expression.
446@end deftp
d297e544 447
935c7aca 448@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
172988ee
AW
449@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
450@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
81fd3152 451A version of @code{<let>} that creates recursive bindings, like
935c7aca 452Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
81fd3152 453@end deftp
d297e544 454
67915ab0
AW
455@deftp {Scheme Variable} <prompt> escape-only? tag body handler
456@deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
457A dynamic prompt. Instates a prompt named @var{tag}, an expression,
41e64dd7 458during the dynamic extent of the execution of @var{body}, also an
67915ab0
AW
459expression. If an abort occurs to this prompt, control will be passed
460to @var{handler}, also an expression, which should be a procedure. The
461first argument to the handler procedure will be the captured
462continuation, followed by all of the values passed to the abort. If
463@var{escape-only?} is true, the handler should be a @code{<lambda>} with
464a single @code{<lambda-case>} body expression with no optional or
465keyword arguments, and no alternate, and whose first argument is
466unreferenced. @xref{Prompts}, for more information.
41e64dd7 467@end deftp
d297e544 468
41e64dd7
AW
469@deftp {Scheme Variable} <abort> tag args tail
470@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
471An abort to the nearest prompt with the name @var{tag}, an expression.
472@var{args} should be a list of expressions to pass to the prompt's
473handler, and @var{tail} should be an expression that will evaluate to
67915ab0 474a list of additional arguments. An abort will save the partial
41e64dd7
AW
475continuation, which may later be reinstated, resulting in the
476@code{<abort>} expression evaluating to some number of values.
477@end deftp
81fd3152 478
98850fd7
AW
479There are two Tree-IL constructs that are not normally produced by
480higher-level compilers, but instead are generated during the
481source-to-source optimization and analysis passes that the Tree-IL
67915ab0
AW
482compiler does. Users should not generate these expressions directly,
483unless they feel very clever, as the default analysis pass will generate
484them as necessary.
98850fd7 485
41e64dd7
AW
486@deftp {Scheme Variable} <let-values> src names gensyms exp body
487@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
98850fd7
AW
488Like Scheme's @code{receive} -- binds the values returned by
489evaluating @code{exp} to the @code{lambda}-like bindings described by
67915ab0 490@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
98850fd7 491
7081d4f9 492@code{<let-values>} is an optimization of a @code{<call>} to the
98850fd7
AW
493primitive, @code{call-with-values}.
494@end deftp
d297e544 495
41e64dd7
AW
496@deftp {Scheme Variable} <fix> src names gensyms vals body
497@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
98850fd7
AW
498Like @code{<letrec>}, but only for @var{vals} that are unset
499@code{lambda} expressions.
500
501@code{fix} is an optimization of @code{letrec} (and @code{let}).
502@end deftp
81fd3152 503
67915ab0
AW
504Tree-IL is a convenient compilation target from source languages. It
505can be convenient as a medium for optimization, though CPS is usually
506better. The strength of Tree-IL is that it does not fix order of
507evaluation, so it makes some code motion a bit easier.
00ce5125 508
67915ab0 509Optimization passes performed on Tree-IL currently include:
00ce5125 510
73643339 511@itemize
67915ab0
AW
512@item Open-coding (turning toplevel-refs into primitive-refs,
513and calls to primitives to primcalls)
514@item Partial evaluation (comprising inlining, copy propagation, and
515constant folding)
516@item Common subexpression elimination (CSE)
73643339
AW
517@end itemize
518
67915ab0
AW
519In the future, we will move the CSE pass to operate over the lower-level
520CPS language.
73643339 521
67915ab0
AW
522@node Continuation-Passing Style
523@subsection Continuation-Passing Style
73643339 524
67915ab0 525@cindex CPS
58b23156
AW
526Continuation-passing style (CPS) is Guile's principal intermediate
527language, bridging the gap between languages for people and languages
528for machines. CPS gives a name to every part of a program: every
529control point, and every intermediate value. This makes it an excellent
530medium for reasoning about programs, which is the principal job of a
531compiler.
532
533@menu
534* An Introduction to CPS::
535* CPS in Guile::
bdad1340 536* Building CPS::
69dc8268 537* Compiling CPS::
58b23156
AW
538@end menu
539
540@node An Introduction to CPS
541@subsubsection An Introduction to CPS
542
69dc8268 543Consider the following Scheme expression:
58b23156
AW
544
545@lisp
546(begin
547 (display "The sum of 32 and 10 is: ")
548 (display 42)
549 (newline))
550@end lisp
551
69dc8268
AW
552Let us identify all of the sub-expressions in this expression,
553annotating them with unique labels:
58b23156
AW
554
555@lisp
556(begin
557 (display "The sum of 32 and 10 is: ")
558 |k1 k2
559 k0
560 (display 42)
561 |k4 k5
562 k3
563 (newline))
564 |k7
565 k6
566@end lisp
567
69dc8268
AW
568Each of these labels identifies a point in a program. One label may be
569the continuation of another label. For example, the continuation of
570@code{k7} is @code{k6}. This is because after evaluating the value of
571@code{newline}, performed by the expression labelled @code{k7}, we
bdad1340 572continue to apply it in @code{k6}.
58b23156 573
69dc8268
AW
574Which expression has @code{k0} as its continuation? It is either the
575expression labelled @code{k1} or the expression labelled @code{k2}.
576Scheme does not have a fixed order of evaluation of arguments, though it
577does guarantee that they are evaluated in some order. Unlike general
578Scheme, continuation-passing style makes evaluation order explicit. In
579Guile, this choice is made by the higher-level language compilers.
58b23156
AW
580
581Let us assume a left-to-right evaluation order. In that case the
582continuation of @code{k1} is @code{k2}, and the continuation of
583@code{k2} is @code{k0}.
584
585With this example established, we are ready to give an example of CPS in
586Scheme:
587
69dc8268 588@smalllisp
58b23156
AW
589(lambda (ktail)
590 (let ((k1 (lambda ()
591 (let ((k2 (lambda (proc)
592 (let ((k0 (lambda (arg0)
593 (proc k4 arg0))))
594 (k0 "The sum of 32 and 10 is: ")))))
595 (k2 display))))
596 (k4 (lambda _
597 (let ((k5 (lambda (proc)
598 (let ((k3 (lambda (arg0)
599 (proc k7 arg0))))
600 (k3 42)))))
601 (k5 display))))
602 (k7 (lambda _
603 (let ((k6 (lambda (proc)
604 (proc ktail))))
605 (k6 newline)))))
606 (k1))
69dc8268 607@end smalllisp
58b23156
AW
608
609Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
610is by nature much more verbose than ``direct-style'' intermediate
69dc8268
AW
611languages like Tree-IL. At the same time, CPS is simpler than full
612Scheme, because it makes things more explicit.
58b23156
AW
613
614In the original program, the expression labelled @code{k0} is in effect
69dc8268
AW
615context. Any values it returns are ignored. In Scheme, this fact is
616implicit. In CPS, we can see it explicitly by noting that its
617continuation, @code{k4}, takes any number of values and ignores them.
618Compare this to @code{k2}, which takes a single value; in this way we
619can say that @code{k1} is in a ``value'' context. Likewise @code{k6} is
620in tail context with respect to the expression as a whole, because its
621continuation is the tail continuation, @code{ktail}. CPS makes these
622details manifest, and gives them names.
bdad1340 623
58b23156
AW
624@node CPS in Guile
625@subsubsection CPS in Guile
626
bdad1340
AW
627Guile's CPS language is composed of @dfn{terms}, @dfn{expressions},
628and @dfn{continuations}.
58b23156 629
bdad1340
AW
630A term can either evaluate an expression and pass the resulting values
631to some continuation, or it can declare local continuations and contain
632a sub-term in the scope of those continuations.
633
634@deftp {CPS Term} $continue k src exp
635Evaluate the expression @var{exp} and pass the resulting values (if any)
636to the continuation labelled @var{k}. The source information associated
637with the expression may be found in @var{src}, which is either an alist
638as in @code{source-properties} or is @code{#f} if there is no associated
639source.
58b23156
AW
640@end deftp
641
bdad1340
AW
642@deftp {CPS Term} $letk conts body
643Bind @var{conts}, a list of continuations (@code{$cont} instances), in
644the scope of the sub-term @var{body}. The continuations are mutually
645recursive.
58b23156
AW
646@end deftp
647
bdad1340
AW
648Additionally, the early stages of CPS allow for a set of mutually
649recursive functions to be declared as a term. This @code{$letrec} type
650is like Tree-IL's @code{<fix>}. The contification pass will attempt to
651transform the functions declared in a @code{$letrec} into local
652continuations. Any remaining functions are later lowered to @code{$fun}
653expressions.
58b23156 654
bdad1340
AW
655@deftp {CPS Term} $letrec names syms funs body
656Declare the mutually recursive set of functions denoted by @var{names},
657@var{syms}, and @var{funs} within the sub-term @var{body}. @var{names}
658and @var{syms} are lists of symbols, and @var{funs} is a list of
507c58b2 659@code{$fun} values. @var{syms} are globally unique.
58b23156
AW
660@end deftp
661
bdad1340
AW
662Here is an inventory of the kinds of expressions in Guile's CPS
663language. Recall that all expressions are wrapped in a @code{$continue}
664term which specifies their continuation.
58b23156 665
bdad1340
AW
666@deftp {CPS Expression} $void
667Continue with the unspecified value.
58b23156 668@end deftp
bdad1340
AW
669
670@deftp {CPS Expression} $const val
671Continue with the constant value @var{val}.
58b23156 672@end deftp
bdad1340
AW
673
674@deftp {CPS Expression} $prim name
675Continue with the procedure that implements the primitive operation
676named by @var{name}.
58b23156 677@end deftp
bdad1340
AW
678
679@deftp {CPS Expression} $fun src meta free body
680Continue with a procedure. @var{src} identifies the source information
681for the procedure declaration, and @var{meta} is the metadata alist as
682described above in Tree-IL's @code{<lambda>}. @var{free} is a list of
683free variables accessed by the procedure. Early CPS uses an empty list
684for @var{free}; only after closure conversion is it correctly populated.
685Finally, @var{body} is the @code{$kentry} @code{$cont} of the procedure
686entry.
58b23156 687@end deftp
bdad1340
AW
688
689@deftp {CPS Expression} $call proc args
b3ae2b50 690@deftpx {CPS Expression} $callk label proc args
bdad1340
AW
691Call @var{proc} with the arguments @var{args}, and pass all values to
692the continuation. @var{proc} and the elements of the @var{args} list
693should all be variable names. The continuation identified by the term's
694@var{k} should be a @code{$kreceive} or a @code{$ktail} instance.
b3ae2b50
AW
695
696@code{$callk} is for the case where the call target is known to be in
697the same compilation unit. @var{label} should be some continuation
698label, though it need not be in scope. In this case the @var{proc} is
699simply an additional argument, since it is not used to determine the
700call target at run-time.
58b23156 701@end deftp
bdad1340
AW
702
703@deftp {CPS Expression} $primcall name args
704Perform the primitive operation identified by @code{name}, a well-known
705symbol, passing it the arguments @var{args}, and pass all resulting
706values to the continuation. The set of available primitives includes
707all primitives known to Tree-IL and then some more; see the source code
708for details.
58b23156
AW
709@end deftp
710
bdad1340
AW
711@deftp {CPS Expression} $values args
712Pass the values named by the list @var{args} to the continuation.
713@end deftp
58b23156 714
bdad1340
AW
715@deftp {CPS Expression} $prompt escape? tag handler
716Push a prompt on the stack identified by the variable name @var{tag},
717which may be escape-only if @var{escape?} is true, and continue with
718zero values. If the body aborts to this prompt, control will proceed at
719the continuation labelled @var{handler}, which should be a
720@code{$kreceive} continuation. Prompts are later popped by
721@code{pop-prompt} primcalls.
722@end deftp
58b23156 723
bdad1340
AW
724The remaining element of the CPS language in Guile is the continuation.
725In CPS, all continuations have unique labels. Since this aspect is
726common to all continuation types, all continuations are contained in a
727@code{$cont} instance:
58b23156 728
bdad1340
AW
729@deftp {CPS Continuation Wrapper} $cont k cont
730Declare a continuation labelled @var{k}. All references to the
731continuation will use this label.
732@end deftp
58b23156 733
bdad1340
AW
734The most common kind of continuation binds some number of values, and
735then evaluates a sub-term. @code{$kargs} is this kind of simple
736@code{lambda}.
58b23156 737
bdad1340
AW
738@deftp {CPS Continuation} $kargs names syms body
739Bind the incoming values to the variables @var{syms}, with original
740names @var{names}, and then evaluate the sub-term @var{body}.
741@end deftp
58b23156 742
bdad1340
AW
743Variable names (the names in the @var{syms} of a @code{$kargs}) should
744be globally unique, and also disjoint from continuation labels. To bind
745a value to a variable and then evaluate some term, you would continue
746with the value to a @code{$kargs} that declares one variable. The bound
747value would then be available for use within the body of the
748@code{$kargs}.
58b23156 749
bdad1340
AW
750@deftp {CPS Continuation} $kif kt kf
751Receive one value. If it is true for the purposes of Scheme, branch to
752the continuation labelled @var{kt}, passing no values; otherwise, branch
753to @var{kf}.
754@end deftp
755
756For internal reasons, only certain terms may continue to a @code{$kif}.
757Compiling @code{$kif} avoids allocating space for the test variable, so
758it needs to be preceded by expressions that can test-and-branch without
759temporary values. In practice this condition is true for
760@code{$primcall}s to @code{null?}, @code{=}, and similar primitives that
761have corresponding @code{br-if-@var{foo}} VM operations; see the source
762code for full details. When in doubt, bind the test expression to a
763variable, and continue to the @code{$kif} with a @code{$values}
764expression. The optimizer should elide the @code{$values} if it is not
765needed.
58b23156 766
bdad1340
AW
767Calls out to other functions need to be wrapped in a @code{$kreceive}
768continuation in order to adapt the returned values to their uses in the
769calling function, if any.
770
771@deftp {CPS Continuation} $kreceive arity k
772Receive values on the stack. Parse them according to @var{arity}, and
507c58b2 773then proceed with the parsed values to the @code{$kargs} continuation
bdad1340
AW
774labelled @var{k}. As a limitation specific to @code{$kreceive},
775@var{arity} may only contain required and rest arguments.
776@end deftp
777
778@code{$arity} is a helper data structure used by @code{$kreceive} and
779also by @code{$kclause}, described below.
780
781@deftp {CPS Data} $arity req opt rest kw allow-other-keys?
782A data type declaring an arity. @var{req} and @var{opt} are lists of
783source names of required and optional arguments, respectively.
784@var{rest} is either the source name of the rest variable, or @code{#f}
785if this arity does not accept additional values. @var{kw} is a list of
786the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing
787the keyword arguments. @var{allow-other-keys?} is true if other keyword
788arguments are allowed and false otherwise.
789
790Note that all of these names with the exception of the @var{var}s in the
791@var{kw} list are source names, not unique variable names.
792@end deftp
793
794Additionally, there are three specific kinds of continuations that can
795only be declared at function entries.
796
797@deftp {CPS Continuation} $kentry self tail clauses
798Declare a function entry. @var{self} is a variable bound to the
799procedure being called, and which may be used for self-references.
800@var{tail} declares the @code{$cont} wrapping the @code{$ktail} for this
801function, corresponding to the function's tail continuation.
802@var{clauses} is a list of @code{$kclause} @code{$cont} instances.
803@end deftp
804
805@deftp {CPS Continuation} $ktail
806A tail continuation.
807@end deftp
808
809@deftp {CPS Continuation} $kclause arity cont
810A clause of a function with a given arity. Applications of a function
811with a compatible set of actual arguments will continue to @var{cont}, a
812@code{$kargs} @code{$cont} instance representing the clause body.
813@end deftp
814
815
816@node Building CPS
817@subsubsection Building CPS
818
819Unlike Tree-IL, the CPS language is built to be constructed and
820deconstructed with abstract macros instead of via procedural
821constructors or accessors, or instead of S-expression matching.
822
823Deconstruction and matching is handled adequately by the @code{match}
824form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction
825is handled by a set of mutually recursive builder macros:
826@code{build-cps-term}, @code{build-cps-cont}, and @code{build-cps-exp}.
827
828In the following interface definitions, consider variables containing
829@code{cont} to be recursively build by @code{build-cps-cont}, and
830likewise for @code{term} and @code{exp}. Consider any other name to be
831evaluated as a Scheme expression. Many of these forms recognize
832@code{unquote} in some contexts, to splice in a previously-built value;
833see the specifications below for full details.
834
835@deffn {Scheme Syntax} build-cps-term ,val
836@deffnx {Scheme Syntax} build-cps-term ($letk (cont ...) term)
837@deffnx {Scheme Syntax} build-cps-term ($letrec names syms funs term)
838@deffnx {Scheme Syntax} build-cps-term ($continue k src exp)
839@deffnx {Scheme Syntax} build-cps-exp ,val
840@deffnx {Scheme Syntax} build-cps-exp ($void)
841@deffnx {Scheme Syntax} build-cps-exp ($const val)
842@deffnx {Scheme Syntax} build-cps-exp ($prim name)
843@deffnx {Scheme Syntax} build-cps-exp ($fun src meta free body)
844@deffnx {Scheme Syntax} build-cps-exp ($call proc (arg ...))
845@deffnx {Scheme Syntax} build-cps-exp ($call proc args)
846@deffnx {Scheme Syntax} build-cps-exp ($primcall name (arg ...))
847@deffnx {Scheme Syntax} build-cps-exp ($primcall name args)
848@deffnx {Scheme Syntax} build-cps-exp ($values (arg ...))
849@deffnx {Scheme Syntax} build-cps-exp ($values args)
850@deffnx {Scheme Syntax} build-cps-exp ($prompt escape? tag handler)
851@deffnx {Scheme Syntax} build-cps-cont ,val
852@deffnx {Scheme Syntax} build-cps-cont (k ($kargs (name ...) (sym ...) term))
853@deffnx {Scheme Syntax} build-cps-cont (k ($kargs names syms term))
854@deffnx {Scheme Syntax} build-cps-cont (k ($kif kt kf))
855@deffnx {Scheme Syntax} build-cps-cont (k ($kreceive req rest kargs))
856@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont ,clauses))
857@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont (cont ...)))
858@deffnx {Scheme Syntax} build-cps-cont (k ($kclause ,arity cont))
859@deffnx {Scheme Syntax} build-cps-cont (k ($kclause (req opt rest kw aok?) cont))
860Construct a CPS term, expression, or continuation.
861@end deffn
58b23156 862
bdad1340 863There are a few more miscellaneous interfaces as well.
58b23156 864
bdad1340
AW
865@deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords?
866A procedural constructor for @code{$arity} objects.
867@end deffn
58b23156 868
bdad1340
AW
869@deffn {Scheme Syntax} let-gensyms (sym ...) body ...
870Bind @var{sym...} to fresh names, and evaluate @var{body...}.
871@end deffn
58b23156 872
bdad1340
AW
873@deffn {Scheme Syntax} rewrite-cps-term val (pat term) ...
874@deffnx {Scheme Syntax} rewrite-cps-exp val (pat exp) ...
875@deffnx {Scheme Syntax} rewrite-cps-cont val (pat cont) ...
876Match @var{val} against the series of patterns @var{pat...}, using
877@code{match}. The body of the matching clause should be a template in
878the syntax of @code{build-cps-term}, @code{build-cps-exp}, or
879@code{build-cps-cont}, respectively.
880@end deffn
73643339 881
69dc8268
AW
882@node Compiling CPS
883@subsubsection Compiling CPS
884
885Compiling CPS in Guile has three phases: conversion, optimization, and
886code generation.
887
888CPS conversion is the process of taking a higher-level language and
889compiling it to CPS. Source languages can do this directly, or they can
890convert to Tree-IL (which is probably easier) and let Tree-IL convert to
891CPS later. Going through Tree-IL has the advantage of running Tree-IL
892optimization passes, like partial evaluation. Also, the compiler from
893Tree-IL to CPS handles assignment conversion, in which assigned local
894variables (in Tree-IL, locals that are @code{<lexical-set>}) are
895converted to being boxed values on the heap. @xref{Variables and the
896VM}.
897
898After CPS conversion, Guile runs some optimization passes. The major
899optimization performed on CPS is contification, in which functions that
900are always called with the same continuation are incorporated directly
901into a function's body. This opens up space for more optimizations, and
902turns procedure calls into @code{goto}. It can also make loops out of
903recursive function nests.
904
905At the time of this writing (2014), most high-level optimization in
906Guile is done on Tree-IL. We would like to rewrite many of these passes
907to operate on CPS instead, as it is easier to reason about CPS.
908
909The rest of the optimization passes are really cleanups and
910canonicalizations. CPS spans the gap between high-level languages and
911low-level bytecodes, which allows much of the compilation process to be
912expressed as source-to-source transformations. Such is the case for
913closure conversion, in which references to variables that are free in a
914function are converted to closure references, and in which functions are
915converted to closures. There are a few more passes to ensure that the
916only primcalls left in the term are those that have a corresponding
917instruction in the virtual machine, and that their continuations expect
918the right number of values.
919
920Finally, the backend of the CPS compiler emits bytecode for each
921function, one by one. To do so, it determines the set of live variables
922at all points in the function. Using this liveness information, it
923allocates stack slots to each variable, such that a variable can live in
924one slot for the duration of its lifetime, without shuffling. (Of
925course, variables with disjoint lifetimes can share a slot.) Finally
926the backend emits code, typically just one VM instruction, for each
927continuation in the function.
928
929
67915ab0
AW
930@node Bytecode
931@subsection Bytecode
73643339 932
69dc8268
AW
933As mentioned before, Guile compiles all code to bytecode, and that
934bytecode is contained in ELF images. @xref{Object File Format}, for
935more on Guile's use of ELF.
936
937To produce a bytecode image, Guile provides an assembler and a linker.
938
939The assembler, defined in the @code{(system vm assembler)} module, has a
940relatively straightforward imperative interface. It provides a
941@code{make-assembler} function to instantiate an assembler and a set of
942@code{emit-@var{inst}} procedures to emit instructions of each kind.
943
944The @code{emit-@var{inst}} procedures are actually generated at
945compile-time from a machine-readable description of the VM. With a few
946exceptions for certain operand types, each operand of an emit procedure
947corresponds to an operand of the corresponding instruction.
948
949Consider @code{vector-length}, from @pxref{Miscellaneous Instructions}.
950It is documented as:
951
952@deftypefn Instruction {} vector-length u12:@var{dst} u12:@var{src}
953@end deftypefn
954
955Therefore the emit procedure has the form:
956
957@deffn {Scheme Procedure} emit-vector-length asm dst src
958@end deffn
959
960All emit procedure take the assembler as their first argument, and
961return no useful values.
962
963The argument types depend on the operand types. @xref{Instruction Set}.
964Most are integers within a restricted range, though labels are generally
965expressed as opaque symbols.
966
967There are a few macro-instructions as well.
968
969@deffn {Scheme Procedure} emit-label asm label
970Define a label at the current program point.
971@end deffn
972
973@deffn {Scheme Procedure} emit-source asm source
974Associate @var{source} with the current program point.
975@end deffn
976
977@deffn {Scheme Procedure} emit-cache-current-module! asm module scope
978@deffnx {Scheme Procedure} emit-cached-toplevel-box asm dst scope sym bound?
979@deffnx {Scheme Procedure} emit-cached-module-box asm dst module-name sym public? bound?
980Macro-instructions to implement caching of top-level variables. The
981first takes the current module, in the slot @var{module}, and associates
982it with a cache location identified by @var{scope}. The second takes a
983@var{scope}, and resolves the variable. @xref{Top-Level Environment
984Instructions}. The last does not need a cached module, rather taking
985the module name directly.
986@end deffn
73643339 987
69dc8268
AW
988@deffn {Scheme Procedure} emit-load-constant asm dst constant
989Load the Scheme datum @var{constant} into @var{dst}.
990@end deffn
991
992@deffn {Scheme Procedure} emit-begin-program asm label properties
993@deffnx {Scheme Procedure} emit-end-program asm
994Delimit the bounds of a procedure, with the given @var{label} and the
995metadata @var{properties}.
996@end deffn
997
998@deffn {Scheme Procedure} emit-load-static-procedure asm dst label
999Load a procedure with the given @var{label} into local @var{dst}. This
1000macro-instruction should only be used with procedures without free
1001variables -- procedures that are not closures.
1002@end deffn
1003
1004@deffn {Scheme Procedure} emit-begin-standard-arity asm req nlocals alternate
1005@deffnx {Scheme Procedure} emit-begin-opt-arity asm req opt rest nlocals alternate
1006@deffnx {Scheme Procedure} emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate
1007@deffnx {Scheme Procedure} emit-end-arity asm
1008Delimit a clause of a procedure.
1009@end deffn
1010
1011@deffn {Scheme Procedure} emit-br-if-symbol asm slot invert? label
1012@deffnx {Scheme Procedure} emit-br-if-variable asm slot invert? label
1013@deffnx {Scheme Procedure} emit-br-if-vector asm slot invert? label
1014@deffnx {Scheme Procedure} emit-br-if-string asm slot invert? label
1015@deffnx {Scheme Procedure} emit-br-if-bytevector asm slot invert? label
1016@deffnx {Scheme Procedure} emit-br-if-bitvector asm slot invert? label
1017TC7-specific test-and-branch instructions. The TC7 is a 7-bit code that
1018is part of a heap object's type. @xref{The SCM Type in Guile}. Also,
1019@xref{Branch Instructions}.
1020@end deffn
1021
1022The linker is a complicated beast. Hackers interested in how it works
1023would do well do read Ian Lance Taylor's series of articles on linkers.
1024Searching the internet should find them easily. From the user's
1025perspective, there is only one knob to control: whether the resulting
1026image will be written out to a file or not. If the user passes
1027@code{#:to-file? #t} as part of the compiler options (@pxref{The Scheme
1028Compiler}), the linker will align the resulting segments on page
1029boundaries, and otherwise not.
1030
1031@deffn {Scheme Procedure} link-assembly asm #:page-aligned?=#t
1032Link an ELF image, and return the bytevector. If @var{page-aligned?} is
1033true, Guile will align the segments with different permissions on
1034page-sized boundaries, in order to maximize code sharing between
1035different processes. Otherwise, padding is minimized, to minimize
1036address space consumption.
1037@end deffn
1038
1039To write an image to disk, just use @code{put-bytevector} from
1040@code{(ice-9 binary-ports)}.
1041
1042Compiling object code to the fake language, @code{value}, is performed
1043via loading objcode into a program, then executing that thunk with
1044respect to the compilation environment. Normally the environment
1045propagates through the compiler transparently, but users may specify the
1046compilation environment manually as well, as a module. Procedures to
1047load images can be found in the @code{(system vm loader)} module:
1048
1049@lisp
1050(use-modules (system vm loader))
1051@end lisp
e3ba263d 1052
b8bc86bc
AW
1053@deffn {Scheme Variable} load-thunk-from-file file
1054@deffnx {C Function} scm_load_thunk_from_file (file)
ff73ae34
AW
1055Load object code from a file named @var{file}. The file will be mapped
1056into memory via @code{mmap}, so this is a very fast operation.
69dc8268 1057@end deffn
e3ba263d 1058
69dc8268
AW
1059@deffn {Scheme Variable} load-thunk-from-memory bv
1060@deffnx {C Function} scm_load_thunk_from_memory (bv)
1061Load object code from a bytevector. The data will be copied out of the
1062bytevector in order to ensure proper alignment of embedded Scheme
1063values.
73643339
AW
1064@end deffn
1065
69dc8268
AW
1066Additionally there are procedures to find the ELF image for a given
1067pointer, or to list all mapped ELF images:
ff73ae34 1068
69dc8268
AW
1069@deffn {Scheme Variable} find-mapped-elf-image ptr
1070Given the integer value @var{ptr}, find and return the ELF image that
1071contains that pointer, as a bytevector. If no image is found, return
1072@code{#f}. This routine is mostly used by debuggers and other
1073introspective tools.
1074@end deffn
1075
1076@deffn {Scheme Variable} all-mapped-elf-images
1077Return all mapped ELF images, as a list of bytevectors.
1078@end deffn
ff73ae34 1079
c850030f 1080
e63d888e
DK
1081@node Writing New High-Level Languages
1082@subsection Writing New High-Level Languages
1083
1084In order to integrate a new language @var{lang} into Guile's compiler
1085system, one has to create the module @code{(language @var{lang} spec)}
1086containing the language definition and referencing the parser,
1087compiler and other routines processing it. The module hierarchy in
1088@code{(language brainfuck)} defines a very basic Brainfuck
1089implementation meant to serve as easy-to-understand example on how to
4e432dab
AW
1090do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
1091for more information about the Brainfuck language itself.
1092
e63d888e 1093
ff73ae34
AW
1094@node Extending the Compiler
1095@subsection Extending the Compiler
e3ba263d 1096
dd6e37d0
NJ
1097At this point we take a detour from the impersonal tone of the rest of
1098the manual. Admit it: if you've read this far into the compiler
1099internals manual, you are a junkie. Perhaps a course at your university
1100left you unsated, or perhaps you've always harbored a desire to hack the
1101holy of computer science holies: a compiler. Well you're in good
1102company, and in a good position. Guile's compiler needs your help.
ff73ae34
AW
1103
1104There are many possible avenues for improving Guile's compiler.
1105Probably the most important improvement, speed-wise, will be some form
1106of native compilation, both just-in-time and ahead-of-time. This could
1107be done in many ways. Probably the easiest strategy would be to extend
1108the compiled procedure structure to include a pointer to a native code
86872cc3 1109vector, and compile from bytecode to native code at run-time after a
ff73ae34
AW
1110procedure is called a certain number of times.
1111
1112The name of the game is a profiling-based harvest of the low-hanging
1113fruit, running programs of interest under a system-level profiler and
1114determining which improvements would give the most bang for the buck.
98850fd7
AW
1115It's really getting to the point though that native compilation is the
1116next step.
ff73ae34
AW
1117
1118The compiler also needs help at the top end, enhancing the Scheme that
98850fd7
AW
1119it knows to also understand R6RS, and adding new high-level compilers.
1120We have JavaScript and Emacs Lisp mostly complete, but they could use
ecb87335 1121some love; Lua would be nice as well, but whatever language it is
98850fd7
AW
1122that strikes your fancy would be welcome too.
1123
1124Compilers are for hacking, not for admiring or for complaining about.
1125Get to it!