Commit | Line | Data |
---|---|---|
8680d53b AW |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
bdad1340 | 3 | @c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013, 2014 |
8680d53b AW |
4 | @c Free Software Foundation, Inc. |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @node Compiling to the Virtual Machine | |
8 | @section Compiling to the Virtual Machine | |
9 | ||
67915ab0 AW |
10 | Compilers! The word itself inspires excitement and awe, even among |
11 | experienced practitioners. But a compiler is just a program: an | |
12 | eminently hackable thing. This section aims to to describe Guile's | |
13 | compiler in such a way that interested Scheme hackers can feel | |
14 | comfortable reading and extending it. | |
00ce5125 | 15 | |
e3ba263d | 16 | @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to |
98850fd7 | 17 | know how to compile your @code{.scm} file. |
00ce5125 AW |
18 | |
19 | @menu | |
20 | * Compiler Tower:: | |
21 | * The Scheme Compiler:: | |
81fd3152 | 22 | * Tree-IL:: |
67915ab0 AW |
23 | * Continuation-Passing Style:: |
24 | * Bytecode:: | |
e63d888e | 25 | * Writing New High-Level Languages:: |
e3ba263d | 26 | * Extending the Compiler:: |
00ce5125 AW |
27 | @end menu |
28 | ||
29 | @node Compiler Tower | |
30 | @subsection Compiler Tower | |
31 | ||
d297e544 AW |
32 | Guile's compiler is quite simple -- its @emph{compilers}, to put it more |
33 | accurately. Guile defines a tower of languages, starting at Scheme and | |
34 | progressively simplifying down to languages that resemble the VM | |
35 | instruction set (@pxref{Instruction Set}). | |
00ce5125 AW |
36 | |
37 | Each language knows how to compile to the next, so each step is simple | |
67915ab0 AW |
38 | and understandable. Furthermore, this set of languages is not hardcoded |
39 | into Guile, so it is possible for the user to add new high-level | |
40 | languages, new passes, or even different compilation targets. | |
00ce5125 | 41 | |
e3ba263d AW |
42 | Languages are registered in the module, @code{(system base language)}: |
43 | ||
44 | @example | |
45 | (use-modules (system base language)) | |
46 | @end example | |
47 | ||
48 | They are registered with the @code{define-language} form. | |
49 | ||
50 | @deffn {Scheme Syntax} define-language @ | |
994d87be BT |
51 | [#:name] [#:title] [#:reader] [#:printer] @ |
52 | [#:parser=#f] [#:compilers='()] @ | |
53 | [#:decompilers='()] [#:evaluator=#f] @ | |
54 | [#:joiner=#f] [#:for-humans?=#t] @ | |
55 | [#:make-default-environment=make-fresh-user-module] | |
e3ba263d AW |
56 | Define a language. |
57 | ||
67915ab0 AW |
58 | This syntax defines a @code{<language>} object, bound to @var{name} in |
59 | the current environment. In addition, the language will be added to the | |
60 | global language set. For example, this is the language definition for | |
61 | Scheme: | |
e3ba263d AW |
62 | |
63 | @example | |
64 | (define-language scheme | |
41e64dd7 AW |
65 | #:title "Scheme" |
66 | #:reader (lambda (port env) ...) | |
98850fd7 | 67 | #:compilers `((tree-il . ,compile-tree-il)) |
81fd3152 | 68 | #:decompilers `((tree-il . ,decompile-tree-il)) |
41e64dd7 AW |
69 | #:evaluator (lambda (x module) (primitive-eval x)) |
70 | #:printer write | |
71 | #:make-default-environment (lambda () ...)) | |
e3ba263d | 72 | @end example |
e3ba263d AW |
73 | @end deffn |
74 | ||
75 | The interesting thing about having languages defined this way is that | |
67915ab0 | 76 | they present a uniform interface to the read-eval-print loop. This |
e3ba263d AW |
77 | allows the user to change the current language of the REPL: |
78 | ||
79 | @example | |
81fd3152 | 80 | scheme@@(guile-user)> ,language tree-il |
41e64dd7 AW |
81 | Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'. |
82 | tree-il@@(guile-user)> ,L scheme | |
83 | Happy hacking with Scheme! To switch back, type `,L tree-il'. | |
84 | scheme@@(guile-user)> | |
e3ba263d AW |
85 | @end example |
86 | ||
87 | Languages can be looked up by name, as they were above. | |
88 | ||
89 | @deffn {Scheme Procedure} lookup-language name | |
90 | Looks up a language named @var{name}, autoloading it if necessary. | |
91 | ||
92 | Languages are autoloaded by looking for a variable named @var{name} in | |
93 | a module named @code{(language @var{name} spec)}. | |
94 | ||
95 | The language object will be returned, or @code{#f} if there does not | |
96 | exist a language with that name. | |
97 | @end deffn | |
98 | ||
99 | Defining languages this way allows us to programmatically determine | |
100 | the necessary steps for compiling code from one language to another. | |
101 | ||
102 | @deffn {Scheme Procedure} lookup-compilation-order from to | |
103 | Recursively traverses the set of languages to which @var{from} can | |
104 | compile, depth-first, and return the first path that can transform | |
105 | @var{from} to @var{to}. Returns @code{#f} if no path is found. | |
106 | ||
107 | This function memoizes its results in a cache that is invalidated by | |
108 | subsequent calls to @code{define-language}, so it should be quite | |
109 | fast. | |
110 | @end deffn | |
111 | ||
5745de91 AW |
112 | There is a notion of a ``current language'', which is maintained in the |
113 | @code{current-language} parameter, defined in the core @code{(guile)} | |
67915ab0 AW |
114 | module. This language is normally Scheme, and may be rebound by the |
115 | user. The run-time compilation interfaces | |
e3ba263d AW |
116 | (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source |
117 | and target languages. | |
118 | ||
119 | The normal tower of languages when compiling Scheme goes like this: | |
120 | ||
121 | @itemize | |
41e64dd7 | 122 | @item Scheme |
81fd3152 | 123 | @item Tree Intermediate Language (Tree-IL) |
67915ab0 | 124 | @item Continuation-Passing Style (CPS) |
81fd3152 | 125 | @item Bytecode |
e3ba263d AW |
126 | @end itemize |
127 | ||
67915ab0 AW |
128 | As discussed before (@pxref{Object File Format}), bytecode is in ELF |
129 | format, ready to be serialized to disk. But when compiling Scheme at | |
130 | run time, you want a Scheme value: for example, a compiled procedure. | |
131 | For this reason, so as not to break the abstraction, Guile defines a | |
132 | fake language at the bottom of the tower: | |
81fd3152 AW |
133 | |
134 | @itemize | |
135 | @item Value | |
136 | @end itemize | |
137 | ||
d297e544 AW |
138 | Compiling to @code{value} loads the bytecode into a procedure, turning |
139 | cold bytes into warm code. | |
e3ba263d AW |
140 | |
141 | Perhaps this strangeness can be explained by example: | |
d297e544 | 142 | @code{compile-file} defaults to compiling to bytecode, because it |
e3ba263d | 143 | produces object code that has to live in the barren world outside the |
67915ab0 AW |
144 | Guile runtime; but @code{compile} defaults to compiling to @code{value}, |
145 | as its product re-enters the Guile world. | |
146 | ||
147 | @c FIXME: This doesn't work anymore :( Should we add some kind of | |
148 | @c special GC pass, or disclaim this kind of code, or what? | |
e3ba263d AW |
149 | |
150 | Indeed, the process of compilation can circulate through these | |
151 | different worlds indefinitely, as shown by the following quine: | |
152 | ||
153 | @example | |
00ce5125 | 154 | ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x))) |
e3ba263d | 155 | @end example |
00ce5125 AW |
156 | |
157 | @node The Scheme Compiler | |
158 | @subsection The Scheme Compiler | |
159 | ||
7081d4f9 | 160 | The job of the Scheme compiler is to expand all macros and all of Scheme |
67915ab0 AW |
161 | to its most primitive expressions. The definition of ``primitive |
162 | expression'' is given by the inventory of constructs provided by | |
163 | Tree-IL, the target language of the Scheme compiler: procedure calls, | |
164 | conditionals, lexical references, and so on. This is described more | |
165 | fully in the next section. | |
81fd3152 AW |
166 | |
167 | The tricky and amusing thing about the Scheme-to-Tree-IL compiler is | |
67915ab0 | 168 | that it is completely implemented by the macro expander. Since the |
81fd3152 AW |
169 | macro expander has to run over all of the source code already in order |
170 | to expand macros, it might as well do the analysis at the same time, | |
171 | producing Tree-IL expressions directly. | |
172 | ||
67915ab0 AW |
173 | Because this compiler is actually the macro expander, it is extensible. |
174 | Any macro which the user writes becomes part of the compiler. | |
81fd3152 AW |
175 | |
176 | The Scheme-to-Tree-IL expander may be invoked using the generic | |
177 | @code{compile} procedure: | |
178 | ||
179 | @lisp | |
180 | (compile '(+ 1 2) #:from 'scheme #:to 'tree-il) | |
181 | @result{} | |
67915ab0 | 182 | #<tree-il (call (toplevel +) (const 1) (const 2))> |
81fd3152 AW |
183 | @end lisp |
184 | ||
67915ab0 AW |
185 | @code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely |
186 | equivalent to calling the macro expander as @code{(macroexpand @var{foo} | |
187 | 'c '(compile load eval))}. @xref{Macro Expansion}. | |
188 | @code{compile-tree-il}, the procedure dispatched by @code{compile} to | |
189 | @code{'tree-il}, is a small wrapper around @code{macroexpand}, to make | |
190 | it conform to the general form of compiler procedures in Guile's | |
191 | language tower. | |
81fd3152 | 192 | |
98850fd7 AW |
193 | Compiler procedures take three arguments: an expression, an |
194 | environment, and a keyword list of options. They return three values: | |
195 | the compiled expression, the corresponding environment for the target | |
196 | language, and a ``continuation environment''. The compiled expression | |
197 | and environment will serve as input to the next language's compiler. | |
198 | The ``continuation environment'' can be used to compile another | |
199 | expression from the same source language within the same module. | |
81fd3152 AW |
200 | |
201 | For example, you might compile the expression, @code{(define-module | |
202 | (foo))}. This will result in a Tree-IL expression and environment. But | |
203 | if you compiled a second expression, you would want to take into | |
204 | account the compile-time effect of compiling the previous expression, | |
205 | which puts the user in the @code{(foo)} module. That is purpose of the | |
206 | ``continuation environment''; you would pass it as the environment | |
207 | when compiling the subsequent expression. | |
208 | ||
41e64dd7 AW |
209 | For Scheme, an environment is a module. By default, the @code{compile} |
210 | and @code{compile-file} procedures compile in a fresh module, such | |
211 | that bindings and macros introduced by the expression being compiled | |
212 | are isolated: | |
1ebe6a63 LC |
213 | |
214 | @example | |
215 | (eq? (current-module) (compile '(current-module))) | |
216 | @result{} #f | |
217 | ||
218 | (compile '(define hello 'world)) | |
219 | (defined? 'hello) | |
220 | @result{} #f | |
221 | ||
222 | (define / *) | |
223 | (eq? (compile '/) /) | |
224 | @result{} #f | |
225 | @end example | |
226 | ||
227 | Similarly, changes to the @code{current-reader} fluid (@pxref{Loading, | |
228 | @code{current-reader}}) are isolated: | |
229 | ||
230 | @example | |
231 | (compile '(fluid-set! current-reader (lambda args 'fail))) | |
232 | (fluid-ref current-reader) | |
233 | @result{} #f | |
234 | @end example | |
235 | ||
236 | Nevertheless, having the compiler and @dfn{compilee} share the same name | |
237 | space can be achieved by explicitly passing @code{(current-module)} as | |
238 | the compilation environment: | |
239 | ||
240 | @example | |
241 | (define hello 'world) | |
242 | (compile 'hello #:env (current-module)) | |
243 | @result{} world | |
244 | @end example | |
245 | ||
81fd3152 AW |
246 | @node Tree-IL |
247 | @subsection Tree-IL | |
00ce5125 | 248 | |
81fd3152 | 249 | Tree Intermediate Language (Tree-IL) is a structured intermediate |
c850030f AW |
250 | language that is close in expressive power to Scheme. It is an |
251 | expanded, pre-analyzed Scheme. | |
252 | ||
81fd3152 AW |
253 | Tree-IL is ``structured'' in the sense that its representation is |
254 | based on records, not S-expressions. This gives a rigidity to the | |
255 | language that ensures that compiling to a lower-level language only | |
41e64dd7 AW |
256 | requires a limited set of transformations. For example, the Tree-IL |
257 | type @code{<const>} is a record type with two fields, @code{src} and | |
258 | @code{exp}. Instances of this type are created via @code{make-const}. | |
259 | Fields of this type are accessed via the @code{const-src} and | |
260 | @code{const-exp} procedures. There is also a predicate, @code{const?}. | |
261 | @xref{Records}, for more information on records. | |
81fd3152 AW |
262 | |
263 | @c alpha renaming | |
264 | ||
265 | All Tree-IL types have a @code{src} slot, which holds source location | |
266 | information for the expression. This information, if present, will be | |
267 | residualized into the compiled object code, allowing backtraces to | |
268 | show source information. The format of @code{src} is the same as that | |
269 | returned by Guile's @code{source-properties} function. @xref{Source | |
270 | Properties}, for more information. | |
271 | ||
272 | Although Tree-IL objects are represented internally using records, | |
273 | there is also an equivalent S-expression external representation for | |
ecb87335 | 274 | each kind of Tree-IL. For example, the S-expression representation |
81fd3152 | 275 | of @code{#<const src: #f exp: 3>} expression would be: |
c850030f AW |
276 | |
277 | @example | |
81fd3152 | 278 | (const 3) |
c850030f AW |
279 | @end example |
280 | ||
81fd3152 | 281 | Users may program with this format directly at the REPL: |
c850030f AW |
282 | |
283 | @example | |
81fd3152 | 284 | scheme@@(guile-user)> ,language tree-il |
41e64dd7 | 285 | Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'. |
67915ab0 | 286 | tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10)) |
c850030f AW |
287 | @result{} 42 |
288 | @end example | |
289 | ||
81fd3152 AW |
290 | The @code{src} fields are left out of the external representation. |
291 | ||
98850fd7 AW |
292 | One may create Tree-IL objects from their external representations via |
293 | calling @code{parse-tree-il}, the reader for Tree-IL. If any source | |
294 | information is attached to the input S-expression, it will be | |
295 | propagated to the resulting Tree-IL expressions. This is probably the | |
296 | easiest way to compile to Tree-IL: just make the appropriate external | |
297 | representations in S-expression format, and let @code{parse-tree-il} | |
298 | take care of the rest. | |
299 | ||
81fd3152 AW |
300 | @deftp {Scheme Variable} <void> src |
301 | @deftpx {External Representation} (void) | |
67915ab0 | 302 | An empty expression. In practice, equivalent to Scheme's @code{(if #f |
81fd3152 AW |
303 | #f)}. |
304 | @end deftp | |
d297e544 | 305 | |
81fd3152 AW |
306 | @deftp {Scheme Variable} <const> src exp |
307 | @deftpx {External Representation} (const @var{exp}) | |
308 | A constant. | |
309 | @end deftp | |
d297e544 | 310 | |
81fd3152 AW |
311 | @deftp {Scheme Variable} <primitive-ref> src name |
312 | @deftpx {External Representation} (primitive @var{name}) | |
67915ab0 AW |
313 | A reference to a ``primitive''. A primitive is a procedure that, when |
314 | compiled, may be open-coded. For example, @code{cons} is usually | |
81fd3152 AW |
315 | recognized as a primitive, so that it compiles down to a single |
316 | instruction. | |
317 | ||
318 | Compilation of Tree-IL usually begins with a pass that resolves some | |
319 | @code{<module-ref>} and @code{<toplevel-ref>} expressions to | |
67915ab0 | 320 | @code{<primitive-ref>} expressions. The actual compilation pass has |
7081d4f9 AW |
321 | special cases for calls to certain primitives, like @code{apply} or |
322 | @code{cons}. | |
81fd3152 | 323 | @end deftp |
d297e544 | 324 | |
81fd3152 AW |
325 | @deftp {Scheme Variable} <lexical-ref> src name gensym |
326 | @deftpx {External Representation} (lexical @var{name} @var{gensym}) | |
67915ab0 | 327 | A reference to a lexically-bound variable. The @var{name} is the |
81fd3152 AW |
328 | original name of the variable in the source program. @var{gensym} is a |
329 | unique identifier for this variable. | |
330 | @end deftp | |
d297e544 | 331 | |
81fd3152 AW |
332 | @deftp {Scheme Variable} <lexical-set> src name gensym exp |
333 | @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp}) | |
334 | Sets a lexically-bound variable. | |
335 | @end deftp | |
d297e544 | 336 | |
81fd3152 AW |
337 | @deftp {Scheme Variable} <module-ref> src mod name public? |
338 | @deftpx {External Representation} (@@ @var{mod} @var{name}) | |
339 | @deftpx {External Representation} (@@@@ @var{mod} @var{name}) | |
340 | A reference to a variable in a specific module. @var{mod} should be | |
679cceed | 341 | the name of the module, e.g.@: @code{(guile-user)}. |
81fd3152 AW |
342 | |
343 | If @var{public?} is true, the variable named @var{name} will be looked | |
344 | up in @var{mod}'s public interface, and serialized with @code{@@}; | |
345 | otherwise it will be looked up among the module's private bindings, | |
346 | and is serialized with @code{@@@@}. | |
347 | @end deftp | |
d297e544 | 348 | |
81fd3152 AW |
349 | @deftp {Scheme Variable} <module-set> src mod name public? exp |
350 | @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp}) | |
351 | @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp}) | |
352 | Sets a variable in a specific module. | |
353 | @end deftp | |
d297e544 | 354 | |
81fd3152 AW |
355 | @deftp {Scheme Variable} <toplevel-ref> src name |
356 | @deftpx {External Representation} (toplevel @var{name}) | |
357 | References a variable from the current procedure's module. | |
358 | @end deftp | |
d297e544 | 359 | |
81fd3152 AW |
360 | @deftp {Scheme Variable} <toplevel-set> src name exp |
361 | @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp}) | |
362 | Sets a variable in the current procedure's module. | |
363 | @end deftp | |
d297e544 | 364 | |
81fd3152 AW |
365 | @deftp {Scheme Variable} <toplevel-define> src name exp |
366 | @deftpx {External Representation} (define (toplevel @var{name}) @var{exp}) | |
367 | Defines a new top-level variable in the current procedure's module. | |
368 | @end deftp | |
d297e544 | 369 | |
81fd3152 AW |
370 | @deftp {Scheme Variable} <conditional> src test then else |
371 | @deftpx {External Representation} (if @var{test} @var{then} @var{else}) | |
ca445ba5 | 372 | A conditional. Note that @var{else} is not optional. |
c850030f | 373 | @end deftp |
d297e544 | 374 | |
7081d4f9 AW |
375 | @deftp {Scheme Variable} <call> src proc args |
376 | @deftpx {External Representation} (call @var{proc} . @var{args}) | |
ca445ba5 | 377 | A procedure call. |
c850030f | 378 | @end deftp |
d297e544 | 379 | |
a881a4ae AW |
380 | @deftp {Scheme Variable} <primcall> src name args |
381 | @deftpx {External Representation} (primcall @var{name} . @var{args}) | |
382 | A call to a primitive. Equivalent to @code{(call (primitive @var{name}) | |
383 | . @var{args})}. This construct is often more convenient to generate and | |
384 | analyze than @code{<call>}. | |
385 | ||
386 | As part of the compilation process, instances of @code{(call (primitive | |
387 | @var{name}) . @var{args})} are transformed into primcalls. | |
388 | @end deftp | |
d297e544 | 389 | |
67915ab0 AW |
390 | @deftp {Scheme Variable} <seq> src head tail |
391 | @deftpx {External Representation} (seq @var{head} @var{tail}) | |
392 | A sequence. The semantics is that @var{head} is evaluated first, and | |
393 | any resulting values are ignored. Then @var{tail} is evaluated, in tail | |
394 | position. | |
c850030f | 395 | @end deftp |
d297e544 | 396 | |
41e64dd7 AW |
397 | @deftp {Scheme Variable} <lambda> src meta body |
398 | @deftpx {External Representation} (lambda @var{meta} @var{body}) | |
67915ab0 AW |
399 | A closure. @var{meta} is an association list of properties for the |
400 | procedure. @var{body} is a single Tree-IL expression of type | |
401 | @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to | |
41e64dd7 AW |
402 | an alternate clause, this makes Tree-IL's @code{<lambda>} have the |
403 | expressiveness of Scheme's @code{case-lambda}. | |
404 | @end deftp | |
d297e544 | 405 | |
41e64dd7 AW |
406 | @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate |
407 | @deftpx {External Representation} @ | |
408 | (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@ | |
409 | @var{body})@ | |
410 | [@var{alternate}]) | |
67915ab0 | 411 | One clause of a @code{case-lambda}. A @code{lambda} expression in |
41e64dd7 AW |
412 | Scheme is treated as a @code{case-lambda} with one clause. |
413 | ||
414 | @var{req} is a list of the procedure's required arguments, as symbols. | |
415 | @var{opt} is a list of the optional arguments, or @code{#f} if there | |
416 | are no optional arguments. @var{rest} is the name of the rest | |
417 | argument, or @code{#f}. | |
418 | ||
419 | @var{kw} is a list of the form, @code{(@var{allow-other-keys?} | |
420 | (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the | |
421 | keyword corresponding to the argument named @var{name}, and whose | |
67915ab0 AW |
422 | corresponding gensym is @var{var}. @var{inits} are tree-il expressions |
423 | corresponding to all of the optional and keyword arguments, evaluated to | |
424 | bind variables whose value is not supplied by the procedure caller. | |
41e64dd7 AW |
425 | Each @var{init} expression is evaluated in the lexical context of |
426 | previously bound variables, from left to right. | |
427 | ||
428 | @var{gensyms} is a list of gensyms corresponding to all arguments: | |
429 | first all of the required arguments, then the optional arguments if | |
430 | any, then the rest argument if any, then all of the keyword arguments. | |
431 | ||
67915ab0 | 432 | @var{body} is the body of the clause. If the procedure is called with |
41e64dd7 | 433 | an appropriate number of arguments, @var{body} is evaluated in tail |
67915ab0 | 434 | position. Otherwise, if there is an @var{alternate}, it should be a |
41e64dd7 | 435 | @code{<lambda-case>} expression, representing the next clause to try. |
64de6db5 | 436 | If there is no @var{alternate}, a wrong-number-of-arguments error is |
41e64dd7 AW |
437 | signaled. |
438 | @end deftp | |
d297e544 | 439 | |
41e64dd7 AW |
440 | @deftp {Scheme Variable} <let> src names gensyms vals exp |
441 | @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp}) | |
67915ab0 AW |
442 | Lexical binding, like Scheme's @code{let}. @var{names} are the original |
443 | binding names, @var{gensyms} are gensyms corresponding to the | |
81fd3152 AW |
444 | @var{names}, and @var{vals} are Tree-IL expressions for the values. |
445 | @var{exp} is a single Tree-IL expression. | |
446 | @end deftp | |
d297e544 | 447 | |
935c7aca | 448 | @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp |
172988ee AW |
449 | @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp}) |
450 | @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp}) | |
81fd3152 | 451 | A version of @code{<let>} that creates recursive bindings, like |
935c7aca | 452 | Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true. |
81fd3152 | 453 | @end deftp |
d297e544 | 454 | |
67915ab0 AW |
455 | @deftp {Scheme Variable} <prompt> escape-only? tag body handler |
456 | @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler}) | |
457 | A dynamic prompt. Instates a prompt named @var{tag}, an expression, | |
41e64dd7 | 458 | during the dynamic extent of the execution of @var{body}, also an |
67915ab0 AW |
459 | expression. If an abort occurs to this prompt, control will be passed |
460 | to @var{handler}, also an expression, which should be a procedure. The | |
461 | first argument to the handler procedure will be the captured | |
462 | continuation, followed by all of the values passed to the abort. If | |
463 | @var{escape-only?} is true, the handler should be a @code{<lambda>} with | |
464 | a single @code{<lambda-case>} body expression with no optional or | |
465 | keyword arguments, and no alternate, and whose first argument is | |
466 | unreferenced. @xref{Prompts}, for more information. | |
41e64dd7 | 467 | @end deftp |
d297e544 | 468 | |
41e64dd7 AW |
469 | @deftp {Scheme Variable} <abort> tag args tail |
470 | @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail}) | |
471 | An abort to the nearest prompt with the name @var{tag}, an expression. | |
472 | @var{args} should be a list of expressions to pass to the prompt's | |
473 | handler, and @var{tail} should be an expression that will evaluate to | |
67915ab0 | 474 | a list of additional arguments. An abort will save the partial |
41e64dd7 AW |
475 | continuation, which may later be reinstated, resulting in the |
476 | @code{<abort>} expression evaluating to some number of values. | |
477 | @end deftp | |
81fd3152 | 478 | |
98850fd7 AW |
479 | There are two Tree-IL constructs that are not normally produced by |
480 | higher-level compilers, but instead are generated during the | |
481 | source-to-source optimization and analysis passes that the Tree-IL | |
67915ab0 AW |
482 | compiler does. Users should not generate these expressions directly, |
483 | unless they feel very clever, as the default analysis pass will generate | |
484 | them as necessary. | |
98850fd7 | 485 | |
41e64dd7 AW |
486 | @deftp {Scheme Variable} <let-values> src names gensyms exp body |
487 | @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body}) | |
98850fd7 AW |
488 | Like Scheme's @code{receive} -- binds the values returned by |
489 | evaluating @code{exp} to the @code{lambda}-like bindings described by | |
67915ab0 | 490 | @var{gensyms}. That is to say, @var{gensyms} may be an improper list. |
98850fd7 | 491 | |
7081d4f9 | 492 | @code{<let-values>} is an optimization of a @code{<call>} to the |
98850fd7 AW |
493 | primitive, @code{call-with-values}. |
494 | @end deftp | |
d297e544 | 495 | |
41e64dd7 AW |
496 | @deftp {Scheme Variable} <fix> src names gensyms vals body |
497 | @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body}) | |
98850fd7 AW |
498 | Like @code{<letrec>}, but only for @var{vals} that are unset |
499 | @code{lambda} expressions. | |
500 | ||
501 | @code{fix} is an optimization of @code{letrec} (and @code{let}). | |
502 | @end deftp | |
81fd3152 | 503 | |
67915ab0 AW |
504 | Tree-IL is a convenient compilation target from source languages. It |
505 | can be convenient as a medium for optimization, though CPS is usually | |
506 | better. The strength of Tree-IL is that it does not fix order of | |
507 | evaluation, so it makes some code motion a bit easier. | |
00ce5125 | 508 | |
67915ab0 | 509 | Optimization passes performed on Tree-IL currently include: |
00ce5125 | 510 | |
73643339 | 511 | @itemize |
67915ab0 AW |
512 | @item Open-coding (turning toplevel-refs into primitive-refs, |
513 | and calls to primitives to primcalls) | |
514 | @item Partial evaluation (comprising inlining, copy propagation, and | |
515 | constant folding) | |
516 | @item Common subexpression elimination (CSE) | |
73643339 AW |
517 | @end itemize |
518 | ||
67915ab0 AW |
519 | In the future, we will move the CSE pass to operate over the lower-level |
520 | CPS language. | |
73643339 | 521 | |
67915ab0 AW |
522 | @node Continuation-Passing Style |
523 | @subsection Continuation-Passing Style | |
73643339 | 524 | |
67915ab0 | 525 | @cindex CPS |
58b23156 AW |
526 | Continuation-passing style (CPS) is Guile's principal intermediate |
527 | language, bridging the gap between languages for people and languages | |
528 | for machines. CPS gives a name to every part of a program: every | |
529 | control point, and every intermediate value. This makes it an excellent | |
530 | medium for reasoning about programs, which is the principal job of a | |
531 | compiler. | |
532 | ||
533 | @menu | |
534 | * An Introduction to CPS:: | |
535 | * CPS in Guile:: | |
bdad1340 | 536 | * Building CPS:: |
69dc8268 | 537 | * Compiling CPS:: |
58b23156 AW |
538 | @end menu |
539 | ||
540 | @node An Introduction to CPS | |
541 | @subsubsection An Introduction to CPS | |
542 | ||
69dc8268 | 543 | Consider the following Scheme expression: |
58b23156 AW |
544 | |
545 | @lisp | |
546 | (begin | |
547 | (display "The sum of 32 and 10 is: ") | |
548 | (display 42) | |
549 | (newline)) | |
550 | @end lisp | |
551 | ||
69dc8268 AW |
552 | Let us identify all of the sub-expressions in this expression, |
553 | annotating them with unique labels: | |
58b23156 AW |
554 | |
555 | @lisp | |
556 | (begin | |
557 | (display "The sum of 32 and 10 is: ") | |
558 | |k1 k2 | |
559 | k0 | |
560 | (display 42) | |
561 | |k4 k5 | |
562 | k3 | |
563 | (newline)) | |
564 | |k7 | |
565 | k6 | |
566 | @end lisp | |
567 | ||
69dc8268 AW |
568 | Each of these labels identifies a point in a program. One label may be |
569 | the continuation of another label. For example, the continuation of | |
570 | @code{k7} is @code{k6}. This is because after evaluating the value of | |
571 | @code{newline}, performed by the expression labelled @code{k7}, we | |
bdad1340 | 572 | continue to apply it in @code{k6}. |
58b23156 | 573 | |
69dc8268 AW |
574 | Which expression has @code{k0} as its continuation? It is either the |
575 | expression labelled @code{k1} or the expression labelled @code{k2}. | |
576 | Scheme does not have a fixed order of evaluation of arguments, though it | |
577 | does guarantee that they are evaluated in some order. Unlike general | |
578 | Scheme, continuation-passing style makes evaluation order explicit. In | |
579 | Guile, this choice is made by the higher-level language compilers. | |
58b23156 AW |
580 | |
581 | Let us assume a left-to-right evaluation order. In that case the | |
582 | continuation of @code{k1} is @code{k2}, and the continuation of | |
583 | @code{k2} is @code{k0}. | |
584 | ||
585 | With this example established, we are ready to give an example of CPS in | |
586 | Scheme: | |
587 | ||
69dc8268 | 588 | @smalllisp |
58b23156 AW |
589 | (lambda (ktail) |
590 | (let ((k1 (lambda () | |
591 | (let ((k2 (lambda (proc) | |
592 | (let ((k0 (lambda (arg0) | |
593 | (proc k4 arg0)))) | |
594 | (k0 "The sum of 32 and 10 is: "))))) | |
595 | (k2 display)))) | |
596 | (k4 (lambda _ | |
597 | (let ((k5 (lambda (proc) | |
598 | (let ((k3 (lambda (arg0) | |
599 | (proc k7 arg0)))) | |
600 | (k3 42))))) | |
601 | (k5 display)))) | |
602 | (k7 (lambda _ | |
603 | (let ((k6 (lambda (proc) | |
604 | (proc ktail)))) | |
605 | (k6 newline))))) | |
606 | (k1)) | |
69dc8268 | 607 | @end smalllisp |
58b23156 AW |
608 | |
609 | Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS | |
610 | is by nature much more verbose than ``direct-style'' intermediate | |
69dc8268 AW |
611 | languages like Tree-IL. At the same time, CPS is simpler than full |
612 | Scheme, because it makes things more explicit. | |
58b23156 AW |
613 | |
614 | In the original program, the expression labelled @code{k0} is in effect | |
69dc8268 AW |
615 | context. Any values it returns are ignored. In Scheme, this fact is |
616 | implicit. In CPS, we can see it explicitly by noting that its | |
617 | continuation, @code{k4}, takes any number of values and ignores them. | |
618 | Compare this to @code{k2}, which takes a single value; in this way we | |
619 | can say that @code{k1} is in a ``value'' context. Likewise @code{k6} is | |
620 | in tail context with respect to the expression as a whole, because its | |
621 | continuation is the tail continuation, @code{ktail}. CPS makes these | |
622 | details manifest, and gives them names. | |
bdad1340 | 623 | |
58b23156 AW |
624 | @node CPS in Guile |
625 | @subsubsection CPS in Guile | |
626 | ||
bdad1340 AW |
627 | Guile's CPS language is composed of @dfn{terms}, @dfn{expressions}, |
628 | and @dfn{continuations}. | |
58b23156 | 629 | |
bdad1340 AW |
630 | A term can either evaluate an expression and pass the resulting values |
631 | to some continuation, or it can declare local continuations and contain | |
632 | a sub-term in the scope of those continuations. | |
633 | ||
634 | @deftp {CPS Term} $continue k src exp | |
635 | Evaluate the expression @var{exp} and pass the resulting values (if any) | |
636 | to the continuation labelled @var{k}. The source information associated | |
637 | with the expression may be found in @var{src}, which is either an alist | |
638 | as in @code{source-properties} or is @code{#f} if there is no associated | |
639 | source. | |
58b23156 AW |
640 | @end deftp |
641 | ||
bdad1340 AW |
642 | @deftp {CPS Term} $letk conts body |
643 | Bind @var{conts}, a list of continuations (@code{$cont} instances), in | |
644 | the scope of the sub-term @var{body}. The continuations are mutually | |
645 | recursive. | |
58b23156 AW |
646 | @end deftp |
647 | ||
bdad1340 AW |
648 | Additionally, the early stages of CPS allow for a set of mutually |
649 | recursive functions to be declared as a term. This @code{$letrec} type | |
650 | is like Tree-IL's @code{<fix>}. The contification pass will attempt to | |
651 | transform the functions declared in a @code{$letrec} into local | |
652 | continuations. Any remaining functions are later lowered to @code{$fun} | |
653 | expressions. | |
58b23156 | 654 | |
bdad1340 AW |
655 | @deftp {CPS Term} $letrec names syms funs body |
656 | Declare the mutually recursive set of functions denoted by @var{names}, | |
657 | @var{syms}, and @var{funs} within the sub-term @var{body}. @var{names} | |
658 | and @var{syms} are lists of symbols, and @var{funs} is a list of | |
507c58b2 | 659 | @code{$fun} values. @var{syms} are globally unique. |
58b23156 AW |
660 | @end deftp |
661 | ||
bdad1340 AW |
662 | Here is an inventory of the kinds of expressions in Guile's CPS |
663 | language. Recall that all expressions are wrapped in a @code{$continue} | |
664 | term which specifies their continuation. | |
58b23156 | 665 | |
bdad1340 AW |
666 | @deftp {CPS Expression} $void |
667 | Continue with the unspecified value. | |
58b23156 | 668 | @end deftp |
bdad1340 AW |
669 | |
670 | @deftp {CPS Expression} $const val | |
671 | Continue with the constant value @var{val}. | |
58b23156 | 672 | @end deftp |
bdad1340 AW |
673 | |
674 | @deftp {CPS Expression} $prim name | |
675 | Continue with the procedure that implements the primitive operation | |
676 | named by @var{name}. | |
58b23156 | 677 | @end deftp |
bdad1340 AW |
678 | |
679 | @deftp {CPS Expression} $fun src meta free body | |
680 | Continue with a procedure. @var{src} identifies the source information | |
681 | for the procedure declaration, and @var{meta} is the metadata alist as | |
682 | described above in Tree-IL's @code{<lambda>}. @var{free} is a list of | |
683 | free variables accessed by the procedure. Early CPS uses an empty list | |
684 | for @var{free}; only after closure conversion is it correctly populated. | |
685 | Finally, @var{body} is the @code{$kentry} @code{$cont} of the procedure | |
686 | entry. | |
58b23156 | 687 | @end deftp |
bdad1340 AW |
688 | |
689 | @deftp {CPS Expression} $call proc args | |
b3ae2b50 | 690 | @deftpx {CPS Expression} $callk label proc args |
bdad1340 AW |
691 | Call @var{proc} with the arguments @var{args}, and pass all values to |
692 | the continuation. @var{proc} and the elements of the @var{args} list | |
693 | should all be variable names. The continuation identified by the term's | |
694 | @var{k} should be a @code{$kreceive} or a @code{$ktail} instance. | |
b3ae2b50 AW |
695 | |
696 | @code{$callk} is for the case where the call target is known to be in | |
697 | the same compilation unit. @var{label} should be some continuation | |
698 | label, though it need not be in scope. In this case the @var{proc} is | |
699 | simply an additional argument, since it is not used to determine the | |
700 | call target at run-time. | |
58b23156 | 701 | @end deftp |
bdad1340 AW |
702 | |
703 | @deftp {CPS Expression} $primcall name args | |
704 | Perform the primitive operation identified by @code{name}, a well-known | |
705 | symbol, passing it the arguments @var{args}, and pass all resulting | |
706 | values to the continuation. The set of available primitives includes | |
707 | all primitives known to Tree-IL and then some more; see the source code | |
708 | for details. | |
58b23156 AW |
709 | @end deftp |
710 | ||
bdad1340 AW |
711 | @deftp {CPS Expression} $values args |
712 | Pass the values named by the list @var{args} to the continuation. | |
713 | @end deftp | |
58b23156 | 714 | |
bdad1340 AW |
715 | @deftp {CPS Expression} $prompt escape? tag handler |
716 | Push a prompt on the stack identified by the variable name @var{tag}, | |
717 | which may be escape-only if @var{escape?} is true, and continue with | |
718 | zero values. If the body aborts to this prompt, control will proceed at | |
719 | the continuation labelled @var{handler}, which should be a | |
720 | @code{$kreceive} continuation. Prompts are later popped by | |
721 | @code{pop-prompt} primcalls. | |
722 | @end deftp | |
58b23156 | 723 | |
bdad1340 AW |
724 | The remaining element of the CPS language in Guile is the continuation. |
725 | In CPS, all continuations have unique labels. Since this aspect is | |
726 | common to all continuation types, all continuations are contained in a | |
727 | @code{$cont} instance: | |
58b23156 | 728 | |
bdad1340 AW |
729 | @deftp {CPS Continuation Wrapper} $cont k cont |
730 | Declare a continuation labelled @var{k}. All references to the | |
731 | continuation will use this label. | |
732 | @end deftp | |
58b23156 | 733 | |
bdad1340 AW |
734 | The most common kind of continuation binds some number of values, and |
735 | then evaluates a sub-term. @code{$kargs} is this kind of simple | |
736 | @code{lambda}. | |
58b23156 | 737 | |
bdad1340 AW |
738 | @deftp {CPS Continuation} $kargs names syms body |
739 | Bind the incoming values to the variables @var{syms}, with original | |
740 | names @var{names}, and then evaluate the sub-term @var{body}. | |
741 | @end deftp | |
58b23156 | 742 | |
bdad1340 AW |
743 | Variable names (the names in the @var{syms} of a @code{$kargs}) should |
744 | be globally unique, and also disjoint from continuation labels. To bind | |
745 | a value to a variable and then evaluate some term, you would continue | |
746 | with the value to a @code{$kargs} that declares one variable. The bound | |
747 | value would then be available for use within the body of the | |
748 | @code{$kargs}. | |
58b23156 | 749 | |
bdad1340 AW |
750 | @deftp {CPS Continuation} $kif kt kf |
751 | Receive one value. If it is true for the purposes of Scheme, branch to | |
752 | the continuation labelled @var{kt}, passing no values; otherwise, branch | |
753 | to @var{kf}. | |
754 | @end deftp | |
755 | ||
756 | For internal reasons, only certain terms may continue to a @code{$kif}. | |
757 | Compiling @code{$kif} avoids allocating space for the test variable, so | |
758 | it needs to be preceded by expressions that can test-and-branch without | |
759 | temporary values. In practice this condition is true for | |
760 | @code{$primcall}s to @code{null?}, @code{=}, and similar primitives that | |
761 | have corresponding @code{br-if-@var{foo}} VM operations; see the source | |
762 | code for full details. When in doubt, bind the test expression to a | |
763 | variable, and continue to the @code{$kif} with a @code{$values} | |
764 | expression. The optimizer should elide the @code{$values} if it is not | |
765 | needed. | |
58b23156 | 766 | |
bdad1340 AW |
767 | Calls out to other functions need to be wrapped in a @code{$kreceive} |
768 | continuation in order to adapt the returned values to their uses in the | |
769 | calling function, if any. | |
770 | ||
771 | @deftp {CPS Continuation} $kreceive arity k | |
772 | Receive values on the stack. Parse them according to @var{arity}, and | |
507c58b2 | 773 | then proceed with the parsed values to the @code{$kargs} continuation |
bdad1340 AW |
774 | labelled @var{k}. As a limitation specific to @code{$kreceive}, |
775 | @var{arity} may only contain required and rest arguments. | |
776 | @end deftp | |
777 | ||
778 | @code{$arity} is a helper data structure used by @code{$kreceive} and | |
779 | also by @code{$kclause}, described below. | |
780 | ||
781 | @deftp {CPS Data} $arity req opt rest kw allow-other-keys? | |
782 | A data type declaring an arity. @var{req} and @var{opt} are lists of | |
783 | source names of required and optional arguments, respectively. | |
784 | @var{rest} is either the source name of the rest variable, or @code{#f} | |
785 | if this arity does not accept additional values. @var{kw} is a list of | |
786 | the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing | |
787 | the keyword arguments. @var{allow-other-keys?} is true if other keyword | |
788 | arguments are allowed and false otherwise. | |
789 | ||
790 | Note that all of these names with the exception of the @var{var}s in the | |
791 | @var{kw} list are source names, not unique variable names. | |
792 | @end deftp | |
793 | ||
794 | Additionally, there are three specific kinds of continuations that can | |
795 | only be declared at function entries. | |
796 | ||
797 | @deftp {CPS Continuation} $kentry self tail clauses | |
798 | Declare a function entry. @var{self} is a variable bound to the | |
799 | procedure being called, and which may be used for self-references. | |
800 | @var{tail} declares the @code{$cont} wrapping the @code{$ktail} for this | |
801 | function, corresponding to the function's tail continuation. | |
802 | @var{clauses} is a list of @code{$kclause} @code{$cont} instances. | |
803 | @end deftp | |
804 | ||
805 | @deftp {CPS Continuation} $ktail | |
806 | A tail continuation. | |
807 | @end deftp | |
808 | ||
809 | @deftp {CPS Continuation} $kclause arity cont | |
810 | A clause of a function with a given arity. Applications of a function | |
811 | with a compatible set of actual arguments will continue to @var{cont}, a | |
812 | @code{$kargs} @code{$cont} instance representing the clause body. | |
813 | @end deftp | |
814 | ||
815 | ||
816 | @node Building CPS | |
817 | @subsubsection Building CPS | |
818 | ||
819 | Unlike Tree-IL, the CPS language is built to be constructed and | |
820 | deconstructed with abstract macros instead of via procedural | |
821 | constructors or accessors, or instead of S-expression matching. | |
822 | ||
823 | Deconstruction and matching is handled adequately by the @code{match} | |
824 | form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction | |
825 | is handled by a set of mutually recursive builder macros: | |
826 | @code{build-cps-term}, @code{build-cps-cont}, and @code{build-cps-exp}. | |
827 | ||
828 | In the following interface definitions, consider variables containing | |
829 | @code{cont} to be recursively build by @code{build-cps-cont}, and | |
830 | likewise for @code{term} and @code{exp}. Consider any other name to be | |
831 | evaluated as a Scheme expression. Many of these forms recognize | |
832 | @code{unquote} in some contexts, to splice in a previously-built value; | |
833 | see the specifications below for full details. | |
834 | ||
835 | @deffn {Scheme Syntax} build-cps-term ,val | |
836 | @deffnx {Scheme Syntax} build-cps-term ($letk (cont ...) term) | |
837 | @deffnx {Scheme Syntax} build-cps-term ($letrec names syms funs term) | |
838 | @deffnx {Scheme Syntax} build-cps-term ($continue k src exp) | |
839 | @deffnx {Scheme Syntax} build-cps-exp ,val | |
840 | @deffnx {Scheme Syntax} build-cps-exp ($void) | |
841 | @deffnx {Scheme Syntax} build-cps-exp ($const val) | |
842 | @deffnx {Scheme Syntax} build-cps-exp ($prim name) | |
843 | @deffnx {Scheme Syntax} build-cps-exp ($fun src meta free body) | |
844 | @deffnx {Scheme Syntax} build-cps-exp ($call proc (arg ...)) | |
845 | @deffnx {Scheme Syntax} build-cps-exp ($call proc args) | |
846 | @deffnx {Scheme Syntax} build-cps-exp ($primcall name (arg ...)) | |
847 | @deffnx {Scheme Syntax} build-cps-exp ($primcall name args) | |
848 | @deffnx {Scheme Syntax} build-cps-exp ($values (arg ...)) | |
849 | @deffnx {Scheme Syntax} build-cps-exp ($values args) | |
850 | @deffnx {Scheme Syntax} build-cps-exp ($prompt escape? tag handler) | |
851 | @deffnx {Scheme Syntax} build-cps-cont ,val | |
852 | @deffnx {Scheme Syntax} build-cps-cont (k ($kargs (name ...) (sym ...) term)) | |
853 | @deffnx {Scheme Syntax} build-cps-cont (k ($kargs names syms term)) | |
854 | @deffnx {Scheme Syntax} build-cps-cont (k ($kif kt kf)) | |
855 | @deffnx {Scheme Syntax} build-cps-cont (k ($kreceive req rest kargs)) | |
856 | @deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont ,clauses)) | |
857 | @deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont (cont ...))) | |
858 | @deffnx {Scheme Syntax} build-cps-cont (k ($kclause ,arity cont)) | |
859 | @deffnx {Scheme Syntax} build-cps-cont (k ($kclause (req opt rest kw aok?) cont)) | |
860 | Construct a CPS term, expression, or continuation. | |
861 | @end deffn | |
58b23156 | 862 | |
bdad1340 | 863 | There are a few more miscellaneous interfaces as well. |
58b23156 | 864 | |
bdad1340 AW |
865 | @deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords? |
866 | A procedural constructor for @code{$arity} objects. | |
867 | @end deffn | |
58b23156 | 868 | |
bdad1340 AW |
869 | @deffn {Scheme Syntax} let-gensyms (sym ...) body ... |
870 | Bind @var{sym...} to fresh names, and evaluate @var{body...}. | |
871 | @end deffn | |
58b23156 | 872 | |
bdad1340 AW |
873 | @deffn {Scheme Syntax} rewrite-cps-term val (pat term) ... |
874 | @deffnx {Scheme Syntax} rewrite-cps-exp val (pat exp) ... | |
875 | @deffnx {Scheme Syntax} rewrite-cps-cont val (pat cont) ... | |
876 | Match @var{val} against the series of patterns @var{pat...}, using | |
877 | @code{match}. The body of the matching clause should be a template in | |
878 | the syntax of @code{build-cps-term}, @code{build-cps-exp}, or | |
879 | @code{build-cps-cont}, respectively. | |
880 | @end deffn | |
73643339 | 881 | |
69dc8268 AW |
882 | @node Compiling CPS |
883 | @subsubsection Compiling CPS | |
884 | ||
885 | Compiling CPS in Guile has three phases: conversion, optimization, and | |
886 | code generation. | |
887 | ||
888 | CPS conversion is the process of taking a higher-level language and | |
889 | compiling it to CPS. Source languages can do this directly, or they can | |
890 | convert to Tree-IL (which is probably easier) and let Tree-IL convert to | |
891 | CPS later. Going through Tree-IL has the advantage of running Tree-IL | |
892 | optimization passes, like partial evaluation. Also, the compiler from | |
893 | Tree-IL to CPS handles assignment conversion, in which assigned local | |
894 | variables (in Tree-IL, locals that are @code{<lexical-set>}) are | |
895 | converted to being boxed values on the heap. @xref{Variables and the | |
896 | VM}. | |
897 | ||
898 | After CPS conversion, Guile runs some optimization passes. The major | |
899 | optimization performed on CPS is contification, in which functions that | |
900 | are always called with the same continuation are incorporated directly | |
901 | into a function's body. This opens up space for more optimizations, and | |
902 | turns procedure calls into @code{goto}. It can also make loops out of | |
903 | recursive function nests. | |
904 | ||
905 | At the time of this writing (2014), most high-level optimization in | |
906 | Guile is done on Tree-IL. We would like to rewrite many of these passes | |
907 | to operate on CPS instead, as it is easier to reason about CPS. | |
908 | ||
909 | The rest of the optimization passes are really cleanups and | |
910 | canonicalizations. CPS spans the gap between high-level languages and | |
911 | low-level bytecodes, which allows much of the compilation process to be | |
912 | expressed as source-to-source transformations. Such is the case for | |
913 | closure conversion, in which references to variables that are free in a | |
914 | function are converted to closure references, and in which functions are | |
915 | converted to closures. There are a few more passes to ensure that the | |
916 | only primcalls left in the term are those that have a corresponding | |
917 | instruction in the virtual machine, and that their continuations expect | |
918 | the right number of values. | |
919 | ||
920 | Finally, the backend of the CPS compiler emits bytecode for each | |
921 | function, one by one. To do so, it determines the set of live variables | |
922 | at all points in the function. Using this liveness information, it | |
923 | allocates stack slots to each variable, such that a variable can live in | |
924 | one slot for the duration of its lifetime, without shuffling. (Of | |
925 | course, variables with disjoint lifetimes can share a slot.) Finally | |
926 | the backend emits code, typically just one VM instruction, for each | |
927 | continuation in the function. | |
928 | ||
929 | ||
67915ab0 AW |
930 | @node Bytecode |
931 | @subsection Bytecode | |
73643339 | 932 | |
69dc8268 AW |
933 | As mentioned before, Guile compiles all code to bytecode, and that |
934 | bytecode is contained in ELF images. @xref{Object File Format}, for | |
935 | more on Guile's use of ELF. | |
936 | ||
937 | To produce a bytecode image, Guile provides an assembler and a linker. | |
938 | ||
939 | The assembler, defined in the @code{(system vm assembler)} module, has a | |
940 | relatively straightforward imperative interface. It provides a | |
941 | @code{make-assembler} function to instantiate an assembler and a set of | |
942 | @code{emit-@var{inst}} procedures to emit instructions of each kind. | |
943 | ||
944 | The @code{emit-@var{inst}} procedures are actually generated at | |
945 | compile-time from a machine-readable description of the VM. With a few | |
946 | exceptions for certain operand types, each operand of an emit procedure | |
947 | corresponds to an operand of the corresponding instruction. | |
948 | ||
949 | Consider @code{vector-length}, from @pxref{Miscellaneous Instructions}. | |
950 | It is documented as: | |
951 | ||
952 | @deftypefn Instruction {} vector-length u12:@var{dst} u12:@var{src} | |
953 | @end deftypefn | |
954 | ||
955 | Therefore the emit procedure has the form: | |
956 | ||
957 | @deffn {Scheme Procedure} emit-vector-length asm dst src | |
958 | @end deffn | |
959 | ||
960 | All emit procedure take the assembler as their first argument, and | |
961 | return no useful values. | |
962 | ||
963 | The argument types depend on the operand types. @xref{Instruction Set}. | |
964 | Most are integers within a restricted range, though labels are generally | |
965 | expressed as opaque symbols. | |
966 | ||
967 | There are a few macro-instructions as well. | |
968 | ||
969 | @deffn {Scheme Procedure} emit-label asm label | |
970 | Define a label at the current program point. | |
971 | @end deffn | |
972 | ||
973 | @deffn {Scheme Procedure} emit-source asm source | |
974 | Associate @var{source} with the current program point. | |
975 | @end deffn | |
976 | ||
977 | @deffn {Scheme Procedure} emit-cache-current-module! asm module scope | |
978 | @deffnx {Scheme Procedure} emit-cached-toplevel-box asm dst scope sym bound? | |
979 | @deffnx {Scheme Procedure} emit-cached-module-box asm dst module-name sym public? bound? | |
980 | Macro-instructions to implement caching of top-level variables. The | |
981 | first takes the current module, in the slot @var{module}, and associates | |
982 | it with a cache location identified by @var{scope}. The second takes a | |
983 | @var{scope}, and resolves the variable. @xref{Top-Level Environment | |
984 | Instructions}. The last does not need a cached module, rather taking | |
985 | the module name directly. | |
986 | @end deffn | |
73643339 | 987 | |
69dc8268 AW |
988 | @deffn {Scheme Procedure} emit-load-constant asm dst constant |
989 | Load the Scheme datum @var{constant} into @var{dst}. | |
990 | @end deffn | |
991 | ||
992 | @deffn {Scheme Procedure} emit-begin-program asm label properties | |
993 | @deffnx {Scheme Procedure} emit-end-program asm | |
994 | Delimit the bounds of a procedure, with the given @var{label} and the | |
995 | metadata @var{properties}. | |
996 | @end deffn | |
997 | ||
998 | @deffn {Scheme Procedure} emit-load-static-procedure asm dst label | |
999 | Load a procedure with the given @var{label} into local @var{dst}. This | |
1000 | macro-instruction should only be used with procedures without free | |
1001 | variables -- procedures that are not closures. | |
1002 | @end deffn | |
1003 | ||
1004 | @deffn {Scheme Procedure} emit-begin-standard-arity asm req nlocals alternate | |
1005 | @deffnx {Scheme Procedure} emit-begin-opt-arity asm req opt rest nlocals alternate | |
1006 | @deffnx {Scheme Procedure} emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate | |
1007 | @deffnx {Scheme Procedure} emit-end-arity asm | |
1008 | Delimit a clause of a procedure. | |
1009 | @end deffn | |
1010 | ||
1011 | @deffn {Scheme Procedure} emit-br-if-symbol asm slot invert? label | |
1012 | @deffnx {Scheme Procedure} emit-br-if-variable asm slot invert? label | |
1013 | @deffnx {Scheme Procedure} emit-br-if-vector asm slot invert? label | |
1014 | @deffnx {Scheme Procedure} emit-br-if-string asm slot invert? label | |
1015 | @deffnx {Scheme Procedure} emit-br-if-bytevector asm slot invert? label | |
1016 | @deffnx {Scheme Procedure} emit-br-if-bitvector asm slot invert? label | |
1017 | TC7-specific test-and-branch instructions. The TC7 is a 7-bit code that | |
1018 | is part of a heap object's type. @xref{The SCM Type in Guile}. Also, | |
1019 | @xref{Branch Instructions}. | |
1020 | @end deffn | |
1021 | ||
1022 | The linker is a complicated beast. Hackers interested in how it works | |
1023 | would do well do read Ian Lance Taylor's series of articles on linkers. | |
1024 | Searching the internet should find them easily. From the user's | |
1025 | perspective, there is only one knob to control: whether the resulting | |
1026 | image will be written out to a file or not. If the user passes | |
1027 | @code{#:to-file? #t} as part of the compiler options (@pxref{The Scheme | |
1028 | Compiler}), the linker will align the resulting segments on page | |
1029 | boundaries, and otherwise not. | |
1030 | ||
1031 | @deffn {Scheme Procedure} link-assembly asm #:page-aligned?=#t | |
1032 | Link an ELF image, and return the bytevector. If @var{page-aligned?} is | |
1033 | true, Guile will align the segments with different permissions on | |
1034 | page-sized boundaries, in order to maximize code sharing between | |
1035 | different processes. Otherwise, padding is minimized, to minimize | |
1036 | address space consumption. | |
1037 | @end deffn | |
1038 | ||
1039 | To write an image to disk, just use @code{put-bytevector} from | |
1040 | @code{(ice-9 binary-ports)}. | |
1041 | ||
1042 | Compiling object code to the fake language, @code{value}, is performed | |
1043 | via loading objcode into a program, then executing that thunk with | |
1044 | respect to the compilation environment. Normally the environment | |
1045 | propagates through the compiler transparently, but users may specify the | |
1046 | compilation environment manually as well, as a module. Procedures to | |
1047 | load images can be found in the @code{(system vm loader)} module: | |
1048 | ||
1049 | @lisp | |
1050 | (use-modules (system vm loader)) | |
1051 | @end lisp | |
e3ba263d | 1052 | |
b8bc86bc AW |
1053 | @deffn {Scheme Variable} load-thunk-from-file file |
1054 | @deffnx {C Function} scm_load_thunk_from_file (file) | |
ff73ae34 AW |
1055 | Load object code from a file named @var{file}. The file will be mapped |
1056 | into memory via @code{mmap}, so this is a very fast operation. | |
69dc8268 | 1057 | @end deffn |
e3ba263d | 1058 | |
69dc8268 AW |
1059 | @deffn {Scheme Variable} load-thunk-from-memory bv |
1060 | @deffnx {C Function} scm_load_thunk_from_memory (bv) | |
1061 | Load object code from a bytevector. The data will be copied out of the | |
1062 | bytevector in order to ensure proper alignment of embedded Scheme | |
1063 | values. | |
73643339 AW |
1064 | @end deffn |
1065 | ||
69dc8268 AW |
1066 | Additionally there are procedures to find the ELF image for a given |
1067 | pointer, or to list all mapped ELF images: | |
ff73ae34 | 1068 | |
69dc8268 AW |
1069 | @deffn {Scheme Variable} find-mapped-elf-image ptr |
1070 | Given the integer value @var{ptr}, find and return the ELF image that | |
1071 | contains that pointer, as a bytevector. If no image is found, return | |
1072 | @code{#f}. This routine is mostly used by debuggers and other | |
1073 | introspective tools. | |
1074 | @end deffn | |
1075 | ||
1076 | @deffn {Scheme Variable} all-mapped-elf-images | |
1077 | Return all mapped ELF images, as a list of bytevectors. | |
1078 | @end deffn | |
ff73ae34 | 1079 | |
c850030f | 1080 | |
e63d888e DK |
1081 | @node Writing New High-Level Languages |
1082 | @subsection Writing New High-Level Languages | |
1083 | ||
1084 | In order to integrate a new language @var{lang} into Guile's compiler | |
1085 | system, one has to create the module @code{(language @var{lang} spec)} | |
1086 | containing the language definition and referencing the parser, | |
1087 | compiler and other routines processing it. The module hierarchy in | |
1088 | @code{(language brainfuck)} defines a very basic Brainfuck | |
1089 | implementation meant to serve as easy-to-understand example on how to | |
4e432dab AW |
1090 | do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck} |
1091 | for more information about the Brainfuck language itself. | |
1092 | ||
e63d888e | 1093 | |
ff73ae34 AW |
1094 | @node Extending the Compiler |
1095 | @subsection Extending the Compiler | |
e3ba263d | 1096 | |
dd6e37d0 NJ |
1097 | At this point we take a detour from the impersonal tone of the rest of |
1098 | the manual. Admit it: if you've read this far into the compiler | |
1099 | internals manual, you are a junkie. Perhaps a course at your university | |
1100 | left you unsated, or perhaps you've always harbored a desire to hack the | |
1101 | holy of computer science holies: a compiler. Well you're in good | |
1102 | company, and in a good position. Guile's compiler needs your help. | |
ff73ae34 AW |
1103 | |
1104 | There are many possible avenues for improving Guile's compiler. | |
1105 | Probably the most important improvement, speed-wise, will be some form | |
1106 | of native compilation, both just-in-time and ahead-of-time. This could | |
1107 | be done in many ways. Probably the easiest strategy would be to extend | |
1108 | the compiled procedure structure to include a pointer to a native code | |
86872cc3 | 1109 | vector, and compile from bytecode to native code at run-time after a |
ff73ae34 AW |
1110 | procedure is called a certain number of times. |
1111 | ||
1112 | The name of the game is a profiling-based harvest of the low-hanging | |
1113 | fruit, running programs of interest under a system-level profiler and | |
1114 | determining which improvements would give the most bang for the buck. | |
98850fd7 AW |
1115 | It's really getting to the point though that native compilation is the |
1116 | next step. | |
ff73ae34 AW |
1117 | |
1118 | The compiler also needs help at the top end, enhancing the Scheme that | |
98850fd7 AW |
1119 | it knows to also understand R6RS, and adding new high-level compilers. |
1120 | We have JavaScript and Emacs Lisp mostly complete, but they could use | |
ecb87335 | 1121 | some love; Lua would be nice as well, but whatever language it is |
98850fd7 AW |
1122 | that strikes your fancy would be welcome too. |
1123 | ||
1124 | Compilers are for hacking, not for admiring or for complaining about. | |
1125 | Get to it! |