Commit | Line | Data |
---|---|---|
8680d53b AW |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
acc51c3e | 3 | @c Copyright (C) 2008, 2009, 2010 |
8680d53b AW |
4 | @c Free Software Foundation, Inc. |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @node Compiling to the Virtual Machine | |
8 | @section Compiling to the Virtual Machine | |
9 | ||
00ce5125 AW |
10 | Compilers have a mystique about them that is attractive and |
11 | off-putting at the same time. They are attractive because they are | |
12 | magical -- they transform inert text into live results, like throwing | |
e33e3aee AW |
13 | the switch on Frankenstein's monster. However, this magic is perceived |
14 | by many to be impenetrable. | |
00ce5125 | 15 | |
0b8f3ac5 AW |
16 | This section aims to pay attention to the small man behind the |
17 | curtain. | |
00ce5125 | 18 | |
e3ba263d | 19 | @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to |
98850fd7 | 20 | know how to compile your @code{.scm} file. |
00ce5125 AW |
21 | |
22 | @menu | |
23 | * Compiler Tower:: | |
24 | * The Scheme Compiler:: | |
81fd3152 | 25 | * Tree-IL:: |
00ce5125 | 26 | * GLIL:: |
81fd3152 | 27 | * Assembly:: |
73643339 | 28 | * Bytecode and Objcode:: |
e63d888e | 29 | * Writing New High-Level Languages:: |
e3ba263d | 30 | * Extending the Compiler:: |
00ce5125 AW |
31 | @end menu |
32 | ||
33 | @node Compiler Tower | |
34 | @subsection Compiler Tower | |
35 | ||
36 | Guile's compiler is quite simple, actually -- its @emph{compilers}, to | |
37 | put it more accurately. Guile defines a tower of languages, starting | |
38 | at Scheme and progressively simplifying down to languages that | |
e3ba263d | 39 | resemble the VM instruction set (@pxref{Instruction Set}). |
00ce5125 AW |
40 | |
41 | Each language knows how to compile to the next, so each step is simple | |
42 | and understandable. Furthermore, this set of languages is not | |
43 | hardcoded into Guile, so it is possible for the user to add new | |
44 | high-level languages, new passes, or even different compilation | |
45 | targets. | |
46 | ||
e3ba263d AW |
47 | Languages are registered in the module, @code{(system base language)}: |
48 | ||
49 | @example | |
50 | (use-modules (system base language)) | |
51 | @end example | |
52 | ||
53 | They are registered with the @code{define-language} form. | |
54 | ||
55 | @deffn {Scheme Syntax} define-language @ | |
41e64dd7 AW |
56 | name title reader printer @ |
57 | [parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @ | |
58 | [joiner=#f] [make-default-environment=make-fresh-user-module] | |
e3ba263d AW |
59 | Define a language. |
60 | ||
61 | This syntax defines a @code{#<language>} object, bound to @var{name} | |
62 | in the current environment. In addition, the language will be added to | |
63 | the global language set. For example, this is the language definition | |
64 | for Scheme: | |
65 | ||
66 | @example | |
67 | (define-language scheme | |
41e64dd7 AW |
68 | #:title "Scheme" |
69 | #:reader (lambda (port env) ...) | |
98850fd7 | 70 | #:compilers `((tree-il . ,compile-tree-il)) |
81fd3152 | 71 | #:decompilers `((tree-il . ,decompile-tree-il)) |
41e64dd7 AW |
72 | #:evaluator (lambda (x module) (primitive-eval x)) |
73 | #:printer write | |
74 | #:make-default-environment (lambda () ...)) | |
e3ba263d | 75 | @end example |
e3ba263d AW |
76 | @end deffn |
77 | ||
78 | The interesting thing about having languages defined this way is that | |
79 | they present a uniform interface to the read-eval-print loop. This | |
80 | allows the user to change the current language of the REPL: | |
81 | ||
82 | @example | |
81fd3152 | 83 | scheme@@(guile-user)> ,language tree-il |
41e64dd7 AW |
84 | Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'. |
85 | tree-il@@(guile-user)> ,L scheme | |
86 | Happy hacking with Scheme! To switch back, type `,L tree-il'. | |
87 | scheme@@(guile-user)> | |
e3ba263d AW |
88 | @end example |
89 | ||
90 | Languages can be looked up by name, as they were above. | |
91 | ||
92 | @deffn {Scheme Procedure} lookup-language name | |
93 | Looks up a language named @var{name}, autoloading it if necessary. | |
94 | ||
95 | Languages are autoloaded by looking for a variable named @var{name} in | |
96 | a module named @code{(language @var{name} spec)}. | |
97 | ||
98 | The language object will be returned, or @code{#f} if there does not | |
99 | exist a language with that name. | |
100 | @end deffn | |
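As a small sketch of the lookup described above: the snippet below assumes that `(system base language)` also exports the record accessors `language-name` and `language-title`. Those accessor names are an assumption about the module, not something this section states, so check them against your Guile version.

```scheme
(use-modules (system base language))

;; Look up the Scheme language object by name. Per the text above, it is
;; autoloaded from (language scheme spec) if it is not loaded already.
(define scheme-lang (lookup-language 'scheme))

;; Assumed accessors: the name the language was defined under, and the
;; #:title passed to define-language.
(display (language-name scheme-lang))
(newline)
(display (language-title scheme-lang))
(newline)
```

The same accessors should work on any language object, for instance the one returned by `(lookup-language 'tree-il)`.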
101 | ||
102 | Defining languages this way allows us to programmatically determine | |
103 | the necessary steps for compiling code from one language to another. | |
104 | ||
105 | @deffn {Scheme Procedure} lookup-compilation-order from to | |
106 | Recursively traverses the set of languages to which @var{from} can | |
107 | compile, depth-first, and returns the first path that can transform | |
108 | @var{from} to @var{to}. Returns @code{#f} if no path is found. | |
109 | ||
110 | This function memoizes its results in a cache that is invalidated by | |
111 | subsequent calls to @code{define-language}, so it should be quite | |
112 | fast. | |
113 | @end deffn | |
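A hedged sketch of querying the tower from Scheme to the `value` pseudo-language follows. The shape of the returned path is not specified in this section; the code assumes it is a list whose elements are usually language objects, and simply prints anything else as-is. `language?` and `language-name` are assumed exports of `(system base language)`.

```scheme
(use-modules (system base language))

;; Ask for a compilation path from Scheme to `value'.
(define path (lookup-compilation-order 'scheme 'value))

(if path
    (for-each (lambda (step)
                ;; Assumption: each step is normally a language object.
                (if (language? step)
                    (display (language-name step))
                    (write step))
                (newline))
              path)
    (begin
      (display "no compilation path found")
      (newline)))
```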
114 | ||
115 | There is a notion of a ``current language'', which is maintained in | |
116 | the @code{*current-language*} fluid. This language is normally Scheme, | |
86872cc3 | 117 | and may be rebound by the user. The run-time compilation interfaces |
e3ba263d AW |
118 | (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source |
119 | and target languages. | |
120 | ||
121 | The normal tower of languages when compiling Scheme goes like this: | |
122 | ||
123 | @itemize | |
41e64dd7 | 124 | @item Scheme |
81fd3152 | 125 | @item Tree Intermediate Language (Tree-IL) |
41e64dd7 | 126 | @item Guile Lowlevel Intermediate Language (GLIL) |
81fd3152 AW |
127 | @item Assembly |
128 | @item Bytecode | |
73643339 | 129 | @item Objcode |
e3ba263d AW |
130 | @end itemize |
131 | ||
132 | Object code may be serialized to disk directly, though it has a cookie | |
73643339 AW |
133 | and version prepended to the front. But when compiling Scheme at run |
134 | time, you want a Scheme value: for example, a compiled procedure. For | |
135 | this reason, so as not to break the abstraction, Guile defines a fake | |
81fd3152 AW |
136 | language at the bottom of the tower: |
137 | ||
138 | @itemize | |
139 | @item Value | |
140 | @end itemize | |
141 | ||
142 | Compiling to @code{value} loads the object code into a procedure, and | |
143 | wakes the sleeping giant. | |
e3ba263d AW |
144 | |
145 | Perhaps this strangeness can be explained by example: | |
146 | @code{compile-file} defaults to compiling to object code, because it | |
147 | produces object code that has to live in the barren world outside the | |
148 | Guile runtime; but @code{compile} defaults to compiling to | |
149 | @code{value}, as its product re-enters the Guile world. | |
150 | ||
151 | Indeed, the process of compilation can circulate through these | |
152 | different worlds indefinitely, as shown by the following quine: | |
153 | ||
154 | @example | |
00ce5125 | 155 | ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x))) |
e3ba263d | 156 | @end example |
00ce5125 AW |
157 | |
158 | @node The Scheme Compiler | |
159 | @subsection The Scheme Compiler | |
160 | ||
81fd3152 AW |
161 | The job of the Scheme compiler is to expand all macros and all of |
162 | Scheme to its most primitive expressions. The definition of | |
163 | ``primitive'' is given by the inventory of constructs provided by | |
164 | Tree-IL, the target language of the Scheme compiler: procedure | |
165 | applications, conditionals, lexical references, etc. This is described | |
166 | more fully in the next section. | |
167 | ||
168 | The tricky and amusing thing about the Scheme-to-Tree-IL compiler is | |
169 | that it is completely implemented by the macro expander. Since the | |
170 | macro expander has to run over all of the source code already in order | |
171 | to expand macros, it might as well do the analysis at the same time, | |
172 | producing Tree-IL expressions directly. | |
173 | ||
174 | Because this compiler is actually the macro expander, it is | |
175 | extensible. Any macro which the user writes becomes part of the | |
176 | compiler. | |
177 | ||
178 | The Scheme-to-Tree-IL expander may be invoked using the generic | |
179 | @code{compile} procedure: | |
180 | ||
181 | @lisp | |
182 | (compile '(+ 1 2) #:from 'scheme #:to 'tree-il) | |
183 | @result{} | |
184 | #<<application> src: #f | |
185 | proc: #<<toplevel-ref> src: #f name: +> | |
186 | args: (#<<const> src: #f exp: 1> | |
187 | #<<const> src: #f exp: 2>)> | |
188 | @end lisp | |
189 | ||
190 | Or, since Tree-IL is so close to Scheme, it is often useful to expand | |
191 | Scheme to Tree-IL, then translate back to Scheme. For that reason the | |
192 | expander provides two interfaces. The @code{compile} call above is equivalent to calling | |
41e64dd7 | 193 | @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for |
81fd3152 AW |
194 | ``compile''. With @code{'e} (the default), the result is translated |
195 | back to Scheme: | |
196 | ||
197 | @lisp | |
41e64dd7 | 198 | (macroexpand '(+ 1 2)) |
81fd3152 | 199 | @result{} (+ 1 2) |
41e64dd7 | 200 | (macroexpand '(let ((x 10)) (* x x))) |
81fd3152 AW |
201 | @result{} (let ((x84 10)) (* x84 x84)) |
202 | @end lisp | |
203 | ||
204 | The second example shows that as part of its job, the macro expander | |
205 | renames lexically-bound variables. The original names are preserved | |
206 | when compiling to Tree-IL, but can't be represented in Scheme: a | |
207 | lexical binding only has one name. It is for this reason that the | |
208 | @emph{native} output of the expander is @emph{not} Scheme. There's too | |
209 | much information we would lose if we translated to Scheme directly: | |
210 | lexical variable names, source locations, and module hygiene. | |
211 | ||
41e64dd7 AW |
212 | Note however that @code{macroexpand} does not have the same signature |
213 | as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper | |
214 | around @code{macroexpand}, to make it conform to the general form of | |
81fd3152 AW |
215 | compiler procedures in Guile's language tower. |
216 | ||
98850fd7 AW |
217 | Compiler procedures take three arguments: an expression, an |
218 | environment, and a keyword list of options. They return three values: | |
219 | the compiled expression, the corresponding environment for the target | |
220 | language, and a ``continuation environment''. The compiled expression | |
221 | and environment will serve as input to the next language's compiler. | |
222 | The ``continuation environment'' can be used to compile another | |
223 | expression from the same source language within the same module. | |
81fd3152 AW |
224 | |
225 | For example, you might compile the expression, @code{(define-module | |
226 | (foo))}. This will result in a Tree-IL expression and environment. But | |
227 | if you compiled a second expression, you would want to take into | |
228 | account the compile-time effect of compiling the previous expression, | |
229 | which puts the user in the @code{(foo)} module. That is the purpose of the | |
230 | ``continuation environment''; you would pass it as the environment | |
231 | when compiling the subsequent expression. | |
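The three-argument, three-value protocol described above can be sketched by calling the Scheme-to-Tree-IL compiler procedure directly. This assumes that `compile-tree-il` is exported from the module `(language scheme compile-tree-il)`; the module name is an assumption here and may differ between Guile versions.

```scheme
(use-modules (language scheme compile-tree-il))

;; Call the compiler procedure with an expression, an environment (a
;; module, for Scheme), and an empty option list, collecting the three
;; return values into a list.
(define results
  (call-with-values
      (lambda ()
        (compile-tree-il '(+ 1 2) (current-module) '()))
    list))

(define exp  (car results))    ; the compiled Tree-IL expression
(define env  (cadr results))   ; environment for the target language
(define cenv (caddr results))  ; continuation environment

(write exp)
(newline)
```

To compile a second expression in the same context, one would pass `cenv` as the environment argument of the next call.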
232 | ||
41e64dd7 AW |
233 | For Scheme, an environment is a module. By default, the @code{compile} |
234 | and @code{compile-file} procedures compile in a fresh module, such | |
235 | that bindings and macros introduced by the expression being compiled | |
236 | are isolated: | |
1ebe6a63 LC |
237 | |
238 | @example | |
239 | (eq? (current-module) (compile '(current-module))) | |
240 | @result{} #f | |
241 | ||
242 | (compile '(define hello 'world)) | |
243 | (defined? 'hello) | |
244 | @result{} #f | |
245 | ||
246 | (define / *) | |
247 | (eq? (compile '/) /) | |
248 | @result{} #f | |
249 | @end example | |
250 | ||
251 | Similarly, changes to the @code{current-reader} fluid (@pxref{Loading, | |
252 | @code{current-reader}}) are isolated: | |
253 | ||
254 | @example | |
255 | (compile '(fluid-set! current-reader (lambda args 'fail))) | |
256 | (fluid-ref current-reader) | |
257 | @result{} #f | |
258 | @end example | |
259 | ||
260 | Nevertheless, having the compiler and @dfn{compilee} share the same name | |
261 | space can be achieved by explicitly passing @code{(current-module)} as | |
262 | the compilation environment: | |
263 | ||
264 | @example | |
265 | (define hello 'world) | |
266 | (compile 'hello #:env (current-module)) | |
267 | @result{} world | |
268 | @end example | |
269 | ||
81fd3152 AW |
270 | @node Tree-IL |
271 | @subsection Tree-IL | |
00ce5125 | 272 | |
81fd3152 | 273 | Tree Intermediate Language (Tree-IL) is a structured intermediate |
c850030f AW |
274 | language that is close in expressive power to Scheme. It is an |
275 | expanded, pre-analyzed Scheme. | |
276 | ||
81fd3152 AW |
277 | Tree-IL is ``structured'' in the sense that its representation is |
278 | based on records, not S-expressions. This gives a rigidity to the | |
279 | language that ensures that compiling to a lower-level language only | |
41e64dd7 AW |
280 | requires a limited set of transformations. For example, the Tree-IL |
281 | type @code{<const>} is a record type with two fields, @code{src} and | |
282 | @code{exp}. Instances of this type are created via @code{make-const}. | |
283 | Fields of this type are accessed via the @code{const-src} and | |
284 | @code{const-exp} procedures. There is also a predicate, @code{const?}. | |
285 | @xref{Records}, for more information on records. | |
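A minimal sketch of the record interface just described, assuming the constructor, accessors, and predicate are exported from the `(language tree-il)` module:

```scheme
(use-modules (language tree-il))

;; Build a <const> by hand; the first field is src (#f means no source
;; location), the second is the constant itself.
(define c (make-const #f 3))

(const? c)     ; true: c is an instance of <const>
(const-src c)  ; the src field, #f here
(const-exp c)  ; the exp field, 3 here
```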
81fd3152 AW |
286 | |
287 | @c alpha renaming | |
288 | ||
289 | All Tree-IL types have a @code{src} slot, which holds source location | |
290 | information for the expression. This information, if present, will be | |
291 | residualized into the compiled object code, allowing backtraces to | |
292 | show source information. The format of @code{src} is the same as that | |
293 | returned by Guile's @code{source-properties} function. @xref{Source | |
294 | Properties}, for more information. | |
295 | ||
296 | Although Tree-IL objects are represented internally using records, | |
297 | there is also an equivalent S-expression external representation for | |
298 | each kind of Tree-IL. For example, the S-expression representation | |
299 | of the expression @code{#<const src: #f exp: 3>} would be: | |
c850030f AW |
300 | |
301 | @example | |
81fd3152 | 302 | (const 3) |
c850030f AW |
303 | @end example |
304 | ||
81fd3152 | 305 | Users may program with this format directly at the REPL: |
c850030f AW |
306 | |
307 | @example | |
81fd3152 | 308 | scheme@@(guile-user)> ,language tree-il |
41e64dd7 | 309 | Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'. |
81fd3152 | 310 | tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10)) |
c850030f AW |
311 | @result{} 42 |
312 | @end example | |
313 | ||
81fd3152 AW |
314 | The @code{src} fields are left out of the external representation. |
315 | ||
98850fd7 AW |
316 | One may create Tree-IL objects from their external representations via |
317 | calling @code{parse-tree-il}, the reader for Tree-IL. If any source | |
318 | information is attached to the input S-expression, it will be | |
319 | propagated to the resulting Tree-IL expressions. This is probably the | |
320 | easiest way to compile to Tree-IL: just make the appropriate external | |
321 | representations in S-expression format, and let @code{parse-tree-il} | |
322 | take care of the rest. | |
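The round trip between the external representation and records might look like the following sketch. It assumes that `(language tree-il)` exports `unparse-tree-il` as the inverse of `parse-tree-il`; that name is not mentioned in this section. Only the `(const ...)` form is used, to keep the example independent of other node types.

```scheme
(use-modules (language tree-il))

;; Parse the external S-expression form into a Tree-IL record.
(define t (parse-tree-il '(const 3)))

(const? t)           ; true: the parser produced a <const>
(const-exp t)        ; 3

;; Assumed inverse: convert the record back to its external form.
(unparse-tree-il t)  ; (const 3)
```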
323 | ||
81fd3152 AW |
324 | @deftp {Scheme Variable} <void> src |
325 | @deftpx {External Representation} (void) | |
326 | An empty expression. In practice, equivalent to Scheme's @code{(if #f | |
327 | #f)}. | |
328 | @end deftp | |
329 | @deftp {Scheme Variable} <const> src exp | |
330 | @deftpx {External Representation} (const @var{exp}) | |
331 | A constant. | |
332 | @end deftp | |
333 | @deftp {Scheme Variable} <primitive-ref> src name | |
334 | @deftpx {External Representation} (primitive @var{name}) | |
335 | A reference to a ``primitive''. A primitive is a procedure that, when | |
336 | compiled, may be open-coded. For example, @code{cons} is usually | |
337 | recognized as a primitive, so that it compiles down to a single | |
338 | instruction. | |
339 | ||
340 | Compilation of Tree-IL usually begins with a pass that resolves some | |
341 | @code{<module-ref>} and @code{<toplevel-ref>} expressions to | |
342 | @code{<primitive-ref>} expressions. The actual compilation pass | |
343 | has special cases for applications of certain primitives, like | |
344 | @code{apply} or @code{cons}. | |
345 | @end deftp | |
346 | @deftp {Scheme Variable} <lexical-ref> src name gensym | |
347 | @deftpx {External Representation} (lexical @var{name} @var{gensym}) | |
348 | A reference to a lexically-bound variable. The @var{name} is the | |
349 | original name of the variable in the source program. @var{gensym} is a | |
350 | unique identifier for this variable. | |
351 | @end deftp | |
352 | @deftp {Scheme Variable} <lexical-set> src name gensym exp | |
353 | @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp}) | |
354 | Sets a lexically-bound variable. | |
355 | @end deftp | |
356 | @deftp {Scheme Variable} <module-ref> src mod name public? | |
357 | @deftpx {External Representation} (@@ @var{mod} @var{name}) | |
358 | @deftpx {External Representation} (@@@@ @var{mod} @var{name}) | |
359 | A reference to a variable in a specific module. @var{mod} should be | |
360 | the name of the module, e.g. @code{(guile-user)}. | |
361 | ||
362 | If @var{public?} is true, the variable named @var{name} will be looked | |
363 | up in @var{mod}'s public interface, and serialized with @code{@@}; | |
364 | otherwise it will be looked up among the module's private bindings, | |
365 | and is serialized with @code{@@@@}. | |
366 | @end deftp | |
367 | @deftp {Scheme Variable} <module-set> src mod name public? exp | |
368 | @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp}) | |
369 | @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp}) | |
370 | Sets a variable in a specific module. | |
371 | @end deftp | |
372 | @deftp {Scheme Variable} <toplevel-ref> src name | |
373 | @deftpx {External Representation} (toplevel @var{name}) | |
374 | References a variable from the current procedure's module. | |
375 | @end deftp | |
376 | @deftp {Scheme Variable} <toplevel-set> src name exp | |
377 | @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp}) | |
378 | Sets a variable in the current procedure's module. | |
379 | @end deftp | |
380 | @deftp {Scheme Variable} <toplevel-define> src name exp | |
381 | @deftpx {External Representation} (define (toplevel @var{name}) @var{exp}) | |
382 | Defines a new top-level variable in the current procedure's module. | |
383 | @end deftp | |
384 | @deftp {Scheme Variable} <conditional> src test then else | |
385 | @deftpx {External Representation} (if @var{test} @var{then} @var{else}) | |
ca445ba5 | 386 | A conditional. Note that @var{else} is not optional. |
c850030f | 387 | @end deftp |
81fd3152 AW |
388 | @deftp {Scheme Variable} <application> src proc args |
389 | @deftpx {External Representation} (apply @var{proc} . @var{args}) | |
ca445ba5 | 390 | A procedure call. |
c850030f | 391 | @end deftp |
81fd3152 AW |
392 | @deftp {Scheme Variable} <sequence> src exps |
393 | @deftpx {External Representation} (begin . @var{exps}) | |
394 | Like Scheme's @code{begin}. | |
c850030f | 395 | @end deftp |
41e64dd7 AW |
396 | @deftp {Scheme Variable} <lambda> src meta body |
397 | @deftpx {External Representation} (lambda @var{meta} @var{body}) | |
398 | A closure. @var{meta} is an association list of properties for the | |
399 | procedure. @var{body} is a single Tree-IL expression of type | |
400 | @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to | |
401 | an alternate clause, this makes Tree-IL's @code{<lambda>} have the | |
402 | expressiveness of Scheme's @code{case-lambda}. | |
403 | @end deftp | |
404 | @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate | |
405 | @deftpx {External Representation} @ | |
406 | (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@ | |
407 | @var{body})@ | |
408 | [@var{alternate}]) | |
409 | One clause of a @code{case-lambda}. A @code{lambda} expression in | |
410 | Scheme is treated as a @code{case-lambda} with one clause. | |
411 | ||
412 | @var{req} is a list of the procedure's required arguments, as symbols. | |
413 | @var{opt} is a list of the optional arguments, or @code{#f} if there | |
414 | are no optional arguments. @var{rest} is the name of the rest | |
415 | argument, or @code{#f}. | |
416 | ||
417 | @var{kw} is a list of the form, @code{(@var{allow-other-keys?} | |
418 | (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the | |
419 | keyword corresponding to the argument named @var{name}, and whose | |
420 | corresponding gensym is @var{var}. @var{inits} are Tree-IL expressions | |
421 | corresponding to all of the optional and keyword arguments, evaluated | |
422 | to bind variables whose value is not supplied by the procedure caller. | |
423 | Each @var{init} expression is evaluated in the lexical context of | |
424 | previously bound variables, from left to right. | |
425 | ||
426 | @var{gensyms} is a list of gensyms corresponding to all arguments: | |
427 | first all of the required arguments, then the optional arguments if | |
428 | any, then the rest argument if any, then all of the keyword arguments. | |
429 | ||
430 | @var{body} is the body of the clause. If the procedure is called with | |
431 | an appropriate number of arguments, @var{body} is evaluated in tail | |
432 | position. Otherwise, if there is an @var{alternate}, it should be a | |
433 | @code{<lambda-case>} expression, representing the next clause to try. | |
434 | If there is no @var{alternate}, a wrong-number-of-arguments error is | |
435 | signaled. | |
436 | @end deftp | |
437 | @deftp {Scheme Variable} <let> src names gensyms vals exp | |
438 | @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp}) | |
81fd3152 | 439 | Lexical binding, like Scheme's @code{let}. @var{names} are the |
41e64dd7 | 440 | original binding names, @var{gensyms} are gensyms corresponding to the |
81fd3152 AW |
441 | @var{names}, and @var{vals} are Tree-IL expressions for the values. |
442 | @var{exp} is a single Tree-IL expression. | |
443 | @end deftp | |
41e64dd7 AW |
444 | @deftp {Scheme Variable} <letrec> src names gensyms vals exp |
445 | @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp}) | |
81fd3152 AW |
446 | A version of @code{<let>} that creates recursive bindings, like |
447 | Scheme's @code{letrec}. | |
448 | @end deftp | |
41e64dd7 AW |
449 | @deftp {Scheme Variable} <dynlet> fluids vals body |
450 | @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body}) | |
451 | Dynamic binding; the equivalent of Scheme's @code{with-fluids}. | |
452 | @var{fluids} should be a list of Tree-IL expressions that will | |
453 | evaluate to fluids, and @var{vals} a corresponding list of expressions | |
454 | to bind to the fluids during the dynamic extent of the evaluation of | |
455 | @var{body}. | |
456 | @end deftp | |
457 | @deftp {Scheme Variable} <dynref> fluid | |
458 | @deftpx {External Representation} (dynref @var{fluid}) | |
459 | A dynamic variable reference. @var{fluid} should be a Tree-IL | |
460 | expression evaluating to a fluid. | |
461 | @end deftp | |
462 | @deftp {Scheme Variable} <dynset> fluid exp | |
463 | @deftpx {External Representation} (dynset @var{fluid} @var{exp}) | |
464 | A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating | |
465 | to a fluid, will be set to the result of evaluating @var{exp}. | |
466 | @end deftp | |
467 | @deftp {Scheme Variable} <dynwind> winder body unwinder | |
468 | @deftpx {External Representation} (dynwind @var{winder} @var{body} @var{unwinder}) | |
469 | A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both | |
470 | evaluate to thunks. Ensures that the winder is called before entering | |
471 | @var{body} and the unwinder after leaving it. Note that @var{body} is | |
472 | an expression, without a thunk wrapper. | |
473 | @end deftp | |
474 | @deftp {Scheme Variable} <prompt> tag body handler | |
475 | @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler}) | |
476 | A dynamic prompt. Instates a prompt named @var{tag}, an expression, | |
477 | during the dynamic extent of the execution of @var{body}, also an | |
478 | expression. If an abort occurs to this prompt, control will be passed | |
479 | to @var{handler}, a @code{<lambda-case>} expression with no optional | |
480 | or keyword arguments, and no alternate. The first argument to the | |
481 | @code{<lambda-case>} will be the captured continuation, and then all | |
482 | of the values passed to the abort. @xref{Prompts}, for more | |
483 | information. | |
484 | @end deftp | |
485 | @deftp {Scheme Variable} <abort> tag args tail | |
486 | @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail}) | |
487 | An abort to the nearest prompt with the name @var{tag}, an expression. | |
488 | @var{args} should be a list of expressions to pass to the prompt's | |
489 | handler, and @var{tail} should be an expression that will evaluate to | |
490 | a list of additional arguments. An abort will save the partial | |
491 | continuation, which may later be reinstated, resulting in the | |
492 | @code{<abort>} expression evaluating to some number of values. | |
493 | @end deftp | |
81fd3152 | 494 | |
98850fd7 AW |
495 | There are two Tree-IL constructs that are not normally produced by |
496 | higher-level compilers, but instead are generated during the | |
497 | source-to-source optimization and analysis passes that the Tree-IL | |
498 | compiler does. Users should not generate these expressions directly, | |
499 | unless they feel very clever, as the default analysis pass will | |
500 | generate them as necessary. | |
501 | ||
41e64dd7 AW |
502 | @deftp {Scheme Variable} <let-values> src names gensyms exp body |
503 | @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body}) | |
98850fd7 AW |
504 | Like Scheme's @code{receive} -- binds the values returned by |
505 | evaluating @code{exp} to the @code{lambda}-like bindings described by | |
41e64dd7 | 506 | @var{gensyms}. That is to say, @var{gensyms} may be an improper list. |
98850fd7 AW |
507 | |
508 | @code{<let-values>} is an optimization of @code{<application>} of the | |
509 | primitive, @code{call-with-values}. | |
510 | @end deftp | |
41e64dd7 AW |
511 | @deftp {Scheme Variable} <fix> src names gensyms vals body |
512 | @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body}) | |
98850fd7 AW |
513 | Like @code{<letrec>}, but only for @var{vals} that are unset |
514 | @code{lambda} expressions. | |
515 | ||
516 | @code{fix} is an optimization of @code{letrec} (and @code{let}). | |
517 | @end deftp | |
81fd3152 AW |
518 | |
519 | Tree-IL implements a compiler to GLIL that recursively traverses | |
520 | Tree-IL expressions, writing out GLIL expressions into a linear list. | |
521 | The compiler also keeps some state as to whether the current | |
522 | expression is in tail context, and whether its value will be used in | |
523 | future computations. This state allows the compiler not to emit code | |
524 | for constant expressions that will not be used (e.g. docstrings), and | |
525 | to perform tail calls when in tail position. | |
526 | ||
98850fd7 AW |
527 | Most optimization, such as it currently is, is performed on Tree-IL |
528 | expressions as source-to-source transformations. There will be more | |
529 | optimizations added in the future. | |
c850030f AW |
530 | |
531 | Interested readers are encouraged to read the implementation in | |
81fd3152 | 532 | @code{(language tree-il compile-glil)} for more details. |
00ce5125 AW |
533 | |
534 | @node GLIL | |
535 | @subsection GLIL | |
536 | ||
41e64dd7 | 537 | Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate |
81fd3152 | 538 | language whose expressions more closely approximate Guile's VM |
98850fd7 AW |
539 | instruction set. Its expression types are defined in @code{(language |
540 | glil)}. | |
c850030f | 541 | |
41e64dd7 | 542 | @deftp {Scheme Variable} <glil-program> meta . body |
86872cc3 | 543 | A unit of code that at run-time will correspond to a compiled |
41e64dd7 | 544 | procedure. @var{meta} should be an alist of properties, as in |
98850fd7 AW |
545 | Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL |
546 | expressions. | |
c850030f | 547 | @end deftp |
41e64dd7 AW |
548 | @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label |
549 | A prologue for a function with no optional, keyword, or rest | |
550 | arguments. @var{nreq} is the number of required arguments. @var{nlocs} | |
551 | is the total number of local variables, including the arguments. If the | |
552 | procedure was not given exactly @var{nreq} arguments, control will | |
553 | jump to @var{else-label}, if given, or otherwise signal an error. | |
554 | @end deftp | |
555 | @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label | |
556 | A prologue for a function with optional or rest arguments. Like | |
557 | @code{<glil-std-prelude>}, with the addition that @var{nopt} is the | |
558 | number of optional arguments (possibly zero) and @var{rest} is an | |
559 | index of a local variable at which to bind a rest argument, or | |
560 | @code{#f} if there is no rest argument. | |
561 | @end deftp | |
562 | @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label | |
563 | A prologue for a function with keyword arguments. Like | |
564 | @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list | |
565 | of keyword arguments, and @var{allow-other-keys?} is a flag indicating | |
566 | whether to allow unknown keys. @xref{Function Prologue Instructions, | |
567 | @code{bind-kwargs}}, for details on the format of @var{kw}. | |
568 | @end deftp | |
c850030f | 569 | @deftp {Scheme Variable} <glil-bind> . vars |
ff73ae34 AW |
570 | An advisory expression that notes a liveness extent for a set of |
571 | variables. @var{vars} is a list of @code{(@var{name} @var{type} | |
572 | @var{index})}, where @var{type} should be either @code{argument}, | |
573 | @code{local}, or @code{external}. | |
574 | ||
575 | @code{<glil-bind>} expressions end up being serialized as part of a | |
576 | program's metadata and do not form part of a program's code path. | |
c850030f AW |
577 | @end deftp |
578 | @deftp {Scheme Variable} <glil-mv-bind> vars rest | |
ff73ae34 AW |
579 | A multiple-value binding of the values on the stack to @var{vars}. Iff |
580 | @var{rest} is true, the last element of @var{vars} will be treated as | |
581 | a rest argument. | |
582 | ||
583 | In addition to pushing a binding annotation on the stack, like | |
584 | @code{<glil-bind>}, an expression is emitted at compilation time to | |
585 | make sure that there are enough values available to bind. See the | |
acc51c3e AW |
586 | notes on @code{truncate-values} in @ref{Procedure Call and Return |
587 | Instructions}, for more information. | |
c850030f AW |
588 | @end deftp |
589 | @deftp {Scheme Variable} <glil-unbind> | |
ff73ae34 AW |
590 | Closes the liveness extent of the most recently encountered |
591 | @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL | |
592 | expressions are compiled, a parallel stack of live bindings is | |
593 | maintained; this expression pops off the top element from that stack. | |
594 | ||
595 | Bindings are written into the program's metadata so that debuggers and | |
596 | other tools can determine the set of live local variables at a given | |
597 | offset within a VM program. | |
c850030f AW |
598 | @end deftp |
599 | @deftp {Scheme Variable} <glil-source> loc | |
ff73ae34 | 600 | Records source information for the preceding expression. @var{loc} |
73643339 AW |
601 | should be an association list containing @code{line}, @code{column}, | |
602 | and @code{filename} keys, e.g. as returned by | |
603 | @code{source-properties}. | |
c850030f AW |
604 | @end deftp |
605 | @deftp {Scheme Variable} <glil-void> | |
98850fd7 | 606 | Pushes ``the unspecified value'' on the stack. |
c850030f AW |
607 | @end deftp |
608 | @deftp {Scheme Variable} <glil-const> obj | |
ff73ae34 | 609 | Pushes a constant value onto the stack. @var{obj} must be a number, |
98850fd7 AW |
610 | string, symbol, keyword, boolean, character, uniform array, the empty |
611 | list, or a pair or vector of constants. | |
c850030f | 612 | @end deftp |
98850fd7 AW |
613 | @deftp {Scheme Variable} <glil-lexical> local? boxed? op index |
614 | Accesses a lexically bound variable. If the variable is not | |
41e64dd7 AW |
615 | @var{local?} it is free. All variables may have @code{ref}, |
616 | @code{set}, and @code{bound?} as their @var{op}. Boxed variables may | |
617 | also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix}, | |
618 | which correspond in semantics to the VM instructions @code{box}, | |
98850fd7 AW |
619 | @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for |
620 | more information. | |
c850030f AW |
621 | @end deftp |
622 | @deftp {Scheme Variable} <glil-toplevel> op name | |
ff73ae34 AW |
623 | Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set}, |
624 | or @code{define}. | |
c850030f AW |
625 | @end deftp |
626 | @deftp {Scheme Variable} <glil-module> op mod name public? | |
73643339 AW |
627 | Accesses a variable within a specific module. See Tree-IL's |
628 | @code{<module-ref>}, for more information. | |
c850030f AW |
629 | @end deftp |
630 | @deftp {Scheme Variable} <glil-label> label | |
ff73ae34 AW |
631 | Creates a new label. @var{label} can be any Scheme value, and should |
632 | be unique. | |
c850030f AW |
633 | @end deftp |
634 | @deftp {Scheme Variable} <glil-branch> inst label | |
ff73ae34 | 635 | Branch to a label. @var{label} should be a @code{<glil-label>}. |
c850030f AW |
636 | @var{inst} is a branching instruction: @code{br-if}, @code{br}, etc. | |
637 | @end deftp | |
638 | @deftp {Scheme Variable} <glil-call> inst nargs | |
ff73ae34 | 639 | This expression is probably misnamed, as it does not correspond to |
c850030f AW |
640 | function calls. @code{<glil-call>} invokes the VM instruction named |
641 | @var{inst}, noting that it is called with @var{nargs} stack arguments. | |
ff73ae34 AW |
642 | The arguments should be pushed on the stack already. What happens to |
643 | the stack afterwards depends on the instruction. | |
c850030f AW |
644 | @end deftp |
645 | @deftp {Scheme Variable} <glil-mv-call> nargs ra | |
ff73ae34 AW |
646 | Performs a multiple-value call. @var{ra} is a @code{<glil-label>} |
647 | corresponding to the multiple-value return address for the call. See | |
acc51c3e AW |
648 | the notes on @code{mv-call} in @ref{Procedure Call and Return |
649 | Instructions}, for more information. | |
c850030f | 650 | @end deftp |
41e64dd7 AW |
651 | @deftp {Scheme Variable} <glil-prompt> label escape-only? |
652 | Push a dynamic prompt onto the stack, with a handler at @var{label}. | |
653 | @var{escape-only?} is a flag that is propagated to the prompt, | |
654 | allowing an abort to avoid capturing a continuation in some cases. | |
655 | @xref{Prompts}, for more information. | |
656 | @end deftp | |
c850030f | 657 | |
ff73ae34 | 658 | Users may enter GLIL at the REPL as well, though there is a bit |
41e64dd7 | 659 | more bookkeeping to do: |
00ce5125 | 660 | |
ff73ae34 AW |
661 | @example |
662 | scheme@@(guile-user)> ,language glil | |
41e64dd7 AW |
663 | Happy hacking with Guile Lowlevel Intermediate Language (GLIL)! |
664 | To switch back, type `,L scheme'. | |
665 | glil@@(guile-user)> (program () (std-prelude 0 0 #f) | |
666 | (const 3) (call return 1)) | |
ff73ae34 AW |
667 | @result{} 3 |
668 | @end example | |
00ce5125 | 669 | |
ff73ae34 AW |
670 | Just as in all of Guile's compilers, an environment is passed to the |
671 | GLIL-to-object code compiler, and one is returned as well, along with | |
672 | the object code. | |
00ce5125 | 673 | |
81fd3152 AW |
674 | @node Assembly |
675 | @subsection Assembly | |
676 | ||
73643339 AW |
677 | Assembly is an S-expression-based, human-readable representation of |
678 | the actual bytecodes that will be emitted for the VM. As such, it is a | |
679 | useful intermediate language both for compilation and for | |
680 | decompilation. | |
81fd3152 | 681 | |
73643339 AW |
682 | Besides the fact that it is not a record-based language, assembly |
683 | differs from GLIL in four main ways: | |
00ce5125 | 684 | |
73643339 AW |
685 | @itemize |
686 | @item Labels have been resolved to byte offsets in the program. | |
687 | @item Constants inside procedures have either been expressed as inline | |
98850fd7 | 688 | instructions or cached in object arrays. |
73643339 AW |
689 | @item Procedures with metadata (source location information, liveness |
690 | extents, procedure names, generic properties, etc.) have had their | |
691 | metadata serialized out to thunks. | |
692 | @item All expressions correspond directly to VM instructions -- i.e., | |
98850fd7 | 693 | there is no @code{<glil-lexical>} which can be a ref or a set. |
73643339 AW |
694 | @end itemize |
695 | ||
696 | Assembly is isomorphic to the bytecode that it compiles to. You can | |
697 | compile to bytecode, then decompile back to assembly, and you have the | |
698 | same assembly code. | |
699 | ||
700 | The general form of assembly instructions is the following: | |
701 | ||
702 | @lisp | |
703 | (@var{inst} @var{arg} ...) | |
704 | @end lisp | |
705 | ||
706 | The @var{inst} names a VM instruction, and its @var{arg}s will be | |
707 | embedded in the instruction stream. The easiest way to see assembly is | |
708 | to play around with it at the REPL, as can be seen in this annotated | |
709 | example: | |
710 | ||
711 | @example | |
0a715b9a | 712 | scheme@@(guile-user)> (pp (compile '(+ 32 10) #:to 'assembly)) |
41e64dd7 | 713 | (load-program |
0a715b9a AW |
714 | ((:LCASE16 . 2)) ; Labels, unused in this case. |
715 | 8 ; Length of the thunk that was compiled. | |
41e64dd7 | 716 | (load-program ; Metadata thunk. |
73643339 | 717 | () |
41e64dd7 AW |
718 | 17 |
719 | #f ; No metadata thunk for the metadata thunk. | |
720 | (make-eol) | |
721 | (make-eol) | |
0a715b9a AW |
722 | (make-int8 2) ; Liveness extents, source info, and arities, |
723 | (make-int8 8) ; in a format that Guile knows how to parse. | |
724 | (make-int8:0) | |
41e64dd7 AW |
725 | (list 0 3) |
726 | (list 0 1) | |
727 | (list 0 3) | |
728 | (return)) | |
0a715b9a | 729 | (assert-nargs-ee/locals 0) ; Prologue. |
41e64dd7 AW |
730 | (make-int8 32) ; Actual code starts here. |
731 | (make-int8 10) | |
732 | (add) | |
0a715b9a | 733 | (return)) |
73643339 AW |
734 | @end example |
735 | ||
736 | Of course, you can switch the REPL to assembly and enter assembly | |
737 | S-expressions directly, as with other languages, though it is more | |
738 | difficult, given that the length fields have to be correct. | |
739 | ||
740 | @node Bytecode and Objcode | |
741 | @subsection Bytecode and Objcode | |
742 | ||
743 | Finally, the raw bytes. There are actually two different ``languages'' | |
744 | here, corresponding to two different ways to represent the bytes. | |
745 | ||
746 | ``Bytecode'' represents code as uniform byte vectors, useful for | |
747 | structuring and destructuring code on the Scheme level. Bytecode is | |
748 | the next step down from assembly: | |
749 | ||
750 | @example | |
73643339 | 751 | scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode) |
0a715b9a AW |
752 | @result{} #vu8(8 0 0 0 25 0 0 0 ; Header. |
753 | 95 0 ; Prologue. | |
754 | 10 32 10 10 148 66 ; Actual code. | |
755 | 17 0 0 0 0 0 0 0 ; Metadata thunk header. | |
756 | 9 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66) ; Metadata thunk code. | |
73643339 AW |
757 | @end example |
758 | ||
759 | ``Objcode'' is bytecode, but mapped directly to a C structure, | |
760 | @code{struct scm_objcode}: | |
761 | ||
762 | @example | |
763 | struct scm_objcode @{ | |
73643339 AW |
764 | scm_t_uint32 len; |
765 | scm_t_uint32 metalen; | |
766 | scm_t_uint8 base[0]; | |
767 | @}; | |
768 | @end example | |
769 | ||
770 | As one might imagine, objcode imposes a minimum length on the | |
41e64dd7 AW |
771 | bytecode. Also, the @code{len} and @code{metalen} fields are in native |
772 | endianness, which makes objcode (and bytecode) system-dependent. | |
73643339 AW |
773 | |
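As a rough illustration of this layout, the two header fields can be
read back out of the bytevector shown above with the standard
@code{(rnrs bytevectors)} accessors. This is only a sketch: it assumes
that the bytes were produced on a little-endian machine, and the names
@code{header-size}, @code{objcode-len} and @code{objcode-metalen} are
made up for the example; they are not part of Guile.

@lisp
(use-modules (rnrs bytevectors))

;; The bytecode for (+ 32 10), with the same bytes as shown above.
(define bc
  #vu8(8 0 0 0 25 0 0 0
       95 0
       10 32 10 10 148 66 17
       0 0 0 0 0 0 0 9
       9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66))

;; Size in bytes of the two uint32 header fields.
(define header-size 8)

;; Hypothetical helpers: read each field, assuming little-endian data.
(define (objcode-len bv)
  (bytevector-u32-ref bv 0 (endianness little)))
(define (objcode-metalen bv)
  (bytevector-u32-ref bv 4 (endianness little)))

(objcode-len bc)      ; 8, the length of the code
(objcode-metalen bc)  ; 25, the length of the metadata

;; The header and the two regions account for the whole bytevector.
(= (bytevector-length bc)
   (+ header-size (objcode-len bc) (objcode-metalen bc)))
@end lisp

Reading the same fields as big-endian would yield nonsensical lengths,
which is one way to see why objcode is system-dependent.
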
774 | Objcode also has a couple of important efficiency hacks. First, | |
775 | objcode may be mapped directly from disk, allowing compiled code to be | |
776 | loaded quickly, often from the system's disk cache, and shared among | |
777 | multiple processes. Secondly, objcode may be embedded in other | |
778 | objcode, allowing procedures to have the text of other procedures | |
779 | inlined into their bodies, without the need for separate allocation of | |
780 | the code. Of course, the objcode object itself does need to be | |
781 | allocated. | |
782 | ||
783 | Procedures related to objcode are defined in the @code{(system vm | |
784 | objcode)} module. | |
00ce5125 | 785 | |
ff73ae34 AW |
786 | @deffn {Scheme Procedure} objcode? obj |
787 | @deffnx {C Function} scm_objcode_p (obj) | |
788 | Returns @code{#t} iff @var{obj} is object code, @code{#f} otherwise. | |
789 | @end deffn | |
00ce5125 | 790 | |
73643339 | 791 | @deffn {Scheme Procedure} bytecode->objcode bytecode |
42a438e8 | 792 | @deffnx {C Function} scm_bytecode_to_objcode (bytecode) |
ff73ae34 | 793 | Makes an objcode object from @var{bytecode}, which should be a |
41e64dd7 | 794 | bytevector. @xref{Bytevectors}. |
ff73ae34 | 795 | @end deffn |
e3ba263d | 796 | |
ff73ae34 AW |
797 | @deffn {Scheme Procedure} load-objcode file | |
798 | @deffnx {C Function} scm_load_objcode (file) | |
799 | Load object code from a file named @var{file}. The file will be mapped | |
800 | into memory via @code{mmap}, so this is a very fast operation. | |
e3ba263d | 801 | |
98850fd7 | 802 | On disk, object code has a sixteen-byte cookie prepended to it, to |
73643339 AW |
803 | prevent accidental loading of arbitrary garbage. |
804 | @end deffn | |
805 | ||
806 | @deffn {Scheme Procedure} write-objcode objcode file | |
807 | @deffnx {C Function} scm_write_objcode (objcode, file) | |
41e64dd7 | 808 | Write object code out to a file, prepending the sixteen-byte cookie. |
ff73ae34 | 809 | @end deffn |
e3ba263d | 810 | |
41e64dd7 AW |
811 | @deffn {Scheme Procedure} objcode->bytecode objcode | |
812 | @deffnx {C Function} scm_objcode_to_bytecode (objcode) | |
813 | Copy object code out to a bytevector for analysis by Scheme. | |
ff73ae34 | 814 | @end deffn |
e3ba263d | 815 | |
73643339 AW |
816 | The following procedure is actually in @code{(system vm program)}, but |
817 | we'll mention it here: | |
818 | ||
98850fd7 AW |
819 | @deffn {Scheme Procedure} make-program objcode objtable [free-vars=#f] | |
820 | @deffnx {C Function} scm_make_program (objcode, objtable, free_vars) | |
ff73ae34 | 821 | Load up object code into a Scheme program. The resulting program will |
73643339 | 822 | have @var{objtable} as its object table, which should be a vector or |
98850fd7 | 823 | @code{#f}, and will capture the free variables from @var{free-vars}. |
ff73ae34 | 824 | @end deffn |
c850030f | 825 | |
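The procedures above can be chained by hand to go from source to a
callable program. The following is a minimal sketch rather than a
recommended workflow: it assumes a Guile version that still provides
the @code{(system vm objcode)} and @code{(system vm program)} modules
described here, and it passes @code{#f} as the object table on the
assumption that this small expression needs no cached constants. The
variable names are invented for the example.

@lisp
(use-modules (system base compile)
             (system vm objcode)
             (system vm program))

;; Compile down to a bytevector of bytecode.
(define bytecode (compile '(+ 32 10) #:to 'bytecode))

;; Wrap the bytevector as objcode.
(define objcode (bytecode->objcode bytecode))

;; Load the objcode into a program with no object table.
(define thunk (make-program objcode #f))

;; Calling the thunk runs the compiled expression on the VM.
(thunk)
@end lisp
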
ff73ae34 AW |
826 | Object code from a file may be disassembled at the REPL via the |
827 | meta-command @code{,disassemble-file}, abbreviated as @code{,xx}. | |
828 | Programs may be disassembled via @code{,disassemble}, abbreviated as | |
829 | @code{,x}. | |
830 | ||
831 | Compiling object code to the fake language, @code{value}, is performed | |
832 | via loading objcode into a program, then executing that thunk with | |
833 | respect to the compilation environment. Normally the environment | |
834 | propagates through the compiler transparently, but users may specify | |
41e64dd7 | 835 | the compilation environment manually as well, as a module. |
ff73ae34 | 836 | |
c850030f | 837 | |
e63d888e DK |
838 | @node Writing New High-Level Languages |
839 | @subsection Writing New High-Level Languages | |
840 | ||
841 | In order to integrate a new language @var{lang} into Guile's compiler | |
842 | system, one has to create the module @code{(language @var{lang} spec)} | |
843 | containing the language definition and referencing the parser, | |
844 | compiler and other routines processing it. The module hierarchy in | |
845 | @code{(language brainfuck)} defines a very basic Brainfuck | |
846 | implementation meant to serve as an easy-to-understand example of how to | |
4e432dab AW |
847 | do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck} |
848 | for more information about the Brainfuck language itself. | |
849 | ||
e63d888e | 850 | |
ff73ae34 AW |
851 | @node Extending the Compiler |
852 | @subsection Extending the Compiler | |
e3ba263d | 853 | |
ff73ae34 AW |
854 | At this point, we break with the impersonal tone of the rest of the |
855 | manual, and make an intervention. Admit it: if you've read this far | |
856 | into the compiler internals manual, you are a junkie. Perhaps a course | |
857 | at your university left you unsated, or perhaps you've always harbored | |
858 | a sublimated desire to hack the holy of computer science holies: a | |
859 | compiler. Well, you're in good company, and in a good position. Guile's | |
860 | compiler needs your help. | |
861 | ||
862 | There are many possible avenues for improving Guile's compiler. | |
863 | Probably the most important improvement, speed-wise, will be some form | |
864 | of native compilation, both just-in-time and ahead-of-time. This could | |
865 | be done in many ways. Probably the easiest strategy would be to extend | |
866 | the compiled procedure structure to include a pointer to a native code | |
86872cc3 | 867 | vector, and compile from bytecode to native code at run-time after a |
ff73ae34 AW |
868 | procedure is called a certain number of times. |
869 | ||
870 | The name of the game is a profiling-based harvest of the low-hanging | |
871 | fruit, running programs of interest under a system-level profiler and | |
872 | determining which improvements would give the most bang for the buck. | |
98850fd7 AW |
873 | It's really getting to the point though that native compilation is the |
874 | next step. | |
ff73ae34 AW |
875 | |
876 | The compiler also needs help at the top end, enhancing the Scheme that | |
98850fd7 AW |
877 | it knows to also understand R6RS, and adding new high-level compilers. |
878 | We have JavaScript and Emacs Lisp mostly complete, but they could use | |
879 | some love; Lua would be nice as well, but whatever language it is | |
880 | that strikes your fancy would be welcome too. | |
881 | ||
882 | Compilers are for hacking, not for admiring or for complaining about. | |
883 | Get to it! |