Commit | Line | Data |
---|---|---|
8680d53b AW |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
3 | @c Copyright (C) 2008 | |
4 | @c Free Software Foundation, Inc. | |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @node Compiling to the Virtual Machine | |
8 | @section Compiling to the Virtual Machine | |
9 | ||
00ce5125 AW |
10 | Compilers have a mystique about them that is attractive and |
11 | off-putting at the same time. They are attractive because they are | |
12 | magical -- they transform inert text into live results, like throwing | |
e33e3aee AW |
13 | the switch on Frankenstein's monster. However, this magic is perceived |
14 | by many to be impenetrable. | |
00ce5125 AW |
15 | |
16 | This section aims to pull back the veil from over Guile's compiler | |
e3ba263d | 17 | implementation, and pay attention to the small man behind the curtain. |
00ce5125 | 18 | |
e3ba263d AW |
19 | @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to |
20 | know how to compile your .scm file. | |
00ce5125 AW |
21 | |
22 | @menu | |
23 | * Compiler Tower:: | |
24 | * The Scheme Compiler:: | |
25 | * GHIL:: | |
26 | * GLIL:: | |
27 | * Object Code:: | |
e3ba263d | 28 | * Extending the Compiler:: |
00ce5125 AW |
29 | @end menu |
30 | ||
31 | @node Compiler Tower | |
32 | @subsection Compiler Tower | |
33 | ||
34 | Guile's compiler is quite simple, actually -- its @emph{compilers}, to | |
35 | put it more accurately. Guile defines a tower of languages, starting | |
36 | at Scheme and progressively simplifying down to languages that | |
e3ba263d | 37 | resemble the VM instruction set (@pxref{Instruction Set}). |
00ce5125 AW |
38 | |
39 | Each language knows how to compile to the next, so each step is simple | |
40 | and understandable. Furthermore, this set of languages is not | |
41 | hardcoded into Guile, so it is possible for the user to add new | |
42 | high-level languages, new passes, or even different compilation | |
43 | targets. | |
44 | ||
e3ba263d AW |
45 | Languages are registered in the module, @code{(system base language)}: |
46 | ||
47 | @example | |
48 | (use-modules (system base language)) | |
49 | @end example | |
50 | ||
51 | They are registered with the @code{define-language} form. | |
52 | ||
53 | @deffn {Scheme Syntax} define-language @ | |
54 | name title version reader printer @ | |
55 | [parser=#f] [read-file=#f] [compilers='()] [evaluator=#f] | |
56 | Define a language. | |
57 | ||
58 | This syntax defines a @code{#<language>} object, bound to @var{name} | |
59 | in the current environment. In addition, the language will be added to | |
60 | the global language set. For example, this is the language definition | |
61 | for Scheme: | |
62 | ||
63 | @example | |
64 | (define-language scheme | |
65 | #:title "Guile Scheme" | |
66 | #:version "0.5" | |
67 | #:reader read | |
68 | #:read-file read-file | |
ca445ba5 | 69 | #:compilers `((,ghil . ,compile-ghil)) |
e3ba263d AW |
70 | #:evaluator (lambda (x module) (primitive-eval x)) |
71 | #:printer write) | |
72 | @end example | |
73 | ||
74 | In this example, from @code{(language scheme spec)}, @code{read-file} | |
75 | reads expressions from a port and wraps them in a @code{begin} block. | |
76 | @end deffn | |
77 | ||
78 | The interesting thing about having languages defined this way is that | |
79 | they present a uniform interface to the read-eval-print loop. This | |
80 | allows the user to change the current language of the REPL: | |
81 | ||
82 | @example | |
83 | $ guile | |
84 | Guile Scheme interpreter 0.5 on Guile 1.9.0 | |
85 | Copyright (C) 2001-2008 Free Software Foundation, Inc. | |
86 | ||
87 | Enter `,help' for help. | |
88 | scheme@@(guile-user)> ,language ghil | |
89 | Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0 | |
90 | Copyright (C) 2001-2008 Free Software Foundation, Inc. | |
91 | ||
92 | Enter `,help' for help. | |
93 | ghil@@(guile-user)> | |
94 | @end example | |
95 | ||
96 | Languages can be looked up by name, as they were above. | |
97 | ||
98 | @deffn {Scheme Procedure} lookup-language name | |
99 | Looks up a language named @var{name}, autoloading it if necessary. | |
100 | ||
101 | Languages are autoloaded by looking for a variable named @var{name} in | |
102 | a module named @code{(language @var{name} spec)}. | |
103 | ||
104 | The language object will be returned, or @code{#f} if there does not | |
105 | exist a language with that name. | |
106 | @end deffn | |
107 | ||
108 | Defining languages this way allows us to programmatically determine | |
109 | the necessary steps for compiling code from one language to another. | |
110 | ||
111 | @deffn {Scheme Procedure} lookup-compilation-order from to | |
112 | Recursively traverses the set of languages to which @var{from} can | |
113 | compile, depth-first, and return the first path that can transform | |
114 | @var{from} to @var{to}. Returns @code{#f} if no path is found. | |
115 | ||
116 | This function memoizes its results in a cache that is invalidated by | |
117 | subsequent calls to @code{define-language}, so it should be quite | |
118 | fast. | |
119 | @end deffn | |
120 | ||
121 | There is a notion of a ``current language'', which is maintained in | |
122 | the @code{*current-language*} fluid. This language is normally Scheme, | |
123 | and may be rebound by the user. The runtime compilation interfaces | |
124 | (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source | |
125 | and target languages. | |
126 | ||
127 | The normal tower of languages when compiling Scheme goes like this: | |
128 | ||
129 | @itemize | |
130 | @item Scheme, which we know and love | |
131 | @item Guile High Intermediate Language (GHIL) | |
132 | @item Guile Low Intermediate Language (GLIL) | |
133 | @item Object code | |
134 | @end itemize | |
135 | ||
136 | Object code may be serialized to disk directly, though it has a cookie | |
137 | and version prepended to the front. But when compiling Scheme at | |
138 | runtime, you want a Scheme value, e.g. a compiled procedure. For this | |
139 | reason, so as not to break the abstraction, Guile defines a fake | |
140 | language, @code{value}. Compiling to @code{value} loads the object | |
141 | code into a procedure, and wakes the sleeping giant. | |
142 | ||
143 | Perhaps this strangeness can be explained by example: | |
144 | @code{compile-file} defaults to compiling to object code, because it | |
145 | produces object code that has to live in the barren world outside the | |
146 | Guile runtime; but @code{compile} defaults to compiling to | |
147 | @code{value}, as its product re-enters the Guile world. | |
148 | ||
149 | Indeed, the process of compilation can circulate through these | |
150 | different worlds indefinitely, as shown by the following quine: | |
151 | ||
152 | @example | |
00ce5125 | 153 | ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x))) |
e3ba263d | 154 | @end example |
00ce5125 AW |
155 | |
156 | @node The Scheme Compiler | |
157 | @subsection The Scheme Compiler | |
158 | ||
ca445ba5 AW |
159 | The job of the Scheme compiler is to expand all macros and to resolve |
160 | all symbols to lexical variables. Its target language, GHIL, is fairly | |
161 | close to Scheme itself, so this process is not very complicated. | |
00ce5125 | 162 | |
ca445ba5 AW |
163 | The Scheme compiler is driven by a table of @dfn{translators}, |
164 | declared with the @code{define-scheme-translator} form, defined in the | |
165 | module, @code{(language scheme compile-ghil)}. | |
00ce5125 | 166 | |
ca445ba5 AW |
167 | @deffn {Scheme Syntax} define-scheme-translator head clause1 clause2... |
168 | The best documentation of this form is probably an example. Here is | |
169 | the translator for @code{if}: | |
00ce5125 | 170 | |
ca445ba5 AW |
171 | @example |
172 | (define-scheme-translator if | |
173 | ;; (if TEST THEN [ELSE]) | |
174 | ((,test ,then) | |
175 | (make-ghil-if e l (retrans test) (retrans then) (retrans '(begin)))) | |
176 | ((,test ,then ,else) | |
177 | (make-ghil-if e l (retrans test) (retrans then) (retrans else)))) | |
178 | @end example | |
00ce5125 | 179 | |
ca445ba5 AW |
180 | The match syntax is from the @code{pmatch} macro, defined in |
181 | @code{(system base pmatch)}. The result of a clause should be a valid | |
182 | GHIL value. If no clause matches, a syntax error is signalled. | |
183 | ||
184 | In the body of the clauses, the following bindings are introduced: | |
185 | @itemize | |
186 | @item @code{e}, the current environment | |
187 | @item @code{l}, the current source location (or @code{#f}) | |
188 | @item @code{retrans}, a procedure that may be called to compile | |
189 | subexpressions | |
190 | @end itemize | |
191 | ||
192 | Note that translators are looked up by @emph{value}, not by name. That | |
193 | is to say, the translator is keyed under the @emph{value} of | |
194 | @code{if}, which normally prints as @code{#<primitive-builtin-macro! | |
195 | if>}. | |
196 | @end deffn | |
197 | ||
198 | Users can extend the compiler by defining new translators. | |
199 | Additionally, some forms can be inlined directly to | |
200 | instructions -- @xref{Inlined Scheme Instructions}, for a list. The | |
201 | actual inliners are defined in @code{(language scheme inline)}: | |
202 | ||
203 | @deffn {Scheme Syntax} define-inline head arity1 result1 arity2 result2... | |
204 | Defines an inliner for @code{head}. As in | |
205 | @code{define-scheme-translator}, inliners are keyed by value and not | |
206 | by name. | |
207 | ||
208 | Expressions are matched on their arities. For example: | |
209 | ||
210 | @example | |
211 | (define-inline eq? | |
212 | (x y) (eq? x y)) | |
213 | @end example | |
00ce5125 | 214 | |
ca445ba5 AW |
215 | This inlines calls to the Scheme procedure, @code{eq?}, to the |
216 | instruction @code{eq?}. | |
217 | ||
218 | A more complicated example would be: | |
219 | ||
220 | @example | |
221 | (define-inline + | |
222 | () 0 | |
223 | (x) x | |
224 | (x y) (add x y) | |
225 | (x y . rest) (add x (+ y . rest))) | |
226 | @end example | |
227 | @end deffn | |
228 | ||
229 | Compilers take two arguments, an expression and an environment, and | |
230 | return two values as well: an expression in the target language, and | |
231 | an environment suitable for the target language. The format of the | |
232 | environment is language-dependent. | |
233 | ||
234 | For Scheme, an environment may be one of three things: | |
235 | @itemize | |
236 | @item @code{#f}, in which case compilation is performed in the context | |
237 | of the current module; | |
238 | @item a module, which specifies the context of the compilation; or | |
239 | @item a @dfn{compile environment}, which specifies lexical variables | |
240 | as well. | |
241 | @end itemize | |
242 | ||
243 | The format of a compile environment for scheme is @code{(@var{module} | |
244 | @var{lexicals} . @var{externals})}, though users are strongly | |
245 | discouraged from constructing these environments themselves. Instead, | |
246 | if you need this functionality -- as in GOOPS' dynamic method compiler | |
247 | -- capture an environment with @code{compile-time-environment}, then | |
248 | pass that environment to @code{compile}. | |
249 | ||
250 | @deffn {Scheme Procedure} compile-time-environment | |
251 | A special function known to the compiler that, when compiled, will | |
252 | return a representation of the lexical environment in place at compile | |
253 | time. Useful for supporting some forms of dynamic compilation. Returns | |
254 | @code{#f} if called from the interpreter. | |
255 | @end deffn | |
00ce5125 AW |
256 | |
257 | @node GHIL | |
258 | @subsection GHIL | |
259 | ||
c850030f AW |
260 | Guile High Intermediate Language (GHIL) is a structured intermediate |
261 | language that is close in expressive power to Scheme. It is an | |
262 | expanded, pre-analyzed Scheme. | |
263 | ||
264 | GHIL is ``structured'' in the sense that its representation is based | |
265 | on records, not S-expressions. This gives a rigidity to the language | |
266 | that ensures that compiling to a lower-level language only requires a | |
267 | limited set of transformations. Practically speaking, consider the | |
268 | GHIL type, @code{<ghil-quote>}, which has fields named @code{env}, | |
269 | @code{loc}, and @code{exp}. Instances of this type are records created | |
270 | via @code{make-ghil-quote}, and whose fields are accessed as | |
271 | @code{ghil-quote-env}, @code{ghil-quote-loc}, and | |
272 | @code{ghil-quote-exp}. There is also a predicate, @code{ghil-quote?}. | |
273 | @xref{Records}, for more information on records. | |
274 | ||
275 | Expressions of GHIL name their environments explicitly, and all | |
276 | variables are referenced by identity in addition to by name. | |
277 | @code{(language ghil)} defines a number of routines to deal explicitly | |
278 | with variables and environments: | |
279 | ||
280 | @deftp {Scheme Variable} <ghil-toplevel-env> [table='()] | |
281 | A toplevel environment. The @var{table} holds all toplevel variables | |
282 | that have been resolved in this environment. | |
283 | @end deftp | |
284 | @deftp {Scheme Variable} <ghil-env> parent [table='()] [variables='()] | |
285 | A lexical environment. @var{parent} will be the enclosing lexical | |
286 | environment, or a toplevel environment. @var{table} holds an alist | |
287 | mapping symbols to variables bound in this environment, while | |
288 | @var{variables} holds a cumulative list of all variables ever defined | |
289 | in this environment. | |
290 | ||
291 | Lexical environments correspond to procedures. Bindings introduced | |
292 | e.g. by Scheme's @code{let} add to the bindings in a lexical | |
293 | environment. An example of a case in which a variable might be in | |
294 | @var{variables} but not in @var{table} would be a variable that is in | |
295 | the same procedure, but is out of scope. | |
296 | @end deftp | |
297 | @deftp {Scheme Variable} <ghil-var> env name kind [index=#f] | |
298 | A variable. @var{kind} is one of @code{argument}, @code{local}, | |
299 | @code{external}, @code{toplevel}, @code{public}, or @code{private}; | |
300 | see the procedures below for more information. @var{index} is used in | |
301 | compilation. | |
302 | @end deftp | |
303 | ||
304 | @deffn {Scheme Procedure} ghil-var-is-bound? env sym | |
305 | Recursively look up a variable named @var{sym} in @var{env}, and | |
306 | return it or @code{#f} if none is found. | |
ca445ba5 | 307 | @end deffn |
c850030f AW |
308 | @deffn {Scheme Procedure} ghil-var-for-ref! env sym |
309 | Recursively look up a variable named @var{sym} in @var{env}, and | |
310 | return it. If the symbol was not bound, return a new toplevel | |
311 | variable. | |
ca445ba5 | 312 | @end deffn |
c850030f AW |
313 | @deffn {Scheme Procedure} ghil-var-for-set! env sym |
314 | Like @code{ghil-var-for-ref!}, except that the returned variable will | |
315 | be marked as @code{external}. @xref{Variables and the VM}. | |
ca445ba5 | 316 | @end deffn |
c850030f AW |
317 | @deffn {Scheme Procedure} ghil-var-define! toplevel-env sym |
318 | Return an existing or new toplevel variable named @var{sym}. | |
319 | @var{toplevel-env} must be a toplevel environment. | |
ca445ba5 | 320 | @end deffn |
c850030f AW |
321 | @deffn {Scheme Procedure} ghil-var-at-module! env modname sym interface? |
322 | Return a variable that will be resolved at runtime with respect to a | |
323 | specific module named @var{modname}. If @var{interface?} is true, the | |
324 | variable will be of type @code{public}, otherwise @code{private}. | |
ca445ba5 | 325 | @end deffn |
c850030f AW |
326 | @deffn {Scheme Procedure} call-with-ghil-environment env syms func |
327 | Bind @var{syms} to fresh variables within a new lexical environment | |
328 | whose parent is @var{env}, and call @var{func} as @code{(@var{func} | |
329 | @var{new-env} @var{new-vars})}. | |
ca445ba5 | 330 | @end deffn |
c850030f AW |
331 | @deffn {Scheme Procedure} call-with-ghil-bindings env syms func |
332 | Like @code{call-with-ghil-environment}, except the existing | |
333 | environment @var{env} is re-used. For that reason, @var{func} is | |
334 | invoked as @code{(@var{func} @var{new-vars})} | |
ca445ba5 | 335 | @end deffn |
c850030f AW |
336 | |
337 | In the aforementioned @code{<ghil-quote>} type, the @var{env} slot | |
338 | holds a pointer to the environment in which the expression occurs. The | |
339 | @var{loc} slot holds source location information, so that errors | |
340 | corresponding to this expression can be mapped back to the initial | |
341 | expression in the higher-level language, e.g. Scheme. @xref{Compiled | |
342 | Procedures}, for more information on source location objects. | |
343 | ||
344 | GHIL also has a declarative serialization format, which makes writing | |
345 | and reading it a tractable problem for the human mind. Since all GHIL | |
346 | language constructs contain @code{env} and @code{loc} pointers, they | |
347 | are left out of the serialization. (Serializing @code{env} structures | |
348 | would be difficult, as they are often circular.) What is left is the | |
349 | type of expression, and the remaining slots defined in the expression | |
350 | type. | |
351 | ||
352 | For example, an S-expression representation of the @code{<ghil-quote>} | |
353 | expression would be: | |
354 | ||
355 | @example | |
356 | (quote 3) | |
357 | @end example | |
358 | ||
359 | It's deceptively like Scheme. The general rule is, for a type defined | |
360 | as @code{<ghil-@var{foo}> env loc @var{slot1} @var{slot2}...}, the | |
361 | S-expression representation will be @code{(@var{foo} @var{slot1} | |
362 | @var{slot2}...)}. Users may program with this format directly at the | |
363 | REPL: | |
364 | ||
365 | @example | |
366 | scheme@@(guile-user)> ,language ghil | |
367 | Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0 | |
368 | Copyright (C) 2001-2008 Free Software Foundation, Inc. | |
369 | ||
370 | Enter `,help' for help. | |
371 | ghil@@(guile-user)> (call (ref +) (quote 32) (quote 10)) | |
372 | @result{} 42 | |
373 | @end example | |
374 | ||
375 | For convenience, some slots are serialized as rest arguments; those | |
376 | are noted below. The other caveat is that variables are serialized as | |
377 | their names only, and not their identities. | |
378 | ||
379 | @deftp {Scheme Variable} <ghil-void> env loc | |
380 | The unspecified value. | |
381 | @end deftp | |
382 | @deftp {Scheme Variable} <ghil-quote> env loc exp | |
383 | A quoted expression. | |
384 | ||
385 | Note that unlike in Scheme, there are no self-quoting expressions; all | |
386 | constants must come from @code{quote} expressions. | |
387 | @end deftp | |
388 | @deftp {Scheme Variable} <ghil-quasiquote> env loc exp | |
389 | A quasiquoted expression. The expression is treated as a constant, | |
390 | except for embedded @code{unquote} and @code{unquote-splicing} forms. | |
391 | @end deftp | |
392 | @deftp {Scheme Variable} <ghil-unquote> env loc exp | |
393 | Like Scheme's @code{unquote}; only valid within a quasiquote. | |
394 | @end deftp | |
395 | @deftp {Scheme Variable} <ghil-unquote-splicing> env loc exp | |
396 | Like Scheme's @code{unquote-splicing}; only valid within a quasiquote. | |
397 | @end deftp | |
398 | @deftp {Scheme Variable} <ghil-ref> env loc var | |
399 | A variable reference. Note that for purposes of serialization, | |
400 | @var{var} is serialized as its name, as a symbol. | |
401 | @end deftp | |
402 | @deftp {Scheme Variable} <ghil-set> env loc var val | |
403 | A variable mutation. @var{var} is serialized as a symbol. | |
404 | @end deftp | |
405 | @deftp {Scheme Variable} <ghil-define> env loc var val | |
406 | A toplevel variable definition. See @code{ghil-var-define!}. | |
407 | @end deftp | |
408 | @deftp {Scheme Variable} <ghil-if> env loc test then else | |
ca445ba5 | 409 | A conditional. Note that @var{else} is not optional. |
c850030f AW |
410 | @end deftp |
411 | @deftp {Scheme Variable} <ghil-and> env loc . exps | |
ca445ba5 | 412 | Like Scheme's @code{and}. |
c850030f AW |
413 | @end deftp |
414 | @deftp {Scheme Variable} <ghil-or> env loc . exps | |
ca445ba5 | 415 | Like Scheme's @code{or}. |
c850030f AW |
416 | @end deftp |
417 | @deftp {Scheme Variable} <ghil-begin> env loc . body | |
418 | Like Scheme's @code{begin}. | |
419 | @end deftp | |
420 | @deftp {Scheme Variable} <ghil-bind> env loc vars exprs . body | |
421 | Like a deconstructed @code{let}: each element of @var{vars} will be | |
422 | bound to the corresponding GHIL expression in @var{exprs}. | |
423 | ||
424 | Note that for purposes of the serialization format, @var{exprs} are | |
425 | evaluated before the new bindings are added to the environment. For | |
426 | @code{letrec} semantics, there also exists a @code{bindrec} parse | |
427 | flavor. This is useful for writing GHIL at the REPL, but the | |
428 | serializer does not currently have the cleverness needed to determine | |
429 | whether a @code{<ghil-bind>} has @code{let} or @code{letrec} | |
430 | semantics, and thus only serializes @code{<ghil-bind>} as @code{bind}. | |
431 | @end deftp | |
432 | @deftp {Scheme Variable} <ghil-mv-bind> env loc vars rest producer . body | |
ca445ba5 AW |
433 | Like Scheme's @code{receive} -- binds the values returned by |
434 | applying @code{producer}, which should be a thunk, to the | |
c850030f AW |
435 | @code{lambda}-like bindings described by @var{vars} and @var{rest}. |
436 | @end deftp | |
437 | @deftp {Scheme Variable} <ghil-lambda> env loc vars rest meta . body | |
438 | A closure. @var{vars} is the argument list, serialized as a list of | |
439 | symbols. @var{rest} is a boolean, which is @code{#t} iff the last | |
440 | argument is a rest argument. @var{meta} is an association list of | |
441 | properties. The actual @var{body} should be a list of GHIL | |
442 | expressions. | |
443 | @end deftp | |
444 | @deftp {Scheme Variable} <ghil-call> env loc proc . args | |
ca445ba5 | 445 | A procedure call. |
c850030f AW |
446 | @end deftp |
447 | @deftp {Scheme Variable} <ghil-mv-call> env loc producer consumer | |
ca445ba5 | 448 | Like Scheme's @code{call-with-values}. |
c850030f AW |
449 | @end deftp |
450 | @deftp {Scheme Variable} <ghil-inline> env loc op . args | |
ca445ba5 AW |
451 | An inlined VM instruction. @var{op} should be the instruction name as |
452 | a symbol, and @var{args} should be its arguments, as GHIL expressions. | |
c850030f AW |
453 | @end deftp |
454 | @deftp {Scheme Variable} <ghil-values> env loc . values | |
ca445ba5 | 455 | Like Scheme's @code{values}. |
c850030f AW |
456 | @end deftp |
457 | @deftp {Scheme Variable} <ghil-values*> env loc . values | |
ca445ba5 AW |
458 | @var{values} are as in the Scheme expression, @code{(apply values . |
459 | @var{vals})}. | |
c850030f AW |
460 | @end deftp |
461 | @deftp {Scheme Variable} <ghil-reified-env> env loc | |
ca445ba5 | 462 | Produces, at runtime, a reification of the environment at compile |
c850030f AW |
463 | time. Used in the implementation of Scheme's |
464 | @code{compile-time-environment}. | |
465 | @end deftp | |
466 | ||
467 | GHIL implements a compiler to GLIL that recursively traverses GHIL | |
468 | expressions, writing out GLIL expressions into a linear list. The | |
469 | compiler also keeps some state as to whether the current expression is | |
470 | in tail context, and whether its value will be used in future | |
471 | computations. This state allows the compiler not to emit code for | |
472 | constant expressions that will not be used (e.g. docstrings), and to | |
473 | perform tail calls when in tail position. | |
474 | ||
475 | Just as the Scheme to GHIL compiler introduced new hidden state---the | |
476 | environment---the GHIL to GLIL compiler introduces more state, the | |
477 | stack. While not represented explicitly, the stack is present in the | |
478 | compilation of each GHIL expression: compiling a GHIL expression | |
479 | should leave the runtime value stack in the same state. For example, | |
480 | if the intermediate value stack has two elements before evaluating an | |
481 | @code{if} expression, it should have two elements after that | |
482 | expression. | |
483 | ||
484 | Interested readers are encouraged to read the implementation in | |
485 | @code{(language ghil compile-glil)} for more details. | |
00ce5125 AW |
486 | |
487 | @node GLIL | |
488 | @subsection GLIL | |
489 | ||
ff73ae34 | 490 | Guile Low Intermediate Language (GLIL) is a structured intermediate |
c850030f AW |
491 | language whose expressions closely mirror the functionality of Guile's |
492 | VM instruction set. | |
493 | ||
494 | Its expression types are defined in @code{(language glil)}, and as | |
495 | with GHIL, some of its fields parse as rest arguments. | |
496 | ||
ff73ae34 AW |
497 | @deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body |
498 | A unit of code that at runtime will correspond to a compiled | |
e33e3aee | 499 | procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts} |
ff73ae34 AW |
500 | collectively define the program's arity; see @ref{Compiled |
501 | Procedures}, for more information. @var{meta} should be an alist of | |
502 | properties, as in @code{<ghil-lambda>}. @var{body} is a list of GLIL | |
503 | expressions. | |
c850030f AW |
504 | @end deftp |
505 | @deftp {Scheme Variable} <glil-bind> . vars | |
ff73ae34 AW |
506 | An advisory expression that notes a liveness extent for a set of |
507 | variables. @var{vars} is a list of @code{(@var{name} @var{type} | |
508 | @var{index})}, where @var{type} should be either @code{argument}, | |
509 | @code{local}, or @code{external}. | |
510 | ||
511 | @code{<glil-bind>} expressions end up being serialized as part of a | |
512 | program's metadata and do not form part of a program's code path. | |
c850030f AW |
513 | @end deftp |
514 | @deftp {Scheme Variable} <glil-mv-bind> vars rest | |
ff73ae34 AW |
515 | A multiple-value binding of the values on the stack to @var{vars}. Iff |
516 | @var{rest} is true, the last element of @var{vars} will be treated as | |
517 | a rest argument. | |
518 | ||
519 | In addition to pushing a binding annotation on the stack, like | |
520 | @code{<glil-bind>}, an expression is emitted at compilation time to | |
521 | make sure that there are enough values available to bind. See the | |
522 | notes on @code{truncate-values} in @ref{Procedural Instructions}, for | |
523 | more information. | |
c850030f AW |
524 | @end deftp |
525 | @deftp {Scheme Variable} <glil-unbind> | |
ff73ae34 AW |
526 | Closes the liveness extent of the most recently encountered |
527 | @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL | |
528 | expressions are compiled, a parallel stack of live bindings is | |
529 | maintained; this expression pops off the top element from that stack. | |
530 | ||
531 | Bindings are written into the program's metadata so that debuggers and | |
532 | other tools can determine the set of live local variables at a given | |
533 | offset within a VM program. | |
c850030f AW |
534 | @end deftp |
535 | @deftp {Scheme Variable} <glil-source> loc | |
ff73ae34 AW |
536 | Records source information for the preceding expression. @var{loc} |
537 | should be a vector, @code{#(@var{line} @var{column} @var{filename})}. | |
c850030f AW |
538 | @end deftp |
539 | @deftp {Scheme Variable} <glil-void> | |
ff73ae34 | 540 | Pushes the unspecified value on the stack. |
c850030f AW |
541 | @end deftp |
542 | @deftp {Scheme Variable} <glil-const> obj | |
ff73ae34 AW |
543 | Pushes a constant value onto the stack. @var{obj} must be a number, |
544 | string, symbol, keyword, boolean, character, or a pair or vector or | |
545 | list thereof, or the empty list. | |
c850030f AW |
546 | @end deftp |
547 | @deftp {Scheme Variable} <glil-argument> op index | |
ff73ae34 AW |
548 | Accesses an argument on the stack. If @var{op} is @code{ref}, the |
549 | argument is pushed onto the stack; if it is @code{set}, the argument | |
550 | is set from the top value on the stack, which is popped off. | |
c850030f AW |
551 | @end deftp |
552 | @deftp {Scheme Variable} <glil-local> op index | |
ff73ae34 AW |
553 | Like @code{<glil-argument>}, but for local variables. @xref{Stack |
554 | Layout}, for more information. | |
c850030f AW |
555 | @end deftp |
556 | @deftp {Scheme Variable} <glil-external> op depth index | |
ff73ae34 AW |
557 | Accesses a heap-allocated variable, addressed by @var{depth}, the nth |
558 | enclosing environment, and @var{index}, the variable's position within | |
559 | the environment. @var{op} is @code{ref} or @code{set}. | |
c850030f AW |
560 | @end deftp |
561 | @deftp {Scheme Variable} <glil-toplevel> op name | |
ff73ae34 AW |
562 | Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set}, |
563 | or @code{define}. | |
c850030f AW |
564 | @end deftp |
565 | @deftp {Scheme Variable} <glil-module> op mod name public? | |
ff73ae34 AW |
566 | Accesses a variable within a specific module. See |
567 | @code{ghil-var-at-module!}, for more information. | |
c850030f AW |
568 | @end deftp |
569 | @deftp {Scheme Variable} <glil-label> label | |
ff73ae34 AW |
570 | Creates a new label. @var{label} can be any Scheme value, and should |
571 | be unique. | |
c850030f AW |
572 | @end deftp |
573 | @deftp {Scheme Variable} <glil-branch> inst label | |
ff73ae34 | 574 | Branch to a label. @var{label} should be a @code{<ghil-label>}. |
c850030f AW |
575 | @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc. |
576 | @end deftp | |
577 | @deftp {Scheme Variable} <glil-call> inst nargs | |
ff73ae34 | 578 | This expression is probably misnamed, as it does not correspond to |
c850030f AW |
579 | function calls. @code{<glil-call>} invokes the VM instruction named |
580 | @var{inst}, noting that it is called with @var{nargs} stack arguments. | |
ff73ae34 AW |
581 | The arguments should be pushed on the stack already. What happens to |
582 | the stack afterwards depends on the instruction. | |
c850030f AW |
583 | @end deftp |
584 | @deftp {Scheme Variable} <glil-mv-call> nargs ra | |
ff73ae34 AW |
585 | Performs a multiple-value call. @var{ra} is a @code{<glil-label>} |
586 | corresponding to the multiple-value return address for the call. See | |
587 | the notes on @code{mv-call} in @ref{Procedural Instructions}, for more | |
588 | information. | |
c850030f AW |
589 | @end deftp |
590 | ||
ff73ae34 AW |
591 | Users may enter in GLIL at the REPL as well, though there is a bit |
592 | more bookkeeping to do. Since GLIL needs the set of variables to be | |
593 | declared explicitly in a @code{<glil-program>}, GLIL expressions must | |
594 | be wrapped in a thunk that declares the arity of the expression: | |
00ce5125 | 595 | |
ff73ae34 AW |
596 | @example |
597 | scheme@@(guile-user)> ,language glil | |
598 | Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0 | |
599 | Copyright (C) 2001-2008 Free Software Foundation, Inc. | |
00ce5125 | 600 | |
ff73ae34 AW |
601 | Enter `,help' for help. |
602 | glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0)) | |
603 | @result{} 3 | |
604 | @end example | |
00ce5125 | 605 | |
ff73ae34 AW |
606 | Just as in all of Guile's compilers, an environment is passed to the |
607 | GLIL-to-object code compiler, and one is returned as well, along with | |
608 | the object code. | |
00ce5125 AW |
609 | |
610 | @node Object Code | |
611 | @subsection Object Code | |
612 | ||
ff73ae34 AW |
613 | Object code is the serialization of the raw instruction stream of a |
614 | program, ready for interpretation by the VM. Procedures related to | |
615 | object code are defined in the @code{(system vm objcode)} module. | |
00ce5125 | 616 | |
ff73ae34 AW |
617 | @deffn {Scheme Procedure} objcode? obj |
618 | @deffnx {C Function} scm_objcode_p (obj) | |
619 | Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise. | |
620 | @end deffn | |
00ce5125 | 621 | |
ff73ae34 AW |
622 | @deffn {Scheme Procedure} bytecode->objcode bytecode nlocs nexts |
623 | @deffnx {C Function} scm_bytecode_to_objcode (bytecode, nlocs, nexts) | |
624 | Makes a bytecode object from @var{bytecode}, which should be a | |
625 | @code{u8vector}. @var{nlocs} and @var{nexts} denote the number of | |
626 | stack and heap variables to reserve when this objcode is executed. | |
627 | @end deffn | |
e3ba263d | 628 | |
ff73ae34 AW |
629 | @deffn {Scheme Variable} load-objcode file |
630 | @deffnx {C Function} scm_load_objcode (file) | |
631 | Load object code from a file named @var{file}. The file will be mapped | |
632 | into memory via @code{mmap}, so this is a very fast operation. | |
e3ba263d | 633 | |
ff73ae34 AW |
634 | On disk, object code has an eight-byte cookie prepended to it, so that |
635 | we will not execute arbitrary garbage. In addition, two more bytes are | |
636 | reserved for @var{nlocs} and @var{nexts}. | |
637 | @end deffn | |
e3ba263d | 638 | |
ff73ae34 AW |
639 | @deffn {Scheme Variable} objcode->u8vector objcode |
640 | @deffnx {C Function} scm_objcode_to_u8vector (objcode) | |
641 | Copy object code out to a @code{u8vector} for analysis by Scheme. The | |
642 | ten-byte header is included. | |
643 | @end deffn | |
e3ba263d | 644 | |
ff73ae34 AW |
645 | @deffn {Scheme Variable} objcode->program objcode [external='()] |
646 | @deffnx {C Function} scm_objcode_to_program (objcode, external) | |
647 | Load up object code into a Scheme program. The resulting program will | |
648 | be a thunk that captures closure variables from @var{external}. | |
649 | @end deffn | |
c850030f | 650 | |
ff73ae34 AW |
651 | Object code from a file may be disassembled at the REPL via the |
652 | meta-command @code{,disassemble-file}, abbreviated as @code{,xx}. | |
653 | Programs may be disassembled via @code{,disassemble}, abbreviated as | |
654 | @code{,x}. | |
655 | ||
656 | Compiling object code to the fake language, @code{value}, is performed | |
657 | via loading objcode into a program, then executing that thunk with | |
658 | respect to the compilation environment. Normally the environment | |
659 | propagates through the compiler transparently, but users may specify | |
660 | the compilation environment manually as well: | |
661 | ||
662 | @deffn {Scheme Procedure} make-objcode-env module externals | |
663 | Make an object code environment. @var{module} should be a Scheme | |
664 | module, and @var{externals} should be a list of external variables. | |
665 | @code{#f} is also a valid object code environment. | |
666 | @end deffn | |
c850030f | 667 | |
ff73ae34 AW |
668 | @node Extending the Compiler |
669 | @subsection Extending the Compiler | |
e3ba263d | 670 | |
ff73ae34 AW |
671 | At this point, we break with the impersonal tone of the rest of the |
672 | manual, and make an intervention. Admit it: if you've read this far | |
673 | into the compiler internals manual, you are a junkie. Perhaps a course | |
674 | at your university left you unsated, or perhaps you've always harbored | |
675 | a sublimated desire to hack the holy of computer science holies: a | |
676 | compiler. Well you're in good company, and in a good position. Guile's | |
677 | compiler needs your help. | |
678 | ||
679 | There are many possible avenues for improving Guile's compiler. | |
680 | Probably the most important improvement, speed-wise, will be some form | |
681 | of native compilation, both just-in-time and ahead-of-time. This could | |
682 | be done in many ways. Probably the easiest strategy would be to extend | |
683 | the compiled procedure structure to include a pointer to a native code | |
684 | vector, and compile from bytecode to native code at runtime after a | |
685 | procedure is called a certain number of times. | |
686 | ||
687 | The name of the game is a profiling-based harvest of the low-hanging | |
688 | fruit, running programs of interest under a system-level profiler and | |
689 | determining which improvements would give the most bang for the buck. | |
690 | There are many well-known efficiency hacks in the literature: Dybvig's | |
691 | letrec optimization, individual boxing of heap-allocated values (and | |
692 | then store the boxes on the stack directory), optimized case-lambda | |
693 | expressions, stack underflow and overflow handlers, etc. Highly | |
694 | recommended papers: Dybvig's HOCS, Ghuloum's compiler paper. | |
695 | ||
696 | The compiler also needs help at the top end, enhancing the Scheme that | |
697 | it knows to also understand R6RS, and adding new high-level compilers: | |
698 | Emacs Lisp, Lua, JavaScript... |