Commit | Line | Data |
---|---|---|
015959cb KN |
1 | \input texinfo @c -*-texinfo-*- |
2 | @c %**start of header | |
3 | @setfilename guile-vm.info | |
4 | @settitle Guile VM Specification | |
5 | @footnotestyle end | |
6 | @setchapternewpage odd | |
7 | @c %**end of header | |
8 | ||
fa19602c LC |
9 | @set EDITION 0.6 |
10 | @set VERSION 0.6 | |
11 | @set UPDATED 2005-04-26 | |
015959cb | 12 | |
238e7a11 LC |
13 | @c Macro for instruction definitions. |
14 | @macro insn{} | |
15 | Instruction | |
16 | @end macro | |
17 | ||
a52b2d3d LC |
18 | @c For Scheme procedure definitions. |
19 | @macro scmproc{} | |
20 | Scheme Procedure | |
21 | @end macro | |
22 | ||
23 | @c Scheme records. | |
24 | @macro scmrec{} | |
25 | Record | |
26 | @end macro | |
27 | ||
015959cb KN |
28 | @ifinfo |
29 | @dircategory Scheme Programming | |
30 | @direntry | |
238e7a11 | 31 | * Guile VM: (guile-vm). Guile's Virtual Machine. |
015959cb KN |
32 | @end direntry |
33 | ||
34 | This file documents Guile VM. | |
35 | ||
36 | Copyright @copyright{} 2000 Keisuke Nishida | |
238e7a11 | 37 | Copyright @copyright{} 2005 Ludovic Court`es |
015959cb KN |
38 | |
39 | Permission is granted to make and distribute verbatim copies of this | |
40 | manual provided the copyright notice and this permission notice are | |
41 | preserved on all copies. | |
42 | ||
43 | @ignore | |
44 | Permission is granted to process this file through TeX and print the | |
45 | results, provided the printed document carries a copying permission | |
46 | notice identical to this one except for the removal of this paragraph | |
47 | (this paragraph not being relevant to the printed manual). | |
48 | ||
49 | @end ignore | |
50 | Permission is granted to copy and distribute modified versions of this | |
51 | manual under the conditions for verbatim copying, provided that the | |
52 | entire resulting derived work is distributed under the terms of a | |
53 | permission notice identical to this one. | |
54 | ||
55 | Permission is granted to copy and distribute translations of this manual | |
56 | into another language, under the above conditions for modified versions, | |
57 | except that this permission notice may be stated in a translation | |
58 | approved by the Free Software Foundation. | |
59 | @end ifinfo | |
60 | ||
61 | @titlepage | |
62 | @title Guile VM Specification | |
63 | @subtitle for Guile VM @value{VERSION} | |
64 | @author Keisuke Nishida | |
65 | ||
66 | @page | |
67 | @vskip 0pt plus 1filll | |
68 | Edition @value{EDITION} @* | |
69 | Updated for Guile VM @value{VERSION} @* | |
70 | @value{UPDATED} @* | |
71 | ||
72 | Copyright @copyright{} 2000 Keisuke Nishida | |
238e7a11 | 73 | Copyright @copyright{} 2005 Ludovic Court`es |
015959cb KN |
74 | |
75 | Permission is granted to make and distribute verbatim copies of this | |
76 | manual provided the copyright notice and this permission notice are | |
77 | preserved on all copies. | |
78 | ||
79 | Permission is granted to copy and distribute modified versions of this | |
80 | manual under the conditions for verbatim copying, provided that the | |
81 | entire resulting derived work is distributed under the terms of a | |
82 | permission notice identical to this one. | |
83 | ||
84 | Permission is granted to copy and distribute translations of this manual | |
85 | into another language, under the above conditions for modified versions, | |
86 | except that this permission notice may be stated in a translation | |
87 | approved by the Free Software Foundation. | |
88 | @end titlepage | |
89 | ||
90 | @contents | |
91 | ||
92 | @c ********************************************************************* | |
93 | @node Top, Introduction, (dir), (dir) | |
94 | @top Guile VM Specification | |
95 | ||
238e7a11 LC |
96 | This document would like to correspond to Guile VM @value{VERSION}. |
97 | However, be warned that important parts still correspond to version | |
98 | 0.0 and are not valid anymore. | |
015959cb KN |
99 | |
100 | @menu | |
fa19602c LC |
101 | * Introduction:: |
102 | * Variable Management:: | |
fa19602c | 103 | * Instruction Set:: |
a52b2d3d | 104 | * The Compiler:: |
b6368dbb LC |
105 | * Concept Index:: |
106 | * Function and Instruction Index:: | |
107 | * Command and Variable Index:: | |
f41cb00c LC |
108 | |
109 | @detailmenu | |
110 | --- The Detailed Node Listing --- | |
111 | ||
112 | Instruction Set | |
113 | ||
114 | * Environment Control Instructions:: | |
115 | * Branch Instructions:: | |
116 | * Subprogram Control Instructions:: | |
117 | * Data Control Instructions:: | |
118 | ||
a52b2d3d LC |
119 | The Compiler |
120 | ||
121 | * Overview:: | |
122 | * The Language Front-Ends:: | |
123 | * GHIL:: | |
9dbbe4bb | 124 | * Compiling Scheme Code:: |
a52b2d3d LC |
125 | * GLIL:: |
126 | * The Assembler:: | |
127 | ||
f41cb00c | 128 | @end detailmenu |
015959cb KN |
129 | @end menu |
130 | ||
131 | @c ********************************************************************* | |
fa19602c | 132 | @node Introduction, Variable Management, Top, Top |
015959cb | 133 | @chapter What is Guile VM? |
a98cef7e KN |
134 | |
135 | A Guile VM has a set of registers and its own stack memory. Guile may | |
136 | have more than one VM's. Each VM may execute at most one program at a | |
137 | time. Guile VM is a CISC system so designed as to execute Scheme and | |
138 | other languages efficiently. | |
139 | ||
fa19602c | 140 | @unnumberedsubsec Registers |
a98cef7e | 141 | |
fa19602c LC |
142 | @itemize |
143 | @item pc - Program counter ;; ip (instruction poiner) is better? | |
144 | @item sp - Stack pointer | |
145 | @item bp - Base pointer | |
146 | @item ac - Accumulator | |
147 | @end itemize | |
a98cef7e | 148 | |
fa19602c | 149 | @unnumberedsubsec Engine |
a98cef7e KN |
150 | |
151 | A VM may have one of three engines: reckless, regular, or debugging. | |
152 | Reckless engine is fastest but dangerous. Regular engine is normally | |
153 | fail-safe and reasonably fast. Debugging engine is safest and | |
154 | functional but very slow. | |
155 | ||
fa19602c | 156 | @unnumberedsubsec Memory |
a98cef7e KN |
157 | |
158 | Stack is the only memory that each VM owns. The other memory is shared | |
159 | memory that is shared among every VM and other part of Guile. | |
160 | ||
fa19602c | 161 | @unnumberedsubsec Program |
a98cef7e KN |
162 | |
163 | A VM program consists of a bytecode that is executed and an environment | |
164 | in which execution is done. Each program is allocated in the shared | |
165 | memory and may be executed by any VM. A program may call other programs | |
166 | within a VM. | |
167 | ||
fa19602c | 168 | @unnumberedsubsec Instruction |
a98cef7e KN |
169 | |
170 | Guile VM has dozens of system instructions and (possibly) hundreds of | |
171 | functional instructions. Some Scheme procedures such as cons and car | |
172 | are implemented as VM's builtin functions, which are very efficient. | |
173 | Other procedures defined outside of the VM are also considered as VM's | |
174 | functional features, since they do not change the state of VM. | |
175 | Procedures defined within the VM are called subprograms. | |
176 | ||
177 | Most instructions deal with the accumulator (ac). The VM stores all | |
178 | results from functions in ac, instead of pushing them into the stack. | |
179 | I'm not sure whether this is a good thing or not. | |
180 | ||
a52b2d3d | 181 | @node Variable Management, Instruction Set, Introduction, Top |
015959cb | 182 | @chapter Variable Management |
a98cef7e | 183 | |
a52b2d3d LC |
184 | FIXME: This chapter needs to be reviewed so that it matches reality. |
185 | A more up-to-date description of the mechanisms described in this | |
186 | section is given in @ref{Instruction Set}. | |
187 | ||
a98cef7e KN |
188 | A program may have access to local variables, external variables, and |
189 | top-level variables. | |
190 | ||
fa19602c | 191 | @section Local/external variables |
a98cef7e KN |
192 | |
193 | A stack is logically divided into several blocks during execution. A | |
194 | "block" is such a unit that maintains local variables and dynamic chain. | |
195 | A "frame" is an upper level unit that maintains subprogram calls. | |
196 | ||
fa19602c | 197 | @example |
a98cef7e KN |
198 | Stack |
199 | dynamic | | | | | |
200 | chain +==========+ - = | |
201 | | |local vars| | | | |
202 | `-|block data| | block | | |
203 | /|frame data| | | | |
204 | | +----------+ - | | |
205 | | |local vars| | | frame | |
206 | `-|block data| | | | |
207 | /+----------+ - | | |
208 | | |local vars| | | | |
209 | `-|block data| | | | |
210 | /+==========+ - = | |
211 | | |local vars| | | | |
212 | `-|block data| | | | |
213 | /|frame data| | | | |
214 | | +----------+ - | | |
215 | | | | | | | |
fa19602c | 216 | @end example |
a98cef7e KN |
217 | |
218 | The first block of each frame may look like this: | |
219 | ||
fa19602c | 220 | @example |
a98cef7e KN |
221 | Address Data |
222 | ------- ---- | |
223 | xxx0028 Local variable 2 | |
224 | xxx0024 Local variable 1 | |
225 | bp ->xxx0020 Local variable 0 | |
226 | xxx001c Local link (block data) | |
227 | xxx0018 External link (block data) | |
228 | xxx0014 Stack pointer (block data) | |
229 | xxx0010 Return address (frame data) | |
230 | xxx000c Parent program (frame data) | |
fa19602c | 231 | @end example |
a98cef7e KN |
232 | |
233 | The base pointer (bp) always points to the lowest address of local | |
234 | variables of the recent block. Local variables are referred as "bp[n]". | |
235 | The local link field has a pointer to the dynamic parent of the block. | |
236 | The parent's variables are referred as "bp[-1][n]", and grandparent's | |
237 | are "bp[-1][-1][n]". Thus, any local variable is represented by its | |
238 | depth and offset from the current bp. | |
239 | ||
240 | A variable may be "external", which is allocated in the shared memory. | |
241 | The external link field of a block has a pointer to such a variable set, | |
242 | which I call "fragment" (what should I call?). A fragment has a set of | |
243 | variables and its own chain. | |
244 | ||
fa19602c | 245 | @example |
a98cef7e KN |
246 | local external |
247 | chain| | chain | |
248 | | +-----+ .--------, | | |
015959cb | 249 | `-|block|--+->|external|-' |
a98cef7e KN |
250 | /+-----+ | `--------'\, |
251 | `-|block|--' | | |
252 | /+-----+ .--------, | | |
015959cb | 253 | `-|block|---->|external|-' |
a98cef7e KN |
254 | +-----+ `--------' |
255 | | | | |
fa19602c | 256 | @end example |
a98cef7e KN |
257 | |
258 | An external variable is referred as "bp[-2]->variables[n]" or | |
259 | "bp[-2]->link->...->variables[n]". This is also represented by a pair | |
260 | of depth and offset. At any point of execution, the value of bp | |
261 | determines the current local link and external link, and thus the | |
262 | current environment of a program. | |
263 | ||
264 | Other data fields are described later. | |
265 | ||
fa19602c | 266 | @section Top-level variables |
a98cef7e KN |
267 | |
268 | Guile VM uses the same top-level variables as the regular Guile. A | |
269 | program may have direct access to vcells. Currently this is done by | |
270 | calling scm_intern0, but a program is possible to have any top-level | |
271 | environment defined by the current module. | |
272 | ||
fa19602c | 273 | @section Scheme and VM variable |
a98cef7e KN |
274 | |
275 | Let's think about the following Scheme code as an example: | |
276 | ||
fa19602c | 277 | @example |
a98cef7e KN |
278 | (define (foo a) |
279 | (lambda (b) (list foo a b))) | |
fa19602c | 280 | @end example |
a98cef7e KN |
281 | |
282 | In the lambda expression, "foo" is a top-level variable, "a" is an | |
283 | external variable, and "b" is a local variable. | |
284 | ||
285 | When a VM executes foo, it allocates a block for "a". Since "a" may be | |
286 | externally referred from the closure, the VM creates a fragment with a | |
287 | copy of "a" in it. When the VM evaluates the lambda expression, it | |
288 | creates a subprogram (closure), associating the fragment with the | |
289 | subprogram as its external environment. When the closure is executed, | |
290 | its environment will look like this: | |
291 | ||
fa19602c | 292 | @example |
a98cef7e KN |
293 | block Top-level: foo |
294 | +-------------+ | |
295 | |local var: b | fragment | |
296 | +-------------+ .-----------, | |
297 | |external link|---->|variable: a| | |
298 | +-------------+ `-----------' | |
fa19602c | 299 | @end example |
a98cef7e KN |
300 | |
301 | The fragment remains as long as the closure exists. | |
302 | ||
fa19602c | 303 | @section Addressing mode |
a98cef7e KN |
304 | |
305 | Guile VM has five addressing modes: | |
306 | ||
fa19602c LC |
307 | @itemize |
308 | @item Real address | |
309 | @item Local position | |
310 | @item External position | |
311 | @item Top-level location | |
312 | @item Constant object | |
313 | @end itemize | |
a98cef7e KN |
314 | |
315 | Real address points to the address in the real program and is only used | |
316 | with the program counter (pc). | |
317 | ||
318 | Local position and external position are represented as a pair of depth | |
319 | and offset from bp, as described above. These are base relative | |
320 | addresses, and the real address may vary during execution. | |
321 | ||
322 | Top-level location is represented as a Guile's vcell. This location is | |
323 | determined at loading time, so the use of this address is efficient. | |
324 | ||
015959cb | 325 | Constant object is not an address but gives an instruction an Scheme |
a98cef7e KN |
326 | object directly. |
327 | ||
328 | [ We'll also need dynamic scope addressing to support Emacs Lisp? ] | |
329 | ||
a98cef7e | 330 | |
fa19602c | 331 | Overall procedure: |
a98cef7e | 332 | |
fa19602c LC |
333 | @enumerate |
334 | @item A source program is compiled into a bytecode. | |
335 | @item A bytecode is given an environment and becomes a program. | |
336 | @item A VM starts execution, creating a frame for it. | |
337 | @item Whenever a program calls a subprogram, a new frame is created for it. | |
338 | @item When a program finishes execution, it returns a value, and the VM | |
339 | continues execution of the parent program. | |
340 | @item When all programs terminated, the VM returns the final value and stops. | |
341 | @end enumerate | |
a98cef7e | 342 | |
a52b2d3d LC |
343 | \f |
344 | @node Instruction Set, The Compiler, Variable Management, Top | |
fa19602c | 345 | @chapter Instruction Set |
a98cef7e KN |
346 | |
347 | The Guile VM instruction set is roughly divided two groups: system | |
348 | instructions and functional instructions. System instructions control | |
349 | the execution of programs, while functional instructions provide many | |
238e7a11 LC |
350 | useful calculations. |
351 | ||
352 | @menu | |
353 | * Environment Control Instructions:: | |
f41cb00c | 354 | * Branch Instructions:: |
238e7a11 LC |
355 | * Subprogram Control Instructions:: |
356 | * Data Control Instructions:: | |
357 | @end menu | |
358 | ||
f41cb00c | 359 | @node Environment Control Instructions, Branch Instructions, Instruction Set, Instruction Set |
238e7a11 LC |
360 | @section Environment Control Instructions |
361 | ||
362 | @deffn @insn{} link binding-name | |
363 | Look up @var{binding-name} (a string) in the current environment and | |
364 | push the corresponding variable object onto the stack. If | |
365 | @var{binding-name} is not bound yet, then create a new binding and | |
366 | push its variable object. | |
367 | @end deffn | |
368 | ||
369 | @deffn @insn{} variable-ref | |
370 | Dereference the variable object which is on top of the stack and | |
371 | replace it by the value of the variable it represents. | |
372 | @end deffn | |
373 | ||
374 | @deffn @insn{} variable-set | |
375 | Set the value of the variable on top of the stack (at @code{sp[0]}) to | |
376 | the object located immediately before (at @code{sp[-1]}). | |
377 | @end deffn | |
378 | ||
379 | As an example, let us look at what a simple function call looks like: | |
380 | ||
381 | @example | |
382 | (+ 2 3) | |
383 | @end example | |
384 | ||
385 | This call yields the following sequence of instructions: | |
a98cef7e | 386 | |
238e7a11 | 387 | @example |
62082959 | 388 | (link "+") ;; lookup binding "+" |
238e7a11 LC |
389 | (variable-ref) ;; dereference it |
390 | (make-int8 2) ;; push immediate value `2' | |
391 | (make-int8 3) ;; push immediate value `3' | |
392 | (tail-call 2) ;; call the proc at sp[-3] with two args | |
393 | @end example | |
fa19602c | 394 | |
62082959 LC |
395 | @deffn @insn{} local-ref offset |
396 | Push onto the stack the value of the local variable located at | |
397 | @var{offset} within the current stack frame. | |
398 | @end deffn | |
399 | ||
400 | @deffn @insn{} local-set offset | |
401 | Pop the Scheme object located on top of the stack and make it the new | |
402 | value of the local variable located at @var{offset} within the current | |
403 | stack frame. | |
404 | @end deffn | |
405 | ||
406 | @deffn @insn{} external-ref offset | |
407 | Push the value of the closure variable located at position | |
408 | @var{offset} within the program's list of external variables. | |
409 | @end deffn | |
410 | ||
411 | @deffn @insn{} external-set offset | |
412 | Pop the Scheme object located on top of the stack and make it the new | |
413 | value of the closure variable located at @var{offset} within the | |
414 | program's list of external variables. | |
415 | @end deffn | |
416 | ||
0b5f0e49 LC |
417 | @deffn @insn{} make-closure |
418 | Pop the program object from the stack and assign it the current | |
419 | closure variable list as its closure. Push the result program | |
420 | object. | |
421 | @end deffn | |
422 | ||
423 | Let's illustrate this: | |
62082959 LC |
424 | |
425 | @example | |
426 | (let ((x 2)) | |
427 | (lambda () | |
428 | (let ((x++ (+ 1 x))) | |
429 | (set! x x++) | |
430 | x++))) | |
431 | @end example | |
432 | ||
433 | The resulting program has one external (closure) variable, i.e. its | |
434 | @var{nexts} is set to 1 (@pxref{Subprogram Control Instructions}). | |
435 | This yields the following code: | |
436 | ||
437 | @example | |
0b5f0e49 LC |
438 | ;; the traditional program prologue with NLOCS = 0 and NEXTS = 1 |
439 | ||
62082959 LC |
440 | 0 (make-int8 2) |
441 | 2 (external-set 0) | |
442 | 4 (make-int8 4) | |
0b5f0e49 LC |
443 | 6 (link "+") ;; lookup `+' |
444 | 9 (vector 1) ;; create the external variable vector for | |
445 | ;; later use by `object-ref' and `object-set' | |
62082959 LC |
446 | ... |
447 | 40 (load-program ##34#) | |
0b5f0e49 LC |
448 | 59 (make-closure) ;; assign the current closure to the program |
449 | ;; just pushed by `load-program' | |
450 | 60 (return) | |
62082959 LC |
451 | @end example |
452 | ||
453 | The program loaded here by @var{load-program} contains the following | |
454 | sequence of instructions: | |
455 | ||
456 | @example | |
457 | 0 (object-ref 0) ;; push the variable for `+' | |
458 | 2 (variable-ref) ;; dereference `+' | |
459 | 3 (make-int8:1) ;; push 1 | |
460 | 4 (external-ref 0) ;; push the value of `x' | |
461 | 6 (call 2) ;; call `+' and push the result | |
462 | 8 (local-set 0) ;; make it the new value of `x++' | |
463 | 10 (local-ref 0) ;; push the value of `x++' | |
464 | 12 (external-set 0) ;; make it the new value of `x' | |
465 | 14 (local-ref 0) ;; push the value of `x++' | |
466 | 16 (return) ;; return it | |
467 | @end example | |
468 | ||
0b5f0e49 LC |
469 | At this point, you should know pretty much everything about the three |
470 | types of variables a program may need to access. | |
fa19602c | 471 | |
f41cb00c LC |
472 | |
473 | @node Branch Instructions, Subprogram Control Instructions, Environment Control Instructions, Instruction Set | |
474 | @section Branch Instructions | |
475 | ||
476 | All the conditional branch instructions described below work in the | |
477 | same way: | |
478 | ||
479 | @itemize | |
480 | @item They take the Scheme object located on the stack and use it as | |
481 | the branch condition; | |
482 | @item If the condition if false, then program execution continues with | |
483 | the next instruction; | |
484 | @item If the condition is true, then the instruction pointer is | |
485 | increased by the offset passed as an argument to the branch | |
486 | instruction; | |
487 | @item Finally, when the instruction finished, the condition object is | |
488 | removed from the stack. | |
489 | @end itemize | |
490 | ||
491 | Note that the offset passed to the instruction is encoded on two 8-bit | |
492 | integers which are then combined by the VM as one 16-bit integer. | |
493 | ||
494 | @deffn @insn{} br offset | |
495 | Jump to @var{offset}. | |
496 | @end deffn | |
497 | ||
498 | @deffn @insn{} br-if offset | |
499 | Jump to @var{offset} if the condition on the stack is not false. | |
500 | @end deffn | |
501 | ||
502 | @deffn @insn{} br-if-not offset | |
503 | Jump to @var{offset} if the condition on the stack is false. | |
504 | @end deffn | |
505 | ||
506 | @deffn @insn{} br-if-eq offset | |
507 | Jump to @var{offset} if the two objects located on the stack are | |
508 | equal in the sense of @var{eq?}. Note that, for this instruction, the | |
509 | stack pointer is decremented by two Scheme objects instead of only | |
510 | one. | |
511 | @end deffn | |
512 | ||
513 | @deffn @insn{} br-if-not-eq offset | |
514 | Same as @var{br-if-eq} for non-equal objects. | |
515 | @end deffn | |
516 | ||
517 | @deffn @insn{} br-if-null offset | |
518 | Jump to @var{offset} if the object on the stack is @code{'()}. | |
519 | @end deffn | |
520 | ||
521 | @deffn @insn{} br-if-not-null offset | |
522 | Jump to @var{offset} if the object on the stack is not @code{'()}. | |
523 | @end deffn | |
524 | ||
525 | ||
526 | @node Subprogram Control Instructions, Data Control Instructions, Branch Instructions, Instruction Set | |
238e7a11 LC |
527 | @section Subprogram Control Instructions |
528 | ||
529 | Programs (read: ``compiled procedure'') may refer to external | |
530 | bindings, like variables or functions defined outside the program | |
531 | itself, in the environment in which it will evaluate at run-time. In | |
532 | a sense, a program's environment and its bindings are an implicit | |
533 | parameter of every program. | |
534 | ||
b6368dbb | 535 | @cindex object table |
238e7a11 | 536 | In order to handle such bindings, each program has an @dfn{object |
0b5f0e49 LC |
537 | table} associated to it. This table (actually a Scheme vector) |
538 | contains all constant objects referenced by the program. The object | |
539 | table of a program is initialized right before a program is loaded | |
540 | with @var{load-program}. | |
541 | ||
542 | Variable objects are one such type of constant object: when a global | |
543 | binding is defined, a variable object is associated to it and that | |
544 | object will remain constant over time, even if the value bound to it | |
545 | changes. Therefore, external bindings only need to be looked up once | |
546 | when the program is loaded. References to the corresponding external | |
547 | variables from within the program are then performed via the | |
548 | @var{object-ref} instruction and are almost as fast as local variable | |
549 | references. | |
238e7a11 LC |
550 | |
551 | Let us consider the following program (procedure) which references | |
552 | external bindings @code{frob} and @var{%magic}: | |
553 | ||
554 | @example | |
555 | (lambda (x) | |
556 | (frob x %magic)) | |
557 | @end example | |
558 | ||
559 | This yields the following assembly code: | |
560 | ||
561 | @example | |
562 | (make-int8 64) ;; number of args, vars, etc. (see below) | |
563 | (link "frob") | |
564 | (link "%magic") | |
62082959 | 565 | (vector 2) ;; object table (external bindings) |
238e7a11 LC |
566 | ... |
567 | (load-program #u8(20 0 23 21 0 20 1 23 36 2)) | |
568 | (return) | |
569 | @end example | |
570 | ||
571 | All the instructions occurring before @var{load-program} (some were | |
572 | omitted for simplicity) form a @dfn{prologue} which, among other | |
573 | things, pushed an object table (a vector) that contains the variable | |
574 | objects for the variables bound to @var{frob} and @var{%magic}. This | |
575 | vector and other data pushed onto the stack are then popped by the | |
576 | @var{load-program} instruction. | |
577 | ||
578 | Besides, the @var{load-program} instruction takes one explicit | |
579 | argument which is the bytecode of the program itself. Disassembled, | |
580 | this bytecode looks like: | |
581 | ||
582 | @example | |
0b5f0e49 | 583 | (object-ref 0) ;; push the variable object of `frob' |
238e7a11 LC |
584 | (variable-ref) ;; dereference it |
585 | (local-ref 0) ;; push the value of `x' | |
586 | (object-ref 1) ;; push the variable object of `%magic' | |
587 | (variable-ref) ;; dereference it | |
588 | (tail-call 2) ;; call `frob' with two parameters | |
589 | @end example | |
590 | ||
591 | This clearly shows that there is little difference between references | |
62082959 LC |
592 | to local variables and references to externally bound variables since |
593 | lookup of externally bound variables if performed only once before the | |
594 | program is run. | |
238e7a11 LC |
595 | |
596 | @deffn @insn{} load-program bytecode | |
f41cb00c LC |
597 | Load the program whose bytecode is @var{bytecode} (a u8vector), pop |
598 | its meta-information from the stack, and push a corresponding program | |
599 | object onto the stack. The program's meta-information may consist of | |
600 | (in the order in which it should be pushed onto the stack): | |
fa19602c LC |
601 | |
602 | @itemize | |
238e7a11 LC |
603 | @item optionally, a pair representing meta-data (see the |
604 | @var{program-meta} procedure); [FIXME: explain their meaning] | |
605 | @item optionally, a vector which is the program's object table (a | |
606 | program that does not reference external bindings does not need an | |
607 | object table); | |
2d80426a LC |
608 | @item either one immediate integer or four immediate integers |
609 | representing respectively the number of arguments taken by the | |
610 | function (@var{nargs}), the number of @dfn{rest arguments} | |
611 | (@var{nrest}, 0 or 1), the number of local variables (@var{nlocs}) and | |
62082959 LC |
612 | the number of external variables (@var{nexts}) (@pxref{Environment |
613 | Control Instructions}). | |
fa19602c LC |
614 | @end itemize |
615 | ||
238e7a11 LC |
616 | @end deffn |
617 | ||
618 | @deffn @insn{} object-ref offset | |
619 | Push the variable object for the external variable located at | |
620 | @var{offset} within the program's object table. | |
621 | @end deffn | |
622 | ||
623 | @deffn @insn{} return | |
624 | Free the program's frame. | |
625 | @end deffn | |
626 | ||
f41cb00c LC |
627 | @deffn @insn{} call nargs |
628 | Call the procedure, continuation or program located at | |
629 | @code{sp[-nargs]} with the @var{nargs} arguments located from | |
630 | @code{sp[0]} to @code{sp[-nargs + 1]}. The | |
631 | procedure/continuation/program and its arguments are dropped from the | |
62082959 LC |
632 | stack and the result is pushed. When calling a program, the |
633 | @code{call} instruction reserves room for its local variables on the | |
634 | stack, and initializes its list of closure variables and its vector of | |
635 | externally bound variables. | |
f41cb00c LC |
636 | @end deffn |
637 | ||
638 | @deffn @insn{} tail-call nargs | |
639 | Same as @code{call} except that, for tail-recursive calls to a | |
640 | program, the current stack frame is re-used, as required by RnRS. | |
62082959 | 641 | This instruction is otherwise similar to @code{call}. |
f41cb00c LC |
642 | @end deffn |
643 | ||
238e7a11 LC |
644 | |
645 | @node Data Control Instructions, , Subprogram Control Instructions, Instruction Set | |
646 | @section Data Control Instructions | |
647 | ||
648 | @deffn @insn{} make-int8 value | |
649 | Push @var{value}, an 8-bit integer, onto the stack. | |
650 | @end deffn | |
651 | ||
652 | @deffn @insn{} make-int8:0 | |
653 | Push the immediate value @code{0} onto the stack. | |
654 | @end deffn | |
655 | ||
656 | @deffn @insn{} make-int8:1 | |
657 | Push the immediate value @code{1} onto the stack. | |
658 | @end deffn | |
659 | ||
660 | @deffn @insn{} make-false | |
661 | Push @code{#f} onto the stack. | |
662 | @end deffn | |
663 | ||
664 | @deffn @insn{} make-true | |
665 | Push @code{#t} onto the stack. | |
666 | @end deffn | |
fa19602c LC |
667 | |
668 | @itemize | |
669 | @item %push | |
670 | @item %pushi | |
671 | @item %pushl, %pushl:0:0, %pushl:0:1, %pushl:0:2, %pushl:0:3 | |
672 | @item %pushe, %pushe:0:0, %pushe:0:1, %pushe:0:2, %pushe:0:3 | |
673 | @item %pusht | |
674 | @end itemize | |
675 | ||
676 | @itemize | |
677 | @item %loadi | |
678 | @item %loadl, %loadl:0:0, %loadl:0:1, %loadl:0:2, %loadl:0:3 | |
679 | @item %loade, %loade:0:0, %loade:0:1, %loade:0:2, %loade:0:3 | |
680 | @item %loadt | |
681 | @end itemize | |
682 | ||
683 | @itemize | |
684 | @item %savei | |
685 | @item %savel, %savel:0:0, %savel:0:1, %savel:0:2, %savel:0:3 | |
686 | @item %savee, %savee:0:0, %savee:0:1, %savee:0:2, %savee:0:3 | |
687 | @item %savet | |
688 | @end itemize | |
689 | ||
690 | @section Flow control instructions | |
691 | ||
692 | @itemize | |
693 | @item %br-if | |
694 | @item %br-if-not | |
695 | @item %jump | |
696 | @end itemize | |
697 | ||
698 | @section Function call instructions | |
699 | ||
700 | @itemize | |
701 | @item %func, %func0, %func1, %func2 | |
702 | @end itemize | |
703 | ||
704 | @section Scheme built-in functions | |
705 | ||
706 | @itemize | |
707 | @item cons | |
708 | @item car | |
709 | @item cdr | |
710 | @end itemize | |
711 | ||
712 | @section Mathematical buitin functions | |
713 | ||
714 | @itemize | |
715 | @item 1+ | |
716 | @item 1- | |
717 | @item add, add2 | |
718 | @item sub, sub2, minus | |
719 | @item mul2 | |
720 | @item div2 | |
721 | @item lt2 | |
722 | @item gt2 | |
723 | @item le2 | |
724 | @item ge2 | |
725 | @item num-eq2 | |
726 | @end itemize | |
015959cb | 727 | |
a52b2d3d LC |
728 | |
729 | \f | |
b6368dbb | 730 | @node The Compiler, Concept Index, Instruction Set, Top |
a52b2d3d LC |
731 | @chapter The Compiler |
732 | ||
733 | This section describes Guile-VM's compiler and the compilation process | |
734 | to produce bytecode executable by the VM itself (@pxref{Instruction | |
735 | Set}). | |
736 | ||
737 | @menu | |
738 | * Overview:: | |
739 | * The Language Front-Ends:: | |
740 | * GHIL:: | |
9dbbe4bb | 741 | * Compiling Scheme Code:: |
a52b2d3d LC |
742 | * GLIL:: |
743 | * The Assembler:: | |
744 | @end menu | |
745 | ||
746 | @node Overview, The Language Front-Ends, The Compiler, The Compiler | |
747 | @section Overview | |
748 | ||
749 | Compilation in Guile-VM is a three-stage process: | |
750 | ||
b6368dbb LC |
751 | @cindex intermediate language |
752 | @cindex assembler | |
753 | @cindex compiler | |
754 | @cindex GHIL | |
755 | @cindex GLIL | |
756 | @cindex bytecode | |
757 | ||
a52b2d3d LC |
758 | @enumerate |
759 | @item the source programming language (e.g. R5RS Scheme) is read and | |
760 | translated into GHIL, @dfn{Guile's High-Level Intermediate Language}; | |
761 | @item GHIL code is then translated into a lower-level intermediate | |
762 | language call GLIL, @dfn{Guile's Low-Level Intermediate Language}; | |
763 | @item finally, GLIL is @dfn{assembled} into the VM's assembly language | |
764 | (@pxref{Instruction Set}) and bytecode. | |
765 | @end enumerate | |
766 | ||
767 | The use of two separate intermediate languages eases the | |
768 | implementation of front-ends since the gap between high-level | |
769 | languages like Scheme and GHIL is relatively small. | |
770 | ||
b6368dbb | 771 | @vindex guilec |
a52b2d3d LC |
772 | From an end-user viewpoint, compiling a Guile program into bytecode |
773 | can be done either by using the @command{guilec} command-line tool, or | |
774 | by using the @code{compile-file} procedure exported by the | |
775 | @code{(system base compile)} module. | |
776 | ||
884d46de LC |
777 | @deffn @scmproc{} compile-file file . opts |
778 | Compile Scheme source code from file @var{file} using compilation | |
779 | options @var{opts}. The resulting file, a Guile object file, will be | |
780 | name according the application of the @code{compiled-file-name} | |
781 | procedure to @var{file}. The possible values for @var{opts} are the | |
782 | same as for the @code{compile-in} procedure (see below, @pxref{The Language | |
783 | Front-Ends}). | |
784 | @end deffn | |
785 | ||
786 | @deffn @scmproc{} compiled-file-name file | |
787 | Given source file name @var{file} (a string), return a string that | |
788 | denotes the name of the Guile object file corresponding to | |
789 | @var{file}. By default, the file name returned is @var{file} minus | |
790 | its extension and plus the @code{.go} file extension. | |
791 | @end deffn | |
792 | ||
793 | @cindex self-hosting | |
794 | It is worth noting, as you might have already guessed, that Guile-VM's | |
795 | compiler is written in Guile Scheme and is @dfn{self-hosted}: it can | |
796 | compile itself. | |
a52b2d3d LC |
797 | |
798 | @node The Language Front-Ends, GHIL, Overview, The Compiler | |
799 | @section The Language Front-Ends | |
800 | ||
801 | Guile-VM comes with a number of @dfn{language front-ends}, that is, | |
802 | code that can read a given high-level programming language like R5RS | |
803 | Scheme, and translate it into a lower-level representation suitable to | |
804 | the compiler. | |
805 | ||
806 | Each language front-end provides a @dfn{specification} and a | |
807 | @dfn{translator} to GHIL. Both of them come in the @code{language} | |
808 | module hierarchy. As an example, the front-end for Scheme is located | |
809 | in the @code{(language scheme spec)} and @code{(language scheme | |
810 | translate)} modules. Language front-ends can then be retrieved using | |
811 | the @code{lookup-language} procedure of the @code{(system base | |
812 | language)} module. | |
813 | ||
814 | @deftp @scmrec{} <language> name title version reader printer read-file expander translator evaluator environment | |
815 | Denotes a language front-end specification a various methods used by | |
816 | the compiler to handle source written in that language. Of particular | |
817 | interest is the @code{translator} slot (@pxref{GHIL}). | |
818 | @end deftp | |
819 | ||
9dbbe4bb LC |
820 | @deffn @scmproc{} lookup-language lang |
821 | Look for a language front-end named @var{lang}, a symbol (e.g, | |
822 | @code{scheme}), and return the @code{<language>} record describing it | |
823 | if found. If @var{lang} does not denote a language front-end, an | |
824 | error is raised. Note that this procedure assumes that language | |
825 | @var{lang} exists if there exist a @code{(language @var{lang} spec)} | |
826 | module. | |
a52b2d3d LC |
827 | @end deffn |
828 | ||
884d46de LC |
829 | The @code{(system base compile)} module defines a procedure similar to |
830 | @code{compile-file} but that is not limited to the Scheme language: | |
831 | ||
832 | @deffn @scmproc{} compile-in expr env lang . opts | |
833 | Compile expression @var{expr}, which is written in language @var{lang} | |
834 | (a @code{<language>} object), using compilation options @var{opts}, | |
835 | and return bytecode as produced by the assembler (@pxref{The | |
836 | Assembler}). | |
837 | ||
838 | Options @var{opts} may contain the following keywords: | |
839 | ||
840 | @table @code | |
841 | @item :e | |
842 | compilation will stop after the code expansion phase. | |
843 | @item :t | |
844 | compilation will stop after the code translation phase, i.e. after | |
845 | code in the source language @var{lang} has been translated into GHIL | |
846 | (@pxref{GHIL}). | |
847 | @item :c | |
848 | compilation will stop after the compilation phase and before the | |
849 | assembly phase, i.e. once GHIL has been translated into GLIL | |
850 | (@pxref{GLIL}). | |
851 | @end table | |
852 | ||
853 | Additionally, @var{opts} may contain any option understood by the | |
854 | GHIL-to-GLIL compiler described in @xref{GLIL}. | |
855 | @end deffn | |
856 | ||
a52b2d3d | 857 | |
9dbbe4bb | 858 | @node GHIL, Compiling Scheme Code, The Language Front-Ends, The Compiler |
a52b2d3d LC |
859 | @section Guile's High-Level Intermediate Language |
860 | ||
861 | GHIL has constructs almost equivalent to those found in Scheme. | |
862 | However, unlike Scheme, it is meant to be read only by the compiler | |
863 | itself. Therefore, a sequence of GHIL code is only a sequence of GHIL | |
864 | @emph{objects} (records), as opposed to symbols, each of which | |
865 | represents a particular language feature. These records are all | |
866 | defined in the @code{(system il ghil)} module and are named | |
867 | @code{<ghil-*>}. | |
868 | ||
b6368dbb | 869 | Each GHIL record has at least two fields: one containing the |
a52b2d3d | 870 | environment (Guile module) in which it is considered, and one |
b6368dbb | 871 | containing its location [FIXME: currently seems to be unused]. Below |
a52b2d3d LC |
872 | is a list of the main GHIL object types and their fields: |
873 | ||
874 | @example | |
875 | ;; Objects | |
876 | (<ghil-void> env loc) | |
877 | (<ghil-quote> env loc obj) | |
878 | (<ghil-quasiquote> env loc exp) | |
879 | (<ghil-unquote> env loc exp) | |
880 | (<ghil-unquote-splicing> env loc exp) | |
881 | ;; Variables | |
882 | (<ghil-ref> env loc var) | |
883 | (<ghil-set> env loc var val) | |
884 | (<ghil-define> env loc var val) | |
885 | ;; Controls | |
886 | (<ghil-if> env loc test then else) | |
887 | (<ghil-and> env loc exps) | |
888 | (<ghil-or> env loc exps) | |
889 | (<ghil-begin> env loc exps) | |
890 | (<ghil-bind> env loc vars vals body) | |
891 | (<ghil-lambda> env loc vars rest body) | |
892 | (<ghil-call> env loc proc args) | |
893 | (<ghil-inline> env loc inline args) | |
894 | @end example | |
895 | ||
896 | As can be seen from this examples, the constructs in GHIL are pretty | |
897 | close to the fundamental primitives of Scheme. | |
898 | ||
899 | It is the role of front-end language translators (@pxref{The Language | |
900 | Front-Ends}) to produce a sequence of GHIL objects from the | |
9dbbe4bb LC |
901 | human-readable, source programming language. The next section |
902 | describes the translator for the Scheme language. | |
a52b2d3d | 903 | |
9dbbe4bb LC |
904 | @node Compiling Scheme Code, GLIL, GHIL, The Compiler |
905 | @section Compiling Scheme Code | |
a52b2d3d | 906 | |
9dbbe4bb LC |
907 | The language object for Scheme, as returned by @code{(lookup-language |
908 | 'scheme)} (@pxref{The Language Front-Ends}), defines a translator | |
909 | procedure that returns a sequence of GHIL objects given Scheme code. | |
910 | Before actually performing this operation, the Scheme translator | |
911 | expands macros in the original source code. | |
912 | ||
913 | The macros that may be expanded can come from different sources: | |
914 | ||
915 | @itemize | |
916 | @item core Guile macros, such as @code{false-if-exception}; | |
917 | @item macros defined in modules used by the module being compiled, | |
918 | e.g., @code{receive} in @code{(ice-9 receive)}; | |
919 | @item macros defined within the module being compiled. | |
920 | @end itemize | |
921 | ||
922 | @cindex macro | |
923 | @cindex syntax transformer | |
924 | @findex define-macro | |
925 | @findex defmacro | |
926 | The main complexity in handling macros at compilation time is that | |
927 | Guile's macros are first-class objects. For instance, when using | |
928 | @code{define-macro}, one actually defines a @emph{procedure} that | |
929 | returns code; of course, unlike a ``regular'' procedure, it is | |
930 | executed when an S-exp is @dfn{memoized} by the evaluator, i.e., | |
931 | before the actual evaluation takes place. Worse, it is possible to | |
932 | turn a procedure into a macro, or @dfn{syntax transformer}, thus | |
933 | removing, to some extent, the boundary between the macro expansion and | |
934 | evaluation phases, @inforef{Internal Macros, , guile}. | |
935 | ||
936 | [FIXME: explain limitations, etc.] | |
937 | ||
938 | ||
939 | @node GLIL, The Assembler, Compiling Scheme Code, The Compiler | |
a52b2d3d LC |
940 | @section Guile's Low-Level Intermediate Language |
941 | ||
942 | A GHIL instruction sequence can be compiled into GLIL using the | |
943 | @code{compile} procedure exported by the @code{(system il compile)} | |
944 | module. During this translation process, various optimizations may | |
945 | also be performed. | |
946 | ||
947 | The module @code{(system il glil)} defines record types representing | |
948 | various low-level abstractions. Compared to GHIL, the flow control | |
949 | primitives in GLIL are much more low-level: only @code{<glil-label>}, | |
950 | @code{<glil-branch>} and @code{<glil-call>} are available, no | |
951 | @code{lambda}, @code{if}, etc. | |
952 | ||
953 | ||
954 | @deffn @scmproc{} compile ghil environment . opts | |
955 | Compile @var{ghil}, a GHIL instruction sequence, within | |
956 | environment/module @var{environment}, and return the resulting GLIL | |
957 | instruction sequence. The option list @var{opts} may be either the | |
958 | empty list or a list containing the @code{:O} keyword in which case | |
959 | @code{compile} will first go through an optimization stage of | |
960 | @var{ghil}. | |
884d46de LC |
961 | |
962 | Note that the @code{:O} option may be passed at a higher-level to the | |
963 | @code{compile-file} and @code{compile-in} procedures (@pxref{The | |
964 | Language Front-Ends}). | |
a52b2d3d LC |
965 | @end deffn |
966 | ||
967 | @deffn @scmproc{} pprint-glil glil . port | |
968 | Print @var{glil}, a GLIL sequence instructions, in a human-readable | |
969 | form. If @var{port} is passed, it will be used as the output port. | |
970 | @end deffn | |
971 | ||
972 | ||
973 | Let's consider the following Scheme expression: | |
974 | ||
975 | @example | |
976 | (lambda (x) (+ x 1)) | |
977 | @end example | |
978 | ||
979 | The corresponding (unoptimized) GLIL code, as shown by | |
980 | @code{pprint-glil}, looks like this: | |
981 | ||
982 | @example | |
983 | (@@asm (0 0 0 0) | |
984 | (@@asm (1 0 0 0) ;; expect one arg. | |
985 | (@@bind (x argument 0)) ;; debugging info | |
986 | (module-ref #f +) ;; lookup `+' | |
987 | (argument-ref 0) ;; push the argument onto | |
988 | ;; the stack | |
989 | (const 1) ;; push `1' | |
990 | (tail-call 2) ;; call `+', with 2 args, | |
991 | ;; using the same stack frame | |
992 | (@@source 15 33)) ;; additional debugging info | |
993 | (return 0)) | |
994 | @end example | |
995 | ||
996 | This is not unlike the VM's assembly language described in | |
997 | @ref{Instruction Set}. | |
998 | ||
999 | @node The Assembler, , GLIL, The Compiler | |
1000 | @section The Assembler | |
1001 | ||
b6368dbb LC |
1002 | @findex code->bytes |
1003 | ||
a52b2d3d LC |
1004 | The final compilation step consists in converting the GLIL instruction |
1005 | sequence into VM bytecode. This is what the @code{assemble} procedure | |
1006 | defined in the @code{(system vm assemble)} module is for. It relies | |
1007 | on the @code{code->bytes} procedure of the @code{(system vm conv)} | |
1008 | module to convert instructions (represented as lists whose @code{car} | |
1009 | is a symbol naming the instruction, e.g. @code{object-ref}, | |
1010 | @pxref{Instruction Set}) into binary code, or @dfn{bytecode}. | |
1011 | Bytecode itself is represented using SRFI-4 byte vectors, | |
1012 | @inforef{SRFI-4, SRFI-4 homogeneous numeric vectors, guile}. | |
1013 | ||
1014 | ||
1015 | @deffn @scmproc{} assemble glil environment . opts | |
1016 | Return a binary representation of @var{glil} (bytecode), either in the | |
1017 | form of an SRFI-4 @code{u8vector} or a @code{<bytespec>} object. | |
1018 | [FIXME: Why is that?] | |
1019 | @end deffn | |
1020 | ||
1021 | ||
b6368dbb | 1022 | |
015959cb | 1023 | @c ********************************************************************* |
b6368dbb LC |
1024 | @node Concept Index, Function and Instruction Index, The Compiler, Top |
1025 | @unnumbered Concept Index | |
1026 | @printindex cp | |
015959cb | 1027 | |
b6368dbb LC |
1028 | @node Function and Instruction Index, Command and Variable Index, Concept Index, Top |
1029 | @unnumbered Function and Instruction Index | |
1030 | @printindex fn | |
015959cb | 1031 | |
b6368dbb LC |
1032 | @node Command and Variable Index, , Function and Instruction Index, Top |
1033 | @unnumbered Command and Variable Index | |
1034 | @printindex vr | |
015959cb KN |
1035 | |
1036 | @bye | |
b6368dbb | 1037 | |
015959cb | 1038 | @c Local Variables: |
b6368dbb | 1039 | @c ispell-local-dictionary: "american"; |
015959cb | 1040 | @c End: |
238e7a11 LC |
1041 | |
1042 | @c LocalWords: bytecode |