Commit | Line | Data |
---|---|---|
8680d53b AW |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
581f410f | 3 | @c Copyright (C) 2008,2009,2010,2013 |
8680d53b AW |
4 | @c Free Software Foundation, Inc. |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @node A Virtual Machine for Guile | |
8 | @section A Virtual Machine for Guile | |
9 | ||
f11871d6 AW |
10 | Guile has both an interpreter and a compiler. To a user, the difference |
11 | is transparent---interpreted and compiled procedures can call each other | |
12 | as they please. | |
090d51ed AW |
13 | |
14 | The difference is that the compiler creates and interprets bytecode | |
15 | for a custom virtual machine, instead of interpreting the | |
98850fd7 AW |
16 | S-expressions directly. Loading and running compiled code is faster |
17 | than loading and running source code. | |
090d51ed AW |
18 | |
19 | The virtual machine that does the bytecode interpretation is a part of | |
20 | Guile itself. This section describes the nature of Guile's virtual | |
21 | machine. | |
22 | ||
8680d53b AW |
23 | @menu |
24 | * Why a VM?:: | |
25 | * VM Concepts:: | |
26 | * Stack Layout:: | |
27 | * Variables and the VM:: | |
00ce5125 | 28 | * VM Programs:: |
8680d53b AW |
29 | * Instruction Set:: |
30 | @end menu | |
31 | ||
32 | @node Why a VM? | |
33 | @subsection Why a VM? | |
34 | ||
86872cc3 | 35 | @cindex interpreter |
f11871d6 AW |
36 | For a long time, Guile only had an interpreter. Guile's interpreter |
37 | operated directly on the S-expression representation of Scheme source | |
38 | code. | |
090d51ed | 39 | |
f11871d6 | 40 | But while the interpreter was highly optimized and hand-tuned, it still |
e3ba263d AW |
41 | performs many needless computations during the course of evaluating an |
42 | expression. For example, application of a function to arguments | |
f11871d6 AW |
43 | needlessly consed up the arguments in a list. Evaluation of an |
44 | expression always had to figure out what the car of the expression is -- | |
45 | a procedure, a memoized form, or something else. All values have to be | |
46 | allocated on the heap. Et cetera. | |
090d51ed | 47 | |
f11871d6 | 48 | The solution to this problem was to compile the higher-level language, |
090d51ed | 49 | Scheme, into a lower-level language for which all of the checks and |
86872cc3 | 50 | dispatching have already been done---the code is instead stripped to |
090d51ed AW |
51 | the bare minimum needed to ``do the job''. |
52 | ||
53 | The question becomes then, what low-level language to choose? There | |
54 | are many options. We could compile to native code directly, but that | |
55 | poses portability problems for Guile, as it is a highly cross-platform | |
56 | project. | |
57 | ||
58 | So we want the performance gains that compilation provides, but we | |
59 | also want to maintain the portability benefits of a single code path. | |
60 | The obvious solution is to compile to a virtual machine that is | |
61 | present on all Guile installations. | |
62 | ||
63 | The easiest (and most fun) way to depend on a virtual machine is to | |
64 | implement the virtual machine within Guile itself. This way the | |
65 | virtual machine provides what Scheme needs (tail calls, multiple | |
86872cc3 AW |
66 | values, @code{call/cc}) and can provide optimized inline instructions |
67 | for Guile (@code{cons}, @code{struct-ref}, etc.). | |
090d51ed AW |
68 | |
69 | So this is what Guile does. The rest of this section describes that VM | |
70 | that Guile implements, and the compiled procedures that run on it. | |
71 | ||
f11871d6 AW |
72 | Before moving on, though, we should note that though we spoke of the |
73 | interpreter in the past tense, Guile still has an interpreter. The | |
74 | difference is that before, it was Guile's main evaluator, and so was | |
75 | implemented in highly optimized C; now, it is actually implemented in | |
76 | Scheme, and compiled down to VM bytecode, just like any other program. | |
77 | (There is still a C interpreter around, used to bootstrap the compiler, | |
78 | but it is not normally used at runtime.) | |
79 | ||
80 | The upside of implementing the interpreter in Scheme is that we preserve | |
81 | tail calls and multiple-value handling between interpreted and compiled | |
82 | code. The downside is that the interpreter in Guile 2.0 is slower than | |
83 | the interpreter in 1.8. We hope the that the compiler's speed makes up | |
84 | for the loss! | |
85 | ||
86 | Also note that this decision to implement a bytecode compiler does not | |
090d51ed AW |
87 | preclude native compilation. We can compile from bytecode to native |
88 | code at runtime, or even do ahead of time compilation. More | |
86872cc3 | 89 | possibilities are discussed in @ref{Extending the Compiler}. |
8680d53b AW |
90 | |
91 | @node VM Concepts | |
92 | @subsection VM Concepts | |
93 | ||
f11871d6 AW |
94 | Compiled code is run by a virtual machine (VM). Each thread has its own |
95 | VM. When a compiled procedure is run, Guile looks up the virtual machine | |
96 | for the current thread and executes the procedure using that VM. | |
8680d53b | 97 | |
86872cc3 | 98 | Guile's virtual machine is a stack machine---that is, it has few |
8680d53b AW |
99 | registers, and the instructions defined in the VM operate by pushing |
100 | and popping values from a stack. | |
101 | ||
102 | Stack memory is exclusive to the virtual machine that owns it. In | |
103 | addition to their stacks, virtual machines also have access to the | |
104 | global memory (modules, global bindings, etc) that is shared among | |
105 | other parts of Guile, including other VMs. | |
106 | ||
107 | A VM has generic instructions, such as those to reference local | |
d22fc3e4 | 108 | variables, and instructions designed to support Guile's languages -- |
8680d53b AW |
109 | mathematical instructions that support the entire numerical tower, an |
110 | inlined implementation of @code{cons}, etc. | |
111 | ||
112 | The registers that a VM has are as follows: | |
113 | ||
114 | @itemize | |
115 | @item ip - Instruction pointer | |
116 | @item sp - Stack pointer | |
117 | @item fp - Frame pointer | |
118 | @end itemize | |
119 | ||
120 | In other architectures, the instruction pointer is sometimes called | |
121 | the ``program counter'' (pc). This set of registers is pretty typical | |
d22fc3e4 | 122 | for stack machines; their exact meanings in the context of Guile's VM |
81fd3152 | 123 | are described in the next section. |
8680d53b | 124 | |
81fd3152 AW |
125 | @c wingo: The following is true, but I don't know in what context to |
126 | @c describe it. A documentation FIXME. | |
8680d53b AW |
127 | |
128 | @c A VM may have one of three engines: reckless, regular, or debugging. | |
129 | @c Reckless engine is fastest but dangerous. Regular engine is normally | |
130 | @c fail-safe and reasonably fast. Debugging engine is safest and | |
131 | @c functional but very slow. | |
132 | ||
81fd3152 AW |
133 | @c (Actually we have just a regular and a debugging engine; normally |
134 | @c we use the latter, it's almost as fast as the ``regular'' engine.) | |
135 | ||
8680d53b AW |
136 | @node Stack Layout |
137 | @subsection Stack Layout | |
138 | ||
139 | While not strictly necessary to understand how to work with the VM, it | |
98850fd7 | 140 | is instructive and sometimes entertaining to consider the structure of |
8680d53b AW |
141 | the VM stack. |
142 | ||
143 | Logically speaking, a VM stack is composed of ``frames''. Each frame | |
144 | corresponds to the application of one compiled procedure, and contains | |
145 | storage space for arguments, local variables, intermediate values, and | |
146 | some bookkeeping information (such as what to do after the frame | |
147 | computes its value). | |
148 | ||
149 | While the compiler is free to do whatever it wants to, as long as the | |
150 | semantics of a computation are preserved, in practice every time you | |
151 | call a function, a new frame is created. (The notable exception of | |
152 | course is the tail call case, @pxref{Tail Calls}.) | |
153 | ||
154 | Within a frame, you have the data associated with the function | |
155 | application itself, which is of a fixed size, and the stack space for | |
156 | intermediate values. Sometimes only the former is referred to as the | |
157 | ``frame'', and the latter is the ``stack'', although all pending | |
158 | application frames can have some intermediate computations interleaved | |
159 | on the stack. | |
160 | ||
161 | The structure of the fixed part of an application frame is as follows: | |
162 | ||
163 | @example | |
164 | Stack | |
8274228f AW |
165 | | ... | |
166 | | Intermed. val. 0 | <- fp + bp->nargs + bp->nlocs = SCM_FRAME_UPPER_ADDRESS (fp) | |
167 | +==================+ | |
168 | | Local variable 1 | | |
8680d53b AW |
169 | | Local variable 0 | <- fp + bp->nargs |
170 | | Argument 1 | | |
171 | | Argument 0 | <- fp | |
172 | | Program | <- fp - 1 | |
8274228f AW |
173 | +------------------+ |
174 | | Return address | | |
175 | | MV return address| | |
176 | | Dynamic link | <- fp - 4 = SCM_FRAME_DATA_ADDRESS (fp) = SCM_FRAME_LOWER_ADDRESS (fp) | |
177 | +==================+ | |
8680d53b AW |
178 | | | |
179 | @end example | |
180 | ||
181 | In the above drawing, the stack grows upward. The intermediate values | |
182 | stored in the application of this frame are stored above | |
183 | @code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the | |
81fd3152 | 184 | @code{struct scm_objcode} data associated with the program at |
8680d53b AW |
185 | @code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the |
186 | compiled procedure, which will be discussed later. | |
187 | ||
188 | The individual fields of the frame are as follows: | |
189 | ||
190 | @table @asis | |
191 | @item Return address | |
192 | The @code{ip} that was in effect before this program was applied. When | |
193 | we return from this activation frame, we will jump back to this | |
194 | @code{ip}. | |
195 | ||
196 | @item MV return address | |
197 | The @code{ip} to return to if this application returns multiple | |
198 | values. For continuations that only accept one value, this value will | |
d22fc3e4 AW |
199 | be @code{NULL}; for others, it will be an @code{ip} that points to a |
200 | multiple-value return address in the calling code. That code will | |
86872cc3 AW |
201 | expect the top value on the stack to be an integer---the number of |
202 | values being returned---and that below that integer there are the | |
d22fc3e4 | 203 | values being returned. |
8680d53b AW |
204 | |
205 | @item Dynamic link | |
206 | This is the @code{fp} in effect before this program was applied. In | |
207 | effect, this and the return address are the registers that are always | |
98850fd7 AW |
208 | ``saved''. The dynamic link links the current frame to the previous |
209 | frame; computing a stack trace involves traversing these frames. | |
8680d53b AW |
210 | |
211 | @item Local variable @var{n} | |
98850fd7 AW |
212 | Lambda-local variables that are all allocated as part of the frame. |
213 | This makes access to variables very cheap. | |
8680d53b AW |
214 | |
215 | @item Argument @var{n} | |
216 | The calling convention of the VM requires arguments of a function | |
98850fd7 AW |
217 | application to be pushed on the stack, and here they are. References |
218 | to arguments dispatch to these locations on the stack. | |
8680d53b AW |
219 | |
220 | @item Program | |
e3ba263d AW |
221 | This is the program being applied. For more information on how |
222 | programs are implemented, @xref{VM Programs}. | |
8680d53b AW |
223 | @end table |
224 | ||
225 | @node Variables and the VM | |
226 | @subsection Variables and the VM | |
227 | ||
81fd3152 | 228 | Consider the following Scheme code as an example: |
8680d53b AW |
229 | |
230 | @example | |
231 | (define (foo a) | |
232 | (lambda (b) (list foo a b))) | |
233 | @end example | |
234 | ||
98850fd7 AW |
235 | Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a |
236 | lexically captured variable, and @code{b} is a local variable. | |
237 | ||
238 | Another way to refer to @code{a} and @code{b} is to say that @code{a} | |
239 | is a ``free'' variable, since it is not defined within the lambda, and | |
240 | @code{b} is a ``bound'' variable. These are the terms used in the | |
241 | @dfn{lambda calculus}, a mathematical notation for describing | |
242 | functions. The lambda calculus is useful because it allows one to | |
243 | prove statements about functions. It is especially good at describing | |
244 | scope relations, and it is for that reason that we mention it here. | |
245 | ||
246 | Guile allocates all variables on the stack. When a lexically enclosed | |
f11871d6 AW |
247 | procedure with free variables---a @dfn{closure}---is created, it copies |
248 | those variables into its free variable vector. References to free | |
98850fd7 AW |
249 | variables are then redirected through the free variable vector. |
250 | ||
251 | If a variable is ever @code{set!}, however, it will need to be | |
252 | heap-allocated instead of stack-allocated, so that different closures | |
253 | that capture the same variable can see the same value. Also, this | |
254 | allows continuations to capture a reference to the variable, instead | |
255 | of to its value at one point in time. For these reasons, @code{set!} | |
256 | variables are allocated in ``boxes''---actually, in variable cells. | |
257 | @xref{Variables}, for more information. References to @code{set!} | |
258 | variables are indirected through the boxes. | |
259 | ||
260 | Thus perhaps counterintuitively, what would seem ``closer to the | |
261 | metal'', viz @code{set!}, actually forces an extra memory allocation | |
262 | and indirection. | |
263 | ||
264 | Going back to our example, @code{b} may be allocated on the stack, as | |
265 | it is never mutated. | |
266 | ||
267 | @code{a} may also be allocated on the stack, as it too is never | |
268 | mutated. Within the enclosed lambda, its value will be copied into | |
269 | (and referenced from) the free variables vector. | |
270 | ||
271 | @code{foo} is a top-level variable, because @code{foo} is not | |
272 | lexically bound in this example. | |
8680d53b | 273 | |
00ce5125 AW |
274 | @node VM Programs |
275 | @subsection Compiled Procedures are VM Programs | |
8680d53b AW |
276 | |
277 | By default, when you enter in expressions at Guile's REPL, they are | |
278 | first compiled to VM object code, then that VM object code is executed | |
279 | to produce a value. If the expression evaluates to a procedure, the | |
280 | result of this process is a compiled procedure. | |
281 | ||
282 | A compiled procedure is a compound object, consisting of its bytecode, | |
283 | a reference to any captured lexical variables, an object array, and | |
284 | some metadata such as the procedure's arity, name, and documentation. | |
285 | You can pick apart these pieces with the accessors in @code{(system vm | |
e3ba263d | 286 | program)}. @xref{Compiled Procedures}, for a full API reference. |
8680d53b | 287 | |
bd7aa35f | 288 | @cindex object table |
81fd3152 | 289 | @cindex object array |
bd7aa35f AW |
290 | The object array of a compiled procedure, also known as the |
291 | @dfn{object table}, holds all Scheme objects whose values are known | |
292 | not to change across invocations of the procedure: constant strings, | |
293 | symbols, etc. The object table of a program is initialized right | |
e3ba263d AW |
294 | before a program is loaded with @code{load-program}. |
295 | @xref{Loading Instructions}, for more information. | |
8680d53b | 296 | |
bd7aa35f AW |
297 | Variable objects are one such type of constant object: when a global |
298 | binding is defined, a variable object is associated to it and that | |
299 | object will remain constant over time, even if the value bound to it | |
300 | changes. Therefore, toplevel bindings only need to be looked up once. | |
301 | Thereafter, references to the corresponding toplevel variables from | |
302 | within the program are then performed via the @code{toplevel-ref} | |
303 | instruction, which uses the object vector, and are almost as fast as | |
304 | local variable references. | |
8680d53b AW |
305 | |
306 | We can see how these concepts tie together by disassembling the | |
81fd3152 | 307 | @code{foo} function we defined earlier to see what is going on: |
8680d53b AW |
308 | |
309 | @smallexample | |
310 | scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b))) | |
311 | scheme@@(guile-user)> ,x foo | |
0a715b9a AW |
312 | 0 (assert-nargs-ee/locals 1) |
313 | 2 (object-ref 1) ;; #<procedure 8ebec20 at <current input>:0:17 (b)> | |
314 | 4 (local-ref 0) ;; `a' | |
315 | 6 (make-closure 0 1) | |
316 | 9 (return) | |
8680d53b AW |
317 | |
318 | ---------------------------------------- | |
0a715b9a AW |
319 | Disassembly of #<procedure 8ebec20 at <current input>:0:17 (b)>: |
320 | ||
321 | 0 (assert-nargs-ee/locals 1) | |
322 | 2 (toplevel-ref 1) ;; `foo' | |
323 | 4 (free-ref 0) ;; (closure variable) | |
324 | 6 (local-ref 0) ;; `b' | |
325 | 8 (list 0 3) ;; 3 elements at (unknown file):0:29 | |
326 | 11 (return) | |
8680d53b AW |
327 | @end smallexample |
328 | ||
136b5494 | 329 | First there's some prelude, where @code{foo} checks that it was called with only |
0a715b9a | 330 | 1 argument. Then at @code{ip} 2, we load up the compiled lambda. @code{Ip} 4 |
136b5494 | 331 | loads up `a', so that it can be captured into a closure by at @code{ip} |
0a715b9a | 332 | 6---binding code (from the compiled lambda) with data (the free-variable |
136b5494 AW |
333 | vector). Finally we return the closure. |
334 | ||
335 | The second stanza disassembles the compiled lambda. After the prelude, we note | |
336 | that toplevel variables are resolved relative to the module that was current | |
337 | when the procedure was created. This lookup occurs lazily, at the first time the | |
338 | variable is actually referenced, and the location of the lookup is cached so | |
acc51c3e | 339 | that future references are very cheap. @xref{Top-Level Environment Instructions}, |
136b5494 AW |
340 | for more details. |
341 | ||
342 | Then we see a reference to a free variable, corresponding to @code{a}. The | |
343 | disassembler doesn't have enough information to give a name to that variable, so | |
344 | it just marks it as being a ``closure variable''. Finally we see the reference | |
345 | to @code{b}, then the @code{list} opcode, an inline implementation of the | |
346 | @code{list} scheme routine. | |
8680d53b AW |
347 | |
348 | @node Instruction Set | |
349 | @subsection Instruction Set | |
350 | ||
acc51c3e | 351 | There are about 180 instructions in Guile's virtual machine. These |
bd7aa35f AW |
352 | instructions represent atomic units of a program's execution. Ideally, |
353 | they perform one task without conditional branches, then dispatch to | |
354 | the next instruction in the stream. | |
355 | ||
356 | Instructions themselves are one byte long. Some instructions take | |
357 | parameters, which follow the instruction byte in the instruction | |
358 | stream. | |
359 | ||
360 | Sometimes the compiler can figure out that it is compiling a special | |
361 | case that can be run more efficiently. So, for example, while Guile | |
362 | offers a generic test-and-branch instruction, it also offers specific | |
363 | instructions for special cases, so that the following cases all have | |
364 | their own test-and-branch instructions: | |
365 | ||
366 | @example | |
367 | (if pred then else) | |
368 | (if (not pred) then else) | |
369 | (if (null? l) then else) | |
370 | (if (not (null? l)) then else) | |
371 | @end example | |
372 | ||
373 | In addition, some Scheme primitives have their own inline | |
679cceed | 374 | implementations, e.g.@: @code{cons}, and @code{list}, as we saw in the |
81fd3152 | 375 | previous section. |
bd7aa35f AW |
376 | |
377 | So Guile's instruction set is a @emph{complete} instruction set, in | |
378 | that it provides the instructions that are suited to the problem, and | |
379 | is not concerned with making a minimal, orthogonal set of | |
380 | instructions. More instructions may be added over time. | |
8680d53b AW |
381 | |
382 | @menu | |
acc51c3e AW |
383 | * Lexical Environment Instructions:: |
384 | * Top-Level Environment Instructions:: | |
385 | * Procedure Call and Return Instructions:: | |
386 | * Function Prologue Instructions:: | |
387 | * Trampoline Instructions:: | |
8680d53b | 388 | * Branch Instructions:: |
acc51c3e | 389 | * Data Constructor Instructions:: |
bd7aa35f | 390 | * Loading Instructions:: |
acc51c3e | 391 | * Dynamic Environment Instructions:: |
8680d53b AW |
392 | * Miscellaneous Instructions:: |
393 | * Inlined Scheme Instructions:: | |
394 | * Inlined Mathematical Instructions:: | |
98850fd7 | 395 | * Inlined Bytevector Instructions:: |
8680d53b AW |
396 | @end menu |
397 | ||
8680d53b | 398 | |
acc51c3e AW |
399 | @node Lexical Environment Instructions |
400 | @subsubsection Lexical Environment Instructions | |
401 | ||
402 | These instructions access and mutate the lexical environment of a | |
403 | compiled procedure---its free and bound variables. | |
8680d53b | 404 | |
98850fd7 AW |
405 | Some of these instructions have @code{long-} variants, the difference |
406 | being that they take 16-bit arguments, encoded in big-endianness, | |
407 | instead of the normal 8-bit range. | |
408 | ||
acc51c3e AW |
409 | @xref{Stack Layout}, for more information on the format of stack frames. |
410 | ||
ca445ba5 | 411 | @deffn Instruction local-ref index |
98850fd7 | 412 | @deffnx Instruction long-local-ref index |
8680d53b | 413 | Push onto the stack the value of the local variable located at |
ca445ba5 | 414 | @var{index} within the current stack frame. |
bd7aa35f AW |
415 | |
416 | Note that arguments and local variables are all in one block. Thus the | |
ca445ba5 | 417 | first argument, if any, is at index 0, and local bindings follow the |
bd7aa35f | 418 | arguments. |
8680d53b AW |
419 | @end deffn |
420 | ||
ca445ba5 | 421 | @deffn Instruction local-set index |
acc51c3e | 422 | @deffnx Instruction long-local-set index |
8680d53b | 423 | Pop the Scheme object located on top of the stack and make it the new |
ca445ba5 | 424 | value of the local variable located at @var{index} within the current |
8680d53b AW |
425 | stack frame. |
426 | @end deffn | |
427 | ||
acc51c3e AW |
428 | @deffn Instruction box index |
429 | Pop a value off the stack, and set the @var{index}nth local variable | |
430 | to a box containing that value. A shortcut for @code{make-variable} | |
431 | then @code{local-set}, used when binding boxed variables. | |
432 | @end deffn | |
433 | ||
434 | @deffn Instruction empty-box index | |
64de6db5 | 435 | Set the @var{index}th local variable to a box containing a variable |
acc51c3e AW |
436 | whose value is unbound. Used when compiling some @code{letrec} |
437 | expressions. | |
438 | @end deffn | |
439 | ||
440 | @deffn Instruction local-boxed-ref index | |
3248c954 | 441 | @deffnx Instruction local-boxed-set index |
acc51c3e AW |
442 | Get or set the value of the variable located at @var{index} within the |
443 | current stack frame. A shortcut for @code{local-ref} then | |
444 | @code{variable-ref} or @code{variable-set}, respectively. | |
445 | @end deffn | |
446 | ||
98850fd7 AW |
447 | @deffn Instruction free-ref index |
448 | Push the value of the captured variable located at position | |
449 | @var{index} within the program's vector of captured variables. | |
8680d53b AW |
450 | @end deffn |
451 | ||
98850fd7 AW |
452 | @deffn Instruction free-boxed-ref index |
453 | @deffnx Instruction free-boxed-set index | |
acc51c3e AW |
454 | Get or set a boxed free variable. A shortcut for @code{free-ref} then |
455 | @code{variable-ref} or @code{variable-set}, respectively. | |
98850fd7 | 456 | |
acc51c3e AW |
457 | Note that there is no @code{free-set} instruction, as variables that are |
458 | @code{set!} must be boxed. | |
8680d53b AW |
459 | @end deffn |
460 | ||
acc51c3e AW |
461 | @deffn Instruction make-closure num-free-vars |
462 | Pop @var{num-free-vars} values and a program object off the stack in | |
463 | that order, and push a new program object closing over the given free | |
464 | variables. @var{num-free-vars} is encoded as a two-byte big-endian | |
465 | value. | |
98850fd7 | 466 | |
acc51c3e AW |
467 | The free variables are stored in an array, inline to the new program |
468 | object, in the order that they were on the stack (not the order they are | |
469 | popped off). The new closure shares state with the original program. At | |
470 | the time of this writing, the space overhead of closures is 3 words, | |
471 | plus one word for each free variable. | |
8680d53b AW |
472 | @end deffn |
473 | ||
98850fd7 | 474 | @deffn Instruction fix-closure index |
acc51c3e AW |
475 | Fix up the free variables array of the closure stored in the |
476 | @var{index}th local variable. @var{index} is a two-byte big-endian | |
477 | integer. | |
98850fd7 | 478 | |
acc51c3e AW |
479 | This instruction will pop as many values from the stack as are in the |
480 | corresponding closure's free variables array. The topmost value on the | |
481 | stack will be stored as the closure's last free variable, with other | |
482 | values filling in free variable slots in order. | |
98850fd7 | 483 | |
acc51c3e AW |
484 | @code{fix-closure} is part of a hack for allocating mutually recursive |
485 | procedures. The hack is to store the procedures in their corresponding | |
486 | local variable slots, with space already allocated for free variables. | |
487 | Then once they are all in place, this instruction fixes up their | |
488 | procedures' free variable bindings in place. This allows most | |
489 | @code{letrec}-bound procedures to be allocated unboxed on the stack. | |
98850fd7 AW |
490 | @end deffn |
491 | ||
acc51c3e AW |
492 | @deffn Instruction local-bound? index |
493 | @deffnx Instruction long-local-bound? index | |
494 | Push @code{#t} on the stack if the @code{index}th local variable has | |
495 | been assigned, or @code{#f} otherwise. Mostly useful for handling | |
496 | optional arguments in procedure prologues. | |
98850fd7 AW |
497 | @end deffn |
498 | ||
acc51c3e AW |
499 | |
500 | @node Top-Level Environment Instructions | |
501 | @subsubsection Top-Level Environment Instructions | |
502 | ||
503 | These instructions access values in the top-level environment: bindings | |
504 | that were not lexically apparent at the time that the code in question | |
505 | was compiled. | |
506 | ||
507 | The location in which a toplevel binding is stored can be looked up once | |
508 | and cached for later. The binding itself may change over time, but its | |
509 | location will stay constant. | |
510 | ||
511 | Currently only toplevel references within procedures are cached, as only | |
512 | procedures have a place to cache them, in their object tables. | |
bd7aa35f | 513 | |
ca445ba5 | 514 | @deffn Instruction toplevel-ref index |
a9b0f876 | 515 | @deffnx Instruction long-toplevel-ref index |
bd7aa35f | 516 | Push the value of the toplevel binding whose location is stored in at |
acc51c3e AW |
517 | position @var{index} in the current procedure's object table. The |
518 | @code{long-} variant encodes the index over two bytes. | |
bd7aa35f | 519 | |
acc51c3e AW |
520 | Initially, a cell in a procedure's object table that is used by |
521 | @code{toplevel-ref} is initialized to one of two forms. The normal case | |
522 | is that the cell holds a symbol, whose binding will be looked up | |
bd7aa35f AW |
523 | relative to the module that was current when the current program was |
524 | created. | |
525 | ||
526 | Alternately, the lookup may be performed relative to a particular | |
679cceed | 527 | module, determined at compile-time (e.g.@: via @code{@@} or |
bd7aa35f | 528 | @code{@@@@}). In that case, the cell in the object table holds a list: |
81fd3152 AW |
529 | @code{(@var{modname} @var{sym} @var{public?})}. The symbol @var{sym} |
530 | will be looked up in the module named @var{modname} (a list of | |
531 | symbols). The lookup will be performed against the module's public | |
532 | interface, unless @var{public?} is @code{#f}, which it is for example | |
533 | when compiling @code{@@@@}. | |
bd7aa35f AW |
534 | |
535 | In any case, if the symbol is unbound, an error is signalled. | |
536 | Otherwise the initial form is replaced with the looked-up variable, an | |
537 | in-place mutation of the object table. This mechanism provides for | |
538 | lazy variable resolution, and an important cached fast-path once the | |
539 | variable has been successfully resolved. | |
540 | ||
541 | This instruction pushes the value of the variable onto the stack. | |
8680d53b AW |
542 | @end deffn |
543 | ||
a9b0f876 AW |
544 | @deffn Instruction toplevel-set index |
545 | @deffnx Instruction long-toplevel-set index | |
bd7aa35f | 546 | Pop a value off the stack, and set it as the value of the toplevel |
ca445ba5 | 547 | variable stored at @var{index} in the object table. If the variable |
bd7aa35f | 548 | has not yet been looked up, we do the lookup as in |
98850fd7 AW |
549 | @code{toplevel-ref}. |
550 | @end deffn | |
551 | ||
552 | @deffn Instruction define | |
553 | Pop a symbol and a value from the stack, in that order. Look up its | |
554 | binding in the current toplevel environment, creating the binding if | |
555 | necessary. Set the variable to the value. | |
8680d53b AW |
556 | @end deffn |
557 | ||
bd7aa35f AW |
558 | @deffn Instruction link-now |
559 | Pop a value, @var{x}, from the stack. Look up the binding for @var{x}, | |
560 | according to the rules for @code{toplevel-ref}, and push that variable | |
561 | on the stack. If the lookup fails, an error will be signalled. | |
562 | ||
563 | This instruction is mostly used when loading programs, because it can | |
acc51c3e | 564 | do toplevel variable lookups without an object table. |
bd7aa35f AW |
565 | @end deffn |
566 | ||
567 | @deffn Instruction variable-ref | |
568 | Dereference the variable object which is on top of the stack and | |
569 | replace it by the value of the variable it represents. | |
570 | @end deffn | |
571 | ||
572 | @deffn Instruction variable-set | |
573 | Pop off two objects from the stack, a variable and a value, and set | |
574 | the variable to the value. | |
575 | @end deffn | |
576 | ||
acc51c3e AW |
577 | @deffn Instruction variable-bound? |
578 | Pop off the variable object from top of the stack and push @code{#t} if | |
579 | it is bound, or @code{#f} otherwise. Mostly useful in procedure | |
580 | prologues for defining default values for boxed optional variables. | |
581 | @end deffn | |
582 | ||
98850fd7 AW |
583 | @deffn Instruction make-variable |
584 | Replace the top object on the stack with a variable containing it. | |
585 | Used in some circumstances when compiling @code{letrec} expressions. | |
586 | @end deffn | |
587 | ||
81fd3152 | 588 | |
acc51c3e AW |
589 | @node Procedure Call and Return Instructions |
590 | @subsubsection Procedure Call and Return Instructions | |
8680d53b | 591 | |
acc51c3e | 592 | @c something about the calling convention here? |
8680d53b | 593 | |
acc51c3e | 594 | @deffn Instruction new-frame |
8274228f AW |
595 | Push a new frame on the stack, reserving space for the dynamic link, |
596 | return address, and the multiple-values return address. The frame | |
597 | pointer is not yet updated, because the frame is not yet active -- it | |
598 | has to be patched by a @code{call} instruction to get the return | |
599 | address. | |
8680d53b AW |
600 | @end deffn |
601 | ||
602 | @deffn Instruction call nargs | |
bd7aa35f | 603 | Call the procedure located at @code{sp[-nargs]} with the @var{nargs} |
81fd3152 AW |
604 | arguments located from @code{sp[-nargs + 1]} to @code{sp[0]}. |
605 | ||
acc51c3e AW |
606 | This instruction requires that a new frame be pushed on the stack before |
607 | the procedure, via @code{new-frame}. @xref{Stack Layout}, for more | |
608 | information. It patches up that frame with the current @code{ip} as the | |
609 | return address, then dispatches to the first instruction in the called | |
610 | procedure, relying on the called procedure to return one value to the | |
611 | newly-created continuation. Because the new frame pointer will point to | |
612 | @code{sp[-nargs + 1]}, the arguments don't have to be shuffled around -- | |
613 | they are already in place. | |
8680d53b AW |
614 | @end deffn |
615 | ||
a5bbb22e | 616 | @deffn Instruction tail-call nargs |
acc51c3e AW |
617 | Transfer control to the procedure located at @code{sp[-nargs]} with the |
618 | @var{nargs} arguments located from @code{sp[-nargs + 1]} to | |
619 | @code{sp[0]}. | |
8680d53b | 620 | |
acc51c3e AW |
621 | Unlike @code{call}, which requires a new frame to be pushed onto the |
622 | stack, @code{tail-call} simply shuffles down the procedure and arguments | |
623 | to the current stack frame. This instruction implements tail calls as | |
624 | required by RnRS. | |
8680d53b | 625 | @end deffn |
bd7aa35f AW |
626 | |
627 | @deffn Instruction apply nargs | |
a5bbb22e AW |
628 | @deffnx Instruction tail-apply nargs |
629 | Like @code{call} and @code{tail-call}, except that the top item on the | |
bd7aa35f AW |
630 | stack must be a list. The elements of that list are then pushed on the |
631 | stack and treated as additional arguments, replacing the list itself, | |
632 | then the procedure is invoked as usual. | |
8680d53b | 633 | @end deffn |
bd7aa35f AW |
634 | |
635 | @deffn Instruction call/nargs | |
a5bbb22e AW |
636 | @deffnx Instruction tail-call/nargs |
637 | These are like @code{call} and @code{tail-call}, except they take the | |
bd7aa35f AW |
638 | number of arguments from the stack instead of the instruction stream. |
639 | These instructions are used in the implementation of multiple value | |
640 | returns, where the actual number of values is pushed on the stack. | |
8680d53b AW |
641 | @end deffn |
642 | ||
bd7aa35f AW |
643 | @deffn Instruction mv-call nargs offset |
644 | Like @code{call}, except that a multiple-value continuation is created | |
645 | in addition to a single-value continuation. | |
646 | ||
acc51c3e AW |
647 | The offset (a three-byte value) is an offset within the instruction |
648 | stream; the multiple-value return address in the new frame (@pxref{Stack | |
649 | Layout}) will be set to the normal return address plus this offset. | |
650 | Instructions at that offset will expect the top value of the stack to be | |
651 | the number of values, and below that values themselves, pushed | |
652 | separately. | |
8680d53b | 653 | @end deffn |
bd7aa35f | 654 | |
8274228f AW |
655 | @deffn Instruction return |
656 | Free the program's frame, returning the top value from the stack to | |
657 | the current continuation. (The stack should have exactly one value on | |
658 | it.) | |
659 | ||
660 | Specifically, the @code{sp} is decremented to one below the current | |
661 | @code{fp}, the @code{ip} is reset to the current return address, the | |
662 | @code{fp} is reset to the value of the current dynamic link, and then | |
acc51c3e | 663 | the returned value is pushed on the stack. |
8274228f AW |
664 | @end deffn |
665 | ||
bd7aa35f | 666 | @deffn Instruction return/values nvalues |
acc51c3e AW |
667 | @deffnx Instruction return/nvalues |
668 | Return the top @var{nvalues} to the current continuation. In the case of | |
669 | @code{return/nvalues}, @var{nvalues} itself is first popped from the top | |
670 | of the stack. | |
bd7aa35f AW |
671 | |
672 | If the current continuation is a multiple-value continuation, | |
673 | @code{return/values} pushes the number of values on the stack, then | |
674 | returns as in @code{return}, but to the multiple-value return address. | |
675 | ||
679cceed | 676 | Otherwise if the current continuation accepts only one value, i.e.@: the |
bd7aa35f AW |
677 | multiple-value return address is @code{NULL}, then we assume the user |
678 | only wants one value, and we give them the first one. If there are no | |
679 | values, an error is signaled. | |
8680d53b | 680 | @end deffn |
bd7aa35f AW |
681 | |
682 | @deffn Instruction return/values* nvalues | |
683 | Like a combination of @code{apply} and @code{return/values}, in which | |
684 | the top value on the stack is interpreted as a list of additional | |
685 | values. This is an optimization for the common @code{(apply values | |
686 | ...)} case. | |
8680d53b AW |
687 | @end deffn |
688 | ||
bd7aa35f AW |
689 | @deffn Instruction truncate-values nbinds nrest |
690 | Used in multiple-value continuations, this instruction takes the | |
81fd3152 | 691 | values that are on the stack (including the number-of-values marker) |
bd7aa35f AW |
692 | and truncates them for a binding construct. |
693 | ||
694 | For example, a call to @code{(receive (x y . z) (foo) ...)} would, | |
695 | logically speaking, pop off the values returned from @code{(foo)} and | |
696 | push them as three values, corresponding to @code{x}, @code{y}, and | |
697 | @code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would | |
81fd3152 | 698 | be 1 (to indicate that one of the bindings was a rest argument). |
bd7aa35f AW |
699 | |
700 | Signals an error if there is an insufficient number of values. | |
701 | @end deffn | |
8680d53b | 702 | |
8274228f | 703 | @deffn Instruction call/cc |
a5bbb22e | 704 | @deffnx Instruction tail-call/cc |
8274228f AW |
705 | Capture the current continuation, and then call (or tail-call) the |
706 | procedure on the top of the stack, with the continuation as the | |
707 | argument. | |
708 | ||
709 | @code{call/cc} does not require a @code{new-frame} to be pushed on the | |
710 | stack, as @code{call} does, because it needs to capture the stack | |
711 | before the frame is pushed. | |
712 | ||
713 | Both the VM continuation and the C continuation are captured. | |
714 | @end deffn | |
715 | ||
8680d53b | 716 | |
acc51c3e AW |
717 | @node Function Prologue Instructions |
718 | @subsubsection Function Prologue Instructions | |
719 | ||
720 | A function call in Guile is very cheap: the VM simply hands control to | |
721 | the procedure. The procedure itself is responsible for asserting that it | |
722 | has been passed an appropriate number of arguments. This strategy allows | |
723 | arbitrarily complex argument parsing idioms to be developed, without | |
724 | harming the common case. | |
725 | ||
726 | For example, only calls to keyword-argument procedures ``pay'' for the | |
727 | cost of parsing keyword arguments. (At the time of this writing, calling | |
728 | procedures with keyword arguments is typically two to four times as | |
729 | costly as calling procedures with a fixed set of arguments.) | |
730 | ||
731 | @deffn Instruction assert-nargs-ee n | |
732 | @deffnx Instruction assert-nargs-ge n | |
733 | Assert that the current procedure has been passed exactly @var{n} | |
734 | arguments, for the @code{-ee} case, or @var{n} or more arguments, for | |
735 | the @code{-ge} case. @var{n} is encoded over two bytes. | |
736 | ||
737 | The number of arguments is determined by subtracting the frame pointer | |
738 | from the stack pointer (@code{sp - (fp -1)}). @xref{Stack Layout}, for | |
739 | more details on stack frames. | |
740 | @end deffn | |
741 | ||
742 | @deffn Instruction br-if-nargs-ne n offset | |
743 | @deffnx Instruction br-if-nargs-gt n offset | |
744 | @deffnx Instruction br-if-nargs-lt n offset | |
745 | Jump to @var{offset} if the number of arguments is not equal to, greater | |
746 | than, or less than @var{n}. @var{n} is encoded over two bytes, and | |
747 | @var{offset} has the normal three-byte encoding. | |
748 | ||
ecb87335 | 749 | These instructions are used to implement multiple arities, as in |
acc51c3e AW |
750 | @code{case-lambda}. @xref{Case-lambda}, for more information. |
751 | @end deffn | |
752 | ||
753 | @deffn Instruction bind-optionals n | |
754 | If the procedure has been called with fewer than @var{n} arguments, fill | |
755 | in the remaining arguments with an unbound value (@code{SCM_UNDEFINED}). | |
756 | @var{n} is encoded over two bytes. | |
757 | ||
758 | The optionals can be later initialized conditionally via the | |
759 | @code{local-bound?} instruction. | |
760 | @end deffn | |
761 | ||
762 | @deffn Instruction push-rest n | |
763 | Pop off excess arguments (more than @var{n}), collecting them into a | |
764 | list, and push that list. Used to bind a rest argument, if the procedure | |
765 | has no keyword arguments. Procedures with keyword arguments use | |
766 | @code{bind-rest} instead. | |
767 | @end deffn | |
768 | ||
769 | @deffn Instruction bind-rest n idx | |
770 | Pop off excess arguments (more than @var{n}), collecting them into a | |
771 | list. The list is then assigned to the @var{idx}th local variable. | |
772 | @end deffn | |
773 | ||
774 | @deffn Instruction bind-optionals/shuffle nreq nreq-and-opt ntotal | |
581f410f | 775 | @deffnx Instruction bind-optionals/shuffle-or-br nreq nreq-and-opt ntotal offset |
acc51c3e AW |
776 | Shuffle keyword arguments to the top of the stack, filling in the holes |
777 | with @code{SCM_UNDEFINED}. Each argument is encoded over two bytes. | |
778 | ||
779 | This instruction is used by procedures with keyword arguments. | |
780 | @var{nreq} is the number of required arguments to the procedure, and | |
781 | @var{nreq-and-opt} is the total number of positional arguments (required | |
782 | plus optional). @code{bind-optionals/shuffle} will scan the stack from | |
783 | the @var{nreq}th argument up to the @var{nreq-and-opt}th, and start | |
784 | shuffling when it sees the first keyword argument or runs out of | |
785 | positional arguments. | |
786 | ||
581f410f AW |
787 | @code{bind-optionals/shuffle-or-br} does the same, except that it checks |
788 | if there are too many positional arguments before shuffling. If this is | |
789 | the case, it jumps to @var{offset}, encoded using the normal three-byte | |
790 | encoding. | |
791 | ||
acc51c3e AW |
792 | Shuffling simply moves the keyword arguments past the total number of |
793 | arguments, @var{ntotal}, which includes keyword and rest arguments. The | |
794 | free slots created by the shuffle are filled in with | |
795 | @code{SCM_UNDEFINED}, so they may be conditionally initialized later in | |
796 | the function's prologue. | |
797 | @end deffn | |
798 | ||
799 | @deffn Instruction bind-kwargs idx ntotal flags | |
800 | Parse keyword arguments, assigning their values to the corresponding | |
801 | local variables. The keyword arguments should already have been shuffled | |
802 | above the @var{ntotal}th stack slot by @code{bind-optionals/shuffle}. | |
803 | ||
804 | The parsing is driven by a keyword arguments association list, looked up | |
805 | from the @var{idx}th element of the procedures object array. The alist | |
806 | is a list of pairs of the form @code{(@var{kw} . @var{index})}, mapping | |
807 | keyword arguments to their local variable indices. | |
808 | ||
809 | There are two bitflags that affect the parser, @code{allow-other-keys?} | |
810 | (@code{0x1}) and @code{rest?} (@code{0x2}). Unless | |
811 | @code{allow-other-keys?} is set, the parser will signal an error if an | |
ecb87335 | 812 | unknown key is found. If @code{rest?} is set, errors parsing the |
acc51c3e AW |
813 | keyword arguments will be ignored, as a later @code{bind-rest} |
814 | instruction will collect all of the tail arguments, including the | |
815 | keywords, into a list. Otherwise if the keyword arguments are invalid, | |
816 | an error is signalled. | |
817 | ||
818 | @var{idx} and @var{ntotal} are encoded over two bytes each, and | |
819 | @var{flags} is encoded over one byte. | |
820 | @end deffn | |
821 | ||
822 | @deffn Instruction reserve-locals n | |
823 | Resets the stack pointer to have space for @var{n} local variables, | |
824 | including the arguments. If this operation increments the stack pointer, | |
825 | as in a push, the new slots are filled with @code{SCM_UNBOUND}. If this | |
826 | operation decrements the stack pointer, any excess values are dropped. | |
827 | ||
828 | @code{reserve-locals} is typically used after argument parsing to | |
829 | reserve space for local variables. | |
830 | @end deffn | |
831 | ||
de45d8ee AW |
832 | @deffn Instruction assert-nargs-ee/locals n |
833 | @deffnx Instruction assert-nargs-ge/locals n | |
834 | A combination of @code{assert-nargs-ee} and @code{reserve-locals}. The | |
835 | number of arguments is encoded in the lower three bits of @var{n}, a | |
836 | one-byte value. The number of additional local variables is take from | |
837 | the upper 5 bits of @var{n}. | |
838 | @end deffn | |
839 | ||
acc51c3e AW |
840 | |
841 | @node Trampoline Instructions | |
842 | @subsubsection Trampoline Instructions | |
843 | ||
844 | Though most applicable objects in Guile are procedures implemented | |
845 | in bytecode, not all are. There are primitives, continuations, and other | |
846 | procedure-like objects that have their own calling convention. Instead | |
847 | of adding special cases to the @code{call} instruction, Guile wraps | |
848 | these other applicable objects in VM trampoline procedures, then | |
849 | provides special support for these objects in bytecode. | |
850 | ||
851 | Trampoline procedures are typically generated by Guile at runtime, for | |
852 | example in response to a call to @code{scm_c_make_gsubr}. As such, a | |
853 | compiler probably shouldn't emit code with these instructions. However, | |
854 | it's still interesting to know how these things work, so we document | |
855 | these trampoline instructions here. | |
856 | ||
857 | @deffn Instruction subr-call nargs | |
858 | Pop off a foreign pointer (which should have been pushed on by the | |
859 | trampoline), and call it directly, with the @var{nargs} arguments from | |
860 | the stack. Return the resulting value or values to the calling | |
861 | procedure. | |
862 | @end deffn | |
863 | ||
864 | @deffn Instruction foreign-call nargs | |
865 | Pop off an internal foreign object (which should have been pushed on by | |
866 | the trampoline), and call that foreign function with the @var{nargs} | |
867 | arguments from the stack. Return the resulting value to the calling | |
868 | procedure. | |
acc51c3e AW |
869 | @end deffn |
870 | ||
871 | @deffn Instruction continuation-call | |
872 | Pop off an internal continuation object (which should have been pushed | |
873 | on by the trampoline), and reinstate that continuation. All of the | |
874 | procedure's arguments are passed to the continuation. Does not return. | |
875 | @end deffn | |
876 | ||
877 | @deffn Instruction partial-cont-call | |
878 | Pop off two objects from the stack: the dynamic winds associated with | |
879 | the partial continuation, and the VM continuation object. Unroll the | |
880 | continuation onto the stack, rewinding the dynamic environment and | |
881 | overwriting the current frame, and pass all arguments to the | |
882 | continuation. Control flow proceeds where the continuation was captured. | |
883 | @end deffn | |
884 | ||
885 | ||
886 | @node Branch Instructions | |
887 | @subsubsection Branch Instructions | |
888 | ||
889 | All the conditional branch instructions described below work in the | |
890 | same way: | |
891 | ||
892 | @itemize | |
893 | @item They pop off Scheme object(s) located on the stack for use in the | |
894 | branch condition | |
895 | @item If the condition is true, then the instruction pointer is | |
896 | increased by the offset passed as an argument to the branch | |
897 | instruction; | |
898 | @item Program execution proceeds with the next instruction (that is, | |
899 | the one to which the instruction pointer points). | |
900 | @end itemize | |
901 | ||
902 | Note that the offset passed to the instruction is encoded as three 8-bit | |
903 | integers, in big-endian order, effectively giving Guile a 24-bit | |
904 | relative address space. | |
905 | ||
906 | @deffn Instruction br offset | |
907 | Jump to @var{offset}. No values are popped. | |
908 | @end deffn | |
909 | ||
910 | @deffn Instruction br-if offset | |
911 | Jump to @var{offset} if the object on the stack is not false. | |
912 | @end deffn | |
913 | ||
914 | @deffn Instruction br-if-not offset | |
915 | Jump to @var{offset} if the object on the stack is false. | |
916 | @end deffn | |
917 | ||
918 | @deffn Instruction br-if-eq offset | |
919 | Jump to @var{offset} if the two objects located on the stack are | |
64de6db5 | 920 | equal in the sense of @code{eq?}. Note that, for this instruction, the |
acc51c3e AW |
921 | stack pointer is decremented by two Scheme objects instead of only |
922 | one. | |
923 | @end deffn | |
924 | ||
925 | @deffn Instruction br-if-not-eq offset | |
64de6db5 | 926 | Same as @code{br-if-eq} for non-@code{eq?} objects. |
acc51c3e AW |
927 | @end deffn |
928 | ||
929 | @deffn Instruction br-if-null offset | |
930 | Jump to @var{offset} if the object on the stack is @code{'()}. | |
931 | @end deffn | |
932 | ||
933 | @deffn Instruction br-if-not-null offset | |
934 | Jump to @var{offset} if the object on the stack is not @code{'()}. | |
935 | @end deffn | |
936 | ||
937 | ||
938 | @node Data Constructor Instructions | |
939 | @subsubsection Data Constructor Instructions | |
940 | ||
941 | These instructions push simple immediate values onto the stack, | |
ecb87335 | 942 | or construct compound data structures from values on the stack. |
bd7aa35f | 943 | |
8680d53b AW |
944 | @deffn Instruction make-int8 value |
945 | Push @var{value}, an 8-bit integer, onto the stack. | |
946 | @end deffn | |
947 | ||
948 | @deffn Instruction make-int8:0 | |
949 | Push the immediate value @code{0} onto the stack. | |
950 | @end deffn | |
951 | ||
952 | @deffn Instruction make-int8:1 | |
953 | Push the immediate value @code{1} onto the stack. | |
954 | @end deffn | |
955 | ||
956 | @deffn Instruction make-int16 value | |
957 | Push @var{value}, a 16-bit integer, onto the stack. | |
958 | @end deffn | |
959 | ||
586cfdec AW |
960 | @deffn Instruction make-uint64 value |
961 | Push @var{value}, an unsigned 64-bit integer, onto the stack. The | |
962 | value is encoded in 8 bytes, most significant byte first (big-endian). | |
963 | @end deffn | |
964 | ||
965 | @deffn Instruction make-int64 value | |
966 | Push @var{value}, a signed 64-bit integer, onto the stack. The value | |
967 | is encoded in 8 bytes, most significant byte first (big-endian), in | |
968 | twos-complement arithmetic. | |
969 | @end deffn | |
970 | ||
8680d53b AW |
971 | @deffn Instruction make-false |
972 | Push @code{#f} onto the stack. | |
973 | @end deffn | |
974 | ||
975 | @deffn Instruction make-true | |
976 | Push @code{#t} onto the stack. | |
977 | @end deffn | |
978 | ||
4530432e | 979 | @deffn Instruction make-nil |
92a61010 | 980 | Push @code{#nil} onto the stack. |
4530432e DK |
981 | @end deffn |
982 | ||
8680d53b AW |
983 | @deffn Instruction make-eol |
984 | Push @code{'()} onto the stack. | |
985 | @end deffn | |
986 | ||
987 | @deffn Instruction make-char8 value | |
988 | Push @var{value}, an 8-bit character, onto the stack. | |
989 | @end deffn | |
990 | ||
98850fd7 AW |
991 | @deffn Instruction make-char32 value |
992 | Push @var{value}, an 32-bit character, onto the stack. The value is | |
993 | encoded in big-endian order. | |
994 | @end deffn | |
995 | ||
996 | @deffn Instruction make-symbol | |
997 | Pops a string off the stack, and pushes a symbol. | |
998 | @end deffn | |
999 | ||
1000 | @deffn Instruction make-keyword value | |
1001 | Pops a symbol off the stack, and pushes a keyword. | |
1002 | @end deffn | |
1003 | ||
8680d53b AW |
1004 | @deffn Instruction list n |
1005 | Pops off the top @var{n} values off of the stack, consing them up into | |
1006 | a list, then pushes that list on the stack. What was the topmost value | |
81fd3152 AW |
1007 | will be the last element in the list. @var{n} is a two-byte value, |
1008 | most significant byte first. | |
8680d53b AW |
1009 | @end deffn |
1010 | ||
1011 | @deffn Instruction vector n | |
1012 | Create and fill a vector with the top @var{n} values from the stack, | |
81fd3152 AW |
1013 | popping off those values and pushing on the resulting vector. @var{n} |
1014 | is a two-byte value, like in @code{vector}. | |
8680d53b AW |
1015 | @end deffn |
1016 | ||
acc51c3e AW |
1017 | @deffn Instruction make-struct n |
1018 | Make a new struct from the top @var{n} values on the stack. The values | |
1019 | are popped, and the new struct is pushed. | |
1020 | ||
1021 | The deepest value is used as the vtable for the struct, and the rest are | |
1022 | used in order as the field initializers. Tail arrays are not supported | |
1023 | by this instruction. | |
1024 | @end deffn | |
1025 | ||
1026 | @deffn Instruction make-array n | |
1027 | Pop an array shape from the stack, then pop the remaining @var{n} | |
1028 | values, pushing a new array. @var{n} is encoded over three bytes. | |
1029 | ||
1030 | The array shape should be appropriate to store @var{n} values. | |
1031 | @xref{Array Procedures}, for more information on array shapes. | |
1032 | @end deffn | |
1033 | ||
1034 | Many of these data structures are constant, never changing over the | |
1035 | course of the different invocations of the procedure. In that case it is | |
1036 | often advantageous to make them once when the procedure is created, and | |
1037 | just reference them from the object table thereafter. @xref{Variables | |
1038 | and the VM}, for more information on the object table. | |
1039 | ||
1040 | @deffn Instruction object-ref n | |
1041 | @deffnx Instruction long-object-ref n | |
1042 | Push @var{n}th value from the current program's object vector. The | |
1043 | ``long'' variant has a 16-bit index instead of an 8-bit index. | |
1044 | @end deffn | |
1045 | ||
1046 | ||
1047 | @node Loading Instructions | |
1048 | @subsubsection Loading Instructions | |
1049 | ||
1050 | In addition to VM instructions, an instruction stream may contain | |
1051 | variable-length data embedded within it. This data is always preceded | |
1052 | by special loading instructions, which interpret the data and advance | |
1053 | the instruction pointer to the next VM instruction. | |
1054 | ||
1055 | All of these loading instructions have a @code{length} parameter, | |
1056 | indicating the size of the embedded data, in bytes. The length itself | |
1057 | is encoded in 3 bytes. | |
1058 | ||
1059 | @deffn Instruction load-number length | |
1060 | Load an arbitrary number from the instruction stream. The number is | |
1061 | embedded in the stream as a string. | |
1062 | @end deffn | |
1063 | @deffn Instruction load-string length | |
1064 | Load a string from the instruction stream. The string is assumed to be | |
080a9d4f | 1065 | encoded in the ``latin1'' locale. |
acc51c3e AW |
1066 | @end deffn |
1067 | @deffn Instruction load-wide-string length | |
1068 | Load a UTF-32 string from the instruction stream. @var{length} is the | |
ecb87335 | 1069 | length in bytes, not in codepoints. |
acc51c3e AW |
1070 | @end deffn |
1071 | @deffn Instruction load-symbol length | |
1072 | Load a symbol from the instruction stream. The symbol is assumed to be | |
080a9d4f | 1073 | encoded in the ``latin1'' locale. Symbols backed by wide strings may |
acc51c3e AW |
1074 | be loaded via @code{load-wide-string} then @code{make-symbol}. |
1075 | @end deffn | |
1076 | @deffn Instruction load-array length | |
1077 | Load a uniform array from the instruction stream. The shape and type | |
1078 | of the array are popped off the stack, in that order. | |
1079 | @end deffn | |
1080 | ||
1081 | @deffn Instruction load-program | |
1082 | Load bytecode from the instruction stream, and push a compiled | |
1083 | procedure. | |
1084 | ||
1085 | This instruction pops one value from the stack: the program's object | |
1086 | table, as a vector, or @code{#f} in the case that the program has no | |
1087 | object table. A program that does not reference toplevel bindings and | |
1088 | does not use @code{object-ref} does not need an object table. | |
1089 | ||
1090 | This instruction is unlike the rest of the loading instructions, | |
1091 | because instead of parsing its data, it directly maps the instruction | |
1092 | stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode | |
1093 | and Objcode}, for more information. | |
1094 | ||
1095 | The resulting compiled procedure will not have any free variables | |
1096 | captured, so it may be loaded only once but used many times to create | |
1097 | closures. | |
1098 | @end deffn | |
1099 | ||
1100 | @node Dynamic Environment Instructions | |
1101 | @subsubsection Dynamic Environment Instructions | |
1102 | ||
1103 | Guile's virtual machine has low-level support for @code{dynamic-wind}, | |
1104 | dynamic binding, and composable prompts and aborts. | |
1105 | ||
1106 | @deffn Instruction wind | |
1107 | Pop an unwind thunk and a wind thunk from the stack, in that order, and | |
1108 | push them onto the ``dynamic stack''. The unwind thunk will be called on | |
1109 | nonlocal exits, and the wind thunk on reentries. Used to implement | |
1110 | @code{dynamic-wind}. | |
1111 | ||
1112 | Note that neither thunk is actually called; the compiler should emit | |
1113 | calls to wind and unwind for the normal dynamic-wind control flow. | |
1114 | @xref{Dynamic Wind}. | |
1115 | @end deffn | |
1116 | ||
1117 | @deffn Instruction unwind | |
1118 | Pop off the top entry from the ``dynamic stack'', for example, a | |
1119 | wind/unwind thunk pair. @code{unwind} instructions should be properly | |
1120 | paired with their winding instructions, like @code{wind}. | |
1121 | @end deffn | |
1122 | ||
1123 | @deffn Instruction wind-fluids n | |
1124 | Pop off @var{n} values and @var{n} fluids from the stack, in that order. | |
1125 | Set the fluids to the values by creating a with-fluids object and | |
1126 | pushing that object on the dynamic stack. @xref{Fluids and Dynamic | |
1127 | States}. | |
1128 | @end deffn | |
1129 | ||
1130 | @deffn Instruction unwind-fluids | |
1131 | Pop a with-fluids object from the dynamic stack, and swap the current | |
1132 | values of its fluids with the saved values of its fluids. In this way, | |
1133 | the dynamic environment is left as it was before the corresponding | |
1134 | @code{wind-fluids} instruction was processed. | |
1135 | @end deffn | |
1136 | ||
1137 | @deffn Instruction fluid-ref | |
1138 | Pop a fluid from the stack, and push its current value. | |
1139 | @end deffn | |
1140 | ||
1141 | @deffn Instruction fluid-set | |
1142 | Pop a value and a fluid from the stack, in that order, and set the fluid | |
1143 | to the value. | |
1144 | @end deffn | |
1145 | ||
1146 | @deffn Instruction prompt escape-only? offset | |
1147 | Establish a dynamic prompt. @xref{Prompts}, for more information on | |
1148 | prompts. | |
1149 | ||
1150 | The prompt will be pushed on the dynamic stack. The normal control flow | |
1151 | should ensure that the prompt is popped off at the end, via | |
1152 | @code{unwind}. | |
1153 | ||
1154 | If an abort is made to this prompt, control will jump to @var{offset}, a | |
1155 | three-byte relative address. The continuation and all arguments to the | |
1156 | abort will be pushed on the stack, along with the total number of | |
1157 | arguments (including the continuation. If control returns to the | |
1158 | handler, the prompt is already popped off by the abort mechanism. | |
1159 | (Guile's @code{prompt} implements Felleisen's @dfn{--F--} operator.) | |
1160 | ||
1161 | If @var{escape-only?} is nonzero, the prompt will be marked as | |
1162 | escape-only, which allows an abort to this prompt to avoid reifying the | |
1163 | continuation. | |
1164 | @end deffn | |
1165 | ||
1166 | @deffn Instruction abort n | |
1167 | Abort to a dynamic prompt. | |
1168 | ||
1169 | This instruction pops one tail argument list, @var{n} arguments, and a | |
1170 | prompt tag from the stack. The dynamic environment is then searched for | |
1171 | a prompt having the given tag. If none is found, an error is signalled. | |
1172 | Otherwise all arguments are passed to the prompt's handler, along with | |
1173 | the captured continuation, if necessary. | |
1174 | ||
1175 | If the prompt's handler can be proven to not reference the captured | |
1176 | continuation, no continuation is allocated. This decision happens | |
1177 | dynamically, at run-time; the general case is that the continuation may | |
1178 | be captured, and thus resumed. A reinstated continuation will have its | |
1179 | arguments pushed on the stack, along with the number of arguments, as in | |
1180 | the multiple-value return convention. Therefore an @code{abort} | |
1181 | instruction should be followed by code ready to handle the equivalent of | |
1182 | a multiply-valued return. | |
1183 | @end deffn | |
1184 | ||
8680d53b AW |
1185 | @node Miscellaneous Instructions |
1186 | @subsubsection Miscellaneous Instructions | |
1187 | ||
1188 | @deffn Instruction nop | |
98850fd7 AW |
1189 | Does nothing! Used for padding other instructions to certain |
1190 | alignments. | |
8680d53b AW |
1191 | @end deffn |
1192 | ||
1193 | @deffn Instruction halt | |
bd7aa35f AW |
1194 | Exits the VM, returning a SCM value. Normally, this instruction is |
1195 | only part of the ``bootstrap program'', a program run when a virtual | |
1196 | machine is first entered; compiled Scheme procedures will not contain | |
1197 | this instruction. | |
1198 | ||
1199 | If multiple values have been returned, the SCM value will be a | |
e3ba263d | 1200 | multiple-values object (@pxref{Multiple Values}). |
8680d53b AW |
1201 | @end deffn |
1202 | ||
1203 | @deffn Instruction break | |
1204 | Does nothing, but invokes the break hook. | |
1205 | @end deffn | |
1206 | ||
1207 | @deffn Instruction drop | |
1208 | Pops off the top value from the stack, throwing it away. | |
1209 | @end deffn | |
1210 | ||
1211 | @deffn Instruction dup | |
1212 | Re-pushes the top value onto the stack. | |
1213 | @end deffn | |
1214 | ||
1215 | @deffn Instruction void | |
1216 | Pushes ``the unspecified value'' onto the stack. | |
1217 | @end deffn | |
1218 | ||
1219 | @node Inlined Scheme Instructions | |
1220 | @subsubsection Inlined Scheme Instructions | |
1221 | ||
bd7aa35f | 1222 | The Scheme compiler can recognize the application of standard Scheme |
81fd3152 AW |
1223 | procedures. It tries to inline these small operations to avoid the |
1224 | overhead of creating new stack frames. | |
bd7aa35f AW |
1225 | |
1226 | Since most of these operations are historically implemented as C | |
1227 | primitives, not inlining them would entail constantly calling out from | |
86872cc3 | 1228 | the VM to the interpreter, which has some costs---registers must be |
bd7aa35f | 1229 | saved, the interpreter has to dispatch, called procedures have to do |
ecb87335 | 1230 | much type checking, etc. It's much more efficient to inline these |
bd7aa35f AW |
1231 | operations in the virtual machine itself. |
1232 | ||
1233 | All of these instructions pop their arguments from the stack and push | |
1234 | their results, and take no parameters from the instruction stream. | |
1235 | Thus, unlike in the previous sections, these instruction definitions | |
1236 | show stack parameters instead of parameters from the instruction | |
1237 | stream. | |
1238 | ||
8680d53b | 1239 | @deffn Instruction not x |
bd7aa35f AW |
1240 | @deffnx Instruction not-not x |
1241 | @deffnx Instruction eq? x y | |
1242 | @deffnx Instruction not-eq? x y | |
1243 | @deffnx Instruction null? | |
1244 | @deffnx Instruction not-null? | |
1245 | @deffnx Instruction eqv? x y | |
1246 | @deffnx Instruction equal? x y | |
1247 | @deffnx Instruction pair? x y | |
81fd3152 | 1248 | @deffnx Instruction list? x |
bd7aa35f AW |
1249 | @deffnx Instruction set-car! pair x |
1250 | @deffnx Instruction set-cdr! pair x | |
81fd3152 | 1251 | @deffnx Instruction cons x y |
bd7aa35f AW |
1252 | @deffnx Instruction car x |
1253 | @deffnx Instruction cdr x | |
98850fd7 AW |
1254 | @deffnx Instruction vector-ref x y |
1255 | @deffnx Instruction vector-set x n y | |
acc51c3e AW |
1256 | @deffnx Instruction struct? x |
1257 | @deffnx Instruction struct-ref x n | |
1258 | @deffnx Instruction struct-set x n v | |
1259 | @deffnx Instruction struct-vtable x | |
1260 | @deffnx Instruction class-of x | |
1261 | @deffnx Instruction slot-ref struct n | |
1262 | @deffnx Instruction slot-set struct n x | |
bd7aa35f AW |
1263 | Inlined implementations of their Scheme equivalents. |
1264 | @end deffn | |
1265 | ||
1266 | Note that @code{caddr} and friends compile to a series of @code{car} | |
1267 | and @code{cdr} instructions. | |
8680d53b AW |
1268 | |
1269 | @node Inlined Mathematical Instructions | |
1270 | @subsubsection Inlined Mathematical Instructions | |
1271 | ||
bd7aa35f AW |
1272 | Inlining mathematical operations has the obvious advantage of handling |
1273 | fixnums without function calls or allocations. The trick, of course, | |
1274 | is knowing when the result of an operation will be a fixnum, and there | |
1275 | might be a couple bugs here. | |
1276 | ||
1277 | More instructions could be added here over time. | |
1278 | ||
1279 | As in the previous section, the definitions below show stack | |
1280 | parameters instead of instruction stream parameters. | |
1281 | ||
1282 | @deffn Instruction add x y | |
98850fd7 | 1283 | @deffnx Instruction add1 x |
bd7aa35f | 1284 | @deffnx Instruction sub x y |
98850fd7 | 1285 | @deffnx Instruction sub1 x |
bd7aa35f AW |
1286 | @deffnx Instruction mul x y |
1287 | @deffnx Instruction div x y | |
1288 | @deffnx Instruction quo x y | |
1289 | @deffnx Instruction rem x y | |
1290 | @deffnx Instruction mod x y | |
1291 | @deffnx Instruction ee? x y | |
1292 | @deffnx Instruction lt? x y | |
1293 | @deffnx Instruction gt? x y | |
1294 | @deffnx Instruction le? x y | |
1295 | @deffnx Instruction ge? x y | |
acc51c3e AW |
1296 | @deffnx Instruction ash x n |
1297 | @deffnx Instruction logand x y | |
1298 | @deffnx Instruction logior x y | |
1299 | @deffnx Instruction logxor x y | |
bd7aa35f | 1300 | Inlined implementations of the corresponding mathematical operations. |
8680d53b | 1301 | @end deffn |
98850fd7 AW |
1302 | |
1303 | @node Inlined Bytevector Instructions | |
1304 | @subsubsection Inlined Bytevector Instructions | |
1305 | ||
1306 | Bytevector operations correspond closely to what the current hardware | |
1307 | can do, so it makes sense to inline them to VM instructions, providing | |
1308 | a clear path for eventual native compilation. Without this, Scheme | |
1309 | programs would need other primitives for accessing raw bytes -- but | |
1310 | these primitives are as good as any. | |
1311 | ||
1312 | As in the previous section, the definitions below show stack | |
1313 | parameters instead of instruction stream parameters. | |
1314 | ||
1315 | The multibyte formats (@code{u16}, @code{f64}, etc) take an extra | |
1316 | endianness argument. Only aligned native accesses are currently | |
1317 | fast-pathed in Guile's VM. | |
1318 | ||
1319 | @deffn Instruction bv-u8-ref bv n | |
1320 | @deffnx Instruction bv-s8-ref bv n | |
1321 | @deffnx Instruction bv-u16-native-ref bv n | |
1322 | @deffnx Instruction bv-s16-native-ref bv n | |
1323 | @deffnx Instruction bv-u32-native-ref bv n | |
1324 | @deffnx Instruction bv-s32-native-ref bv n | |
1325 | @deffnx Instruction bv-u64-native-ref bv n | |
1326 | @deffnx Instruction bv-s64-native-ref bv n | |
1327 | @deffnx Instruction bv-f32-native-ref bv n | |
1328 | @deffnx Instruction bv-f64-native-ref bv n | |
1329 | @deffnx Instruction bv-u16-ref bv n endianness | |
1330 | @deffnx Instruction bv-s16-ref bv n endianness | |
1331 | @deffnx Instruction bv-u32-ref bv n endianness | |
1332 | @deffnx Instruction bv-s32-ref bv n endianness | |
1333 | @deffnx Instruction bv-u64-ref bv n endianness | |
1334 | @deffnx Instruction bv-s64-ref bv n endianness | |
1335 | @deffnx Instruction bv-f32-ref bv n endianness | |
1336 | @deffnx Instruction bv-f64-ref bv n endianness | |
1337 | @deffnx Instruction bv-u8-set bv n val | |
1338 | @deffnx Instruction bv-s8-set bv n val | |
1339 | @deffnx Instruction bv-u16-native-set bv n val | |
1340 | @deffnx Instruction bv-s16-native-set bv n val | |
1341 | @deffnx Instruction bv-u32-native-set bv n val | |
1342 | @deffnx Instruction bv-s32-native-set bv n val | |
1343 | @deffnx Instruction bv-u64-native-set bv n val | |
1344 | @deffnx Instruction bv-s64-native-set bv n val | |
1345 | @deffnx Instruction bv-f32-native-set bv n val | |
1346 | @deffnx Instruction bv-f64-native-set bv n val | |
1347 | @deffnx Instruction bv-u16-set bv n val endianness | |
1348 | @deffnx Instruction bv-s16-set bv n val endianness | |
1349 | @deffnx Instruction bv-u32-set bv n val endianness | |
1350 | @deffnx Instruction bv-s32-set bv n val endianness | |
1351 | @deffnx Instruction bv-u64-set bv n val endianness | |
1352 | @deffnx Instruction bv-s64-set bv n val endianness | |
1353 | @deffnx Instruction bv-f32-set bv n val endianness | |
1354 | @deffnx Instruction bv-f64-set bv n val endianness | |
1355 | Inlined implementations of the corresponding bytevector operations. | |
1356 | @end deffn |