Merge commit '4c9e29ec38350a5206aa3e8e72ad4376512ada2b' into vm-check
[bpt/guile.git] / doc / ref / vm.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008,2009
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node A Virtual Machine for Guile
8 @section A Virtual Machine for Guile
9
10 Guile has both an interpreter and a compiler. To a user, the
11 difference is largely transparent---interpreted and compiled
12 procedures can call each other as they please.
13
14 The difference is that the compiler creates and interprets bytecode
15 for a custom virtual machine, instead of interpreting the
16 S-expressions directly. Running compiled code is faster than running
17 interpreted code.
18
19 The virtual machine that does the bytecode interpretation is a part of
20 Guile itself. This section describes the nature of Guile's virtual
21 machine.
22
23 @menu
24 * Why a VM?::
25 * VM Concepts::
26 * Stack Layout::
27 * Variables and the VM::
28 * VM Programs::
29 * Instruction Set::
30 @end menu
31
32 @node Why a VM?
33 @subsection Why a VM?
34
35 @cindex interpreter
36 @cindex evaluator
37 For a long time, Guile only had an interpreter, called the
38 @dfn{evaluator}. Guile's evaluator operates directly on the
39 S-expression representation of Scheme source code.
40
41 But while the evaluator is highly optimized and hand-tuned, and
42 contains some extensive speed trickery (@pxref{Memoization}), it still
43 performs many needless computations during the course of evaluating an
44 expression. For example, application of a function to arguments
45 needlessly conses up the arguments in a list. Evaluation of an
46 expression always has to figure out what the car of the expression is
47 -- a procedure, a memoized form, or something else. All values have to
48 be allocated on the heap. Et cetera.
49
50 The solution to this problem is to compile the higher-level language,
51 Scheme, into a lower-level language for which all of the checks and
52 dispatching have already been done---the code is instead stripped to
53 the bare minimum needed to ``do the job''.
54
55 The question becomes then, what low-level language to choose? There
56 are many options. We could compile to native code directly, but that
57 poses portability problems for Guile, as it is a highly cross-platform
58 project.
59
60 So we want the performance gains that compilation provides, but we
61 also want to maintain the portability benefits of a single code path.
62 The obvious solution is to compile to a virtual machine that is
63 present on all Guile installations.
64
65 The easiest (and most fun) way to depend on a virtual machine is to
66 implement the virtual machine within Guile itself. This way the
67 virtual machine provides what Scheme needs (tail calls, multiple
68 values, @code{call/cc}) and can provide optimized inline instructions
69 for Guile (@code{cons}, @code{struct-ref}, etc.).
70
71 So this is what Guile does. The rest of this section describes that VM
72 that Guile implements, and the compiled procedures that run on it.
73
74 Note that this decision to implement a bytecode compiler does not
75 preclude native compilation. We can compile from bytecode to native
76 code at runtime, or even do ahead of time compilation. More
77 possibilities are discussed in @ref{Extending the Compiler}.
78
79 @node VM Concepts
80 @subsection VM Concepts
81
82 A virtual machine (VM) is a Scheme object. Users may create virtual
83 machines using the standard procedures described later in this manual,
84 but that is usually unnecessary, as Guile ensures that there is one
85 virtual machine per thread. When a VM-compiled procedure is run, Guile
86 looks up the virtual machine for the current thread and executes the
87 procedure using that VM.
88
89 Guile's virtual machine is a stack machine---that is, it has few
90 registers, and the instructions defined in the VM operate by pushing
91 and popping values from a stack.
92
93 Stack memory is exclusive to the virtual machine that owns it. In
94 addition to their stacks, virtual machines also have access to the
95 global memory (modules, global bindings, etc) that is shared among
96 other parts of Guile, including other VMs.
97
98 A VM has generic instructions, such as those to reference local
99 variables, and instructions designed to support Guile's languages --
100 mathematical instructions that support the entire numerical tower, an
101 inlined implementation of @code{cons}, etc.
102
103 The registers that a VM has are as follows:
104
105 @itemize
106 @item ip - Instruction pointer
107 @item sp - Stack pointer
108 @item fp - Frame pointer
109 @end itemize
110
111 In other architectures, the instruction pointer is sometimes called
112 the ``program counter'' (pc). This set of registers is pretty typical
113 for stack machines; their exact meanings in the context of Guile's VM
114 is described in the next section.
115
116 A virtual machine executes by loading a compiled procedure, and
117 executing the object code associated with that procedure. Of course,
118 that procedure may call other procedures, tail-call others, ad
119 infinitum---indeed, within a guile whose modules have all been
120 compiled to object code, one might never leave the virtual machine.
121
122 @c wingo: I wish the following were true, but currently we just use
123 @c the one engine. This kind of thing is possible tho.
124
125 @c A VM may have one of three engines: reckless, regular, or debugging.
126 @c Reckless engine is fastest but dangerous. Regular engine is normally
127 @c fail-safe and reasonably fast. Debugging engine is safest and
128 @c functional but very slow.
129
130 @node Stack Layout
131 @subsection Stack Layout
132
133 While not strictly necessary to understand how to work with the VM, it
134 is instructive and sometimes entertaining to consider the struture of
135 the VM stack.
136
137 Logically speaking, a VM stack is composed of ``frames''. Each frame
138 corresponds to the application of one compiled procedure, and contains
139 storage space for arguments, local variables, intermediate values, and
140 some bookkeeping information (such as what to do after the frame
141 computes its value).
142
143 While the compiler is free to do whatever it wants to, as long as the
144 semantics of a computation are preserved, in practice every time you
145 call a function, a new frame is created. (The notable exception of
146 course is the tail call case, @pxref{Tail Calls}.)
147
148 Within a frame, you have the data associated with the function
149 application itself, which is of a fixed size, and the stack space for
150 intermediate values. Sometimes only the former is referred to as the
151 ``frame'', and the latter is the ``stack'', although all pending
152 application frames can have some intermediate computations interleaved
153 on the stack.
154
155 The structure of the fixed part of an application frame is as follows:
156
157 @example
158 Stack
159 | | <- fp + bp->nargs + bp->nlocs + 4
160 +------------------+ = SCM_FRAME_UPPER_ADDRESS (fp)
161 | Return address |
162 | MV return address|
163 | Dynamic link |
164 | External link | <- fp + bp->nargs + bp->nlocs
165 | Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp)
166 | Local variable 0 | <- fp + bp->nargs
167 | Argument 1 |
168 | Argument 0 | <- fp
169 | Program | <- fp - 1
170 +------------------+ = SCM_FRAME_LOWER_ADDRESS (fp)
171 | |
172 @end example
173
174 In the above drawing, the stack grows upward. The intermediate values
175 stored in the application of this frame are stored above
176 @code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the
177 @code{struct scm_program*} data associated with the program at
178 @code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the
179 compiled procedure, which will be discussed later.
180
181 The individual fields of the frame are as follows:
182
183 @table @asis
184 @item Return address
185 The @code{ip} that was in effect before this program was applied. When
186 we return from this activation frame, we will jump back to this
187 @code{ip}.
188
189 @item MV return address
190 The @code{ip} to return to if this application returns multiple
191 values. For continuations that only accept one value, this value will
192 be @code{NULL}; for others, it will be an @code{ip} that points to a
193 multiple-value return address in the calling code. That code will
194 expect the top value on the stack to be an integer---the number of
195 values being returned---and that below that integer there are the
196 values being returned.
197
198 @item Dynamic link
199 This is the @code{fp} in effect before this program was applied. In
200 effect, this and the return address are the registers that are always
201 ``saved''.
202
203 @item External link
204 This field is a reference to the list of heap-allocated variables
205 associated with this frame. For a discussion of heap versus stack
206 allocation, @xref{Variables and the VM}.
207
208 @item Local variable @var{n}
209 Lambda-local variables that are allocated on the stack are all
210 allocated as part of the frame. This makes access to non-captured,
211 non-mutated variables very cheap.
212
213 @item Argument @var{n}
214 The calling convention of the VM requires arguments of a function
215 application to be pushed on the stack, and here they are. Normally
216 references to arguments dispatch to these locations on the stack.
217 However if an argument has to be stored on the heap, it will be copied
218 from its initial value here onto a location in the heap, and
219 thereafter only referenced on the heap.
220
221 @item Program
222 This is the program being applied. For more information on how
223 programs are implemented, @xref{VM Programs}.
224 @end table
225
226 @node Variables and the VM
227 @subsection Variables and the VM
228
229 Let's think about the following Scheme code as an example:
230
231 @example
232 (define (foo a)
233 (lambda (b) (list foo a b)))
234 @end example
235
236 Within the lambda expression, "foo" is a top-level variable, "a" is a
237 lexically captured variable, and "b" is a local variable.
238
239 That is to say: @code{b} may safely be allocated on the stack, as
240 there is no enclosed procedure that references it, nor is it ever
241 mutated.
242
243 @code{a}, on the other hand, is referenced by an enclosed procedure,
244 that of the lambda. Thus it must be allocated on the heap, as it may
245 (and will) outlive the dynamic extent of the invocation of @code{foo}.
246
247 @code{foo} is a toplevel variable, as mandated by Scheme's semantics:
248
249 @example
250 (define proc (foo 'bar)) ; assuming prev. definition of @code{foo}
251 (define foo 42) ; redefinition
252 (proc 'baz)
253 @result{} (42 bar baz)
254 @end example
255
256 Note that variables that are mutated (via @code{set!}) must be
257 allocated on the heap, even if they are local variables. This is
258 because any called subprocedure might capture the continuation, which
259 would need to capture locations instead of values. Thus perhaps
260 counterintuitively, what would seem ``closer to the metal'', viz
261 @code{set!}, actually forces heap allocation instead of stack
262 allocation.
263
264 @node VM Programs
265 @subsection Compiled Procedures are VM Programs
266
267 By default, when you enter in expressions at Guile's REPL, they are
268 first compiled to VM object code, then that VM object code is executed
269 to produce a value. If the expression evaluates to a procedure, the
270 result of this process is a compiled procedure.
271
272 A compiled procedure is a compound object, consisting of its bytecode,
273 a reference to any captured lexical variables, an object array, and
274 some metadata such as the procedure's arity, name, and documentation.
275 You can pick apart these pieces with the accessors in @code{(system vm
276 program)}. @xref{Compiled Procedures}, for a full API reference.
277
278 @cindex object table
279 The object array of a compiled procedure, also known as the
280 @dfn{object table}, holds all Scheme objects whose values are known
281 not to change across invocations of the procedure: constant strings,
282 symbols, etc. The object table of a program is initialized right
283 before a program is loaded with @code{load-program}.
284 @xref{Loading Instructions}, for more information.
285
286 Variable objects are one such type of constant object: when a global
287 binding is defined, a variable object is associated to it and that
288 object will remain constant over time, even if the value bound to it
289 changes. Therefore, toplevel bindings only need to be looked up once.
290 Thereafter, references to the corresponding toplevel variables from
291 within the program are then performed via the @code{toplevel-ref}
292 instruction, which uses the object vector, and are almost as fast as
293 local variable references.
294
295 We can see how these concepts tie together by disassembling the
296 @code{foo} function to see what is going on:
297
298 @smallexample
299 scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
300 scheme@@(guile-user)> ,x foo
301 Disassembly of #<program foo (a)>:
302
303 Bytecode:
304
305 0 (local-ref 0) ;; `a' (arg)
306 2 (external-set 0) ;; `a' (arg)
307 4 (object-ref 0) ;; #<program #(0 28 #f) (b)>
308 6 (make-closure) at (unknown file):0:16
309 7 (return)
310
311 ----------------------------------------
312 Disassembly of #<program #(0 28 #f) (b)>:
313
314 Bytecode:
315
316 0 (toplevel-ref 0) ;; `list'
317 2 (toplevel-ref 1) ;; `foo'
318 4 (external-ref 0) ;; (closure variable)
319 6 (local-ref 0) ;; `b' (arg)
320 8 (goto/args 3) at (unknown file):0:28
321 @end smallexample
322
323 At @code{ip} 0 and 2, we do the copy from argument to heap for
324 @code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
325 @code{ip} 6 we make a closure---binding code (from the compiled
326 lambda) with data (the heap-allocated variables). Finally we return
327 the closure.
328
329 The second stanza disassembles the compiled lambda. Toplevel variables
330 are resolved relative to the module that was current when the
331 procedure was created. This lookup occurs lazily, at the first time
332 the variable is actually referenced, and the location of the lookup is
333 cached so that future references are very cheap. @xref{Environment
334 Control Instructions}, for more details.
335
336 Then we see a reference to an external variable, corresponding to
337 @code{a}. The disassembler doesn't have enough information to give a
338 name to that variable, so it just marks it as being a ``closure
339 variable''. Finally we see the reference to @code{b}, then a tail call
340 (@code{goto/args}) with three arguments.
341
342 @node Instruction Set
343 @subsection Instruction Set
344
345 There are about 100 instructions in Guile's virtual machine. These
346 instructions represent atomic units of a program's execution. Ideally,
347 they perform one task without conditional branches, then dispatch to
348 the next instruction in the stream.
349
350 Instructions themselves are one byte long. Some instructions take
351 parameters, which follow the instruction byte in the instruction
352 stream.
353
354 Sometimes the compiler can figure out that it is compiling a special
355 case that can be run more efficiently. So, for example, while Guile
356 offers a generic test-and-branch instruction, it also offers specific
357 instructions for special cases, so that the following cases all have
358 their own test-and-branch instructions:
359
360 @example
361 (if pred then else)
362 (if (not pred) then else)
363 (if (null? l) then else)
364 (if (not (null? l)) then else)
365 @end example
366
367 In addition, some Scheme primitives have their own inline
368 implementations, e.g. @code{cons}.
369
370 So Guile's instruction set is a @emph{complete} instruction set, in
371 that it provides the instructions that are suited to the problem, and
372 is not concerned with making a minimal, orthogonal set of
373 instructions. More instructions may be added over time.
374
375 @menu
376 * Environment Control Instructions::
377 * Branch Instructions::
378 * Loading Instructions::
379 * Procedural Instructions::
380 * Data Control Instructions::
381 * Miscellaneous Instructions::
382 * Inlined Scheme Instructions::
383 * Inlined Mathematical Instructions::
384 @end menu
385
386 @node Environment Control Instructions
387 @subsubsection Environment Control Instructions
388
389 These instructions access and mutate the environment of a compiled
390 procedure---the local bindings, the ``external'' bindings, and the
391 toplevel bindings.
392
393 @deffn Instruction local-ref index
394 Push onto the stack the value of the local variable located at
395 @var{index} within the current stack frame.
396
397 Note that arguments and local variables are all in one block. Thus the
398 first argument, if any, is at index 0, and local bindings follow the
399 arguments.
400 @end deffn
401
402 @deffn Instruction local-set index
403 Pop the Scheme object located on top of the stack and make it the new
404 value of the local variable located at @var{index} within the current
405 stack frame.
406 @end deffn
407
408 @deffn Instruction external-ref index
409 Push the value of the closure variable located at position
410 @var{index} within the program's list of external variables.
411 @end deffn
412
413 @deffn Instruction external-set index
414 Pop the Scheme object located on top of the stack and make it the new
415 value of the closure variable located at @var{index} within the
416 program's list of external variables.
417 @end deffn
418
419 The external variable lookup algorithm should probably be made more
420 efficient in the future via addressing by frame and index. Currently,
421 external variables are all consed onto a list, which results in O(N)
422 lookup time.
423
424 @deffn Instruction externals
425 Pushes the current list of external variables onto the stack. This
426 instruction is used in the implementation of
427 @code{compile-time-environment}. @xref{The Scheme Compiler}.
428 @end deffn
429
430 @deffn Instruction toplevel-ref index
431 Push the value of the toplevel binding whose location is stored in at
432 position @var{index} in the object table.
433
434 Initially, a cell in the object table that is used by
435 @code{toplevel-ref} is initialized to one of two forms. The normal
436 case is that the cell holds a symbol, whose binding will be looked up
437 relative to the module that was current when the current program was
438 created.
439
440 Alternately, the lookup may be performed relative to a particular
441 module, determined at compile-time (e.g. via @code{@@} or
442 @code{@@@@}). In that case, the cell in the object table holds a list:
443 @code{(@var{modname} @var{sym} @var{interface?})}. The symbol
444 @var{sym} will be looked up in the module named @var{modname} (a list
445 of symbols). The lookup will be performed against the module's public
446 interface, unless @var{interface?} is @code{#f}, which it is for
447 example when compiling @code{@@@@}.
448
449 In any case, if the symbol is unbound, an error is signalled.
450 Otherwise the initial form is replaced with the looked-up variable, an
451 in-place mutation of the object table. This mechanism provides for
452 lazy variable resolution, and an important cached fast-path once the
453 variable has been successfully resolved.
454
455 This instruction pushes the value of the variable onto the stack.
456 @end deffn
457
458 @deffn Instruction toplevel-ref index
459 Pop a value off the stack, and set it as the value of the toplevel
460 variable stored at @var{index} in the object table. If the variable
461 has not yet been looked up, we do the lookup as in
462 @code{toplevel-ref}.
463 @end deffn
464
465 @deffn Instruction link-now
466 Pop a value, @var{x}, from the stack. Look up the binding for @var{x},
467 according to the rules for @code{toplevel-ref}, and push that variable
468 on the stack. If the lookup fails, an error will be signalled.
469
470 This instruction is mostly used when loading programs, because it can
471 do toplevel variable lookups without an object vector.
472 @end deffn
473
474 @deffn Instruction variable-ref
475 Dereference the variable object which is on top of the stack and
476 replace it by the value of the variable it represents.
477 @end deffn
478
479 @deffn Instruction variable-set
480 Pop off two objects from the stack, a variable and a value, and set
481 the variable to the value.
482 @end deffn
483
484 @deffn Instruction object-ref n
485 Push @var{n}th value from the current program's object vector.
486 @end deffn
487
488 @node Branch Instructions
489 @subsubsection Branch Instructions
490
491 All the conditional branch instructions described below work in the
492 same way:
493
494 @itemize
495 @item They pop off the Scheme object located on the stack and use it as
496 the branch condition;
497 @item If the condition is true, then the instruction pointer is
498 increased by the offset passed as an argument to the branch
499 instruction;
500 @item Program execution proceeds with the next instruction (that is,
501 the one to which the instruction pointer points).
502 @end itemize
503
504 Note that the offset passed to the instruction is encoded on two 8-bit
505 integers which are then combined by the VM as one 16-bit integer.
506
507 @deffn Instruction br offset
508 Jump to @var{offset}.
509 @end deffn
510
511 @deffn Instruction br-if offset
512 Jump to @var{offset} if the condition on the stack is not false.
513 @end deffn
514
515 @deffn Instruction br-if-not offset
516 Jump to @var{offset} if the condition on the stack is false.
517 @end deffn
518
519 @deffn Instruction br-if-eq offset
520 Jump to @var{offset} if the two objects located on the stack are
521 equal in the sense of @var{eq?}. Note that, for this instruction, the
522 stack pointer is decremented by two Scheme objects instead of only
523 one.
524 @end deffn
525
526 @deffn Instruction br-if-not-eq offset
527 Same as @var{br-if-eq} for non-@code{eq?} objects.
528 @end deffn
529
530 @deffn Instruction br-if-null offset
531 Jump to @var{offset} if the object on the stack is @code{'()}.
532 @end deffn
533
534 @deffn Instruction br-if-not-null offset
535 Jump to @var{offset} if the object on the stack is not @code{'()}.
536 @end deffn
537
538
539 @node Loading Instructions
540 @subsubsection Loading Instructions
541
542 In addition to VM instructions, an instruction stream may contain
543 variable-length data embedded within it. This data is always preceded
544 by special loading instructions, which interpret the data and advance
545 the instruction pointer to the next VM instruction.
546
547 All of these loading instructions have a @code{length} parameter,
548 indicating the size of the embedded data, in bytes. The length itself
549 may be encoded in 1, 2, or 4 bytes.
550
551 @deffn Instruction load-integer length
552 @deffnx Instruction load-unsigned-integer length
553 Load a 32-bit integer (respectively unsigned integer) from the
554 instruction stream.
555 @end deffn
556 @deffn Instruction load-number length
557 Load an arbitrary number from the instruction stream. The number is
558 embedded in the stream as a string.
559 @end deffn
560 @deffn Instruction load-string length
561 Load a string from the instruction stream.
562 @end deffn
563 @deffn Instruction load-symbol length
564 Load a symbol from the instruction stream.
565 @end deffn
566 @deffn Instruction load-keyword length
567 Load a keyword from the instruction stream.
568 @end deffn
569
570 @deffn Instruction define length
571 Load a symbol from the instruction stream, and look up its binding in
572 the current toplevel environment, creating the binding if necessary.
573 Push the variable corresponding to the binding.
574 @end deffn
575
576 @deffn Instruction load-program length
577 Load bytecode from the instruction stream, and push a compiled
578 procedure. This instruction pops the following values from the stack:
579
580 @itemize
581 @item Optionally, a thunk, which when called should return metadata
582 associated with this program---for example its name, the names of its
583 arguments, its documentation string, debugging information, etc.
584
585 Normally, this thunk its itself a compiled procedure (with no
586 metadata). Metadata is represented this way so that the initial load
587 of a procedure is fast: the VM just mmap's the thunk and goes. The
588 symbols and pairs associated with the metadata are only created if the
589 user asks for them.
590
591 For information on the format of the thunk's return value,
592 @xref{Compiled Procedures}.
593 @item Optionally, the program's object table, as a vector.
594
595 A program that does not reference toplevel bindings and does not use
596 @code{object-ref} does not need an object table.
597 @item Finally, either one immediate integer or four immediate integers
598 representing the arity of the program.
599
600 In the four-fixnum case, the values are respectively the number of
601 arguments taken by the function (@var{nargs}), the number of @dfn{rest
602 arguments} (@var{nrest}, 0 or 1), the number of local variables
603 (@var{nlocs}) and the number of external variables (@var{nexts})
604 (@pxref{Environment Control Instructions}).
605
606 The common single-fixnum case represents all of these values within a
607 16-bit bitmask.
608 @end itemize
609
610 The resulting compiled procedure will not have any ``external''
611 variables captured, so it will be loaded only once but may be used
612 many times to create closures.
613 @end deffn
614
615 Finally, while this instruction is not strictly a ``loading''
616 instruction, it's useful to wind up the @code{load-program} discussion
617 here:
618
619 @deffn Instruction make-closure
620 Pop the program object from the stack, capture the current set of
621 ``external'' variables, and assign those external variables to a copy
622 of the program. Push the new program object, which shares state with
623 the original program. Also captures the current module.
624 @end deffn
625
626 @node Procedural Instructions
627 @subsubsection Procedural Instructions
628
629 @deffn Instruction return
630 Free the program's frame, returning the top value from the stack to
631 the current continuation. (The stack should have exactly one value on
632 it.)
633
634 Specifically, the @code{sp} is decremented to one below the current
635 @code{fp}, the @code{ip} is reset to the current return address, the
636 @code{fp} is reset to the value of the current dynamic link, and then
637 the top item on the stack (formerly the procedure being applied) is
638 set to the returned value.
639 @end deffn
640
641 @deffn Instruction call nargs
642 Call the procedure located at @code{sp[-nargs]} with the @var{nargs}
643 arguments located from @code{sp[0]} to @code{sp[-nargs + 1]}.
644
645 For non-compiled procedures (continuations, primitives, and
646 interpreted procedures), @code{call} will pop the procedure and
647 arguments off the stack, and push the result of calling
648 @code{scm_apply}.
649
650 For compiled procedures, this instruction sets up a new stack frame,
651 as described in @ref{Stack Layout}, and then dispatches to the first
652 instruction in the called procedure, relying on the called procedure
653 to return one value to the newly-created continuation.
654 @end deffn
655
656 @deffn Instruction goto/args nargs
657 Like @code{call}, but reusing the current continuation. This
658 instruction implements tail calling as required by RnRS.
659
660 For compiled procedures, that means that @code{goto/args} reuses the
661 current frame instead of building a new one. The @code{goto/*}
662 instruction family is named as it is because tail calls are equivalent
663 to @code{goto}, along with relabeled variables.
664
665 For non-VM procedures, the result is the same, but the current VM
666 invocation remains on the C stack. True tail calls are not currently
667 possible between compiled and non-compiled procedures.
668 @end deffn
669
670 @deffn Instruction apply nargs
671 @deffnx Instruction goto/apply nargs
672 Like @code{call} and @code{goto/args}, except that the top item on the
673 stack must be a list. The elements of that list are then pushed on the
674 stack and treated as additional arguments, replacing the list itself,
675 then the procedure is invoked as usual.
676 @end deffn
677
678 @deffn Instruction call/nargs
679 @deffnx Instruction goto/nargs
680 These are like @code{call} and @code{goto/args}, except they take the
681 number of arguments from the stack instead of the instruction stream.
682 These instructions are used in the implementation of multiple value
683 returns, where the actual number of values is pushed on the stack.
684 @end deffn
685
686 @deffn Instruction call/cc
687 @deffnx Instruction goto/cc
688 Capture the current continuation, and then call (or tail-call) the
689 procedure on the top of the stack, with the continuation as the
690 argument.
691
692 Both the VM continuation and the C continuation are captured.
693 @end deffn
694
695 @deffn Instruction mv-call nargs offset
696 Like @code{call}, except that a multiple-value continuation is created
697 in addition to a single-value continuation.
698
699 The offset (a two-byte value) is an offset within the instruction
700 stream; the multiple-value return address in the new frame
701 (@pxref{Stack Layout}) will be set to the normal return address plus
702 this offset. Instructions at that offset will expect the top value of
703 the stack to be the number of values, and below that values
704 themselves, pushed separately.
705 @end deffn
706
707 @deffn Instruction return/values nvalues
708 Return the top @var{nvalues} to the current continuation.
709
710 If the current continuation is a multiple-value continuation,
711 @code{return/values} pushes the number of values on the stack, then
712 returns as in @code{return}, but to the multiple-value return address.
713
714 Otherwise if the current continuation accepts only one value, i.e. the
715 multiple-value return address is @code{NULL}, then we assume the user
716 only wants one value, and we give them the first one. If there are no
717 values, an error is signaled.
718 @end deffn
719
720 @deffn Instruction return/values* nvalues
721 Like a combination of @code{apply} and @code{return/values}, in which
722 the top value on the stack is interpreted as a list of additional
723 values. This is an optimization for the common @code{(apply values
724 ...)} case.
725 @end deffn
726
727 @deffn Instruction truncate-values nbinds nrest
728 Used in multiple-value continuations, this instruction takes the
729 values that are on the stack (including the number-of-value marker)
730 and truncates them for a binding construct.
731
732 For example, a call to @code{(receive (x y . z) (foo) ...)} would,
733 logically speaking, pop off the values returned from @code{(foo)} and
734 push them as three values, corresponding to @code{x}, @code{y}, and
735 @code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would
736 be 1 (to indicate that one of the bindings was a rest arguments).
737
738 Signals an error if there is an insufficient number of values.
739 @end deffn
740
741 @node Data Control Instructions
742 @subsubsection Data Control Instructions
743
744 These instructions push simple immediate values onto the stack, or
745 manipulate lists and vectors on the stack.
746
747 @deffn Instruction make-int8 value
748 Push @var{value}, an 8-bit integer, onto the stack.
749 @end deffn
750
751 @deffn Instruction make-int8:0
752 Push the immediate value @code{0} onto the stack.
753 @end deffn
754
755 @deffn Instruction make-int8:1
756 Push the immediate value @code{1} onto the stack.
757 @end deffn
758
759 @deffn Instruction make-int16 value
760 Push @var{value}, a 16-bit integer, onto the stack.
761 @end deffn
762
763 @deffn Instruction make-false
764 Push @code{#f} onto the stack.
765 @end deffn
766
767 @deffn Instruction make-true
768 Push @code{#t} onto the stack.
769 @end deffn
770
771 @deffn Instruction make-eol
772 Push @code{'()} onto the stack.
773 @end deffn
774
775 @deffn Instruction make-char8 value
776 Push @var{value}, an 8-bit character, onto the stack.
777 @end deffn
778
779 @deffn Instruction list n
780 Pops off the top @var{n} values off of the stack, consing them up into
781 a list, then pushes that list on the stack. What was the topmost value
782 will be the last element in the list.
783 @end deffn
784
785 @deffn Instruction vector n
786 Create and fill a vector with the top @var{n} values from the stack,
787 popping off those values and pushing on the resulting vector.
788 @end deffn
789
790 @deffn Instruction mark
791 Pushes a special value onto the stack that other stack instructions
792 like @code{list-mark} can use.
793 @end deffn
794
795 @deffn Instruction list-mark
796 Create a list from values from the stack, as in @code{list}, but
797 instead of knowing beforehand how many there will be, keep going until
798 we see a @code{mark} value.
799 @end deffn
800
801 @deffn Instruction cons-mark
802 As the scheme procedure @code{cons*} is to the scheme procedure
803 @code{list}, so the instruction @code{cons-mark} is to the instruction
804 @code{list-mark}.
805 @end deffn
806
807 @deffn Instruction vector-mark
808 Like @code{list-mark}, but makes a vector instead of a list.
809 @end deffn
810
811 @deffn Instruction list-break
812 The opposite of @code{list}: pops a value, which should be a list, and
813 pushes its elements on the stack.
814 @end deffn
815
816 @node Miscellaneous Instructions
817 @subsubsection Miscellaneous Instructions
818
819 @deffn Instruction nop
820 Does nothing!
821 @end deffn
822
823 @deffn Instruction halt
824 Exits the VM, returning a SCM value. Normally, this instruction is
825 only part of the ``bootstrap program'', a program run when a virtual
826 machine is first entered; compiled Scheme procedures will not contain
827 this instruction.
828
829 If multiple values have been returned, the SCM value will be a
830 multiple-values object (@pxref{Multiple Values}).
831 @end deffn
832
833 @deffn Instruction break
834 Does nothing, but invokes the break hook.
835 @end deffn
836
837 @deffn Instruction drop
838 Pops off the top value from the stack, throwing it away.
839 @end deffn
840
841 @deffn Instruction dup
842 Re-pushes the top value onto the stack.
843 @end deffn
844
845 @deffn Instruction void
846 Pushes ``the unspecified value'' onto the stack.
847 @end deffn
848
849 @node Inlined Scheme Instructions
850 @subsubsection Inlined Scheme Instructions
851
852 The Scheme compiler can recognize the application of standard Scheme
853 procedures, or unbound variables that look like they are bound to
854 standard Scheme procedures. It tries to inline these small operations
855 to avoid the overhead of creating new stack frames.
856
857 Since most of these operations are historically implemented as C
858 primitives, not inlining them would entail constantly calling out from
859 the VM to the interpreter, which has some costs---registers must be
860 saved, the interpreter has to dispatch, called procedures have to do
861 much typechecking, etc. It's much more efficient to inline these
862 operations in the virtual machine itself.
863
864 All of these instructions pop their arguments from the stack and push
865 their results, and take no parameters from the instruction stream.
866 Thus, unlike in the previous sections, these instruction definitions
867 show stack parameters instead of parameters from the instruction
868 stream.
869
870 @deffn Instruction not x
871 @deffnx Instruction not-not x
872 @deffnx Instruction eq? x y
873 @deffnx Instruction not-eq? x y
874 @deffnx Instruction null?
875 @deffnx Instruction not-null?
876 @deffnx Instruction eqv? x y
877 @deffnx Instruction equal? x y
878 @deffnx Instruction pair? x y
879 @deffnx Instruction list? x y
880 @deffnx Instruction set-car! pair x
881 @deffnx Instruction set-cdr! pair x
882 @deffnx Instruction slot-ref struct n
883 @deffnx Instruction slot-set struct n x
884 @deffnx Instruction cons x
885 @deffnx Instruction car x
886 @deffnx Instruction cdr x
887 Inlined implementations of their Scheme equivalents.
888 @end deffn
889
890 Note that @code{caddr} and friends compile to a series of @code{car}
891 and @code{cdr} instructions.
892
893 @node Inlined Mathematical Instructions
894 @subsubsection Inlined Mathematical Instructions
895
896 Inlining mathematical operations has the obvious advantage of handling
897 fixnums without function calls or allocations. The trick, of course,
898 is knowing when the result of an operation will be a fixnum, and there
899 might be a couple bugs here.
900
901 More instructions could be added here over time.
902
903 As in the previous section, the definitions below show stack
904 parameters instead of instruction stream parameters.
905
906 @deffn Instruction add x y
907 @deffnx Instruction sub x y
908 @deffnx Instruction mul x y
909 @deffnx Instruction div x y
910 @deffnx Instruction quo x y
911 @deffnx Instruction rem x y
912 @deffnx Instruction mod x y
913 @deffnx Instruction ee? x y
914 @deffnx Instruction lt? x y
915 @deffnx Instruction gt? x y
916 @deffnx Instruction le? x y
917 @deffnx Instruction ge? x y
918 Inlined implementations of the corresponding mathematical operations.
919 @end deffn