Use uc_tolower in number conversion
[bpt/guile.git] / doc / ref / vm.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2008,2009
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node A Virtual Machine for Guile
8 @section A Virtual Machine for Guile
9
10 Guile has both an interpreter and a compiler. To a user, the
11 difference is largely transparent---interpreted and compiled
12 procedures can call each other as they please.
13
14 The difference is that the compiler creates and interprets bytecode
15 for a custom virtual machine, instead of interpreting the
16 S-expressions directly. Loading and running compiled code is faster
17 than loading and running source code.
18
19 The virtual machine that does the bytecode interpretation is a part of
20 Guile itself. This section describes the nature of Guile's virtual
21 machine.
22
23 @menu
24 * Why a VM?::
25 * VM Concepts::
26 * Stack Layout::
27 * Variables and the VM::
28 * VM Programs::
29 * Instruction Set::
30 @end menu
31
32 @node Why a VM?
33 @subsection Why a VM?
34
35 @cindex interpreter
36 @cindex evaluator
37 For a long time, Guile only had an interpreter, called the
38 @dfn{evaluator}. Guile's evaluator operates directly on the
39 S-expression representation of Scheme source code.
40
41 But while the evaluator is highly optimized and hand-tuned, and
42 contains some extensive speed trickery (@pxref{Memoization}), it still
43 performs many needless computations during the course of evaluating an
44 expression. For example, application of a function to arguments
45 needlessly conses up the arguments in a list. Evaluation of an
46 expression always has to figure out what the car of the expression is
47 -- a procedure, a memoized form, or something else. All values have to
48 be allocated on the heap. Et cetera.
49
50 The solution to this problem is to compile the higher-level language,
51 Scheme, into a lower-level language for which all of the checks and
52 dispatching have already been done---the code is instead stripped to
53 the bare minimum needed to ``do the job''.
54
55 The question becomes then, what low-level language to choose? There
56 are many options. We could compile to native code directly, but that
57 poses portability problems for Guile, as it is a highly cross-platform
58 project.
59
60 So we want the performance gains that compilation provides, but we
61 also want to maintain the portability benefits of a single code path.
62 The obvious solution is to compile to a virtual machine that is
63 present on all Guile installations.
64
65 The easiest (and most fun) way to depend on a virtual machine is to
66 implement the virtual machine within Guile itself. This way the
67 virtual machine provides what Scheme needs (tail calls, multiple
68 values, @code{call/cc}) and can provide optimized inline instructions
69 for Guile (@code{cons}, @code{struct-ref}, etc.).
70
71 So this is what Guile does. The rest of this section describes that VM
72 that Guile implements, and the compiled procedures that run on it.
73
74 Note that this decision to implement a bytecode compiler does not
75 preclude native compilation. We can compile from bytecode to native
76 code at runtime, or even do ahead of time compilation. More
77 possibilities are discussed in @ref{Extending the Compiler}.
78
79 @node VM Concepts
80 @subsection VM Concepts
81
82 A virtual machine (VM) is a Scheme object. Users may create virtual
83 machines using the standard procedures described later in this manual,
84 but that is usually unnecessary, as Guile ensures that there is one
85 virtual machine per thread. When a VM-compiled procedure is run, Guile
86 looks up the virtual machine for the current thread and executes the
87 procedure using that VM.
88
89 Guile's virtual machine is a stack machine---that is, it has few
90 registers, and the instructions defined in the VM operate by pushing
91 and popping values from a stack.
92
93 Stack memory is exclusive to the virtual machine that owns it. In
94 addition to their stacks, virtual machines also have access to the
95 global memory (modules, global bindings, etc) that is shared among
96 other parts of Guile, including other VMs.
97
98 A VM has generic instructions, such as those to reference local
99 variables, and instructions designed to support Guile's languages --
100 mathematical instructions that support the entire numerical tower, an
101 inlined implementation of @code{cons}, etc.
102
103 The registers that a VM has are as follows:
104
105 @itemize
106 @item ip - Instruction pointer
107 @item sp - Stack pointer
108 @item fp - Frame pointer
109 @end itemize
110
111 In other architectures, the instruction pointer is sometimes called
112 the ``program counter'' (pc). This set of registers is pretty typical
113 for stack machines; their exact meanings in the context of Guile's VM
114 are described in the next section.
115
116 A virtual machine executes by loading a compiled procedure, and
117 executing the object code associated with that procedure. Of course,
118 that procedure may call other procedures, tail-call others, ad
119 infinitum---indeed, within a guile whose modules have all been
120 compiled to object code, one might never leave the virtual machine.
121
122 @c wingo: The following is true, but I don't know in what context to
123 @c describe it. A documentation FIXME.
124
125 @c A VM may have one of three engines: reckless, regular, or debugging.
126 @c Reckless engine is fastest but dangerous. Regular engine is normally
127 @c fail-safe and reasonably fast. Debugging engine is safest and
128 @c functional but very slow.
129
130 @c (Actually we have just a regular and a debugging engine; normally
131 @c we use the latter, it's almost as fast as the ``regular'' engine.)
132
133 @node Stack Layout
134 @subsection Stack Layout
135
136 While not strictly necessary to understand how to work with the VM, it
137 is instructive and sometimes entertaining to consider the structure of
138 the VM stack.
139
140 Logically speaking, a VM stack is composed of ``frames''. Each frame
141 corresponds to the application of one compiled procedure, and contains
142 storage space for arguments, local variables, intermediate values, and
143 some bookkeeping information (such as what to do after the frame
144 computes its value).
145
146 While the compiler is free to do whatever it wants to, as long as the
147 semantics of a computation are preserved, in practice every time you
148 call a function, a new frame is created. (The notable exception of
149 course is the tail call case, @pxref{Tail Calls}.)
150
151 Within a frame, you have the data associated with the function
152 application itself, which is of a fixed size, and the stack space for
153 intermediate values. Sometimes only the former is referred to as the
154 ``frame'', and the latter is the ``stack'', although all pending
155 application frames can have some intermediate computations interleaved
156 on the stack.
157
158 The structure of the fixed part of an application frame is as follows:
159
160 @example
161 Stack
162 | ... |
163 | Intermed. val. 0 | <- fp + bp->nargs + bp->nlocs = SCM_FRAME_UPPER_ADDRESS (fp)
164 +==================+
165 | Local variable 1 |
166 | Local variable 0 | <- fp + bp->nargs
167 | Argument 1 |
168 | Argument 0 | <- fp
169 | Program | <- fp - 1
170 +------------------+
171 | Return address |
172 | MV return address|
173 | Dynamic link | <- fp - 4 = SCM_FRAME_DATA_ADDRESS (fp) = SCM_FRAME_LOWER_ADDRESS (fp)
174 +==================+
175 | |
176 @end example
177
178 In the above drawing, the stack grows upward. The intermediate values
179 stored in the application of this frame are stored above
180 @code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the
181 @code{struct scm_objcode} data associated with the program at
182 @code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the
183 compiled procedure, which will be discussed later.
184
185 The individual fields of the frame are as follows:
186
187 @table @asis
188 @item Return address
189 The @code{ip} that was in effect before this program was applied. When
190 we return from this activation frame, we will jump back to this
191 @code{ip}.
192
193 @item MV return address
194 The @code{ip} to return to if this application returns multiple
195 values. For continuations that only accept one value, this value will
196 be @code{NULL}; for others, it will be an @code{ip} that points to a
197 multiple-value return address in the calling code. That code will
198 expect the top value on the stack to be an integer---the number of
199 values being returned---and that below that integer there are the
200 values being returned.
201
202 @item Dynamic link
203 This is the @code{fp} in effect before this program was applied. In
204 effect, this and the return address are the registers that are always
205 ``saved''. The dynamic link links the current frame to the previous
206 frame; computing a stack trace involves traversing these frames.
207
208 @item Local variable @var{n}
209 Lambda-local variables that are all allocated as part of the frame.
210 This makes access to variables very cheap.
211
212 @item Argument @var{n}
213 The calling convention of the VM requires arguments of a function
214 application to be pushed on the stack, and here they are. References
215 to arguments dispatch to these locations on the stack.
216
217 @item Program
218 This is the program being applied. For more information on how
219 programs are implemented, @xref{VM Programs}.
220 @end table
221
222 @node Variables and the VM
223 @subsection Variables and the VM
224
225 Consider the following Scheme code as an example:
226
227 @example
228 (define (foo a)
229 (lambda (b) (list foo a b)))
230 @end example
231
232 Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
233 lexically captured variable, and @code{b} is a local variable.
234
235 Another way to refer to @code{a} and @code{b} is to say that @code{a}
236 is a ``free'' variable, since it is not defined within the lambda, and
237 @code{b} is a ``bound'' variable. These are the terms used in the
238 @dfn{lambda calculus}, a mathematical notation for describing
239 functions. The lambda calculus is useful because it allows one to
240 prove statements about functions. It is especially good at describing
241 scope relations, and it is for that reason that we mention it here.
242
243 Guile allocates all variables on the stack. When a lexically enclosed
244 procedure with free variables---a @dfn{closure}---is created, it
245 copies those variables its free variable vector. References to free
246 variables are then redirected through the free variable vector.
247
248 If a variable is ever @code{set!}, however, it will need to be
249 heap-allocated instead of stack-allocated, so that different closures
250 that capture the same variable can see the same value. Also, this
251 allows continuations to capture a reference to the variable, instead
252 of to its value at one point in time. For these reasons, @code{set!}
253 variables are allocated in ``boxes''---actually, in variable cells.
254 @xref{Variables}, for more information. References to @code{set!}
255 variables are indirected through the boxes.
256
257 Thus perhaps counterintuitively, what would seem ``closer to the
258 metal'', viz @code{set!}, actually forces an extra memory allocation
259 and indirection.
260
261 Going back to our example, @code{b} may be allocated on the stack, as
262 it is never mutated.
263
264 @code{a} may also be allocated on the stack, as it too is never
265 mutated. Within the enclosed lambda, its value will be copied into
266 (and referenced from) the free variables vector.
267
268 @code{foo} is a top-level variable, because @code{foo} is not
269 lexically bound in this example.
270
271 @node VM Programs
272 @subsection Compiled Procedures are VM Programs
273
274 By default, when you enter in expressions at Guile's REPL, they are
275 first compiled to VM object code, then that VM object code is executed
276 to produce a value. If the expression evaluates to a procedure, the
277 result of this process is a compiled procedure.
278
279 A compiled procedure is a compound object, consisting of its bytecode,
280 a reference to any captured lexical variables, an object array, and
281 some metadata such as the procedure's arity, name, and documentation.
282 You can pick apart these pieces with the accessors in @code{(system vm
283 program)}. @xref{Compiled Procedures}, for a full API reference.
284
285 @cindex object table
286 @cindex object array
287 The object array of a compiled procedure, also known as the
288 @dfn{object table}, holds all Scheme objects whose values are known
289 not to change across invocations of the procedure: constant strings,
290 symbols, etc. The object table of a program is initialized right
291 before a program is loaded with @code{load-program}.
292 @xref{Loading Instructions}, for more information.
293
294 Variable objects are one such type of constant object: when a global
295 binding is defined, a variable object is associated to it and that
296 object will remain constant over time, even if the value bound to it
297 changes. Therefore, toplevel bindings only need to be looked up once.
298 Thereafter, references to the corresponding toplevel variables from
299 within the program are then performed via the @code{toplevel-ref}
300 instruction, which uses the object vector, and are almost as fast as
301 local variable references.
302
303 We can see how these concepts tie together by disassembling the
304 @code{foo} function we defined earlier to see what is going on:
305
306 @smallexample
307 scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
308 scheme@@(guile-user)> ,x foo
309 Disassembly of #<program foo (a)>:
310
311 0 (object-ref 1) ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
312 2 (local-ref 0) ;; `a' (arg)
313 4 (vector 0 1) ;; 1 element
314 7 (make-closure)
315 8 (return)
316
317 ----------------------------------------
318 Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
319
320 0 (toplevel-ref 1) ;; `foo'
321 2 (free-ref 0) ;; (closure variable)
322 4 (local-ref 0) ;; `b' (arg)
323 6 (list 0 3) ;; 3 elements at (unknown file):0:28
324 9 (return)
325 @end smallexample
326
327 At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
328 create the free variables vector, and @code{ip} 7 makes the
329 closure---binding code (from the compiled lambda) with data (the
330 free-variable vector). Finally we return the closure.
331
332 The second stanza disassembles the compiled lambda. Toplevel variables
333 are resolved relative to the module that was current when the
334 procedure was created. This lookup occurs lazily, at the first time
335 the variable is actually referenced, and the location of the lookup is
336 cached so that future references are very cheap. @xref{Environment
337 Control Instructions}, for more details.
338
339 Then we see a reference to an external variable, corresponding to
340 @code{a}. The disassembler doesn't have enough information to give a
341 name to that variable, so it just marks it as being a ``closure
342 variable''. Finally we see the reference to @code{b}, then the
343 @code{list} opcode, an inline implementation of the @code{list} scheme
344 routine.
345
346 @node Instruction Set
347 @subsection Instruction Set
348
349 There are about 150 instructions in Guile's virtual machine. These
350 instructions represent atomic units of a program's execution. Ideally,
351 they perform one task without conditional branches, then dispatch to
352 the next instruction in the stream.
353
354 Instructions themselves are one byte long. Some instructions take
355 parameters, which follow the instruction byte in the instruction
356 stream.
357
358 Sometimes the compiler can figure out that it is compiling a special
359 case that can be run more efficiently. So, for example, while Guile
360 offers a generic test-and-branch instruction, it also offers specific
361 instructions for special cases, so that the following cases all have
362 their own test-and-branch instructions:
363
364 @example
365 (if pred then else)
366 (if (not pred) then else)
367 (if (null? l) then else)
368 (if (not (null? l)) then else)
369 @end example
370
371 In addition, some Scheme primitives have their own inline
372 implementations, e.g. @code{cons}, and @code{list}, as we saw in the
373 previous section.
374
375 So Guile's instruction set is a @emph{complete} instruction set, in
376 that it provides the instructions that are suited to the problem, and
377 is not concerned with making a minimal, orthogonal set of
378 instructions. More instructions may be added over time.
379
380 @menu
381 * Environment Control Instructions::
382 * Branch Instructions::
383 * Loading Instructions::
384 * Procedural Instructions::
385 * Data Control Instructions::
386 * Miscellaneous Instructions::
387 * Inlined Scheme Instructions::
388 * Inlined Mathematical Instructions::
389 * Inlined Bytevector Instructions::
390 @end menu
391
392 @node Environment Control Instructions
393 @subsubsection Environment Control Instructions
394
395 These instructions access and mutate the environment of a compiled
396 procedure---the local bindings, the free (captured) bindings, and the
397 toplevel bindings.
398
399 Some of these instructions have @code{long-} variants, the difference
400 being that they take 16-bit arguments, encoded in big-endianness,
401 instead of the normal 8-bit range.
402
403 @deffn Instruction local-ref index
404 @deffnx Instruction long-local-ref index
405 Push onto the stack the value of the local variable located at
406 @var{index} within the current stack frame.
407
408 Note that arguments and local variables are all in one block. Thus the
409 first argument, if any, is at index 0, and local bindings follow the
410 arguments.
411 @end deffn
412
413 @deffn Instruction local-set index
414 @deffnx Instruction long-local-ref index
415 Pop the Scheme object located on top of the stack and make it the new
416 value of the local variable located at @var{index} within the current
417 stack frame.
418 @end deffn
419
420 @deffn Instruction free-ref index
421 Push the value of the captured variable located at position
422 @var{index} within the program's vector of captured variables.
423 @end deffn
424
425 @deffn Instruction free-boxed-ref index
426 @deffnx Instruction free-boxed-set index
427 Get or set a boxed free variable. Note that there is no free-set
428 instruction, as variables that are @code{set!} must be boxed.
429
430 These instructions assume that the value at position @var{index} in
431 the free variables vector is a variable.
432 @end deffn
433
434 @deffn Instruction make-closure
435 Pop a vector and a program object off the stack, in that order, and
436 push a new program object with the given free variables vector. The
437 new program object shares state with the original program.
438
439 At the time of this writing, the space overhead of closures is 4 words
440 per closure.
441 @end deffn
442
443 @deffn Instruction fix-closure index
444 Pop a vector off the stack, and set it as the @var{index}th local
445 variable's free variable vector. The @var{index}th local variable is
446 assumed to be a procedure.
447
448 This instruction is part of a hack for allocating mutually recursive
449 procedures. The hack is to first perform a @code{local-set} for all of
450 the recursive procedures, then fix up the procedures' free variable
451 bindings in place. This allows most @code{letrec}-bound procedures to
452 be allocated unboxed on the stack.
453
454 One could of course do a @code{local-ref}, then @code{make-closure},
455 then @code{local-set}, but this macroinstruction helps to speed up the
456 common case.
457 @end deffn
458
459 @deffn Instruction box index
460 Pop a value off the stack, and set the @var{index}nth local variable
461 to a box containing that value. A shortcut for @code{make-variable}
462 then @code{local-set}, used when binding boxed variables.
463 @end deffn
464
465 @deffn Instruction empty-box index
466 Set the @var{indext}h local variable to a box containing a variable
467 whose value is unbound. Used when compiling some @code{letrec}
468 expressions.
469 @end deffn
470
471 @deffn Instruction toplevel-ref index
472 @deffnx Instruction long-toplevel-ref index
473 Push the value of the toplevel binding whose location is stored in at
474 position @var{index} in the object table.
475
476 Initially, a cell in the object table that is used by
477 @code{toplevel-ref} is initialized to one of two forms. The normal
478 case is that the cell holds a symbol, whose binding will be looked up
479 relative to the module that was current when the current program was
480 created.
481
482 Alternately, the lookup may be performed relative to a particular
483 module, determined at compile-time (e.g. via @code{@@} or
484 @code{@@@@}). In that case, the cell in the object table holds a list:
485 @code{(@var{modname} @var{sym} @var{public?})}. The symbol @var{sym}
486 will be looked up in the module named @var{modname} (a list of
487 symbols). The lookup will be performed against the module's public
488 interface, unless @var{public?} is @code{#f}, which it is for example
489 when compiling @code{@@@@}.
490
491 In any case, if the symbol is unbound, an error is signalled.
492 Otherwise the initial form is replaced with the looked-up variable, an
493 in-place mutation of the object table. This mechanism provides for
494 lazy variable resolution, and an important cached fast-path once the
495 variable has been successfully resolved.
496
497 This instruction pushes the value of the variable onto the stack.
498 @end deffn
499
500 @deffn Instruction toplevel-set index
501 @deffnx Instruction long-toplevel-set index
502 Pop a value off the stack, and set it as the value of the toplevel
503 variable stored at @var{index} in the object table. If the variable
504 has not yet been looked up, we do the lookup as in
505 @code{toplevel-ref}.
506 @end deffn
507
508 @deffn Instruction define
509 Pop a symbol and a value from the stack, in that order. Look up its
510 binding in the current toplevel environment, creating the binding if
511 necessary. Set the variable to the value.
512 @end deffn
513
514 @deffn Instruction link-now
515 Pop a value, @var{x}, from the stack. Look up the binding for @var{x},
516 according to the rules for @code{toplevel-ref}, and push that variable
517 on the stack. If the lookup fails, an error will be signalled.
518
519 This instruction is mostly used when loading programs, because it can
520 do toplevel variable lookups without an object vector.
521 @end deffn
522
523 @deffn Instruction variable-ref
524 Dereference the variable object which is on top of the stack and
525 replace it by the value of the variable it represents.
526 @end deffn
527
528 @deffn Instruction variable-set
529 Pop off two objects from the stack, a variable and a value, and set
530 the variable to the value.
531 @end deffn
532
533 @deffn Instruction make-variable
534 Replace the top object on the stack with a variable containing it.
535 Used in some circumstances when compiling @code{letrec} expressions.
536 @end deffn
537
538 @deffn Instruction object-ref n
539 @deffnx Instruction long-object-ref n
540 Push @var{n}th value from the current program's object vector. The
541 ``long'' variant has a 16-bit index instead of an 8-bit index.
542 @end deffn
543
544 @node Branch Instructions
545 @subsubsection Branch Instructions
546
547 All the conditional branch instructions described below work in the
548 same way:
549
550 @itemize
551 @item They pop off the Scheme object located on the stack and use it as
552 the branch condition;
553 @item If the condition is true, then the instruction pointer is
554 increased by the offset passed as an argument to the branch
555 instruction;
556 @item Program execution proceeds with the next instruction (that is,
557 the one to which the instruction pointer points).
558 @end itemize
559
560 Note that the offset passed to the instruction is encoded on two 8-bit
561 integers which are then combined by the VM as one 16-bit integer. Note
562 also that jump targets in Guile are aligned on 8-byte boundaries, and
563 that the offset refers to the @var{n}th 8-byte boundary, effectively
564 giving Guile a 19-bit relative address space.
565
566 @deffn Instruction br offset
567 Jump to @var{offset}.
568 @end deffn
569
570 @deffn Instruction br-if offset
571 Jump to @var{offset} if the condition on the stack is not false.
572 @end deffn
573
574 @deffn Instruction br-if-not offset
575 Jump to @var{offset} if the condition on the stack is false.
576 @end deffn
577
578 @deffn Instruction br-if-eq offset
579 Jump to @var{offset} if the two objects located on the stack are
580 equal in the sense of @var{eq?}. Note that, for this instruction, the
581 stack pointer is decremented by two Scheme objects instead of only
582 one.
583 @end deffn
584
585 @deffn Instruction br-if-not-eq offset
586 Same as @var{br-if-eq} for non-@code{eq?} objects.
587 @end deffn
588
589 @deffn Instruction br-if-null offset
590 Jump to @var{offset} if the object on the stack is @code{'()}.
591 @end deffn
592
593 @deffn Instruction br-if-not-null offset
594 Jump to @var{offset} if the object on the stack is not @code{'()}.
595 @end deffn
596
597
598 @node Loading Instructions
599 @subsubsection Loading Instructions
600
601 In addition to VM instructions, an instruction stream may contain
602 variable-length data embedded within it. This data is always preceded
603 by special loading instructions, which interpret the data and advance
604 the instruction pointer to the next VM instruction.
605
606 All of these loading instructions have a @code{length} parameter,
607 indicating the size of the embedded data, in bytes. The length itself
608 is encoded in 3 bytes.
609
610 @deffn Instruction load-number length
611 Load an arbitrary number from the instruction stream. The number is
612 embedded in the stream as a string.
613 @end deffn
614 @deffn Instruction load-string length
615 Load a string from the instruction stream. The string is assumed to be
616 encoded in the ``latin1'' locale.
617 @end deffn
618 @deffn Instruction load-wide-string length
619 Load a UTF-32 string from the instruction stream. @var{length} is the
620 length in bytes, not in codepoints
621 @end deffn
622 @deffn Instruction load-symbol length
623 Load a symbol from the instruction stream. The symbol is assumed to be
624 encoded in the ``latin1'' locale. Symbols backed by wide strings may
625 be loaded via @code{load-wide-string} then @code{make-symbol}.
626 @end deffn
627 @deffn Instruction load-array length
628 Load a uniform array from the instruction stream. The shape and type
629 of the array are popped off the stack, in that order.
630 @end deffn
631
632 @deffn Instruction load-program
633 Load bytecode from the instruction stream, and push a compiled
634 procedure.
635
636 This instruction pops one value from the stack: the program's object
637 table, as a vector, or @code{#f} in the case that the program has no
638 object table. A program that does not reference toplevel bindings and
639 does not use @code{object-ref} does not need an object table.
640
641 This instruction is unlike the rest of the loading instructions,
642 because instead of parsing its data, it directly maps the instruction
643 stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
644 and Objcode}, for more information.
645
646 The resulting compiled procedure will not have any free variables
647 captured, so it may be loaded only once but used many times to create
648 closures.
649 @end deffn
650
651 @node Procedural Instructions
652 @subsubsection Procedural Instructions
653
654 @deffn Instructions new-frame
655 Push a new frame on the stack, reserving space for the dynamic link,
656 return address, and the multiple-values return address. The frame
657 pointer is not yet updated, because the frame is not yet active -- it
658 has to be patched by a @code{call} instruction to get the return
659 address.
660 @end deffn
661
662 @deffn Instruction call nargs
663 Call the procedure located at @code{sp[-nargs]} with the @var{nargs}
664 arguments located from @code{sp[-nargs + 1]} to @code{sp[0]}.
665
666 This instruction requires that a new frame be pushed on the stack
667 before the procedure, via @code{new-frame}. @xref{Stack Layout}, for
668 more information. It patches up that frame with the current @code{ip}
669 as the return address, then dispatches to the first instruction in the
670 called procedure, relying on the called procedure to return one value
671 to the newly-created continuation. Because the new frame pointer will
672 point to sp[-nargs + 1], the arguments don't have to be shuffled
673 around -- they are already in place.
674
675 For non-compiled procedures (continuations, primitives, and
676 interpreted procedures), @code{call} will pop the frame, procedure,
677 and arguments off the stack, and push the result of calling
678 @code{scm_apply}.
679 @end deffn
680
681 @deffn Instruction goto/args nargs
682 Like @code{call}, but reusing the current continuation. This
683 instruction implements tail calls as required by RnRS.
684
685 For compiled procedures, that means that @code{goto/args} simply
686 shuffles down the procedure and arguments to the current stack frame.
687 The @code{goto/*} instruction family is named as it is because tail
688 calls are equivalent to @code{goto}, along with relabeled variables.
689
690 For non-VM procedures, the result is the same, but the current VM
691 invocation remains on the C stack. True tail calls are not currently
692 possible between compiled and non-compiled procedures.
693 @end deffn
694
695 @deffn Instruction apply nargs
696 @deffnx Instruction goto/apply nargs
697 Like @code{call} and @code{goto/args}, except that the top item on the
698 stack must be a list. The elements of that list are then pushed on the
699 stack and treated as additional arguments, replacing the list itself,
700 then the procedure is invoked as usual.
701 @end deffn
702
703 @deffn Instruction call/nargs
704 @deffnx Instruction goto/nargs
705 These are like @code{call} and @code{goto/args}, except they take the
706 number of arguments from the stack instead of the instruction stream.
707 These instructions are used in the implementation of multiple value
708 returns, where the actual number of values is pushed on the stack.
709 @end deffn
710
711 @deffn Instruction mv-call nargs offset
712 Like @code{call}, except that a multiple-value continuation is created
713 in addition to a single-value continuation.
714
715 The offset (a two-byte value) is an offset within the instruction
716 stream; the multiple-value return address in the new frame
717 (@pxref{Stack Layout}) will be set to the normal return address plus
718 this offset. Instructions at that offset will expect the top value of
719 the stack to be the number of values, and below that values
720 themselves, pushed separately.
721 @end deffn
722
723 @deffn Instruction return
724 Free the program's frame, returning the top value from the stack to
725 the current continuation. (The stack should have exactly one value on
726 it.)
727
728 Specifically, the @code{sp} is decremented to one below the current
729 @code{fp}, the @code{ip} is reset to the current return address, the
730 @code{fp} is reset to the value of the current dynamic link, and then
731 the top item on the stack (formerly the procedure being applied) is
732 set to the returned value.
733 @end deffn
734
735 @deffn Instruction return/values nvalues
736 Return the top @var{nvalues} to the current continuation.
737
738 If the current continuation is a multiple-value continuation,
739 @code{return/values} pushes the number of values on the stack, then
740 returns as in @code{return}, but to the multiple-value return address.
741
742 Otherwise if the current continuation accepts only one value, i.e. the
743 multiple-value return address is @code{NULL}, then we assume the user
744 only wants one value, and we give them the first one. If there are no
745 values, an error is signaled.
746 @end deffn
747
748 @deffn Instruction return/values* nvalues
749 Like a combination of @code{apply} and @code{return/values}, in which
750 the top value on the stack is interpreted as a list of additional
751 values. This is an optimization for the common @code{(apply values
752 ...)} case.
753 @end deffn
754
755 @deffn Instruction truncate-values nbinds nrest
756 Used in multiple-value continuations, this instruction takes the
757 values that are on the stack (including the number-of-values marker)
758 and truncates them for a binding construct.
759
760 For example, a call to @code{(receive (x y . z) (foo) ...)} would,
761 logically speaking, pop off the values returned from @code{(foo)} and
762 push them as three values, corresponding to @code{x}, @code{y}, and
763 @code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would
764 be 1 (to indicate that one of the bindings was a rest argument).
765
766 Signals an error if there is an insufficient number of values.
767 @end deffn
768
769 @deffn Instruction call/cc
770 @deffnx Instruction goto/cc
771 Capture the current continuation, and then call (or tail-call) the
772 procedure on the top of the stack, with the continuation as the
773 argument.
774
775 @code{call/cc} does not require a @code{new-frame} to be pushed on the
776 stack, as @code{call} does, because it needs to capture the stack
777 before the frame is pushed.
778
779 Both the VM continuation and the C continuation are captured.
780 @end deffn
781
782 @node Data Control Instructions
783 @subsubsection Data Control Instructions
784
785 These instructions push simple immediate values onto the stack, or
786 manipulate lists and vectors on the stack.
787
788 @deffn Instruction make-int8 value
789 Push @var{value}, an 8-bit integer, onto the stack.
790 @end deffn
791
792 @deffn Instruction make-int8:0
793 Push the immediate value @code{0} onto the stack.
794 @end deffn
795
796 @deffn Instruction make-int8:1
797 Push the immediate value @code{1} onto the stack.
798 @end deffn
799
800 @deffn Instruction make-int16 value
801 Push @var{value}, a 16-bit integer, onto the stack.
802 @end deffn
803
804 @deffn Instruction make-uint64 value
805 Push @var{value}, an unsigned 64-bit integer, onto the stack. The
806 value is encoded in 8 bytes, most significant byte first (big-endian).
807 @end deffn
808
809 @deffn Instruction make-int64 value
810 Push @var{value}, a signed 64-bit integer, onto the stack. The value
811 is encoded in 8 bytes, most significant byte first (big-endian), in
812 twos-complement arithmetic.
813 @end deffn
814
815 @deffn Instruction make-false
816 Push @code{#f} onto the stack.
817 @end deffn
818
819 @deffn Instruction make-true
820 Push @code{#t} onto the stack.
821 @end deffn
822
823 @deffn Instruction make-eol
824 Push @code{'()} onto the stack.
825 @end deffn
826
827 @deffn Instruction make-char8 value
828 Push @var{value}, an 8-bit character, onto the stack.
829 @end deffn
830
831 @deffn Instruction make-char32 value
832 Push @var{value}, an 32-bit character, onto the stack. The value is
833 encoded in big-endian order.
834 @end deffn
835
836 @deffn Instruction make-symbol
837 Pops a string off the stack, and pushes a symbol.
838 @end deffn
839
840 @deffn Instruction make-keyword value
841 Pops a symbol off the stack, and pushes a keyword.
842 @end deffn
843
844 @deffn Instruction list n
845 Pops off the top @var{n} values off of the stack, consing them up into
846 a list, then pushes that list on the stack. What was the topmost value
847 will be the last element in the list. @var{n} is a two-byte value,
848 most significant byte first.
849 @end deffn
850
851 @deffn Instruction vector n
852 Create and fill a vector with the top @var{n} values from the stack,
853 popping off those values and pushing on the resulting vector. @var{n}
854 is a two-byte value, like in @code{vector}.
855 @end deffn
856
857 @node Miscellaneous Instructions
858 @subsubsection Miscellaneous Instructions
859
860 @deffn Instruction nop
861 Does nothing! Used for padding other instructions to certain
862 alignments.
863 @end deffn
864
865 @deffn Instruction halt
866 Exits the VM, returning a SCM value. Normally, this instruction is
867 only part of the ``bootstrap program'', a program run when a virtual
868 machine is first entered; compiled Scheme procedures will not contain
869 this instruction.
870
871 If multiple values have been returned, the SCM value will be a
872 multiple-values object (@pxref{Multiple Values}).
873 @end deffn
874
875 @deffn Instruction break
876 Does nothing, but invokes the break hook.
877 @end deffn
878
879 @deffn Instruction drop
880 Pops off the top value from the stack, throwing it away.
881 @end deffn
882
883 @deffn Instruction dup
884 Re-pushes the top value onto the stack.
885 @end deffn
886
887 @deffn Instruction void
888 Pushes ``the unspecified value'' onto the stack.
889 @end deffn
890
891 @node Inlined Scheme Instructions
892 @subsubsection Inlined Scheme Instructions
893
894 The Scheme compiler can recognize the application of standard Scheme
895 procedures. It tries to inline these small operations to avoid the
896 overhead of creating new stack frames.
897
898 Since most of these operations are historically implemented as C
899 primitives, not inlining them would entail constantly calling out from
900 the VM to the interpreter, which has some costs---registers must be
901 saved, the interpreter has to dispatch, called procedures have to do
902 much typechecking, etc. It's much more efficient to inline these
903 operations in the virtual machine itself.
904
905 All of these instructions pop their arguments from the stack and push
906 their results, and take no parameters from the instruction stream.
907 Thus, unlike in the previous sections, these instruction definitions
908 show stack parameters instead of parameters from the instruction
909 stream.
910
911 @deffn Instruction not x
912 @deffnx Instruction not-not x
913 @deffnx Instruction eq? x y
914 @deffnx Instruction not-eq? x y
915 @deffnx Instruction null?
916 @deffnx Instruction not-null?
917 @deffnx Instruction eqv? x y
918 @deffnx Instruction equal? x y
919 @deffnx Instruction pair? x y
920 @deffnx Instruction list? x
921 @deffnx Instruction set-car! pair x
922 @deffnx Instruction set-cdr! pair x
923 @deffnx Instruction slot-ref struct n
924 @deffnx Instruction slot-set struct n x
925 @deffnx Instruction cons x y
926 @deffnx Instruction car x
927 @deffnx Instruction cdr x
928 @deffnx Instruction vector-ref x y
929 @deffnx Instruction vector-set x n y
930 Inlined implementations of their Scheme equivalents.
931 @end deffn
932
933 Note that @code{caddr} and friends compile to a series of @code{car}
934 and @code{cdr} instructions.
935
936 @node Inlined Mathematical Instructions
937 @subsubsection Inlined Mathematical Instructions
938
939 Inlining mathematical operations has the obvious advantage of handling
940 fixnums without function calls or allocations. The trick, of course,
941 is knowing when the result of an operation will be a fixnum, and there
942 might be a couple bugs here.
943
944 More instructions could be added here over time.
945
946 As in the previous section, the definitions below show stack
947 parameters instead of instruction stream parameters.
948
949 @deffn Instruction add x y
950 @deffnx Instruction add1 x
951 @deffnx Instruction sub x y
952 @deffnx Instruction sub1 x
953 @deffnx Instruction mul x y
954 @deffnx Instruction div x y
955 @deffnx Instruction quo x y
956 @deffnx Instruction rem x y
957 @deffnx Instruction mod x y
958 @deffnx Instruction ee? x y
959 @deffnx Instruction lt? x y
960 @deffnx Instruction gt? x y
961 @deffnx Instruction le? x y
962 @deffnx Instruction ge? x y
963 Inlined implementations of the corresponding mathematical operations.
964 @end deffn
965
966 @node Inlined Bytevector Instructions
967 @subsubsection Inlined Bytevector Instructions
968
969 Bytevector operations correspond closely to what the current hardware
970 can do, so it makes sense to inline them to VM instructions, providing
971 a clear path for eventual native compilation. Without this, Scheme
972 programs would need other primitives for accessing raw bytes -- but
973 these primitives are as good as any.
974
975 As in the previous section, the definitions below show stack
976 parameters instead of instruction stream parameters.
977
978 The multibyte formats (@code{u16}, @code{f64}, etc) take an extra
979 endianness argument. Only aligned native accesses are currently
980 fast-pathed in Guile's VM.
981
982 @deffn Instruction bv-u8-ref bv n
983 @deffnx Instruction bv-s8-ref bv n
984 @deffnx Instruction bv-u16-native-ref bv n
985 @deffnx Instruction bv-s16-native-ref bv n
986 @deffnx Instruction bv-u32-native-ref bv n
987 @deffnx Instruction bv-s32-native-ref bv n
988 @deffnx Instruction bv-u64-native-ref bv n
989 @deffnx Instruction bv-s64-native-ref bv n
990 @deffnx Instruction bv-f32-native-ref bv n
991 @deffnx Instruction bv-f64-native-ref bv n
992 @deffnx Instruction bv-u16-ref bv n endianness
993 @deffnx Instruction bv-s16-ref bv n endianness
994 @deffnx Instruction bv-u32-ref bv n endianness
995 @deffnx Instruction bv-s32-ref bv n endianness
996 @deffnx Instruction bv-u64-ref bv n endianness
997 @deffnx Instruction bv-s64-ref bv n endianness
998 @deffnx Instruction bv-f32-ref bv n endianness
999 @deffnx Instruction bv-f64-ref bv n endianness
1000 @deffnx Instruction bv-u8-set bv n val
1001 @deffnx Instruction bv-s8-set bv n val
1002 @deffnx Instruction bv-u16-native-set bv n val
1003 @deffnx Instruction bv-s16-native-set bv n val
1004 @deffnx Instruction bv-u32-native-set bv n val
1005 @deffnx Instruction bv-s32-native-set bv n val
1006 @deffnx Instruction bv-u64-native-set bv n val
1007 @deffnx Instruction bv-s64-native-set bv n val
1008 @deffnx Instruction bv-f32-native-set bv n val
1009 @deffnx Instruction bv-f64-native-set bv n val
1010 @deffnx Instruction bv-u16-set bv n val endianness
1011 @deffnx Instruction bv-s16-set bv n val endianness
1012 @deffnx Instruction bv-u32-set bv n val endianness
1013 @deffnx Instruction bv-s32-set bv n val endianness
1014 @deffnx Instruction bv-u64-set bv n val endianness
1015 @deffnx Instruction bv-s64-set bv n val endianness
1016 @deffnx Instruction bv-f32-set bv n val endianness
1017 @deffnx Instruction bv-f64-set bv n val endianness
1018 Inlined implementations of the corresponding bytevector operations.
1019 @end deffn