doc/ref/vm.texi

   1 @c -*-texinfo-*-
   2 @c This is part of the GNU Guile Reference Manual.
   3 @c Copyright (C)  2008,2009
   4 @c   Free Software Foundation, Inc.
   5 @c See the file guile.texi for copying conditions.
   6
   7 @node A Virtual Machine for Guile
   8 @section A Virtual Machine for Guile
   9
  10 Guile has both an interpreter and a compiler. To a user, the
  11 difference is largely transparent---interpreted and compiled
  12 procedures can call each other as they please.
  13
  14 The difference is that the compiler creates and interprets bytecode
  15 for a custom virtual machine, instead of interpreting the
  16 S-expressions directly. Running compiled code is faster than running
  17 interpreted code.
  18
  19 The virtual machine that does the bytecode interpretation is a part of
  20 Guile itself. This section describes the nature of Guile's virtual
  21 machine.
  22
  23 @menu
  24 * Why a VM?::
  25 * VM Concepts::
  26 * Stack Layout::
  27 * Variables and the VM::
  28 * VM Programs::
  29 * Instruction Set::
  30 @end menu
  31
  32 @node Why a VM?
  33 @subsection Why a VM?
  34
  35 @cindex interpreter
  36 @cindex evaluator
  37 For a long time, Guile only had an interpreter, called the
  38 @dfn{evaluator}. Guile's evaluator operates directly on the
  39 S-expression representation of Scheme source code.
  40
  41 But while the evaluator is highly optimized and hand-tuned, and
  42 contains some extensive speed trickery (@pxref{Memoization}), it still
  43 performs many needless computations during the course of evaluating an
  44 expression. For example, application of a function to arguments
  45 needlessly conses up the arguments in a list. Evaluation of an
  46 expression always has to figure out what the car of the expression is
  47 -- a procedure, a memoized form, or something else. All values have to
  48 be allocated on the heap. Et cetera.
  49
  50 The solution to this problem is to compile the higher-level language,
  51 Scheme, into a lower-level language for which all of the checks and
  52 dispatching have already been done---the code is instead stripped to
  53 the bare minimum needed to ``do the job''.
  54
  55 The question becomes then, what low-level language to choose? There
  56 are many options. We could compile to native code directly, but that
  57 poses portability problems for Guile, as it is a highly cross-platform
  58 project.
  59
  60 So we want the performance gains that compilation provides, but we
  61 also want to maintain the portability benefits of a single code path.
  62 The obvious solution is to compile to a virtual machine that is
  63 present on all Guile installations.
  64
  65 The easiest (and most fun) way to depend on a virtual machine is to
  66 implement the virtual machine within Guile itself. This way the
  67 virtual machine provides what Scheme needs (tail calls, multiple
  68 values, @code{call/cc}) and can provide optimized inline instructions
  69 for Guile (@code{cons}, @code{struct-ref}, etc.).
  70
  71 So this is what Guile does. The rest of this section describes that VM
  72 that Guile implements, and the compiled procedures that run on it.
  73
  74 Note that this decision to implement a bytecode compiler does not
  75 preclude native compilation. We can compile from bytecode to native
  76 code at runtime, or even do ahead of time compilation. More
  77 possibilities are discussed in @ref{Extending the Compiler}.
  78
  79 @node VM Concepts
  80 @subsection VM Concepts
  81
  82 A virtual machine (VM) is a Scheme object. Users may create virtual
  83 machines using the standard procedures described later in this manual,
  84 but that is usually unnecessary, as Guile ensures that there is one
  85 virtual machine per thread. When a VM-compiled procedure is run, Guile
  86 looks up the virtual machine for the current thread and executes the
  87 procedure using that VM.
  88
  89 Guile's virtual machine is a stack machine---that is, it has few
  90 registers, and the instructions defined in the VM operate by pushing
  91 and popping values from a stack.
  92
  93 Stack memory is exclusive to the virtual machine that owns it. In
  94 addition to their stacks, virtual machines also have access to the
  95 global memory (modules, global bindings, etc) that is shared among
  96 other parts of Guile, including other VMs.
  97
  98 A VM has generic instructions, such as those to reference local
  99 variables, and instructions designed to support Guile's languages --
 100 mathematical instructions that support the entire numerical tower, an
 101 inlined implementation of @code{cons}, etc.
 102
 103 The registers that a VM has are as follows:
 104
 105 @itemize
 106 @item ip - Instruction pointer
 107 @item sp - Stack pointer
 108 @item fp - Frame pointer
 109 @end itemize
 110
 111 In other architectures, the instruction pointer is sometimes called
 112 the ``program counter'' (pc). This set of registers is pretty typical
 113 for stack machines; their exact meanings in the context of Guile's VM
 114 are described in the next section.
 115
 116 A virtual machine executes by loading a compiled procedure, and
 117 executing the object code associated with that procedure. Of course,
 118 that procedure may call other procedures, tail-call others, ad
 119 infinitum---indeed, within a guile whose modules have all been
 120 compiled to object code, one might never leave the virtual machine.
 121
 122 @c wingo: The following is true, but I don't know in what context to
 123 @c describe it. A documentation FIXME.
 124
 125 @c A VM may have one of three engines: reckless, regular, or debugging.
 126 @c Reckless engine is fastest but dangerous.  Regular engine is normally
 127 @c fail-safe and reasonably fast.  Debugging engine is safest and
 128 @c functional but very slow.
 129
 130 @c (Actually we have just a regular and a debugging engine; normally
 131 @c we use the latter, it's almost as fast as the ``regular'' engine.)
 132
 133 @node Stack Layout
 134 @subsection Stack Layout
 135
 136 While not strictly necessary to understand how to work with the VM, it
 137 is instructive and sometimes entertaining to consider the struture of
 138 the VM stack.
 139
 140 Logically speaking, a VM stack is composed of ``frames''. Each frame
 141 corresponds to the application of one compiled procedure, and contains
 142 storage space for arguments, local variables, intermediate values, and
 143 some bookkeeping information (such as what to do after the frame
 144 computes its value).
 145
 146 While the compiler is free to do whatever it wants to, as long as the
 147 semantics of a computation are preserved, in practice every time you
 148 call a function, a new frame is created. (The notable exception of
 149 course is the tail call case, @pxref{Tail Calls}.)
 150
 151 Within a frame, you have the data associated with the function
 152 application itself, which is of a fixed size, and the stack space for
 153 intermediate values. Sometimes only the former is referred to as the
 154 ``frame'', and the latter is the ``stack'', although all pending
 155 application frames can have some intermediate computations interleaved
 156 on the stack.
 157
 158 The structure of the fixed part of an application frame is as follows:
 159
 160 @example
 161              Stack
 162    |                  | <- fp + bp->nargs + bp->nlocs + 4
 163    +------------------+    = SCM_FRAME_UPPER_ADDRESS (fp)
 164    | Return address   |
 165    | MV return address|
 166    | Dynamic link     |
 167    | External link    | <- fp + bp->nargs + bp->nlocs
 168    | Local variable 1 |    = SCM_FRAME_DATA_ADDRESS (fp)
 169    | Local variable 0 | <- fp + bp->nargs
 170    | Argument 1       |
 171    | Argument 0       | <- fp
 172    | Program          | <- fp - 1
 173    +------------------+    = SCM_FRAME_LOWER_ADDRESS (fp)
 174    |                  |
 175 @end example
 176
 177 In the above drawing, the stack grows upward. The intermediate values
 178 stored in the application of this frame are stored above
 179 @code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the
 180 @code{struct scm_objcode} data associated with the program at
 181 @code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the
 182 compiled procedure, which will be discussed later.
 183
 184 The individual fields of the frame are as follows:
 185
 186 @table @asis
 187 @item Return address
 188 The @code{ip} that was in effect before this program was applied. When
 189 we return from this activation frame, we will jump back to this
 190 @code{ip}.
 191
 192 @item MV return address
 193 The @code{ip} to return to if this application returns multiple
 194 values. For continuations that only accept one value, this value will
 195 be @code{NULL}; for others, it will be an @code{ip} that points to a
 196 multiple-value return address in the calling code. That code will
 197 expect the top value on the stack to be an integer---the number of
 198 values being returned---and that below that integer there are the
 199 values being returned.
 200
 201 @item Dynamic link
 202 This is the @code{fp} in effect before this program was applied. In
 203 effect, this and the return address are the registers that are always
 204 ``saved''.
 205
 206 @item External link
 207 This field is a reference to the list of heap-allocated variables
 208 associated with this frame. For a discussion of heap versus stack
 209 allocation, @xref{Variables and the VM}.
 210
 211 @item Local variable @var{n}
 212 Lambda-local variables that are allocated on the stack are all
 213 allocated as part of the frame. This makes access to non-captured,
 214 non-mutated variables very cheap.
 215
 216 @item Argument @var{n}
 217 The calling convention of the VM requires arguments of a function
 218 application to be pushed on the stack, and here they are. Normally
 219 references to arguments dispatch to these locations on the stack.
 220 However if an argument has to be stored on the heap, it will be copied
 221 from its initial value here onto a location in the heap, and
 222 thereafter only referenced on the heap.
 223
 224 @item Program
 225 This is the program being applied. For more information on how
 226 programs are implemented, @xref{VM Programs}.
 227 @end table
 228
 229 @node Variables and the VM
 230 @subsection Variables and the VM
 231
 232 Consider the following Scheme code as an example:
 233
 234 @example
 235   (define (foo a)
 236     (lambda (b) (list foo a b)))
 237 @end example
 238
 239 Within the lambda expression, "foo" is a top-level variable, "a" is a
 240 lexically captured variable, and "b" is a local variable.
 241
 242 @code{b} may safely be allocated on the stack, as there is no enclosed
 243 procedure that references it, nor is it ever mutated.
 244
 245 @code{a}, on the other hand, is referenced by an enclosed procedure,
 246 that of the lambda. Thus it must be allocated on the heap, as it may
 247 (and will) outlive the dynamic extent of the invocation of @code{foo}.
 248
 249 @code{foo} is a top-level variable, because it names the procedure
 250 @code{foo}, which is here defined at the top-level.
 251
 252 Note that variables that are mutated (via @code{set!}) must be
 253 allocated on the heap, even if they are local variables. This is
 254 because any called subprocedure might capture the continuation, which
 255 would need to capture locations instead of values. Thus perhaps
 256 counterintuitively, what would seem ``closer to the metal'', viz
 257 @code{set!}, actually forces heap allocation instead of stack
 258 allocation.
 259
 260 @node VM Programs
 261 @subsection Compiled Procedures are VM Programs
 262
 263 By default, when you enter in expressions at Guile's REPL, they are
 264 first compiled to VM object code, then that VM object code is executed
 265 to produce a value. If the expression evaluates to a procedure, the
 266 result of this process is a compiled procedure.
 267
 268 A compiled procedure is a compound object, consisting of its bytecode,
 269 a reference to any captured lexical variables, an object array, and
 270 some metadata such as the procedure's arity, name, and documentation.
 271 You can pick apart these pieces with the accessors in @code{(system vm
 272 program)}. @xref{Compiled Procedures}, for a full API reference.
 273
 274 @cindex object table
 275 @cindex object array
 276 The object array of a compiled procedure, also known as the
 277 @dfn{object table}, holds all Scheme objects whose values are known
 278 not to change across invocations of the procedure: constant strings,
 279 symbols, etc. The object table of a program is initialized right
 280 before a program is loaded with @code{load-program}.
 281 @xref{Loading Instructions}, for more information.
 282
 283 Variable objects are one such type of constant object: when a global
 284 binding is defined, a variable object is associated to it and that
 285 object will remain constant over time, even if the value bound to it
 286 changes. Therefore, toplevel bindings only need to be looked up once.
 287 Thereafter, references to the corresponding toplevel variables from
 288 within the program are then performed via the @code{toplevel-ref}
 289 instruction, which uses the object vector, and are almost as fast as
 290 local variable references.
 291
 292 We can see how these concepts tie together by disassembling the
 293 @code{foo} function we defined earlier to see what is going on:
 294
 295 @smallexample
 296 scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
 297 scheme@@(guile-user)> ,x foo
 298 Disassembly of #<program foo (a)>:
 299
 300    0    (local-ref 0)           ;; `a' (arg)
 301    2    (external-set 0)        ;; `a' (arg)
 302    4    (object-ref 1)          ;; #<program b70d2910 at <unknown port>:0:16 (b)>
 303    6    (make-closure)
 304    7    (return)
 305
 306 ----------------------------------------
 307 Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
 308
 309    0    (toplevel-ref 1)        ;; `foo'
 310    2    (external-ref 0)        ;; (closure variable)
 311    4    (local-ref 0)           ;; `b' (arg)
 312    6    (list 0 3)              ;; 3 elements         at (unknown file):0:28
 313    9    (return)
 314 @end smallexample
 315
 316 At @code{ip} 0 and 2, we do the copy from argument to heap for
 317 @code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
 318 @code{ip} 6 we make a closure---binding code (from the compiled
 319 lambda) with data (the heap-allocated variables). Finally we return
 320 the closure.
 321
 322 The second stanza disassembles the compiled lambda. Toplevel variables
 323 are resolved relative to the module that was current when the
 324 procedure was created. This lookup occurs lazily, at the first time
 325 the variable is actually referenced, and the location of the lookup is
 326 cached so that future references are very cheap. @xref{Environment
 327 Control Instructions}, for more details.
 328
 329 Then we see a reference to an external variable, corresponding to
 330 @code{a}. The disassembler doesn't have enough information to give a
 331 name to that variable, so it just marks it as being a ``closure
 332 variable''. Finally we see the reference to @code{b}, then the
 333 @code{list} opcode, an inline implementation of the @code{list} scheme
 334 routine.
 335
 336 @node Instruction Set
 337 @subsection Instruction Set
 338
 339 There are about 100 instructions in Guile's virtual machine. These
 340 instructions represent atomic units of a program's execution. Ideally,
 341 they perform one task without conditional branches, then dispatch to
 342 the next instruction in the stream.
 343
 344 Instructions themselves are one byte long. Some instructions take
 345 parameters, which follow the instruction byte in the instruction
 346 stream.
 347
 348 Sometimes the compiler can figure out that it is compiling a special
 349 case that can be run more efficiently. So, for example, while Guile
 350 offers a generic test-and-branch instruction, it also offers specific
 351 instructions for special cases, so that the following cases all have
 352 their own test-and-branch instructions:
 353
 354 @example
 355 (if pred then else)
 356 (if (not pred) then else)
 357 (if (null? l) then else)
 358 (if (not (null? l)) then else)
 359 @end example
 360
 361 In addition, some Scheme primitives have their own inline
 362 implementations, e.g. @code{cons}, and @code{list}, as we saw in the
 363 previous section.
 364
 365 So Guile's instruction set is a @emph{complete} instruction set, in
 366 that it provides the instructions that are suited to the problem, and
 367 is not concerned with making a minimal, orthogonal set of
 368 instructions. More instructions may be added over time.
 369
 370 @menu
 371 * Environment Control Instructions::
 372 * Branch Instructions::
 373 * Loading Instructions::
 374 * Procedural Instructions::
 375 * Data Control Instructions::
 376 * Miscellaneous Instructions::
 377 * Inlined Scheme Instructions::
 378 * Inlined Mathematical Instructions::
 379 @end menu
 380
 381 @node Environment Control Instructions
 382 @subsubsection Environment Control Instructions
 383
 384 These instructions access and mutate the environment of a compiled
 385 procedure---the local bindings, the ``external'' bindings, and the
 386 toplevel bindings.
 387
 388 @deffn Instruction local-ref index
 389 Push onto the stack the value of the local variable located at
 390 @var{index} within the current stack frame.
 391
 392 Note that arguments and local variables are all in one block. Thus the
 393 first argument, if any, is at index 0, and local bindings follow the
 394 arguments.
 395 @end deffn
 396
 397 @deffn Instruction local-set index
 398 Pop the Scheme object located on top of the stack and make it the new
 399 value of the local variable located at @var{index} within the current
 400 stack frame.
 401 @end deffn
 402
 403 @deffn Instruction external-ref index
 404 Push the value of the closure variable located at position
 405 @var{index} within the program's list of external variables.
 406 @end deffn
 407
 408 @deffn Instruction external-set index
 409 Pop the Scheme object located on top of the stack and make it the new
 410 value of the closure variable located at @var{index} within the
 411 program's list of external variables.
 412 @end deffn
 413
 414 The external variable lookup algorithm should probably be made more
 415 efficient in the future via addressing by frame and index. Currently,
 416 external variables are all consed onto a list, which results in O(N)
 417 lookup time.
 418
 419 @deffn Instruction toplevel-ref index
 420 @deffnx Instruction long-toplevel-ref index
 421 Push the value of the toplevel binding whose location is stored in at
 422 position @var{index} in the object table.
 423
 424 Initially, a cell in the object table that is used by
 425 @code{toplevel-ref} is initialized to one of two forms. The normal
 426 case is that the cell holds a symbol, whose binding will be looked up
 427 relative to the module that was current when the current program was
 428 created.
 429
 430 Alternately, the lookup may be performed relative to a particular
 431 module, determined at compile-time (e.g. via @code{@@} or
 432 @code{@@@@}). In that case, the cell in the object table holds a list:
 433 @code{(@var{modname} @var{sym} @var{public?})}. The symbol @var{sym}
 434 will be looked up in the module named @var{modname} (a list of
 435 symbols). The lookup will be performed against the module's public
 436 interface, unless @var{public?} is @code{#f}, which it is for example
 437 when compiling @code{@@@@}.
 438
 439 In any case, if the symbol is unbound, an error is signalled.
 440 Otherwise the initial form is replaced with the looked-up variable, an
 441 in-place mutation of the object table. This mechanism provides for
 442 lazy variable resolution, and an important cached fast-path once the
 443 variable has been successfully resolved.
 444
 445 The ``long'' variant has a 16-bit index instead of an 8-bit index,
 446 with the most significant byte first.
 447
 448 This instruction pushes the value of the variable onto the stack.
 449 @end deffn
 450
 451 @deffn Instruction toplevel-set index
 452 @deffnx Instruction long-toplevel-set index
 453 Pop a value off the stack, and set it as the value of the toplevel
 454 variable stored at @var{index} in the object table. If the variable
 455 has not yet been looked up, we do the lookup as in
 456 @code{toplevel-ref}. The ``long'' variant has a 16-bit index instead
 457 of an 8-bit index.
 458 @end deffn
 459
 460 @deffn Instruction link-now
 461 Pop a value, @var{x}, from the stack. Look up the binding for @var{x},
 462 according to the rules for @code{toplevel-ref}, and push that variable
 463 on the stack. If the lookup fails, an error will be signalled.
 464
 465 This instruction is mostly used when loading programs, because it can
 466 do toplevel variable lookups without an object vector.
 467 @end deffn
 468
 469 @deffn Instruction variable-ref
 470 Dereference the variable object which is on top of the stack and
 471 replace it by the value of the variable it represents.
 472 @end deffn
 473
 474 @deffn Instruction variable-set
 475 Pop off two objects from the stack, a variable and a value, and set
 476 the variable to the value.
 477 @end deffn
 478
 479 @deffn Instruction object-ref n
 480 @deffnx Instruction long-object-ref n
 481 Push @var{n}th value from the current program's object vector. The
 482 ``long'' variant has a 16-bit index instead of an 8-bit index.
 483 @end deffn
 484
 485 @node Branch Instructions
 486 @subsubsection Branch Instructions
 487
 488 All the conditional branch instructions described below work in the
 489 same way:
 490
 491 @itemize
 492 @item They pop off the Scheme object located on the stack and use it as
 493 the branch condition;
 494 @item If the condition is true, then the instruction pointer is
 495 increased by the offset passed as an argument to the branch
 496 instruction;
 497 @item Program execution proceeds with the next instruction (that is,
 498 the one to which the instruction pointer points).
 499 @end itemize
 500
 501 Note that the offset passed to the instruction is encoded on two 8-bit
 502 integers which are then combined by the VM as one 16-bit integer.
 503
 504 @deffn Instruction br offset
 505 Jump to @var{offset}.
 506 @end deffn
 507
 508 @deffn Instruction br-if offset
 509 Jump to @var{offset} if the condition on the stack is not false.
 510 @end deffn
 511
 512 @deffn Instruction br-if-not offset
 513 Jump to @var{offset} if the condition on the stack is false.
 514 @end deffn
 515
 516 @deffn Instruction br-if-eq offset
 517 Jump to @var{offset} if the two objects located on the stack are
 518 equal in the sense of @var{eq?}.  Note that, for this instruction, the
 519 stack pointer is decremented by two Scheme objects instead of only
 520 one.
 521 @end deffn
 522
 523 @deffn Instruction br-if-not-eq offset
 524 Same as @var{br-if-eq} for non-@code{eq?} objects.
 525 @end deffn
 526
 527 @deffn Instruction br-if-null offset
 528 Jump to @var{offset} if the object on the stack is @code{'()}.
 529 @end deffn
 530
 531 @deffn Instruction br-if-not-null offset
 532 Jump to @var{offset} if the object on the stack is not @code{'()}.
 533 @end deffn
 534
 535
 536 @node Loading Instructions
 537 @subsubsection Loading Instructions
 538
 539 In addition to VM instructions, an instruction stream may contain
 540 variable-length data embedded within it. This data is always preceded
 541 by special loading instructions, which interpret the data and advance
 542 the instruction pointer to the next VM instruction.
 543
 544 All of these loading instructions have a @code{length} parameter,
 545 indicating the size of the embedded data, in bytes. The length itself
 546 is encoded in 3 bytes.
 547
 548 @deffn Instruction load-number length
 549 Load an arbitrary number from the instruction stream. The number is
 550 embedded in the stream as a string.
 551 @end deffn
 552 @deffn Instruction load-string length
 553 Load a string from the instruction stream.
 554 @end deffn
 555 @deffn Instruction load-symbol length
 556 Load a symbol from the instruction stream.
 557 @end deffn
 558 @deffn Instruction load-keyword length
 559 Load a keyword from the instruction stream.
 560 @end deffn
 561
 562 @deffn Instruction define length
 563 Load a symbol from the instruction stream, and look up its binding in
 564 the current toplevel environment, creating the binding if necessary.
 565 Push the variable corresponding to the binding.
 566 @end deffn
 567
 568 @deffn Instruction load-program
 569 Load bytecode from the instruction stream, and push a compiled
 570 procedure.
 571
 572 This instruction pops one value from the stack: the program's object
 573 table, as a vector, or @code{#f} in the case that the program has no
 574 object table. A program that does not reference toplevel bindings and
 575 does not use @code{object-ref} does not need an object table.
 576
 577 This instruction is unlike the rest of the loading instructions,
 578 because instead of parsing its data, it directly maps the instruction
 579 stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
 580 and Objcode}, for more information.
 581
 582 The resulting compiled procedure will not have any ``external''
 583 variables captured, so it may be loaded only once but used many times
 584 to create closures.
 585 @end deffn
 586
 587 Finally, while this instruction is not strictly a ``loading''
 588 instruction, it's useful to wind up the @code{load-program} discussion
 589 here:
 590
 591 @deffn Instruction make-closure
 592 Pop the program object from the stack, capture the current set of
 593 ``external'' variables, and assign those external variables to a copy
 594 of the program. Push the new program object, which shares state with
 595 the original program.
 596
 597 At the time of this writing, the space overhead of closures is 4 words
 598 per closure.
 599 @end deffn
 600
 601 @node Procedural Instructions
 602 @subsubsection Procedural Instructions
 603
 604 @deffn Instruction return
 605 Free the program's frame, returning the top value from the stack to
 606 the current continuation. (The stack should have exactly one value on
 607 it.)
 608
 609 Specifically, the @code{sp} is decremented to one below the current
 610 @code{fp}, the @code{ip} is reset to the current return address, the
 611 @code{fp} is reset to the value of the current dynamic link, and then
 612 the top item on the stack (formerly the procedure being applied) is
 613 set to the returned value.
 614 @end deffn
 615
 616 @deffn Instruction call nargs
 617 Call the procedure located at @code{sp[-nargs]} with the @var{nargs}
 618 arguments located from @code{sp[-nargs + 1]} to @code{sp[0]}.
 619
 620 For compiled procedures, this instruction sets up a new stack frame,
 621 as described in @ref{Stack Layout}, and then dispatches to the first
 622 instruction in the called procedure, relying on the called procedure
 623 to return one value to the newly-created continuation. Because the new
 624 frame pointer will point to sp[-nargs + 1], the arguments don't have
 625 to be shuffled around -- they are already in place.
 626
 627 For non-compiled procedures (continuations, primitives, and
 628 interpreted procedures), @code{call} will pop the procedure and
 629 arguments off the stack, and push the result of calling
 630 @code{scm_apply}.
 631 @end deffn
 632
 633 @deffn Instruction goto/args nargs
 634 Like @code{call}, but reusing the current continuation. This
 635 instruction implements tail calls as required by RnRS.
 636
 637 For compiled procedures, that means that @code{goto/args} reuses the
 638 current frame instead of building a new one. The @code{goto/*}
 639 instruction family is named as it is because tail calls are equivalent
 640 to @code{goto}, along with relabeled variables.
 641
 642 For non-VM procedures, the result is the same, but the current VM
 643 invocation remains on the C stack. True tail calls are not currently
 644 possible between compiled and non-compiled procedures.
 645 @end deffn
 646
 647 @deffn Instruction apply nargs
 648 @deffnx Instruction goto/apply nargs
 649 Like @code{call} and @code{goto/args}, except that the top item on the
 650 stack must be a list. The elements of that list are then pushed on the
 651 stack and treated as additional arguments, replacing the list itself,
 652 then the procedure is invoked as usual.
 653 @end deffn
 654
 655 @deffn Instruction call/nargs
 656 @deffnx Instruction goto/nargs
 657 These are like @code{call} and @code{goto/args}, except they take the
 658 number of arguments from the stack instead of the instruction stream.
 659 These instructions are used in the implementation of multiple value
 660 returns, where the actual number of values is pushed on the stack.
 661 @end deffn
 662
 663 @deffn Instruction call/cc
 664 @deffnx Instruction goto/cc
 665 Capture the current continuation, and then call (or tail-call) the
 666 procedure on the top of the stack, with the continuation as the
 667 argument.
 668
 669 Both the VM continuation and the C continuation are captured.
 670 @end deffn
 671
 672 @deffn Instruction mv-call nargs offset
 673 Like @code{call}, except that a multiple-value continuation is created
 674 in addition to a single-value continuation.
 675
 676 The offset (a two-byte value) is an offset within the instruction
 677 stream; the multiple-value return address in the new frame
 678 (@pxref{Stack Layout}) will be set to the normal return address plus
 679 this offset. Instructions at that offset will expect the top value of
 680 the stack to be the number of values, and below that values
 681 themselves, pushed separately.
 682 @end deffn
 683
 684 @deffn Instruction return/values nvalues
 685 Return the top @var{nvalues} to the current continuation.
 686
 687 If the current continuation is a multiple-value continuation,
 688 @code{return/values} pushes the number of values on the stack, then
 689 returns as in @code{return}, but to the multiple-value return address.
 690
 691 Otherwise if the current continuation accepts only one value, i.e. the
 692 multiple-value return address is @code{NULL}, then we assume the user
 693 only wants one value, and we give them the first one. If there are no
 694 values, an error is signaled.
 695 @end deffn
 696
 697 @deffn Instruction return/values* nvalues
 698 Like a combination of @code{apply} and @code{return/values}, in which
 699 the top value on the stack is interpreted as a list of additional
 700 values. This is an optimization for the common @code{(apply values
 701 ...)} case.
 702 @end deffn
 703
 704 @deffn Instruction truncate-values nbinds nrest
 705 Used in multiple-value continuations, this instruction takes the
 706 values that are on the stack (including the number-of-values marker)
 707 and truncates them for a binding construct.
 708
 709 For example, a call to @code{(receive (x y . z) (foo) ...)} would,
 710 logically speaking, pop off the values returned from @code{(foo)} and
 711 push them as three values, corresponding to @code{x}, @code{y}, and
 712 @code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would
 713 be 1 (to indicate that one of the bindings was a rest argument).
 714
 715 Signals an error if there is an insufficient number of values.
 716 @end deffn
 717
 718 @node Data Control Instructions
 719 @subsubsection Data Control Instructions
 720
 721 These instructions push simple immediate values onto the stack, or
 722 manipulate lists and vectors on the stack.
 723
 724 @deffn Instruction make-int8 value
 725 Push @var{value}, an 8-bit integer, onto the stack.
 726 @end deffn
 727
 728 @deffn Instruction make-int8:0
 729 Push the immediate value @code{0} onto the stack.
 730 @end deffn
 731
 732 @deffn Instruction make-int8:1
 733 Push the immediate value @code{1} onto the stack.
 734 @end deffn
 735
 736 @deffn Instruction make-int16 value
 737 Push @var{value}, a 16-bit integer, onto the stack.
 738 @end deffn
 739
 740 @deffn Instruction make-uint64 value
 741 Push @var{value}, an unsigned 64-bit integer, onto the stack. The
 742 value is encoded in 8 bytes, most significant byte first (big-endian).
 743 @end deffn
 744
 745 @deffn Instruction make-int64 value
 746 Push @var{value}, a signed 64-bit integer, onto the stack. The value
 747 is encoded in 8 bytes, most significant byte first (big-endian), in
 748 twos-complement arithmetic.
 749 @end deffn
 750
 751 @deffn Instruction make-false
 752 Push @code{#f} onto the stack.
 753 @end deffn
 754
 755 @deffn Instruction make-true
 756 Push @code{#t} onto the stack.
 757 @end deffn
 758
 759 @deffn Instruction make-eol
 760 Push @code{'()} onto the stack.
 761 @end deffn
 762
 763 @deffn Instruction make-char8 value
 764 Push @var{value}, an 8-bit character, onto the stack.
 765 @end deffn
 766
 767 @deffn Instruction list n
 768 Pops off the top @var{n} values off of the stack, consing them up into
 769 a list, then pushes that list on the stack. What was the topmost value
 770 will be the last element in the list. @var{n} is a two-byte value,
 771 most significant byte first.
 772 @end deffn
 773
 774 @deffn Instruction vector n
 775 Create and fill a vector with the top @var{n} values from the stack,
 776 popping off those values and pushing on the resulting vector. @var{n}
 777 is a two-byte value, like in @code{vector}.
 778 @end deffn
 779
 780 @deffn Instruction mark
 781 Pushes a special value onto the stack that other stack instructions
 782 like @code{list-mark} can use.
 783 @end deffn
 784
 785 @deffn Instruction list-mark
 786 Create a list from values from the stack, as in @code{list}, but
 787 instead of knowing beforehand how many there will be, keep going until
 788 we see a @code{mark} value.
 789 @end deffn
 790
 791 @deffn Instruction cons-mark
 792 As the scheme procedure @code{cons*} is to the scheme procedure
 793 @code{list}, so the instruction @code{cons-mark} is to the instruction
 794 @code{list-mark}.
 795 @end deffn
 796
 797 @deffn Instruction vector-mark
 798 Like @code{list-mark}, but makes a vector instead of a list.
 799 @end deffn
 800
 801 @deffn Instruction list-break
 802 The opposite of @code{list}: pops a value, which should be a list, and
 803 pushes its elements on the stack.
 804 @end deffn
 805
 806 @node Miscellaneous Instructions
 807 @subsubsection Miscellaneous Instructions
 808
 809 @deffn Instruction nop
 810 Does nothing!
 811 @end deffn
 812
 813 @deffn Instruction halt
 814 Exits the VM, returning a SCM value. Normally, this instruction is
 815 only part of the ``bootstrap program'', a program run when a virtual
 816 machine is first entered; compiled Scheme procedures will not contain
 817 this instruction.
 818
 819 If multiple values have been returned, the SCM value will be a
 820 multiple-values object (@pxref{Multiple Values}).
 821 @end deffn
 822
 823 @deffn Instruction break
 824 Does nothing, but invokes the break hook.
 825 @end deffn
 826
 827 @deffn Instruction drop
 828 Pops off the top value from the stack, throwing it away.
 829 @end deffn
 830
 831 @deffn Instruction dup
 832 Re-pushes the top value onto the stack.
 833 @end deffn
 834
 835 @deffn Instruction void
 836 Pushes ``the unspecified value'' onto the stack.
 837 @end deffn
 838
 839 @node Inlined Scheme Instructions
 840 @subsubsection Inlined Scheme Instructions
 841
 842 The Scheme compiler can recognize the application of standard Scheme
 843 procedures. It tries to inline these small operations to avoid the
 844 overhead of creating new stack frames.
 845
 846 Since most of these operations are historically implemented as C
 847 primitives, not inlining them would entail constantly calling out from
 848 the VM to the interpreter, which has some costs---registers must be
 849 saved, the interpreter has to dispatch, called procedures have to do
 850 much typechecking, etc. It's much more efficient to inline these
 851 operations in the virtual machine itself.
 852
 853 All of these instructions pop their arguments from the stack and push
 854 their results, and take no parameters from the instruction stream.
 855 Thus, unlike in the previous sections, these instruction definitions
 856 show stack parameters instead of parameters from the instruction
 857 stream.
 858
 859 @deffn Instruction not x
 860 @deffnx Instruction not-not x
 861 @deffnx Instruction eq? x y
 862 @deffnx Instruction not-eq? x y
 863 @deffnx Instruction null?
 864 @deffnx Instruction not-null?
 865 @deffnx Instruction eqv? x y
 866 @deffnx Instruction equal? x y
 867 @deffnx Instruction pair? x y
 868 @deffnx Instruction list? x
 869 @deffnx Instruction set-car! pair x
 870 @deffnx Instruction set-cdr! pair x
 871 @deffnx Instruction slot-ref struct n
 872 @deffnx Instruction slot-set struct n x
 873 @deffnx Instruction cons x y
 874 @deffnx Instruction car x
 875 @deffnx Instruction cdr x
 876 Inlined implementations of their Scheme equivalents.
 877 @end deffn
 878
 879 Note that @code{caddr} and friends compile to a series of @code{car}
 880 and @code{cdr} instructions.
 881
 882 @node Inlined Mathematical Instructions
 883 @subsubsection Inlined Mathematical Instructions
 884
 885 Inlining mathematical operations has the obvious advantage of handling
 886 fixnums without function calls or allocations. The trick, of course,
 887 is knowing when the result of an operation will be a fixnum, and there
 888 might be a couple bugs here.
 889
 890 More instructions could be added here over time.
 891
 892 As in the previous section, the definitions below show stack
 893 parameters instead of instruction stream parameters.
 894
 895 @deffn Instruction add x y
 896 @deffnx Instruction sub x y
 897 @deffnx Instruction mul x y
 898 @deffnx Instruction div x y
 899 @deffnx Instruction quo x y
 900 @deffnx Instruction rem x y
 901 @deffnx Instruction mod x y
 902 @deffnx Instruction ee? x y
 903 @deffnx Instruction lt? x y
 904 @deffnx Instruction gt? x y
 905 @deffnx Instruction le? x y
 906 @deffnx Instruction ge? x y
 907 Inlined implementations of the corresponding mathematical operations.
 908 @end deffn