X-Git-Url: https://git.hcoop.net/bpt/guile.git/blobdiff_plain/80a7d5dc8e39efcedb9882f20902a42e66371053..c3c3032608c9658c5dc5019d85446b6a1c2f7fcc:/doc/ref/vm.texi diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi index 042645200..9936ad97d 100644 --- a/doc/ref/vm.texi +++ b/doc/ref/vm.texi @@ -1,20 +1,20 @@ @c -*-texinfo-*- @c This is part of the GNU Guile Reference Manual. -@c Copyright (C) 2008,2009 +@c Copyright (C) 2008,2009,2010,2013 @c Free Software Foundation, Inc. @c See the file guile.texi for copying conditions. @node A Virtual Machine for Guile @section A Virtual Machine for Guile -Guile has both an interpreter and a compiler. To a user, the -difference is largely transparent---interpreted and compiled -procedures can call each other as they please. +Guile has both an interpreter and a compiler. To a user, the difference +is transparent---interpreted and compiled procedures can call each other +as they please. The difference is that the compiler creates and interprets bytecode for a custom virtual machine, instead of interpreting the -S-expressions directly. Running compiled code is faster than running -interpreted code. +S-expressions directly. Loading and running compiled code is faster +than loading and running source code. The virtual machine that does the bytecode interpretation is a part of Guile itself. This section describes the nature of Guile's virtual @@ -33,21 +33,19 @@ machine. @subsection Why a VM? @cindex interpreter -@cindex evaluator -For a long time, Guile only had an interpreter, called the -@dfn{evaluator}. Guile's evaluator operates directly on the -S-expression representation of Scheme source code. +For a long time, Guile only had an interpreter. Guile's interpreter +operated directly on the S-expression representation of Scheme source +code. 
-But while the evaluator is highly optimized and hand-tuned, and -contains some extensive speed trickery (@pxref{Memoization}), it still +But while the interpreter was highly optimized and hand-tuned, it still performs many needless computations during the course of evaluating an expression. For example, application of a function to arguments -needlessly conses up the arguments in a list. Evaluation of an -expression always has to figure out what the car of the expression is --- a procedure, a memoized form, or something else. All values have to -be allocated on the heap. Et cetera. +needlessly consed up the arguments in a list. Evaluation of an +expression always had to figure out what the car of the expression is -- +a procedure, a memoized form, or something else. All values have to be +allocated on the heap. Et cetera. -The solution to this problem is to compile the higher-level language, +The solution to this problem was to compile the higher-level language, Scheme, into a lower-level language for which all of the checks and dispatching have already been done---the code is instead stripped to the bare minimum needed to ``do the job''. @@ -71,7 +69,21 @@ for Guile (@code{cons}, @code{struct-ref}, etc.). So this is what Guile does. The rest of this section describes that VM that Guile implements, and the compiled procedures that run on it. -Note that this decision to implement a bytecode compiler does not +Before moving on, though, we should note that though we spoke of the +interpreter in the past tense, Guile still has an interpreter. The +difference is that before, it was Guile's main evaluator, and so was +implemented in highly optimized C; now, it is actually implemented in +Scheme, and compiled down to VM bytecode, just like any other program. +(There is still a C interpreter around, used to bootstrap the compiler, +but it is not normally used at runtime.) 
+ +The upside of implementing the interpreter in Scheme is that we preserve +tail calls and multiple-value handling between interpreted and compiled +code. The downside is that the interpreter in Guile 2.0 is slower than +the interpreter in 1.8. We hope that the compiler's speed makes up +for the loss! + +Also note that this decision to implement a bytecode compiler does not preclude native compilation. We can compile from bytecode to native code at runtime, or even do ahead of time compilation. More possibilities are discussed in @ref{Extending the Compiler}. @@ -79,12 +91,9 @@ possibilities are discussed in @ref{Extending the Compiler}. @node VM Concepts @subsection VM Concepts -A virtual machine (VM) is a Scheme object. Users may create virtual -machines using the standard procedures described later in this manual, -but that is usually unnecessary, as Guile ensures that there is one -virtual machine per thread. When a VM-compiled procedure is run, Guile -looks up the virtual machine for the current thread and executes the -procedure using that VM. +Compiled code is run by a virtual machine (VM). Each thread has its own +VM. When a compiled procedure is run, Guile looks up the virtual machine +for the current thread and executes the procedure using that VM. Guile's virtual machine is a stack machine---that is, it has few registers, and the instructions defined in the VM operate by pushing @@ -111,27 +120,24 @@ The registers that a VM has are as follows: In other architectures, the instruction pointer is sometimes called the ``program counter'' (pc). This set of registers is pretty typical for stack machines; their exact meanings in the context of Guile's VM -is described in the next section. +are described in the next section. -A virtual machine executes by loading a compiled procedure, and -executing the object code associated with that procedure.
Of course, -that procedure may call other procedures, tail-call others, ad -infinitum---indeed, within a guile whose modules have all been -compiled to object code, one might never leave the virtual machine. - -@c wingo: I wish the following were true, but currently we just use -@c the one engine. This kind of thing is possible tho. +@c wingo: The following is true, but I don't know in what context to +@c describe it. A documentation FIXME. @c A VM may have one of three engines: reckless, regular, or debugging. @c Reckless engine is fastest but dangerous. Regular engine is normally @c fail-safe and reasonably fast. Debugging engine is safest and @c functional but very slow. +@c (Actually we have just a regular and a debugging engine; normally +@c we use the latter, it's almost as fast as the ``regular'' engine.) + @node Stack Layout @subsection Stack Layout While not strictly necessary to understand how to work with the VM, it -is instructive and sometimes entertaining to consider the struture of +is instructive and sometimes entertaining to consider the structure of the VM stack. Logically speaking, a VM stack is composed of ``frames''. Each frame @@ -156,25 +162,26 @@ The structure of the fixed part of an application frame is as follows: @example Stack - | | <- fp + bp->nargs + bp->nlocs + 4 - +------------------+ = SCM_FRAME_UPPER_ADDRESS (fp) - | Return address | - | MV return address| - | Dynamic link | - | External link | <- fp + bp->nargs + bp->nlocs - | Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp) + | ... | + | Intermed. val. 
0 | <- fp + bp->nargs + bp->nlocs = SCM_FRAME_UPPER_ADDRESS (fp) + +==================+ + | Local variable 1 | | Local variable 0 | <- fp + bp->nargs | Argument 1 | | Argument 0 | <- fp | Program | <- fp - 1 - +------------------+ = SCM_FRAME_LOWER_ADDRESS (fp) + +------------------+ + | Return address | + | MV return address| + | Dynamic link | <- fp - 4 = SCM_FRAME_DATA_ADDRESS (fp) = SCM_FRAME_LOWER_ADDRESS (fp) + +==================+ | | @end example In the above drawing, the stack grows upward. The intermediate values stored in the application of this frame are stored above @code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the -@code{struct scm_program*} data associated with the program at +@code{struct scm_objcode} data associated with the program at @code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the compiled procedure, which will be discussed later. @@ -198,25 +205,17 @@ values being returned. @item Dynamic link This is the @code{fp} in effect before this program was applied. In effect, this and the return address are the registers that are always -``saved''. - -@item External link -This field is a reference to the list of heap-allocated variables -associated with this frame. For a discussion of heap versus stack -allocation, @xref{Variables and the VM}. +``saved''. The dynamic link links the current frame to the previous +frame; computing a stack trace involves traversing these frames. @item Local variable @var{n} -Lambda-local variables that are allocated on the stack are all -allocated as part of the frame. This makes access to non-captured, -non-mutated variables very cheap. +Lambda-local variables are all allocated as part of the frame. +This makes access to variables very cheap. @item Argument @var{n} The calling convention of the VM requires arguments of a function -application to be pushed on the stack, and here they are. Normally -references to arguments dispatch to these locations on the stack.
-However if an argument has to be stored on the heap, it will be copied -from its initial value here onto a location in the heap, and -thereafter only referenced on the heap. +application to be pushed on the stack, and here they are. References +to arguments dispatch to these locations on the stack. @item Program This is the program being applied. For more information on how @@ -226,40 +225,51 @@ programs are implemented, @xref{VM Programs}. @node Variables and the VM @subsection Variables and the VM -Let's think about the following Scheme code as an example: +Consider the following Scheme code as an example: @example (define (foo a) (lambda (b) (list foo a b))) @end example -Within the lambda expression, "foo" is a top-level variable, "a" is a -lexically captured variable, and "b" is a local variable. - -That is to say: @code{b} may safely be allocated on the stack, as -there is no enclosed procedure that references it, nor is it ever -mutated. - -@code{a}, on the other hand, is referenced by an enclosed procedure, -that of the lambda. Thus it must be allocated on the heap, as it may -(and will) outlive the dynamic extent of the invocation of @code{foo}. - -@code{foo} is a toplevel variable, as mandated by Scheme's semantics: - -@example - (define proc (foo 'bar)) ; assuming prev. definition of @code{foo} - (define foo 42) ; redefinition - (proc 'baz) - @result{} (42 bar baz) -@end example - -Note that variables that are mutated (via @code{set!}) must be -allocated on the heap, even if they are local variables. This is -because any called subprocedure might capture the continuation, which -would need to capture locations instead of values. Thus perhaps -counterintuitively, what would seem ``closer to the metal'', viz -@code{set!}, actually forces heap allocation instead of stack -allocation. +Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a +lexically captured variable, and @code{b} is a local variable. 
+ +Another way to refer to @code{a} and @code{b} is to say that @code{a} +is a ``free'' variable, since it is not defined within the lambda, and +@code{b} is a ``bound'' variable. These are the terms used in the +@dfn{lambda calculus}, a mathematical notation for describing +functions. The lambda calculus is useful because it allows one to +prove statements about functions. It is especially good at describing +scope relations, and it is for that reason that we mention it here. + +Guile allocates all variables on the stack. When a lexically enclosed +procedure with free variables---a @dfn{closure}---is created, it copies +those variables into its free variable vector. References to free +variables are then redirected through the free variable vector. + +If a variable is ever @code{set!}, however, it will need to be +heap-allocated instead of stack-allocated, so that different closures +that capture the same variable can see the same value. Also, this +allows continuations to capture a reference to the variable, instead +of to its value at one point in time. For these reasons, @code{set!} +variables are allocated in ``boxes''---actually, in variable cells. +@xref{Variables}, for more information. References to @code{set!} +variables are indirected through the boxes. + +Thus perhaps counterintuitively, what would seem ``closer to the +metal'', viz @code{set!}, actually forces an extra memory allocation +and indirection. + +Going back to our example, @code{b} may be allocated on the stack, as +it is never mutated. + +@code{a} may also be allocated on the stack, as it too is never +mutated. Within the enclosed lambda, its value will be copied into +(and referenced from) the free variables vector. + +@code{foo} is a top-level variable, because @code{foo} is not +lexically bound in this example. @node VM Programs @subsection Compiled Procedures are VM Programs @@ -276,6 +286,7 @@ You can pick apart these pieces with the accessors in @code{(system vm program)}. 
@xref{Compiled Procedures}, for a full API reference. @cindex object table +@cindex object array The object array of a compiled procedure, also known as the @dfn{object table}, holds all Scheme objects whose values are known not to change across invocations of the procedure: constant strings, @@ -293,56 +304,51 @@ instruction, which uses the object vector, and are almost as fast as local variable references. We can see how these concepts tie together by disassembling the -@code{foo} function to see what is going on: +@code{foo} function we defined earlier to see what is going on: @smallexample scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b))) scheme@@(guile-user)> ,x foo -Disassembly of #: - -Bytecode: - - 0 (local-ref 0) ;; `a' (arg) - 2 (external-set 0) ;; `a' (arg) - 4 (object-ref 0) ;; # - 6 (make-closure) at (unknown file):0:16 - 7 (return) + 0 (assert-nargs-ee/locals 1) + 2 (object-ref 1) ;; #:0:17 (b)> + 4 (local-ref 0) ;; `a' + 6 (make-closure 0 1) + 9 (return) ---------------------------------------- -Disassembly of #: - -Bytecode: +Disassembly of #:0:17 (b)>: - 0 (toplevel-ref 0) ;; `list' + 0 (assert-nargs-ee/locals 1) 2 (toplevel-ref 1) ;; `foo' - 4 (external-ref 0) ;; (closure variable) - 6 (local-ref 0) ;; `b' (arg) - 8 (goto/args 3) at (unknown file):0:28 + 4 (free-ref 0) ;; (closure variable) + 6 (local-ref 0) ;; `b' + 8 (list 0 3) ;; 3 elements at (unknown file):0:29 + 11 (return) @end smallexample -At @code{ip} 0 and 2, we do the copy from argument to heap for -@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at -@code{ip} 6 we make a closure---binding code (from the compiled -lambda) with data (the heap-allocated variables). Finally we return -the closure. - -The second stanza disassembles the compiled lambda. Toplevel variables -are resolved relative to the module that was current when the -procedure was created. 
This lookup occurs lazily, at the first time -the variable is actually referenced, and the location of the lookup is -cached so that future references are very cheap. @xref{Environment -Control Instructions}, for more details. - -Then we see a reference to an external variable, corresponding to -@code{a}. The disassembler doesn't have enough information to give a -name to that variable, so it just marks it as being a ``closure -variable''. Finally we see the reference to @code{b}, then a tail call -(@code{goto/args}) with three arguments. +First there's some prelude, where @code{foo} checks that it was called with only +1 argument. Then at @code{ip} 2, we load up the compiled lambda. @code{Ip} 4 +loads up `a', so that it can be captured into a closure at @code{ip} +6---binding code (from the compiled lambda) with data (the free-variable +vector). Finally we return the closure. + +The second stanza disassembles the compiled lambda. After the prelude, we note +that toplevel variables are resolved relative to the module that was current +when the procedure was created. This lookup occurs lazily, at the first time the +variable is actually referenced, and the location of the lookup is cached so +that future references are very cheap. @xref{Top-Level Environment Instructions}, +for more details. + +Then we see a reference to a free variable, corresponding to @code{a}. The +disassembler doesn't have enough information to give a name to that variable, so +it just marks it as being a ``closure variable''. Finally we see the reference +to @code{b}, then the @code{list} opcode, an inline implementation of the +@code{list} Scheme routine. @node Instruction Set @subsection Instruction Set -There are about 100 instructions in Guile's virtual machine. These +There are about 180 instructions in Guile's virtual machine. These instructions represent atomic units of a program's execution.
Ideally, they perform one task without conditional branches, then dispatch to the next instruction in the stream. @@ -365,7 +371,8 @@ their own test-and-branch instructions: @end example In addition, some Scheme primitives have their own inline -implementations, e.g. @code{cons}. +implementations, e.g.@: @code{cons} and @code{list}, as we saw in the +previous section. So Guile's instruction set is a @emph{complete} instruction set, in that it provides the instructions that are suited to the problem, and @@ -373,24 +380,36 @@ is not concerned with making a minimal, orthogonal set of instructions. More instructions may be added over time. @menu -* Environment Control Instructions:: +* Lexical Environment Instructions:: +* Top-Level Environment Instructions:: +* Procedure Call and Return Instructions:: +* Function Prologue Instructions:: +* Trampoline Instructions:: * Branch Instructions:: +* Data Constructor Instructions:: * Loading Instructions:: -* Procedural Instructions:: -* Data Control Instructions:: +* Dynamic Environment Instructions:: * Miscellaneous Instructions:: * Inlined Scheme Instructions:: * Inlined Mathematical Instructions:: +* Inlined Bytevector Instructions:: @end menu -@node Environment Control Instructions -@subsubsection Environment Control Instructions -These instructions access and mutate the environment of a compiled -procedure---the local bindings, the ``external'' bindings, and the -toplevel bindings. +@node Lexical Environment Instructions +@subsubsection Lexical Environment Instructions + +These instructions access and mutate the lexical environment of a +compiled procedure---its free and bound variables. + +Some of these instructions have @code{long-} variants, the difference +being that they take 16-bit arguments, encoded in big-endian order, +instead of the normal 8-bit range. + +@xref{Stack Layout}, for more information on the format of stack frames.
@deffn Instruction local-ref index +@deffnx Instruction long-local-ref index Push onto the stack the value of the local variable located at @var{index} within the current stack frame. @@ -400,51 +419,118 @@ arguments. @end deffn @deffn Instruction local-set index +@deffnx Instruction long-local-set index Pop the Scheme object located on top of the stack and make it the new value of the local variable located at @var{index} within the current stack frame. @end deffn -@deffn Instruction external-ref index -Push the value of the closure variable located at position -@var{index} within the program's list of external variables. +@deffn Instruction box index +Pop a value off the stack, and set the @var{index}th local variable +to a box containing that value. A shortcut for @code{make-variable} +then @code{local-set}, used when binding boxed variables. @end deffn -@deffn Instruction external-set index -Pop the Scheme object located on top of the stack and make it the new -value of the closure variable located at @var{index} within the -program's list of external variables. +@deffn Instruction empty-box index +Set the @var{index}th local variable to a box containing a variable +whose value is unbound. Used when compiling some @code{letrec} +expressions. +@end deffn + +@deffn Instruction local-boxed-ref index +@deffnx Instruction local-boxed-set index +Get or set the value of the variable located at @var{index} within the +current stack frame. A shortcut for @code{local-ref} then +@code{variable-ref} or @code{variable-set}, respectively. +@end deffn + +@deffn Instruction free-ref index +Push the value of the captured variable located at position +@var{index} within the program's vector of captured variables. @end deffn -The external variable lookup algorithm should probably be made more -efficient in the future via addressing by frame and index. Currently, -external variables are all consed onto a list, which results in O(N) -lookup time.
+@deffn Instruction free-boxed-ref index +@deffnx Instruction free-boxed-set index +Get or set a boxed free variable. A shortcut for @code{free-ref} then +@code{variable-ref} or @code{variable-set}, respectively. + +Note that there is no @code{free-set} instruction, as variables that are +@code{set!} must be boxed. +@end deffn + +@deffn Instruction make-closure num-free-vars +Pop @var{num-free-vars} values and a program object off the stack in +that order, and push a new program object closing over the given free +variables. @var{num-free-vars} is encoded as a two-byte big-endian +value. + +The free variables are stored in an array, inline to the new program +object, in the order that they were on the stack (not the order they are +popped off). The new closure shares state with the original program. At +the time of this writing, the space overhead of closures is 3 words, +plus one word for each free variable. +@end deffn -@deffn Instruction externals -Pushes the current list of external variables onto the stack. This -instruction is used in the implementation of -@code{compile-time-environment}. @xref{The Scheme Compiler}. +@deffn Instruction fix-closure index +Fix up the free variables array of the closure stored in the +@var{index}th local variable. @var{index} is a two-byte big-endian +integer. + +This instruction will pop as many values from the stack as are in the +corresponding closure's free variables array. The topmost value on the +stack will be stored as the closure's last free variable, with other +values filling in free variable slots in order. + +@code{fix-closure} is part of a hack for allocating mutually recursive +procedures. The hack is to store the procedures in their corresponding +local variable slots, with space already allocated for free variables. +Then once they are all in place, this instruction fixes up their +procedures' free variable bindings in place. This allows most +@code{letrec}-bound procedures to be allocated unboxed on the stack. 
@end deffn +@deffn Instruction local-bound? index +@deffnx Instruction long-local-bound? index +Push @code{#t} on the stack if the @var{index}th local variable has +been assigned, or @code{#f} otherwise. Mostly useful for handling +optional arguments in procedure prologues. +@end deffn + + +@node Top-Level Environment Instructions +@subsubsection Top-Level Environment Instructions + +These instructions access values in the top-level environment: bindings +that were not lexically apparent at the time that the code in question +was compiled. + +The location in which a toplevel binding is stored can be looked up once +and cached for later. The binding itself may change over time, but its +location will stay constant. + +Currently only toplevel references within procedures are cached, as only +procedures have a place to cache them, in their object tables. + @deffn Instruction toplevel-ref index +@deffnx Instruction long-toplevel-ref index Push the value of the toplevel binding whose location is stored at -position @var{index} in the object table. +position @var{index} in the current procedure's object table. The +@code{long-} variant encodes the index over two bytes. -Initially, a cell in the object table that is used by -@code{toplevel-ref} is initialized to one of two forms. The normal -case is that the cell holds a symbol, whose binding will be looked up +Initially, a cell in a procedure's object table that is used by +@code{toplevel-ref} is initialized to one of two forms. The normal case +is that the cell holds a symbol, whose binding will be looked up relative to the module that was current when the current program was created. Alternately, the lookup may be performed relative to a particular -module, determined at compile-time (e.g. via @code{@@} or +module, determined at compile-time (e.g.@: via @code{@@} or @code{@@@@}). In that case, the cell in the object table holds a list: -@code{(@var{modname} @var{sym} @var{interface?})}.
The symbol -@var{sym} will be looked up in the module named @var{modname} (a list -of symbols). The lookup will be performed against the module's public -interface, unless @var{interface?} is @code{#f}, which it is for -example when compiling @code{@@@@}. +@code{(@var{modname} @var{sym} @var{public?})}. The symbol @var{sym} +will be looked up in the module named @var{modname} (a list of +symbols). The lookup will be performed against the module's public +interface, unless @var{public?} is @code{#f}, which it is for example +when compiling @code{@@@@}. In any case, if the symbol is unbound, an error is signalled. Otherwise the initial form is replaced with the looked-up variable, an @@ -455,20 +541,27 @@ variable has been successfully resolved. This instruction pushes the value of the variable onto the stack. @end deffn -@deffn Instruction toplevel-ref index +@deffn Instruction toplevel-set index +@deffnx Instruction long-toplevel-set index Pop a value off the stack, and set it as the value of the toplevel variable stored at @var{index} in the object table. If the variable has not yet been looked up, we do the lookup as in @code{toplevel-ref}. @end deffn +@deffn Instruction define +Pop a symbol and a value from the stack, in that order. Look up its +binding in the current toplevel environment, creating the binding if +necessary. Set the variable to the value. +@end deffn + @deffn Instruction link-now Pop a value, @var{x}, from the stack. Look up the binding for @var{x}, according to the rules for @code{toplevel-ref}, and push that variable on the stack. If the lookup fails, an error will be signalled. This instruction is mostly used when loading programs, because it can -do toplevel variable lookups without an object vector. +do toplevel variable lookups without an object table. @end deffn @deffn Instruction variable-ref @@ -481,268 +574,372 @@ Pop off two objects from the stack, a variable and a value, and set the variable to the value. 
@end deffn -@deffn Instruction object-ref n -Push @var{n}th value from the current program's object vector. +@deffn Instruction variable-bound? +Pop off the variable object from top of the stack and push @code{#t} if +it is bound, or @code{#f} otherwise. Mostly useful in procedure +prologues for defining default values for boxed optional variables. @end deffn -@node Branch Instructions -@subsubsection Branch Instructions +@deffn Instruction make-variable +Replace the top object on the stack with a variable containing it. +Used in some circumstances when compiling @code{letrec} expressions. +@end deffn -All the conditional branch instructions described below work in the -same way: -@itemize -@item They pop off the Scheme object located on the stack and use it as -the branch condition; -@item If the condition is true, then the instruction pointer is -increased by the offset passed as an argument to the branch -instruction; -@item Program execution proceeds with the next instruction (that is, -the one to which the instruction pointer points). -@end itemize +@node Procedure Call and Return Instructions +@subsubsection Procedure Call and Return Instructions -Note that the offset passed to the instruction is encoded on two 8-bit -integers which are then combined by the VM as one 16-bit integer. +@c something about the calling convention here? -@deffn Instruction br offset -Jump to @var{offset}. +@deffn Instruction new-frame +Push a new frame on the stack, reserving space for the dynamic link, +return address, and the multiple-values return address. The frame +pointer is not yet updated, because the frame is not yet active -- it +has to be patched by a @code{call} instruction to get the return +address. @end deffn -@deffn Instruction br-if offset -Jump to @var{offset} if the condition on the stack is not false. +@deffn Instruction call nargs +Call the procedure located at @code{sp[-nargs]} with the @var{nargs} +arguments located from @code{sp[-nargs + 1]} to @code{sp[0]}. 
+ +This instruction requires that a new frame be pushed on the stack before +the procedure, via @code{new-frame}. @xref{Stack Layout}, for more +information. It patches up that frame with the current @code{ip} as the +return address, then dispatches to the first instruction in the called +procedure, relying on the called procedure to return one value to the +newly-created continuation. Because the new frame pointer will point to +@code{sp[-nargs + 1]}, the arguments don't have to be shuffled around -- +they are already in place. @end deffn -@deffn Instruction br-if-not offset -Jump to @var{offset} if the condition on the stack is false. -@end deffn +@deffn Instruction tail-call nargs +Transfer control to the procedure located at @code{sp[-nargs]} with the +@var{nargs} arguments located from @code{sp[-nargs + 1]} to +@code{sp[0]}. -@deffn Instruction br-if-eq offset -Jump to @var{offset} if the two objects located on the stack are -equal in the sense of @var{eq?}. Note that, for this instruction, the -stack pointer is decremented by two Scheme objects instead of only -one. +Unlike @code{call}, which requires a new frame to be pushed onto the +stack, @code{tail-call} simply shuffles down the procedure and arguments +to the current stack frame. This instruction implements tail calls as +required by RnRS. @end deffn -@deffn Instruction br-if-not-eq offset -Same as @var{br-if-eq} for non-@code{eq?} objects. +@deffn Instruction apply nargs +@deffnx Instruction tail-apply nargs +Like @code{call} and @code{tail-call}, except that the top item on the +stack must be a list. The elements of that list are then pushed on the +stack and treated as additional arguments, replacing the list itself, +then the procedure is invoked as usual. @end deffn -@deffn Instruction br-if-null offset -Jump to @var{offset} if the object on the stack is @code{'()}. 
+@deffn Instruction call/nargs +@deffnx Instruction tail-call/nargs +These are like @code{call} and @code{tail-call}, except they take the +number of arguments from the stack instead of the instruction stream. +These instructions are used in the implementation of multiple value +returns, where the actual number of values is pushed on the stack. @end deffn -@deffn Instruction br-if-not-null offset -Jump to @var{offset} if the object on the stack is not @code{'()}. +@deffn Instruction mv-call nargs offset +Like @code{call}, except that a multiple-value continuation is created +in addition to a single-value continuation. + +The offset (a three-byte value) is an offset within the instruction +stream; the multiple-value return address in the new frame (@pxref{Stack +Layout}) will be set to the normal return address plus this offset. +Instructions at that offset will expect the top value of the stack to be +the number of values, and below that values themselves, pushed +separately. @end deffn +@deffn Instruction return +Free the program's frame, returning the top value from the stack to +the current continuation. (The stack should have exactly one value on +it.) -@node Loading Instructions -@subsubsection Loading Instructions +Specifically, the @code{sp} is decremented to one below the current +@code{fp}, the @code{ip} is reset to the current return address, the +@code{fp} is reset to the value of the current dynamic link, and then +the returned value is pushed on the stack. +@end deffn -In addition to VM instructions, an instruction stream may contain -variable-length data embedded within it. This data is always preceded -by special loading instructions, which interpret the data and advance -the instruction pointer to the next VM instruction. +@deffn Instruction return/values nvalues +@deffnx Instruction return/nvalues +Return the top @var{nvalues} to the current continuation. 
In the case of +@code{return/nvalues}, @var{nvalues} itself is first popped from the top +of the stack. -All of these loading instructions have a @code{length} parameter, -indicating the size of the embedded data, in bytes. The length itself -may be encoded in 1, 2, or 4 bytes. +If the current continuation is a multiple-value continuation, +@code{return/values} pushes the number of values on the stack, then +returns as in @code{return}, but to the multiple-value return address. -@deffn Instruction load-integer length -@deffnx Instruction load-unsigned-integer length -Load a 32-bit integer (respectively unsigned integer) from the -instruction stream. -@end deffn -@deffn Instruction load-number length -Load an arbitrary number from the instruction stream. The number is -embedded in the stream as a string. -@end deffn -@deffn Instruction load-string length -Load a string from the instruction stream. +Otherwise if the current continuation accepts only one value, i.e.@: the +multiple-value return address is @code{NULL}, then we assume the user +only wants one value, and we give them the first one. If there are no +values, an error is signaled. @end deffn -@deffn Instruction load-symbol length -Load a symbol from the instruction stream. + +@deffn Instruction return/values* nvalues +Like a combination of @code{apply} and @code{return/values}, in which +the top value on the stack is interpreted as a list of additional +values. This is an optimization for the common @code{(apply values +...)} case. @end deffn -@deffn Instruction load-keyword length -Load a keyword from the instruction stream. + +@deffn Instruction truncate-values nbinds nrest +Used in multiple-value continuations, this instruction takes the +values that are on the stack (including the number-of-values marker) +and truncates them for a binding construct. + +For example, a call to @code{(receive (x y . 
z) (foo) ...)} would, +logically speaking, pop off the values returned from @code{(foo)} and +push them as three values, corresponding to @code{x}, @code{y}, and +@code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would +be 1 (to indicate that one of the bindings was a rest argument). + +Signals an error if there is an insufficient number of values. @end deffn -@deffn Instruction define length -Load a symbol from the instruction stream, and look up its binding in -the current toplevel environment, creating the binding if necessary. -Push the variable corresponding to the binding. +@deffn Instruction call/cc +@deffnx Instruction tail-call/cc +Capture the current continuation, and then call (or tail-call) the +procedure on the top of the stack, with the continuation as the +argument. + +@code{call/cc} does not require a @code{new-frame} to be pushed on the +stack, as @code{call} does, because it needs to capture the stack +before the frame is pushed. + +Both the VM continuation and the C continuation are captured. @end deffn -@deffn Instruction load-program length -Load bytecode from the instruction stream, and push a compiled -procedure. This instruction pops the following values from the stack: -@itemize -@item Optionally, a thunk, which when called should return metadata -associated with this program---for example its name, the names of its -arguments, its documentation string, debugging information, etc. - -Normally, this thunk its itself a compiled procedure (with no -metadata). Metadata is represented this way so that the initial load -of a procedure is fast: the VM just mmap's the thunk and goes. The -symbols and pairs associated with the metadata are only created if the -user asks for them. - -For information on the format of the thunk's return value, -@xref{Compiled Procedures}. -@item Optionally, the program's object table, as a vector. 
- -A program that does not reference toplevel bindings and does not use -@code{object-ref} does not need an object table. -@item Finally, either one immediate integer or four immediate integers -representing the arity of the program. - -In the four-fixnum case, the values are respectively the number of -arguments taken by the function (@var{nargs}), the number of @dfn{rest -arguments} (@var{nrest}, 0 or 1), the number of local variables -(@var{nlocs}) and the number of external variables (@var{nexts}) -(@pxref{Environment Control Instructions}). - -The common single-fixnum case represents all of these values within a -16-bit bitmask. -@end itemize +@node Function Prologue Instructions +@subsubsection Function Prologue Instructions + +A function call in Guile is very cheap: the VM simply hands control to +the procedure. The procedure itself is responsible for asserting that it +has been passed an appropriate number of arguments. This strategy allows +arbitrarily complex argument parsing idioms to be developed, without +harming the common case. -The resulting compiled procedure will not have any ``external'' -variables captured, so it will be loaded only once but may be used -many times to create closures. +For example, only calls to keyword-argument procedures ``pay'' for the +cost of parsing keyword arguments. (At the time of this writing, calling +procedures with keyword arguments is typically two to four times as +costly as calling procedures with a fixed set of arguments.) + +@deffn Instruction assert-nargs-ee n +@deffnx Instruction assert-nargs-ge n +Assert that the current procedure has been passed exactly @var{n} +arguments, for the @code{-ee} case, or @var{n} or more arguments, for +the @code{-ge} case. @var{n} is encoded over two bytes. + +The number of arguments is determined by subtracting the frame pointer +from the stack pointer (@code{sp - (fp -1)}). @xref{Stack Layout}, for +more details on stack frames. 
@end deffn -Finally, while this instruction is not strictly a ``loading'' -instruction, it's useful to wind up the @code{load-program} discussion -here: +@deffn Instruction br-if-nargs-ne n offset +@deffnx Instruction br-if-nargs-gt n offset +@deffnx Instruction br-if-nargs-lt n offset +Jump to @var{offset} if the number of arguments is not equal to, greater +than, or less than @var{n}. @var{n} is encoded over two bytes, and +@var{offset} has the normal three-byte encoding. -@deffn Instruction make-closure -Pop the program object from the stack, capture the current set of -``external'' variables, and assign those external variables to a copy -of the program. Push the new program object, which shares state with -the original program. Also captures the current module. +These instructions are used to implement multiple arities, as in +@code{case-lambda}. @xref{Case-lambda}, for more information. @end deffn -@node Procedural Instructions -@subsubsection Procedural Instructions +@deffn Instruction bind-optionals n +If the procedure has been called with fewer than @var{n} arguments, fill +in the remaining arguments with an unbound value (@code{SCM_UNDEFINED}). +@var{n} is encoded over two bytes. -@deffn Instruction return -Free the program's frame, returning the top value from the stack to -the current continuation. (The stack should have exactly one value on -it.) +The optionals can be later initialized conditionally via the +@code{local-bound?} instruction. +@end deffn -Specifically, the @code{sp} is decremented to one below the current -@code{fp}, the @code{ip} is reset to the current return address, the -@code{fp} is reset to the value of the current dynamic link, and then -the top item on the stack (formerly the procedure being applied) is -set to the returned value. +@deffn Instruction push-rest n +Pop off excess arguments (more than @var{n}), collecting them into a +list, and push that list. Used to bind a rest argument, if the procedure +has no keyword arguments. 
Procedures with keyword arguments use +@code{bind-rest} instead. @end deffn -@deffn Instruction call nargs -Call the procedure located at @code{sp[-nargs]} with the @var{nargs} -arguments located from @code{sp[0]} to @code{sp[-nargs + 1]}. +@deffn Instruction bind-rest n idx +Pop off excess arguments (more than @var{n}), collecting them into a +list. The list is then assigned to the @var{idx}th local variable. +@end deffn -For non-compiled procedures (continuations, primitives, and -interpreted procedures), @code{call} will pop the procedure and -arguments off the stack, and push the result of calling -@code{scm_apply}. +@deffn Instruction bind-optionals/shuffle nreq nreq-and-opt ntotal +@deffnx Instruction bind-optionals/shuffle-or-br nreq nreq-and-opt ntotal offset +Shuffle keyword arguments to the top of the stack, filling in the holes +with @code{SCM_UNDEFINED}. Each argument is encoded over two bytes. + +This instruction is used by procedures with keyword arguments. +@var{nreq} is the number of required arguments to the procedure, and +@var{nreq-and-opt} is the total number of positional arguments (required +plus optional). @code{bind-optionals/shuffle} will scan the stack from +the @var{nreq}th argument up to the @var{nreq-and-opt}th, and start +shuffling when it sees the first keyword argument or runs out of +positional arguments. + +@code{bind-optionals/shuffle-or-br} does the same, except that it checks +if there are too many positional arguments before shuffling. If this is +the case, it jumps to @var{offset}, encoded using the normal three-byte +encoding. + +Shuffling simply moves the keyword arguments past the total number of +arguments, @var{ntotal}, which includes keyword and rest arguments. The +free slots created by the shuffle are filled in with +@code{SCM_UNDEFINED}, so they may be conditionally initialized later in +the function's prologue. 
+@end deffn -For compiled procedures, this instruction sets up a new stack frame, -as described in @ref{Stack Layout}, and then dispatches to the first -instruction in the called procedure, relying on the called procedure -to return one value to the newly-created continuation. +@deffn Instruction bind-kwargs idx ntotal flags +Parse keyword arguments, assigning their values to the corresponding +local variables. The keyword arguments should already have been shuffled +above the @var{ntotal}th stack slot by @code{bind-optionals/shuffle}. + +The parsing is driven by a keyword arguments association list, looked up +from the @var{idx}th element of the procedure's object array. The alist +is a list of pairs of the form @code{(@var{kw} . @var{index})}, mapping +keyword arguments to their local variable indices. + +There are two bitflags that affect the parser, @code{allow-other-keys?} +(@code{0x1}) and @code{rest?} (@code{0x2}). Unless +@code{allow-other-keys?} is set, the parser will signal an error if an +unknown key is found. If @code{rest?} is set, errors parsing the +keyword arguments will be ignored, as a later @code{bind-rest} +instruction will collect all of the tail arguments, including the +keywords, into a list. Otherwise if the keyword arguments are invalid, +an error is signalled. + +@var{idx} and @var{ntotal} are encoded over two bytes each, and +@var{flags} is encoded over one byte. @end deffn -@deffn Instruction goto/args nargs -Like @code{call}, but reusing the current continuation. This -instruction implements tail calling as required by RnRS. +@deffn Instruction reserve-locals n +Resets the stack pointer to have space for @var{n} local variables, +including the arguments. If this operation increments the stack pointer, +as in a push, the new slots are filled with @code{SCM_UNBOUND}. If this +operation decrements the stack pointer, any excess values are dropped.
-For compiled procedures, that means that @code{goto/args} reuses the -current frame instead of building a new one. The @code{goto/*} -instruction family is named as it is because tail calls are equivalent -to @code{goto}, along with relabeled variables. +@code{reserve-locals} is typically used after argument parsing to +reserve space for local variables. +@end deffn -For non-VM procedures, the result is the same, but the current VM -invocation remains on the C stack. True tail calls are not currently -possible between compiled and non-compiled procedures. +@deffn Instruction assert-nargs-ee/locals n +@deffnx Instruction assert-nargs-ge/locals n +A combination of @code{assert-nargs-ee} and @code{reserve-locals}. The +number of arguments is encoded in the lower three bits of @var{n}, a +one-byte value. The number of additional local variables is taken from +the upper 5 bits of @var{n}. @end deffn -@deffn Instruction apply nargs -@deffnx Instruction goto/apply nargs -Like @code{call} and @code{goto/args}, except that the top item on the -stack must be a list. The elements of that list are then pushed on the -stack and treated as additional arguments, replacing the list itself, -then the procedure is invoked as usual. + +@node Trampoline Instructions +@subsubsection Trampoline Instructions + +Though most applicable objects in Guile are procedures implemented +in bytecode, not all are. There are primitives, continuations, and other +procedure-like objects that have their own calling convention. Instead +of adding special cases to the @code{call} instruction, Guile wraps +these other applicable objects in VM trampoline procedures, then +provides special support for these objects in bytecode. + +Trampoline procedures are typically generated by Guile at runtime, for +example in response to a call to @code{scm_c_make_gsubr}. As such, a +compiler probably shouldn't emit code with these instructions.
However, +it's still interesting to know how these things work, so we document +these trampoline instructions here. + +@deffn Instruction subr-call nargs +Pop off a foreign pointer (which should have been pushed on by the +trampoline), and call it directly, with the @var{nargs} arguments from +the stack. Return the resulting value or values to the calling +procedure. @end deffn -@deffn Instruction call/nargs -@deffnx Instruction goto/nargs -These are like @code{call} and @code{goto/args}, except they take the -number of arguments from the stack instead of the instruction stream. -These instructions are used in the implementation of multiple value -returns, where the actual number of values is pushed on the stack. +@deffn Instruction foreign-call nargs +Pop off an internal foreign object (which should have been pushed on by +the trampoline), and call that foreign function with the @var{nargs} +arguments from the stack. Return the resulting value to the calling +procedure. @end deffn -@deffn Instruction call/cc -@deffnx Instruction goto/cc -Capture the current continuation, and then call (or tail-call) the -procedure on the top of the stack, with the continuation as the -argument. +@deffn Instruction continuation-call +Pop off an internal continuation object (which should have been pushed +on by the trampoline), and reinstate that continuation. All of the +procedure's arguments are passed to the continuation. Does not return. +@end deffn -Both the VM continuation and the C continuation are captured. +@deffn Instruction partial-cont-call +Pop off two objects from the stack: the dynamic winds associated with +the partial continuation, and the VM continuation object. Unroll the +continuation onto the stack, rewinding the dynamic environment and +overwriting the current frame, and pass all arguments to the +continuation. Control flow proceeds where the continuation was captured. 
@end deffn -@deffn Instruction mv-call nargs offset -Like @code{call}, except that a multiple-value continuation is created -in addition to a single-value continuation. -The offset (a two-byte value) is an offset within the instruction -stream; the multiple-value return address in the new frame -(@pxref{Stack Layout}) will be set to the normal return address plus -this offset. Instructions at that offset will expect the top value of -the stack to be the number of values, and below that values -themselves, pushed separately. -@end deffn +@node Branch Instructions +@subsubsection Branch Instructions -@deffn Instruction return/values nvalues -Return the top @var{nvalues} to the current continuation. +All the conditional branch instructions described below work in the +same way: -If the current continuation is a multiple-value continuation, -@code{return/values} pushes the number of values on the stack, then -returns as in @code{return}, but to the multiple-value return address. +@itemize +@item They pop off Scheme object(s) located on the stack for use in the +branch condition +@item If the condition is true, then the instruction pointer is +increased by the offset passed as an argument to the branch +instruction; +@item Program execution proceeds with the next instruction (that is, +the one to which the instruction pointer points). +@end itemize -Otherwise if the current continuation accepts only one value, i.e. the -multiple-value return address is @code{NULL}, then we assume the user -only wants one value, and we give them the first one. If there are no -values, an error is signaled. +Note that the offset passed to the instruction is encoded as three 8-bit +integers, in big-endian order, effectively giving Guile a 24-bit +relative address space. + +@deffn Instruction br offset +Jump to @var{offset}. No values are popped. 
@end deffn -@deffn Instruction return/values* nvalues -Like a combination of @code{apply} and @code{return/values}, in which -the top value on the stack is interpreted as a list of additional -values. This is an optimization for the common @code{(apply values -...)} case. +@deffn Instruction br-if offset +Jump to @var{offset} if the object on the stack is not false. @end deffn -@deffn Instruction truncate-values nbinds nrest -Used in multiple-value continuations, this instruction takes the -values that are on the stack (including the number-of-value marker) -and truncates them for a binding construct. +@deffn Instruction br-if-not offset +Jump to @var{offset} if the object on the stack is false. +@end deffn -For example, a call to @code{(receive (x y . z) (foo) ...)} would, -logically speaking, pop off the values returned from @code{(foo)} and -push them as three values, corresponding to @code{x}, @code{y}, and -@code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would -be 1 (to indicate that one of the bindings was a rest arguments). +@deffn Instruction br-if-eq offset +Jump to @var{offset} if the two objects located on the stack are +equal in the sense of @code{eq?}. Note that, for this instruction, the +stack pointer is decremented by two Scheme objects instead of only +one. +@end deffn -Signals an error if there is an insufficient number of values. +@deffn Instruction br-if-not-eq offset +Same as @code{br-if-eq} for non-@code{eq?} objects. @end deffn -@node Data Control Instructions -@subsubsection Data Control Instructions +@deffn Instruction br-if-null offset +Jump to @var{offset} if the object on the stack is @code{'()}. +@end deffn + +@deffn Instruction br-if-not-null offset +Jump to @var{offset} if the object on the stack is not @code{'()}. 
+@end deffn + + +@node Data Constructor Instructions +@subsubsection Data Constructor Instructions -These instructions push simple immediate values onto the stack, or -manipulate lists and vectors on the stack. +These instructions push simple immediate values onto the stack, +or construct compound data structures from values on the stack. @deffn Instruction make-int8 value Push @var{value}, an 8-bit integer, onto the stack. @@ -760,6 +957,17 @@ Push the immediate value @code{1} onto the stack. Push @var{value}, a 16-bit integer, onto the stack. @end deffn +@deffn Instruction make-uint64 value +Push @var{value}, an unsigned 64-bit integer, onto the stack. The +value is encoded in 8 bytes, most significant byte first (big-endian). +@end deffn + +@deffn Instruction make-int64 value +Push @var{value}, a signed 64-bit integer, onto the stack. The value +is encoded in 8 bytes, most significant byte first (big-endian), in +two's-complement arithmetic. +@end deffn + @deffn Instruction make-false Push @code{#f} onto the stack. @end deffn @@ -768,6 +976,10 @@ Push @code{#f} onto the stack. Push @code{#t} onto the stack. @end deffn +@deffn Instruction make-nil +Push @code{#nil} onto the stack. +@end deffn + @deffn Instruction make-eol Push @code{'()} onto the stack. @end deffn @@ -776,48 +988,206 @@ Push @code{'()} onto the stack. Push @var{value}, an 8-bit character, onto the stack. @end deffn +@deffn Instruction make-char32 value +Push @var{value}, a 32-bit character, onto the stack. The value is +encoded in big-endian order. +@end deffn + +@deffn Instruction make-symbol +Pops a string off the stack, and pushes a symbol. +@end deffn + +@deffn Instruction make-keyword value +Pops a symbol off the stack, and pushes a keyword. +@end deffn + @deffn Instruction list n Pops off the top @var{n} values off of the stack, consing them up into a list, then pushes that list on the stack. What was the topmost value -will be the last element in the list.
+will be the last element in the list. @var{n} is a two-byte value, +most significant byte first. @end deffn @deffn Instruction vector n Create and fill a vector with the top @var{n} values from the stack, -popping off those values and pushing on the resulting vector. +popping off those values and pushing on the resulting vector. @var{n} +is a two-byte value, as in @code{list}. +@end deffn + +@deffn Instruction make-struct n +Make a new struct from the top @var{n} values on the stack. The values +are popped, and the new struct is pushed. + +The deepest value is used as the vtable for the struct, and the rest are +used in order as the field initializers. Tail arrays are not supported +by this instruction. +@end deffn + +@deffn Instruction make-array n +Pop an array shape from the stack, then pop the remaining @var{n} +values, pushing a new array. @var{n} is encoded over three bytes. + +The array shape should be appropriate to store @var{n} values. +@xref{Array Procedures}, for more information on array shapes. +@end deffn + +Many of these data structures are constant, never changing over the +course of the different invocations of the procedure. In that case it is +often advantageous to make them once when the procedure is created, and +just reference them from the object table thereafter. @xref{Variables +and the VM}, for more information on the object table. + +@deffn Instruction object-ref n +@deffnx Instruction long-object-ref n +Push the @var{n}th value from the current program's object vector. The +``long'' variant has a 16-bit index instead of an 8-bit index. +@end deffn + + +@node Loading Instructions +@subsubsection Loading Instructions + +In addition to VM instructions, an instruction stream may contain +variable-length data embedded within it. This data is always preceded +by special loading instructions, which interpret the data and advance +the instruction pointer to the next VM instruction.
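To make the idea of embedded data concrete, here is a minimal sketch in Python (not Guile code) of how a decoder might step over one loading instruction. It assumes a one-byte opcode followed by a three-byte length, most significant byte first, and then the raw data. The opcode values and the helper name @code{read_loading_instruction} are hypothetical and exist only for this illustration.

```python
def read_loading_instruction(stream: bytes, ip: int):
    """Decode one loading instruction starting at index ip.

    Assumed layout (illustrative only, not Guile's actual decoder):
    1 opcode byte, a 3-byte big-endian length, then `length` data bytes.
    Returns (opcode, data, next_ip), where next_ip is the index of the
    next VM instruction.
    """
    if ip + 4 > len(stream):
        raise ValueError("truncated instruction header")
    opcode = stream[ip]
    length = int.from_bytes(stream[ip + 1:ip + 4], "big")
    start = ip + 4
    end = start + length
    if end > len(stream):
        raise ValueError("embedded data runs past end of stream")
    return opcode, stream[start:end], end


# Made-up opcode numbers, purely for the example.
LOAD_STRING = 0x01
NOP = 0x00

stream = bytes([LOAD_STRING, 0, 0, 5]) + b"hello" + bytes([NOP])
opcode, data, next_ip = read_loading_instruction(stream, 0)
```

The point of the sketch is only that the loading instruction, not the caller, knows how far to advance the instruction pointer: after the call, @code{next_ip} indexes the instruction that follows the embedded data.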
+ +All of these loading instructions have a @code{length} parameter, +indicating the size of the embedded data, in bytes. The length itself +is encoded in 3 bytes. + +@deffn Instruction load-number length +Load an arbitrary number from the instruction stream. The number is +embedded in the stream as a string. +@end deffn +@deffn Instruction load-string length +Load a string from the instruction stream. The string is assumed to be +encoded in the ``latin1'' locale. +@end deffn +@deffn Instruction load-wide-string length +Load a UTF-32 string from the instruction stream. @var{length} is the +length in bytes, not in codepoints. +@end deffn +@deffn Instruction load-symbol length +Load a symbol from the instruction stream. The symbol is assumed to be +encoded in the ``latin1'' locale. Symbols backed by wide strings may +be loaded via @code{load-wide-string} then @code{make-symbol}. +@end deffn +@deffn Instruction load-array length +Load a uniform array from the instruction stream. The shape and type +of the array are popped off the stack, in that order. +@end deffn + +@deffn Instruction load-program +Load bytecode from the instruction stream, and push a compiled +procedure. + +This instruction pops one value from the stack: the program's object +table, as a vector, or @code{#f} in the case that the program has no +object table. A program that does not reference toplevel bindings and +does not use @code{object-ref} does not need an object table. + +This instruction is unlike the rest of the loading instructions, +because instead of parsing its data, it directly maps the instruction +stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode +and Objcode}, for more information. + +The resulting compiled procedure will not have any free variables +captured, so it may be loaded only once but used many times to create +closures. 
+@end deffn + +@node Dynamic Environment Instructions +@subsubsection Dynamic Environment Instructions + +Guile's virtual machine has low-level support for @code{dynamic-wind}, +dynamic binding, and composable prompts and aborts. + +@deffn Instruction wind +Pop an unwind thunk and a wind thunk from the stack, in that order, and +push them onto the ``dynamic stack''. The unwind thunk will be called on +nonlocal exits, and the wind thunk on reentries. Used to implement +@code{dynamic-wind}. + +Note that neither thunk is actually called; the compiler should emit +calls to wind and unwind for the normal dynamic-wind control flow. +@xref{Dynamic Wind}. @end deffn -@deffn Instruction mark -Pushes a special value onto the stack that other stack instructions -like @code{list-mark} can use. +@deffn Instruction unwind +Pop off the top entry from the ``dynamic stack'', for example, a +wind/unwind thunk pair. @code{unwind} instructions should be properly +paired with their winding instructions, like @code{wind}. @end deffn -@deffn Instruction list-mark -Create a list from values from the stack, as in @code{list}, but -instead of knowing beforehand how many there will be, keep going until -we see a @code{mark} value. +@deffn Instruction wind-fluids n +Pop off @var{n} values and @var{n} fluids from the stack, in that order. +Set the fluids to the values by creating a with-fluids object and +pushing that object on the dynamic stack. @xref{Fluids and Dynamic +States}. @end deffn -@deffn Instruction cons-mark -As the scheme procedure @code{cons*} is to the scheme procedure -@code{list}, so the instruction @code{cons-mark} is to the instruction -@code{list-mark}. +@deffn Instruction unwind-fluids +Pop a with-fluids object from the dynamic stack, and swap the current +values of its fluids with the saved values of its fluids. In this way, +the dynamic environment is left as it was before the corresponding +@code{wind-fluids} instruction was processed. 
@end deffn -@deffn Instruction vector-mark -Like @code{list-mark}, but makes a vector instead of a list. +@deffn Instruction fluid-ref +Pop a fluid from the stack, and push its current value. @end deffn -@deffn Instruction list-break -The opposite of @code{list}: pops a value, which should be a list, and -pushes its elements on the stack. +@deffn Instruction fluid-set +Pop a value and a fluid from the stack, in that order, and set the fluid +to the value. +@end deffn + +@deffn Instruction prompt escape-only? offset +Establish a dynamic prompt. @xref{Prompts}, for more information on +prompts. + +The prompt will be pushed on the dynamic stack. The normal control flow +should ensure that the prompt is popped off at the end, via +@code{unwind}. + +If an abort is made to this prompt, control will jump to @var{offset}, a +three-byte relative address. The continuation and all arguments to the +abort will be pushed on the stack, along with the total number of +arguments (including the continuation). If control returns to the +handler, the prompt is already popped off by the abort mechanism. +(Guile's @code{prompt} implements Felleisen's @dfn{--F--} operator.) + +If @var{escape-only?} is nonzero, the prompt will be marked as +escape-only, which allows an abort to this prompt to avoid reifying the +continuation. +@end deffn + +@deffn Instruction abort n +Abort to a dynamic prompt. + +This instruction pops one tail argument list, @var{n} arguments, and a +prompt tag from the stack. The dynamic environment is then searched for +a prompt having the given tag. If none is found, an error is signalled. +Otherwise all arguments are passed to the prompt's handler, along with +the captured continuation, if necessary. + +If the prompt's handler can be proven to not reference the captured +continuation, no continuation is allocated. This decision happens +dynamically, at run-time; the general case is that the continuation may +be captured, and thus resumed.
A reinstated continuation will have its +arguments pushed on the stack, along with the number of arguments, as in +the multiple-value return convention. Therefore an @code{abort} +instruction should be followed by code ready to handle the equivalent of +a multiply-valued return. @end deffn @node Miscellaneous Instructions @subsubsection Miscellaneous Instructions @deffn Instruction nop -Does nothing! +Does nothing! Used for padding other instructions to certain +alignments. @end deffn @deffn Instruction halt @@ -850,15 +1220,14 @@ Pushes ``the unspecified value'' onto the stack. @subsubsection Inlined Scheme Instructions The Scheme compiler can recognize the application of standard Scheme -procedures, or unbound variables that look like they are bound to -standard Scheme procedures. It tries to inline these small operations -to avoid the overhead of creating new stack frames. +procedures. It tries to inline these small operations to avoid the +overhead of creating new stack frames. Since most of these operations are historically implemented as C primitives, not inlining them would entail constantly calling out from the VM to the interpreter, which has some costs---registers must be saved, the interpreter has to dispatch, called procedures have to do -much typechecking, etc. It's much more efficient to inline these +much type checking, etc. It's much more efficient to inline these operations in the virtual machine itself. All of these instructions pop their arguments from the stack and push @@ -876,14 +1245,21 @@ stream. @deffnx Instruction eqv? x y @deffnx Instruction equal? x y @deffnx Instruction pair? x y -@deffnx Instruction list? x y +@deffnx Instruction list? x @deffnx Instruction set-car! pair x @deffnx Instruction set-cdr! 
pair x -@deffnx Instruction slot-ref struct n -@deffnx Instruction slot-set struct n x -@deffnx Instruction cons x +@deffnx Instruction cons x y @deffnx Instruction car x @deffnx Instruction cdr x +@deffnx Instruction vector-ref x y +@deffnx Instruction vector-set x n y +@deffnx Instruction struct? x +@deffnx Instruction struct-ref x n +@deffnx Instruction struct-set x n v +@deffnx Instruction struct-vtable x +@deffnx Instruction class-of x +@deffnx Instruction slot-ref struct n +@deffnx Instruction slot-set struct n x Inlined implementations of their Scheme equivalents. @end deffn @@ -904,7 +1280,9 @@ As in the previous section, the definitions below show stack parameters instead of instruction stream parameters. @deffn Instruction add x y +@deffnx Instruction add1 x @deffnx Instruction sub x y +@deffnx Instruction sub1 x @deffnx Instruction mul x y @deffnx Instruction div x y @deffnx Instruction quo x y @@ -915,5 +1293,64 @@ parameters instead of instruction stream parameters. @deffnx Instruction gt? x y @deffnx Instruction le? x y @deffnx Instruction ge? x y +@deffnx Instruction ash x n +@deffnx Instruction logand x y +@deffnx Instruction logior x y +@deffnx Instruction logxor x y Inlined implementations of the corresponding mathematical operations. @end deffn + +@node Inlined Bytevector Instructions +@subsubsection Inlined Bytevector Instructions + +Bytevector operations correspond closely to what the current hardware +can do, so it makes sense to inline them to VM instructions, providing +a clear path for eventual native compilation. Without this, Scheme +programs would need other primitives for accessing raw bytes -- but +these primitives are as good as any. + +As in the previous section, the definitions below show stack +parameters instead of instruction stream parameters. + +The multibyte formats (@code{u16}, @code{f64}, etc) take an extra +endianness argument. Only aligned native accesses are currently +fast-pathed in Guile's VM. 
+ +@deffn Instruction bv-u8-ref bv n +@deffnx Instruction bv-s8-ref bv n +@deffnx Instruction bv-u16-native-ref bv n +@deffnx Instruction bv-s16-native-ref bv n +@deffnx Instruction bv-u32-native-ref bv n +@deffnx Instruction bv-s32-native-ref bv n +@deffnx Instruction bv-u64-native-ref bv n +@deffnx Instruction bv-s64-native-ref bv n +@deffnx Instruction bv-f32-native-ref bv n +@deffnx Instruction bv-f64-native-ref bv n +@deffnx Instruction bv-u16-ref bv n endianness +@deffnx Instruction bv-s16-ref bv n endianness +@deffnx Instruction bv-u32-ref bv n endianness +@deffnx Instruction bv-s32-ref bv n endianness +@deffnx Instruction bv-u64-ref bv n endianness +@deffnx Instruction bv-s64-ref bv n endianness +@deffnx Instruction bv-f32-ref bv n endianness +@deffnx Instruction bv-f64-ref bv n endianness +@deffnx Instruction bv-u8-set bv n val +@deffnx Instruction bv-s8-set bv n val +@deffnx Instruction bv-u16-native-set bv n val +@deffnx Instruction bv-s16-native-set bv n val +@deffnx Instruction bv-u32-native-set bv n val +@deffnx Instruction bv-s32-native-set bv n val +@deffnx Instruction bv-u64-native-set bv n val +@deffnx Instruction bv-s64-native-set bv n val +@deffnx Instruction bv-f32-native-set bv n val +@deffnx Instruction bv-f64-native-set bv n val +@deffnx Instruction bv-u16-set bv n val endianness +@deffnx Instruction bv-s16-set bv n val endianness +@deffnx Instruction bv-u32-set bv n val endianness +@deffnx Instruction bv-s32-set bv n val endianness +@deffnx Instruction bv-u64-set bv n val endianness +@deffnx Instruction bv-s64-set bv n val endianness +@deffnx Instruction bv-f32-set bv n val endianness +@deffnx Instruction bv-f64-set bv n val endianness +Inlined implementations of the corresponding bytevector operations. +@end deffn