Improved the VM's efficiency. The VM is as fast as the interpreter. :-(
[bpt/guile.git] / doc / guile-vm.texi
1 \input texinfo @c -*-texinfo-*-
2 @c %**start of header
3 @setfilename guile-vm.info
4 @settitle Guile VM Specification
5 @footnotestyle end
6 @setchapternewpage odd
7 @c %**end of header
8
9 @set EDITION 0.6
10 @set VERSION 0.6
11 @set UPDATED 2005-04-26
12
13 @c Macro for instruction definitions.
14 @macro insn{}
15 Instruction
16 @end macro
17
18 @ifinfo
19 @dircategory Scheme Programming
20 @direntry
21 * Guile VM: (guile-vm). Guile's Virtual Machine.
22 @end direntry
23
24 This file documents Guile VM.
25
26 Copyright @copyright{} 2000 Keisuke Nishida
27 Copyright @copyright{} 2005 Ludovic Court`es
28
29 Permission is granted to make and distribute verbatim copies of this
30 manual provided the copyright notice and this permission notice are
31 preserved on all copies.
32
33 @ignore
34 Permission is granted to process this file through TeX and print the
35 results, provided the printed document carries a copying permission
36 notice identical to this one except for the removal of this paragraph
37 (this paragraph not being relevant to the printed manual).
38
39 @end ignore
40 Permission is granted to copy and distribute modified versions of this
41 manual under the conditions for verbatim copying, provided that the
42 entire resulting derived work is distributed under the terms of a
43 permission notice identical to this one.
44
45 Permission is granted to copy and distribute translations of this manual
46 into another language, under the above conditions for modified versions,
47 except that this permission notice may be stated in a translation
48 approved by the Free Software Foundation.
49 @end ifinfo
50
51 @titlepage
52 @title Guile VM Specification
53 @subtitle for Guile VM @value{VERSION}
54 @author Keisuke Nishida
55
56 @page
57 @vskip 0pt plus 1filll
58 Edition @value{EDITION} @*
59 Updated for Guile VM @value{VERSION} @*
60 @value{UPDATED} @*
61
62 Copyright @copyright{} 2000 Keisuke Nishida
63 Copyright @copyright{} 2005 Ludovic Court`es
64
65 Permission is granted to make and distribute verbatim copies of this
66 manual provided the copyright notice and this permission notice are
67 preserved on all copies.
68
69 Permission is granted to copy and distribute modified versions of this
70 manual under the conditions for verbatim copying, provided that the
71 entire resulting derived work is distributed under the terms of a
72 permission notice identical to this one.
73
74 Permission is granted to copy and distribute translations of this manual
75 into another language, under the above conditions for modified versions,
76 except that this permission notice may be stated in a translation
77 approved by the Free Software Foundation.
78 @end titlepage
79
80 @contents
81
82 @c *********************************************************************
83 @node Top, Introduction, (dir), (dir)
84 @top Guile VM Specification
85
86 This document would like to correspond to Guile VM @value{VERSION}.
87 However, be warned that important parts still correspond to version
88 0.0 and are not valid anymore.
89
90 @menu
91 * Introduction::
92 * Variable Management::
93 * Program Execution::
94 * Instruction Set::
95 @end menu
96
97 @c *********************************************************************
98 @node Introduction, Variable Management, Top, Top
99 @chapter What is Guile VM?
100
101 A Guile VM has a set of registers and its own stack memory. Guile may
102 have more than one VM's. Each VM may execute at most one program at a
103 time. Guile VM is a CISC system so designed as to execute Scheme and
104 other languages efficiently.
105
106 @unnumberedsubsec Registers
107
108 @itemize
109 @item pc - Program counter ;; ip (instruction poiner) is better?
110 @item sp - Stack pointer
111 @item bp - Base pointer
112 @item ac - Accumulator
113 @end itemize
114
115 @unnumberedsubsec Engine
116
117 A VM may have one of three engines: reckless, regular, or debugging.
118 Reckless engine is fastest but dangerous. Regular engine is normally
119 fail-safe and reasonably fast. Debugging engine is safest and
120 functional but very slow.
121
122 @unnumberedsubsec Memory
123
124 Stack is the only memory that each VM owns. The other memory is shared
125 memory that is shared among every VM and other part of Guile.
126
127 @unnumberedsubsec Program
128
129 A VM program consists of a bytecode that is executed and an environment
130 in which execution is done. Each program is allocated in the shared
131 memory and may be executed by any VM. A program may call other programs
132 within a VM.
133
134 @unnumberedsubsec Instruction
135
136 Guile VM has dozens of system instructions and (possibly) hundreds of
137 functional instructions. Some Scheme procedures such as cons and car
138 are implemented as VM's builtin functions, which are very efficient.
139 Other procedures defined outside of the VM are also considered as VM's
140 functional features, since they do not change the state of VM.
141 Procedures defined within the VM are called subprograms.
142
143 Most instructions deal with the accumulator (ac). The VM stores all
144 results from functions in ac, instead of pushing them into the stack.
145 I'm not sure whether this is a good thing or not.
146
147 @node Variable Management, Program Execution, Introduction, Top
148 @chapter Variable Management
149
150 A program may have access to local variables, external variables, and
151 top-level variables.
152
153 @section Local/external variables
154
155 A stack is logically divided into several blocks during execution. A
156 "block" is such a unit that maintains local variables and dynamic chain.
157 A "frame" is an upper level unit that maintains subprogram calls.
158
159 @example
160 Stack
161 dynamic | | | |
162 chain +==========+ - =
163 | |local vars| | |
164 `-|block data| | block |
165 /|frame data| | |
166 | +----------+ - |
167 | |local vars| | | frame
168 `-|block data| | |
169 /+----------+ - |
170 | |local vars| | |
171 `-|block data| | |
172 /+==========+ - =
173 | |local vars| | |
174 `-|block data| | |
175 /|frame data| | |
176 | +----------+ - |
177 | | | | |
178 @end example
179
180 The first block of each frame may look like this:
181
182 @example
183 Address Data
184 ------- ----
185 xxx0028 Local variable 2
186 xxx0024 Local variable 1
187 bp ->xxx0020 Local variable 0
188 xxx001c Local link (block data)
189 xxx0018 External link (block data)
190 xxx0014 Stack pointer (block data)
191 xxx0010 Return address (frame data)
192 xxx000c Parent program (frame data)
193 @end example
194
195 The base pointer (bp) always points to the lowest address of local
196 variables of the recent block. Local variables are referred as "bp[n]".
197 The local link field has a pointer to the dynamic parent of the block.
198 The parent's variables are referred as "bp[-1][n]", and grandparent's
199 are "bp[-1][-1][n]". Thus, any local variable is represented by its
200 depth and offset from the current bp.
201
202 A variable may be "external", which is allocated in the shared memory.
203 The external link field of a block has a pointer to such a variable set,
204 which I call "fragment" (what should I call?). A fragment has a set of
205 variables and its own chain.
206
207 @example
208 local external
209 chain| | chain
210 | +-----+ .--------, |
211 `-|block|--+->|external|-'
212 /+-----+ | `--------'\,
213 `-|block|--' |
214 /+-----+ .--------, |
215 `-|block|---->|external|-'
216 +-----+ `--------'
217 | |
218 @end example
219
220 An external variable is referred as "bp[-2]->variables[n]" or
221 "bp[-2]->link->...->variables[n]". This is also represented by a pair
222 of depth and offset. At any point of execution, the value of bp
223 determines the current local link and external link, and thus the
224 current environment of a program.
225
226 Other data fields are described later.
227
228 @section Top-level variables
229
230 Guile VM uses the same top-level variables as the regular Guile. A
231 program may have direct access to vcells. Currently this is done by
232 calling scm_intern0, but a program is possible to have any top-level
233 environment defined by the current module.
234
235 @section Scheme and VM variable
236
237 Let's think about the following Scheme code as an example:
238
239 @example
240 (define (foo a)
241 (lambda (b) (list foo a b)))
242 @end example
243
244 In the lambda expression, "foo" is a top-level variable, "a" is an
245 external variable, and "b" is a local variable.
246
247 When a VM executes foo, it allocates a block for "a". Since "a" may be
248 externally referred from the closure, the VM creates a fragment with a
249 copy of "a" in it. When the VM evaluates the lambda expression, it
250 creates a subprogram (closure), associating the fragment with the
251 subprogram as its external environment. When the closure is executed,
252 its environment will look like this:
253
254 @example
255 block Top-level: foo
256 +-------------+
257 |local var: b | fragment
258 +-------------+ .-----------,
259 |external link|---->|variable: a|
260 +-------------+ `-----------'
261 @end example
262
263 The fragment remains as long as the closure exists.
264
265 @section Addressing mode
266
267 Guile VM has five addressing modes:
268
269 @itemize
270 @item Real address
271 @item Local position
272 @item External position
273 @item Top-level location
274 @item Constant object
275 @end itemize
276
277 Real address points to the address in the real program and is only used
278 with the program counter (pc).
279
280 Local position and external position are represented as a pair of depth
281 and offset from bp, as described above. These are base relative
282 addresses, and the real address may vary during execution.
283
284 Top-level location is represented as a Guile's vcell. This location is
285 determined at loading time, so the use of this address is efficient.
286
287 Constant object is not an address but gives an instruction an Scheme
288 object directly.
289
290 [ We'll also need dynamic scope addressing to support Emacs Lisp? ]
291
292 @unnumberedsubsec At a Glance
293
294 Guile VM has a set of instructions for each instruction family. `%load'
295 is, for example, a family to load an object from memory and set the
296 accumulator (ac). There are four basic `%load' instructions:
297
298 @example
299 %loadl - Local addressing
300 %loade - External addressing
301 %loadt - Top-level addressing
302 %loadi - Immediate addressing
303 @end example
304
305 A possible program code may look like this:
306
307 @example
308 %loadl (0 . 1) ; ac = local[0][1]
309 %loade (2 . 3) ; ac = external[2][3]
310 %loadt (foo . #<undefined>) ; ac = #<undefined>
311 %loadi "hello" ; ac = "hello"
312 @end example
313
314 One instruction that uses real addressing is `%jump', which changes the
315 value of the program counter:
316
317 @example
318 %jump 0x80234ab8 ; pc = 0x80234ab8
319 @end example
320
321
322 @node Program Execution, Instruction Set, Variable Management, Top
323 @chapter Program Execution
324
325 Overall procedure:
326
327 @enumerate
328 @item A source program is compiled into a bytecode.
329 @item A bytecode is given an environment and becomes a program.
330 @item A VM starts execution, creating a frame for it.
331 @item Whenever a program calls a subprogram, a new frame is created for it.
332 @item When a program finishes execution, it returns a value, and the VM
333 continues execution of the parent program.
334 @item When all programs terminated, the VM returns the final value and stops.
335 @end enumerate
336
337 @section Environment
338
339 Local variable:
340
341 @example
342 (let ((a 1) (b 2) (c 3)) (+ a b c)) ->
343
344 %pushi 1 ; a
345 %pushi 2 ; b
346 %pushi 3 ; c
347 %bind 3 ; create local bindings
348 %pushl (0 . 0) ; local variable a
349 %pushl (0 . 1) ; local variable b
350 %pushl (0 . 2) ; local variable c
351 add 3 ; ac = a + b + c
352 %unbind ; remove local bindings
353 @end example
354
355 External variable:
356
357 @example
358 (define foo (let ((n 0)) (lambda () n)))
359
360 %pushi 0 ; n
361 %bind 1 ; create local bindings
362 %export [0] ; make it an external variable
363 %make-program #<bytecode xxx> ; create a program in this environment
364 %unbind ; remove local bindings
365 %savet (foo . #<undefined>) ; save the program in foo
366
367 (foo) ->
368
369 %loadt (foo . #<program xxx>) ; program has an external link
370 %call 0 ; change the current external link
371 %loade (0 . 0) ; external variable n
372 %return ; recover the external link
373 @end example
374
375 Top-level variable:
376
377 @example
378 foo ->
379
380 %loadt (foo . #<program xxx>) ; top-level variable foo
381 @end example
382
383 @section Flow control
384
385 @example
386 (if #t 1 0) ->
387
388 %loadi #t
389 %br-if-not L1
390 %loadi 1
391 %jump L2
392 L1: %loadi 0
393 L2:
394 @end example
395
396 @section Function call
397
398 Builtin function:
399
400 @example
401 (1+ 2) ->
402
403 %loadi 2 ; ac = 2
404 1+ ; one argument
405
406 (+ 1 2) ->
407
408 %pushi 1 ; 1 -> stack
409 %loadi 2 ; ac = 2
410 add2 ; two argument
411
412 (+ 1 2 3) ->
413
414 %pushi 1 ; 1 -> stack
415 %pushi 2 ; 2 -> stack
416 %pushi 3 ; 3 -> stack
417 add 3 ; many argument
418 @end example
419
420 External function:
421
422 @example
423 (version) ->
424
425 %func0 (version . #<primitive-procedure version>) ; no argument
426
427 (display "hello") ->
428
429 %loadi "hello"
430 %func1 (display . #<primitive-procedure display>) ; one argument
431
432 (open-file "file" "w") ->
433
434 %pushi "file"
435 %loadi "w"
436 %func2 (open-file . #<primitive-procedure open-file>) ; two arguments
437
438 (equal 1 2 3)
439
440 %pushi 1
441 %pushi 2
442 %pushi 3
443 %loadi 3 ; the number of arguments
444 %func (equal . #<primitive-procedure equal>) ; many arguments
445 @end example
446
447 @section Subprogram call
448
449 @example
450 (define (plus a b) (+ a b))
451 (plus 1 2) ->
452
453 %pushi 1 ; argument 1
454 %pushi 2 ; argument 2
455 %loadt (plus . #<program xxx>) ; load the program
456 %call 2 ; call it with two arguments
457 %pushl (0 . 0) ; argument 1
458 %loadl (0 . 1) ; argument 2
459 add2 ; ac = 1 + 2
460 %return ; result is 3
461 @end example
462
463 @node Instruction Set, , Program Execution, Top
464 @chapter Instruction Set
465
466 The Guile VM instruction set is roughly divided two groups: system
467 instructions and functional instructions. System instructions control
468 the execution of programs, while functional instructions provide many
469 useful calculations.
470
471 @menu
472 * Environment Control Instructions::
473 * Subprogram Control Instructions::
474 * Data Control Instructions::
475 @end menu
476
477 @node Environment Control Instructions, Subprogram Control Instructions, Instruction Set, Instruction Set
478 @section Environment Control Instructions
479
480 @deffn @insn{} link binding-name
481 Look up @var{binding-name} (a string) in the current environment and
482 push the corresponding variable object onto the stack. If
483 @var{binding-name} is not bound yet, then create a new binding and
484 push its variable object.
485 @end deffn
486
487 @deffn @insn{} variable-ref
488 Dereference the variable object which is on top of the stack and
489 replace it by the value of the variable it represents.
490 @end deffn
491
492 @deffn @insn{} variable-set
493 Set the value of the variable on top of the stack (at @code{sp[0]}) to
494 the object located immediately before (at @code{sp[-1]}).
495 @end deffn
496
497 As an example, let us look at what a simple function call looks like:
498
499 @example
500 (+ 2 3)
501 @end example
502
503 This call yields the following sequence of instructions:
504
505 @example
506 (link "+") ;; lookup binding "x"
507 (variable-ref) ;; dereference it
508 (make-int8 2) ;; push immediate value `2'
509 (make-int8 3) ;; push immediate value `3'
510 (tail-call 2) ;; call the proc at sp[-3] with two args
511 @end example
512
513 @itemize
514 @item %alloc
515 @item %bind
516 @item %export
517 @item %unbind
518 @end itemize
519
520 @node Subprogram Control Instructions, Data Control Instructions, Environment Control Instructions, Instruction Set
521 @section Subprogram Control Instructions
522
523 Programs (read: ``compiled procedure'') may refer to external
524 bindings, like variables or functions defined outside the program
525 itself, in the environment in which it will evaluate at run-time. In
526 a sense, a program's environment and its bindings are an implicit
527 parameter of every program.
528
529 @cindex Object table
530 In order to handle such bindings, each program has an @dfn{object
531 table} associated to it. This table (actually a vector) contains all
532 the variable objects corresponding to the external bindings referenced
533 by the program. The object table of a program is initialized right
534 before a program is loaded and run with @var{load-program}.
535
536 Therefore, external bindings only need to be looked up once before the
537 program is loaded. References to the corresponding external variables
538 from within the program are then performed via the @var{object-ref}
539 instruction and are almost as fast as local variable references.
540
541 Let us consider the following program (procedure) which references
542 external bindings @code{frob} and @var{%magic}:
543
544 @example
545 (lambda (x)
546 (frob x %magic))
547 @end example
548
549 This yields the following assembly code:
550
551 @example
552 (make-int8 64) ;; number of args, vars, etc. (see below)
553 (link "frob")
554 (link "%magic")
555 (vector 2)
556 ...
557 (load-program #u8(20 0 23 21 0 20 1 23 36 2))
558 (return)
559 @end example
560
561 All the instructions occurring before @var{load-program} (some were
562 omitted for simplicity) form a @dfn{prologue} which, among other
563 things, pushed an object table (a vector) that contains the variable
564 objects for the variables bound to @var{frob} and @var{%magic}. This
565 vector and other data pushed onto the stack are then popped by the
566 @var{load-program} instruction.
567
568 Besides, the @var{load-program} instruction takes one explicit
569 argument which is the bytecode of the program itself. Disassembled,
570 this bytecode looks like:
571
572 @example
573 (object-ref 0) ;; push the variable object of `frob'
574 (variable-ref) ;; dereference it
575 (local-ref 0) ;; push the value of `x'
576 (object-ref 1) ;; push the variable object of `%magic'
577 (variable-ref) ;; dereference it
578 (tail-call 2) ;; call `frob' with two parameters
579 @end example
580
581 This clearly shows that there is little difference between references
582 to local variables and references to externally bound variables.
583
584 @deffn @insn{} load-program bytecode
585 Load the program whose bytecode is @var{bytecode} (a u8vector) and pop
586 its meta-information from the stack. The program's meta-information
587 may consist of (in the order in which it should be pushed onto the
588 stack):
589
590 @itemize
591 @item optionally, a pair representing meta-data (see the
592 @var{program-meta} procedure); [FIXME: explain their meaning]
593 @item optionally, a vector which is the program's object table (a
594 program that does not reference external bindings does not need an
595 object table);
596 @item either one immediate integer or four immediate integers
597 representing respectively the number of arguments taken by the
598 function (@var{nargs}), the number of @dfn{rest arguments}
599 (@var{nrest}, 0 or 1), the number of local variables (@var{nlocs}) and
600 the number of external variables (@var{nexts}) (see the example
601 above).
602 @end itemize
603
604 In the end, push a program object onto the stack.
605
606 @end deffn
607
608 @deffn @insn{} object-ref offset
609 Push the variable object for the external variable located at
610 @var{offset} within the program's object table.
611 @end deffn
612
613 @deffn @insn{} return
614 Free the program's frame.
615 @end deffn
616
617
618 @node Data Control Instructions, , Subprogram Control Instructions, Instruction Set
619 @section Data Control Instructions
620
621 @deffn @insn{} make-int8 value
622 Push @var{value}, an 8-bit integer, onto the stack.
623 @end deffn
624
625 @deffn @insn{} make-int8:0
626 Push the immediate value @code{0} onto the stack.
627 @end deffn
628
629 @deffn @insn{} make-int8:1
630 Push the immediate value @code{1} onto the stack.
631 @end deffn
632
633 @deffn @insn{} make-false
634 Push @code{#f} onto the stack.
635 @end deffn
636
637 @deffn @insn{} make-true
638 Push @code{#t} onto the stack.
639 @end deffn
640
641 @itemize
642 @item %push
643 @item %pushi
644 @item %pushl, %pushl:0:0, %pushl:0:1, %pushl:0:2, %pushl:0:3
645 @item %pushe, %pushe:0:0, %pushe:0:1, %pushe:0:2, %pushe:0:3
646 @item %pusht
647 @end itemize
648
649 @itemize
650 @item %loadi
651 @item %loadl, %loadl:0:0, %loadl:0:1, %loadl:0:2, %loadl:0:3
652 @item %loade, %loade:0:0, %loade:0:1, %loade:0:2, %loade:0:3
653 @item %loadt
654 @end itemize
655
656 @itemize
657 @item %savei
658 @item %savel, %savel:0:0, %savel:0:1, %savel:0:2, %savel:0:3
659 @item %savee, %savee:0:0, %savee:0:1, %savee:0:2, %savee:0:3
660 @item %savet
661 @end itemize
662
663 @section Flow control instructions
664
665 @itemize
666 @item %br-if
667 @item %br-if-not
668 @item %jump
669 @end itemize
670
671 @section Function call instructions
672
673 @itemize
674 @item %func, %func0, %func1, %func2
675 @end itemize
676
677 @section Scheme built-in functions
678
679 @itemize
680 @item cons
681 @item car
682 @item cdr
683 @end itemize
684
685 @section Mathematical buitin functions
686
687 @itemize
688 @item 1+
689 @item 1-
690 @item add, add2
691 @item sub, sub2, minus
692 @item mul2
693 @item div2
694 @item lt2
695 @item gt2
696 @item le2
697 @item ge2
698 @item num-eq2
699 @end itemize
700
701 @c *********************************************************************
702 @c @node Concept Index, Command Index, Related Information, Top
703 @c @unnumbered Concept Index
704 @c @printindex cp
705
706 @c @node Command Index, Variable Index, Concept Index, Top
707 @c @unnumbered Command Index
708 @c @printindex fn
709
710 @c @node Variable Index, , Command Index, Top
711 @c @unnumbered Variable Index
712 @c @printindex vr
713
714 @bye
715 \f
716 @c Local Variables:
717 @c mode:outline-minor
718 @c outline-regexp:"@\\(ch\\|sec\\|subs\\)"
719 @c End:
720
721 @c LocalWords: bytecode