Implementation of SRFI-98 (An interface to access environment variables).
[bpt/guile.git] / doc / guile-vm.texi
1 \input texinfo @c -*-texinfo-*-
2 @c %**start of header
3 @setfilename guile-vm.info
4 @settitle Guile VM Specification
5 @footnotestyle end
6 @setchapternewpage odd
7 @c %**end of header
8
9 @set EDITION 0.6
10 @set VERSION 0.6
11 @set UPDATED 2005-04-26
12
13 @c Macro for instruction definitions.
14 @macro insn{}
15 Instruction
16 @end macro
17
18 @c For Scheme procedure definitions.
19 @macro scmproc{}
20 Scheme Procedure
21 @end macro
22
23 @c Scheme records.
24 @macro scmrec{}
25 Record
26 @end macro
27
28 @ifinfo
29 @dircategory Scheme Programming
30 @direntry
31 * Guile VM: (guile-vm). Guile's Virtual Machine.
32 @end direntry
33
34 This file documents Guile VM.
35
36 Copyright @copyright{} 2000 Keisuke Nishida
37 Copyright @copyright{} 2005 Ludovic Court`es
38
39 Permission is granted to make and distribute verbatim copies of this
40 manual provided the copyright notice and this permission notice are
41 preserved on all copies.
42
43 @ignore
44 Permission is granted to process this file through TeX and print the
45 results, provided the printed document carries a copying permission
46 notice identical to this one except for the removal of this paragraph
47 (this paragraph not being relevant to the printed manual).
48
49 @end ignore
50 Permission is granted to copy and distribute modified versions of this
51 manual under the conditions for verbatim copying, provided that the
52 entire resulting derived work is distributed under the terms of a
53 permission notice identical to this one.
54
55 Permission is granted to copy and distribute translations of this manual
56 into another language, under the above conditions for modified versions,
57 except that this permission notice may be stated in a translation
58 approved by the Free Software Foundation.
59 @end ifinfo
60
61 @titlepage
62 @title Guile VM Specification
63 @subtitle for Guile VM @value{VERSION}
64 @author Keisuke Nishida
65
66 @page
67 @vskip 0pt plus 1filll
68 Edition @value{EDITION} @*
69 Updated for Guile VM @value{VERSION} @*
70 @value{UPDATED} @*
71
72 Copyright @copyright{} 2000 Keisuke Nishida
73 Copyright @copyright{} 2005 Ludovic Court`es
74
75 Permission is granted to make and distribute verbatim copies of this
76 manual provided the copyright notice and this permission notice are
77 preserved on all copies.
78
79 Permission is granted to copy and distribute modified versions of this
80 manual under the conditions for verbatim copying, provided that the
81 entire resulting derived work is distributed under the terms of a
82 permission notice identical to this one.
83
84 Permission is granted to copy and distribute translations of this manual
85 into another language, under the above conditions for modified versions,
86 except that this permission notice may be stated in a translation
87 approved by the Free Software Foundation.
88 @end titlepage
89
90 @contents
91
92 @c *********************************************************************
93 @node Top, Introduction, (dir), (dir)
94 @top Guile VM Specification
95
96 This document would like to correspond to Guile VM @value{VERSION}.
97 However, be warned that important parts still correspond to version
98 0.0 and are not valid anymore.
99
100 @menu
101 * Introduction::
102 * Variable Management::
103 * Instruction Set::
104 * The Compiler::
105 * Concept Index::
106 * Function and Instruction Index::
107 * Command and Variable Index::
108
109 @detailmenu
110 --- The Detailed Node Listing ---
111
112 Instruction Set
113
114 * Environment Control Instructions::
115 * Branch Instructions::
116 * Subprogram Control Instructions::
117 * Data Control Instructions::
118
119 The Compiler
120
121 * Overview::
122 * The Language Front-Ends::
123 * GHIL::
124 * Compiling Scheme Code::
125 * GLIL::
126 * The Assembler::
127
128 @end detailmenu
129 @end menu
130
131 @c *********************************************************************
132 @node Introduction, Variable Management, Top, Top
133 @chapter What is Guile VM?
134
135 A Guile VM has a set of registers and its own stack memory. Guile may
136 have more than one VM's. Each VM may execute at most one program at a
137 time. Guile VM is a CISC system so designed as to execute Scheme and
138 other languages efficiently.
139
140 @unnumberedsubsec Registers
141
142 @itemize
143 @item pc - Program counter ;; ip (instruction poiner) is better?
144 @item sp - Stack pointer
145 @item bp - Base pointer
146 @item ac - Accumulator
147 @end itemize
148
149 @unnumberedsubsec Engine
150
151 A VM may have one of three engines: reckless, regular, or debugging.
152 Reckless engine is fastest but dangerous. Regular engine is normally
153 fail-safe and reasonably fast. Debugging engine is safest and
154 functional but very slow.
155
156 @unnumberedsubsec Memory
157
158 Stack is the only memory that each VM owns. The other memory is shared
159 memory that is shared among every VM and other part of Guile.
160
161 @unnumberedsubsec Program
162
163 A VM program consists of a bytecode that is executed and an environment
164 in which execution is done. Each program is allocated in the shared
165 memory and may be executed by any VM. A program may call other programs
166 within a VM.
167
168 @unnumberedsubsec Instruction
169
170 Guile VM has dozens of system instructions and (possibly) hundreds of
171 functional instructions. Some Scheme procedures such as cons and car
172 are implemented as VM's builtin functions, which are very efficient.
173 Other procedures defined outside of the VM are also considered as VM's
174 functional features, since they do not change the state of VM.
175 Procedures defined within the VM are called subprograms.
176
177 Most instructions deal with the accumulator (ac). The VM stores all
178 results from functions in ac, instead of pushing them into the stack.
179 I'm not sure whether this is a good thing or not.
180
181 @node Variable Management, Instruction Set, Introduction, Top
182 @chapter Variable Management
183
184 FIXME: This chapter needs to be reviewed so that it matches reality.
185 A more up-to-date description of the mechanisms described in this
186 section is given in @ref{Instruction Set}.
187
188 A program may have access to local variables, external variables, and
189 top-level variables.
190
191 @section Local/external variables
192
193 A stack is logically divided into several blocks during execution. A
194 "block" is such a unit that maintains local variables and dynamic chain.
195 A "frame" is an upper level unit that maintains subprogram calls.
196
197 @example
198 Stack
199 dynamic | | | |
200 chain +==========+ - =
201 | |local vars| | |
202 `-|block data| | block |
203 /|frame data| | |
204 | +----------+ - |
205 | |local vars| | | frame
206 `-|block data| | |
207 /+----------+ - |
208 | |local vars| | |
209 `-|block data| | |
210 /+==========+ - =
211 | |local vars| | |
212 `-|block data| | |
213 /|frame data| | |
214 | +----------+ - |
215 | | | | |
216 @end example
217
218 The first block of each frame may look like this:
219
220 @example
221 Address Data
222 ------- ----
223 xxx0028 Local variable 2
224 xxx0024 Local variable 1
225 bp ->xxx0020 Local variable 0
226 xxx001c Local link (block data)
227 xxx0018 External link (block data)
228 xxx0014 Stack pointer (block data)
229 xxx0010 Return address (frame data)
230 xxx000c Parent program (frame data)
231 @end example
232
233 The base pointer (bp) always points to the lowest address of local
234 variables of the recent block. Local variables are referred as "bp[n]".
235 The local link field has a pointer to the dynamic parent of the block.
236 The parent's variables are referred as "bp[-1][n]", and grandparent's
237 are "bp[-1][-1][n]". Thus, any local variable is represented by its
238 depth and offset from the current bp.
239
240 A variable may be "external", which is allocated in the shared memory.
241 The external link field of a block has a pointer to such a variable set,
242 which I call "fragment" (what should I call?). A fragment has a set of
243 variables and its own chain.
244
245 @example
246 local external
247 chain| | chain
248 | +-----+ .--------, |
249 `-|block|--+->|external|-'
250 /+-----+ | `--------'\,
251 `-|block|--' |
252 /+-----+ .--------, |
253 `-|block|---->|external|-'
254 +-----+ `--------'
255 | |
256 @end example
257
258 An external variable is referred as "bp[-2]->variables[n]" or
259 "bp[-2]->link->...->variables[n]". This is also represented by a pair
260 of depth and offset. At any point of execution, the value of bp
261 determines the current local link and external link, and thus the
262 current environment of a program.
263
264 Other data fields are described later.
265
266 @section Top-level variables
267
268 Guile VM uses the same top-level variables as the regular Guile. A
269 program may have direct access to vcells. Currently this is done by
270 calling scm_intern0, but a program is possible to have any top-level
271 environment defined by the current module.
272
273 @section Scheme and VM variable
274
275 Let's think about the following Scheme code as an example:
276
277 @example
278 (define (foo a)
279 (lambda (b) (list foo a b)))
280 @end example
281
282 In the lambda expression, "foo" is a top-level variable, "a" is an
283 external variable, and "b" is a local variable.
284
285 When a VM executes foo, it allocates a block for "a". Since "a" may be
286 externally referred from the closure, the VM creates a fragment with a
287 copy of "a" in it. When the VM evaluates the lambda expression, it
288 creates a subprogram (closure), associating the fragment with the
289 subprogram as its external environment. When the closure is executed,
290 its environment will look like this:
291
292 @example
293 block Top-level: foo
294 +-------------+
295 |local var: b | fragment
296 +-------------+ .-----------,
297 |external link|---->|variable: a|
298 +-------------+ `-----------'
299 @end example
300
301 The fragment remains as long as the closure exists.
302
303 @section Addressing mode
304
305 Guile VM has five addressing modes:
306
307 @itemize
308 @item Real address
309 @item Local position
310 @item External position
311 @item Top-level location
312 @item Constant object
313 @end itemize
314
315 Real address points to the address in the real program and is only used
316 with the program counter (pc).
317
318 Local position and external position are represented as a pair of depth
319 and offset from bp, as described above. These are base relative
320 addresses, and the real address may vary during execution.
321
322 Top-level location is represented as a Guile's vcell. This location is
323 determined at loading time, so the use of this address is efficient.
324
325 Constant object is not an address but gives an instruction an Scheme
326 object directly.
327
328 [ We'll also need dynamic scope addressing to support Emacs Lisp? ]
329
330
331 Overall procedure:
332
333 @enumerate
334 @item A source program is compiled into a bytecode.
335 @item A bytecode is given an environment and becomes a program.
336 @item A VM starts execution, creating a frame for it.
337 @item Whenever a program calls a subprogram, a new frame is created for it.
338 @item When a program finishes execution, it returns a value, and the VM
339 continues execution of the parent program.
340 @item When all programs terminated, the VM returns the final value and stops.
341 @end enumerate
342
343 \f
344 @node Instruction Set, The Compiler, Variable Management, Top
345 @chapter Instruction Set
346
347 The Guile VM instruction set is roughly divided two groups: system
348 instructions and functional instructions. System instructions control
349 the execution of programs, while functional instructions provide many
350 useful calculations.
351
352 @menu
353 * Environment Control Instructions::
354 * Branch Instructions::
355 * Subprogram Control Instructions::
356 * Data Control Instructions::
357 @end menu
358
359 @node Environment Control Instructions, Branch Instructions, Instruction Set, Instruction Set
360 @section Environment Control Instructions
361
362 @deffn @insn{} link binding-name
363 Look up @var{binding-name} (a string) in the current environment and
364 push the corresponding variable object onto the stack. If
365 @var{binding-name} is not bound yet, then create a new binding and
366 push its variable object.
367 @end deffn
368
369 @deffn @insn{} variable-ref
370 Dereference the variable object which is on top of the stack and
371 replace it by the value of the variable it represents.
372 @end deffn
373
374 @deffn @insn{} variable-set
375 Set the value of the variable on top of the stack (at @code{sp[0]}) to
376 the object located immediately before (at @code{sp[-1]}).
377 @end deffn
378
379 As an example, let us look at what a simple function call looks like:
380
381 @example
382 (+ 2 3)
383 @end example
384
385 This call yields the following sequence of instructions:
386
387 @example
388 (link "+") ;; lookup binding "+"
389 (variable-ref) ;; dereference it
390 (make-int8 2) ;; push immediate value `2'
391 (make-int8 3) ;; push immediate value `3'
392 (tail-call 2) ;; call the proc at sp[-3] with two args
393 @end example
394
395 @deffn @insn{} local-ref offset
396 Push onto the stack the value of the local variable located at
397 @var{offset} within the current stack frame.
398 @end deffn
399
400 @deffn @insn{} local-set offset
401 Pop the Scheme object located on top of the stack and make it the new
402 value of the local variable located at @var{offset} within the current
403 stack frame.
404 @end deffn
405
406 @deffn @insn{} external-ref offset
407 Push the value of the closure variable located at position
408 @var{offset} within the program's list of external variables.
409 @end deffn
410
411 @deffn @insn{} external-set offset
412 Pop the Scheme object located on top of the stack and make it the new
413 value of the closure variable located at @var{offset} within the
414 program's list of external variables.
415 @end deffn
416
417 @deffn @insn{} make-closure
418 Pop the program object from the stack and assign it the current
419 closure variable list as its closure. Push the result program
420 object.
421 @end deffn
422
423 Let's illustrate this:
424
425 @example
426 (let ((x 2))
427 (lambda ()
428 (let ((x++ (+ 1 x)))
429 (set! x x++)
430 x++)))
431 @end example
432
433 The resulting program has one external (closure) variable, i.e. its
434 @var{nexts} is set to 1 (@pxref{Subprogram Control Instructions}).
435 This yields the following code:
436
437 @example
438 ;; the traditional program prologue with NLOCS = 0 and NEXTS = 1
439
440 0 (make-int8 2)
441 2 (external-set 0)
442 4 (make-int8 4)
443 6 (link "+") ;; lookup `+'
444 9 (vector 1) ;; create the external variable vector for
445 ;; later use by `object-ref' and `object-set'
446 ...
447 40 (load-program ##34#)
448 59 (make-closure) ;; assign the current closure to the program
449 ;; just pushed by `load-program'
450 60 (return)
451 @end example
452
453 The program loaded here by @var{load-program} contains the following
454 sequence of instructions:
455
456 @example
457 0 (object-ref 0) ;; push the variable for `+'
458 2 (variable-ref) ;; dereference `+'
459 3 (make-int8:1) ;; push 1
460 4 (external-ref 0) ;; push the value of `x'
461 6 (call 2) ;; call `+' and push the result
462 8 (local-set 0) ;; make it the new value of `x++'
463 10 (local-ref 0) ;; push the value of `x++'
464 12 (external-set 0) ;; make it the new value of `x'
465 14 (local-ref 0) ;; push the value of `x++'
466 16 (return) ;; return it
467 @end example
468
469 At this point, you should know pretty much everything about the three
470 types of variables a program may need to access.
471
472
473 @node Branch Instructions, Subprogram Control Instructions, Environment Control Instructions, Instruction Set
474 @section Branch Instructions
475
476 All the conditional branch instructions described below work in the
477 same way:
478
479 @itemize
480 @item They take the Scheme object located on the stack and use it as
481 the branch condition;
482 @item If the condition if false, then program execution continues with
483 the next instruction;
484 @item If the condition is true, then the instruction pointer is
485 increased by the offset passed as an argument to the branch
486 instruction;
487 @item Finally, when the instruction finished, the condition object is
488 removed from the stack.
489 @end itemize
490
491 Note that the offset passed to the instruction is encoded on two 8-bit
492 integers which are then combined by the VM as one 16-bit integer.
493
494 @deffn @insn{} br offset
495 Jump to @var{offset}.
496 @end deffn
497
498 @deffn @insn{} br-if offset
499 Jump to @var{offset} if the condition on the stack is not false.
500 @end deffn
501
502 @deffn @insn{} br-if-not offset
503 Jump to @var{offset} if the condition on the stack is false.
504 @end deffn
505
506 @deffn @insn{} br-if-eq offset
507 Jump to @var{offset} if the two objects located on the stack are
508 equal in the sense of @var{eq?}. Note that, for this instruction, the
509 stack pointer is decremented by two Scheme objects instead of only
510 one.
511 @end deffn
512
513 @deffn @insn{} br-if-not-eq offset
514 Same as @var{br-if-eq} for non-equal objects.
515 @end deffn
516
517 @deffn @insn{} br-if-null offset
518 Jump to @var{offset} if the object on the stack is @code{'()}.
519 @end deffn
520
521 @deffn @insn{} br-if-not-null offset
522 Jump to @var{offset} if the object on the stack is not @code{'()}.
523 @end deffn
524
525
526 @node Subprogram Control Instructions, Data Control Instructions, Branch Instructions, Instruction Set
527 @section Subprogram Control Instructions
528
529 Programs (read: ``compiled procedure'') may refer to external
530 bindings, like variables or functions defined outside the program
531 itself, in the environment in which it will evaluate at run-time. In
532 a sense, a program's environment and its bindings are an implicit
533 parameter of every program.
534
535 @cindex object table
536 In order to handle such bindings, each program has an @dfn{object
537 table} associated to it. This table (actually a Scheme vector)
538 contains all constant objects referenced by the program. The object
539 table of a program is initialized right before a program is loaded
540 with @var{load-program}.
541
542 Variable objects are one such type of constant object: when a global
543 binding is defined, a variable object is associated to it and that
544 object will remain constant over time, even if the value bound to it
545 changes. Therefore, external bindings only need to be looked up once
546 when the program is loaded. References to the corresponding external
547 variables from within the program are then performed via the
548 @var{object-ref} instruction and are almost as fast as local variable
549 references.
550
551 Let us consider the following program (procedure) which references
552 external bindings @code{frob} and @var{%magic}:
553
554 @example
555 (lambda (x)
556 (frob x %magic))
557 @end example
558
559 This yields the following assembly code:
560
561 @example
562 (make-int8 64) ;; number of args, vars, etc. (see below)
563 (link "frob")
564 (link "%magic")
565 (vector 2) ;; object table (external bindings)
566 ...
567 (load-program #u8(20 0 23 21 0 20 1 23 36 2))
568 (return)
569 @end example
570
571 All the instructions occurring before @var{load-program} (some were
572 omitted for simplicity) form a @dfn{prologue} which, among other
573 things, pushed an object table (a vector) that contains the variable
574 objects for the variables bound to @var{frob} and @var{%magic}. This
575 vector and other data pushed onto the stack are then popped by the
576 @var{load-program} instruction.
577
578 Besides, the @var{load-program} instruction takes one explicit
579 argument which is the bytecode of the program itself. Disassembled,
580 this bytecode looks like:
581
582 @example
583 (object-ref 0) ;; push the variable object of `frob'
584 (variable-ref) ;; dereference it
585 (local-ref 0) ;; push the value of `x'
586 (object-ref 1) ;; push the variable object of `%magic'
587 (variable-ref) ;; dereference it
588 (tail-call 2) ;; call `frob' with two parameters
589 @end example
590
591 This clearly shows that there is little difference between references
592 to local variables and references to externally bound variables since
593 lookup of externally bound variables if performed only once before the
594 program is run.
595
596 @deffn @insn{} load-program bytecode
597 Load the program whose bytecode is @var{bytecode} (a u8vector), pop
598 its meta-information from the stack, and push a corresponding program
599 object onto the stack. The program's meta-information may consist of
600 (in the order in which it should be pushed onto the stack):
601
602 @itemize
603 @item optionally, a pair representing meta-data (see the
604 @var{program-meta} procedure); [FIXME: explain their meaning]
605 @item optionally, a vector which is the program's object table (a
606 program that does not reference external bindings does not need an
607 object table);
608 @item either one immediate integer or four immediate integers
609 representing respectively the number of arguments taken by the
610 function (@var{nargs}), the number of @dfn{rest arguments}
611 (@var{nrest}, 0 or 1), the number of local variables (@var{nlocs}) and
612 the number of external variables (@var{nexts}) (@pxref{Environment
613 Control Instructions}).
614 @end itemize
615
616 @end deffn
617
618 @deffn @insn{} object-ref offset
619 Push the variable object for the external variable located at
620 @var{offset} within the program's object table.
621 @end deffn
622
623 @deffn @insn{} return
624 Free the program's frame.
625 @end deffn
626
627 @deffn @insn{} call nargs
628 Call the procedure, continuation or program located at
629 @code{sp[-nargs]} with the @var{nargs} arguments located from
630 @code{sp[0]} to @code{sp[-nargs + 1]}. The
631 procedure/continuation/program and its arguments are dropped from the
632 stack and the result is pushed. When calling a program, the
633 @code{call} instruction reserves room for its local variables on the
634 stack, and initializes its list of closure variables and its vector of
635 externally bound variables.
636 @end deffn
637
638 @deffn @insn{} tail-call nargs
639 Same as @code{call} except that, for tail-recursive calls to a
640 program, the current stack frame is re-used, as required by RnRS.
641 This instruction is otherwise similar to @code{call}.
642 @end deffn
643
644
645 @node Data Control Instructions, , Subprogram Control Instructions, Instruction Set
646 @section Data Control Instructions
647
648 @deffn @insn{} make-int8 value
649 Push @var{value}, an 8-bit integer, onto the stack.
650 @end deffn
651
652 @deffn @insn{} make-int8:0
653 Push the immediate value @code{0} onto the stack.
654 @end deffn
655
656 @deffn @insn{} make-int8:1
657 Push the immediate value @code{1} onto the stack.
658 @end deffn
659
660 @deffn @insn{} make-false
661 Push @code{#f} onto the stack.
662 @end deffn
663
664 @deffn @insn{} make-true
665 Push @code{#t} onto the stack.
666 @end deffn
667
668 @itemize
669 @item %push
670 @item %pushi
671 @item %pushl, %pushl:0:0, %pushl:0:1, %pushl:0:2, %pushl:0:3
672 @item %pushe, %pushe:0:0, %pushe:0:1, %pushe:0:2, %pushe:0:3
673 @item %pusht
674 @end itemize
675
676 @itemize
677 @item %loadi
678 @item %loadl, %loadl:0:0, %loadl:0:1, %loadl:0:2, %loadl:0:3
679 @item %loade, %loade:0:0, %loade:0:1, %loade:0:2, %loade:0:3
680 @item %loadt
681 @end itemize
682
683 @itemize
684 @item %savei
685 @item %savel, %savel:0:0, %savel:0:1, %savel:0:2, %savel:0:3
686 @item %savee, %savee:0:0, %savee:0:1, %savee:0:2, %savee:0:3
687 @item %savet
688 @end itemize
689
690 @section Flow control instructions
691
692 @itemize
693 @item %br-if
694 @item %br-if-not
695 @item %jump
696 @end itemize
697
698 @section Function call instructions
699
700 @itemize
701 @item %func, %func0, %func1, %func2
702 @end itemize
703
704 @section Scheme built-in functions
705
706 @itemize
707 @item cons
708 @item car
709 @item cdr
710 @end itemize
711
712 @section Mathematical buitin functions
713
714 @itemize
715 @item 1+
716 @item 1-
717 @item add, add2
718 @item sub, sub2, minus
719 @item mul2
720 @item div2
721 @item lt2
722 @item gt2
723 @item le2
724 @item ge2
725 @item num-eq2
726 @end itemize
727
728
729 \f
730 @node The Compiler, Concept Index, Instruction Set, Top
731 @chapter The Compiler
732
733 This section describes Guile-VM's compiler and the compilation process
734 to produce bytecode executable by the VM itself (@pxref{Instruction
735 Set}).
736
737 @menu
738 * Overview::
739 * The Language Front-Ends::
740 * GHIL::
741 * Compiling Scheme Code::
742 * GLIL::
743 * The Assembler::
744 @end menu
745
746 @node Overview, The Language Front-Ends, The Compiler, The Compiler
747 @section Overview
748
749 Compilation in Guile-VM is a three-stage process:
750
751 @cindex intermediate language
752 @cindex assembler
753 @cindex compiler
754 @cindex GHIL
755 @cindex GLIL
756 @cindex bytecode
757
758 @enumerate
759 @item the source programming language (e.g. R5RS Scheme) is read and
760 translated into GHIL, @dfn{Guile's High-Level Intermediate Language};
761 @item GHIL code is then translated into a lower-level intermediate
762 language call GLIL, @dfn{Guile's Low-Level Intermediate Language};
763 @item finally, GLIL is @dfn{assembled} into the VM's assembly language
764 (@pxref{Instruction Set}) and bytecode.
765 @end enumerate
766
767 The use of two separate intermediate languages eases the
768 implementation of front-ends since the gap between high-level
769 languages like Scheme and GHIL is relatively small.
770
771 @vindex guilec
772 From an end-user viewpoint, compiling a Guile program into bytecode
773 can be done either by using the @command{guilec} command-line tool, or
774 by using the @code{compile-file} procedure exported by the
775 @code{(system base compile)} module.
776
777 @deffn @scmproc{} compile-file file . opts
778 Compile Scheme source code from file @var{file} using compilation
779 options @var{opts}. The resulting file, a Guile object file, will be
780 name according the application of the @code{compiled-file-name}
781 procedure to @var{file}. The possible values for @var{opts} are the
782 same as for the @code{compile-in} procedure (see below, @pxref{The Language
783 Front-Ends}).
784 @end deffn
785
786 @deffn @scmproc{} compiled-file-name file
787 Given source file name @var{file} (a string), return a string that
788 denotes the name of the Guile object file corresponding to
789 @var{file}. By default, the file name returned is @var{file} minus
790 its extension and plus the @code{.go} file extension.
791 @end deffn
792
793 @cindex self-hosting
794 It is worth noting, as you might have already guessed, that Guile-VM's
795 compiler is written in Guile Scheme and is @dfn{self-hosted}: it can
796 compile itself.
797
798 @node The Language Front-Ends, GHIL, Overview, The Compiler
799 @section The Language Front-Ends
800
801 Guile-VM comes with a number of @dfn{language front-ends}, that is,
802 code that can read a given high-level programming language like R5RS
803 Scheme, and translate it into a lower-level representation suitable to
804 the compiler.
805
806 Each language front-end provides a @dfn{specification} and a
807 @dfn{translator} to GHIL. Both of them come in the @code{language}
808 module hierarchy. As an example, the front-end for Scheme is located
809 in the @code{(language scheme spec)} and @code{(language scheme
810 translate)} modules. Language front-ends can then be retrieved using
811 the @code{lookup-language} procedure of the @code{(system base
812 language)} module.
813
814 @deftp @scmrec{} <language> name title version reader printer read-file expander translator evaluator environment
815 Denotes a language front-end specification a various methods used by
816 the compiler to handle source written in that language. Of particular
817 interest is the @code{translator} slot (@pxref{GHIL}).
818 @end deftp
819
820 @deffn @scmproc{} lookup-language lang
821 Look for a language front-end named @var{lang}, a symbol (e.g,
822 @code{scheme}), and return the @code{<language>} record describing it
823 if found. If @var{lang} does not denote a language front-end, an
824 error is raised. Note that this procedure assumes that language
825 @var{lang} exists if there exist a @code{(language @var{lang} spec)}
826 module.
827 @end deffn
828
829 The @code{(system base compile)} module defines a procedure similar to
830 @code{compile-file} but that is not limited to the Scheme language:
831
832 @deffn @scmproc{} compile-in expr env lang . opts
833 Compile expression @var{expr}, which is written in language @var{lang}
834 (a @code{<language>} object), using compilation options @var{opts},
835 and return bytecode as produced by the assembler (@pxref{The
836 Assembler}).
837
838 Options @var{opts} may contain the following keywords:
839
840 @table @code
841 @item :e
842 compilation will stop after the code expansion phase.
843 @item :t
844 compilation will stop after the code translation phase, i.e. after
845 code in the source language @var{lang} has been translated into GHIL
846 (@pxref{GHIL}).
847 @item :c
848 compilation will stop after the compilation phase and before the
849 assembly phase, i.e. once GHIL has been translated into GLIL
850 (@pxref{GLIL}).
851 @end table
852
853 Additionally, @var{opts} may contain any option understood by the
854 GHIL-to-GLIL compiler described in @xref{GLIL}.
855 @end deffn
856
857
858 @node GHIL, Compiling Scheme Code, The Language Front-Ends, The Compiler
859 @section Guile's High-Level Intermediate Language
860
861 GHIL has constructs almost equivalent to those found in Scheme.
862 However, unlike Scheme, it is meant to be read only by the compiler
863 itself. Therefore, a sequence of GHIL code is only a sequence of GHIL
864 @emph{objects} (records), as opposed to symbols, each of which
865 represents a particular language feature. These records are all
866 defined in the @code{(system il ghil)} module and are named
867 @code{<ghil-*>}.
868
869 Each GHIL record has at least two fields: one containing the
870 environment (Guile module) in which it is considered, and one
871 containing its location [FIXME: currently seems to be unused]. Below
872 is a list of the main GHIL object types and their fields:
873
874 @example
875 ;; Objects
876 (<ghil-void> env loc)
877 (<ghil-quote> env loc obj)
878 (<ghil-quasiquote> env loc exp)
879 (<ghil-unquote> env loc exp)
880 (<ghil-unquote-splicing> env loc exp)
881 ;; Variables
882 (<ghil-ref> env loc var)
883 (<ghil-set> env loc var val)
884 (<ghil-define> env loc var val)
885 ;; Controls
886 (<ghil-if> env loc test then else)
887 (<ghil-and> env loc exps)
888 (<ghil-or> env loc exps)
889 (<ghil-begin> env loc exps)
890 (<ghil-bind> env loc vars vals body)
891 (<ghil-lambda> env loc vars rest body)
892 (<ghil-call> env loc proc args)
893 (<ghil-inline> env loc inline args)
894 @end example
895
896 As can be seen from this examples, the constructs in GHIL are pretty
897 close to the fundamental primitives of Scheme.
898
899 It is the role of front-end language translators (@pxref{The Language
900 Front-Ends}) to produce a sequence of GHIL objects from the
901 human-readable, source programming language. The next section
902 describes the translator for the Scheme language.
903
904 @node Compiling Scheme Code, GLIL, GHIL, The Compiler
905 @section Compiling Scheme Code
906
907 The language object for Scheme, as returned by @code{(lookup-language
908 'scheme)} (@pxref{The Language Front-Ends}), defines a translator
909 procedure that returns a sequence of GHIL objects given Scheme code.
910 Before actually performing this operation, the Scheme translator
911 expands macros in the original source code.
912
913 The macros that may be expanded can come from different sources:
914
915 @itemize
916 @item core Guile macros, such as @code{false-if-exception};
917 @item macros defined in modules used by the module being compiled,
918 e.g., @code{receive} in @code{(ice-9 receive)};
919 @item macros defined within the module being compiled.
920 @end itemize
921
922 @cindex macro
923 @cindex syntax transformer
924 @findex define-macro
925 @findex defmacro
926 The main complexity in handling macros at compilation time is that
927 Guile's macros are first-class objects. For instance, when using
928 @code{define-macro}, one actually defines a @emph{procedure} that
929 returns code; of course, unlike a ``regular'' procedure, it is
930 executed when an S-exp is @dfn{memoized} by the evaluator, i.e.,
931 before the actual evaluation takes place. Worse, it is possible to
932 turn a procedure into a macro, or @dfn{syntax transformer}, thus
933 removing, to some extent, the boundary between the macro expansion and
934 evaluation phases, @inforef{Internal Macros, , guile}.
935
936 [FIXME: explain limitations, etc.]
937
938
939 @node GLIL, The Assembler, Compiling Scheme Code, The Compiler
940 @section Guile's Low-Level Intermediate Language
941
942 A GHIL instruction sequence can be compiled into GLIL using the
943 @code{compile} procedure exported by the @code{(system il compile)}
944 module. During this translation process, various optimizations may
945 also be performed.
946
947 The module @code{(system il glil)} defines record types representing
948 various low-level abstractions. Compared to GHIL, the flow control
949 primitives in GLIL are much more low-level: only @code{<glil-label>},
950 @code{<glil-branch>} and @code{<glil-call>} are available, no
951 @code{lambda}, @code{if}, etc.
952
953
954 @deffn @scmproc{} compile ghil environment . opts
955 Compile @var{ghil}, a GHIL instruction sequence, within
956 environment/module @var{environment}, and return the resulting GLIL
957 instruction sequence. The option list @var{opts} may be either the
958 empty list or a list containing the @code{:O} keyword in which case
959 @code{compile} will first go through an optimization stage of
960 @var{ghil}.
961
962 Note that the @code{:O} option may be passed at a higher-level to the
963 @code{compile-file} and @code{compile-in} procedures (@pxref{The
964 Language Front-Ends}).
965 @end deffn
966
967 @deffn @scmproc{} pprint-glil glil . port
968 Print @var{glil}, a GLIL sequence instructions, in a human-readable
969 form. If @var{port} is passed, it will be used as the output port.
970 @end deffn
971
972
973 Let's consider the following Scheme expression:
974
975 @example
976 (lambda (x) (+ x 1))
977 @end example
978
979 The corresponding (unoptimized) GLIL code, as shown by
980 @code{pprint-glil}, looks like this:
981
982 @example
983 (@@asm (0 0 0 0)
984 (@@asm (1 0 0 0) ;; expect one arg.
985 (@@bind (x argument 0)) ;; debugging info
986 (module-ref #f +) ;; lookup `+'
987 (argument-ref 0) ;; push the argument onto
988 ;; the stack
989 (const 1) ;; push `1'
990 (tail-call 2) ;; call `+', with 2 args,
991 ;; using the same stack frame
992 (@@source 15 33)) ;; additional debugging info
993 (return 0))
994 @end example
995
996 This is not unlike the VM's assembly language described in
997 @ref{Instruction Set}.
998
999 @node The Assembler, , GLIL, The Compiler
1000 @section The Assembler
1001
1002 @findex code->bytes
1003
1004 The final compilation step consists in converting the GLIL instruction
1005 sequence into VM bytecode. This is what the @code{assemble} procedure
1006 defined in the @code{(system vm assemble)} module is for. It relies
1007 on the @code{code->bytes} procedure of the @code{(system vm conv)}
1008 module to convert instructions (represented as lists whose @code{car}
1009 is a symbol naming the instruction, e.g. @code{object-ref},
1010 @pxref{Instruction Set}) into binary code, or @dfn{bytecode}.
1011 Bytecode itself is represented using SRFI-4 byte vectors,
1012 @inforef{SRFI-4, SRFI-4 homogeneous numeric vectors, guile}.
1013
1014
1015 @deffn @scmproc{} assemble glil environment . opts
1016 Return a binary representation of @var{glil} (bytecode), either in the
1017 form of an SRFI-4 @code{u8vector} or a @code{<bytespec>} object.
1018 [FIXME: Why is that?]
1019 @end deffn
1020
1021
1022
1023 @c *********************************************************************
1024 @node Concept Index, Function and Instruction Index, The Compiler, Top
1025 @unnumbered Concept Index
1026 @printindex cp
1027
1028 @node Function and Instruction Index, Command and Variable Index, Concept Index, Top
1029 @unnumbered Function and Instruction Index
1030 @printindex fn
1031
1032 @node Command and Variable Index, , Function and Instruction Index, Top
1033 @unnumbered Command and Variable Index
1034 @printindex vr
1035
1036 @bye
1037
1038 @c Local Variables:
1039 @c ispell-local-dictionary: "american";
1040 @c End:
1041
1042 @c LocalWords: bytecode