Change Guile license to LGPLv3+
[bpt/guile.git] / doc / guile-vm.texi
CommitLineData
015959cb
KN
1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename guile-vm.info
4@settitle Guile VM Specification
5@footnotestyle end
6@setchapternewpage odd
7@c %**end of header
8
fa19602c
LC
9@set EDITION 0.6
10@set VERSION 0.6
11@set UPDATED 2005-04-26
015959cb 12
238e7a11
LC
13@c Macro for instruction definitions.
14@macro insn{}
15Instruction
16@end macro
17
a52b2d3d
LC
18@c For Scheme procedure definitions.
19@macro scmproc{}
20Scheme Procedure
21@end macro
22
23@c Scheme records.
24@macro scmrec{}
25Record
26@end macro
27
015959cb
KN
28@ifinfo
29@dircategory Scheme Programming
30@direntry
238e7a11 31* Guile VM: (guile-vm). Guile's Virtual Machine.
015959cb
KN
32@end direntry
33
34This file documents Guile VM.
35
36Copyright @copyright{} 2000 Keisuke Nishida
238e7a11 37Copyright @copyright{} 2005 Ludovic Court`es
015959cb
KN
38
39Permission is granted to make and distribute verbatim copies of this
40manual provided the copyright notice and this permission notice are
41preserved on all copies.
42
43@ignore
44Permission is granted to process this file through TeX and print the
45results, provided the printed document carries a copying permission
46notice identical to this one except for the removal of this paragraph
47(this paragraph not being relevant to the printed manual).
48
49@end ignore
50Permission is granted to copy and distribute modified versions of this
51manual under the conditions for verbatim copying, provided that the
52entire resulting derived work is distributed under the terms of a
53permission notice identical to this one.
54
55Permission is granted to copy and distribute translations of this manual
56into another language, under the above conditions for modified versions,
57except that this permission notice may be stated in a translation
58approved by the Free Software Foundation.
59@end ifinfo
60
61@titlepage
62@title Guile VM Specification
63@subtitle for Guile VM @value{VERSION}
64@author Keisuke Nishida
65
66@page
67@vskip 0pt plus 1filll
68Edition @value{EDITION} @*
69Updated for Guile VM @value{VERSION} @*
70@value{UPDATED} @*
71
72Copyright @copyright{} 2000 Keisuke Nishida
238e7a11 73Copyright @copyright{} 2005 Ludovic Court`es
015959cb
KN
74
75Permission is granted to make and distribute verbatim copies of this
76manual provided the copyright notice and this permission notice are
77preserved on all copies.
78
79Permission is granted to copy and distribute modified versions of this
80manual under the conditions for verbatim copying, provided that the
81entire resulting derived work is distributed under the terms of a
82permission notice identical to this one.
83
84Permission is granted to copy and distribute translations of this manual
85into another language, under the above conditions for modified versions,
86except that this permission notice may be stated in a translation
87approved by the Free Software Foundation.
88@end titlepage
89
90@contents
91
92@c *********************************************************************
93@node Top, Introduction, (dir), (dir)
94@top Guile VM Specification
95
238e7a11
LC
96This document would like to correspond to Guile VM @value{VERSION}.
97However, be warned that important parts still correspond to version
980.0 and are not valid anymore.
015959cb
KN
99
100@menu
fa19602c
LC
101* Introduction::
102* Variable Management::
fa19602c 103* Instruction Set::
a52b2d3d 104* The Compiler::
b6368dbb
LC
105* Concept Index::
106* Function and Instruction Index::
107* Command and Variable Index::
f41cb00c
LC
108
109@detailmenu
110 --- The Detailed Node Listing ---
111
112Instruction Set
113
114* Environment Control Instructions::
115* Branch Instructions::
116* Subprogram Control Instructions::
117* Data Control Instructions::
118
a52b2d3d
LC
119The Compiler
120
121* Overview::
122* The Language Front-Ends::
123* GHIL::
9dbbe4bb 124* Compiling Scheme Code::
a52b2d3d
LC
125* GLIL::
126* The Assembler::
127
f41cb00c 128@end detailmenu
015959cb
KN
129@end menu
130
131@c *********************************************************************
fa19602c 132@node Introduction, Variable Management, Top, Top
015959cb 133@chapter What is Guile VM?
a98cef7e
KN
134
135A Guile VM has a set of registers and its own stack memory. Guile may
136have more than one VM's. Each VM may execute at most one program at a
137time. Guile VM is a CISC system so designed as to execute Scheme and
138other languages efficiently.
139
fa19602c 140@unnumberedsubsec Registers
a98cef7e 141
fa19602c
LC
142@itemize
143@item pc - Program counter ;; ip (instruction poiner) is better?
144@item sp - Stack pointer
145@item bp - Base pointer
146@item ac - Accumulator
147@end itemize
a98cef7e 148
fa19602c 149@unnumberedsubsec Engine
a98cef7e
KN
150
151A VM may have one of three engines: reckless, regular, or debugging.
152Reckless engine is fastest but dangerous. Regular engine is normally
153fail-safe and reasonably fast. Debugging engine is safest and
154functional but very slow.
155
fa19602c 156@unnumberedsubsec Memory
a98cef7e
KN
157
158Stack is the only memory that each VM owns. The other memory is shared
159memory that is shared among every VM and other part of Guile.
160
fa19602c 161@unnumberedsubsec Program
a98cef7e
KN
162
163A VM program consists of a bytecode that is executed and an environment
164in which execution is done. Each program is allocated in the shared
165memory and may be executed by any VM. A program may call other programs
166within a VM.
167
fa19602c 168@unnumberedsubsec Instruction
a98cef7e
KN
169
170Guile VM has dozens of system instructions and (possibly) hundreds of
171functional instructions. Some Scheme procedures such as cons and car
172are implemented as VM's builtin functions, which are very efficient.
173Other procedures defined outside of the VM are also considered as VM's
174functional features, since they do not change the state of VM.
175Procedures defined within the VM are called subprograms.
176
177Most instructions deal with the accumulator (ac). The VM stores all
178results from functions in ac, instead of pushing them into the stack.
179I'm not sure whether this is a good thing or not.
180
a52b2d3d 181@node Variable Management, Instruction Set, Introduction, Top
015959cb 182@chapter Variable Management
a98cef7e 183
a52b2d3d
LC
184FIXME: This chapter needs to be reviewed so that it matches reality.
185A more up-to-date description of the mechanisms described in this
186section is given in @ref{Instruction Set}.
187
a98cef7e
KN
188A program may have access to local variables, external variables, and
189top-level variables.
190
fa19602c 191@section Local/external variables
a98cef7e
KN
192
193A stack is logically divided into several blocks during execution. A
194"block" is such a unit that maintains local variables and dynamic chain.
195A "frame" is an upper level unit that maintains subprogram calls.
196
fa19602c 197@example
a98cef7e
KN
198 Stack
199 dynamic | | | |
200 chain +==========+ - =
201 | |local vars| | |
202 `-|block data| | block |
203 /|frame data| | |
204 | +----------+ - |
205 | |local vars| | | frame
206 `-|block data| | |
207 /+----------+ - |
208 | |local vars| | |
209 `-|block data| | |
210 /+==========+ - =
211 | |local vars| | |
212 `-|block data| | |
213 /|frame data| | |
214 | +----------+ - |
215 | | | | |
fa19602c 216@end example
a98cef7e
KN
217
218The first block of each frame may look like this:
219
fa19602c 220@example
a98cef7e
KN
221 Address Data
222 ------- ----
223 xxx0028 Local variable 2
224 xxx0024 Local variable 1
225 bp ->xxx0020 Local variable 0
226 xxx001c Local link (block data)
227 xxx0018 External link (block data)
228 xxx0014 Stack pointer (block data)
229 xxx0010 Return address (frame data)
230 xxx000c Parent program (frame data)
fa19602c 231@end example
a98cef7e
KN
232
233The base pointer (bp) always points to the lowest address of local
234variables of the recent block. Local variables are referred as "bp[n]".
235The local link field has a pointer to the dynamic parent of the block.
236The parent's variables are referred as "bp[-1][n]", and grandparent's
237are "bp[-1][-1][n]". Thus, any local variable is represented by its
238depth and offset from the current bp.
239
240A variable may be "external", which is allocated in the shared memory.
241The external link field of a block has a pointer to such a variable set,
242which I call "fragment" (what should I call?). A fragment has a set of
243variables and its own chain.
244
fa19602c 245@example
a98cef7e
KN
246 local external
247 chain| | chain
248 | +-----+ .--------, |
015959cb 249 `-|block|--+->|external|-'
a98cef7e
KN
250 /+-----+ | `--------'\,
251 `-|block|--' |
252 /+-----+ .--------, |
015959cb 253 `-|block|---->|external|-'
a98cef7e
KN
254 +-----+ `--------'
255 | |
fa19602c 256@end example
a98cef7e
KN
257
258An external variable is referred as "bp[-2]->variables[n]" or
259"bp[-2]->link->...->variables[n]". This is also represented by a pair
260of depth and offset. At any point of execution, the value of bp
261determines the current local link and external link, and thus the
262current environment of a program.
263
264Other data fields are described later.
265
fa19602c 266@section Top-level variables
a98cef7e
KN
267
268Guile VM uses the same top-level variables as the regular Guile. A
269program may have direct access to vcells. Currently this is done by
270calling scm_intern0, but a program is possible to have any top-level
271environment defined by the current module.
272
fa19602c 273@section Scheme and VM variable
a98cef7e
KN
274
275Let's think about the following Scheme code as an example:
276
fa19602c 277@example
a98cef7e
KN
278 (define (foo a)
279 (lambda (b) (list foo a b)))
fa19602c 280@end example
a98cef7e
KN
281
282In the lambda expression, "foo" is a top-level variable, "a" is an
283external variable, and "b" is a local variable.
284
285When a VM executes foo, it allocates a block for "a". Since "a" may be
286externally referred from the closure, the VM creates a fragment with a
287copy of "a" in it. When the VM evaluates the lambda expression, it
288creates a subprogram (closure), associating the fragment with the
289subprogram as its external environment. When the closure is executed,
290its environment will look like this:
291
fa19602c 292@example
a98cef7e
KN
293 block Top-level: foo
294 +-------------+
295 |local var: b | fragment
296 +-------------+ .-----------,
297 |external link|---->|variable: a|
298 +-------------+ `-----------'
fa19602c 299@end example
a98cef7e
KN
300
301The fragment remains as long as the closure exists.
302
fa19602c 303@section Addressing mode
a98cef7e
KN
304
305Guile VM has five addressing modes:
306
fa19602c
LC
307@itemize
308@item Real address
309@item Local position
310@item External position
311@item Top-level location
312@item Constant object
313@end itemize
a98cef7e
KN
314
315Real address points to the address in the real program and is only used
316with the program counter (pc).
317
318Local position and external position are represented as a pair of depth
319and offset from bp, as described above. These are base relative
320addresses, and the real address may vary during execution.
321
322Top-level location is represented as a Guile's vcell. This location is
323determined at loading time, so the use of this address is efficient.
324
015959cb 325Constant object is not an address but gives an instruction an Scheme
a98cef7e
KN
326object directly.
327
328[ We'll also need dynamic scope addressing to support Emacs Lisp? ]
329
a98cef7e 330
fa19602c 331Overall procedure:
a98cef7e 332
fa19602c
LC
333@enumerate
334@item A source program is compiled into a bytecode.
335@item A bytecode is given an environment and becomes a program.
336@item A VM starts execution, creating a frame for it.
337@item Whenever a program calls a subprogram, a new frame is created for it.
338@item When a program finishes execution, it returns a value, and the VM
339 continues execution of the parent program.
340@item When all programs terminated, the VM returns the final value and stops.
341@end enumerate
a98cef7e 342
a52b2d3d
LC
343\f
344@node Instruction Set, The Compiler, Variable Management, Top
fa19602c 345@chapter Instruction Set
a98cef7e
KN
346
347The Guile VM instruction set is roughly divided two groups: system
348instructions and functional instructions. System instructions control
349the execution of programs, while functional instructions provide many
238e7a11
LC
350useful calculations.
351
352@menu
353* Environment Control Instructions::
f41cb00c 354* Branch Instructions::
238e7a11
LC
355* Subprogram Control Instructions::
356* Data Control Instructions::
357@end menu
358
f41cb00c 359@node Environment Control Instructions, Branch Instructions, Instruction Set, Instruction Set
238e7a11
LC
360@section Environment Control Instructions
361
362@deffn @insn{} link binding-name
363Look up @var{binding-name} (a string) in the current environment and
364push the corresponding variable object onto the stack. If
365@var{binding-name} is not bound yet, then create a new binding and
366push its variable object.
367@end deffn
368
369@deffn @insn{} variable-ref
370Dereference the variable object which is on top of the stack and
371replace it by the value of the variable it represents.
372@end deffn
373
374@deffn @insn{} variable-set
375Set the value of the variable on top of the stack (at @code{sp[0]}) to
376the object located immediately before (at @code{sp[-1]}).
377@end deffn
378
379As an example, let us look at what a simple function call looks like:
380
381@example
382(+ 2 3)
383@end example
384
385This call yields the following sequence of instructions:
a98cef7e 386
238e7a11 387@example
62082959 388(link "+") ;; lookup binding "+"
238e7a11
LC
389(variable-ref) ;; dereference it
390(make-int8 2) ;; push immediate value `2'
391(make-int8 3) ;; push immediate value `3'
392(tail-call 2) ;; call the proc at sp[-3] with two args
393@end example
fa19602c 394
62082959
LC
395@deffn @insn{} local-ref offset
396Push onto the stack the value of the local variable located at
397@var{offset} within the current stack frame.
398@end deffn
399
400@deffn @insn{} local-set offset
401Pop the Scheme object located on top of the stack and make it the new
402value of the local variable located at @var{offset} within the current
403stack frame.
404@end deffn
405
406@deffn @insn{} external-ref offset
407Push the value of the closure variable located at position
408@var{offset} within the program's list of external variables.
409@end deffn
410
411@deffn @insn{} external-set offset
412Pop the Scheme object located on top of the stack and make it the new
413value of the closure variable located at @var{offset} within the
414program's list of external variables.
415@end deffn
416
0b5f0e49
LC
417@deffn @insn{} make-closure
418Pop the program object from the stack and assign it the current
419closure variable list as its closure. Push the result program
420object.
421@end deffn
422
423Let's illustrate this:
62082959
LC
424
425@example
426(let ((x 2))
427 (lambda ()
428 (let ((x++ (+ 1 x)))
429 (set! x x++)
430 x++)))
431@end example
432
433The resulting program has one external (closure) variable, i.e. its
434@var{nexts} is set to 1 (@pxref{Subprogram Control Instructions}).
435This yields the following code:
436
437@example
0b5f0e49
LC
438 ;; the traditional program prologue with NLOCS = 0 and NEXTS = 1
439
62082959
LC
440 0 (make-int8 2)
441 2 (external-set 0)
442 4 (make-int8 4)
0b5f0e49
LC
443 6 (link "+") ;; lookup `+'
444 9 (vector 1) ;; create the external variable vector for
445 ;; later use by `object-ref' and `object-set'
62082959
LC
446 ...
447 40 (load-program ##34#)
0b5f0e49
LC
448 59 (make-closure) ;; assign the current closure to the program
449 ;; just pushed by `load-program'
450 60 (return)
62082959
LC
451@end example
452
453The program loaded here by @var{load-program} contains the following
454sequence of instructions:
455
456@example
457 0 (object-ref 0) ;; push the variable for `+'
458 2 (variable-ref) ;; dereference `+'
459 3 (make-int8:1) ;; push 1
460 4 (external-ref 0) ;; push the value of `x'
461 6 (call 2) ;; call `+' and push the result
462 8 (local-set 0) ;; make it the new value of `x++'
463 10 (local-ref 0) ;; push the value of `x++'
464 12 (external-set 0) ;; make it the new value of `x'
465 14 (local-ref 0) ;; push the value of `x++'
466 16 (return) ;; return it
467@end example
468
0b5f0e49
LC
469At this point, you should know pretty much everything about the three
470types of variables a program may need to access.
fa19602c 471
f41cb00c
LC
472
473@node Branch Instructions, Subprogram Control Instructions, Environment Control Instructions, Instruction Set
474@section Branch Instructions
475
476All the conditional branch instructions described below work in the
477same way:
478
479@itemize
480@item They take the Scheme object located on the stack and use it as
481the branch condition;
482@item If the condition if false, then program execution continues with
483the next instruction;
484@item If the condition is true, then the instruction pointer is
485increased by the offset passed as an argument to the branch
486instruction;
487@item Finally, when the instruction finished, the condition object is
488removed from the stack.
489@end itemize
490
491Note that the offset passed to the instruction is encoded on two 8-bit
492integers which are then combined by the VM as one 16-bit integer.
493
494@deffn @insn{} br offset
495Jump to @var{offset}.
496@end deffn
497
498@deffn @insn{} br-if offset
499Jump to @var{offset} if the condition on the stack is not false.
500@end deffn
501
502@deffn @insn{} br-if-not offset
503Jump to @var{offset} if the condition on the stack is false.
504@end deffn
505
506@deffn @insn{} br-if-eq offset
507Jump to @var{offset} if the two objects located on the stack are
508equal in the sense of @var{eq?}. Note that, for this instruction, the
509stack pointer is decremented by two Scheme objects instead of only
510one.
511@end deffn
512
513@deffn @insn{} br-if-not-eq offset
514Same as @var{br-if-eq} for non-equal objects.
515@end deffn
516
517@deffn @insn{} br-if-null offset
518Jump to @var{offset} if the object on the stack is @code{'()}.
519@end deffn
520
521@deffn @insn{} br-if-not-null offset
522Jump to @var{offset} if the object on the stack is not @code{'()}.
523@end deffn
524
525
526@node Subprogram Control Instructions, Data Control Instructions, Branch Instructions, Instruction Set
238e7a11
LC
527@section Subprogram Control Instructions
528
529Programs (read: ``compiled procedure'') may refer to external
530bindings, like variables or functions defined outside the program
531itself, in the environment in which it will evaluate at run-time. In
532a sense, a program's environment and its bindings are an implicit
533parameter of every program.
534
b6368dbb 535@cindex object table
238e7a11 536In order to handle such bindings, each program has an @dfn{object
0b5f0e49
LC
537table} associated to it. This table (actually a Scheme vector)
538contains all constant objects referenced by the program. The object
539table of a program is initialized right before a program is loaded
540with @var{load-program}.
541
542Variable objects are one such type of constant object: when a global
543binding is defined, a variable object is associated to it and that
544object will remain constant over time, even if the value bound to it
545changes. Therefore, external bindings only need to be looked up once
546when the program is loaded. References to the corresponding external
547variables from within the program are then performed via the
548@var{object-ref} instruction and are almost as fast as local variable
549references.
238e7a11
LC
550
551Let us consider the following program (procedure) which references
552external bindings @code{frob} and @var{%magic}:
553
554@example
555(lambda (x)
556 (frob x %magic))
557@end example
558
559This yields the following assembly code:
560
561@example
562(make-int8 64) ;; number of args, vars, etc. (see below)
563(link "frob")
564(link "%magic")
62082959 565(vector 2) ;; object table (external bindings)
238e7a11
LC
566...
567(load-program #u8(20 0 23 21 0 20 1 23 36 2))
568(return)
569@end example
570
571All the instructions occurring before @var{load-program} (some were
572omitted for simplicity) form a @dfn{prologue} which, among other
573things, pushed an object table (a vector) that contains the variable
574objects for the variables bound to @var{frob} and @var{%magic}. This
575vector and other data pushed onto the stack are then popped by the
576@var{load-program} instruction.
577
578Besides, the @var{load-program} instruction takes one explicit
579argument which is the bytecode of the program itself. Disassembled,
580this bytecode looks like:
581
582@example
0b5f0e49 583(object-ref 0) ;; push the variable object of `frob'
238e7a11
LC
584(variable-ref) ;; dereference it
585(local-ref 0) ;; push the value of `x'
586(object-ref 1) ;; push the variable object of `%magic'
587(variable-ref) ;; dereference it
588(tail-call 2) ;; call `frob' with two parameters
589@end example
590
591This clearly shows that there is little difference between references
62082959
LC
592to local variables and references to externally bound variables since
593lookup of externally bound variables if performed only once before the
594program is run.
238e7a11
LC
595
596@deffn @insn{} load-program bytecode
f41cb00c
LC
597Load the program whose bytecode is @var{bytecode} (a u8vector), pop
598its meta-information from the stack, and push a corresponding program
599object onto the stack. The program's meta-information may consist of
600(in the order in which it should be pushed onto the stack):
fa19602c
LC
601
602@itemize
238e7a11
LC
603@item optionally, a pair representing meta-data (see the
604@var{program-meta} procedure); [FIXME: explain their meaning]
605@item optionally, a vector which is the program's object table (a
606program that does not reference external bindings does not need an
607object table);
2d80426a
LC
608@item either one immediate integer or four immediate integers
609representing respectively the number of arguments taken by the
610function (@var{nargs}), the number of @dfn{rest arguments}
611(@var{nrest}, 0 or 1), the number of local variables (@var{nlocs}) and
62082959
LC
612the number of external variables (@var{nexts}) (@pxref{Environment
613Control Instructions}).
fa19602c
LC
614@end itemize
615
238e7a11
LC
616@end deffn
617
618@deffn @insn{} object-ref offset
619Push the variable object for the external variable located at
620@var{offset} within the program's object table.
621@end deffn
622
623@deffn @insn{} return
624Free the program's frame.
625@end deffn
626
f41cb00c
LC
627@deffn @insn{} call nargs
628Call the procedure, continuation or program located at
629@code{sp[-nargs]} with the @var{nargs} arguments located from
630@code{sp[0]} to @code{sp[-nargs + 1]}. The
631procedure/continuation/program and its arguments are dropped from the
62082959
LC
632stack and the result is pushed. When calling a program, the
633@code{call} instruction reserves room for its local variables on the
634stack, and initializes its list of closure variables and its vector of
635externally bound variables.
f41cb00c
LC
636@end deffn
637
638@deffn @insn{} tail-call nargs
639Same as @code{call} except that, for tail-recursive calls to a
640program, the current stack frame is re-used, as required by RnRS.
62082959 641This instruction is otherwise similar to @code{call}.
f41cb00c
LC
642@end deffn
643
238e7a11
LC
644
645@node Data Control Instructions, , Subprogram Control Instructions, Instruction Set
646@section Data Control Instructions
647
648@deffn @insn{} make-int8 value
649Push @var{value}, an 8-bit integer, onto the stack.
650@end deffn
651
652@deffn @insn{} make-int8:0
653Push the immediate value @code{0} onto the stack.
654@end deffn
655
656@deffn @insn{} make-int8:1
657Push the immediate value @code{1} onto the stack.
658@end deffn
659
660@deffn @insn{} make-false
661Push @code{#f} onto the stack.
662@end deffn
663
664@deffn @insn{} make-true
665Push @code{#t} onto the stack.
666@end deffn
fa19602c
LC
667
668@itemize
669@item %push
670@item %pushi
671@item %pushl, %pushl:0:0, %pushl:0:1, %pushl:0:2, %pushl:0:3
672@item %pushe, %pushe:0:0, %pushe:0:1, %pushe:0:2, %pushe:0:3
673@item %pusht
674@end itemize
675
676@itemize
677@item %loadi
678@item %loadl, %loadl:0:0, %loadl:0:1, %loadl:0:2, %loadl:0:3
679@item %loade, %loade:0:0, %loade:0:1, %loade:0:2, %loade:0:3
680@item %loadt
681@end itemize
682
683@itemize
684@item %savei
685@item %savel, %savel:0:0, %savel:0:1, %savel:0:2, %savel:0:3
686@item %savee, %savee:0:0, %savee:0:1, %savee:0:2, %savee:0:3
687@item %savet
688@end itemize
689
690@section Flow control instructions
691
692@itemize
693@item %br-if
694@item %br-if-not
695@item %jump
696@end itemize
697
698@section Function call instructions
699
700@itemize
701@item %func, %func0, %func1, %func2
702@end itemize
703
704@section Scheme built-in functions
705
706@itemize
707@item cons
708@item car
709@item cdr
710@end itemize
711
712@section Mathematical buitin functions
713
714@itemize
715@item 1+
716@item 1-
717@item add, add2
718@item sub, sub2, minus
719@item mul2
720@item div2
721@item lt2
722@item gt2
723@item le2
724@item ge2
725@item num-eq2
726@end itemize
015959cb 727
a52b2d3d
LC
728
729\f
b6368dbb 730@node The Compiler, Concept Index, Instruction Set, Top
a52b2d3d
LC
731@chapter The Compiler
732
733This section describes Guile-VM's compiler and the compilation process
734to produce bytecode executable by the VM itself (@pxref{Instruction
735Set}).
736
737@menu
738* Overview::
739* The Language Front-Ends::
740* GHIL::
9dbbe4bb 741* Compiling Scheme Code::
a52b2d3d
LC
742* GLIL::
743* The Assembler::
744@end menu
745
746@node Overview, The Language Front-Ends, The Compiler, The Compiler
747@section Overview
748
749Compilation in Guile-VM is a three-stage process:
750
b6368dbb
LC
751@cindex intermediate language
752@cindex assembler
753@cindex compiler
754@cindex GHIL
755@cindex GLIL
756@cindex bytecode
757
a52b2d3d
LC
758@enumerate
759@item the source programming language (e.g. R5RS Scheme) is read and
760translated into GHIL, @dfn{Guile's High-Level Intermediate Language};
761@item GHIL code is then translated into a lower-level intermediate
762language call GLIL, @dfn{Guile's Low-Level Intermediate Language};
763@item finally, GLIL is @dfn{assembled} into the VM's assembly language
764(@pxref{Instruction Set}) and bytecode.
765@end enumerate
766
767The use of two separate intermediate languages eases the
768implementation of front-ends since the gap between high-level
769languages like Scheme and GHIL is relatively small.
770
b6368dbb 771@vindex guilec
a52b2d3d
LC
772From an end-user viewpoint, compiling a Guile program into bytecode
773can be done either by using the @command{guilec} command-line tool, or
774by using the @code{compile-file} procedure exported by the
775@code{(system base compile)} module.
776
884d46de
LC
777@deffn @scmproc{} compile-file file . opts
778Compile Scheme source code from file @var{file} using compilation
779options @var{opts}. The resulting file, a Guile object file, will be
780name according the application of the @code{compiled-file-name}
781procedure to @var{file}. The possible values for @var{opts} are the
782same as for the @code{compile-in} procedure (see below, @pxref{The Language
783Front-Ends}).
784@end deffn
785
786@deffn @scmproc{} compiled-file-name file
787Given source file name @var{file} (a string), return a string that
788denotes the name of the Guile object file corresponding to
789@var{file}. By default, the file name returned is @var{file} minus
790its extension and plus the @code{.go} file extension.
791@end deffn
792
793@cindex self-hosting
794It is worth noting, as you might have already guessed, that Guile-VM's
795compiler is written in Guile Scheme and is @dfn{self-hosted}: it can
796compile itself.
a52b2d3d
LC
797
798@node The Language Front-Ends, GHIL, Overview, The Compiler
799@section The Language Front-Ends
800
801Guile-VM comes with a number of @dfn{language front-ends}, that is,
802code that can read a given high-level programming language like R5RS
803Scheme, and translate it into a lower-level representation suitable to
804the compiler.
805
806Each language front-end provides a @dfn{specification} and a
807@dfn{translator} to GHIL. Both of them come in the @code{language}
808module hierarchy. As an example, the front-end for Scheme is located
809in the @code{(language scheme spec)} and @code{(language scheme
810translate)} modules. Language front-ends can then be retrieved using
811the @code{lookup-language} procedure of the @code{(system base
812language)} module.
813
814@deftp @scmrec{} <language> name title version reader printer read-file expander translator evaluator environment
815Denotes a language front-end specification a various methods used by
816the compiler to handle source written in that language. Of particular
817interest is the @code{translator} slot (@pxref{GHIL}).
818@end deftp
819
9dbbe4bb
LC
820@deffn @scmproc{} lookup-language lang
821Look for a language front-end named @var{lang}, a symbol (e.g,
822@code{scheme}), and return the @code{<language>} record describing it
823if found. If @var{lang} does not denote a language front-end, an
824error is raised. Note that this procedure assumes that language
825@var{lang} exists if there exist a @code{(language @var{lang} spec)}
826module.
a52b2d3d
LC
827@end deffn
828
884d46de
LC
829The @code{(system base compile)} module defines a procedure similar to
830@code{compile-file} but that is not limited to the Scheme language:
831
832@deffn @scmproc{} compile-in expr env lang . opts
833Compile expression @var{expr}, which is written in language @var{lang}
834(a @code{<language>} object), using compilation options @var{opts},
835and return bytecode as produced by the assembler (@pxref{The
836Assembler}).
837
838Options @var{opts} may contain the following keywords:
839
840@table @code
841@item :e
842compilation will stop after the code expansion phase.
843@item :t
844compilation will stop after the code translation phase, i.e. after
845code in the source language @var{lang} has been translated into GHIL
846(@pxref{GHIL}).
847@item :c
848compilation will stop after the compilation phase and before the
849assembly phase, i.e. once GHIL has been translated into GLIL
850(@pxref{GLIL}).
851@end table
852
853Additionally, @var{opts} may contain any option understood by the
854GHIL-to-GLIL compiler described in @xref{GLIL}.
855@end deffn
856
a52b2d3d 857
9dbbe4bb 858@node GHIL, Compiling Scheme Code, The Language Front-Ends, The Compiler
a52b2d3d
LC
859@section Guile's High-Level Intermediate Language
860
861GHIL has constructs almost equivalent to those found in Scheme.
862However, unlike Scheme, it is meant to be read only by the compiler
863itself. Therefore, a sequence of GHIL code is only a sequence of GHIL
864@emph{objects} (records), as opposed to symbols, each of which
865represents a particular language feature. These records are all
866defined in the @code{(system il ghil)} module and are named
867@code{<ghil-*>}.
868
b6368dbb 869Each GHIL record has at least two fields: one containing the
a52b2d3d 870environment (Guile module) in which it is considered, and one
b6368dbb 871containing its location [FIXME: currently seems to be unused]. Below
a52b2d3d
LC
872is a list of the main GHIL object types and their fields:
873
874@example
875;; Objects
876(<ghil-void> env loc)
877(<ghil-quote> env loc obj)
878(<ghil-quasiquote> env loc exp)
879(<ghil-unquote> env loc exp)
880(<ghil-unquote-splicing> env loc exp)
881;; Variables
882(<ghil-ref> env loc var)
883(<ghil-set> env loc var val)
884(<ghil-define> env loc var val)
885;; Controls
886(<ghil-if> env loc test then else)
887(<ghil-and> env loc exps)
888(<ghil-or> env loc exps)
889(<ghil-begin> env loc exps)
890(<ghil-bind> env loc vars vals body)
891(<ghil-lambda> env loc vars rest body)
892(<ghil-call> env loc proc args)
893(<ghil-inline> env loc inline args)
894@end example
895
896As can be seen from this examples, the constructs in GHIL are pretty
897close to the fundamental primitives of Scheme.
898
899It is the role of front-end language translators (@pxref{The Language
900Front-Ends}) to produce a sequence of GHIL objects from the
9dbbe4bb
LC
901human-readable, source programming language. The next section
902describes the translator for the Scheme language.
a52b2d3d 903
9dbbe4bb
LC
904@node Compiling Scheme Code, GLIL, GHIL, The Compiler
905@section Compiling Scheme Code
a52b2d3d 906
9dbbe4bb
LC
907The language object for Scheme, as returned by @code{(lookup-language
908'scheme)} (@pxref{The Language Front-Ends}), defines a translator
909procedure that returns a sequence of GHIL objects given Scheme code.
910Before actually performing this operation, the Scheme translator
911expands macros in the original source code.
912
913The macros that may be expanded can come from different sources:
914
915@itemize
916@item core Guile macros, such as @code{false-if-exception};
917@item macros defined in modules used by the module being compiled,
918e.g., @code{receive} in @code{(ice-9 receive)};
919@item macros defined within the module being compiled.
920@end itemize
921
922@cindex macro
923@cindex syntax transformer
924@findex define-macro
925@findex defmacro
926The main complexity in handling macros at compilation time is that
927Guile's macros are first-class objects. For instance, when using
928@code{define-macro}, one actually defines a @emph{procedure} that
929returns code; of course, unlike a ``regular'' procedure, it is
930executed when an S-exp is @dfn{memoized} by the evaluator, i.e.,
931before the actual evaluation takes place. Worse, it is possible to
932turn a procedure into a macro, or @dfn{syntax transformer}, thus
933removing, to some extent, the boundary between the macro expansion and
934evaluation phases, @inforef{Internal Macros, , guile}.
935
936[FIXME: explain limitations, etc.]
937
938
939@node GLIL, The Assembler, Compiling Scheme Code, The Compiler
a52b2d3d
LC
940@section Guile's Low-Level Intermediate Language
941
942A GHIL instruction sequence can be compiled into GLIL using the
943@code{compile} procedure exported by the @code{(system il compile)}
944module. During this translation process, various optimizations may
945also be performed.
946
947The module @code{(system il glil)} defines record types representing
948various low-level abstractions. Compared to GHIL, the flow control
949primitives in GLIL are much more low-level: only @code{<glil-label>},
950@code{<glil-branch>} and @code{<glil-call>} are available, no
951@code{lambda}, @code{if}, etc.
952
953
954@deffn @scmproc{} compile ghil environment . opts
955Compile @var{ghil}, a GHIL instruction sequence, within
956environment/module @var{environment}, and return the resulting GLIL
957instruction sequence. The option list @var{opts} may be either the
958empty list or a list containing the @code{:O} keyword in which case
959@code{compile} will first go through an optimization stage of
960@var{ghil}.
884d46de
LC
961
962Note that the @code{:O} option may be passed at a higher-level to the
963@code{compile-file} and @code{compile-in} procedures (@pxref{The
964Language Front-Ends}).
a52b2d3d
LC
965@end deffn
966
967@deffn @scmproc{} pprint-glil glil . port
968Print @var{glil}, a GLIL sequence instructions, in a human-readable
969form. If @var{port} is passed, it will be used as the output port.
970@end deffn
971
972
973Let's consider the following Scheme expression:
974
975@example
976(lambda (x) (+ x 1))
977@end example
978
979The corresponding (unoptimized) GLIL code, as shown by
980@code{pprint-glil}, looks like this:
981
982@example
983(@@asm (0 0 0 0)
984 (@@asm (1 0 0 0) ;; expect one arg.
985 (@@bind (x argument 0)) ;; debugging info
986 (module-ref #f +) ;; lookup `+'
987 (argument-ref 0) ;; push the argument onto
988 ;; the stack
989 (const 1) ;; push `1'
990 (tail-call 2) ;; call `+', with 2 args,
991 ;; using the same stack frame
992 (@@source 15 33)) ;; additional debugging info
993 (return 0))
994@end example
995
996This is not unlike the VM's assembly language described in
997@ref{Instruction Set}.
998
999@node The Assembler, , GLIL, The Compiler
1000@section The Assembler
1001
b6368dbb
LC
1002@findex code->bytes
1003
a52b2d3d
LC
1004The final compilation step consists in converting the GLIL instruction
1005sequence into VM bytecode. This is what the @code{assemble} procedure
1006defined in the @code{(system vm assemble)} module is for. It relies
1007on the @code{code->bytes} procedure of the @code{(system vm conv)}
1008module to convert instructions (represented as lists whose @code{car}
1009is a symbol naming the instruction, e.g. @code{object-ref},
1010@pxref{Instruction Set}) into binary code, or @dfn{bytecode}.
1011Bytecode itself is represented using SRFI-4 byte vectors,
1012@inforef{SRFI-4, SRFI-4 homogeneous numeric vectors, guile}.
1013
1014
1015@deffn @scmproc{} assemble glil environment . opts
1016Return a binary representation of @var{glil} (bytecode), either in the
1017form of an SRFI-4 @code{u8vector} or a @code{<bytespec>} object.
1018[FIXME: Why is that?]
1019@end deffn
1020
1021
b6368dbb 1022
015959cb 1023@c *********************************************************************
b6368dbb
LC
1024@node Concept Index, Function and Instruction Index, The Compiler, Top
1025@unnumbered Concept Index
1026@printindex cp
015959cb 1027
b6368dbb
LC
1028@node Function and Instruction Index, Command and Variable Index, Concept Index, Top
1029@unnumbered Function and Instruction Index
1030@printindex fn
015959cb 1031
b6368dbb
LC
1032@node Command and Variable Index, , Function and Instruction Index, Top
1033@unnumbered Command and Variable Index
1034@printindex vr
015959cb
KN
1035
1036@bye
b6368dbb 1037
015959cb 1038@c Local Variables:
b6368dbb 1039@c ispell-local-dictionary: "american";
015959cb 1040@c End:
238e7a11
LC
1041
1042@c LocalWords: bytecode