*** empty log message ***
[bpt/guile.git] / doc / ref / libguile-concepts.texi
CommitLineData
3229f68b
MV
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
76da80e7 3@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004
3229f68b
MV
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@page
8@node General Libguile Concepts
9@section General concepts for using libguile
10
76da80e7
MV
11When you want to embed the Guile Scheme interpreter into your program,
12you need to link it against the @file{libguile} library (@pxref{Linking
13Programs With Guile}). Once you have done this, your C code has access
14to a number of data types and functions that can be used to invoke the
15interpreter, or make new functions that you have written in C available
16to be called from Scheme code, among other things.
3229f68b
MV
17
18Scheme is different from C in a number of significant ways, and Guile
19tries to make the advantages of Scheme available to C as well. Thus, in
20addition to a Scheme interpreter, libguile also offers dynamic types,
21garbage collection, continuations, arithmetic on arbitrary sized
22numbers, and other things.
23
24The two fundamental concepts are dynamic types and garbage collection.
25You need to understand how libguile offers them to C programs in order
26to use the rest of libguile. Also, the more general control flow of
27Scheme caused by continuations needs to be dealt with.
28
29@menu
30* Dynamic Types:: Dynamic Types.
31* Garbage Collection:: Garbage Collection.
32* Control Flow:: Control Flow.
33@end menu
34
35@node Dynamic Types
36@subsection Dynamic Types
37
38Scheme is a dynamically-typed language; this means that the system
39cannot, in general, determine the type of a given expression at compile
40time. Types only become apparent at run time. Variables do not have
41fixed types; a variable may hold a pair at one point, an integer at the
42next, and a thousand-element vector later. Instead, values, not
43variables, have fixed types.
44
45In order to implement standard Scheme functions like @code{pair?} and
46@code{string?} and provide garbage collection, the representation of
47every value must contain enough information to accurately determine its
48type at run time. Often, Scheme systems also use this information to
49determine whether a program has attempted to apply an operation to an
50inappropriately typed value (such as taking the @code{car} of a string).
51
52Because variables, pairs, and vectors may hold values of any type,
53Scheme implementations use a uniform representation for values --- a
54single type large enough to hold either a complete value or a pointer
55to a complete value, along with the necessary typing information.
56
57In Guile, this uniform representation of all Scheme values is the C type
58@code{SCM}. This is an opaque type and its size is typically equivalent
59to that of a pointer to @code{void}. Thus, @code{SCM} values can be
60passed around efficiently and they take up reasonably little storage on
61their own.
62
63The most important rule is: You never access a @code{SCM} value
64directly; you only pass it to functions or macros defined in libguile.
65
66As an obvious example, although a @code{SCM} variable can contain
67integers, you can of course not compute the sum of two @code{SCM} values
68by adding them with the C @code{+} operator. You must use the libguile
69function @code{scm_sum}.
70
71Less obvious and therefore more important to keep in mind is that you
72also cannot directly test @code{SCM} values for trueness. In Scheme,
73the value @code{#f} is considered false and of course a @code{SCM}
74variable can represent that value. But there is no guarantee that the
75@code{SCM} representation of @code{#f} looks false to C code as well.
76You need to use @code{scm_is_true} or @code{scm_is_false} to test a
77@code{SCM} value for trueness or falseness, respectively.
78
79You also can not directly compare two @code{SCM} values to find out
80whether they are identical (that is, whether they are @code{eq?} in
81Scheme terms). You need to use @code{scm_is_eq} for this.
82
83The one exception is that you can directly assign a @code{SCM} value to
84a @code{SCM} variable by using the C @code{=} operator.
85
86The following (contrieved) example shows how to do it right. It
87implements a function of two arguments (@var{a} and @var{flag}) that
88returns @var{a}+1 if @var{flag} is true, else it returns @var{a}
89unchanged.
90
91@example
92SCM
93my_incrementing_function (SCM a, SCM flag)
94@{
95 SCM result;
96
97 if (scm_is_true (flag))
98 result = scm_sum (a, scm_from_int (1));
99 else
100 result = a;
101
102 return result;
103@}
104@end example
105
106Often, you need to convert between @code{SCM} values and approriate C
107values. For example, we needed to convert the integer @code{1} to its
108@code{SCM} representation in order to add it to @var{a}. Libguile
109provides many function to do these conversions, both from C to
110@code{SCM} and from @code{SCM} to C.
111
112The conversion functions follow a common naming pattern: those that make
113a @code{SCM} value from a C value have names of the form
114@code{scm_from_@var{type} (@dots{})} and those that convert a @code{SCM}
115value to a C value use the form @code{scm_to_@var{type} (@dots{})}.
116
117However, it is best to avoid converting values when you can. When you
118must combine C values and @code{SCM} values in a computation, it is
119often better to convert the C values to @code{SCM} values and do the
120computation by using libguile functions than to the other way around
121(converting @code{SCM} to C and doing the computation some other way).
122
123As a simple example, consider this version of
124@code{my_incrementing_function} from above:
125
126@example
127SCM
128my_other_incrementing_function (SCM a, SCM flag)
129@{
130 int result;
131
132 if (scm_is_true (flag))
133 result = scm_to_int (a) + 1;
134 else
135 result = scm_to_int (a);
136
137 return scm_from_int (result);
138@}
139@end example
140
141This version is much less general than the original one: it will only
142work for values @var{A} that can fit into a @code{int}. The original
143function will work for all values that Guile can represent and that
144@code{scm_sum} can understand, including integers bigger than @code{long
145long}, floating point numbers, complex numbers, and new numerical types
146that have been added to Guile by third-party libraries.
147
148Also, computing with @code{SCM} is not necessarily inefficient. Small
149integers will be encoded directly in the @code{SCM} value, for example,
150and do not need any additional memory on the heap. See @ref{Data
151Representation} to find out the details.
152
153Some special @code{SCM} values are available to C code without needing
154to convert them from C values:
155
156@multitable {Scheme value} {C representation}
157@item Scheme value @tab C representation
158@item @nicode{#f} @tab @nicode{SCM_BOOL_F}
159@item @nicode{#t} @tab @nicode{SCM_BOOL_T}
160@item @nicode{()} @tab @nicode{SCM_EOL}
161@end multitable
162
163In addition to @code{SCM}, Guile also defines the related type
164@code{scm_t_bits}. This is an unsigned integral type of sufficient
165size to hold all information that is directly contained in a
166@code{SCM} value. The @code{scm_t_bits} type is used internally by
167Guile to do all the bit twiddling explained in @ref{Data
168Representation}, but you will encounter it occasionally in low-level
169user code as well.
170
171
172@node Garbage Collection
173@subsection Garbage Collection
174
175As explained above, the @code{SCM} type can represent all Scheme values.
176Some values fit entirely into a @code{SCM} value (such as small
177integers), but other values require additional storage in the heap (such
178as strings and vectors). This additional storage is managed
179automatically by Guile. You don't need to explicitely deallocate it
180when a @code{SCM} value is no longer used.
181
182Two things must be guaranteed so that Guile is able to manage the
183storage automatically: it must know about all blocks of memory that have
184ever been allocated for Scheme values, and it must know about all Scheme
185values that are still being used. Given this knowledge, Guile can
186periodically free all blocks that have been allocated but are not used
187by any active Scheme values. This activity is called @dfn{garbage
188collection}.
189
190It is easy for Guile to remember all blocks of memory that is has
191allocated for use by Scheme values, but you need to help it with finding
192all Scheme values that are in use by C code.
193
194You do this when writing a SMOB mark function, for example
195(@pxref{Garbage Collecting Smobs}). By calling this function, the
196garbage collector learns about all references that your SMOB has to
197other @code{SCM} values.
198
199Other references to @code{SCM} objects, such as global variables of type
200@code{SCM} or other random data structures in the heap that contain
201fields of type @code{SCM}, can be made visible to the garbage collector
202by calling the functions @code{scm_gc_protect} or
203@code{scm_permanent_object}. You normally use these funtions for long
204lived objects such as a hash table that is stored in a global variable.
205For temporary references in local variables or function arguments, using
206these functions would be too expensive.
207
208These references are handled differently: Local variables (and function
209arguments) of type @code{SCM} are automatically visible to the garbage
210collector. This works because the collector scans the stack for
211potential references to @code{SCM} objects and considers all referenced
212objects to be alive. The scanning considers each and every word of the
213stack, regardless of what it is actually used for, and then decides
214whether it could possible be a reference to a @code{SCM} object. Thus,
215the scanning is guaranteed to find all actual references, but it might
216also find words that only accidentally look like references. These
217`false positives' might keep @code{SCM} objects alive that would
218otherwise be considered dead. While this might waste memory, keeping an
219object around longer than it strictly needs to is harmless. This is why
220this technique is called ``conservative garbage collection''. In
221practice, the wasted memory seems to be no problem.
222
223The stack of every thread is scanned in this way and the registers of
224the CPU and all other memory locations where local variables or function
225parameters might show up are included in this scan as well.
226
227The consequence of the conservative scanning is that you can just
228declare local variables and function parameters of type @code{SCM} and
229be sure that the garbage collector will not free the corresponding
230objects.
231
232However, a local variable or function parameter is only protected as
233long as it is really on the stack (or in some register). As an
234optimization, the C compiler might reuse its location for some other
235value and the @code{SCM} object would no longer be protected. Normally,
236this leads to exactly the right behabvior: the compiler will only
237overwrite a reference when it is no longer needed and thus the object
238becomes unprotected precisely when the reference disappears, just as
239wanted.
240
241There are situations, however, where a @code{SCM} object needs to be
242around longer than its reference from a local variable or function
243parameter. This happens, for example, when you retrieve the array of
244characters from a Scheme string and work on that array directly. The
245reference to the @code{SCM} string object might be dead after the
246character array has been retrieved, but the array itself is still in use
247and thus the string object must be protected. The compiler does not
248know about this connection and might overwrite the @code{SCM} reference
249too early.
250
251To get around this problem, you can use @code{scm_remember_upto_here_1}
252and its cousins. It will keep the compiler from overwriting the
253reference. For a typical example of its use, see @ref{Remembering
254During Operations}.
255
256@node Control Flow
257@subsection Control Flow
258
259Scheme has a more general view of program flow than C, both locally and
260non-locally.
261
262Controlling the local flow of control involves things like gotos, loops,
263calling functions and returning from them. Non-local control flow
264refers to situations where the program jumps across one or more levels
265of function activations without using the normal call or return
266operations.
267
268The primitive means of C for local control flow is the @code{goto}
269statement, together with @code{if}. Loops done with @code{for},
270@code{while} or @code{do} could in principle be rewritten with just
271@code{goto} and @code{if}. In Scheme, the primitive means for local
272control flow is the @emph{function call} (together with @code{if}).
273Thus, the repetition of some computation in a loop is ultimately
274implemented by a function that calls itself, that is, by recursion.
275
276This approach is theoretically very powerful since it is easier to
277reason formally about recursion than about gotos. In C, using
278recursion exclusively would not be practical, tho, since it would eat
279up the stack very quickly. In Scheme, however, it is practical:
280function calls that appear in a @dfn{tail position} do not use any
281additional stack space.
282
283A function call is in a tail position when it is the last thing the
284calling function does. The value returned by the called function is
285immediately returned from the calling function. In the following
286example, the call to @code{bar-1} is in a tail position, while the
287call to @code{bar-2} is not. (The call to @code{1-} in @code{foo-2}
288is in a tail position, tho.)
289
290@lisp
291(define (foo-1 x)
292 (bar-1 (1- x)))
293
294(define (foo-2 x)
295 (1- (bar-2 x)))
296@end lisp
297
298Thus, when you take care to recurse only in tail positions, the
299recursion will only use constant stack space and will be as good as a
300loop constructed from gotos.
301
302Scheme offers a few syntactic abstractions (@code{do} and @dfn{named}
303@code{let}) that make writing loops slightly easier.
304
305But only Scheme functions can call other functions in a tail position:
306C functions can not. This matters when you have, say, two functions
307that call each other recursively to form a common loop. The following
308(unrealistic) example shows how one might go about determing whether a
309non-negative integer @var{n} is even or odd.
310
311@lisp
312(define (my-even? n)
313 (cond ((zero? n) #t)
314 (else (my-odd? (1- n)))))
315
316(define (my-odd? n)
317 (cond ((zero? n) #f)
318 (else (my-even? (1- n)))))
319@end lisp
320
321Because the calls to @code{my-even?} and @code{my-odd?} are in tail
322positions, these two procedures can be applied to arbitrary large
323integers without overflowing the stack. (They will still take a lot
324of time, of course.)
325
326However, when one or both of the two procedures would be rewritten in
327C, it could no longer call its companion in a tail position (since C
328does not have this concept). You might need to take this
329consideration into account when deciding which parts of your program
330to write in Scheme and which in C.
331
332In addition to calling functions and returning from them, a Scheme
333program can also exit non-locally from a function so that the control
334flow returns directly to an outer level. This means that some functions
335might not return at all.
336
337Even more, it is not only possible to jump to some outer level of
338control, a Scheme program can also jump back into the middle of a
339function that has already exited. This might cause some functions to
340return more than once.
341
342In general, these non-local jumps are done by invoking
343@dfn{continuations} that have previously been captured using
344@code{call-with-current-continuation}. Guile also offers a slightly
345restricted set of functions, @code{catch} and @code{throw}, that can
346only be used for non-local exits. This restriction makes them more
347efficient. Error reporting (with the function @code{error}) is
348implemented by invoking @code{throw}, for example. The functions
349@code{catch} and @code{throw} belong to the topic of @dfn{exceptions}.
350
351Since Scheme functions can call C functions and vice versa, C code can
352experience the more general control flow of Scheme as well. It is
353possible that a C function will not return at all, or will return more
354than once. While C does offer @code{setjmp} and @code{longjmp} for
355non-local exits, it is still an unusual thing for C code. In
356contrast, non-local exits are very common in Scheme, mostly to report
357errors.
358
359You need to be prepared for the non-local jumps in the control flow
360whenever you use a function from @code{libguile}: it is best to assume
361that any @code{libguile} function might signal an error or run a pending
362signal handler (which in turn can do arbitrary things).
363
364It is often necessary to take cleanup actions when the control leaves a
365function non-locally. Also, when the control returns non-locally, some
366setup actions might be called for. For example, the Scheme function
367@code{with-output-to-port} needs to modify the global state so that
368@code{current-output-port} returns the port passed to
369@code{with-output-to-port}. The global output port needs to be reset to
370its previous value when @code{with-output-to-port} returns normally or
371when it is exited non-locally. Likewise, the port needs to be set again
372when control enters non-locally.
373
374Scheme code can use the @code{dynamic-wind} function to arrange for the
375setting and resetting of the global state. C code could use the
376corresponding @code{scm_internal_dynamic_wind} function, but it might
377prefer to use the @dfn{frames} concept that is more natural for C code,
378(@pxref{Frames}).
379