Commit | Line | Data |
---|---|---|
3229f68b MV |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
76da80e7 | 3 | @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004 |
3229f68b MV |
4 | @c Free Software Foundation, Inc. |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @page | |
8 | @node General Libguile Concepts | |
9 | @section General concepts for using libguile | |
10 | ||
76da80e7 MV |
11 | When you want to embed the Guile Scheme interpreter into your program, |
12 | you need to link it against the @file{libguile} library (@pxref{Linking | |
13 | Programs With Guile}). Once you have done this, your C code has access | |
14 | to a number of data types and functions that can be used to invoke the | |
15 | interpreter, or make new functions that you have written in C available | |
16 | to be called from Scheme code, among other things. | |
3229f68b MV |
17 | |
18 | Scheme is different from C in a number of significant ways, and Guile | |
19 | tries to make the advantages of Scheme available to C as well. Thus, in | |
20 | addition to a Scheme interpreter, libguile also offers dynamic types, | |
21 | garbage collection, continuations, arithmetic on arbitrarily sized | |
22 | numbers, and other things. | |
23 | ||
24 | The two fundamental concepts are dynamic types and garbage collection. | |
25 | You need to understand how libguile offers them to C programs in order | |
26 | to use the rest of libguile. Also, the more general control flow of | |
27 | Scheme caused by continuations needs to be dealt with. | |
28 | ||
29 | @menu | |
30 | * Dynamic Types:: Dynamic Types. | |
31 | * Garbage Collection:: Garbage Collection. | |
32 | * Control Flow:: Control Flow. | |
33 | @end menu | |
34 | ||
35 | @node Dynamic Types | |
36 | @subsection Dynamic Types | |
37 | ||
38 | Scheme is a dynamically-typed language; this means that the system | |
39 | cannot, in general, determine the type of a given expression at compile | |
40 | time. Types only become apparent at run time. Variables do not have | |
41 | fixed types; a variable may hold a pair at one point, an integer at the | |
42 | next, and a thousand-element vector later. Instead, values, not | |
43 | variables, have fixed types. | |
44 | ||
45 | In order to implement standard Scheme functions like @code{pair?} and | |
46 | @code{string?} and provide garbage collection, the representation of | |
47 | every value must contain enough information to accurately determine its | |
48 | type at run time. Often, Scheme systems also use this information to | |
49 | determine whether a program has attempted to apply an operation to an | |
50 | inappropriately typed value (such as taking the @code{car} of a string). | |
51 | ||
52 | Because variables, pairs, and vectors may hold values of any type, | |
53 | Scheme implementations use a uniform representation for values --- a | |
54 | single type large enough to hold either a complete value or a pointer | |
55 | to a complete value, along with the necessary typing information. | |
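
As a rough illustration of such a uniform representation, the following self-contained C sketch packs either a small non-negative integer or a pointer into one machine word and uses the lowest bit as the type tag. The names @code{toy_value}, @code{toy_from_uint} and friends are hypothetical, and the encoding is an assumption made for this example only; it is not the layout that Guile uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy encoding for illustration only; this is NOT how
   Guile lays out its values.  One machine word holds either a small
   integer (low bit 1) or a pointer to a heap object (low bit 0).  */
typedef uintptr_t toy_value;

/* Encode a small non-negative integer directly in the word.  The top
   bit of N is shifted out, so N must be at most UINTPTR_MAX / 2.  */
toy_value
toy_from_uint (uintptr_t n)
{
  return (n << 1) | 1u;
}

/* Encode a pointer.  Assumes the object is at least 2-byte aligned,
   so that the low bit of its address is zero.  */
toy_value
toy_from_ptr (void *p)
{
  return (uintptr_t) p;
}

/* The tag bit tells the two kinds of value apart at run time.  */
bool
toy_is_uint (toy_value v)
{
  return (v & 1u) != 0;
}

uintptr_t
toy_to_uint (toy_value v)
{
  return v >> 1;
}

void *
toy_to_ptr (toy_value v)
{
  return (void *) v;
}
```

The point of the sketch is only that one word can carry both the value (or a pointer to it) and enough typing information to tell the cases apart.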
56 | ||
57 | In Guile, this uniform representation of all Scheme values is the C type | |
58 | @code{SCM}. This is an opaque type and its size is typically equivalent | |
59 | to that of a pointer to @code{void}. Thus, @code{SCM} values can be | |
60 | passed around efficiently and they take up reasonably little storage on | |
61 | their own. | |
62 | ||
63 | The most important rule is: You never access a @code{SCM} value | |
64 | directly; you only pass it to functions or macros defined in libguile. | |
65 | ||
66 | As an obvious example, although a @code{SCM} variable can contain | |
67 | integers, you cannot compute the sum of two @code{SCM} values | |
68 | by adding them with the C @code{+} operator. You must use the libguile | |
69 | function @code{scm_sum}. | |
70 | ||
71 | Less obvious and therefore more important to keep in mind is that you | |
72 | also cannot directly test @code{SCM} values for trueness. In Scheme, | |
73 | the value @code{#f} is considered false and of course a @code{SCM} | |
74 | variable can represent that value. But there is no guarantee that the | |
75 | @code{SCM} representation of @code{#f} looks false to C code as well. | |
76 | You need to use @code{scm_is_true} or @code{scm_is_false} to test a | |
77 | @code{SCM} value for trueness or falseness, respectively. | |
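
To see why a plain C truth test is unreliable, suppose (purely as an assumption for this sketch) that the false object is encoded as a non-zero bit pattern. The type @code{toy_word} and the constants below are hypothetical and do not reflect Guile's actual values.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t toy_word;

/* Hypothetical bit patterns, chosen only so that both are non-zero.  */
#define TOY_FALSE ((toy_word) 0x04)
#define TOY_TRUE  ((toy_word) 0x0c)

/* Scheme-style truth test: every value except the false object
   counts as true.  */
bool
toy_is_true (toy_word v)
{
  return v != TOY_FALSE;
}

/* What a naive C test such as `if (v)' would conclude.  */
bool
toy_naive_c_truth (toy_word v)
{
  return v != 0;
}
```

With this encoding the naive test reports the false object as true, which is exactly the mistake that a dedicated predicate avoids.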
78 | ||
79 | You also cannot directly compare two @code{SCM} values to find out | |
80 | whether they are identical (that is, whether they are @code{eq?} in | |
81 | Scheme terms). You need to use @code{scm_is_eq} for this. | |
82 | ||
83 | The one exception is that you can directly assign a @code{SCM} value to | |
84 | a @code{SCM} variable by using the C @code{=} operator. | |
85 | ||
86 | The following (contrived) example shows how to do it right. It | |
87 | implements a function of two arguments (@var{a} and @var{flag}) that | |
88 | returns @var{a}+1 if @var{flag} is true, else it returns @var{a} | |
89 | unchanged. | |
90 | ||
91 | @example | |
92 | SCM | |
93 | my_incrementing_function (SCM a, SCM flag) | |
94 | @{ | |
95 | SCM result; | |
96 | ||
97 | if (scm_is_true (flag)) | |
98 | result = scm_sum (a, scm_from_int (1)); | |
99 | else | |
100 | result = a; | |
101 | ||
102 | return result; | |
103 | @} | |
104 | @end example | |
105 | ||
106 | Often, you need to convert between @code{SCM} values and appropriate C | |
107 | values. For example, we needed to convert the integer @code{1} to its | |
108 | @code{SCM} representation in order to add it to @var{a}. Libguile | |
109 | provides many functions to do these conversions, both from C to | |
110 | @code{SCM} and from @code{SCM} to C. | |
111 | ||
112 | The conversion functions follow a common naming pattern: those that make | |
113 | a @code{SCM} value from a C value have names of the form | |
114 | @code{scm_from_@var{type} (@dots{})} and those that convert a @code{SCM} | |
115 | value to a C value use the form @code{scm_to_@var{type} (@dots{})}. | |
116 | ||
117 | However, it is best to avoid converting values when you can. When you | |
118 | must combine C values and @code{SCM} values in a computation, it is | |
119 | often better to convert the C values to @code{SCM} values and do the | |
120 | computation by using libguile functions than to do it the other way around | |
121 | (converting @code{SCM} to C and doing the computation some other way). | |
122 | ||
123 | As a simple example, consider this version of | |
124 | @code{my_incrementing_function} from above: | |
125 | ||
126 | @example | |
127 | SCM | |
128 | my_other_incrementing_function (SCM a, SCM flag) | |
129 | @{ | |
130 | int result; | |
131 | ||
132 | if (scm_is_true (flag)) | |
133 | result = scm_to_int (a) + 1; | |
134 | else | |
135 | result = scm_to_int (a); | |
136 | ||
137 | return scm_from_int (result); | |
138 | @} | |
139 | @end example | |
140 | ||
141 | This version is much less general than the original one: it will only | |
142 | work for values @var{a} that can fit into an @code{int}. The original | |
143 | function will work for all values that Guile can represent and that | |
144 | @code{scm_sum} can understand, including integers bigger than @code{long | |
145 | long}, floating point numbers, complex numbers, and new numerical types | |
146 | that have been added to Guile by third-party libraries. | |
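
The ceiling of the @code{int} version can be made concrete without libguile at all. In this plain C sketch, the hypothetical helper @code{checked_increment} has to refuse its largest input, because the result does not fit into an @code{int}.

```c
#include <limits.h>
#include <stdbool.h>

/* Store A + 1 in *RESULT and return true, or return false when the
   sum cannot be represented as an int.  Hypothetical helper, not a
   libguile function.  */
bool
checked_increment (int a, int *result)
{
  if (a == INT_MAX)
    return false;               /* a + 1 would overflow */

  *result = a + 1;
  return true;
}
```

Any computation that stays in C integers needs a special case of this kind somewhere; the version that computes on @code{SCM} values leaves the question to the number implementation.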
147 | ||
148 | Also, computing with @code{SCM} is not necessarily inefficient. Small | |
149 | integers will be encoded directly in the @code{SCM} value, for example, | |
150 | and do not need any additional memory on the heap. See @ref{Data | |
151 | Representation} to find out the details. | |
152 | ||
153 | Some special @code{SCM} values are available to C code without needing | |
154 | to convert them from C values: | |
155 | ||
156 | @multitable {Scheme value} {C representation} | |
157 | @item Scheme value @tab C representation | |
158 | @item @nicode{#f} @tab @nicode{SCM_BOOL_F} | |
159 | @item @nicode{#t} @tab @nicode{SCM_BOOL_T} | |
160 | @item @nicode{()} @tab @nicode{SCM_EOL} | |
161 | @end multitable | |
162 | ||
163 | In addition to @code{SCM}, Guile also defines the related type | |
164 | @code{scm_t_bits}. This is an unsigned integral type of sufficient | |
165 | size to hold all information that is directly contained in a | |
166 | @code{SCM} value. The @code{scm_t_bits} type is used internally by | |
167 | Guile to do all the bit twiddling explained in @ref{Data | |
168 | Representation}, but you will encounter it occasionally in low-level | |
169 | user code as well. | |
170 | ||
171 | ||
172 | @node Garbage Collection | |
173 | @subsection Garbage Collection | |
174 | ||
175 | As explained above, the @code{SCM} type can represent all Scheme values. | |
176 | Some values fit entirely into a @code{SCM} value (such as small | |
177 | integers), but other values require additional storage in the heap (such | |
178 | as strings and vectors). This additional storage is managed | |
179 | automatically by Guile. You don't need to explicitly deallocate it | |
180 | when a @code{SCM} value is no longer used. | |
181 | ||
182 | Two things must be guaranteed so that Guile is able to manage the | |
183 | storage automatically: it must know about all blocks of memory that have | |
184 | ever been allocated for Scheme values, and it must know about all Scheme | |
185 | values that are still being used. Given this knowledge, Guile can | |
186 | periodically free all blocks that have been allocated but are not used | |
187 | by any active Scheme values. This activity is called @dfn{garbage | |
188 | collection}. | |
189 | ||
190 | It is easy for Guile to remember all blocks of memory that it has | |
191 | allocated for use by Scheme values, but you need to help it with finding | |
192 | all Scheme values that are in use by C code. | |
193 | ||
194 | You do this when writing a SMOB mark function, for example | |
195 | (@pxref{Garbage Collecting Smobs}). By calling this function, the | |
196 | garbage collector learns about all references that your SMOB has to | |
197 | other @code{SCM} values. | |
198 | ||
199 | Other references to @code{SCM} objects, such as global variables of type | |
200 | @code{SCM} or other random data structures in the heap that contain | |
201 | fields of type @code{SCM}, can be made visible to the garbage collector | |
202 | by calling the functions @code{scm_gc_protect_object} or | |
203 | @code{scm_permanent_object}. You normally use these functions for | |
204 | long-lived objects such as a hash table that is stored in a global variable. | |
205 | For temporary references in local variables or function arguments, using | |
206 | these functions would be too expensive. | |
207 | ||
208 | These references are handled differently: Local variables (and function | |
209 | arguments) of type @code{SCM} are automatically visible to the garbage | |
210 | collector. This works because the collector scans the stack for | |
211 | potential references to @code{SCM} objects and considers all referenced | |
212 | objects to be alive. The scanning considers each and every word of the | |
213 | stack, regardless of what it is actually used for, and then decides | |
214 | whether it could possibly be a reference to a @code{SCM} object. Thus, | |
215 | the scanning is guaranteed to find all actual references, but it might | |
216 | also find words that only accidentally look like references. These | |
217 | `false positives' might keep @code{SCM} objects alive that would | |
218 | otherwise be considered dead. While this might waste memory, keeping an | |
219 | object around longer than it strictly needs to is harmless. This is why | |
220 | this technique is called ``conservative garbage collection''. In | |
221 | practice, the wasted memory seems to be no problem. | |
222 | ||
223 | The stack of every thread is scanned in this way and the registers of | |
224 | the CPU and all other memory locations where local variables or function | |
225 | parameters might show up are included in this scan as well. | |
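
The idea behind conservative scanning can be sketched in a few lines of plain C. The function below is a toy under stated assumptions: the heap is a single address range, and the hypothetical name @code{toy_count_potential_refs} is not part of libguile. It treats every word whose numeric value falls inside the heap range as a potential reference, without knowing what the word is really used for.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy conservative scan.  Counts the words in WORDS[0..N-1] whose
   numeric value lies inside the heap range [HEAP_LO, HEAP_HI).  The
   scan cannot tell pointers from integers, so it treats every such
   word as a potential reference.  */
size_t
toy_count_potential_refs (const uintptr_t *words, size_t n,
                          uintptr_t heap_lo, uintptr_t heap_hi)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (words[i] >= heap_lo && words[i] < heap_hi)
      count++;

  return count;
}
```

If the scanned words contain one genuine pointer into the range and one unrelated integer that happens to fall into it, both are counted. The integer is a false positive: it may keep an object alive needlessly, but it can never cause a live object to be missed.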
226 | ||
227 | The consequence of the conservative scanning is that you can just | |
228 | declare local variables and function parameters of type @code{SCM} and | |
229 | be sure that the garbage collector will not free the corresponding | |
230 | objects. | |
231 | ||
232 | However, a local variable or function parameter is only protected as | |
233 | long as it is really on the stack (or in some register). As an | |
234 | optimization, the C compiler might reuse its location for some other | |
235 | value and the @code{SCM} object would no longer be protected. Normally, | |
236 | this leads to exactly the right behavior: the compiler will only | |
237 | overwrite a reference when it is no longer needed and thus the object | |
238 | becomes unprotected precisely when the reference disappears, just as | |
239 | wanted. | |
240 | ||
241 | There are situations, however, where a @code{SCM} object needs to be | |
242 | around longer than its reference from a local variable or function | |
243 | parameter. This happens, for example, when you retrieve the array of | |
244 | characters from a Scheme string and work on that array directly. The | |
245 | reference to the @code{SCM} string object might be dead after the | |
246 | character array has been retrieved, but the array itself is still in use | |
247 | and thus the string object must be protected. The compiler does not | |
248 | know about this connection and might overwrite the @code{SCM} reference | |
249 | too early. | |
250 | ||
251 | To get around this problem, you can use @code{scm_remember_upto_here_1} | |
252 | and its cousins. These keep the compiler from overwriting the | |
253 | reference. For a typical example of its use, see @ref{Remembering | |
254 | During Operations}. | |
255 | ||
256 | @node Control Flow | |
257 | @subsection Control Flow | |
258 | ||
259 | Scheme has a more general view of program flow than C, both locally and | |
260 | non-locally. | |
261 | ||
262 | Controlling the local flow of control involves things like gotos, loops, | |
263 | calling functions and returning from them. Non-local control flow | |
264 | refers to situations where the program jumps across one or more levels | |
265 | of function activations without using the normal call or return | |
266 | operations. | |
267 | ||
268 | The primitive means of C for local control flow is the @code{goto} | |
269 | statement, together with @code{if}. Loops done with @code{for}, | |
270 | @code{while} or @code{do} could in principle be rewritten with just | |
271 | @code{goto} and @code{if}. In Scheme, the primitive means for local | |
272 | control flow is the @emph{function call} (together with @code{if}). | |
273 | Thus, the repetition of some computation in a loop is ultimately | |
274 | implemented by a function that calls itself, that is, by recursion. | |
275 | ||
276 | This approach is theoretically very powerful since it is easier to | |
277 | reason formally about recursion than about gotos. In C, using | |
278 | recursion exclusively would not be practical, though, since it would eat | |
279 | up the stack very quickly. In Scheme, however, it is practical: | |
280 | function calls that appear in a @dfn{tail position} do not use any | |
281 | additional stack space. | |
282 | ||
283 | A function call is in a tail position when it is the last thing the | |
284 | calling function does. The value returned by the called function is | |
285 | immediately returned from the calling function. In the following | |
286 | example, the call to @code{bar-1} is in a tail position, while the | |
287 | call to @code{bar-2} is not. (The call to @code{1-} in @code{foo-2} | |
288 | is in a tail position, though.) | |
289 | ||
290 | @lisp | |
291 | (define (foo-1 x) | |
292 | (bar-1 (1- x))) | |
293 | ||
294 | (define (foo-2 x) | |
295 | (1- (bar-2 x))) | |
296 | @end lisp | |
297 | ||
298 | Thus, when you take care to recurse only in tail positions, the | |
299 | recursion will only use constant stack space and will be as good as a | |
300 | loop constructed from gotos. | |
301 | ||
302 | Scheme offers a few syntactic abstractions (@code{do} and @dfn{named} | |
303 | @code{let}) that make writing loops slightly easier. | |
304 | ||
305 | But only Scheme functions can call other functions in a tail position: | |
306 | C functions cannot. This matters when you have, say, two functions | |
307 | that call each other recursively to form a common loop. The following | |
308 | (unrealistic) example shows how one might go about determining whether a | |
309 | non-negative integer @var{n} is even or odd. | |
310 | ||
311 | @lisp | |
312 | (define (my-even? n) | |
313 | (cond ((zero? n) #t) | |
314 | (else (my-odd? (1- n))))) | |
315 | ||
316 | (define (my-odd? n) | |
317 | (cond ((zero? n) #f) | |
318 | (else (my-even? (1- n))))) | |
319 | @end lisp | |
320 | ||
321 | Because the calls to @code{my-even?} and @code{my-odd?} are in tail | |
322 | positions, these two procedures can be applied to arbitrarily large | |
323 | integers without overflowing the stack. (They will still take a lot | |
324 | of time, of course.) | |
325 | ||
326 | However, if one or both of the two procedures were rewritten in | |
327 | C, it could no longer call its companion in a tail position (since C | |
328 | does not have this concept). You might need to take this | |
329 | consideration into account when deciding which parts of your program | |
330 | to write in Scheme and which in C. | |
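
One way to keep constant stack usage on the C side is to fold the two mutually recursive procedures into a single loop. The following is a minimal sketch of that idea in plain C; the name @code{my_even_p} is hypothetical, and the function works on @code{unsigned long} rather than on arbitrary Scheme integers.

```c
#include <stdbool.h>

/* Loop version of the my-even?/my-odd? pair.  Each iteration plays
   the role of one tail call: it flips between the "even" and "odd"
   state and decrements N, without growing the stack.  */
bool
my_even_p (unsigned long n)
{
  bool even = true;             /* zero is even */

  while (n != 0)
    {
      even = !even;
      n--;
    }

  return even;
}
```

Like the Scheme original, this takes time proportional to @var{n}; it is meant to show the shape of the rewrite, not a sensible parity test.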
331 | ||
332 | In addition to calling functions and returning from them, a Scheme | |
333 | program can also exit non-locally from a function so that the control | |
334 | flow returns directly to an outer level. This means that some functions | |
335 | might not return at all. | |
336 | ||
337 | Moreover, a Scheme program can not only jump to some outer level of | |
338 | control; it can also jump back into the middle of a | |
339 | function that has already exited. This might cause some functions to | |
340 | return more than once. | |
341 | ||
342 | In general, these non-local jumps are done by invoking | |
343 | @dfn{continuations} that have previously been captured using | |
344 | @code{call-with-current-continuation}. Guile also offers a slightly | |
345 | restricted set of functions, @code{catch} and @code{throw}, that can | |
346 | only be used for non-local exits. This restriction makes them more | |
347 | efficient. Error reporting (with the function @code{error}) is | |
348 | implemented by invoking @code{throw}, for example. The functions | |
349 | @code{catch} and @code{throw} belong to the topic of @dfn{exceptions}. | |
350 | ||
351 | Since Scheme functions can call C functions and vice versa, C code can | |
352 | experience the more general control flow of Scheme as well. It is | |
353 | possible that a C function will not return at all, or will return more | |
354 | than once. While C does offer @code{setjmp} and @code{longjmp} for | |
355 | non-local exits, using them is still unusual in C code. In | |
356 | contrast, non-local exits are very common in Scheme, mostly to report | |
357 | errors. | |
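
For readers who have not used them, here is a minimal sketch of a non-local exit in plain C with @code{setjmp} and @code{longjmp}. All @code{toy_} names are hypothetical and the example does not involve libguile; it only shows a function (@code{toy_inner}) that sometimes does not return to its caller in the normal way.

```c
#include <setjmp.h>

/* Where toy_fail jumps to.  Set up by toy_try_double.  */
jmp_buf toy_handler;

/* Report an error by jumping to the handler; never returns.  */
void
toy_fail (int code)
{
  longjmp (toy_handler, code);
}

/* Doubles X, but exits non-locally for negative input.  */
int
toy_inner (int x)
{
  if (x < 0)
    toy_fail (1);
  return x * 2;
}

/* Returns X doubled, or -1 if toy_inner took the non-local exit.  */
int
toy_try_double (int x)
{
  if (setjmp (toy_handler) != 0)
    return -1;                  /* we arrive here via longjmp */

  return toy_inner (x);
}
```

From the point of view of @code{toy_try_double}, the call to @code{toy_inner} either returns a value or never returns at all, which is the situation C code has to expect when calling into Scheme.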
358 | ||
359 | You need to be prepared for the non-local jumps in the control flow | |
360 | whenever you use a function from @code{libguile}: it is best to assume | |
361 | that any @code{libguile} function might signal an error or run a pending | |
362 | signal handler (which in turn can do arbitrary things). | |
363 | ||
364 | It is often necessary to take cleanup actions when the control leaves a | |
365 | function non-locally. Also, when the control returns non-locally, some | |
366 | setup actions might be called for. For example, the Scheme function | |
367 | @code{with-output-to-port} needs to modify the global state so that | |
368 | @code{current-output-port} returns the port passed to | |
369 | @code{with-output-to-port}. The global output port needs to be reset to | |
370 | its previous value when @code{with-output-to-port} returns normally or | |
371 | when it is exited non-locally. Likewise, the port needs to be set again | |
372 | when control enters non-locally. | |
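
The cleanup half of this requirement can be sketched in plain C. The example below is a toy under stated assumptions: the "port" is just a string pointer, all @code{toy_} names are hypothetical, and only a non-local exit is modelled, not a later re-entry. It shows a function that restores a piece of global state both when its body returns normally and when the body escapes with @code{longjmp}.

```c
#include <setjmp.h>
#include <stddef.h>

const char toy_stdout[] = "stdout";
const char toy_log[] = "log";

/* The global state that must be set and reset.  */
const char *toy_current_port = toy_stdout;

/* Records which port a body saw while it was running.  */
const char *toy_seen_port = NULL;

jmp_buf toy_escape;

/* A body that returns normally.  */
void
toy_body_normal (void)
{
  toy_seen_port = toy_current_port;
}

/* A body that leaves through a non-local exit.  */
void
toy_body_escape (void)
{
  toy_seen_port = toy_current_port;
  longjmp (toy_escape, 1);
}

/* Run BODY with PORT as the current port and restore the previous
   port afterwards, however BODY is left.  Returns 1 if BODY exited
   non-locally and 0 if it returned normally.  */
int
toy_with_port (const char *port, void (*body) (void))
{
  const char *saved = toy_current_port;
  int exited_non_locally;

  toy_current_port = port;
  if (setjmp (toy_escape) == 0)
    {
      body ();
      exited_non_locally = 0;
    }
  else
    exited_non_locally = 1;

  toy_current_port = saved;     /* cleanup runs on both paths */
  return exited_non_locally;
}
```

Writing this by hand for every piece of state is tedious and does not cover control re-entering the function, which is why a general mechanism for setup and cleanup actions is needed.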
373 | ||
374 | Scheme code can use the @code{dynamic-wind} function to arrange for the | |
375 | setting and resetting of the global state. C code could use the | |
376 | corresponding @code{scm_internal_dynamic_wind} function, but it might | |
377 | prefer to use the @dfn{frames} concept that is more natural for C code | |
378 | (@pxref{Frames}). | |
379 |