Reference "Blocking" for scm_leave_guile/scm_enter_guile.
[bpt/guile.git] / doc / ref / libguile-concepts.texi
CommitLineData
3229f68b
MV
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
b4fddbbe 3@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2005
3229f68b
MV
4@c Free Software Foundation, Inc.
5@c See the file guile.texi for copying conditions.
6
7@page
8@node General Libguile Concepts
9@section General concepts for using libguile
10
b4fddbbe
MV
11When you want to embed the Guile Scheme interpreter into your program or
12library, you need to link it against the @file{libguile} library
13(@pxref{Linking Programs With Guile}). Once you have done this, your C
14code has access to a number of data types and functions that can be used
15to invoke the interpreter, or make new functions that you have written
16in C available to be called from Scheme code, among other things.
3229f68b
MV
17
18Scheme is different from C in a number of significant ways, and Guile
19tries to make the advantages of Scheme available to C as well. Thus, in
20addition to a Scheme interpreter, libguile also offers dynamic types,
21garbage collection, continuations, arithmetic on arbitrary sized
22numbers, and other things.
23
24The two fundamental concepts are dynamic types and garbage collection.
25You need to understand how libguile offers them to C programs in order
26to use the rest of libguile. Also, the more general control flow of
27Scheme caused by continuations needs to be dealt with.
28
b4fddbbe
MV
29Running asynchronous signal handlers and multi-threading is known to C
30code already, but there are of course a few additional rules when using
31them together with libguile.
32
3229f68b
MV
33@menu
34* Dynamic Types:: Dynamic Types.
35* Garbage Collection:: Garbage Collection.
36* Control Flow:: Control Flow.
b4fddbbe
MV
37* Asynchronous Signals:: Asynchronous Signals
38* Multi-Threading:: Multi-Threading
3229f68b
MV
39@end menu
40
41@node Dynamic Types
42@subsection Dynamic Types
43
44Scheme is a dynamically-typed language; this means that the system
45cannot, in general, determine the type of a given expression at compile
46time. Types only become apparent at run time. Variables do not have
47fixed types; a variable may hold a pair at one point, an integer at the
48next, and a thousand-element vector later. Instead, values, not
49variables, have fixed types.
50
51In order to implement standard Scheme functions like @code{pair?} and
52@code{string?} and provide garbage collection, the representation of
53every value must contain enough information to accurately determine its
54type at run time. Often, Scheme systems also use this information to
55determine whether a program has attempted to apply an operation to an
56inappropriately typed value (such as taking the @code{car} of a string).
57
58Because variables, pairs, and vectors may hold values of any type,
59Scheme implementations use a uniform representation for values --- a
60single type large enough to hold either a complete value or a pointer
61to a complete value, along with the necessary typing information.
62
63In Guile, this uniform representation of all Scheme values is the C type
64@code{SCM}. This is an opaque type and its size is typically equivalent
65to that of a pointer to @code{void}. Thus, @code{SCM} values can be
66passed around efficiently and they take up reasonably little storage on
67their own.
68
69The most important rule is: You never access a @code{SCM} value
70directly; you only pass it to functions or macros defined in libguile.
71
72As an obvious example, although a @code{SCM} variable can contain
73integers, you can of course not compute the sum of two @code{SCM} values
74by adding them with the C @code{+} operator. You must use the libguile
75function @code{scm_sum}.
76
77Less obvious and therefore more important to keep in mind is that you
78also cannot directly test @code{SCM} values for trueness. In Scheme,
79the value @code{#f} is considered false and of course a @code{SCM}
80variable can represent that value. But there is no guarantee that the
81@code{SCM} representation of @code{#f} looks false to C code as well.
82You need to use @code{scm_is_true} or @code{scm_is_false} to test a
83@code{SCM} value for trueness or falseness, respectively.
84
85You also can not directly compare two @code{SCM} values to find out
86whether they are identical (that is, whether they are @code{eq?} in
87Scheme terms). You need to use @code{scm_is_eq} for this.
88
89The one exception is that you can directly assign a @code{SCM} value to
90a @code{SCM} variable by using the C @code{=} operator.
91
92The following (contrieved) example shows how to do it right. It
93implements a function of two arguments (@var{a} and @var{flag}) that
94returns @var{a}+1 if @var{flag} is true, else it returns @var{a}
95unchanged.
96
97@example
98SCM
99my_incrementing_function (SCM a, SCM flag)
100@{
101 SCM result;
102
103 if (scm_is_true (flag))
104 result = scm_sum (a, scm_from_int (1));
105 else
106 result = a;
107
108 return result;
109@}
110@end example
111
112Often, you need to convert between @code{SCM} values and approriate C
113values. For example, we needed to convert the integer @code{1} to its
114@code{SCM} representation in order to add it to @var{a}. Libguile
115provides many function to do these conversions, both from C to
116@code{SCM} and from @code{SCM} to C.
117
118The conversion functions follow a common naming pattern: those that make
119a @code{SCM} value from a C value have names of the form
120@code{scm_from_@var{type} (@dots{})} and those that convert a @code{SCM}
121value to a C value use the form @code{scm_to_@var{type} (@dots{})}.
122
123However, it is best to avoid converting values when you can. When you
124must combine C values and @code{SCM} values in a computation, it is
125often better to convert the C values to @code{SCM} values and do the
126computation by using libguile functions than to the other way around
127(converting @code{SCM} to C and doing the computation some other way).
128
129As a simple example, consider this version of
130@code{my_incrementing_function} from above:
131
132@example
133SCM
134my_other_incrementing_function (SCM a, SCM flag)
135@{
136 int result;
137
138 if (scm_is_true (flag))
139 result = scm_to_int (a) + 1;
140 else
141 result = scm_to_int (a);
142
143 return scm_from_int (result);
144@}
145@end example
146
147This version is much less general than the original one: it will only
148work for values @var{A} that can fit into a @code{int}. The original
149function will work for all values that Guile can represent and that
150@code{scm_sum} can understand, including integers bigger than @code{long
151long}, floating point numbers, complex numbers, and new numerical types
152that have been added to Guile by third-party libraries.
153
154Also, computing with @code{SCM} is not necessarily inefficient. Small
155integers will be encoded directly in the @code{SCM} value, for example,
156and do not need any additional memory on the heap. See @ref{Data
157Representation} to find out the details.
158
159Some special @code{SCM} values are available to C code without needing
160to convert them from C values:
161
162@multitable {Scheme value} {C representation}
163@item Scheme value @tab C representation
164@item @nicode{#f} @tab @nicode{SCM_BOOL_F}
165@item @nicode{#t} @tab @nicode{SCM_BOOL_T}
166@item @nicode{()} @tab @nicode{SCM_EOL}
167@end multitable
168
169In addition to @code{SCM}, Guile also defines the related type
170@code{scm_t_bits}. This is an unsigned integral type of sufficient
171size to hold all information that is directly contained in a
172@code{SCM} value. The @code{scm_t_bits} type is used internally by
173Guile to do all the bit twiddling explained in @ref{Data
174Representation}, but you will encounter it occasionally in low-level
175user code as well.
176
177
178@node Garbage Collection
179@subsection Garbage Collection
180
181As explained above, the @code{SCM} type can represent all Scheme values.
182Some values fit entirely into a @code{SCM} value (such as small
183integers), but other values require additional storage in the heap (such
184as strings and vectors). This additional storage is managed
185automatically by Guile. You don't need to explicitely deallocate it
186when a @code{SCM} value is no longer used.
187
188Two things must be guaranteed so that Guile is able to manage the
189storage automatically: it must know about all blocks of memory that have
190ever been allocated for Scheme values, and it must know about all Scheme
191values that are still being used. Given this knowledge, Guile can
192periodically free all blocks that have been allocated but are not used
193by any active Scheme values. This activity is called @dfn{garbage
194collection}.
195
196It is easy for Guile to remember all blocks of memory that is has
197allocated for use by Scheme values, but you need to help it with finding
198all Scheme values that are in use by C code.
199
200You do this when writing a SMOB mark function, for example
201(@pxref{Garbage Collecting Smobs}). By calling this function, the
202garbage collector learns about all references that your SMOB has to
203other @code{SCM} values.
204
205Other references to @code{SCM} objects, such as global variables of type
206@code{SCM} or other random data structures in the heap that contain
207fields of type @code{SCM}, can be made visible to the garbage collector
208by calling the functions @code{scm_gc_protect} or
209@code{scm_permanent_object}. You normally use these funtions for long
210lived objects such as a hash table that is stored in a global variable.
211For temporary references in local variables or function arguments, using
212these functions would be too expensive.
213
214These references are handled differently: Local variables (and function
215arguments) of type @code{SCM} are automatically visible to the garbage
216collector. This works because the collector scans the stack for
217potential references to @code{SCM} objects and considers all referenced
218objects to be alive. The scanning considers each and every word of the
219stack, regardless of what it is actually used for, and then decides
220whether it could possible be a reference to a @code{SCM} object. Thus,
221the scanning is guaranteed to find all actual references, but it might
222also find words that only accidentally look like references. These
223`false positives' might keep @code{SCM} objects alive that would
224otherwise be considered dead. While this might waste memory, keeping an
225object around longer than it strictly needs to is harmless. This is why
226this technique is called ``conservative garbage collection''. In
227practice, the wasted memory seems to be no problem.
228
229The stack of every thread is scanned in this way and the registers of
230the CPU and all other memory locations where local variables or function
231parameters might show up are included in this scan as well.
232
233The consequence of the conservative scanning is that you can just
234declare local variables and function parameters of type @code{SCM} and
235be sure that the garbage collector will not free the corresponding
236objects.
237
238However, a local variable or function parameter is only protected as
239long as it is really on the stack (or in some register). As an
240optimization, the C compiler might reuse its location for some other
241value and the @code{SCM} object would no longer be protected. Normally,
242this leads to exactly the right behabvior: the compiler will only
243overwrite a reference when it is no longer needed and thus the object
244becomes unprotected precisely when the reference disappears, just as
245wanted.
246
247There are situations, however, where a @code{SCM} object needs to be
248around longer than its reference from a local variable or function
249parameter. This happens, for example, when you retrieve the array of
250characters from a Scheme string and work on that array directly. The
251reference to the @code{SCM} string object might be dead after the
252character array has been retrieved, but the array itself is still in use
253and thus the string object must be protected. The compiler does not
254know about this connection and might overwrite the @code{SCM} reference
255too early.
256
257To get around this problem, you can use @code{scm_remember_upto_here_1}
258and its cousins. It will keep the compiler from overwriting the
259reference. For a typical example of its use, see @ref{Remembering
260During Operations}.
261
262@node Control Flow
263@subsection Control Flow
264
265Scheme has a more general view of program flow than C, both locally and
266non-locally.
267
268Controlling the local flow of control involves things like gotos, loops,
269calling functions and returning from them. Non-local control flow
270refers to situations where the program jumps across one or more levels
271of function activations without using the normal call or return
272operations.
273
274The primitive means of C for local control flow is the @code{goto}
275statement, together with @code{if}. Loops done with @code{for},
276@code{while} or @code{do} could in principle be rewritten with just
277@code{goto} and @code{if}. In Scheme, the primitive means for local
278control flow is the @emph{function call} (together with @code{if}).
279Thus, the repetition of some computation in a loop is ultimately
280implemented by a function that calls itself, that is, by recursion.
281
282This approach is theoretically very powerful since it is easier to
283reason formally about recursion than about gotos. In C, using
284recursion exclusively would not be practical, tho, since it would eat
285up the stack very quickly. In Scheme, however, it is practical:
286function calls that appear in a @dfn{tail position} do not use any
51545a90 287additional stack space (@pxref{Tail Calls}).
3229f68b
MV
288
289A function call is in a tail position when it is the last thing the
290calling function does. The value returned by the called function is
291immediately returned from the calling function. In the following
292example, the call to @code{bar-1} is in a tail position, while the
293call to @code{bar-2} is not. (The call to @code{1-} in @code{foo-2}
294is in a tail position, tho.)
295
296@lisp
297(define (foo-1 x)
298 (bar-1 (1- x)))
299
300(define (foo-2 x)
301 (1- (bar-2 x)))
302@end lisp
303
304Thus, when you take care to recurse only in tail positions, the
305recursion will only use constant stack space and will be as good as a
306loop constructed from gotos.
307
308Scheme offers a few syntactic abstractions (@code{do} and @dfn{named}
309@code{let}) that make writing loops slightly easier.
310
311But only Scheme functions can call other functions in a tail position:
312C functions can not. This matters when you have, say, two functions
313that call each other recursively to form a common loop. The following
314(unrealistic) example shows how one might go about determing whether a
315non-negative integer @var{n} is even or odd.
316
317@lisp
318(define (my-even? n)
319 (cond ((zero? n) #t)
320 (else (my-odd? (1- n)))))
321
322(define (my-odd? n)
323 (cond ((zero? n) #f)
324 (else (my-even? (1- n)))))
325@end lisp
326
327Because the calls to @code{my-even?} and @code{my-odd?} are in tail
328positions, these two procedures can be applied to arbitrary large
329integers without overflowing the stack. (They will still take a lot
330of time, of course.)
331
332However, when one or both of the two procedures would be rewritten in
333C, it could no longer call its companion in a tail position (since C
334does not have this concept). You might need to take this
335consideration into account when deciding which parts of your program
336to write in Scheme and which in C.
337
338In addition to calling functions and returning from them, a Scheme
339program can also exit non-locally from a function so that the control
340flow returns directly to an outer level. This means that some functions
341might not return at all.
342
343Even more, it is not only possible to jump to some outer level of
344control, a Scheme program can also jump back into the middle of a
345function that has already exited. This might cause some functions to
346return more than once.
347
348In general, these non-local jumps are done by invoking
349@dfn{continuations} that have previously been captured using
350@code{call-with-current-continuation}. Guile also offers a slightly
351restricted set of functions, @code{catch} and @code{throw}, that can
352only be used for non-local exits. This restriction makes them more
353efficient. Error reporting (with the function @code{error}) is
354implemented by invoking @code{throw}, for example. The functions
355@code{catch} and @code{throw} belong to the topic of @dfn{exceptions}.
356
357Since Scheme functions can call C functions and vice versa, C code can
358experience the more general control flow of Scheme as well. It is
359possible that a C function will not return at all, or will return more
360than once. While C does offer @code{setjmp} and @code{longjmp} for
361non-local exits, it is still an unusual thing for C code. In
362contrast, non-local exits are very common in Scheme, mostly to report
363errors.
364
365You need to be prepared for the non-local jumps in the control flow
366whenever you use a function from @code{libguile}: it is best to assume
367that any @code{libguile} function might signal an error or run a pending
368signal handler (which in turn can do arbitrary things).
369
370It is often necessary to take cleanup actions when the control leaves a
371function non-locally. Also, when the control returns non-locally, some
372setup actions might be called for. For example, the Scheme function
373@code{with-output-to-port} needs to modify the global state so that
374@code{current-output-port} returns the port passed to
375@code{with-output-to-port}. The global output port needs to be reset to
376its previous value when @code{with-output-to-port} returns normally or
377when it is exited non-locally. Likewise, the port needs to be set again
378when control enters non-locally.
379
380Scheme code can use the @code{dynamic-wind} function to arrange for the
381setting and resetting of the global state. C code could use the
382corresponding @code{scm_internal_dynamic_wind} function, but it might
383prefer to use the @dfn{frames} concept that is more natural for C code,
384(@pxref{Frames}).
385
b4fddbbe
MV
386@node Asynchronous Signals
387@subsection Asynchronous Signals
388
389You can not call libguile functions from handlers for POSIX signals, but
390you can register Scheme handlers for POSIX signals such as
391@code{SIGINT}. These handlers do not run during the actual signal
392delivery. Instead, they are run when the program (more precisely, the
393thread that the handler has been registered for) reaches the next
394@emph{safe point}.
395
396The libguile functions themselves have many such safe points.
397Consequently, you must be prepared for arbitrary actions anytime you
398call a libguile function. For example, even @code{scm_cons} can contain
399a safe point and when a signal handler is pending for your thread,
400calling @code{scm_cons} will run this handler and anything might happen,
401including a non-local exit although @code{scm_cons} would not ordinarily
402do such a thing on its own.
403
404If you do not want to allow the running of asynchronous signal handlers,
405you can block them temporarily with @code{scm_frame_block_asyncs}, for
406example. See @xref{System asyncs}.
407
408Since signal handling in Guile relies on safe points, you need to make
409sure that your functions do offer enough of them. Normally, calling
410libguile functions in the normal course of action is all that is needed.
411But when a thread might spent a long time in a code section that calls
412no libguile function, it is good to include explicit safe points. This
413can allow the user to interrupt your code with @key{C-c}, for example.
414
415You can do this with the macro @code{SCM_TICK}. This macro is
416syntactically a statement. That is, you could use it like this:
417
418@example
419while (1)
420 @{
421 SCM_TICK;
422 do_some_work ();
423 @}
424@end example
425
426Frequent execution of a safe point is even more important in multi
427threaded programs, @xref{Multi-Threading}.
428
429@node Multi-Threading
430@subsection Multi-Threading
431
432Guile can be used in multi-threaded programs just as well as in
433single-threaded ones.
434
435Each thread that wants to use functions from libguile must put itself
436into @emph{guile mode} and must then follow a few rules. If it doesn't
437want to honor these rules in certain situations, a thread can
438temporarily leave guile mode (but can no longer use libguile functions
439during that time, of course).
440
441Threads enter guile mode by calling @code{scm_with_guile},
442@code{scm_boot_guile}, or @code{scm_init_guile}. As explained in the
443reference documentation for these functions, Guile will then learn about
444the stack bounds of the thread and can protect the @code{SCM} values
445that are stored in local variables. When a thread puts itself into
446guile mode for the first time, it gets a Scheme representation and is
447listed by @code{all-threads}, for example.
448
449While in guile mode, a thread promises to reach a safe point reasonably
450frequently (@pxref{Asynchronous Signals}). In addition to running
451signal handlers, these points are also potential rendezvous points of
452all guile mode threads where Guile can orchestrate global things like
453garbage collection. Consequently, when a thread in guile mode blocks
454and does no longer frequent safe points, it might cause all other guile
455mode threads to block as well. To prevent this from happening, a guile
456mode thread should either only block in libguile functions (who know how
457to do it right), or should temporarily leave guile mode with
458@code{scm_without_guile} or
459@code{scm_leave_guile}/@code{scm_enter_guile}.
460
461For some common blocking operations, Guile provides convenience
462functions. For example, if you want to lock a pthread mutex while in
463guile mode, you might want to use @code{scm_pthread_mutex_lock} which is
464just like @code{pthread_mutex_lock} except that it leaves guile mode
465while blocking.
466
467
468All libguile functions are (intended to be) robust in the face of
469multiple threads using them concurrently. This means that there is no
470risk of the internal data structures of libguile becoming corrupted in
471such a way that the process crashes.
472
473A program might still produce non-sensical results, though. Taking
474hashtables as an example, Guile guarantees that you can use them from
475multiple threads concurrently and a hashtable will always remain a valid
476hashtable and Guile will not crash when you access it. It does not
477guarantee, however, that inserting into it concurrently from two threads
478will give useful results: only one insertion might actually happen, none
479might happen, or the table might in general be modified in a totally
480arbitrary manner. (It will still be a valid hashtable, but not the one
481that you might have expected.) Guile might also signal an error when it
482detects a harmful race condition.
483
484Thus, you need to put in additional synchronizations when multiple
485threads want to use a single hashtable, or any other mutable Scheme
486object.
487
488When writing C code for use with libguile, you should try to make it
489robust as well. An example that converts a list into a vector will help
490to illustrate. Here is a correct version:
491
492@example
493SCM
494my_list_to_vector (SCM list)
495@{
496 SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED);
497 size_t len, i;
498
499 len = SCM_SIMPLE_VECTOR_LENGTH (vector);
500 i = 0;
501 while (i < len && scm_is_pair (list))
502 @{
503 SCM_SIMPLE_VECTOR_SET (vector, i, SCM_CAR (list));
504 list = SCM_CDR (list);
505 i++;
506 @}
507
508 return vector;
509@}
510@end example
511
512The first thing to note is that storing into a @code{SCM} location
513concurrently from multiple threads is guaranteed to be robust: you don't
514know which value wins but it will in any case be a valid @code{SCM}
515value.
516
517But there is no guarantee that the list referenced by @var{list} is not
518modified in another thread while the loop iterates over it. Thus, while
519copying its elements into the vector, the list might get longer or
520shorter. For this reason, the loop must check both that it doesn't
521overrun the vector (@code{SCM_SIMPLE_VECTOR_SET} does no range-checking)
522and that it doesn't overrung the list (@code{SCM_CAR} and @code{SCM_CDR}
523likewise do no type checking).
524
525It is safe to use @code{SCM_CAR} and @code{SCM_CDR} on the local
526variable @var{list} once it is known that the variable contains a pair.
527The contents of the pair might change spontaneously, but it will always
528stay a valid pair (and a local variable will of course not spontaneously
529point to a different Scheme object).
530
531Likewise, a simple vector such as the one returned by
532@code{scm_make_vector} is guaranteed to always stay the same length so
533that it is safe to only use SCM_SIMPLE_VECTOR_LENGTH once and store the
534result. (In the example, @var{vector} is safe anyway since it is a
535fresh object that no other thread can possibly know about until it is
536returned from @code{my_list_to_vector}.)
537
538Of course the behavior of @code{my_list_to_vector} is suboptimal when
539@var{list} does indeed gets asynchronously lengthened or shortened in
540another thread. But it is robust: it will always return a valid vector.
541That vector might be shorter than expected, or its last elements might
542be unspecified, but it is a valid vector and if a program wants to rule
543out these cases, it must avoid modifying the list asynchronously.
544
545Here is another version that is also correct:
546
547@example
548SCM
549my_pedantic_list_to_vector (SCM list)
550@{
551 SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED);
552 size_t len, i;
553
554 len = SCM_SIMPLE_VECTOR_LENGTH (vector);
555 i = 0;
556 while (i < len)
557 @{
558 SCM_SIMPLE_VECTOR_SET (vector, i, scm_car (list));
559 list = scm_cdr (list);
560 i++;
561 @}
562
563 return vector;
564@}
565@end example
566
567This version uses the type-checking and thread-robust functions
568@code{scm_car} and @code{scm_cdr} instead of the faster, but less robust
569macros @code{SCM_CAR} and @code{SCM_CDR}. When the list is shortened
570(that is, when @var{list} holds a non-pair), @code{scm_car} will throw
571an error. This might be preferable to just returning a half-initialized
572vector.
573
574The API for accessing vectors and arrays of various kinds from C takes a
575slightly different approach to thread-robustness. In order to get at
576the raw memory that stores the elements of an array, you need to
577@emph{reserve} that array as long as you need the raw memory. During
578the time an array is reserved, its elements can still spontaneously
579change their values, but the memory itself and other things like the
580size of the array are guaranteed to stay fixed. Any operation that
581would change these parameters of an array that is currently reserved
582will signal an error. In order to avoid these errors, a program should
583of course put suitable synchronization mechanisms in place. As you can
584see, Guile itself is again only concerned about robustness, not about
585correctness: without proper synchronization, your program will likely
586not be correct, but the worst consequence is an error message.