| 1 | @c -*-texinfo-*- |
| 2 | @c This is part of the GNU Guile Reference Manual. |
| 3 | @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2005 |
| 4 | @c Free Software Foundation, Inc. |
| 5 | @c See the file guile.texi for copying conditions. |
| 6 | |
| 7 | @page |
| 8 | @node General Libguile Concepts |
| 9 | @section General concepts for using libguile |
| 10 | |
| 11 | When you want to embed the Guile Scheme interpreter into your program or |
| 12 | library, you need to link it against the @file{libguile} library |
| 13 | (@pxref{Linking Programs With Guile}). Once you have done this, your C |
| 14 | code has access to a number of data types and functions that can be used |
| 15 | to invoke the interpreter, or make new functions that you have written |
| 16 | in C available to be called from Scheme code, among other things. |
| 17 | |
| 18 | Scheme is different from C in a number of significant ways, and Guile |
| 19 | tries to make the advantages of Scheme available to C as well. Thus, in |
| 20 | addition to a Scheme interpreter, libguile also offers dynamic types, |
| 21 | garbage collection, continuations, arithmetic on arbitrarily sized |
| 22 | numbers, and other things. |
| 23 | |
| 24 | The two fundamental concepts are dynamic types and garbage collection. |
| 25 | You need to understand how libguile offers them to C programs in order |
| 26 | to use the rest of libguile. Also, the more general control flow of |
| 27 | Scheme caused by continuations needs to be dealt with. |
| 28 | |
| 29 | Asynchronous signal handlers and multi-threading are already familiar |
| 30 | to C programmers, but there are a few additional rules when using them |
| 31 | together with libguile. |
| 32 | |
| 33 | @menu |
| 34 | * Dynamic Types:: Dynamic Types. |
| 35 | * Garbage Collection:: Garbage Collection. |
| 36 | * Control Flow:: Control Flow. |
| 37 | * Asynchronous Signals:: Asynchronous Signals |
| 38 | * Multi-Threading:: Multi-Threading |
| 39 | @end menu |
| 40 | |
| 41 | @node Dynamic Types |
| 42 | @subsection Dynamic Types |
| 43 | |
| 44 | Scheme is a dynamically-typed language; this means that the system |
| 45 | cannot, in general, determine the type of a given expression at compile |
| 46 | time. Types only become apparent at run time. Variables do not have |
| 47 | fixed types; a variable may hold a pair at one point, an integer at the |
| 48 | next, and a thousand-element vector later. Instead, values, not |
| 49 | variables, have fixed types. |
| 50 | |
| 51 | In order to implement standard Scheme functions like @code{pair?} and |
| 52 | @code{string?} and provide garbage collection, the representation of |
| 53 | every value must contain enough information to accurately determine its |
| 54 | type at run time. Often, Scheme systems also use this information to |
| 55 | determine whether a program has attempted to apply an operation to an |
| 56 | inappropriately typed value (such as taking the @code{car} of a string). |
| 57 | |
| 58 | Because variables, pairs, and vectors may hold values of any type, |
| 59 | Scheme implementations use a uniform representation for values --- a |
| 60 | single type large enough to hold either a complete value or a pointer |
| 61 | to a complete value, along with the necessary typing information. |
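
As a rough illustration of such a uniform representation, the following C sketch packs either a small integer or a pointer into one machine word. The type @code{value_t}, the tag layout, and all function names are hypothetical and chosen only for this example; they are not Guile's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uniform value type: one machine word.
   This is NOT Guile's real SCM layout. */
typedef uintptr_t value_t;

/* Assumed tagging scheme: low bit 1 marks an immediate small integer,
   low bit 0 marks a pointer to an (at least 2-byte aligned) heap object. */
#define TAG_FIXNUM ((uintptr_t) 1)

static int
value_is_fixnum (value_t v)
{
  return (v & TAG_FIXNUM) != 0;
}

/* Encode a small integer.  N must fit in one bit less than a word. */
static value_t
value_from_fixnum (intptr_t n)
{
  return ((uintptr_t) n << 1) | TAG_FIXNUM;
}

/* Decode a small integer: drop the tag bit, then halve. */
static intptr_t
value_to_fixnum (value_t v)
{
  assert (value_is_fixnum (v));
  return ((intptr_t) v - 1) / 2;
}

/* Encode a pointer to a complete value stored elsewhere. */
static value_t
value_from_pointer (void *p)
{
  assert (((uintptr_t) p & TAG_FIXNUM) == 0);
  return (uintptr_t) p;
}

static void *
value_to_pointer (value_t v)
{
  assert (!value_is_fixnum (v));
  return (void *) v;
}
```

The run-time type test (@code{value_is_fixnum}) needs nothing but the word itself, which is the property the paragraph above describes.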
| 62 | |
| 63 | In Guile, this uniform representation of all Scheme values is the C type |
| 64 | @code{SCM}. This is an opaque type and its size is typically equivalent |
| 65 | to that of a pointer to @code{void}. Thus, @code{SCM} values can be |
| 66 | passed around efficiently and they take up reasonably little storage on |
| 67 | their own. |
| 68 | |
| 69 | The most important rule is: You never access a @code{SCM} value |
| 70 | directly; you only pass it to functions or macros defined in libguile. |
| 71 | |
| 72 | As an obvious example, although a @code{SCM} variable can contain |
| 73 | integers, you can of course not compute the sum of two @code{SCM} values |
| 74 | by adding them with the C @code{+} operator. You must use the libguile |
| 75 | function @code{scm_sum}. |
| 76 | |
| 77 | Less obvious and therefore more important to keep in mind is that you |
| 78 | also cannot directly test @code{SCM} values for trueness. In Scheme, |
| 79 | the value @code{#f} is considered false and of course a @code{SCM} |
| 80 | variable can represent that value. But there is no guarantee that the |
| 81 | @code{SCM} representation of @code{#f} looks false to C code as well. |
| 82 | You need to use @code{scm_is_true} or @code{scm_is_false} to test a |
| 83 | @code{SCM} value for trueness or falseness, respectively. |
| 84 | |
| 85 | You also can not directly compare two @code{SCM} values to find out |
| 86 | whether they are identical (that is, whether they are @code{eq?} in |
| 87 | Scheme terms). You need to use @code{scm_is_eq} for this. |
| 88 | |
| 89 | The one exception is that you can directly assign a @code{SCM} value to |
| 90 | a @code{SCM} variable by using the C @code{=} operator. |
| 91 | |
| 92 | The following (contrived) example shows how to do it right. It |
| 93 | implements a function of two arguments (@var{a} and @var{flag}) that |
| 94 | returns @var{a}+1 if @var{flag} is true, else it returns @var{a} |
| 95 | unchanged. |
| 96 | |
| 97 | @example |
| 98 | SCM |
| 99 | my_incrementing_function (SCM a, SCM flag) |
| 100 | @{ |
| 101 | SCM result; |
| 102 | |
| 103 | if (scm_is_true (flag)) |
| 104 | result = scm_sum (a, scm_from_int (1)); |
| 105 | else |
| 106 | result = a; |
| 107 | |
| 108 | return result; |
| 109 | @} |
| 110 | @end example |
| 111 | |
| 112 | Often, you need to convert between @code{SCM} values and appropriate C |
| 113 | values. For example, we needed to convert the integer @code{1} to its |
| 114 | @code{SCM} representation in order to add it to @var{a}. Libguile |
| 115 | provides many functions to do these conversions, both from C to |
| 116 | @code{SCM} and from @code{SCM} to C. |
| 117 | |
| 118 | The conversion functions follow a common naming pattern: those that make |
| 119 | a @code{SCM} value from a C value have names of the form |
| 120 | @code{scm_from_@var{type} (@dots{})} and those that convert a @code{SCM} |
| 121 | value to a C value use the form @code{scm_to_@var{type} (@dots{})}. |
| 122 | |
| 123 | However, it is best to avoid converting values when you can. When you |
| 124 | must combine C values and @code{SCM} values in a computation, it is |
| 125 | often better to convert the C values to @code{SCM} values and do the |
| 126 | computation by using libguile functions than to do it the other way around |
| 127 | (converting @code{SCM} to C and doing the computation some other way). |
| 128 | |
| 129 | As a simple example, consider this version of |
| 130 | @code{my_incrementing_function} from above: |
| 131 | |
| 132 | @example |
| 133 | SCM |
| 134 | my_other_incrementing_function (SCM a, SCM flag) |
| 135 | @{ |
| 136 | int result; |
| 137 | |
| 138 | if (scm_is_true (flag)) |
| 139 | result = scm_to_int (a) + 1; |
| 140 | else |
| 141 | result = scm_to_int (a); |
| 142 | |
| 143 | return scm_from_int (result); |
| 144 | @} |
| 145 | @end example |
| 146 | |
| 147 | This version is much less general than the original one: it will only |
| 148 | work for values @var{a} that can fit into an @code{int}. The original |
| 149 | function will work for all values that Guile can represent and that |
| 150 | @code{scm_sum} can understand, including integers bigger than @code{long |
| 151 | long}, floating point numbers, complex numbers, and new numerical types |
| 152 | that have been added to Guile by third-party libraries. |
| 153 | |
| 154 | Also, computing with @code{SCM} is not necessarily inefficient. Small |
| 155 | integers will be encoded directly in the @code{SCM} value, for example, |
| 156 | and do not need any additional memory on the heap. See @ref{Data |
| 157 | Representation} to find out the details. |
| 158 | |
| 159 | Some special @code{SCM} values are available to C code without needing |
| 160 | to convert them from C values: |
| 161 | |
| 162 | @multitable {Scheme value} {C representation} |
| 163 | @item Scheme value @tab C representation |
| 164 | @item @nicode{#f} @tab @nicode{SCM_BOOL_F} |
| 165 | @item @nicode{#t} @tab @nicode{SCM_BOOL_T} |
| 166 | @item @nicode{()} @tab @nicode{SCM_EOL} |
| 167 | @end multitable |
| 168 | |
| 169 | In addition to @code{SCM}, Guile also defines the related type |
| 170 | @code{scm_t_bits}. This is an unsigned integral type of sufficient |
| 171 | size to hold all information that is directly contained in a |
| 172 | @code{SCM} value. The @code{scm_t_bits} type is used internally by |
| 173 | Guile to do all the bit twiddling explained in @ref{Data |
| 174 | Representation}, but you will encounter it occasionally in low-level |
| 175 | user code as well. |
| 176 | |
| 177 | |
| 178 | @node Garbage Collection |
| 179 | @subsection Garbage Collection |
| 180 | |
| 181 | As explained above, the @code{SCM} type can represent all Scheme values. |
| 182 | Some values fit entirely into a @code{SCM} value (such as small |
| 183 | integers), but other values require additional storage in the heap (such |
| 184 | as strings and vectors). This additional storage is managed |
| 185 | automatically by Guile. You don't need to explicitly deallocate it |
| 186 | when a @code{SCM} value is no longer used. |
| 187 | |
| 188 | Two things must be guaranteed so that Guile is able to manage the |
| 189 | storage automatically: it must know about all blocks of memory that have |
| 190 | ever been allocated for Scheme values, and it must know about all Scheme |
| 191 | values that are still being used. Given this knowledge, Guile can |
| 192 | periodically free all blocks that have been allocated but are not used |
| 193 | by any active Scheme values. This activity is called @dfn{garbage |
| 194 | collection}. |
| 195 | |
| 196 | It is easy for Guile to remember all blocks of memory that it has |
| 197 | allocated for use by Scheme values, but you need to help it with finding |
| 198 | all Scheme values that are in use by C code. |
| 199 | |
| 200 | You do this when writing a SMOB mark function, for example |
| 201 | (@pxref{Garbage Collecting Smobs}). When the garbage collector calls |
| 202 | this function, it learns about all references that your SMOB has to |
| 203 | other @code{SCM} values. |
| 204 | |
| 205 | Other references to @code{SCM} objects, such as global variables of type |
| 206 | @code{SCM} or other random data structures in the heap that contain |
| 207 | fields of type @code{SCM}, can be made visible to the garbage collector |
| 208 | by calling the functions @code{scm_gc_protect_object} or |
| 209 | @code{scm_permanent_object}. You normally use these functions for long |
| 210 | lived objects such as a hash table that is stored in a global variable. |
| 211 | For temporary references in local variables or function arguments, using |
| 212 | these functions would be too expensive. |
| 213 | |
| 214 | These references are handled differently: Local variables (and function |
| 215 | arguments) of type @code{SCM} are automatically visible to the garbage |
| 216 | collector. This works because the collector scans the stack for |
| 217 | potential references to @code{SCM} objects and considers all referenced |
| 218 | objects to be alive. The scanning considers each and every word of the |
| 219 | stack, regardless of what it is actually used for, and then decides |
| 220 | whether it could possibly be a reference to a @code{SCM} object. Thus, |
| 221 | the scanning is guaranteed to find all actual references, but it might |
| 222 | also find words that only accidentally look like references. These |
| 223 | `false positives' might keep @code{SCM} objects alive that would |
| 224 | otherwise be considered dead. While this might waste memory, keeping an |
| 225 | object around longer than it strictly needs to is harmless. This is why |
| 226 | this technique is called ``conservative garbage collection''. In |
| 227 | practice, the wasted memory seems to be no problem. |
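
The core test of a conservative scan can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the heap is modelled as a single address range, the scanned region is an ordinary array of words, and the names @code{heap_range} and @code{count_possible_references} are hypothetical. A real collector would mark the referenced objects rather than count them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the managed heap as one address range. */
struct heap_range
{
  uintptr_t lo;   /* first address inside the heap */
  uintptr_t hi;   /* one past the last address inside the heap */
};

/* Look at every word in WORDS, regardless of what it is used for, and
   count those that could be a reference into HEAP.  Words that merely
   happen to hold an in-range number are counted too: these are the
   false positives of conservative scanning. */
static size_t
count_possible_references (const uintptr_t *words, size_t n,
                           const struct heap_range *heap)
{
  size_t i, hits = 0;

  for (i = 0; i < n; i++)
    if (words[i] >= heap->lo && words[i] < heap->hi)
      hits++;

  return hits;
}
```

Because the test never rejects a genuine in-range address, every real reference is found; the price is that an unrelated integer with a matching bit pattern also counts.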
| 228 | |
| 229 | The stack of every thread is scanned in this way and the registers of |
| 230 | the CPU and all other memory locations where local variables or function |
| 231 | parameters might show up are included in this scan as well. |
| 232 | |
| 233 | The consequence of the conservative scanning is that you can just |
| 234 | declare local variables and function parameters of type @code{SCM} and |
| 235 | be sure that the garbage collector will not free the corresponding |
| 236 | objects. |
| 237 | |
| 238 | However, a local variable or function parameter is only protected as |
| 239 | long as it is really on the stack (or in some register). As an |
| 240 | optimization, the C compiler might reuse its location for some other |
| 241 | value and the @code{SCM} object would no longer be protected. Normally, |
| 242 | this leads to exactly the right behavior: the compiler will only |
| 243 | overwrite a reference when it is no longer needed and thus the object |
| 244 | becomes unprotected precisely when the reference disappears, just as |
| 245 | wanted. |
| 246 | |
| 247 | There are situations, however, where a @code{SCM} object needs to be |
| 248 | around longer than its reference from a local variable or function |
| 249 | parameter. This happens, for example, when you retrieve the array of |
| 250 | characters from a Scheme string and work on that array directly. The |
| 251 | reference to the @code{SCM} string object might be dead after the |
| 252 | character array has been retrieved, but the array itself is still in use |
| 253 | and thus the string object must be protected. The compiler does not |
| 254 | know about this connection and might overwrite the @code{SCM} reference |
| 255 | too early. |
| 256 | |
| 257 | To get around this problem, you can use @code{scm_remember_upto_here_1} |
| 258 | and its cousins. It will keep the compiler from overwriting the |
| 259 | reference. For a typical example of its use, see @ref{Remembering |
| 260 | During Operations}. |
| 261 | |
| 262 | @node Control Flow |
| 263 | @subsection Control Flow |
| 264 | |
| 265 | Scheme has a more general view of program flow than C, both locally and |
| 266 | non-locally. |
| 267 | |
| 268 | Controlling the local flow of control involves things like gotos, loops, |
| 269 | calling functions and returning from them. Non-local control flow |
| 270 | refers to situations where the program jumps across one or more levels |
| 271 | of function activations without using the normal call or return |
| 272 | operations. |
| 273 | |
| 274 | The primitive means of C for local control flow is the @code{goto} |
| 275 | statement, together with @code{if}. Loops done with @code{for}, |
| 276 | @code{while} or @code{do} could in principle be rewritten with just |
| 277 | @code{goto} and @code{if}. In Scheme, the primitive means for local |
| 278 | control flow is the @emph{function call} (together with @code{if}). |
| 279 | Thus, the repetition of some computation in a loop is ultimately |
| 280 | implemented by a function that calls itself, that is, by recursion. |
| 281 | |
| 282 | This approach is theoretically very powerful since it is easier to |
| 283 | reason formally about recursion than about gotos. In C, using |
| 284 | recursion exclusively would not be practical, though, since it would eat |
| 285 | up the stack very quickly. In Scheme, however, it is practical: |
| 286 | function calls that appear in a @dfn{tail position} do not use any |
| 287 | additional stack space (@pxref{Tail Calls}). |
| 288 | |
| 289 | A function call is in a tail position when it is the last thing the |
| 290 | calling function does. The value returned by the called function is |
| 291 | immediately returned from the calling function. In the following |
| 292 | example, the call to @code{bar-1} is in a tail position, while the |
| 293 | call to @code{bar-2} is not. (The call to @code{1-} in @code{foo-2} |
| 294 | is in a tail position, though.) |
| 295 | |
| 296 | @lisp |
| 297 | (define (foo-1 x) |
| 298 | (bar-1 (1- x))) |
| 299 | |
| 300 | (define (foo-2 x) |
| 301 | (1- (bar-2 x))) |
| 302 | @end lisp |
| 303 | |
| 304 | Thus, when you take care to recurse only in tail positions, the |
| 305 | recursion will only use constant stack space and will be as good as a |
| 306 | loop constructed from gotos. |
| 307 | |
| 308 | Scheme offers a few syntactic abstractions (@code{do} and @dfn{named} |
| 309 | @code{let}) that make writing loops slightly easier. |
| 310 | |
| 311 | But only Scheme functions can call other functions in a tail position: |
| 312 | C functions can not. This matters when you have, say, two functions |
| 313 | that call each other recursively to form a common loop. The following |
| 314 | (unrealistic) example shows how one might go about determining whether a |
| 315 | non-negative integer @var{n} is even or odd. |
| 316 | |
| 317 | @lisp |
| 318 | (define (my-even? n) |
| 319 | (cond ((zero? n) #t) |
| 320 | (else (my-odd? (1- n))))) |
| 321 | |
| 322 | (define (my-odd? n) |
| 323 | (cond ((zero? n) #f) |
| 324 | (else (my-even? (1- n))))) |
| 325 | @end lisp |
| 326 | |
| 327 | Because the calls to @code{my-even?} and @code{my-odd?} are in tail |
| 328 | positions, these two procedures can be applied to arbitrarily large |
| 329 | integers without overflowing the stack. (They will still take a lot |
| 330 | of time, of course.) |
| 331 | |
| 332 | However, if one or both of the two procedures were rewritten in C, |
| 333 | they could no longer call their companion in a tail position (since C |
| 334 | does not have this concept). You might need to take this |
| 335 | consideration into account when deciding which parts of your program |
| 336 | to write in Scheme and which in C. |
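
If such a pair of procedures does end up in C, one possible workaround is to merge them into a single loop, so that each former tail call becomes a change of state instead of a new stack frame. The sketch below is plain C without libguile; the names @code{my_even_p} and @code{enum state} are made up for this illustration, and it is not the only way to structure such code.

```c
#include <stdbool.h>

/* Which of the two mutually recursive procedures is "running". */
enum state { IN_EVEN, IN_ODD };

/* Loop-based equivalent of my-even? for a non-negative N.
   Stack usage is constant no matter how large N is. */
static bool
my_even_p (unsigned long n)
{
  enum state s = IN_EVEN;

  for (;;)
    {
      /* (zero? n) ends the computation: my-even? answers #t,
         my-odd? answers #f. */
      if (n == 0)
        return s == IN_EVEN;

      /* The tail call to the companion with (1- n) becomes a
         state switch plus a decrement. */
      s = (s == IN_EVEN) ? IN_ODD : IN_EVEN;
      n--;
    }
}
```

The loop performs the same sequence of steps as the Scheme version, which is why it is just as slow for large arguments.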
| 337 | |
| 338 | In addition to calling functions and returning from them, a Scheme |
| 339 | program can also exit non-locally from a function so that the control |
| 340 | flow returns directly to an outer level. This means that some functions |
| 341 | might not return at all. |
| 342 | |
| 343 | Moreover, a Scheme program can not only jump to some outer level of |
| 344 | control; it can also jump back into the middle of a |
| 345 | function that has already exited. This might cause some functions to |
| 346 | return more than once. |
| 347 | |
| 348 | In general, these non-local jumps are done by invoking |
| 349 | @dfn{continuations} that have previously been captured using |
| 350 | @code{call-with-current-continuation}. Guile also offers a slightly |
| 351 | restricted set of functions, @code{catch} and @code{throw}, that can |
| 352 | only be used for non-local exits. This restriction makes them more |
| 353 | efficient. Error reporting (with the function @code{error}) is |
| 354 | implemented by invoking @code{throw}, for example. The functions |
| 355 | @code{catch} and @code{throw} belong to the topic of @dfn{exceptions}. |
| 356 | |
| 357 | Since Scheme functions can call C functions and vice versa, C code can |
| 358 | experience the more general control flow of Scheme as well. It is |
| 359 | possible that a C function will not return at all, or will return more |
| 360 | than once. While C does offer @code{setjmp} and @code{longjmp} for |
| 361 | non-local exits, it is still an unusual thing for C code. In |
| 362 | contrast, non-local exits are very common in Scheme, mostly to report |
| 363 | errors. |
| 364 | |
| 365 | You need to be prepared for the non-local jumps in the control flow |
| 366 | whenever you use a function from @code{libguile}: it is best to assume |
| 367 | that any @code{libguile} function might signal an error or run a pending |
| 368 | signal handler (which in turn can do arbitrary things). |
| 369 | |
| 370 | It is often necessary to take cleanup actions when the control leaves a |
| 371 | function non-locally. Also, when the control returns non-locally, some |
| 372 | setup actions might be called for. For example, the Scheme function |
| 373 | @code{with-output-to-port} needs to modify the global state so that |
| 374 | @code{current-output-port} returns the port passed to |
| 375 | @code{with-output-to-port}. The global output port needs to be reset to |
| 376 | its previous value when @code{with-output-to-port} returns normally or |
| 377 | when it is exited non-locally. Likewise, the port needs to be set again |
| 378 | when control enters non-locally. |
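
The setup and cleanup pattern can be sketched in plain C with @code{setjmp} and @code{longjmp}. This is a simplified analogy, not how libguile implements it: the global @code{current_output}, the function @code{with_output_to}, and the other names are hypothetical, and because @code{longjmp} can only exit, the sketch covers non-local exits but not non-local re-entry.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the global "current output port". */
static const char *current_output = "stdout";

/* Innermost active exit target, or NULL when there is none. */
static jmp_buf *handler = NULL;

/* What the body observed as current output (for demonstration). */
static const char *seen_inside = NULL;

/* Perform a non-local exit to the innermost target. */
static void
throw_exit (void)
{
  longjmp (*handler, 1);
}

/* Run BODY with current_output set to PORT.  The previous value is
   restored whether BODY returns normally or exits non-locally.
   Returns 1 if BODY exited non-locally, 0 otherwise. */
static int
with_output_to (const char *port, void (*body) (void))
{
  const char *saved_output = current_output;
  jmp_buf *saved_handler = handler;
  jmp_buf here;
  int exited;

  current_output = port;        /* setup action */
  handler = &here;

  if (setjmp (here) == 0)
    {
      body ();
      exited = 0;               /* normal return */
    }
  else
    exited = 1;                 /* arrived here via longjmp */

  handler = saved_handler;      /* cleanup action, on both paths */
  current_output = saved_output;
  return exited;
}

/* Two sample bodies: one returns normally, one exits non-locally. */
static void
normal_body (void)
{
  seen_inside = current_output;
}

static void
exiting_body (void)
{
  seen_inside = current_output;
  throw_exit ();
}
```

Calling @code{with_output_to} with either sample body leaves @code{current_output} at its previous value afterwards.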
| 379 | |
| 380 | Scheme code can use the @code{dynamic-wind} function to arrange for the |
| 381 | setting and resetting of the global state. C code could use the |
| 382 | corresponding @code{scm_internal_dynamic_wind} function, but it might |
| 383 | prefer to use the @dfn{frames} concept that is more natural for C code, |
| 384 | (@pxref{Frames}). |
| 385 | |
| 386 | @node Asynchronous Signals |
| 387 | @subsection Asynchronous Signals |
| 388 | |
| 389 | You can not call libguile functions from handlers for POSIX signals, but |
| 390 | you can register Scheme handlers for POSIX signals such as |
| 391 | @code{SIGINT}. These handlers do not run during the actual signal |
| 392 | delivery. Instead, they are run when the program (more precisely, the |
| 393 | thread that the handler has been registered for) reaches the next |
| 394 | @emph{safe point}. |
| 395 | |
| 396 | The libguile functions themselves have many such safe points. |
| 397 | Consequently, you must be prepared for arbitrary actions anytime you |
| 398 | call a libguile function. For example, even @code{scm_cons} can contain |
| 399 | a safe point and when a signal handler is pending for your thread, |
| 400 | calling @code{scm_cons} will run this handler and anything might happen, |
| 401 | including a non-local exit although @code{scm_cons} would not ordinarily |
| 402 | do such a thing on its own. |
| 403 | |
| 404 | If you do not want to allow the running of asynchronous signal handlers, |
| 405 | you can block them temporarily with @code{scm_frame_block_asyncs}, for |
| 406 | example. @xref{System asyncs}. |
| 407 | |
| 408 | Since signal handling in Guile relies on safe points, you need to make |
| 409 | sure that your functions do offer enough of them. Normally, calling |
| 410 | libguile functions in the normal course of action is all that is needed. |
| 411 | But when a thread might spend a long time in a code section that calls |
| 412 | no libguile function, it is good to include explicit safe points. This |
| 413 | can allow the user to interrupt your code with @key{C-c}, for example. |
| 414 | |
| 415 | You can do this with the macro @code{SCM_TICK}. This macro is |
| 416 | syntactically a statement. That is, you could use it like this: |
| 417 | |
| 418 | @example |
| 419 | while (1) |
| 420 | @{ |
| 421 | SCM_TICK; |
| 422 | do_some_work (); |
| 423 | @} |
| 424 | @end example |
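
The idea of deferring a handler until a safe point can be illustrated in plain C without libguile. In this sketch, the low-level signal handler only records that a signal arrived, and the real work happens later when the program reaches @code{safe_point}. All names here are hypothetical and the mechanism is a simplified analogy, not the implementation behind @code{SCM_TICK}.

```c
#include <signal.h>

/* Set by the low-level handler, cleared at the next safe point. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Counts how often the deferred handler work has run. */
static int interrupts_handled = 0;

/* Runs during actual signal delivery: only records the event. */
static void
on_sigint (int signo)
{
  (void) signo;
  interrupt_pending = 1;
}

static void
install_deferred_handler (void)
{
  signal (SIGINT, on_sigint);
}

/* Called from normal code at places where arbitrary actions are
   acceptable.  Runs the deferred work if a signal is pending. */
static void
safe_point (void)
{
  if (interrupt_pending)
    {
      interrupt_pending = 0;
      interrupts_handled++;     /* the real handler work goes here */
    }
}
```

A long-running loop would call @code{safe_point} once per iteration, in the same position where the example above uses @code{SCM_TICK}.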
| 425 | |
| 426 | Frequent execution of a safe point is even more important in |
| 427 | multi-threaded programs (@pxref{Multi-Threading}). |
| 428 | |
| 429 | @node Multi-Threading |
| 430 | @subsection Multi-Threading |
| 431 | |
| 432 | Guile can be used in multi-threaded programs just as well as in |
| 433 | single-threaded ones. |
| 434 | |
| 435 | Each thread that wants to use functions from libguile must put itself |
| 436 | into @emph{guile mode} and must then follow a few rules. If it doesn't |
| 437 | want to honor these rules in certain situations, a thread can |
| 438 | temporarily leave guile mode (but can no longer use libguile functions |
| 439 | during that time, of course). |
| 440 | |
| 441 | Threads enter guile mode by calling @code{scm_with_guile}, |
| 442 | @code{scm_boot_guile}, or @code{scm_init_guile}. As explained in the |
| 443 | reference documentation for these functions, Guile will then learn about |
| 444 | the stack bounds of the thread and can protect the @code{SCM} values |
| 445 | that are stored in local variables. When a thread puts itself into |
| 446 | guile mode for the first time, it gets a Scheme representation and is |
| 447 | listed by @code{all-threads}, for example. |
| 448 | |
| 449 | While in guile mode, a thread promises to reach a safe point reasonably |
| 450 | frequently (@pxref{Asynchronous Signals}). In addition to running |
| 451 | signal handlers, these points are also potential rendezvous points of |
| 452 | all guile mode threads where Guile can orchestrate global things like |
| 453 | garbage collection. Consequently, when a thread in guile mode blocks |
| 454 | and no longer reaches safe points, it might cause all other guile |
| 455 | mode threads to block as well. To prevent this from happening, a guile |
| 456 | mode thread should either only block in libguile functions (which know how |
| 457 | to do it right), or should temporarily leave guile mode with |
| 458 | @code{scm_without_guile} or |
| 459 | @code{scm_leave_guile}/@code{scm_enter_guile}. |
| 460 | |
| 461 | For some common blocking operations, Guile provides convenience |
| 462 | functions. For example, if you want to lock a pthread mutex while in |
| 463 | guile mode, you might want to use @code{scm_pthread_mutex_lock} which is |
| 464 | just like @code{pthread_mutex_lock} except that it leaves guile mode |
| 465 | while blocking. |
| 466 | |
| 467 | |
| 468 | All libguile functions are (intended to be) robust in the face of |
| 469 | multiple threads using them concurrently. This means that there is no |
| 470 | risk of the internal data structures of libguile becoming corrupted in |
| 471 | such a way that the process crashes. |
| 472 | |
| 473 | A program might still produce nonsensical results, though. Taking |
| 474 | hashtables as an example, Guile guarantees that you can use them from |
| 475 | multiple threads concurrently and a hashtable will always remain a valid |
| 476 | hashtable and Guile will not crash when you access it. It does not |
| 477 | guarantee, however, that inserting into it concurrently from two threads |
| 478 | will give useful results: only one insertion might actually happen, none |
| 479 | might happen, or the table might in general be modified in a totally |
| 480 | arbitrary manner. (It will still be a valid hashtable, but not the one |
| 481 | that you might have expected.) Guile might also signal an error when it |
| 482 | detects a harmful race condition. |
| 483 | |
| 484 | Thus, you need to put in additional synchronizations when multiple |
| 485 | threads want to use a single hashtable, or any other mutable Scheme |
| 486 | object. |
| 487 | |
| 488 | When writing C code for use with libguile, you should try to make it |
| 489 | robust as well. An example that converts a list into a vector will help |
| 490 | to illustrate. Here is a correct version: |
| 491 | |
| 492 | @example |
| 493 | SCM |
| 494 | my_list_to_vector (SCM list) |
| 495 | @{ |
| 496 | SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED); |
| 497 | size_t len, i; |
| 498 | |
| 499 | len = SCM_SIMPLE_VECTOR_LENGTH (vector); |
| 500 | i = 0; |
| 501 | while (i < len && scm_is_pair (list)) |
| 502 | @{ |
| 503 | SCM_SIMPLE_VECTOR_SET (vector, i, SCM_CAR (list)); |
| 504 | list = SCM_CDR (list); |
| 505 | i++; |
| 506 | @} |
| 507 | |
| 508 | return vector; |
| 509 | @} |
| 510 | @end example |
| 511 | |
| 512 | The first thing to note is that storing into a @code{SCM} location |
| 513 | concurrently from multiple threads is guaranteed to be robust: you don't |
| 514 | know which value wins but it will in any case be a valid @code{SCM} |
| 515 | value. |
| 516 | |
| 517 | But there is no guarantee that the list referenced by @var{list} is not |
| 518 | modified in another thread while the loop iterates over it. Thus, while |
| 519 | copying its elements into the vector, the list might get longer or |
| 520 | shorter. For this reason, the loop must check both that it doesn't |
| 521 | overrun the vector (@code{SCM_SIMPLE_VECTOR_SET} does no range-checking) |
| 522 | and that it doesn't overrun the list (@code{SCM_CAR} and @code{SCM_CDR} |
| 523 | likewise do no type checking). |
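
The same double check can be shown without libguile on an ordinary linked list. The sketch below is a single-threaded analogy with hypothetical names (@code{struct node}, @code{copy_list_to_array}); it does not model concurrent modification, only the loop condition that guards against both kinds of overrun.

```c
#include <stddef.h>

/* A minimal singly linked list of integers. */
struct node
{
  int value;
  struct node *next;
};

/* Copy at most LEN elements of LIST into OUT and return how many
   were copied.  The loop stops at whichever ends first: the array
   (i < len) or the list (list != NULL), so neither is overrun even
   when the list length differs from LEN. */
static size_t
copy_list_to_array (const struct node *list, int *out, size_t len)
{
  size_t i = 0;

  while (i < len && list != NULL)
    {
      out[i] = list->value;
      list = list->next;
      i++;
    }

  return i;
}
```

When the list is shorter than the array, the remaining array slots are simply left untouched, which corresponds to the unspecified trailing elements discussed below.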
| 524 | |
| 525 | It is safe to use @code{SCM_CAR} and @code{SCM_CDR} on the local |
| 526 | variable @var{list} once it is known that the variable contains a pair. |
| 527 | The contents of the pair might change spontaneously, but it will always |
| 528 | stay a valid pair (and a local variable will of course not spontaneously |
| 529 | point to a different Scheme object). |
| 530 | |
| 531 | Likewise, a simple vector such as the one returned by |
| 532 | @code{scm_make_vector} is guaranteed to always stay the same length so |
| 533 | that it is safe to only use @code{SCM_SIMPLE_VECTOR_LENGTH} once and store the |
| 534 | result. (In the example, @var{vector} is safe anyway since it is a |
| 535 | fresh object that no other thread can possibly know about until it is |
| 536 | returned from @code{my_list_to_vector}.) |
| 537 | |
| 538 | Of course the behavior of @code{my_list_to_vector} is suboptimal when |
| 539 | @var{list} does indeed get asynchronously lengthened or shortened in |
| 540 | another thread. But it is robust: it will always return a valid vector. |
| 541 | That vector might be shorter than expected, or its last elements might |
| 542 | be unspecified, but it is a valid vector and if a program wants to rule |
| 543 | out these cases, it must avoid modifying the list asynchronously. |
| 544 | |
| 545 | Here is another version that is also correct: |
| 546 | |
| 547 | @example |
| 548 | SCM |
| 549 | my_pedantic_list_to_vector (SCM list) |
| 550 | @{ |
| 551 | SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED); |
| 552 | size_t len, i; |
| 553 | |
| 554 | len = SCM_SIMPLE_VECTOR_LENGTH (vector); |
| 555 | i = 0; |
| 556 | while (i < len) |
| 557 | @{ |
| 558 | SCM_SIMPLE_VECTOR_SET (vector, i, scm_car (list)); |
| 559 | list = scm_cdr (list); |
| 560 | i++; |
| 561 | @} |
| 562 | |
| 563 | return vector; |
| 564 | @} |
| 565 | @end example |
| 566 | |
| 567 | This version uses the type-checking and thread-robust functions |
| 568 | @code{scm_car} and @code{scm_cdr} instead of the faster, but less robust |
| 569 | macros @code{SCM_CAR} and @code{SCM_CDR}. When the list is shortened |
| 570 | (that is, when @var{list} holds a non-pair), @code{scm_car} will throw |
| 571 | an error. This might be preferable to just returning a half-initialized |
| 572 | vector. |
| 573 | |
| 574 | The API for accessing vectors and arrays of various kinds from C takes a |
| 575 | slightly different approach to thread-robustness. In order to get at |
| 576 | the raw memory that stores the elements of an array, you need to |
| 577 | @emph{reserve} that array as long as you need the raw memory. During |
| 578 | the time an array is reserved, its elements can still spontaneously |
| 579 | change their values, but the memory itself and other things like the |
| 580 | size of the array are guaranteed to stay fixed. Any operation that |
| 581 | would change these parameters of an array that is currently reserved |
| 582 | will signal an error. In order to avoid these errors, a program should |
| 583 | of course put suitable synchronization mechanisms in place. As you can |
| 584 | see, Guile itself is again only concerned about robustness, not about |
| 585 | correctness: without proper synchronization, your program will likely |
| 586 | not be correct, but the worst consequence is an error message. |