\input texinfo
@c -*-texinfo-*-
@c %**start of header
@setfilename data-rep.info
@settitle Data Representation in Guile
@c %**end of header

@include version.texi

@dircategory The Algorithmic Language Scheme
@direntry
* data-rep: (data-rep).  Data Representation in Guile --- how to use
                         Guile objects in your C code.
@end direntry

@setchapternewpage off

@ifinfo
Data Representation in Guile

Copyright (C) 1998, 1999 Free Software Foundation

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the Free Software Foundation.
@end ifinfo

@titlepage
@sp 10
@comment The title is printed in a large font.
@title Data Representation in Guile
@subtitle $Id: data-rep.texi,v 1.8 1999-12-07 22:43:01 ghouston Exp $
@subtitle For use with Guile @value{VERSION}
@author Jim Blandy
@author Free Software Foundation
@author @email{jimb@@red-bean.com}
@c The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@vskip 0pt plus 1filll
Copyright @copyright{} 1998 Free Software Foundation

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the Free Software Foundation.
@end titlepage

@c @smallbook
@c @finalout
@headings double


@node Top, Data Representation in Scheme, (dir), (dir)
@top Data Representation in Guile

@ifinfo
This essay is meant to provide the background necessary to read and
write C code that manipulates Scheme values in a way that conforms to
libguile's interface. If you would like to write or maintain a
Guile-based application, this is the first information you need.

In order to make sense of Guile's SCM_ functions, or read libguile's
source code, it's essential to have a good grasp of how Guile actually
represents Scheme values. Otherwise, a lot of the code, and the
conventions it follows, won't make very much sense.

We assume you know both C and Scheme, but we do not assume you are
familiar with Guile's C interface.
@end ifinfo

@menu
* Data Representation in Scheme::  Why things aren't just totally
                                   straightforward, in general terms.
* How Guile does it::              How to write C code that manipulates
                                   Guile values, with an explanation
                                   of Guile's garbage collector.
* Defining New Types (Smobs)::     How to extend Guile with your own
                                   application-specific datatypes.
@end menu

@node Data Representation in Scheme, How Guile does it, Top, Top
@section Data Representation in Scheme

Scheme is a latently-typed language; this means that the system cannot,
in general, determine the type of a given expression at compile time.
Types only become apparent at run time. Variables do not have fixed
types; a variable may hold a pair at one point, an integer at the next,
and a thousand-element vector later. Instead, values, not variables,
have fixed types.

In order to implement standard Scheme functions like @code{pair?} and
@code{string?} and provide garbage collection, the representation of
every value must contain enough information to accurately determine its
type at run time. Often, Scheme systems also use this information to
determine whether a program has attempted to apply an operation to an
inappropriately typed value (such as taking the @code{car} of a string).

Because variables, pairs, and vectors may hold values of any type,
Scheme implementations use a uniform representation for values --- a
single type large enough to hold either a complete value or a pointer
to a complete value, along with the necessary typing information.

The following sections will present a simple typing system, and then
make some refinements to correct its major weaknesses. However, this is
not a description of the system Guile actually uses. It is only an
illustration of the issues Guile's system must address. We provide all
the information one needs to work with Guile's data in @ref{How Guile
does it}.


@menu
* A Simple Representation::
* Faster Integers::
* Cheaper Pairs::
* Guile Is Hairier::
@end menu

@node A Simple Representation, Faster Integers, Data Representation in Scheme, Data Representation in Scheme
@subsection A Simple Representation

The simplest way to meet the above requirements in C would be to
represent each value as a pointer to a structure containing a type
indicator, followed by a union carrying the real value. Assuming that
@code{SCM} is the name of our universal type, we can write:

@example
enum type @{ integer, pair, string, vector, ... @};

typedef struct value *SCM;

struct value @{
  enum type type;
  union @{
    int integer;
    struct @{ SCM car, cdr; @} pair;
    struct @{ int length; char *elts; @} string;
    struct @{ int length; SCM *elts; @} vector;
    ...
  @} value;
@};
@end example
with the ellipses replaced with code for the remaining Scheme types.

This representation is sufficient to implement all of Scheme's
semantics. If @var{x} is an @code{SCM} value:
@itemize @bullet
@item
To test if @var{x} is an integer, we can write @code{@var{x}->type == integer}.
@item
To find its value, we can write @code{@var{x}->value.integer}.
@item
To test if @var{x} is a vector, we can write @code{@var{x}->type == vector}.
@item
If we know @var{x} is a vector, we can write
@code{@var{x}->value.vector.elts[0]} to refer to its first element.
@item
If we know @var{x} is a pair, we can write
@code{@var{x}->value.pair.car} to extract its car.
@end itemize
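
The following is a minimal sketch that shows the boxed representation working, reduced to integers and pairs so that it compiles. The helpers @code{new_value}, @code{make_integer}, and @code{make_pair} are hypothetical illustrations, not Guile functions. Even a small integer costs a heap allocation here, which is the weakness the next section addresses.

@verbatim
#include <stdlib.h>

/* The boxed representation, cut down to integers and pairs.
   new_value, make_integer and make_pair are hypothetical helpers.
   Objects are never freed: this sketch has no garbage collector. */
enum type { integer, pair };

typedef struct value *SCM;

struct value {
  enum type type;
  union {
    int integer;
    struct { SCM car, cdr; } pair;
  } value;
};

/* Allocate a fresh struct value, aborting if memory runs out. */
static SCM new_value (enum type t)
{
  SCM x = malloc (sizeof *x);
  if (x == NULL)
    abort ();
  x->type = t;
  return x;
}

/* Even a small integer costs a heap allocation in this scheme. */
static SCM make_integer (int i)
{
  SCM x = new_value (integer);
  x->value.integer = i;
  return x;
}

static SCM make_pair (SCM car, SCM cdr)
{
  SCM x = new_value (pair);
  x->value.pair.car = car;
  x->value.pair.cdr = cdr;
  return x;
}
@end verbatim

With these definitions, @code{make_pair (make_integer (1), make_integer (2))->value.pair.car->value.integer} evaluates to 1, at the cost of three separate allocations.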


@node Faster Integers, Cheaper Pairs, A Simple Representation, Data Representation in Scheme
@subsection Faster Integers

Unfortunately, the above representation has a serious disadvantage. In
order to return an integer, an expression must allocate a @code{struct
value}, initialize it to represent that integer, and return a pointer to
it. Furthermore, fetching an integer's value requires a memory
reference, which is much slower than a register reference on most
processors. Since integers are extremely common, this representation is
too costly, in both time and space. Integers should be very cheap to
create and manipulate.

One possible solution comes from the observation that, on many
architectures, structures must be aligned on a four-byte boundary.
(Whether or not the machine actually requires it, we can write our own
allocator for @code{struct value} objects that ensures this is true.)
In this case, the lower two bits of the structure's address are known to
be zero.

This gives us the room we need to provide an improved representation
for integers. We make the following rules:
@itemize @bullet
@item
If the lower two bits of an @code{SCM} value are zero, then the SCM
value is a pointer to a @code{struct value}, and everything proceeds as
before.
@item
Otherwise, the @code{SCM} value represents an integer, whose value
appears in its upper bits.
@end itemize

Here is C code implementing this convention:
@example
enum type @{ pair, string, vector, ... @};

typedef struct value *SCM;

struct value @{
  enum type type;
  union @{
    struct @{ SCM car, cdr; @} pair;
    struct @{ int length; char *elts; @} string;
    struct @{ int length; SCM *elts; @} vector;
    ...
  @} value;
@};

#define POINTER_P(x) (((int) (x) & 3) == 0)
#define INTEGER_P(x) (! POINTER_P (x))

#define GET_INTEGER(x)  ((int) (x) >> 2)
#define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1))
@end example

Notice that @code{integer} no longer appears as an element of @code{enum
type}, and the union has lost its @code{integer} member. Instead, we
use the @code{POINTER_P} and @code{INTEGER_P} macros to make a coarse
classification of values into integers and non-integers, and do further
type testing as before.

Here's how we would answer the questions posed above (again, assume
@var{x} is an @code{SCM} value):
@itemize @bullet
@item
To test if @var{x} is an integer, we can write @code{INTEGER_P (@var{x})}.
@item
To find its value, we can write @code{GET_INTEGER (@var{x})}.
@item
To test if @var{x} is a vector, we can write:
@example
@code{POINTER_P (@var{x}) && @var{x}->type == vector}
@end example
Given the new representation, we must make sure @var{x} is truly a
pointer before we dereference it to determine its complete type.
@item
If we know @var{x} is a vector, we can write
@code{@var{x}->value.vector.elts[0]} to refer to its first element, as
before.
@item
If we know @var{x} is a pair, we can write
@code{@var{x}->value.pair.car} to extract its car, just as before.
@end itemize

This representation allows us to operate more efficiently on integers
than the first. For example, if @var{x} and @var{y} are known to be
integers, we can compute their sum as follows:
@example
MAKE_INTEGER (GET_INTEGER (@var{x}) + GET_INTEGER (@var{y}))
@end example
Now, integer math requires no allocation or memory references. Most
real Scheme systems actually use an even more efficient representation,
but this essay isn't about bit-twiddling. (Hint: what if pointers had
@code{01} in their least significant bits, and integers had @code{00}?)
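
The macros above cast an @code{SCM} to @code{int}, which is not always wide enough to hold a pointer. The sketch below applies the same tagging rules with C99's @code{intptr_t} instead. It models @code{SCM} as an integer type only for illustration, and @code{add_integers} is a hypothetical helper. Multiplying and dividing by 4 replace the shifts, because shifting a negative number left is undefined in C.

@verbatim
#include <stdint.h>

/* "Faster Integers" rules with intptr_t instead of int.  SCM is
   modelled as an integer type here purely for illustration. */
typedef intptr_t SCM;

/* Low two bits 00: a pointer to a struct value. */
#define POINTER_P(x)     (((x) & 3) == 0)
#define INTEGER_P(x)     (! POINTER_P (x))

/* Integers are tagged #b01, with the value in the upper bits.
   Multiplying by 4 avoids left-shifting negative numbers, which C
   leaves undefined; (x - 1) is an exact multiple of 4, so the
   division below recovers the value exactly, negatives included. */
#define MAKE_INTEGER(i)  ((SCM) (i) * 4 + 1)
#define GET_INTEGER(x)   (((x) - 1) / 4)

/* Sum of two tagged integers: no allocation, no memory access.
   Overflow of the tagged range is not checked. */
static SCM add_integers (SCM x, SCM y)
{
  return MAKE_INTEGER (GET_INTEGER (x) + GET_INTEGER (y));
}
@end verbatim

With the alternative tagging suggested in the hint (integers tagged @code{00}), @code{MAKE_INTEGER} would be multiplication by 4 alone. Then 4@var{a} + 4@var{b} = 4(@var{a} + @var{b}), so two tagged integers could be added directly, with no untagging at all.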


@node Cheaper Pairs, Guile Is Hairier, Faster Integers, Data Representation in Scheme
@subsection Cheaper Pairs

However, there is yet another issue to confront. Most Scheme heaps
contain more pairs than any other type of object; Jonathan Rees says
that pairs occupy 45% of the heap in his Scheme implementation, Scheme
48. However, our representation above spends three @code{SCM}-sized
words per pair --- one for the type, and two for the @sc{car} and
@sc{cdr}. Is there any way to represent pairs using only two words?

Let us refine the convention we established earlier. Let us assert
that:
@itemize @bullet
@item
If the bottom two bits of an @code{SCM} value are @code{#b00}, then
it is a pointer, as before.
@item
If the bottom two bits are @code{#b01}, then the upper bits are an
integer. This is a bit more restrictive than before.
@item
If the bottom two bits are @code{#b10}, then the value, with the bottom
two bits masked out, is the address of a pair.
@end itemize

Here is the new C code:
@example
enum type @{ string, vector, ... @};

typedef struct value *SCM;

struct value @{
  enum type type;
  union @{
    struct @{ int length; char *elts; @} string;
    struct @{ int length; SCM *elts; @} vector;
    ...
  @} value;
@};

struct pair @{
  SCM car, cdr;
@};

#define POINTER_P(x) (((int) (x) & 3) == 0)

#define INTEGER_P(x)  (((int) (x) & 3) == 1)
#define GET_INTEGER(x)  ((int) (x) >> 2)
#define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1))

#define PAIR_P(x) (((int) (x) & 3) == 2)
#define GET_PAIR(x) ((struct pair *) ((int) (x) & ~3))
@end example

Notice that @code{enum type} and @code{struct value} now only contain
provisions for vectors and strings; both integers and pairs have become
special cases. The code above also assumes that an @code{int} is large
enough to hold a pointer, which isn't generally true.


Our list of examples is now as follows:
@itemize @bullet
@item
To test if @var{x} is an integer, we can write @code{INTEGER_P
(@var{x})}; this is as before.
@item
To find its value, we can write @code{GET_INTEGER (@var{x})}, as
before.
@item
To test if @var{x} is a vector, we can write:
@example
@code{POINTER_P (@var{x}) && @var{x}->type == vector}
@end example
We must still make sure that @var{x} is a pointer to a @code{struct
value} before dereferencing it to find its type.
@item
If we know @var{x} is a vector, we can write
@code{@var{x}->value.vector.elts[0]} to refer to its first element, as
before.
@item
We can write @code{PAIR_P (@var{x})} to determine if @var{x} is a
pair, and then write @code{GET_PAIR (@var{x})->car} to refer to its
car.
@end itemize

This change in representation reduces our heap size by 15%: pairs
shrink from three words to two, and one third of the 45% of the heap
that pairs occupy is 15% of the whole. It also makes it cheaper to
decide whether a value is a pair, because no memory references are
necessary; it suffices to check the bottom two bits of the @code{SCM}
value. This may be significant when traversing lists, a common
activity in a Scheme system.
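
The pair rules above can be sketched in compilable form as follows. Here @code{uintptr_t} replaces @code{int}, since an @code{int} is not generally large enough to hold a pointer. The helper @code{cons} is hypothetical. The sketch assumes that @code{malloc} returns storage whose low two address bits are clear, which holds on common platforms.

@verbatim
#include <stdint.h>
#include <stdlib.h>

/* "Cheaper Pairs" tagging with uintptr_t standing in for int, so
   that an address always fits.  cons is a hypothetical helper. */
typedef uintptr_t SCM;

struct pair { SCM car, cdr; };   /* two words, no type field */

#define POINTER_P(x)  (((x) & 3) == 0)
#define INTEGER_P(x)  (((x) & 3) == 1)
#define PAIR_P(x)     (((x) & 3) == 2)
#define GET_PAIR(x)   ((struct pair *) ((x) & ~(SCM) 3))

/* Allocate a pair and tag its address with #b10.  Assumes malloc's
   result has its low two bits clear (true in practice, since it is
   aligned for any object type). */
static SCM cons (SCM car, SCM cdr)
{
  struct pair *p = malloc (sizeof *p);
  if (p == NULL)
    abort ();
  p->car = car;
  p->cdr = cdr;
  return (SCM) p | 2;
}
@end verbatim

@code{PAIR_P} reads no memory at all: it only inspects the tag bits of the @code{SCM} value. This is the property that makes list traversal cheaper.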

Again, most real Scheme systems use a slightly different implementation;
for example, if @code{GET_PAIR} subtracts off the low bits of @var{x},
instead of masking them off, the optimizer will often be able to combine
that subtraction with the addition of the offset of the structure member
we are referencing, making a modified pointer as fast to use as an
unmodified pointer.
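
A sketch of the subtracting variant follows, again with hypothetical names and @code{uintptr_t} in place of @code{int}. Because the tag is known to be exactly @code{#b10} once @code{PAIR_P} has succeeded, subtracting 2 untags the value. The field address @var{x} @minus{} 2 + @var{offset} is then a constant displacement from @var{x}, which a compiler can typically fold into a single load.

@verbatim
#include <stdint.h>

typedef uintptr_t SCM;
struct pair { SCM car, cdr; };

#define PAIR_P(x)    (((x) & 3) == 2)

/* Untag by subtracting the known tag rather than masking it off.
   Valid only after PAIR_P (x) holds, since it assumes the low bits
   are exactly #b10.  x - 2 + offsetof (struct pair, cdr) is a fixed
   displacement from x, so the subtraction can usually be folded
   into the addressing mode of the load. */
#define GET_PAIR(x)  ((struct pair *) ((x) - 2))
#define CAR(x)       (GET_PAIR (x)->car)
#define CDR(x)       (GET_PAIR (x)->cdr)
@end verbatim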


@node Guile Is Hairier, , Cheaper Pairs, Data Representation in Scheme
@subsection Guile Is Hairier

We originally started with a very simple typing system --- each object
has a field that indicates its type. Then, for the sake of efficiency
in both time and space, we moved some of the typing information directly
into the @code{SCM} value, and left the rest in the @code{struct value}.
Guile itself employs a more complex hierarchy, storing finer and finer
gradations of type information in different places, depending on the
object's coarser type.

In the author's opinion, Guile could be simplified greatly without
significant loss of efficiency, but the simplified system would still be
more complex than what we've presented above.


@node How Guile does it, Defining New Types (Smobs), Data Representation in Scheme, Top
@section How Guile does it

Here we present the specifics of how Guile represents its data. We
don't go into complete detail; an exhaustive description of Guile's
system would be boring, and we do not wish to encourage people to write
code which depends on its details anyway. We do, however, present
everything one needs to know to use Guile's data.


@menu
* General Rules::
* Garbage Collection::
* Immediates vs. Non-immediates::
* Immediate Datatypes::
* Non-immediate Datatypes::
* Signalling Type Errors::
@end menu

@node General Rules, Garbage Collection, How Guile does it, How Guile does it
@subsection General Rules

Any code which operates on Guile datatypes must @code{#include} the
header file @code{<libguile.h>}. This file contains a definition for
the @code{SCM} typedef (Guile's universal type, as in the examples
above), and definitions and declarations for a host of macros and
functions that operate on @code{SCM} values.

All identifiers declared by @code{<libguile.h>} begin with @code{scm_}
or @code{SCM_}.

@c [[I wish this were true, but I don't think it is at the moment. -JimB]]
@c Macros do not evaluate their arguments more than once, unless documented
@c to do so.

The functions described here generally check the types of their
@code{SCM} arguments, and signal an error if their arguments are of an
inappropriate type. Macros generally do not, unless that is their
specified purpose. You must verify their argument types beforehand, as
necessary.

Macros and functions that return a boolean value have names ending in
@code{P} or @code{_p} (for ``predicate''). Those that return a negated
boolean value have names starting with @code{SCM_N}. For example,
@code{SCM_IMP (@var{x})} is a predicate which returns non-zero iff
@var{x} is an immediate value (an @code{IM}). @code{SCM_NCONSP
(@var{x})} is a predicate which returns non-zero iff @var{x} is
@emph{not} a pair object (a @code{CONS}).


@node Garbage Collection, Immediates vs. Non-immediates, General Rules, How Guile does it
@subsection Garbage Collection

Aside from the latent typing, the major source of constraints on a
Scheme implementation's data representation is the garbage collector.
The collector must be able to traverse every live object in the heap, to
determine which objects are not live.

There are many ways to implement this, but Guile uses an algorithm
called @dfn{mark and sweep}. The collector scans the system's global
variables and the local variables on the stack to determine which
objects are immediately accessible by the C code. It then scans those
objects to find the objects they point to, @i{et cetera}. The collector
sets a @dfn{mark bit} on each object it finds, so each object is
traversed only once. This process is called @dfn{tracing}.

When the collector can find no unmarked objects pointed to by marked
objects, it assumes that any objects that are still unmarked will never
be used by the program (since there is no path of dereferences from any
global or local variable that reaches them) and deallocates them.
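
The marking and sweeping just described can be sketched over a toy heap. This sketch is not Guile's collector and rests on simplifying assumptions: the heap is a fixed array of cells, references are plain C pointers, and the roots are passed in explicitly rather than found by scanning globals and the stack. All names are hypothetical.

@verbatim
#include <stddef.h>

/* A toy mark-and-sweep collector over a fixed array of cells.
   All names are hypothetical; roots are supplied by the caller. */
#define NCELLS 8

struct cell {
  int in_use;
  int marked;
  struct cell *car, *cdr;   /* NULL means "no reference" */
};

static struct cell heap[NCELLS];

/* Tracing: set the mark bit on every cell reachable from c.  The
   mark bit ensures each cell is traversed only once.  Recursing on
   car and looping on cdr keeps long lists from deepening the stack. */
static void mark (struct cell *c)
{
  while (c != NULL && ! c->marked)
    {
      c->marked = 1;
      mark (c->car);
      c = c->cdr;
    }
}

/* Mark from the roots, then sweep: any in-use cell still unmarked is
   unreachable and is freed.  Returns the number of cells reclaimed. */
static size_t collect (struct cell **roots, size_t nroots)
{
  size_t i, freed = 0;

  for (i = 0; i < nroots; i++)
    mark (roots[i]);

  for (i = 0; i < NCELLS; i++)
    {
      if (heap[i].in_use && ! heap[i].marked)
        {
          heap[i].in_use = 0;
          heap[i].car = heap[i].cdr = NULL;
          freed++;
        }
      heap[i].marked = 0;   /* clear for the next collection */
    }
  return freed;
}
@end verbatim

For example, if cells 0, 1, and 2 are in use, cell 0's cdr points to cell 1, and cell 0 is the only root, then @code{collect} reclaims cell 2 alone.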

In the above paragraphs, we did not specify how the garbage collector
finds the global and local variables; as usual, there are many different
approaches. Frequently, the programmer must maintain a list of pointers
to all global variables that refer to the heap, and another list
(adjusted upon entry to and exit from each function) of local variables,
for the collector's benefit.

The list of global variables is usually not too difficult to maintain,
since global variables are relatively rare. However, an explicitly
maintained list of local variables (in the author's personal experience)
is a nightmare to maintain. Thus, Guile uses a technique called
@dfn{conservative garbage collection}, to make the local variable list
unnecessary.

The trick to conservative collection is to treat the stack as an
ordinary range of memory, and assume that @emph{every} word on the stack
is a pointer into the heap. Thus, the collector marks all objects whose
addresses appear anywhere in the stack, without knowing for sure how
that word is meant to be interpreted.

Obviously, such a system will occasionally retain objects that are
actually garbage and should be freed. In practice, this is not a
problem. The alternative, an explicitly maintained list of local
variable addresses, is effectively much less reliable, due to programmer
error.

To accommodate this technique, data must be represented so that the
collector can accurately determine whether a given stack word is a
pointer or not. Guile does this as follows:
@itemize @bullet

@item
Every heap object has a two-word header, called a @dfn{cell}. Some
objects, like pairs, fit entirely in a cell's two words; others may
store pointers to additional memory in either of the words. For
example, strings and vectors store their length in the first word, and a
pointer to their elements in the second.

@item
Guile allocates whole arrays of cells at a time, called @dfn{heap
segments}. These segments are always allocated so that the cells they
contain fall on eight-byte boundaries, or whatever is appropriate for
the machine's word size. Guile keeps all cells in a heap segment
initialized, whether or not they are currently in use.

@item
Guile maintains a sorted table of heap segments.

@end itemize

Thus, given any random word @var{w} fetched from the stack, Guile's
garbage collector can consult the table to see if @var{w} falls within a
known heap segment, and check @var{w}'s alignment. If both tests pass,
the collector knows that @var{w} is a valid pointer to a cell,
intentional or not, and proceeds to trace the cell.

Note that heap segments do not contain all the data Guile uses; cells
for objects like vectors and strings contain pointers to other memory
areas. However, since those pointers are internal, and not shared among
many pieces of code, it is enough for the collector to find the cell,
and then use the cell's type to find more pointers to trace.
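
The two tests applied to a stack word @var{w} (inside a known heap segment, and aligned on a cell boundary) can be sketched as a binary search over the sorted segment table. The table layout and the 8-byte @code{CELL_SIZE} are assumptions made for illustration; two-word cells on a 64-bit machine would be 16 bytes. The function name is hypothetical.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conservative pointer test.  CELL_SIZE and the
   segment table layout are illustrative assumptions. */
#define CELL_SIZE 8

struct segment { uintptr_t lo, hi; };   /* cells occupy [lo, hi) */

/* segs must be sorted by lo and non-overlapping, with each lo on a
   cell boundary.  Returns non-zero iff w could point to a cell. */
static int maybe_cell_pointer (uintptr_t w,
                               const struct segment *segs, size_t n)
{
  size_t lo = 0, hi = n;

  /* Binary search for the segment whose range could contain w. */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (w < segs[mid].lo)
        hi = mid;
      else if (w >= segs[mid].hi)
        lo = mid + 1;
      else
        /* Inside a segment: accept only cell-aligned addresses. */
        return (w - segs[mid].lo) % CELL_SIZE == 0;
    }
  return 0;   /* not inside any known heap segment */
}
@end verbatim

A word that passes both tests is traced as if it were a real pointer, whether or not the C code meant it as one. This is why the collector can occasionally retain garbage.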


@node Immediates vs. Non-immediates, Immediate Datatypes, Garbage Collection, How Guile does it
@subsection Immediates vs. Non-immediates

Guile classifies Scheme objects into two kinds: those that fit entirely
within an @code{SCM}, and those that require heap storage.

Members of the former class are called @dfn{immediates}. The class of
immediates includes small integers, characters, boolean values, the
empty list, the mysterious end-of-file object, and some others.

The remaining types are called, not surprisingly, @dfn{non-immediates}.
They include pairs, procedures, strings, vectors, and all other data
types in Guile.

@deftypefn Macro int SCM_IMP (SCM @var{x})
Return non-zero iff @var{x} is an immediate object.
@end deftypefn

@deftypefn Macro int SCM_NIMP (SCM @var{x})
Return non-zero iff @var{x} is a non-immediate object. This is the
exact complement of @code{SCM_IMP}, above.

You must use this macro before calling a finer-grained predicate to
determine @var{x}'s type. For example, to see if @var{x} is a pair, you
must write:
@example
SCM_NIMP (@var{x}) && SCM_CONSP (@var{x})
@end example
This is because Guile stores typing information for non-immediate values
in their cells, rather than in the @code{SCM} value itself; thus, you
must determine whether @var{x} refers to a cell before looking inside
it.

This is something of a pity, because it means that the programmer needs
to know which types Guile implements as immediates vs. non-immediates.
There are (possibly better) representations in which @code{SCM_CONSP}
can be self-sufficient. The immediate type predicates do not suffer
from this weakness.
@end deftypefn
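
The reason for this rule can be shown with a hypothetical model. In it, immediates carry a non-zero tag in their low two bits, and non-immediates point to a cell whose first word holds a type code. The names and type codes are invented and do not describe Guile's actual tag layout. The model only shows why the immediacy test must come first.

@verbatim
#include <stdint.h>

/* Hypothetical model: immediates have non-zero low bits, while
   non-immediates point to a cell whose first word is a type code.
   None of these names or codes are Guile's. */
typedef uintptr_t SCM;

struct cell { uintptr_t type_word, second; };

enum { TYPE_PAIR = 0x10, TYPE_STRING = 0x20 };   /* made-up codes */

#define IMP(x)   (((x) & 3) != 0)
#define NIMP(x)  (! IMP (x))
#define CELL(x)  ((struct cell *) (x))

/* Meaningful only for non-immediates: it dereferences x. */
#define CONSP_UNSAFE(x)  (CELL (x)->type_word == TYPE_PAIR)

/* The safe idiom: && short-circuits, so an immediate is never
   treated as an address. */
#define CONSP(x)  (NIMP (x) && CONSP_UNSAFE (x))
@end verbatim

Without the @code{NIMP} test, @code{CONSP_UNSAFE} applied to an immediate such as @code{(SCM) 5} would treat the number 5 as an address and read from it.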


@node Immediate Datatypes, Non-immediate Datatypes, Immediates vs. Non-immediates, How Guile does it
@subsection Immediate Datatypes

The following datatypes are immediate values; that is, they fit entirely
within an @code{SCM} value. The @code{SCM_IMP} and @code{SCM_NIMP}
macros will distinguish these from non-immediates; see @ref{Immediates
vs. Non-immediates} for an explanation of the distinction.

Note that the type predicates for immediate values work correctly on any
@code{SCM} value; you do not need to call @code{SCM_IMP} first, to
establish that a value is immediate. This differs from the
non-immediate type predicates, which work correctly only on
non-immediate values; you must be sure the value is @code{SCM_NIMP}
before applying them.


@menu
* Integers::
* Characters::
* Booleans::
* Unique Values::
@end menu

@node Integers, Characters, Immediate Datatypes, Immediate Datatypes
@subsubsection Integers

Here are macros for operating on small integers that fit within an
@code{SCM}. Such integers are called @dfn{immediate numbers}, or
@dfn{INUMs}. In general, INUMs occupy all but two bits of an
@code{SCM}.

Bignums and floating-point numbers are non-immediate objects, and have
their own, separate accessors. The macros here will not work on
them. This is not as much of a problem as you might think, however,
because the system never constructs bignums that could fit in an INUM,
and never uses floating point values for exact integers.

@deftypefn Macro int SCM_INUMP (SCM @var{x})
Return non-zero iff @var{x} is a small integer value.
@end deftypefn

@deftypefn Macro int SCM_NINUMP (SCM @var{x})
The complement of @code{SCM_INUMP}.
@end deftypefn

@deftypefn Macro int SCM_INUM (SCM @var{x})
Return the value of @var{x} as an ordinary C integer. If @var{x}
is not an INUM, the result is undefined.
@end deftypefn

@deftypefn Macro SCM SCM_MAKINUM (int @var{i})
Given a C integer @var{i}, return its representation as an @code{SCM}.
This macro does not check for overflow.
@end deftypefn
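
Because @code{SCM_MAKINUM} does not check for overflow, code that starts from an arbitrary C integer must check the range itself. The sketch below assumes, as stated above, that an INUM is a signed field occupying all but two bits of a word, and it takes that word to be as wide as @code{intptr_t}. Under that assumption, a hypothetical helper @code{fits_in_inum} could perform the check. Treat the limits as illustrative, since Guile's actual limits may differ.

@verbatim
#include <limits.h>
#include <stdint.h>

/* Hypothetical limits for an INUM that is a signed field occupying
   all but two bits of an intptr_t-sized word. */
#define INUM_BITS  ((int) (sizeof (intptr_t) * CHAR_BIT) - 2)
#define INUM_MAX   (((intptr_t) 1 << (INUM_BITS - 1)) - 1)
#define INUM_MIN   (-INUM_MAX - 1)

/* Non-zero iff n is representable as an INUM under that assumption.
   A caller would fall back to a bignum when this returns zero. */
static int fits_in_inum (long long n)
{
  return n >= INUM_MIN && n <= INUM_MAX;
}
@end verbatim

On a 32-bit machine these definitions give the range @minus{}2^29 to 2^29 @minus{} 1.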


@node Characters, Booleans, Integers, Immediate Datatypes
@subsubsection Characters

Here are macros for operating on characters.

@deftypefn Macro int SCM_ICHRP (SCM @var{x})
Return non-zero iff @var{x} is a character value.
@end deftypefn

@deftypefn Macro {unsigned int} SCM_ICHR (SCM @var{x})
Return the value of @var{x} as a C character. If @var{x} is not a
Scheme character, the result is undefined.
@end deftypefn

@deftypefn Macro SCM SCM_MAKICHR (int @var{c})
Given a C character @var{c}, return its representation as a Scheme
character value.
@end deftypefn
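
To show how an immediate character can carry both a tag and its code in a single word, here is a hypothetical encoding. The tag value @code{0x0f} is invented for this sketch. Its low two bits, @code{#b11}, keep it distinct from the pointer, integer, and pair tags used earlier in this essay. Guile's real encoding may differ and should not be relied upon.

@verbatim
#include <stdint.h>

/* Hypothetical immediate-character encoding: the character code sits
   above an 8-bit tag.  0x0f is a made-up tag whose low bits (#b11)
   differ from the pointer (#b00), integer (#b01) and pair (#b10)
   tags of the earlier sketches. */
typedef uintptr_t SCM;

#define CHAR_TAG    ((SCM) 0x0f)
#define MAKICHR(c)  (((SCM) (unsigned char) (c) << 8) | CHAR_TAG)
#define ICHRP(x)    (((x) & 0xff) == CHAR_TAG)
#define ICHR(x)     ((unsigned int) ((x) >> 8))
@end verbatim

@code{ICHRP} inspects only the low byte of its argument, so it is safe to apply to any @code{SCM} value. This matches the rule above that immediate type predicates need no prior @code{SCM_IMP} check.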
647 | ||
648 | ||
649 | @node Booleans, Unique Values, Characters, Immediate Datatypes | |
650 | @subsubsection Booleans | |
651 | ||
652 | Here are functions and macros for operating on booleans. | |
653 | ||
654 | @deftypefn Macro SCM SCM_BOOL_T | |
655 | @deftypefnx Macro SCM SCM_BOOL_F | |
656 | The Scheme true and false values. | |
657 | @end deftypefn | |
658 | ||
659 | @deftypefn Macro int SCM_NFALSEP (@var{x}) | |
660 | Convert the Scheme boolean value to a C boolean. Since every object in | |
661 | Scheme except @code{#f} is true, this amounts to comparing @var{x} to | |
662 | @code{#f}; hence the name. | |
663 | @c Noel feels a chill here. | |
664 | @end deftypefn | |
665 | ||
666 | @deftypefn Macro SCM SCM_BOOL_NOT (@var{x}) | |
667 | Return the boolean inverse of @var{x}. If @var{x} is not a | |
668 | Scheme boolean, the result is undefined. | |
669 | @end deftypefn | |


@node Unique Values, , Booleans, Immediate Datatypes
@subsubsection Unique Values

The immediate values that are neither small integers, characters, nor
booleans are all unique values; that is, each belongs to a datatype with
only one instance.

@deftypefn Macro SCM SCM_EOL
The Scheme empty list object, or ``End Of List'' object, usually written
in Scheme as @code{'()}.
@end deftypefn

@deftypefn Macro SCM SCM_EOF_VAL
The Scheme end-of-file value. It has no standard written
representation, for obvious reasons.
@end deftypefn

@deftypefn Macro SCM SCM_UNSPECIFIED
The value returned by expressions which the Scheme standard says return
an ``unspecified'' value.

This is a rather literal reading of the standard, but the standard
read-eval-print loop prints nothing when an expression returns this
value, so it is a reasonable thing to return when you can't think of
anything more helpful.
@end deftypefn

@deftypefn Macro SCM SCM_UNDEFINED
The ``undefined'' value. Its most important property is that it is not
equal to any valid Scheme value. This is put to various internal uses
by C code interacting with Guile.

For example, when you write a C function that is callable from Scheme
and which takes optional arguments, the interpreter passes
@code{SCM_UNDEFINED} for any optional arguments the caller omitted.

We also use this to mark unbound variables.
@end deftypefn

@deftypefn Macro int SCM_UNBNDP (SCM @var{x})
Return non-zero iff @var{x} is @code{SCM_UNDEFINED}. Apply this to a
symbol's value to see if it has a binding as a global variable.
@end deftypefn
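The following sketch shows how a C function with an optional argument might test for @code{SCM_UNDEFINED}. Instead of libguile it uses a self-contained toy encoding: small integers are stored as 4n + 2, and an arbitrary ``undefined'' word is chosen so that it can never be such an integer. All @code{toy_} names are hypothetical.

```c
#include <stdint.h>

typedef intptr_t toy_scm;

/* Toy small integers: n is stored as 4n + 2 (low two bits binary 10). */
#define TOY_MAKINUM(n) (((toy_scm) (n)) * 4 + 2)
#define TOY_INUM(x) (((x) - 2) / 4)

/* Assumed "undefined" word: low bits 00, so never a toy integer. */
#define TOY_UNDEFINED ((toy_scm) 0x84)
#define TOY_UNBNDP(x) ((x) == TOY_UNDEFINED)

/* Hypothetical subr body with one required and one optional argument.
   The caller passes TOY_UNDEFINED when the optional one was omitted,
   and the function substitutes a default factor of 1. */
toy_scm toy_scale(toy_scm n, toy_scm factor)
{
  intptr_t f = TOY_UNBNDP (factor) ? 1 : TOY_INUM (factor);
  return TOY_MAKINUM (TOY_INUM (n) * f);
}
```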


@node Non-immediate Datatypes, Signalling Type Errors, Immediate Datatypes, How Guile does it
@subsection Non-immediate Datatypes

A non-immediate datatype is one which lives in the heap, either because
it cannot fit entirely within a @code{SCM} word, or because it denotes a
specific storage location (in the nomenclature of the Revised^4 Report
on Scheme).

The @code{SCM_IMP} and @code{SCM_NIMP} macros will distinguish these
from immediates; see @ref{Immediates vs. Non-immediates}.

Given a cell, Guile distinguishes between pairs and other non-immediate
types by storing special @dfn{tag} values in a non-pair cell's @sc{car};
these values never appear in the @sc{car} of an ordinary pair. A cell
with a non-tag value in its @sc{car} is an ordinary pair. The type of a
cell with a tag in its @sc{car} depends on the tag; the non-immediate
type predicates test this value. If a tag value appears elsewhere (in a
vector, for example), the heap may become corrupted.
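The following toy model illustrates the tag-in-the-@sc{car} rule. Purely for illustration, it assumes that every word storable in a pair's @sc{car} has its low bit clear and that every type tag has it set. Guile's actual tag layout is in @code{<libguile/tags.h>}, and the @code{toy_} names are hypothetical.

```c
#include <stdint.h>

typedef uintptr_t toy_scm;

/* A heap cell is two words. */
struct toy_cell
{
  toy_scm car;
  toy_scm cdr;
};

/* Toy rule (an assumption, not Guile's exact scheme): ordinary words
   have the low bit clear and type tags have it set, so a tag can never
   be mistaken for a pair's car. */
#define TOY_MAKE_TAG(n) ((((toy_scm) (n)) << 1) | 1u)
#define TOY_TC_VECTOR TOY_MAKE_TAG (1)
#define TOY_TC_STRING TOY_MAKE_TAG (2)

/* A cell whose car is not a tag is an ordinary pair. */
int toy_cell_is_pair(const struct toy_cell *c)
{
  return (c->car & 1u) == 0;
}

/* Other types are identified by the exact tag in the car. */
int toy_cell_has_tag(const struct toy_cell *c, toy_scm tag)
{
  return c->car == tag;
}
```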


@menu
* Non-immediate Type Predicates::  Special rules for using the type
                                   predicates described here.
* Pairs::
* Vectors::
* Procedures::
* Closures::
* Subrs::
* Ports::
@end menu

@node Non-immediate Type Predicates, Pairs, Non-immediate Datatypes, Non-immediate Datatypes
@subsubsection Non-immediate Type Predicates

As mentioned in @ref{Garbage Collection}, all non-immediate objects
start with a @dfn{cell}, or a pair of words. Furthermore, all type
information that distinguishes one kind of non-immediate from another is
stored in the cell. The type information in the @code{SCM} value
indicates only that the object is a non-immediate; all finer
distinctions require one to examine the cell itself, usually with the
appropriate type predicate macro.

The type predicates for non-immediate objects generally assume that
their argument is a non-immediate value, and may examine its cell. You
must therefore make sure that a value is @code{SCM_NIMP} before passing
it to a non-immediate type predicate. For example, the idiom for
testing whether a value is a pair is:
@example
SCM_NIMP (@var{x}) && SCM_CONSP (@var{x})
@end example
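Here is the same idiom in a self-contained toy model. The representation is an assumption: cells are word-aligned, so a cell address has its low two bits clear, while immediates set one of them. The @code{toy_} names are hypothetical. What matters is the ordering: the cheap immediate test runs first, and C's short-circuiting @code{&&} keeps the cell from being read when that test fails.

```c
#include <stdint.h>

typedef uintptr_t toy_scm;

struct toy_cell
{
  toy_scm car;
  toy_scm cdr;
};

/* Toy rule: a non-immediate is the address of a cell, whose low two
   bits are zero because of alignment; immediates set one of them. */
#define TOY_IMP(x) (((x) & 3u) != 0)
#define TOY_NIMP(x) (!TOY_IMP (x))

/* Reads the cell, so it must only be applied to non-immediates. */
#define TOY_CONSP(x) ((((const struct toy_cell *) (x))->car & 1u) == 0)

int toy_pairp(toy_scm x)
{
  /* && stops early, so TOY_CONSP never dereferences an immediate. */
  return TOY_NIMP (x) && TOY_CONSP (x);
}
```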


@node Pairs, Vectors, Non-immediate Type Predicates, Non-immediate Datatypes
@subsubsection Pairs

Pairs are the essential building block of list structure in Scheme. A
pair object has two fields, called the @dfn{car} and the @dfn{cdr}.

It is conventional for a pair's @sc{car} to contain an element of a
list, and the @sc{cdr} to point to the next pair in the list, or to
contain @code{SCM_EOL}, indicating the end of the list. Thus, a set of
pairs chained through their @sc{cdr}s constitutes a singly-linked list.
Scheme and libguile define many functions which operate on lists
constructed in this fashion, so although lists chained through the
@sc{car}s of pairs will work fine too, they may be less convenient to
manipulate, and receive less support from the community.

Guile implements pairs by mapping the @sc{car} and @sc{cdr} of a pair
directly into the two words of the cell.
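Here is a minimal sketch of list structure chained through the @sc{cdr}s, written against a toy pair type rather than libguile. @code{NULL} stands in for @code{SCM_EOL}, and @code{toy_cons}, @code{toy_length} and @code{toy_free_list} are hypothetical helpers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy pair: the car and cdr are the two words of the cell.
   NULL plays the role of SCM_EOL in this sketch. */
struct toy_pair
{
  const void *car;
  struct toy_pair *cdr;
};

struct toy_pair *toy_cons(const void *car, struct toy_pair *cdr)
{
  struct toy_pair *p = malloc(sizeof *p);
  if (p == NULL)
    abort();                    /* out of memory: give up */
  p->car = car;
  p->cdr = cdr;
  return p;
}

/* Follow the chain of cdrs until the end-of-list marker. */
size_t toy_length(const struct toy_pair *list)
{
  size_t n = 0;
  for (; list != NULL; list = list->cdr)
    n++;
  return n;
}

void toy_free_list(struct toy_pair *list)
{
  while (list != NULL)
    {
      struct toy_pair *next = list->cdr;
      free(list);
      list = next;
    }
}
```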


@deftypefn Macro int SCM_CONSP (SCM @var{x})
Return non-zero iff @var{x} is a Scheme pair object.
The results are undefined if @var{x} is an immediate value.
@end deftypefn

@deftypefn Macro int SCM_NCONSP (SCM @var{x})
The complement of @code{SCM_CONSP}.
@end deftypefn

@deftypefn Macro void SCM_NEWCELL (SCM @var{into})
Allocate a new cell, and set @var{into} to point to it. This macro
expands to a statement, not an expression, and @var{into} must be an
lvalue of type @code{SCM}.

This is the most primitive way to allocate a cell; it is quite fast.

The @sc{car} of the cell initially tags it as a ``free cell''. If the
caller intends to use it as an ordinary cons, she must store ordinary
@code{SCM} values in its @sc{car} and @sc{cdr}.

If the caller intends to use it as a header for some other type, she
must store an appropriate magic value in the cell's @sc{car}, to mark
it as a member of that type, and store whatever value in the @sc{cdr}
that type expects. You should generally not do this, unless you are
implementing a new datatype, and thoroughly understand the code in
@code{<libguile/tags.h>}.
@end deftypefn
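The sketch below models a free-list allocator in the spirit of @code{SCM_NEWCELL}. It rests on stated assumptions: a fixed four-cell toy heap, an arbitrary free-cell tag, and free cells chained through their @sc{cdr}s. None of the @code{toy_} names are part of Guile. Like the real macro, @code{TOY_NEWCELL} is a statement that assigns to an lvalue, and the cell it hands out still carries the free-cell tag until the caller overwrites it.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_scm;

struct toy_cell
{
  toy_scm car;
  toy_scm cdr;
};

#define TOY_HEAP_CELLS 4
/* Assumed tag marking a cell as free (illustrative only). */
#define TOY_TC_FREE_CELL ((toy_scm) 0x17u)

static struct toy_cell toy_heap[TOY_HEAP_CELLS];
static struct toy_cell *toy_freelist = NULL;

/* Tag every heap cell as free and thread it onto the list via its cdr. */
void toy_init_heap(void)
{
  size_t i;
  toy_freelist = NULL;
  for (i = 0; i < TOY_HEAP_CELLS; i++)
    {
      toy_heap[i].car = TOY_TC_FREE_CELL;
      toy_heap[i].cdr = (toy_scm) toy_freelist;
      toy_freelist = &toy_heap[i];
    }
}

/* Pop the head of the free list into the lvalue INTO.  A real allocator
   would have to refill an empty list (for instance by collecting
   garbage); this toy simply yields NULL. */
#define TOY_NEWCELL(into)                                      \
  do                                                           \
    {                                                          \
      (into) = toy_freelist;                                   \
      if (toy_freelist != NULL)                                \
        toy_freelist = (struct toy_cell *) toy_freelist->cdr;  \
    }                                                          \
  while (0)
```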


@deftypefun SCM scm_cons (SCM @var{car}, SCM @var{cdr})
Allocate (``CONStruct'') a new pair, with @var{car} and @var{cdr} as its
contents.
@end deftypefun


The macros below perform no typechecking. The results are undefined if
@var{cell} is an immediate. However, since all non-immediate Guile
objects are constructed from cells, and these macros simply read or
write the fields of a cell, they can actually be useful on datatypes
other than pairs. (Of course, it is not very modular to use them outside
of the code which implements that datatype.)

@deftypefn Macro SCM SCM_CAR (SCM @var{cell})
Return the @sc{car}, or first field, of @var{cell}.
@end deftypefn

@deftypefn Macro SCM SCM_CDR (SCM @var{cell})
Return the @sc{cdr}, or second field, of @var{cell}.
@end deftypefn

@deftypefn Macro void SCM_SETCAR (SCM @var{cell}, SCM @var{x})
Set the @sc{car} of @var{cell} to @var{x}.
@end deftypefn

@deftypefn Macro void SCM_SETCDR (SCM @var{cell}, SCM @var{x})
Set the @sc{cdr} of @var{cell} to @var{x}.
@end deftypefn

@deftypefn Macro SCM SCM_CAAR (SCM @var{cell})
@deftypefnx Macro SCM SCM_CADR (SCM @var{cell})
@deftypefnx Macro SCM SCM_CDAR (SCM @var{cell}) @dots{}
@deftypefnx Macro SCM SCM_CDDDDR (SCM @var{cell})
Return the @sc{car} of the @sc{car} of @var{cell}, the @sc{car} of the
@sc{cdr} of @var{cell}, @i{et cetera}.
@end deftypefn


@node Vectors, Procedures, Pairs, Non-immediate Datatypes
@subsubsection Vectors, Strings, and Symbols

Vectors, strings, and symbols have some properties in common. They all
have a length, and they all have an array of elements. In the case of a
vector, the elements are @code{SCM} values; in the case of a string or
symbol, the elements are characters.

All these types store their length (along with some tagging bits) in the
@sc{car} of their header cell, and store a pointer to the elements in
their @sc{cdr}. Thus, the @code{SCM_CAR} and @code{SCM_CDR} macros
are (somewhat) meaningful when applied to these datatypes.
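As a toy illustration of this header layout, the following C code builds a string header whose @sc{car} holds the length shifted above an eight-bit type code and whose @sc{cdr} points to the characters. The shift amount, the type code, and the @code{toy_} names are assumptions made for the sketch, not Guile's definitions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t toy_scm;

struct toy_cell
{
  toy_scm car;
  toy_scm cdr;
};

/* Assumed type code kept in the low eight bits of the car. */
#define TOY_TC7_STRING 0x15u

/* The length lives above the tag bits; the cdr points at the elements. */
#define TOY_LENGTH(cell) ((size_t) ((cell)->car >> 8))
#define TOY_CHARS(cell) ((char *) (cell)->cdr)

/* Fill in CELL as a toy string header holding a copy of S.
   Returns 0 on success, -1 if memory runs out. */
int toy_init_string(struct toy_cell *cell, const char *s)
{
  size_t n = strlen(s);
  char *buf = malloc(n + 1);
  if (buf == NULL)
    return -1;
  memcpy(buf, s, n + 1);
  cell->car = ((toy_scm) n << 8) | TOY_TC7_STRING;
  cell->cdr = (toy_scm) buf;
  return 0;
}
```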

@deftypefn Macro int SCM_VECTORP (SCM @var{x})
Return non-zero iff @var{x} is a vector.
The results are undefined if @var{x} is an immediate value.
@end deftypefn

@deftypefn Macro int SCM_STRINGP (SCM @var{x})
Return non-zero iff @var{x} is a string.
The results are undefined if @var{x} is an immediate value.
@end deftypefn

@deftypefn Macro int SCM_SYMBOLP (SCM @var{x})
Return non-zero iff @var{x} is a symbol.
The results are undefined if @var{x} is an immediate value.
@end deftypefn

@deftypefn Macro int SCM_LENGTH (SCM @var{x})
Return the length of the object @var{x}.
The results are undefined if @var{x} is not a vector, string, or symbol.
@end deftypefn

@deftypefn Macro {SCM *} SCM_VELTS (SCM @var{x})
Return a pointer to the array of elements of the vector @var{x}.
The results are undefined if @var{x} is not a vector.
@end deftypefn

@deftypefn Macro {char *} SCM_CHARS (SCM @var{x})
Return a pointer to the characters of @var{x}.
The results are undefined if @var{x} is not a symbol or a string.
@end deftypefn

There are also a few magic values stuffed into memory before a symbol's
characters, but you don't want to know about those. What cruft!


@node Procedures, Closures, Vectors, Non-immediate Datatypes
@subsubsection Procedures

Guile provides two kinds of procedures: @dfn{closures}, which are the
result of evaluating a @code{lambda} expression, and @dfn{subrs}, which
are C functions packaged up as Scheme objects, to make them available to
Scheme programmers.

(There are actually other sorts of procedures: compiled closures, and
continuations; see the source code for details about them.)

@deftypefun SCM scm_procedure_p (SCM @var{x})
Return @code{SCM_BOOL_T} iff @var{x} is a Scheme procedure object, of
any sort. Otherwise, return @code{SCM_BOOL_F}.
@end deftypefun


@node Closures, Subrs, Procedures, Non-immediate Datatypes
@subsubsection Closures

[FIXME: this needs to be further subbed, but texinfo has no subsubsub]

A closure is a procedure object, generated as the value of a
@code{lambda} expression in Scheme. The representation of a closure is
straightforward: it contains a pointer to the code of the lambda
expression from which it was created, and a pointer to the environment
it closes over.

In Guile, each closure also has a property list, allowing the system to
store information about the closure. I'm not sure what this is used for
at the moment; perhaps the debugger?

@deftypefn Macro int SCM_CLOSUREP (SCM @var{x})
Return non-zero iff @var{x} is a closure. The results are
undefined if @var{x} is an immediate value.
@end deftypefn

@deftypefn Macro SCM SCM_PROCPROPS (SCM @var{x})
Return the property list of the closure @var{x}. The results are
undefined if @var{x} is not a closure.
@end deftypefn

@deftypefn Macro void SCM_SETPROCPROPS (SCM @var{x}, SCM @var{p})
Set the property list of the closure @var{x} to @var{p}. The results
are undefined if @var{x} is not a closure.
@end deftypefn

@deftypefn Macro SCM SCM_CODE (SCM @var{x})
Return the code of the closure @var{x}. The results are undefined if
@var{x} is not a closure.

This macro should probably only be used internally by the
interpreter, since the representation of the code is intimately
connected with the interpreter's implementation.
@end deftypefn

@deftypefn Macro SCM SCM_ENV (SCM @var{x})
Return the environment enclosed by @var{x}.
The results are undefined if @var{x} is not a closure.

This macro should probably only be used internally by the
interpreter, since the representation of the environment is intimately
connected with the interpreter's implementation.
@end deftypefn


@node Subrs, Ports, Closures, Non-immediate Datatypes
@subsubsection Subrs

[FIXME: this needs to be further subbed, but texinfo has no subsubsub]

A subr is a pointer to a C function, packaged up as a Scheme object to
make it callable by Scheme code. In addition to the function pointer,
the subr also contains a pointer to the name of the function, and
information about the number of arguments accepted by the C function, for
the sake of error checking.

There is no single type predicate macro that recognizes subrs, as
distinct from other kinds of procedures. The closest thing is
@code{scm_procedure_p}; see @ref{Procedures}.

@deftypefn Macro {char *} SCM_SNAME (SCM @var{x})
Return the name of the subr @var{x}. The results are undefined if
@var{x} is not a subr.
@end deftypefn

@deftypefun SCM scm_make_gsubr (char *@var{name}, int @var{req}, int @var{opt}, int @var{rest}, SCM (*@var{function})())
Create a new subr object named @var{name}, based on the C function
@var{function}, make it visible to Scheme as the value of a global
variable named @var{name}, and return the subr object.

The subr object accepts @var{req} required arguments, @var{opt} optional
arguments, and a @var{rest} argument iff @var{rest} is non-zero. The C
function @var{function} should accept @code{@var{req} + @var{opt}}
arguments, or @code{@var{req} + @var{opt} + 1} arguments if @var{rest}
is non-zero.

When a subr object is applied, it must be applied to at least @var{req}
arguments, or else Guile signals an error. @var{function} receives the
subr's first @var{req} arguments as its first @var{req} arguments. If
fewer than @var{opt} arguments remain, then @var{function} receives the
value @code{SCM_UNDEFINED} for any missing optional arguments. If
@var{rest} is non-zero, then any arguments after the first
@code{@var{req} + @var{opt}} are packaged up as a list and passed as
@var{function}'s last argument.
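To make these rules concrete, here is a hypothetical helper that lays out a call's actual arguments as described above: required arguments first, then @code{SCM_UNDEFINED} (modelled by an arbitrary marker word) for missing optional ones, with any surplus counted toward the rest list. It does not build the rest list itself. Treating surplus arguments as an error when there is no rest parameter is an assumption, and none of this is Guile's own dispatch code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_scm;

/* Assumed marker word standing in for SCM_UNDEFINED. */
#define TOY_UNDEFINED ((toy_scm) 0x84u)

/* Lay out NARGS actual arguments the way the C function behind a subr
   with REQ required and OPT optional arguments would receive them.
   OUT needs REQ + OPT slots.  Arguments beyond REQ + OPT would form the
   rest list; here only their count is stored in *REST_COUNT.
   Returns -1 for a wrong number of arguments. */
int toy_spread_args(const toy_scm *args, size_t nargs, size_t req,
                    size_t opt, int rest, toy_scm *out, size_t *rest_count)
{
  size_t i;

  if (nargs < req)
    return -1;                  /* too few arguments */
  if (!rest && nargs > req + opt)
    return -1;                  /* too many, and no rest argument (assumed) */

  for (i = 0; i < req + opt; i++)
    out[i] = i < nargs ? args[i] : TOY_UNDEFINED;

  *rest_count = nargs > req + opt ? nargs - (req + opt) : 0;
  return 0;
}
```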

Note that subrs can actually only accept a predefined set of
combinations of required, optional, and rest arguments. For example, a
subr can take one required argument, or one required and one optional
argument, but a subr can't take one required and two optional arguments.
It's bizarre, but that's the way the interpreter was written. If the
arguments to @code{scm_make_gsubr} do not fit one of the predefined
patterns, then @code{scm_make_gsubr} will return a compiled closure
object instead of a subr object.
@end deftypefun


@node Ports, , Subrs, Non-immediate Datatypes
@subsubsection Ports

Haven't written this yet, 'cos I don't understand ports yet.


@node Signalling Type Errors, , Non-immediate Datatypes, How Guile does it
@subsection Signalling Type Errors

Every function visible at the Scheme level should aggressively check the
types of its arguments, to avoid misinterpreting a value, and perhaps
causing a segmentation fault. Guile provides some macros to make this
easier.

@deftypefn Macro void SCM_ASSERT (int @var{test}, SCM @var{obj}, int @var{position}, char *@var{subr})
If @var{test} is zero, signal an error, attributed to the subroutine
named @var{subr}, operating on the value @var{obj}. The @var{position}
value determines exactly what sort of error to signal.

If @var{position} is a string, @code{SCM_ASSERT} raises a
``miscellaneous'' error whose message is that string.

Otherwise, @var{position} should be one of the values defined below.
@end deftypefn

@deftypefn Macro int SCM_ARG1
@deftypefnx Macro int SCM_ARG2
@deftypefnx Macro int SCM_ARG3
@deftypefnx Macro int SCM_ARG4
@deftypefnx Macro int SCM_ARG5
Signal a ``wrong type argument'' error. When used as the @var{position}
argument of @code{SCM_ASSERT}, @code{SCM_ARG@var{n}} claims that
@var{obj} has the wrong type for the @var{n}th argument of @var{subr}.

The only way to complain about the type of an argument after the fifth
is to use @code{SCM_ARGn}, defined below, which doesn't specify which
argument is wrong. You could pass your own error message to
@code{SCM_ASSERT} as the @var{position}, but then the error signalled is
a ``miscellaneous'' error, not a ``wrong type argument'' error. This
seems kludgy to me.
@comment Any function with more than two arguments is wrong --- Perlis
@comment Despite Perlis, I agree. Why not have two Macros, one with
@comment a string error message, and the other with an integer position
@comment that only claims a type error in an argument?
@comment --- Keith Wright
@end deftypefn

@deftypefn Macro int SCM_ARGn
As above, but does not specify which argument's type is incorrect.
@end deftypefn

@deftypefn Macro int SCM_WNA
Signal an error complaining that the function received the wrong number
of arguments.

Interestingly, the message is attributed to the function named by
@var{obj}, not @var{subr}, so @var{obj} must be a Scheme string object
naming the function. Usually, Guile catches these errors before ever
invoking the subr, so this rarely matters in practice.
@end deftypefn

@deftypefn Macro int SCM_OUTOFRANGE
Signal an error complaining that @var{obj} is ``out of range'' for
@var{subr}.
@end deftypefn


@node Defining New Types (Smobs), , How Guile does it, Top
@section Defining New Types (Smobs)

@dfn{Smobs} are Guile's mechanism for adding new non-immediate types to
the system.@footnote{The term ``smob'' was coined by Aubrey Jaffer, who
says it comes from ``small object'', referring to the fact that only the
@sc{cdr} and part of the @sc{car} of a smob's cell are available for
use.} To define a new smob type, the programmer provides Guile with
some essential information about the type (how to print it, how to
garbage collect it, and so on), and Guile returns a fresh type tag for
use in the @sc{car} of new cells. The programmer can then use
@code{scm_make_gsubr} to make a set of C functions that create and
operate on these objects visible to Scheme code.

(You can find a complete version of the example code used in this
section in the Guile distribution, in @file{doc/example-smob}. That
directory includes a makefile and a suitable @code{main} function, so
you can build a complete interactive Guile shell, extended with the
datatypes described here.)

@menu
* Describing a New Type::
* Creating Instances::
* Typechecking::
* Garbage Collecting Smobs::
* A Common Mistake In Allocating Smobs::
* Garbage Collecting Simple Smobs::
* A Complete Example::
@end menu

@node Describing a New Type, Creating Instances, Defining New Types (Smobs), Defining New Types (Smobs)
@subsection Describing a New Type

To define a new type, the programmer can supply four functions to
manage instances of the type; Guile provides a default for each one
that is not supplied:

@table @code
@item mark
Guile will apply this function to each instance of the new type it
encounters during garbage collection. This function is responsible for
telling the collector about any other non-immediate objects the object
refers to. The default smob mark function does not mark any data.
@xref{Garbage Collecting Smobs}, for more details.

@item free
Guile will apply this function to each instance of the new type it could
not find any live pointers to. The function should release all
resources held by the object and return the number of bytes released.
This is analogous to a Java finalizer: it is invoked at an unspecified
time (when garbage collection occurs) after the object is dead.
The default free function frees the smob data (if the size of the struct
passed to @code{scm_make_smob_type} or @code{scm_make_smob_type_mfpe} is
non-zero) using @code{scm_must_free} and returns the size of that
struct. @xref{Garbage Collecting Smobs}, for more details.

@item print
@c GJB:FIXME:: @var{exp} and @var{port} need to refer to a prototype of
@c the print function.... where is that, or where should it go?
Guile will apply this function to each instance of the new type to print
the value, as for @code{display} or @code{write}. The function should
write a printed representation of @var{exp} on @var{port}, in accordance
with the parameters in @var{pstate}. (For more information on print
states, see @ref{Ports}.) The default print function prints
@code{#<NAME ADDRESS>}, where @code{NAME} is the first argument passed
to @code{scm_make_smob_type} or @code{scm_make_smob_type_mfpe}.

@item equalp
If Scheme code asks the @code{equal?} function to compare two instances
of the same smob type, Guile calls this function. It should return
@code{SCM_BOOL_T} if @var{a} and @var{b} should be considered
@code{equal?}, or @code{SCM_BOOL_F} otherwise. If @code{equalp} is
@code{NULL}, @code{equal?} will assume that two instances of this type are
never @code{equal?} unless they are @code{eq?}.

@end table

To actually register the new smob type, you must call either
@code{scm_make_smob_type} or @code{scm_make_smob_type_mfpe}:

@deftypefun long scm_make_smob_type (const char *@var{name}, scm_sizet @var{size})
This function adds a new smob type, named @var{name}, with instance size
@var{size} to the system. The return value is a tag that is used in
creating instances of the type. If @var{size} is 0, then no memory will
be allocated when instances of the smob are created, and nothing will be
freed by the default free function.
@end deftypefun

Each of the following @code{scm_set_smob_XXX} functions registers a smob
special function for a given type.

@deftypefun void scm_set_smob_mark (long @var{tc}, SCM (*@var{mark}) (SCM))
This function sets the smob marking procedure for the smob type specified by
the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}.
@end deftypefun

@deftypefun void scm_set_smob_free (long @var{tc}, scm_sizet (*@var{free}) (SCM))
This function sets the smob freeing procedure for the smob type specified by
the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}.
@end deftypefun

@deftypefun void scm_set_smob_print (long @var{tc}, int (*@var{print}) (SCM, SCM, scm_print_state*))
This function sets the smob printing procedure for the smob type specified by
the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}.
@end deftypefun

@deftypefun void scm_set_smob_equalp (long @var{tc}, SCM (*@var{equalp}) (SCM, SCM))
This function sets the smob equality-testing predicate for the smob type specified by
the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}.
@end deftypefun

Instead of using @code{scm_make_smob_type} and calling each of the individual
@code{scm_set_smob_XXX} functions to register each special function
independently, you can use @code{scm_make_smob_type_mfpe} to register all
of the special functions at once as you create the smob type:

@deftypefun long scm_make_smob_type_mfpe (const char *@var{name}, scm_sizet @var{size}, SCM (*@var{mark}) (SCM), scm_sizet (*@var{free}) (SCM), int (*@var{print}) (SCM, SCM, scm_print_state*), SCM (*@var{equalp}) (SCM, SCM))
This function invokes @code{scm_make_smob_type} on its first two arguments
to add a new smob type named @var{name}, with instance size @var{size} to the system.
It also registers the @var{mark}, @var{free}, @var{print}, @var{equalp} smob
special functions for that new type. Any of these parameters can be @code{NULL}
to have that special function use Guile's default behaviour.
The return value is a tag that is used in creating instances of the type. If @var{size}
is 0, then no memory will be allocated when instances of the smob are created, and
nothing will be freed by the default free function.
@end deftypefun
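The registration scheme can be sketched with a small table of type descriptors. This is a toy model, not Guile's implementation: the tag is simply a table index, only the free function is modelled, and all @code{toy_} names are hypothetical. It shows how a @code{NULL} entry falls back to a default, here mirroring the default free behaviour described in the table of special functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Only the free function is modelled; mark, print and equalp would be
   further fields of the same kind. */
typedef size_t (*toy_free_fn) (void *data);

struct toy_smob_type
{
  const char *name;
  size_t size;
  toy_free_fn free_fn;          /* NULL means "use the default" */
};

#define TOY_MAX_SMOB_TYPES 16

static struct toy_smob_type toy_smob_types[TOY_MAX_SMOB_TYPES];
static long toy_num_smob_types = 0;

/* In the spirit of scm_make_smob_type: record the type and return a
   fresh tag (here just a table index), or -1 if the table is full. */
long toy_make_smob_type(const char *name, size_t size)
{
  if (toy_num_smob_types == TOY_MAX_SMOB_TYPES)
    return -1;
  toy_smob_types[toy_num_smob_types].name = name;
  toy_smob_types[toy_num_smob_types].size = size;
  toy_smob_types[toy_num_smob_types].free_fn = NULL;
  return toy_num_smob_types++;
}

/* In the spirit of scm_set_smob_free. */
void toy_set_smob_free(long tag, toy_free_fn fn)
{
  toy_smob_types[tag].free_fn = fn;
}

/* What the collector would do for a dead instance: call the registered
   function, or else free the data (if the type has a size) and report
   that many bytes released. */
size_t toy_free_instance(long tag, void *data)
{
  const struct toy_smob_type *t = &toy_smob_types[tag];
  if (t->free_fn != NULL)
    return t->free_fn(data);
  if (t->size != 0)
    free(data);
  return t->size;
}
```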

For example, here is how one might declare and register a new type
representing eight-bit grayscale images:
@example
#include <libguile.h>

long image_tag;

/* struct image and the mark_image, free_image and print_image
   functions must be declared before this point; see the complete
   version in doc/example-smob. */
void
init_image_type ()
@{
  image_tag = scm_make_smob_type_mfpe ("image", sizeof (struct image),
                                       mark_image, free_image,
                                       print_image, NULL);
@}
@end example


@node Creating Instances, Typechecking, Describing a New Type, Defining New Types (Smobs)
@subsection Creating Instances

Like other non-immediate types, smobs start with a cell whose @sc{car}
contains typing information, and whose @sc{cdr} is free for any use. For
smobs, the @sc{cdr} stores a pointer to the internal C structure holding
the smob-specific data. To create an instance of a smob type following
these conventions, you should use @code{SCM_NEWSMOB}:

@deftypefn Macro void SCM_NEWSMOB (SCM @var{value}, long @var{tag}, void *@var{data})
Make @var{value} contain a smob instance of the type with tag @var{tag}
and smob data @var{data}. @var{value} must be previously declared
as C type @code{SCM}.
@end deftypefn

Since it is often the case (e.g., in smob constructors) that you will
create a smob instance and return it, there is also a slightly specialized
macro for this situation:

@deftypefn Macro fn_returns SCM_RETURN_NEWSMOB (long @var{tag}, void *@var{data})
This macro expands to a block of code that creates a smob instance of
the type with tag @var{tag} and smob data @var{data}, and returns
that @code{SCM} value. It should be the last piece of code in
a block.
@end deftypefn
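The difference between the two macros is easiest to see in a toy version. In this sketch @code{malloc} stands in for Guile's cell allocator, and the @code{toy_} names are hypothetical. The point is that @code{TOY_RETURN_NEWSMOB} contains a @code{return} statement, which is why such a macro belongs at the end of a block.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t toy_scm;

struct toy_cell
{
  toy_scm car;
  toy_scm cdr;
};

/* In the spirit of SCM_NEWSMOB: a cell whose car holds the type tag and
   whose cdr points at the smob's C data. */
#define TOY_NEWSMOB(value, tag, data)                   \
  do                                                    \
    {                                                   \
      struct toy_cell *cell_ = malloc(sizeof *cell_);   \
      if (cell_ == NULL)                                \
        abort();                                        \
      cell_->car = (toy_scm) (tag);                     \
      cell_->cdr = (toy_scm) (data);                    \
      (value) = (toy_scm) cell_;                        \
    }                                                   \
  while (0)

/* In the spirit of SCM_RETURN_NEWSMOB: build the smob and return it
   from the enclosing function, so nothing after it ever runs. */
#define TOY_RETURN_NEWSMOB(tag, data)                   \
  do                                                    \
    {                                                   \
      toy_scm result_;                                  \
      TOY_NEWSMOB(result_, tag, data);                  \
      return result_;                                   \
    }                                                   \
  while (0)

static int toy_counter_value = 42;

/* Hypothetical smob constructor. */
toy_scm toy_make_counter(long tag)
{
  TOY_RETURN_NEWSMOB(tag, &toy_counter_value);
}
```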

Guile provides the following functions for managing memory, which are
often helpful when implementing smobs:

@deftypefun {char *} scm_must_malloc (long @var{len}, char *@var{what})
Allocate @var{len} bytes of memory, using @code{malloc}, and return a
pointer to them.

If there is not enough memory available, invoke the garbage collector,
and try once more. If there is still not enough, signal an error,
reporting that we could not allocate @var{what}.

This function also helps maintain statistics about the size of the heap.
@end deftypefun
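The retry policy described for @code{scm_must_malloc} can be sketched as follows. The collector hook, the statistics counters, and the @code{toy_} names are hypothetical. Where Guile signals a Scheme error, this toy prints a message and aborts.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the statistics Guile keeps about malloc'd memory. */
static size_t toy_bytes_malloced = 0;
static unsigned toy_gc_count = 0;

/* Hypothetical collector hook; a real one would reclaim unreachable
   objects (and their malloc'd data) before the retry. */
static void toy_gc(void)
{
  toy_gc_count++;
}

/* In the spirit of scm_must_malloc: try, collect once and retry, then
   give up with a message naming WHAT. */
void *toy_must_malloc(size_t len, const char *what)
{
  /* Ask for at least one byte so NULL always means failure. */
  void *p = malloc(len ? len : 1);
  if (p == NULL)
    {
      toy_gc();
      p = malloc(len ? len : 1);
    }
  if (p == NULL)
    {
      fprintf(stderr, "could not allocate %s\n", what);
      abort();
    }
  toy_bytes_malloced += len;
  return p;
}
```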
1269 | ||
1270 | @deftypefun {char *} scm_must_realloc (char *@var{addr}, long @var{olen}, long @var{len}, char *@var{what}) | |
1271 | Resize (and possibly relocate) the block of memory at @var{addr}, to | |
1272 | have a size of @var{len} bytes, by calling @code{realloc}. Return a | |
1273 | pointer to the new block. | |
1274 | ||
1275 | If there is not enough memory available, invoke the garbage collector, | |
1276 | and try once more. If there is still not enough, signal an error, | |
1277 | reporting that we could not allocate @var{what}. | |
1278 | ||
1279 | The value @var{olen} should be the old size of the block of memory at | |
1280 | @var{addr}; it is only used for keeping statistics on the size of the | |
1281 | heap. | |
1282 | @end deftypefun | |
1283 | ||
1284 | @deftypefun void scm_must_free (char *@var{addr}) | |
1285 | Free the block of memory at @var{addr}, using @code{free}. If | |
1286 | @var{addr} is zero, signal an error, complaining of an attempt to free | |
1287 | something that is already free. | |
1288 | ||
1289 | This does no record-keeping; instead, the smob's @code{free} function | |
1290 | must take care of that. | |
1291 | ||
1292 | This function isn't usually sufficiently different from the usual | |
1293 | @code{free} function to be worth using. | |
1294 | @end deftypefun | |
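
The allocate, collect, and retry policy of @code{scm_must_malloc} can be modeled without Guile.  The following standalone C sketch is only an illustration under stated assumptions: @code{demo_must_malloc}, @code{demo_collect}, @code{demo_raw_alloc}, and the counters are hypothetical names, not Guile functions.  The collector is a stub that only counts calls, and where @code{scm_must_malloc} would signal an error, the sketch returns @code{NULL}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the scm_must_malloc policy; none of these
   names are part of Guile.  */

static size_t demo_bytes_allocated = 0;       /* heap-size statistics */
static int demo_collections = 0;              /* times we "collected" */
static int demo_simulated_failures = 0;       /* force N malloc failures */
static const char *demo_last_failure = NULL;  /* WHAT of the last failure */

/* Stand-in for malloc that can be told to fail, for demonstration.  */
static void *
demo_raw_alloc (size_t len)
{
  if (demo_simulated_failures > 0)
    {
      demo_simulated_failures--;
      return NULL;
    }
  return malloc (len);
}

/* Stand-in for the garbage collector; a real one would free memory.  */
static void
demo_collect (void)
{
  demo_collections++;
}

/* Try to allocate LEN bytes; on failure, collect once and retry.
   If that also fails, remember WHAT and return NULL (Guile would
   signal an error instead of returning).  */
static void *
demo_must_malloc (size_t len, const char *what)
{
  void *p;

  assert (what != NULL);
  p = demo_raw_alloc (len);
  if (p == NULL)
    {
      demo_collect ();
      p = demo_raw_alloc (len);
    }
  if (p == NULL)
    {
      demo_last_failure = what;
      return NULL;
    }
  demo_bytes_allocated += len;   /* count only successful allocations */
  return p;
}
```

Setting @code{demo_simulated_failures} to 1 makes the first attempt fail, so the call succeeds after one simulated collection.  Setting it to 2 exhausts the single retry, and the sketch records the @var{what} string.  Only successful allocations are added to @code{demo_bytes_allocated}, mirroring the heap statistics mentioned above.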


Continuing the above example, if the global variable @code{image_tag}
contains a tag returned by @code{scm_newsmob}, here is how we could
construct a smob whose @sc{cdr} contains a pointer to a freshly
allocated @code{struct image}:

@example
struct image @{
  int width, height;
  char *pixels;

  /* The name of this image */
  SCM name;

  /* A function to call when this image is
     modified, e.g., to update the screen,
     or SCM_BOOL_F if no action necessary */
  SCM update_func;
@};

SCM
make_image (SCM name, SCM s_width, SCM s_height)
@{
  struct image *image;
  int width, height;

  SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name,
              SCM_ARG1, "make-image");
  SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image");
  SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image");

  width = SCM_INUM (s_width);
  height = SCM_INUM (s_height);

  image = (struct image *) scm_must_malloc (sizeof (struct image), "image");
  image->width = width;
  image->height = height;
  image->pixels = scm_must_malloc (width * height, "image pixels");
  image->name = name;
  image->update_func = SCM_BOOL_F;

  SCM_RETURN_NEWSMOB (image_tag, image);
@}
@end example
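
Conceptually, the smob that @code{make_image} returns is one cell whose @sc{car} holds @code{image_tag} and whose @sc{cdr} points at the @code{struct image}.  The standalone sketch below models that layout with an ordinary C struct.  It is a hypothetical illustration: @code{struct demo_cell}, @code{demo_make_smob}, @code{demo_make_image}, and the tag value are made up, cells here come from plain @code{malloc} rather than a garbage-collected heap, and Guile's real cell encoding is not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a smob cell: a tag word plus a data pointer.  */
struct demo_cell
{
  uintptr_t car;   /* type tag (Guile also keeps the mark bit here) */
  void *cdr;       /* pointer to the C-level representation */
};

struct demo_image
{
  int width, height;
  char *pixels;
};

static const uintptr_t demo_image_tag = 0x27;   /* made-up tag value */

/* Package DATA as a new smob of type TAG, as SCM_RETURN_NEWSMOB
   does conceptually.  */
static struct demo_cell *
demo_make_smob (uintptr_t tag, void *data)
{
  struct demo_cell *cell = malloc (sizeof *cell);
  assert (cell != NULL);
  cell->car = tag;
  cell->cdr = data;
  return cell;
}

/* Build an image the way make_image does: allocate the struct and
   its pixel buffer, then wrap the struct in a smob cell.  */
static struct demo_cell *
demo_make_image (int width, int height)
{
  struct demo_image *image = malloc (sizeof *image);
  assert (image != NULL && width > 0 && height > 0);
  image->width = width;
  image->height = height;
  image->pixels = calloc ((size_t) width * (size_t) height, 1);
  assert (image->pixels != NULL);
  return demo_make_smob (demo_image_tag, image);
}
```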


@node Typechecking, Garbage Collecting Smobs, Creating Instances, Defining New Types (Smobs)
@subsection Typechecking

Functions that operate on smobs should check the types of their
arguments carefully, to avoid misinterpreting some other datatype as a
smob and perhaps causing a segmentation fault.  Fortunately, this is
simple to do: the function need only verify that its argument is a
non-immediate whose @sc{car} is the type tag returned by
@code{scm_newsmob}.

For example, here is a simple function that operates on an image smob
and checks the type of its argument.  We also present an expanded
version of the @code{init_image_type} function, to make
@code{clear_image} and the image constructor function @code{make_image}
visible to Scheme code.
@example
SCM
clear_image (SCM image_smob)
@{
  int area;
  struct image *image;

  SCM_ASSERT ((SCM_NIMP (image_smob)
               && SCM_CAR (image_smob) == image_tag),
              image_smob, SCM_ARG1, "clear-image");

  image = (struct image *) SCM_CDR (image_smob);
  area = image->width * image->height;
  memset (image->pixels, 0, area);

  /* Invoke the image's update function.  */
  if (image->update_func != SCM_BOOL_F)
    scm_apply (image->update_func, SCM_EOL, SCM_EOL);

  return SCM_UNSPECIFIED;
@}


void
init_image_type ()
@{
  image_tag = scm_newsmob (&image_funs);

  scm_make_gsubr ("make-image", 3, 0, 0, make_image);
  scm_make_gsubr ("clear-image", 1, 0, 0, clear_image);
@}
@end example
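
The check in @code{clear_image} has two steps: rule out immediates, which have no cell to look inside, and only then compare the @sc{car} with the tag.  The standalone sketch below models this ordering with a hypothetical encoding in which immediates have their low bit set and cell pointers do not.  Guile's actual tag bits differ, and @code{demo_value}, @code{demo_imp}, and @code{demo_image_p} are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value representation: a word that is either an
   immediate (low bit 1, e.g. a small integer) or a pointer to a cell
   (low bit 0, since cells are at least 2-byte aligned).  */
typedef uintptr_t demo_value;

struct demo_cell
{
  uintptr_t car;   /* type tag */
  void *cdr;       /* smob data */
};

static const uintptr_t demo_image_tag = 0x27;   /* made-up tag */

/* Encode a small integer as an immediate.  */
static demo_value
demo_make_fixnum (intptr_t n)
{
  return ((uintptr_t) n << 1) | 1;
}

/* Is X an immediate (no cell behind it)?  */
static int
demo_imp (demo_value x)
{
  return (x & 1) != 0;
}

/* Analogue of SCM_NIMP (x) && SCM_CAR (x) == image_tag: only
   dereference X after establishing that it points to a cell.  */
static int
demo_image_p (demo_value x)
{
  return !demo_imp (x)
         && ((struct demo_cell *) x)->car == demo_image_tag;
}
```

Testing @code{demo_imp} first matters: applying the @sc{car} comparison to an immediate such as the integer 4 would dereference a value that is not a pointer, which is exactly the kind of segmentation fault the type check exists to prevent.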

Note that checking types is a little more complicated during garbage
collection; see the description of @code{SCM_GCTYP16} in @ref{Garbage
Collecting Smobs}.

@c GJB:FIXME:: should talk about guile-snarf somewhere!

@node Garbage Collecting Smobs, A Common Mistake In Allocating Smobs, Typechecking, Defining New Types (Smobs)
@subsection Garbage Collecting Smobs

Once a smob has been released to the tender mercies of the Scheme
system, it must be prepared to survive garbage collection.  Guile calls
the @code{mark} and @code{free} functions of the @code{scm_smobfuns}
structure to manage this.

As described before (@pxref{Garbage Collection}), every object in the
Scheme system has a @dfn{mark bit}, which the garbage collector uses to
tell live objects from dead ones.  When collection starts, every
object's mark bit is clear.  The collector traces pointers through the
heap, starting from objects known to be live, and sets the mark bit on
each object it encounters.  When no more unmarked objects can be
reached, the collector walks all objects, live and dead, frees those
whose mark bits are still clear, and clears the mark bit on the others.

The two main portions of the collection are called the @dfn{mark phase},
during which the collector marks live objects, and the @dfn{sweep
phase}, during which the collector frees all unmarked objects.
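
The two phases can be illustrated with a tiny collector over a fixed array of objects.  This is a hypothetical model, not Guile's collector: each @code{struct demo_obj} has a mark bit and at most two references stored as array indices, the roots are passed in explicitly instead of being found by scanning the stack, and freeing an object only clears its @code{in_use} flag.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HEAP_SIZE 8

/* Hypothetical heap object: a mark bit and up to two references,
   stored as heap indices (-1 means no reference).  */
struct demo_obj
{
  int in_use;
  int marked;
  int refs[2];
};

static struct demo_obj demo_heap[DEMO_HEAP_SIZE];

/* Claim a free slot whose references are A and B; -1 if full.  */
static int
demo_alloc (int a, int b)
{
  for (int i = 0; i < DEMO_HEAP_SIZE; i++)
    if (!demo_heap[i].in_use)
      {
        demo_heap[i].in_use = 1;
        demo_heap[i].marked = 0;
        demo_heap[i].refs[0] = a;
        demo_heap[i].refs[1] = b;
        return i;
      }
  return -1;
}

/* Mark phase: set the mark bit first, then trace the references.
   Checking the bit on entry makes cycles terminate.  */
static void
demo_mark (int i)
{
  if (i < 0 || demo_heap[i].marked)
    return;
  demo_heap[i].marked = 1;
  demo_mark (demo_heap[i].refs[0]);
  demo_mark (demo_heap[i].refs[1]);
}

/* Sweep phase: free unmarked objects and clear the mark bit on the
   survivors.  Returns the number of objects freed.  */
static int
demo_sweep (void)
{
  int freed = 0;
  for (int i = 0; i < DEMO_HEAP_SIZE; i++)
    {
      if (!demo_heap[i].in_use)
        continue;
      if (demo_heap[i].marked)
        demo_heap[i].marked = 0;
      else
        {
          demo_heap[i].in_use = 0;
          freed++;
        }
    }
  return freed;
}

/* One full collection, starting from NROOTS explicit roots.  */
static int
demo_collect (const int *roots, size_t nroots)
{
  for (size_t r = 0; r < nroots; r++)
    demo_mark (roots[r]);
  return demo_sweep ();
}
```

For example, if two objects refer only to each other and a root points to one of them, both survive and each is visited once, while any object that no root can reach is reclaimed by the sweep.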

The mark bit of a smob lives in its @sc{car}, along with the smob's type
tag.  When the collector encounters a smob, it sets the smob's mark bit,
and uses the smob's type tag to find the appropriate @code{mark}
function for that smob: the one listed in that smob's
@code{scm_smobfuns} structure.  It then calls the @code{mark} function,
passing it the smob as its only argument.

The @code{mark} function is responsible for marking any other Scheme
objects the smob refers to.  If it does not do so, the objects' mark
bits will still be clear when the collector begins to sweep, and the
collector will free them.  If this happens, any code operating on the
smob will probably break, or at least be confused; the smob's
@code{SCM} values will have become dangling references.

To mark an arbitrary Scheme object, the @code{mark} function may call
this function:

@deftypefun void scm_gc_mark (SCM @var{x})
Mark the object @var{x}, and recurse on any objects @var{x} refers to.
If @var{x}'s mark bit is already set, return immediately.
@end deftypefun

For example, here is how we might write the @code{mark} function for
the image smob type discussed above:
@example
@group
SCM
mark_image (SCM image_smob)
@{
  /* Mark the image's name and update function.  */
  struct image *image = (struct image *) SCM_CDR (image_smob);

  scm_gc_mark (image->name);
  scm_gc_mark (image->update_func);

  return SCM_BOOL_F;
@}
@end group
@end example

Note that, even though the image's @code{update_func} could be an
arbitrarily complex structure (representing a procedure and any values
enclosed in its environment), @code{scm_gc_mark} will recurse as
necessary to mark all its components.  Because @code{scm_gc_mark} sets
an object's mark bit before it recurses, it is not confused by
circular structures.

As an optimization, the collector will mark whatever value is returned
by the @code{mark} function; this helps limit the depth of recursion
during the mark phase.  Thus, the code above could also be written as:
@example
@group
SCM
mark_image (SCM image_smob)
@{
  /* Mark the image's name, and return the update function
     for the collector to mark.  */
  struct image *image = (struct image *) SCM_CDR (image_smob);

  scm_gc_mark (image->name);
  return image->update_func;
@}
@end group
@end example
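
Marking the returned value in a loop, rather than through another recursive call, is what keeps the recursion shallow.  The sketch below models the idea.  It is hypothetical and is not Guile's @code{scm_gc_mark}: each object carries a per-type mark function, like the @code{mark} member of @code{scm_smobfuns}, that may mark some references itself and returns one more object for the collector to continue with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object with a per-type mark function, mirroring the
   mark member of scm_smobfuns.  */
struct demo_obj;
typedef struct demo_obj *(*demo_mark_fn) (struct demo_obj *);

struct demo_obj
{
  int marked;
  demo_mark_fn mark;       /* NULL: nothing else to mark */
  struct demo_obj *next;   /* a reference, like image->update_func */
};

/* Mark X, then keep marking whatever its mark function returns.
   A loop instead of recursion keeps the C stack depth constant.  */
static void
demo_gc_mark (struct demo_obj *x)
{
  while (x != NULL && !x->marked)
    {
      x->marked = 1;
      x = x->mark != NULL ? x->mark (x) : NULL;
    }
}

/* A mark function that hands its only reference back to the
   collector instead of calling demo_gc_mark on it.  */
static struct demo_obj *
demo_mark_next (struct demo_obj *x)
{
  return x->next;
}
```

A chain of objects linked only through returned values is marked in constant stack depth, however long it is, and cycles still terminate because the loop stops at the first object whose mark bit is already set.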


Finally, when the collector encounters an unmarked smob during the sweep
phase, it uses the smob's tag to find the appropriate @code{free}
function for the smob.  It then calls the function, passing it the smob
as its only argument.

The @code{free} function must release any resources used by the smob.
However, it need not free objects managed by the collector; the
collector will take care of them.  The return type of the @code{free}
function should be @code{scm_sizet}, an unsigned integral type; the
@code{free} function should return the number of bytes released, to help
the collector maintain statistics on the size of the heap.

Here is how we might write the @code{free} function for the image smob
type:
@example
scm_sizet
free_image (SCM image_smob)
@{
  struct image *image = (struct image *) SCM_CDR (image_smob);
  scm_sizet size = image->width * image->height + sizeof (*image);

  free (image->pixels);
  free (image);

  return size;
@}
@end example
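
Because each @code{free} function reports the number of bytes it released, the sweep can keep a running total of heap usage without knowing anything about a smob's internals.  The standalone sketch below models that bookkeeping.  The per-type table @code{demo_free_fns}, the counter @code{demo_heap_bytes}, and the other @code{demo_} names are hypothetical, and Guile's real sweep is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical image type, sized like the one in the text.  */
struct demo_image
{
  int width, height;
  char *pixels;
};

/* Hypothetical cell: TAG indexes the table of free functions.  */
struct demo_cell
{
  int tag;
  int marked;
  void *data;     /* NULL once the slot has been freed */
};

typedef size_t (*demo_free_fn) (struct demo_cell *);

static size_t demo_heap_bytes = 0;   /* bytes believed to be in use */

/* Same size formula as free_image: pixels plus the struct itself.  */
static size_t
demo_free_image (struct demo_cell *cell)
{
  struct demo_image *image = cell->data;
  size_t size = (size_t) image->width * image->height + sizeof *image;
  free (image->pixels);
  free (image);
  return size;                       /* report what was released */
}

static demo_free_fn demo_free_fns[] = { demo_free_image };

/* Sweep: call the type's free function on each unmarked cell and
   subtract the reported size; clear the marks of survivors.  */
static void
demo_sweep (struct demo_cell *cells, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (cells[i].data == NULL)
        continue;
      if (cells[i].marked)
        cells[i].marked = 0;
      else
        {
          demo_heap_bytes -= demo_free_fns[cells[i].tag] (&cells[i]);
          cells[i].data = NULL;
        }
    }
}

/* Allocate an image and add its size to the statistics.  */
static void *
demo_new_image (int width, int height)
{
  struct demo_image *image = malloc (sizeof *image);
  assert (image != NULL);
  image->width = width;
  image->height = height;
  image->pixels = malloc ((size_t) width * height);
  assert (image->pixels != NULL);
  demo_heap_bytes += (size_t) width * height + sizeof *image;
  return image;
}
```

The size reported by @code{demo_free_image} uses the same formula as @code{free_image} above, so it exactly cancels what @code{demo_new_image} added when the image was created.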

During the sweep phase, the garbage collector will clear the mark bits
on all live objects.  The code which implements a smob need not do this
itself.

There is no way for smob code to be notified when collection is
complete.

Note that, since a smob's mark bit lives in its @sc{car}, along with the
smob's type tag, the technique for checking the type of a smob described
in @ref{Typechecking} may not work during GC.  If you need to find out
whether a given object is a particular smob type during GC, use the
following macro:

@deftypefn Macro int SCM_GCTYP16 (SCM @var{x})
Return the type bits of the smob @var{x}, with the mark bit clear.

Use this macro instead of @code{SCM_CAR} to check the type of a smob
during GC.  Usually, only code called by the smob's @code{mark} function
needs to worry about this.
@end deftypefn
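
A plain comparison can fail during GC because the mark bit shares a word with the type tag.  The sketch below shows the masking idea with a made-up bit layout.  @code{DEMO_MARK_BIT}, @code{demo_set_mark}, @code{demo_gc_type}, and the tag value are hypothetical, and Guile's actual mask and bit positions are not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the tag word stores the type in its low bits
   and reserves one bit for the collector's mark.  */
#define DEMO_MARK_BIT ((uintptr_t) 0x80)

static const uintptr_t demo_image_tag = 0x27;   /* made-up tag */

/* What the collector does to the tag word when it marks a cell.  */
static uintptr_t
demo_set_mark (uintptr_t car)
{
  return car | DEMO_MARK_BIT;
}

/* Analogue of SCM_GCTYP16: the type bits with the mark bit cleared,
   so the comparison works whether or not the cell is marked.  */
static uintptr_t
demo_gc_type (uintptr_t car)
{
  return car & ~DEMO_MARK_BIT;
}
```

Comparing a marked tag word directly with the tag gives the wrong answer, while comparing @code{demo_gc_type} of it gives the right one whether or not the cell has been marked yet.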

It is usually a good idea to minimize the amount of processing done
during garbage collection; keep @code{mark} and @code{free} functions
very simple.  Since collections occur at unpredictable times, it is easy
for any unusual activity to interfere with normal code.


@node A Common Mistake In Allocating Smobs, Garbage Collecting Simple Smobs, Garbage Collecting Smobs, Defining New Types (Smobs)
@subsection A Common Mistake In Allocating Smobs

When constructing new objects, you must be careful that the garbage
collector can always find any new objects you allocate.  For example,
suppose we wrote the @code{make_image} function this way:

@example
SCM
make_image (SCM name, SCM s_width, SCM s_height)
@{
  struct image *image;
  SCM image_smob;
  int width, height;

  SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name,
              SCM_ARG1, "make-image");
  SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image");
  SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image");

  width = SCM_INUM (s_width);
  height = SCM_INUM (s_height);

  image = (struct image *) scm_must_malloc (sizeof (struct image), "image");
  image->width = width;
  image->height = height;
  image->pixels = scm_must_malloc (width * height, "image pixels");

  /* THESE TWO LINES HAVE CHANGED: */
  image->name = scm_string_copy (name);
  image->update_func = scm_make_gsubr (@dots{});

  SCM_NEWCELL (image_smob);
  SCM_SETCDR (image_smob, (SCM) image);
  SCM_SETCAR (image_smob, image_tag);

  return image_smob;
@}
@end example

This code is incorrect.  The calls to @code{scm_string_copy} and
@code{scm_make_gsubr} allocate fresh objects.  Allocating any new object
may cause the garbage collector to run.  If @code{scm_make_gsubr}
invokes a collection, the garbage collector has no way to discover that
@code{image->name} points to the new string object; the @code{image}
structure is not yet part of any Scheme object, so the garbage collector
will not traverse it.  Since the garbage collector cannot find any
references to the new string object, it will free it, leaving
@code{image} pointing to a dead object.

A correct implementation might say, instead:
@example
  image->name = SCM_BOOL_F;
  image->update_func = SCM_BOOL_F;

  SCM_NEWCELL (image_smob);
  SCM_SETCDR (image_smob, (SCM) image);
  SCM_SETCAR (image_smob, image_tag);

  image->name = scm_string_copy (name);
  image->update_func = scm_make_gsubr (@dots{});

  return image_smob;
@end example

Now, by the time we allocate the new string and function objects,
@code{image_smob} points to @code{image}.  If the garbage collector
scans the stack, it will find a reference to @code{image_smob} and
traverse @code{image}, so any objects @code{image} points to will be
preserved.
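
The failure can be reproduced with a toy collector.  In the hypothetical model below, the collector sees only objects reachable from @code{demo_root}, which stands in for the stack scan.  An image smob's slot has its mark step follow the C-side @code{struct demo_image} to its @code{name}, and any object referenced only from C memory outside the heap is invisible.  All names are illustrative, not Guile's API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NOBJS 4

/* C-side data, like struct image: it lives outside the heap.  */
struct demo_image
{
  int name;   /* heap index of the name object, or -1 */
};

/* Hypothetical heap slot.  An image smob's slot points at its
   struct demo_image; other slots (e.g. strings) have IMAGE == NULL.  */
struct demo_slot
{
  int in_use, marked;
  struct demo_image *image;
};

static struct demo_slot demo_heap[DEMO_NOBJS];
static int demo_root = -1;   /* stands in for the C stack scan */

/* Claim a free slot; -1 if the heap is full.  */
static int
demo_alloc (struct demo_image *image)
{
  for (int i = 0; i < DEMO_NOBJS; i++)
    if (!demo_heap[i].in_use)
      {
        demo_heap[i].in_use = 1;
        demo_heap[i].marked = 0;
        demo_heap[i].image = image;
        return i;
      }
  return -1;
}

/* Mark slot I; for an image smob, also mark its name, as
   mark_image does.  */
static void
demo_mark (int i)
{
  if (i < 0 || demo_heap[i].marked)
    return;
  demo_heap[i].marked = 1;
  if (demo_heap[i].image != NULL)
    demo_mark (demo_heap[i].image->name);
}

/* Mark from the root, then sweep away everything unmarked.  */
static void
demo_collect (void)
{
  demo_mark (demo_root);
  for (int i = 0; i < DEMO_NOBJS; i++)
    {
      if (!demo_heap[i].marked)
        {
          demo_heap[i].in_use = 0;
          demo_heap[i].image = NULL;
        }
      demo_heap[i].marked = 0;
    }
}
```

Allocating the name before any rooted smob refers to it leaves it unreachable, so a collection at that point reclaims it.  Creating and rooting the smob first, with the name field initialized to a harmless value (-1 here, playing the role of @code{SCM_BOOL_F}), and only then allocating the name keeps it alive.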


@node Garbage Collecting Simple Smobs, A Complete Example, A Common Mistake In Allocating Smobs, Defining New Types (Smobs)
@subsection Garbage Collecting Simple Smobs

It is often useful to define very simple smob types: smobs which have
no data to mark, other than the cell itself, or smobs whose @sc{cdr} is
simply an ordinary Scheme object, to be marked recursively.  Guile
provides some functions to handle these common cases; you can use them
as your smob type's @code{mark} and @code{free} functions, if your
smob's structure is simple enough.

If the smob refers to no other Scheme objects, then no action is
necessary; the garbage collector has already marked the smob cell
itself.  In that case, you can use zero as your mark function.

@deftypefun SCM scm_markcdr (SCM @var{x})
Mark the references in the smob @var{x}, assuming that @var{x}'s
@sc{cdr} contains an ordinary Scheme object, and @var{x} refers to no
other objects.  This function simply returns @var{x}'s @sc{cdr}; the
collector then marks the returned object, as described in
@ref{Garbage Collecting Smobs}.
@end deftypefun

@deftypefun scm_sizet scm_free0 (SCM @var{x})
Do nothing; return zero.  This function is appropriate for smobs that
use either zero or @code{scm_markcdr} as their marking functions, and
refer to no heap storage, including memory managed by @code{malloc},
other than the smob's header cell.
@end deftypefun


@node A Complete Example, , Garbage Collecting Simple Smobs, Defining New Types (Smobs)
@subsection A Complete Example

Here is the complete text of the implementation of the image datatype,
as presented in the sections above.  We also provide a definition for
the smob's @code{print} function, and make some objects and functions
static, to clarify exactly what the surrounding code is using.

As mentioned above, you can find this code in the Guile distribution, in
@file{doc/example-smob}.  That directory includes a makefile and a
suitable @code{main} function, so you can build a complete interactive
Guile shell, extended with the datatypes described here.

@example
/* file "image-type.c" */

#include <stdlib.h>
#include <string.h>
#include <libguile.h>

static long image_tag;

struct image @{
  int width, height;
  char *pixels;

  /* The name of this image */
  SCM name;

  /* A function to call when this image is
     modified, e.g., to update the screen,
     or SCM_BOOL_F if no action necessary */
  SCM update_func;
@};

static SCM
make_image (SCM name, SCM s_width, SCM s_height)
@{
  struct image *image;
  SCM image_smob;
  int width, height;

  SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name,
              SCM_ARG1, "make-image");
  SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image");
  SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image");

  width = SCM_INUM (s_width);
  height = SCM_INUM (s_height);

  image = (struct image *) scm_must_malloc (sizeof (struct image), "image");
  image->width = width;
  image->height = height;
  image->pixels = scm_must_malloc (width * height, "image pixels");
  image->name = name;
  image->update_func = SCM_BOOL_F;

  SCM_NEWCELL (image_smob);
  SCM_SETCDR (image_smob, (SCM) image);
  SCM_SETCAR (image_smob, image_tag);

  return image_smob;
@}

static SCM
clear_image (SCM image_smob)
@{
  int area;
  struct image *image;

  SCM_ASSERT ((SCM_NIMP (image_smob)
               && SCM_CAR (image_smob) == image_tag),
              image_smob, SCM_ARG1, "clear-image");

  image = (struct image *) SCM_CDR (image_smob);
  area = image->width * image->height;
  memset (image->pixels, 0, area);

  /* Invoke the image's update function.  */
  if (image->update_func != SCM_BOOL_F)
    scm_apply (image->update_func, SCM_EOL, SCM_EOL);

  return SCM_UNSPECIFIED;
@}

static SCM
mark_image (SCM image_smob)
@{
  struct image *image = (struct image *) SCM_CDR (image_smob);

  scm_gc_mark (image->name);
  return image->update_func;
@}

static scm_sizet
free_image (SCM image_smob)
@{
  struct image *image = (struct image *) SCM_CDR (image_smob);
  scm_sizet size = image->width * image->height + sizeof (struct image);

  free (image->pixels);
  free (image);

  return size;
@}

static int
print_image (SCM image_smob, SCM port, scm_print_state *pstate)
@{
  struct image *image = (struct image *) SCM_CDR (image_smob);

  scm_puts ("#<image ", port);
  scm_display (image->name, port);
  scm_puts (">", port);

  /* non-zero means success */
  return 1;
@}

static scm_smobfuns image_funs = @{
  mark_image, free_image, print_image, 0
@};

void
init_image_type ()
@{
  image_tag = scm_newsmob (&image_funs);

  scm_make_gsubr ("clear-image", 1, 0, 0, clear_image);
  scm_make_gsubr ("make-image", 3, 0, 0, make_image);
@}
@end example

Here is a sample build and interaction with the code from the
@file{example-smob} directory, on the author's machine:

@example
zwingli:example-smob$ make CC=gcc
gcc `guile-config compile` -c image-type.c -o image-type.o
gcc `guile-config compile` -c myguile.c -o myguile.o
gcc image-type.o myguile.o `guile-config link` -o myguile
zwingli:example-smob$ ./myguile
guile> make-image
#<primitive-procedure make-image>
guile> (define i (make-image "Whistler's Mother" 100 100))
guile> i
#<image Whistler's Mother>
guile> (clear-image i)
guile> (clear-image 4)
ERROR: In procedure clear-image in expression (clear-image 4):
ERROR: Wrong type argument in position 1: 4
ABORT: (wrong-type-arg)

Type "(backtrace)" to get more information.
guile>
@end example

@bye