1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Defining New Types (Smobs)
8 @section Defining New Types (Smobs)
9
10 @dfn{Smobs} are Guile's mechanism for adding new primitive types to
11 the system. The term ``smob'' was coined by Aubrey Jaffer, who says
12 it comes from ``small object'', referring to the fact that they are
13 quite limited in size: they can hold just one pointer to a larger
14 memory block plus 16 extra bits.
15
16 To define a new smob type, the programmer provides Guile with some
17 essential information about the type --- how to print it, how to
18 garbage collect it, and so on --- and Guile allocates a fresh type tag
19 for it. The programmer can then use @code{scm_c_define_gsubr} to make
20 a set of C functions visible to Scheme code that create and operate on
21 these objects.
22
23 (You can find a complete version of the example code used in this
24 section in the Guile distribution, in @file{doc/example-smob}. That
25 directory includes a makefile and a suitable @code{main} function, so
26 you can build a complete interactive Guile shell, extended with the
27 datatypes described here.)
28
29 @menu
30 * Describing a New Type::
31 * Creating Instances::
32 * Type checking::
33 * Garbage Collecting Smobs::
34 * Garbage Collecting Simple Smobs::
35 * Remembering During Operations::
36 * Double Smobs::
37 * The Complete Example::
38 @end menu
39
40 @node Describing a New Type
41 @subsection Describing a New Type
42
43 To define a new type, the programmer must write four functions to
44 manage instances of the type:
45
46 @table @code
47 @item mark
48 Guile will apply this function to each instance of the new type it
49 encounters during garbage collection. This function is responsible for
50 telling the collector about any other @code{SCM} values that the object
51 has stored. The default smob mark function does nothing.
52 @xref{Garbage Collecting Smobs}, for more details.
53
54 @item free
55 Guile will apply this function to each instance of the new type that is
56 to be deallocated. The function should release all resources held by
57 the object. This is analogous to a Java finalization method: it is
58 invoked at an unspecified time (when garbage collection occurs) after
59 the object is dead. The default free function frees the smob data (if
60 the size of the struct passed to @code{scm_make_smob_type} is non-zero)
61 using @code{scm_gc_free}. @xref{Garbage Collecting Smobs}, for more
62 details.
63
64 @item print
65 Guile will apply this function to each instance of the new type to print
66 the value, as for @code{display} or @code{write}. The default print
67 function prints @code{#<NAME ADDRESS>} where @code{NAME} is the first
68 argument passed to @code{scm_make_smob_type}. For more information on
69 printing, see @ref{Port Data}.
70
71 @item equalp
72 If Scheme code asks the @code{equal?} function to compare two instances
73 of the same smob type, Guile calls this function with the two instances
74 as its arguments @var{a} and @var{b}. It should return @code{SCM_BOOL_T}
75 if they should be considered @code{equal?}, or @code{SCM_BOOL_F} otherwise. If @code{equalp} is
76 @code{NULL}, @code{equal?} will assume that two instances of this type are
77 never @code{equal?} unless they are @code{eq?}.
78
79 @end table
80
81 To actually register the new smob type, call @code{scm_make_smob_type}.
82 It returns a value of type @code{scm_t_bits} which identifies the new
83 smob type.
84
85 The four special functions described above are registered by calling
86 one of @code{scm_set_smob_mark}, @code{scm_set_smob_free},
87 @code{scm_set_smob_print}, or @code{scm_set_smob_equalp}, as
88 appropriate. Each function is intended to be used at most once per
89 type, and the call should be placed immediately following the call to
90 @code{scm_make_smob_type}.
91
92 There can be at most 256 different smob types in the system.
93 Instead of registering a huge number of smob types (for example, one
94 for each relevant C struct in your application), it is sometimes
95 better to register just one and implement a second layer of type
96 dispatching on top of it. This second layer might use the 16 extra
97 bits as an extended type, for example.
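
The second layer of dispatching mentioned above can be sketched in plain C, without libguile. This is only an illustration under stated assumptions: @code{struct app_object} stands in for a smob, its @code{subtype} field plays the role of the 16 extra bits (which real code would read with @code{SCM_SMOB_FLAGS}), and its @code{data} field plays the role of the immediate word (@code{SCM_SMOB_DATA}). All names in the sketch (@code{app_subtype}, @code{app_object}, @code{app_area}, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subtype codes; each must fit in 16 bits.  */
enum app_subtype { APP_POINT = 0, APP_RECT = 1, APP_SUBTYPE_COUNT };

/* Stand-in for a single registered smob type.  */
struct app_object {
  uint16_t subtype;   /* would live in the smob's 16 extra bits */
  void *data;         /* would live in the smob's immediate word */
};

struct point { int x, y; };
struct rect { int w, h; };

/* One handler per subtype.  A point has no area.  */
static int
point_area (void *data)
{
  (void) data;
  return 0;
}

static int
rect_area (void *data)
{
  struct rect *r = data;
  return r->w * r->h;
}

typedef int (*area_fn) (void *data);

/* The second dispatch layer: a table indexed by the subtype code.  */
static const area_fn area_table[APP_SUBTYPE_COUNT] = {
  point_area,   /* APP_POINT */
  rect_area     /* APP_RECT */
};

int
app_area (const struct app_object *obj)
{
  /* Reject unknown subtype codes before indexing the table.  */
  assert (obj != NULL && obj->subtype < APP_SUBTYPE_COUNT);
  return area_table[obj->subtype] (obj->data);
}
```

With this arrangement an application registers one smob type with Guile and keeps its own table of up to 65536 subtypes, rather than spending one of the 256 smob type slots on each C struct.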
98
99 Here is how one might declare and register a new type representing
100 eight-bit gray-scale images:
101
102 @example
103 #include <libguile.h>
104
105 struct image @{
106 int width, height;
107 char *pixels;
108
109 /* The name of this image */
110 SCM name;
111
112 /* A function to call when this image is
113 modified, e.g., to update the screen,
114 or SCM_BOOL_F if no action necessary */
115 SCM update_func;
116 @};
117
118 static scm_t_bits image_tag;
119
120 void
121 init_image_type (void)
122 @{
123 image_tag = scm_make_smob_type ("image", sizeof (struct image));
124 scm_set_smob_mark (image_tag, mark_image);
125 scm_set_smob_free (image_tag, free_image);
126 scm_set_smob_print (image_tag, print_image);
127 @}
128 @end example
129
130
131 @node Creating Instances
132 @subsection Creating Instances
133
134 Normally, smobs can have one @emph{immediate} word of data. This word
135 stores either a pointer to an additional memory block that holds the
136 real data, or it might hold the data itself when it fits. The word is
137 of type @code{scm_t_bits} and is large enough for a @code{SCM} value or
138 a pointer to @code{void}.
139
140 You can also create smobs that have two or three immediate words, and
141 when these words suffice to store all data, it is more efficient to use
142 these super-sized smobs instead of using a normal smob plus a memory
143 block. @xref{Double Smobs}, for details.
144
145 To retrieve the immediate word of a smob, you use the macro
146 @code{SCM_SMOB_DATA}. It can be set with @code{SCM_SET_SMOB_DATA}.
147 The 16 extra bits can be accessed with @code{SCM_SMOB_FLAGS} and
148 @code{SCM_SET_SMOB_FLAGS}.
149
150 Guile provides functions for managing memory which are often helpful
151 when implementing smobs. @xref{Memory Blocks}.
152
153 Creating a smob instance can be tricky when it consists of multiple
154 steps that allocate resources and might fail. It is recommended that
155 you go about creating a smob in the following way:
156
157 @itemize
158 @item
159 Allocate the memory block for holding the data with
160 @code{scm_gc_malloc}.
161 @item
162 Initialize it to a valid state without calling any functions that might
163 cause a non-local exit. For example, initialize pointers to @code{NULL}.
164 Also, do not store @code{SCM} values in it that must be protected.
165 Initialize these fields with @code{SCM_BOOL_F}.
166
167 A valid state is one that can be safely acted upon by the @emph{mark}
168 and @emph{free} functions of your smob type.
169 @item
170 Create the smob using @code{SCM_NEWSMOB}, passing it the initialized
171 memory block. (This step will always succeed.)
172 @item
173 Complete the initialization of the memory block by, for example,
174 allocating additional resources and making it point to them.
175 @end itemize
176
177 This procedure ensures that the smob is in a valid state as soon as it
178 exists, that all resources that are allocated for the smob are properly
179 associated with it so that they can be properly freed, and that no
180 @code{SCM} values that need to be protected are stored in it while the
181 smob does not yet completely exist and thus cannot protect them.
182
183 Continuing the example from above, if the global variable
184 @code{image_tag} contains a tag returned by @code{scm_make_smob_type},
185 here is how we could construct a smob whose immediate word contains a
186 pointer to a freshly allocated @code{struct image}:
187
188 @example
189 SCM
190 make_image (SCM name, SCM s_width, SCM s_height)
191 @{
192 SCM smob;
193 struct image *image;
194 int width = scm_to_int (s_width);
195 int height = scm_to_int (s_height);
196
197 /* Step 1: Allocate the memory block.
198 */
199 image = (struct image *) scm_gc_malloc (sizeof (struct image), "image");
200
201 /* Step 2: Initialize it with straight code.
202 */
203 image->width = width;
204 image->height = height;
205 image->pixels = NULL;
206 image->name = SCM_BOOL_F;
207 image->update_func = SCM_BOOL_F;
208
209 /* Step 3: Create the smob.
210 */
211 SCM_NEWSMOB (smob, image);
212
213 /* Step 4: Finish the initialization.
214 */
215 image->name = name;
216 image->pixels = scm_gc_malloc (width * height, "image pixels");
217
218 return smob;
219 @}
220 @end example
221
222 Let us look at what might happen when @code{make_image} is called.
223
224 The conversions of @var{s_width} and @var{s_height} to @code{int}s might
225 fail and signal an error, thus causing a non-local exit. This is not a
226 problem since no resources have been allocated yet that would have to be
227 freed.
228
229 The allocation of @var{image} in step 1 might fail, but this is likewise
230 no problem.
231
232 Step 2 can not exit non-locally. At the end of it, the @var{image}
233 struct is in a valid state for the @code{mark_image} and
234 @code{free_image} functions (see below).
235
236 Step 3 can not exit non-locally either. This is guaranteed by Guile.
237 After it, @var{smob} contains a valid smob that is properly initialized
238 and protected, and in turn can properly protect the Scheme values in its
239 @var{image} struct.
240
241 But before the smob is completely created, @code{SCM_NEWSMOB} might
242 cause the garbage collector to run. During this garbage collection, the
243 @code{SCM} values in the @var{image} struct would be invisible to Guile.
244 It only gets to know about them via the @code{mark_image} function, but
245 that function can not yet do its job since the smob has not been created
246 yet. Thus, it is important to not store @code{SCM} values in the
247 @var{image} struct until after the smob has been created.
248
249 Step 4, finally, might fail and cause a non-local exit. In that case,
250 the creation of the smob has not been successful. It will eventually be
251 freed by the garbage collector, and all the resources that have been
252 allocated for it will be correctly freed by @code{free_image}.
253
254 @node Type checking
255 @subsection Type checking
256
257 Functions that operate on smobs should check that the passed @code{SCM}
258 value indeed is a suitable smob before accessing its data.
259
260 For example, here is a simple function that operates on an image smob,
261 and checks the type of its argument.
262
263 @example
264 SCM
265 clear_image (SCM image_smob)
266 @{
267 int area;
268 struct image *image;
269
270 SCM_ASSERT (SCM_SMOB_PREDICATE (image_tag, image_smob),
271 image_smob, SCM_ARG1, "clear-image");
272
273 image = (struct image *) SCM_SMOB_DATA (image_smob);
274 area = image->width * image->height;
275 memset (image->pixels, 0, area);
276
277 /* Invoke the image's update function.
278 */
279 if (scm_is_true (image->update_func))
280 scm_call_0 (image->update_func);
281
282 scm_remember_upto_here_1 (image_smob);
283
284 return SCM_UNSPECIFIED;
285 @}
286 @end example
287
288 See @ref{Remembering During Operations}, for an explanation of the call
289 to @code{scm_remember_upto_here_1}.
290
291
292 @node Garbage Collecting Smobs
293 @subsection Garbage Collecting Smobs
294
295 Once a smob has been released to the tender mercies of the Scheme
296 system, it must be prepared to survive garbage collection. Guile calls
297 the @emph{mark} and @emph{free} functions of the smob to manage this.
298
299 As described in more detail elsewhere (@pxref{Conservative GC}), every
300 object in the Scheme system has a @dfn{mark bit}, which the garbage
301 collector uses to tell live objects from dead ones. When collection
302 starts, every object's mark bit is clear. The collector traces pointers
303 through the heap, starting from objects known to be live, and sets the
304 mark bit on each object it encounters. When it can find no more
305 unmarked objects, the collector walks all objects, live and dead, frees
306 those whose mark bits are still clear, and clears the mark bit on the
307 others.
308
309 The two main portions of the collection are called the @dfn{mark phase},
310 during which the collector marks live objects, and the @dfn{sweep
311 phase}, during which the collector frees all unmarked objects.
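
The two phases can be illustrated with a toy collector in plain C. This is a sketch for intuition only, not Guile's implementation: the @code{struct obj} layout, the fixed-size array of references, the @code{freed} flag that stands in for real deallocation, and the names @code{toy_mark}, @code{toy_sweep} and @code{toy_collect} are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 2

/* A toy heap object: a mark bit plus up to MAX_REFS references.  */
struct obj {
  bool marked;
  bool freed;                  /* stands in for real deallocation */
  struct obj *refs[MAX_REFS];
};

/* Mark phase: set the mark bit first, then recurse.  Because the
   bit is set before recursing, circular structures terminate.  */
void
toy_mark (struct obj *o)
{
  if (o == NULL || o->marked)
    return;
  o->marked = true;
  for (size_t i = 0; i < MAX_REFS; i++)
    toy_mark (o->refs[i]);
}

/* Sweep phase: walk every object, "free" the unmarked ones, and
   clear the mark bit on the survivors.  Returns the number freed.  */
size_t
toy_sweep (struct obj *heap, size_t n)
{
  size_t freed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (heap[i].freed)
        continue;
      if (heap[i].marked)
        heap[i].marked = false;
      else
        {
          heap[i].freed = true;
          freed++;
        }
    }
  return freed;
}

/* One full collection starting from a single root object.  */
size_t
toy_collect (struct obj *heap, size_t n, struct obj *root)
{
  assert (heap != NULL);
  toy_mark (root);
  return toy_sweep (heap, n);
}
```

In a heap of four objects where the first two refer to each other and only the first is a root, one call to @code{toy_collect} leaves both alive with their mark bits cleared and frees the other two.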
312
313 The mark bit of a smob lives in a special memory region. When the
314 collector encounters a smob, it sets the smob's mark bit, and uses the
315 smob's type tag to find the appropriate @emph{mark} function for that
316 smob. It then calls this @emph{mark} function, passing it the smob as
317 its only argument.
318
319 The @emph{mark} function is responsible for marking any other Scheme
320 objects the smob refers to. If it does not do so, the objects' mark
321 bits will still be clear when the collector begins to sweep, and the
322 collector will free them. If this occurs, it will probably break, or at
323 least confuse, any code operating on the smob; the smob's @code{SCM}
324 values will have become dangling references.
325
326 To mark an arbitrary Scheme object, the @emph{mark} function calls
327 @code{scm_gc_mark}.
328
329 Thus, here is how we might write @code{mark_image}:
330
331 @example
332 @group
333 SCM
334 mark_image (SCM image_smob)
335 @{
336 /* Mark the image's name and update function. */
337 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
338
339 scm_gc_mark (image->name);
340 scm_gc_mark (image->update_func);
341
342 return SCM_BOOL_F;
343 @}
344 @end group
345 @end example
346
347 Note that, even though the image's @code{update_func} could be an
348 arbitrarily complex structure (representing a procedure and any values
349 enclosed in its environment), @code{scm_gc_mark} will recurse as
350 necessary to mark all its components. Because @code{scm_gc_mark} sets
351 an object's mark bit before it recurses, it is not confused by
352 circular structures.
353
354 As an optimization, the collector will mark whatever value is returned
355 by the @emph{mark} function; this helps limit depth of recursion during
356 the mark phase. Thus, the code above should really be written as:
357 @example
358 @group
359 SCM
360 mark_image (SCM image_smob)
361 @{
362 /* Mark the image's name and update function. */
363 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
364
365 scm_gc_mark (image->name);
366 return image->update_func;
367 @}
368 @end group
369 @end example
370
371
372 Finally, when the collector encounters an unmarked smob during the sweep
373 phase, it uses the smob's tag to find the appropriate @emph{free}
374 function for the smob. It then calls that function, passing it the smob
375 as its only argument.
376
377 The @emph{free} function must release any resources used by the smob.
378 However, it must not free objects managed by the collector; the
379 collector will take care of them. For historical reasons, the return
380 type of the @emph{free} function should be @code{size_t}, an unsigned
381 integral type; the @emph{free} function should always return zero.
382
383 Here is how we might write the @code{free_image} function for the image
384 smob type:
385 @example
386 size_t
387 free_image (SCM image_smob)
388 @{
389 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
390
391 scm_gc_free (image->pixels, image->width * image->height, "image pixels");
392 scm_gc_free (image, sizeof (struct image), "image");
393
394 return 0;
395 @}
396 @end example
397
398 During the sweep phase, the garbage collector will clear the mark bits
399 on all live objects. The code which implements a smob need not do this
400 itself.
401
402 There is no way for smob code to be notified when collection is
403 complete.
404
405 It is usually a good idea to minimize the amount of processing done
406 during garbage collection; keep the @emph{mark} and @emph{free}
407 functions very simple. Since collections occur at unpredictable times,
408 it is easy for any unusual activity to interfere with normal code.
409
410
411 @node Garbage Collecting Simple Smobs
412 @subsection Garbage Collecting Simple Smobs
413
414 It is often useful to define very simple smob types --- smobs which have
415 no data to mark, other than the cell itself, or smobs whose immediate
416 data word is simply an ordinary Scheme object, to be marked recursively.
417 Guile provides a function to handle the latter case; you can use
418 it as your smob type's @emph{mark} function, if your smob's
419 structure is simple enough.
420
421 If the smob refers to no other Scheme objects, then no action is
422 necessary; the garbage collector has already marked the smob cell
423 itself. In that case, you can use zero as your mark function.
424
425 @deftypefun SCM scm_markcdr (SCM @var{x})
426 Mark the references in the smob @var{x}, assuming that @var{x}'s first
427 data word contains an ordinary Scheme object, and @var{x} refers to no
428 other objects. This function simply returns @var{x}'s first data word.
429
430 This is only useful for simple smobs created by @code{SCM_NEWSMOB} or
431 @code{SCM_RETURN_NEWSMOB}, not for smobs allocated as double cells.
432 @end deftypefun
433
434 @node Remembering During Operations
435 @subsection Remembering During Operations
436 @cindex Remembering
437
438 It's important that a smob is visible to the garbage collector
439 whenever its contents are being accessed. Otherwise it could be freed
440 while code is still using it.
441
442 For example, consider a procedure to convert image data to a list of
443 pixel values.
444
445 @example
446 SCM
447 image_to_list (SCM image_smob)
448 @{
449 struct image *image;
450 SCM lst;
451 int i;
452 SCM_ASSERT (SCM_SMOB_PREDICATE (image_tag, image_smob),
453 image_smob, SCM_ARG1, "image->list");
454
455 image = (struct image *) SCM_SMOB_DATA (image_smob);
456 lst = SCM_EOL;
457 for (i = image->width * image->height - 1; i >= 0; i--)
458 lst = scm_cons (scm_from_char (image->pixels[i]), lst);
459
460 scm_remember_upto_here_1 (image_smob);
461 return lst;
462 @}
463 @end example
464
465 In the loop, only the @code{image} pointer is used and the C compiler
466 has no reason to keep the @code{image_smob} value anywhere. If
467 @code{scm_cons} results in a garbage collection, @code{image_smob} might
468 not be on the stack or anywhere else and could be freed, leaving the
469 loop accessing freed data. The use of @code{scm_remember_upto_here_1}
470 prevents this, by creating a reference to @code{image_smob} after all
471 data accesses.
472
473 There's no need to do the same for @code{lst}, since that's the return
474 value and the compiler will certainly keep it in a register or
475 somewhere throughout the routine.
476
477 The @code{clear_image} example previously shown (@pxref{Type checking})
478 also used @code{scm_remember_upto_here_1} for this reason.
479
480 It's only in quite rare circumstances that a missing
481 @code{scm_remember_upto_here_1} will bite, but when it happens the
482 consequences are serious. Fortunately the rule is simple: whenever
483 your code calls a Guile library function, or anything that might call one,
484 ensure the @code{SCM} of a smob is referenced past all accesses to its
485 insides. Do this by adding an @code{scm_remember_upto_here_1} if
486 there are no other references.
487
488 In a multi-threaded program, the rule is the same. As far as a given
489 thread is concerned, a garbage collection still only occurs within a
490 Guile library function, not at an arbitrary time. (Guile waits for all
491 threads to reach one of its library functions, and holds them there
492 while the collector runs.)
493
494 @node Double Smobs
495 @subsection Double Smobs
496
497 Smobs are called smobs because they are small: they normally have only
498 room for one @code{scm_t_bits} value plus 16 bits. The reason for
499 this is that smobs are directly implemented by using the low-level,
500 two-word cells of Guile that are also used to implement pairs, for
501 example. (@xref{Data Representation}, for the details.) One word of
502 the two-word cells is used for @code{SCM_SMOB_DATA}, the other
503 contains the 16-bit type tag and the 16 extra bits.
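
To make the idea of one word carrying both a 16-bit type tag and 16 extra bits concrete, here is a small sketch in plain C. The bit layout is an assumption chosen for the illustration (tag in the low 16 bits, extra bits in the next 16); Guile's actual layout is defined by its own headers and should only be accessed through macros such as @code{SCM_SMOB_FLAGS} and @code{SCM_SET_SMOB_FLAGS}. The names @code{pack_header}, @code{header_type_tag}, @code{header_flags} and @code{header_set_flags} are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative header word: low 16 bits hold the type tag, the
   next 16 bits hold the extra bits.  Not Guile's real layout.  */
typedef uint32_t header_t;

header_t
pack_header (uint16_t type_tag, uint16_t flags)
{
  return (header_t) type_tag | ((header_t) flags << 16);
}

uint16_t
header_type_tag (header_t h)
{
  return (uint16_t) (h & 0xFFFFu);
}

uint16_t
header_flags (header_t h)
{
  return (uint16_t) ((h >> 16) & 0xFFFFu);
}

/* Replace the extra bits while leaving the type tag untouched.  */
header_t
header_set_flags (header_t h, uint16_t flags)
{
  header_t updated = (h & (header_t) 0xFFFFu) | ((header_t) flags << 16);
  assert (header_type_tag (updated) == header_type_tag (h));
  return updated;
}
```

Because the tag and the extra bits share one word, changing the extra bits never requires touching the other word of the cell, which holds the data.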
504
505 In addition to the fundamental two-word cells, Guile also has
506 four-word cells, which are appropriately called @dfn{double cells}.
507 You can use them for @dfn{double smobs} and get two more immediate
508 words of type @code{scm_t_bits}.
509
510 A double smob is created with @code{SCM_NEWSMOB2} or
511 @code{SCM_NEWSMOB3} instead of @code{SCM_NEWSMOB}. Its immediate
512 words can be retrieved with @code{SCM_SMOB_DATA2} and
513 @code{SCM_SMOB_DATA3} in addition to @code{SCM_SMOB_DATA}.
514 Unsurprisingly, the words can be set with @code{SCM_SET_SMOB_DATA2}
515 and @code{SCM_SET_SMOB_DATA3}.
516
517 @node The Complete Example
518 @subsection The Complete Example
519
520 Here is the complete text of the implementation of the image datatype,
521 as presented in the sections above. We also provide a definition for
522 the smob's @emph{print} function, and make some objects and functions
523 static, to clarify exactly what the surrounding code is using.
524
525 As mentioned above, you can find this code in the Guile distribution, in
526 @file{doc/example-smob}. That directory includes a makefile and a
527 suitable @code{main} function, so you can build a complete interactive
528 Guile shell, extended with the datatypes described here.
529
530 @example
531 /* file "image-type.c" */
532 #include <stdlib.h>
533 #include <string.h>
534 #include <libguile.h>
535
536 static scm_t_bits image_tag;
537
538 struct image @{
539 int width, height;
540 char *pixels;
541
542 /* The name of this image */
543 SCM name;
544
545 /* A function to call when this image is
546 modified, e.g., to update the screen,
547 or SCM_BOOL_F if no action necessary */
548 SCM update_func;
549 @};
550
551 static SCM
552 make_image (SCM name, SCM s_width, SCM s_height)
553 @{
554 SCM smob;
555 struct image *image;
556 int width = scm_to_int (s_width);
557 int height = scm_to_int (s_height);
558
559 /* Step 1: Allocate the memory block.
560 */
561 image = (struct image *) scm_gc_malloc (sizeof (struct image), "image");
562
563 /* Step 2: Initialize it with straight code.
564 */
565 image->width = width;
566 image->height = height;
567 image->pixels = NULL;
568 image->name = SCM_BOOL_F;
569 image->update_func = SCM_BOOL_F;
570
571 /* Step 3: Create the smob.
572 */
573 SCM_NEWSMOB (smob, image);
574
575 /* Step 4: Finish the initialization.
576 */
577 image->name = name;
578 image->pixels = scm_gc_malloc (width * height, "image pixels");
579
580 return smob;
581 @}
582
583 static SCM
584 clear_image (SCM image_smob)
585 @{
586 int area;
587 struct image *image;
588
589 SCM_ASSERT (SCM_SMOB_PREDICATE (image_tag, image_smob),
590 image_smob, SCM_ARG1, "clear-image");
591
592 image = (struct image *) SCM_SMOB_DATA (image_smob);
593 area = image->width * image->height;
594 memset (image->pixels, 0, area);
595
596 /* Invoke the image's update function.
597 */
598 if (scm_is_true (image->update_func))
599 scm_call_0 (image->update_func);
600
601 scm_remember_upto_here_1 (image_smob);
602
603 return SCM_UNSPECIFIED;
604 @}
605
606 static SCM
607 mark_image (SCM image_smob)
608 @{
609 /* Mark the image's name and update function. */
610 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
611
612 scm_gc_mark (image->name);
613 return image->update_func;
614 @}
615
616 static size_t
617 free_image (SCM image_smob)
618 @{
619 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
620
621 scm_gc_free (image->pixels, image->width * image->height, "image pixels");
622 scm_gc_free (image, sizeof (struct image), "image");
623
624 return 0;
625 @}
626
627 static int
628 print_image (SCM image_smob, SCM port, scm_print_state *pstate)
629 @{
630 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
631
632 scm_puts ("#<image ", port);
633 scm_display (image->name, port);
634 scm_puts (">", port);
635
636 /* non-zero means success */
637 return 1;
638 @}
639
640 void
641 init_image_type (void)
642 @{
643 image_tag = scm_make_smob_type ("image", sizeof (struct image));
644 scm_set_smob_mark (image_tag, mark_image);
645 scm_set_smob_free (image_tag, free_image);
646 scm_set_smob_print (image_tag, print_image);
647
648 scm_c_define_gsubr ("clear-image", 1, 0, 0, clear_image);
649 scm_c_define_gsubr ("make-image", 3, 0, 0, make_image);
650 @}
651 @end example
652
653 Here is a sample build and interaction with the code from the
654 @file{example-smob} directory, on the author's machine:
655
656 @example
657 zwingli:example-smob$ make CC=gcc
658 gcc `guile-config compile` -c image-type.c -o image-type.o
659 gcc `guile-config compile` -c myguile.c -o myguile.o
660 gcc image-type.o myguile.o `guile-config link` -o myguile
661 zwingli:example-smob$ ./myguile
662 guile> make-image
663 #<primitive-procedure make-image>
664 guile> (define i (make-image "Whistler's Mother" 100 100))
665 guile> i
666 #<image Whistler's Mother>
667 guile> (clear-image i)
668 guile> (clear-image 4)
669 ERROR: In procedure clear-image in expression (clear-image 4):
670 ERROR: Wrong type argument in position 1: 4
671 ABORT: (wrong-type-arg)
672
673 Type "(backtrace)" to get more information.
674 guile>
675 @end example