Merge commit '1e3fd6a0c81bb3e9900a93a9d1923cc788de0f99'
[bpt/guile.git] / doc / ref / libguile-smobs.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2005, 2010, 2011, 2013
4 @c Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Defining New Types (Smobs)
8 @section Defining New Types (Smobs)
9
10 @dfn{Smobs} are Guile's mechanism for adding new primitive types to
11 the system. The term ``smob'' was coined by Aubrey Jaffer, who says
12 it comes from ``small object'', referring to the fact that they are
13 quite limited in size: they can hold just one pointer to a larger
14 memory block plus 16 extra bits.
15
16 To define a new smob type, the programmer provides Guile with some
17 essential information about the type --- how to print it, how to
18 garbage collect it, and so on --- and Guile allocates a fresh type tag
19 for it. The programmer can then use @code{scm_c_define_gsubr} to make
20 a set of C functions visible to Scheme code that create and operate on
21 these objects.
22
23 (You can find a complete version of the example code used in this
24 section in the Guile distribution, in @file{doc/example-smob}. That
25 directory includes a makefile and a suitable @code{main} function, so
26 you can build a complete interactive Guile shell, extended with the
27 datatypes described here.)
28
29 @menu
30 * Describing a New Type::
31 * Creating Smob Instances::
32 * Type checking::
33 * Garbage Collecting Smobs::
34 * Remembering During Operations::
35 * Double Smobs::
36 * The Complete Example::
37 @end menu
38
39 @node Describing a New Type
40 @subsection Describing a New Type
41
42 To define a new type, the programmer must write two functions to
43 manage instances of the type:
44
45 @table @code
46 @item print
47 Guile will apply this function to each instance of the new type to print
48 the value, as for @code{display} or @code{write}. The default print
49 function prints @code{#<NAME ADDRESS>} where @code{NAME} is the first
50 argument passed to @code{scm_make_smob_type}.
51
52 @item equalp
53 If Scheme code asks the @code{equal?} function to compare two instances
54 of the same smob type, Guile calls this function. It should return
55 @code{SCM_BOOL_T} if @var{a} and @var{b} should be considered
56 @code{equal?}, or @code{SCM_BOOL_F} otherwise. If @code{equalp} is
57 @code{NULL}, @code{equal?} will assume that two instances of this type are
58 never @code{equal?} unless they are @code{eq?}.
59
60 @end table
61
62 When the only resource associated with a smob is memory managed by the
63 garbage collector---i.e., memory allocated with the @code{scm_gc_malloc}
64 functions---this is sufficient. However, when a smob is associated with
65 other kinds of resources, it may be necessary to define one of the
66 following functions, or both:
67
68 @table @code
69 @item mark
70 Guile will apply this function to each instance of the new type it
71 encounters during garbage collection. This function is responsible for
72 telling the collector about any other @code{SCM} values that the object
73 has stored, and that are in memory regions not already scanned by the
74 garbage collector. @xref{Garbage Collecting Smobs}, for more details.
75
76 @item free
77 Guile will apply this function to each instance of the new type that is
78 to be deallocated. The function should release all resources held by
79 the object. This is analogous to the Java finalization method---it is
80 invoked at an unspecified time (when garbage collection occurs) after
81 the object is dead. @xref{Garbage Collecting Smobs}, for more details.
82
83 This function operates while the heap is in an inconsistent state and
84 must therefore be careful. @xref{Smobs}, for details about what this
85 function is allowed to do.
86 @end table
87
88 To actually register the new smob type, call @code{scm_make_smob_type}.
89 It returns a value of type @code{scm_t_bits} which identifies the new
90 smob type.
91
92 The four special functions described above are registered by calling
93 one of @code{scm_set_smob_mark}, @code{scm_set_smob_free},
94 @code{scm_set_smob_print}, or @code{scm_set_smob_equalp}, as
95 appropriate. Each function is intended to be used at most once per
96 type, and the call should be placed immediately following the call to
97 @code{scm_make_smob_type}.
98
99 There can only be at most 256 different smob types in the system.
100 Instead of registering a huge number of smob types (for example, one
101 for each relevant C struct in your application), it is sometimes
102 better to register just one and implement a second layer of type
103 dispatching on top of it. This second layer might use the 16 extra
104 bits to extend its type, for example.
105
106 Here is how one might declare and register a new type representing
107 eight-bit gray-scale images:
108
109 @example
110 #include <libguile.h>
111
112 struct image @{
113 int width, height;
114 char *pixels;
115
116 /* The name of this image */
117 SCM name;
118
119 /* A function to call when this image is
120 modified, e.g., to update the screen,
121 or SCM_BOOL_F if no action necessary */
122 SCM update_func;
123 @};
124
125 static scm_t_bits image_tag;
126
127 void
128 init_image_type (void)
129 @{
130 image_tag = scm_make_smob_type ("image", sizeof (struct image));
131 scm_set_smob_mark (image_tag, mark_image);
132 scm_set_smob_free (image_tag, free_image);
133 scm_set_smob_print (image_tag, print_image);
134 @}
135 @end example
136
137
138 @node Creating Smob Instances
139 @subsection Creating Smob Instances
140
141 Normally, smobs can have one @emph{immediate} word of data. This word
142 stores either a pointer to an additional memory block that holds the
143 real data, or it might hold the data itself when it fits. The word is
144 large enough for a @code{SCM} value, a pointer to @code{void}, or an
145 integer that fits into a @code{size_t} or @code{ssize_t}.
146
147 You can also create smobs that have two or three immediate words, and
148 when these words suffice to store all data, it is more efficient to use
149 these super-sized smobs instead of using a normal smob plus a memory
150 block. @xref{Double Smobs}, for their discussion.
151
152 Guile provides functions for managing memory which are often helpful
153 when implementing smobs. @xref{Memory Blocks}.
154
155 To retrieve the immediate word of a smob, you use the macro
156 @code{SCM_SMOB_DATA}. It can be set with @code{SCM_SET_SMOB_DATA}.
157 The 16 extra bits can be accessed with @code{SCM_SMOB_FLAGS} and
158 @code{SCM_SET_SMOB_FLAGS}.
159
160 The two macros @code{SCM_SMOB_DATA} and @code{SCM_SET_SMOB_DATA} treat
161 the immediate word as if it were of type @code{scm_t_bits}, which is
162 an unsigned integer type large enough to hold a pointer to
163 @code{void}. Thus you can use these macros to store arbitrary
164 pointers in the smob word.
165
166 When you want to store a @code{SCM} value directly in the immediate
167 word of a smob, you should use the macros @code{SCM_SMOB_OBJECT} and
168 @code{SCM_SET_SMOB_OBJECT} to access it.
169
170 Creating a smob instance can be tricky when it consists of multiple
171 steps that allocate resources. Most of the time, this is mainly about
172 allocating memory to hold associated data structures. Using memory
173 managed by the garbage collector simplifies things: the garbage
174 collector will automatically scan those data structures for pointers,
175 and reclaim them when they are no longer referenced.
176
177 Continuing the example from above, if the global variable
178 @code{image_tag} contains a tag returned by @code{scm_make_smob_type},
179 here is how we could construct a smob whose immediate word contains a
180 pointer to a freshly allocated @code{struct image}:
181
182 @example
183 SCM
184 make_image (SCM name, SCM s_width, SCM s_height)
185 @{
186 SCM smob;
187 struct image *image;
188 int width = scm_to_int (s_width);
189 int height = scm_to_int (s_height);
190
191 /* Step 1: Allocate the memory block.
192 */
193 image = (struct image *)
194 scm_gc_malloc (sizeof (struct image), "image");
195
196 /* Step 2: Initialize it with straight code.
197 */
198 image->width = width;
199 image->height = height;
200 image->pixels = NULL;
201 image->name = SCM_BOOL_F;
202 image->update_func = SCM_BOOL_F;
203
204 /* Step 3: Create the smob.
205 */
206 smob = scm_new_smob (image_tag, image);
207
208 /* Step 4: Finish the initialization.
209 */
210 image->name = name;
211 image->pixels =
212 scm_gc_malloc_pointerless (width * height, "image pixels");
213
214 return smob;
215 @}
216 @end example
217
218 We use @code{scm_gc_malloc_pointerless} for the pixel buffer to tell the
219 garbage collector not to scan it for pointers. Calls to
220 @code{scm_gc_malloc}, @code{scm_new_smob}, and
221 @code{scm_gc_malloc_pointerless} raise an exception in out-of-memory
222 conditions; the garbage collector is able to reclaim previously
223 allocated memory if that happens.
224
225
226 @node Type checking
227 @subsection Type checking
228
229 Functions that operate on smobs should check that the passed
230 @code{SCM} value indeed is a suitable smob before accessing its data.
231 They can do this with @code{scm_assert_smob_type}.
232
233 For example, here is a simple function that operates on an image smob,
234 and checks the type of its argument.
235
236 @example
237 SCM
238 clear_image (SCM image_smob)
239 @{
240 int area;
241 struct image *image;
242
243 scm_assert_smob_type (image_tag, image_smob);
244
245 image = (struct image *) SCM_SMOB_DATA (image_smob);
246 area = image->width * image->height;
247 memset (image->pixels, 0, area);
248
249 /* Invoke the image's update function.
250 */
251 if (scm_is_true (image->update_func))
252 scm_call_0 (image->update_func);
253
254 scm_remember_upto_here_1 (image_smob);
255
256 return SCM_UNSPECIFIED;
257 @}
258 @end example
259
260 See @ref{Remembering During Operations} for an explanation of the call
261 to @code{scm_remember_upto_here_1}.
262
263
264 @node Garbage Collecting Smobs
265 @subsection Garbage Collecting Smobs
266
267 Once a smob has been released to the tender mercies of the Scheme
268 system, it must be prepared to survive garbage collection. In the
269 example above, all the memory associated with the smob is managed by the
270 garbage collector because we used the @code{scm_gc_} allocation
271 functions. Thus, no special care must be taken: the garbage collector
272 automatically scans them and reclaims any unused memory.
273
274 However, when data associated with a smob is managed in some other
275 way---e.g., @code{malloc}'d memory or file descriptors---it is possible
276 to specify a @emph{free} function to release those resources when the
277 smob is reclaimed, and a @emph{mark} function to mark Scheme objects
278 otherwise invisible to the garbage collector.
279
280 As described in more detail elsewhere (@pxref{Conservative GC}), every
281 object in the Scheme system has a @dfn{mark bit}, which the garbage
282 collector uses to tell live objects from dead ones. When collection
283 starts, every object's mark bit is clear. The collector traces pointers
284 through the heap, starting from objects known to be live, and sets the
285 mark bit on each object it encounters. When it can find no more
286 unmarked objects, the collector walks all objects, live and dead, frees
287 those whose mark bits are still clear, and clears the mark bit on the
288 others.
289
290 The two main portions of the collection are called the @dfn{mark phase},
291 during which the collector marks live objects, and the @dfn{sweep
292 phase}, during which the collector frees all unmarked objects.
293
294 The mark bit of a smob lives in a special memory region. When the
295 collector encounters a smob, it sets the smob's mark bit, and uses the
296 smob's type tag to find the appropriate @emph{mark} function for that
297 smob. It then calls this @emph{mark} function, passing it the smob as
298 its only argument.
299
300 The @emph{mark} function is responsible for marking any other Scheme
301 objects the smob refers to. If it does not do so, the objects' mark
302 bits will still be clear when the collector begins to sweep, and the
303 collector will free them. If this occurs, it will probably break, or at
304 least confuse, any code operating on the smob; the smob's @code{SCM}
305 values will have become dangling references.
306
307 To mark an arbitrary Scheme object, the @emph{mark} function calls
308 @code{scm_gc_mark}.
309
310 Thus, here is how we might write @code{mark_image}---again this is not
311 needed in our example since we used the @code{scm_gc_} allocation
312 routines, so this is just for the sake of illustration:
313
314 @example
315 @group
316 SCM
317 mark_image (SCM image_smob)
318 @{
319 /* Mark the image's name and update function. */
320 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
321
322 scm_gc_mark (image->name);
323 scm_gc_mark (image->update_func);
324
325 return SCM_BOOL_F;
326 @}
327 @end group
328 @end example
329
330 Note that, even though the image's @code{update_func} could be an
331 arbitrarily complex structure (representing a procedure and any values
332 enclosed in its environment), @code{scm_gc_mark} will recurse as
333 necessary to mark all its components. Because @code{scm_gc_mark} sets
334 an object's mark bit before it recurses, it is not confused by
335 circular structures.
336
337 As an optimization, the collector will mark whatever value is returned
338 by the @emph{mark} function; this helps limit depth of recursion during
339 the mark phase. Thus, the code above should really be written as:
340 @example
341 @group
342 SCM
343 mark_image (SCM image_smob)
344 @{
345 /* Mark the image's name and update function. */
346 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
347
348 scm_gc_mark (image->name);
349 return image->update_func;
350 @}
351 @end group
352 @end example
353
354
355 Finally, when the collector encounters an unmarked smob during the sweep
356 phase, it uses the smob's tag to find the appropriate @emph{free}
357 function for the smob. It then calls that function, passing it the smob
358 as its only argument.
359
360 The @emph{free} function must release any resources used by the smob.
361 However, it must not free objects managed by the collector; the
362 collector will take care of them. For historical reasons, the return
363 type of the @emph{free} function should be @code{size_t}, an unsigned
364 integral type; the @emph{free} function should always return zero.
365
366 Here is how we might write the @code{free_image} function for the image
367 smob type---again for the sake of illustration, since our example does
368 not need it thanks to the use of the @code{scm_gc_} allocation routines:
369 @example
370 size_t
371 free_image (SCM image_smob)
372 @{
373 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
374
375 scm_gc_free (image->pixels,
376 image->width * image->height,
377 "image pixels");
378 scm_gc_free (image, sizeof (struct image), "image");
379
380 return 0;
381 @}
382 @end example
383
384 During the sweep phase, the garbage collector will clear the mark bits
385 on all live objects. The code which implements a smob need not do this
386 itself.
387
388 There is no way for smob code to be notified when collection is
389 complete.
390
391 It is usually a good idea to minimize the amount of processing done
392 during garbage collection; keep the @emph{mark} and @emph{free}
393 functions very simple. Since collections occur at unpredictable times,
394 it is easy for any unusual activity to interfere with normal code.
395
396 @node Remembering During Operations
397 @subsection Remembering During Operations
398 @cindex remembering
399
400 @c FIXME: Remove this section?
401
402 It's important that a smob is visible to the garbage collector
403 whenever its contents are being accessed. Otherwise it could be freed
404 while code is still using it.
405
406 For example, consider a procedure to convert image data to a list of
407 pixel values.
408
409 @example
410 SCM
411 image_to_list (SCM image_smob)
412 @{
413 struct image *image;
414 SCM lst;
415 int i;
416
417 scm_assert_smob_type (image_tag, image_smob);
418
419 image = (struct image *) SCM_SMOB_DATA (image_smob);
420 lst = SCM_EOL;
421 for (i = image->width * image->height - 1; i >= 0; i--)
422 lst = scm_cons (scm_from_char (image->pixels[i]), lst);
423
424 scm_remember_upto_here_1 (image_smob);
425 return lst;
426 @}
427 @end example
428
429 In the loop, only the @code{image} pointer is used and the C compiler
430 has no reason to keep the @code{image_smob} value anywhere. If
431 @code{scm_cons} results in a garbage collection, @code{image_smob} might
432 not be on the stack or anywhere else and could be freed, leaving the
433 loop accessing freed data. The use of @code{scm_remember_upto_here_1}
434 prevents this, by creating a reference to @code{image_smob} after all
435 data accesses.
436
437 There's no need to do the same for @code{lst}, since that's the return
438 value and the compiler will certainly keep it in a register or
439 somewhere throughout the routine.
440
441 The @code{clear_image} example previously shown (@pxref{Type checking})
442 also used @code{scm_remember_upto_here_1} for this reason.
443
444 It's only in quite rare circumstances that a missing
445 @code{scm_remember_upto_here_1} will bite, but when it happens the
446 consequences are serious. Fortunately the rule is simple: whenever
447 calling a Guile library function or doing something that might, ensure
448 that the @code{SCM} of a smob is referenced past all accesses to its
449 insides. Do this by adding an @code{scm_remember_upto_here_1} if
450 there are no other references.
451
452 In a multi-threaded program, the rule is the same. As far as a given
453 thread is concerned, a garbage collection still only occurs within a
454 Guile library function, not at an arbitrary time. (Guile waits for all
455 threads to reach one of its library functions, and holds them there
456 while the collector runs.)
457
458 @node Double Smobs
459 @subsection Double Smobs
460
461 @c FIXME: Remove this section?
462
463 Smobs are called smob because they are small: they normally have only
464 room for one @code{void*} or @code{SCM} value plus 16 bits. The
465 reason for this is that smobs are directly implemented by using the
466 low-level, two-word cells of Guile that are also used to implement
467 pairs, for example. (@pxref{Data Representation} for the
468 details.) One word of the two-word cells is used for
469 @code{SCM_SMOB_DATA} (or @code{SCM_SMOB_OBJECT}), the other contains
470 the 16-bit type tag and the 16 extra bits.
471
472 In addition to the fundamental two-word cells, Guile also has
473 four-word cells, which are appropriately called @dfn{double cells}.
474 You can use them for @dfn{double smobs} and get two more immediate
475 words of type @code{scm_t_bits}.
476
477 A double smob is created with @code{scm_new_double_smob}. Its immediate
478 words can be retrieved as @code{scm_t_bits} with @code{SCM_SMOB_DATA_2}
479 and @code{SCM_SMOB_DATA_3} in addition to @code{SCM_SMOB_DATA}.
480 Unsurprisingly, the words can be set to @code{scm_t_bits} values with
481 @code{SCM_SET_SMOB_DATA_2} and @code{SCM_SET_SMOB_DATA_3}.
482
483 Of course there are also @code{SCM_SMOB_OBJECT_2},
484 @code{SCM_SMOB_OBJECT_3}, @code{SCM_SET_SMOB_OBJECT_2}, and
485 @code{SCM_SET_SMOB_OBJECT_3}.
486
487 @node The Complete Example
488 @subsection The Complete Example
489
490 Here is the complete text of the implementation of the image datatype,
491 as presented in the sections above. We also provide a definition for
492 the smob's @emph{print} function, and make some objects and functions
493 static, to clarify exactly what the surrounding code is using.
494
495 As mentioned above, you can find this code in the Guile distribution, in
496 @file{doc/example-smob}. That directory includes a makefile and a
497 suitable @code{main} function, so you can build a complete interactive
498 Guile shell, extended with the datatypes described here.)
499
500 @example
501 /* file "image-type.c" */
502
503 #include <stdlib.h>
504 #include <libguile.h>
505
506 static scm_t_bits image_tag;
507
508 struct image @{
509 int width, height;
510 char *pixels;
511
512 /* The name of this image */
513 SCM name;
514
515 /* A function to call when this image is
516 modified, e.g., to update the screen,
517 or SCM_BOOL_F if no action necessary */
518 SCM update_func;
519 @};
520
521 static SCM
522 make_image (SCM name, SCM s_width, SCM s_height)
523 @{
524 SCM smob;
525 struct image *image;
526 int width = scm_to_int (s_width);
527 int height = scm_to_int (s_height);
528
529 /* Step 1: Allocate the memory block.
530 */
531 image = (struct image *)
532 scm_gc_malloc (sizeof (struct image), "image");
533
534 /* Step 2: Initialize it with straight code.
535 */
536 image->width = width;
537 image->height = height;
538 image->pixels = NULL;
539 image->name = SCM_BOOL_F;
540 image->update_func = SCM_BOOL_F;
541
542 /* Step 3: Create the smob.
543 */
544 smob = scm_new_smob (image_tag, image);
545
546 /* Step 4: Finish the initialization.
547 */
548 image->name = name;
549 image->pixels =
550 scm_gc_malloc (width * height, "image pixels");
551
552 return smob;
553 @}
554
555 SCM
556 clear_image (SCM image_smob)
557 @{
558 int area;
559 struct image *image;
560
561 scm_assert_smob_type (image_tag, image_smob);
562
563 image = (struct image *) SCM_SMOB_DATA (image_smob);
564 area = image->width * image->height;
565 memset (image->pixels, 0, area);
566
567 /* Invoke the image's update function.
568 */
569 if (scm_is_true (image->update_func))
570 scm_call_0 (image->update_func);
571
572 scm_remember_upto_here_1 (image_smob);
573
574 return SCM_UNSPECIFIED;
575 @}
576
577 static SCM
578 mark_image (SCM image_smob)
579 @{
580 /* Mark the image's name and update function. */
581 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
582
583 scm_gc_mark (image->name);
584 return image->update_func;
585 @}
586
587 static size_t
588 free_image (SCM image_smob)
589 @{
590 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
591
592 scm_gc_free (image->pixels,
593 image->width * image->height,
594 "image pixels");
595 scm_gc_free (image, sizeof (struct image), "image");
596
597 return 0;
598 @}
599
600 static int
601 print_image (SCM image_smob, SCM port, scm_print_state *pstate)
602 @{
603 struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
604
605 scm_puts ("#<image ", port);
606 scm_display (image->name, port);
607 scm_puts (">", port);
608
609 /* non-zero means success */
610 return 1;
611 @}
612
613 void
614 init_image_type (void)
615 @{
616 image_tag = scm_make_smob_type ("image", sizeof (struct image));
617 scm_set_smob_mark (image_tag, mark_image);
618 scm_set_smob_free (image_tag, free_image);
619 scm_set_smob_print (image_tag, print_image);
620
621 scm_c_define_gsubr ("clear-image", 1, 0, 0, clear_image);
622 scm_c_define_gsubr ("make-image", 3, 0, 0, make_image);
623 @}
624 @end example
625
626 Here is a sample build and interaction with the code from the
627 @file{example-smob} directory, on the author's machine:
628
629 @example
630 zwingli:example-smob$ make CC=gcc
631 gcc `pkg-config --cflags guile-@value{EFFECTIVE-VERSION}` -c image-type.c -o image-type.o
632 gcc `pkg-config --cflags guile-@value{EFFECTIVE-VERSION}` -c myguile.c -o myguile.o
633 gcc image-type.o myguile.o `pkg-config --libs guile-@value{EFFECTIVE-VERSION}` -o myguile
634 zwingli:example-smob$ ./myguile
635 guile> make-image
636 #<primitive-procedure make-image>
637 guile> (define i (make-image "Whistler's Mother" 100 100))
638 guile> i
639 #<image Whistler's Mother>
640 guile> (clear-image i)
641 guile> (clear-image 4)
642 ERROR: In procedure clear-image in expression (clear-image 4):
643 ERROR: Wrong type (expecting image): 4
644 ABORT: (wrong-type-arg)
645
646 Type "(backtrace)" to get more information.
647 guile>
648 @end example