1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
4 @c 2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc.
5 @c See the file guile.texi for copying conditions.
6
7 @node Simple Data Types
8 @section Simple Generic Data Types
9
This chapter describes those of Guile's simple data types which are
primarily used for their role as items of generic data. By
@dfn{simple} we mean data types that are not primarily used as
containers to hold other data, unlike pairs, lists, vectors and so on.
For the documentation of such @dfn{compound} data types, see
@ref{Compound Data Types}.
16
17 @c One of the great strengths of Scheme is that there is no straightforward
18 @c distinction between ``data'' and ``functionality''. For example,
19 @c Guile's support for dynamic linking could be described:
20
21 @c @itemize @bullet
22 @c @item
23 @c either in a ``data-centric'' way, as the behaviour and properties of the
24 @c ``dynamically linked object'' data type, and the operations that may be
25 @c applied to instances of this type
26
27 @c @item
28 @c or in a ``functionality-centric'' way, as the set of procedures that
29 @c constitute Guile's support for dynamic linking, in the context of the
30 @c module system.
31 @c @end itemize
32
33 @c The contents of this chapter are, therefore, a matter of judgment. By
34 @c @dfn{generic}, we mean to select those data types whose typical use as
35 @c @emph{data} in a wide variety of programming contexts is more important
36 @c than their use in the implementation of a particular piece of
37 @c @emph{functionality}. The last section of this chapter provides
38 @c references for all the data types that are documented not here but in a
39 @c ``functionality-centric'' way elsewhere in the manual.
40
41 @menu
42 * Booleans:: True/false values.
43 * Numbers:: Numerical data types.
44 * Characters:: Single characters.
45 * Character Sets:: Sets of characters.
46 * Strings:: Sequences of characters.
47 * Bytevectors:: Sequences of bytes.
48 * Symbols:: Symbols.
49 * Keywords:: Self-quoting, customizable display keywords.
50 * Other Types:: "Functionality-centric" data types.
51 @end menu
52
53
54 @node Booleans
55 @subsection Booleans
56 @tpindex Booleans
57
58 The two boolean values are @code{#t} for true and @code{#f} for false.
59 They can also be written as @code{#true} and @code{#false}, as per R7RS.
60
61 Boolean values are returned by predicate procedures, such as the general
62 equality predicates @code{eq?}, @code{eqv?} and @code{equal?}
63 (@pxref{Equality}) and numerical and string comparison operators like
64 @code{string=?} (@pxref{String Comparison}) and @code{<=}
65 (@pxref{Comparison}).
66
@lisp
(<= 3 8)
@result{} #t

(<= 3 -3)
@result{} #f

(equal? "house" "houses")
@result{} #f

(eq? #f #f)
@result{} #t
@end lisp
81
82 In test condition contexts like @code{if} and @code{cond}
83 (@pxref{Conditionals}), where a group of subexpressions will be
84 evaluated only if a @var{condition} expression evaluates to ``true'',
85 ``true'' means any value at all except @code{#f}.
86
87 @lisp
88 (if #t "yes" "no")
89 @result{} "yes"
90
91 (if 0 "yes" "no")
92 @result{} "yes"
93
94 (if #f "yes" "no")
95 @result{} "no"
96 @end lisp
97
98 A result of this asymmetry is that typical Scheme source code more often
99 uses @code{#f} explicitly than @code{#t}: @code{#f} is necessary to
100 represent an @code{if} or @code{cond} false value, whereas @code{#t} is
101 not necessary to represent an @code{if} or @code{cond} true value.
102
103 It is important to note that @code{#f} is @strong{not} equivalent to any
104 other Scheme value. In particular, @code{#f} is not the same as the
105 number 0 (like in C and C++), and not the same as the ``empty list''
106 (like in some Lisp dialects).
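
As a small illustrative sketch of this point, the comparisons below show
that neither the number 0 nor the empty list is the false value, and
that the empty list counts as ``true'' in a test:

@lisp
(eq? #f 0)
@result{} #f

(eq? #f '())
@result{} #f

(if '() "yes" "no")
@result{} "yes"
@end lisp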
107
108 In C, the two Scheme boolean values are available as the two constants
109 @code{SCM_BOOL_T} for @code{#t} and @code{SCM_BOOL_F} for @code{#f}.
110 Care must be taken with the false value @code{SCM_BOOL_F}: it is not
111 false when used in C conditionals. In order to test for it, use
112 @code{scm_is_false} or @code{scm_is_true}.
113
114 @rnindex not
115 @deffn {Scheme Procedure} not x
116 @deffnx {C Function} scm_not (x)
117 Return @code{#t} if @var{x} is @code{#f}, else return @code{#f}.
118 @end deffn
119
120 @rnindex boolean?
121 @deffn {Scheme Procedure} boolean? obj
122 @deffnx {C Function} scm_boolean_p (obj)
123 Return @code{#t} if @var{obj} is either @code{#t} or @code{#f}, else
124 return @code{#f}.
125 @end deffn
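
For illustration, a few calls to @code{not} and @code{boolean?}. Only
@code{#f} is negated to @code{#t}, and only @code{#t} and @code{#f}
satisfy @code{boolean?}:

@lisp
(not #f)
@result{} #t

(not 0)
@result{} #f

(not '())
@result{} #f

(boolean? #f)
@result{} #t

(boolean? 0)
@result{} #f
@end lisp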
126
127 @deftypevr {C Macro} SCM SCM_BOOL_T
128 The @code{SCM} representation of the Scheme object @code{#t}.
129 @end deftypevr
130
131 @deftypevr {C Macro} SCM SCM_BOOL_F
132 The @code{SCM} representation of the Scheme object @code{#f}.
133 @end deftypevr
134
135 @deftypefn {C Function} int scm_is_true (SCM obj)
136 Return @code{0} if @var{obj} is @code{#f}, else return @code{1}.
137 @end deftypefn
138
139 @deftypefn {C Function} int scm_is_false (SCM obj)
140 Return @code{1} if @var{obj} is @code{#f}, else return @code{0}.
141 @end deftypefn
142
143 @deftypefn {C Function} int scm_is_bool (SCM obj)
144 Return @code{1} if @var{obj} is either @code{#t} or @code{#f}, else
145 return @code{0}.
146 @end deftypefn
147
148 @deftypefn {C Function} SCM scm_from_bool (int val)
149 Return @code{#f} if @var{val} is @code{0}, else return @code{#t}.
150 @end deftypefn
151
152 @deftypefn {C Function} int scm_to_bool (SCM val)
153 Return @code{1} if @var{val} is @code{SCM_BOOL_T}, return @code{0}
154 when @var{val} is @code{SCM_BOOL_F}, else signal a `wrong type' error.
155
156 You should probably use @code{scm_is_true} instead of this function
157 when you just want to test a @code{SCM} value for trueness.
158 @end deftypefn
159
160 @node Numbers
161 @subsection Numerical data types
162 @tpindex Numbers
163
164 Guile supports a rich ``tower'' of numerical types --- integer,
165 rational, real and complex --- and provides an extensive set of
166 mathematical and scientific functions for operating on numerical
167 data. This section of the manual documents those types and functions.
168
169 You may also find it illuminating to read R5RS's presentation of numbers
170 in Scheme, which is particularly clear and accessible: see
171 @ref{Numbers,,,r5rs,R5RS}.
172
173 @menu
174 * Numerical Tower:: Scheme's numerical "tower".
175 * Integers:: Whole numbers.
176 * Reals and Rationals:: Real and rational numbers.
177 * Complex Numbers:: Complex numbers.
178 * Exactness:: Exactness and inexactness.
179 * Number Syntax:: Read syntax for numerical data.
180 * Integer Operations:: Operations on integer values.
181 * Comparison:: Comparison predicates.
182 * Conversion:: Converting numbers to and from strings.
183 * Complex:: Complex number operations.
184 * Arithmetic:: Arithmetic functions.
185 * Scientific:: Scientific functions.
186 * Bitwise Operations:: Logical AND, OR, NOT, and so on.
187 * Random:: Random number generation.
188 @end menu
189
190
191 @node Numerical Tower
192 @subsubsection Scheme's Numerical ``Tower''
193 @rnindex number?
194
195 Scheme's numerical ``tower'' consists of the following categories of
196 numbers:
197
198 @table @dfn
199 @item integers
200 Whole numbers, positive or negative; e.g.@: --5, 0, 18.
201
202 @item rationals
203 The set of numbers that can be expressed as @math{@var{p}/@var{q}}
204 where @var{p} and @var{q} are integers; e.g.@: @math{9/16} works, but
205 pi (an irrational number) doesn't. These include integers
206 (@math{@var{n}/1}).
207
208 @item real numbers
209 The set of numbers that describes all possible positions along a
210 one-dimensional line. This includes rationals as well as irrational
211 numbers.
212
213 @item complex numbers
The set of numbers that describes all possible positions in a
two-dimensional space. This includes real as well as imaginary numbers
(@math{@var{a}+@var{b}i}, where @var{a} is the @dfn{real part},
@var{b} is the @dfn{imaginary part}, and @math{i} is the square root of
@minus{}1).
219 @end table
220
221 It is called a tower because each category ``sits on'' the one that
222 follows it, in the sense that every integer is also a rational, every
223 rational is also real, and every real number is also a complex number
224 (but with zero imaginary part).
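
The containment can be checked with the type predicates documented in
the following subsections. The sketch below is only an illustration:
an integer satisfies every predicate in the tower, a fraction satisfies
all but @code{integer?}, and a number with a non-zero imaginary part
satisfies only @code{complex?}.

@lisp
(integer? 18)
@result{} #t

(rational? 18)
@result{} #t

(real? 18)
@result{} #t

(complex? 18)
@result{} #t

(integer? 9/16)
@result{} #f

(rational? 9/16)
@result{} #t

(real? 3+4i)
@result{} #f

(complex? 3+4i)
@result{} #t
@end lisp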
225
226 In addition to the classification into integers, rationals, reals and
227 complex numbers, Scheme also distinguishes between whether a number is
228 represented exactly or not. For example, the result of
229 @m{2\sin(\pi/4),2*sin(pi/4)} is exactly @m{\sqrt{2},2^(1/2)}, but Guile
230 can represent neither @m{\pi/4,pi/4} nor @m{\sqrt{2},2^(1/2)} exactly.
231 Instead, it stores an inexact approximation, using the C type
232 @code{double}.
233
234 Guile can represent exact rationals of any magnitude, inexact
235 rationals that fit into a C @code{double}, and inexact complex numbers
236 with @code{double} real and imaginary parts.
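
As a rough illustration, using the @code{exact?} and @code{inexact?}
predicates described later (@pxref{Exactness}): fractions of exact
integers stay exact whatever their size, while results that pass
through irrational values are inexact. The literal
@code{3.141592654} is merely an approximation of pi chosen for this
sketch.

@lisp
(exact? 1/3)
@result{} #t

(exact? (/ (expt 10 30) 3))
@result{} #t

(inexact? (sqrt 2))
@result{} #t

(inexact? (* 2 (sin (/ 3.141592654 4))))
@result{} #t
@end lisp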
237
238 The @code{number?} predicate may be applied to any Scheme value to
239 discover whether the value is any of the supported numerical types.
240
241 @deffn {Scheme Procedure} number? obj
242 @deffnx {C Function} scm_number_p (obj)
243 Return @code{#t} if @var{obj} is any kind of number, else @code{#f}.
244 @end deffn
245
246 For example:
247
248 @lisp
249 (number? 3)
250 @result{} #t
251
252 (number? "hello there!")
253 @result{} #f
254
255 (define pi 3.141592654)
256 (number? pi)
257 @result{} #t
258 @end lisp
259
260 @deftypefn {C Function} int scm_is_number (SCM obj)
261 This is equivalent to @code{scm_is_true (scm_number_p (obj))}.
262 @end deftypefn
263
264 The next few subsections document each of Guile's numerical data types
265 in detail.
266
267 @node Integers
268 @subsubsection Integers
269
270 @tpindex Integer numbers
271
272 @rnindex integer?
273
274 Integers are whole numbers, that is numbers with no fractional part,
275 such as 2, 83, and @minus{}3789.
276
277 Integers in Guile can be arbitrarily big, as shown by the following
278 example.
279
@lisp
(define (factorial n)
  (let loop ((n n) (product 1))
    (if (= n 0)
        product
        (loop (- n 1) (* product n)))))

(factorial 3)
@result{} 6

(factorial 20)
@result{} 2432902008176640000

(- (factorial 45))
@result{} -119622220865480194561963161495657715064383733760000000000
@end lisp
296
297 Readers whose background is in programming languages where integers are
298 limited by the need to fit into just 4 or 8 bytes of memory may find
299 this surprising, or suspect that Guile's representation of integers is
300 inefficient. In fact, Guile achieves a near optimal balance of
301 convenience and efficiency by using the host computer's native
302 representation of integers where possible, and a more general
303 representation where the required number does not fit in the native
304 form. Conversion between these two representations is automatic and
305 completely invisible to the Scheme level programmer.
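
A brief sketch of that invisibility, assuming the Guile variable
@code{most-positive-fixnum}, which holds the largest integer kept in
the compact form. Its value depends on the platform, so it is not shown
here. Arithmetic that crosses the boundary behaves like any other
integer arithmetic:

@lisp
(exact-integer? most-positive-fixnum)
@result{} #t

(exact-integer? (+ most-positive-fixnum 1))
@result{} #t

(= (- (+ most-positive-fixnum 1) 1) most-positive-fixnum)
@result{} #t

(expt 2 64)
@result{} 18446744073709551616
@end lisp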
306
307 C has a host of different integer types, and Guile offers a host of
308 functions to convert between them and the @code{SCM} representation.
309 For example, a C @code{int} can be handled with @code{scm_to_int} and
310 @code{scm_from_int}. Guile also defines a few C integer types of its
311 own, to help with differences between systems.
312
313 C integer types that are not covered can be handled with the generic
314 @code{scm_to_signed_integer} and @code{scm_from_signed_integer} for
315 signed types, or with @code{scm_to_unsigned_integer} and
316 @code{scm_from_unsigned_integer} for unsigned types.
317
318 Scheme integers can be exact and inexact. For example, a number
319 written as @code{3.0} with an explicit decimal-point is inexact, but
320 it is also an integer. The functions @code{integer?} and
321 @code{scm_is_integer} report true for such a number, but the functions
322 @code{exact-integer?}, @code{scm_is_exact_integer},
323 @code{scm_is_signed_integer}, and @code{scm_is_unsigned_integer} only
324 allow exact integers and thus report false. Likewise, the conversion
325 functions like @code{scm_to_signed_integer} only accept exact
326 integers.
327
328 The motivation for this behavior is that the inexactness of a number
329 should not be lost silently. If you want to allow inexact integers,
330 you can explicitly insert a call to @code{inexact->exact} or to its C
331 equivalent @code{scm_inexact_to_exact}. (Only inexact integers will
332 be converted by this call into exact integers; inexact non-integers
333 will become exact fractions.)
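
For example, the following sketch shows an inexact integer being
rejected by @code{exact-integer?} until it is converted explicitly, and
an inexact non-integer becoming an exact fraction:

@lisp
(integer? 3.0)
@result{} #t

(exact-integer? 3.0)
@result{} #f

(inexact->exact 3.0)
@result{} 3

(exact-integer? (inexact->exact 3.0))
@result{} #t

(inexact->exact 2.5)
@result{} 5/2
@end lisp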
334
335 @deffn {Scheme Procedure} integer? x
336 @deffnx {C Function} scm_integer_p (x)
337 Return @code{#t} if @var{x} is an exact or inexact integer number, else
338 return @code{#f}.
339
340 @lisp
341 (integer? 487)
342 @result{} #t
343
344 (integer? 3.0)
345 @result{} #t
346
347 (integer? -3.4)
348 @result{} #f
349
350 (integer? +inf.0)
351 @result{} #f
352 @end lisp
353 @end deffn
354
355 @deftypefn {C Function} int scm_is_integer (SCM x)
356 This is equivalent to @code{scm_is_true (scm_integer_p (x))}.
357 @end deftypefn
358
359 @deffn {Scheme Procedure} exact-integer? x
360 @deffnx {C Function} scm_exact_integer_p (x)
361 Return @code{#t} if @var{x} is an exact integer number, else
362 return @code{#f}.
363
364 @lisp
365 (exact-integer? 37)
366 @result{} #t
367
368 (exact-integer? 3.0)
369 @result{} #f
370 @end lisp
371 @end deffn
372
373 @deftypefn {C Function} int scm_is_exact_integer (SCM x)
374 This is equivalent to @code{scm_is_true (scm_exact_integer_p (x))}.
375 @end deftypefn
376
377 @defvr {C Type} scm_t_int8
378 @defvrx {C Type} scm_t_uint8
379 @defvrx {C Type} scm_t_int16
380 @defvrx {C Type} scm_t_uint16
381 @defvrx {C Type} scm_t_int32
382 @defvrx {C Type} scm_t_uint32
383 @defvrx {C Type} scm_t_int64
384 @defvrx {C Type} scm_t_uint64
385 @defvrx {C Type} scm_t_intmax
386 @defvrx {C Type} scm_t_uintmax
387 The C types are equivalent to the corresponding ISO C types but are
388 defined on all platforms, with the exception of @code{scm_t_int64} and
389 @code{scm_t_uint64}, which are only defined when a 64-bit type is
390 available. For example, @code{scm_t_int8} is equivalent to
391 @code{int8_t}.
392
393 You can regard these definitions as a stop-gap measure until all
394 platforms provide these types. If you know that all the platforms
395 that you are interested in already provide these types, it is better
396 to use them directly instead of the types provided by Guile.
397 @end defvr
398
399 @deftypefn {C Function} int scm_is_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max)
400 @deftypefnx {C Function} int scm_is_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max)
401 Return @code{1} when @var{x} represents an exact integer that is
402 between @var{min} and @var{max}, inclusive.
403
404 These functions can be used to check whether a @code{SCM} value will
405 fit into a given range, such as the range of a given C integer type.
406 If you just want to convert a @code{SCM} value to a given C integer
407 type, use one of the conversion functions directly.
408 @end deftypefn
409
410 @deftypefn {C Function} scm_t_intmax scm_to_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max)
411 @deftypefnx {C Function} scm_t_uintmax scm_to_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max)
412 When @var{x} represents an exact integer that is between @var{min} and
413 @var{max} inclusive, return that integer. Else signal an error,
414 either a `wrong-type' error when @var{x} is not an exact integer, or
415 an `out-of-range' error when it doesn't fit the given range.
416 @end deftypefn
417
418 @deftypefn {C Function} SCM scm_from_signed_integer (scm_t_intmax x)
419 @deftypefnx {C Function} SCM scm_from_unsigned_integer (scm_t_uintmax x)
420 Return the @code{SCM} value that represents the integer @var{x}. This
421 function will always succeed and will always return an exact number.
422 @end deftypefn
423
424 @deftypefn {C Function} char scm_to_char (SCM x)
425 @deftypefnx {C Function} {signed char} scm_to_schar (SCM x)
426 @deftypefnx {C Function} {unsigned char} scm_to_uchar (SCM x)
427 @deftypefnx {C Function} short scm_to_short (SCM x)
428 @deftypefnx {C Function} {unsigned short} scm_to_ushort (SCM x)
429 @deftypefnx {C Function} int scm_to_int (SCM x)
430 @deftypefnx {C Function} {unsigned int} scm_to_uint (SCM x)
431 @deftypefnx {C Function} long scm_to_long (SCM x)
432 @deftypefnx {C Function} {unsigned long} scm_to_ulong (SCM x)
433 @deftypefnx {C Function} {long long} scm_to_long_long (SCM x)
434 @deftypefnx {C Function} {unsigned long long} scm_to_ulong_long (SCM x)
435 @deftypefnx {C Function} size_t scm_to_size_t (SCM x)
436 @deftypefnx {C Function} ssize_t scm_to_ssize_t (SCM x)
437 @deftypefnx {C Function} scm_t_ptrdiff scm_to_ptrdiff_t (SCM x)
438 @deftypefnx {C Function} scm_t_int8 scm_to_int8 (SCM x)
439 @deftypefnx {C Function} scm_t_uint8 scm_to_uint8 (SCM x)
440 @deftypefnx {C Function} scm_t_int16 scm_to_int16 (SCM x)
441 @deftypefnx {C Function} scm_t_uint16 scm_to_uint16 (SCM x)
442 @deftypefnx {C Function} scm_t_int32 scm_to_int32 (SCM x)
443 @deftypefnx {C Function} scm_t_uint32 scm_to_uint32 (SCM x)
444 @deftypefnx {C Function} scm_t_int64 scm_to_int64 (SCM x)
445 @deftypefnx {C Function} scm_t_uint64 scm_to_uint64 (SCM x)
446 @deftypefnx {C Function} scm_t_intmax scm_to_intmax (SCM x)
447 @deftypefnx {C Function} scm_t_uintmax scm_to_uintmax (SCM x)
448 When @var{x} represents an exact integer that fits into the indicated
449 C type, return that integer. Else signal an error, either a
450 `wrong-type' error when @var{x} is not an exact integer, or an
451 `out-of-range' error when it doesn't fit the given range.
452
453 The functions @code{scm_to_long_long}, @code{scm_to_ulong_long},
454 @code{scm_to_int64}, and @code{scm_to_uint64} are only available when
455 the corresponding types are.
456 @end deftypefn
457
458 @deftypefn {C Function} SCM scm_from_char (char x)
459 @deftypefnx {C Function} SCM scm_from_schar (signed char x)
460 @deftypefnx {C Function} SCM scm_from_uchar (unsigned char x)
461 @deftypefnx {C Function} SCM scm_from_short (short x)
462 @deftypefnx {C Function} SCM scm_from_ushort (unsigned short x)
463 @deftypefnx {C Function} SCM scm_from_int (int x)
464 @deftypefnx {C Function} SCM scm_from_uint (unsigned int x)
465 @deftypefnx {C Function} SCM scm_from_long (long x)
466 @deftypefnx {C Function} SCM scm_from_ulong (unsigned long x)
467 @deftypefnx {C Function} SCM scm_from_long_long (long long x)
468 @deftypefnx {C Function} SCM scm_from_ulong_long (unsigned long long x)
469 @deftypefnx {C Function} SCM scm_from_size_t (size_t x)
470 @deftypefnx {C Function} SCM scm_from_ssize_t (ssize_t x)
471 @deftypefnx {C Function} SCM scm_from_ptrdiff_t (scm_t_ptrdiff x)
472 @deftypefnx {C Function} SCM scm_from_int8 (scm_t_int8 x)
473 @deftypefnx {C Function} SCM scm_from_uint8 (scm_t_uint8 x)
474 @deftypefnx {C Function} SCM scm_from_int16 (scm_t_int16 x)
475 @deftypefnx {C Function} SCM scm_from_uint16 (scm_t_uint16 x)
476 @deftypefnx {C Function} SCM scm_from_int32 (scm_t_int32 x)
477 @deftypefnx {C Function} SCM scm_from_uint32 (scm_t_uint32 x)
478 @deftypefnx {C Function} SCM scm_from_int64 (scm_t_int64 x)
479 @deftypefnx {C Function} SCM scm_from_uint64 (scm_t_uint64 x)
480 @deftypefnx {C Function} SCM scm_from_intmax (scm_t_intmax x)
481 @deftypefnx {C Function} SCM scm_from_uintmax (scm_t_uintmax x)
482 Return the @code{SCM} value that represents the integer @var{x}.
483 These functions will always succeed and will always return an exact
484 number.
485 @end deftypefn
486
487 @deftypefn {C Function} void scm_to_mpz (SCM val, mpz_t rop)
488 Assign @var{val} to the multiple precision integer @var{rop}.
489 @var{val} must be an exact integer, otherwise an error will be
490 signalled. @var{rop} must have been initialized with @code{mpz_init}
491 before this function is called. When @var{rop} is no longer needed
492 the occupied space must be freed with @code{mpz_clear}.
493 @xref{Initializing Integers,,, gmp, GNU MP Manual}, for details.
494 @end deftypefn
495
496 @deftypefn {C Function} SCM scm_from_mpz (mpz_t val)
497 Return the @code{SCM} value that represents @var{val}.
498 @end deftypefn
499
500 @node Reals and Rationals
501 @subsubsection Real and Rational Numbers
502 @tpindex Real numbers
503 @tpindex Rational numbers
504
505 @rnindex real?
506 @rnindex rational?
507
508 Mathematically, the real numbers are the set of numbers that describe
509 all possible points along a continuous, infinite, one-dimensional line.
510 The rational numbers are the set of all numbers that can be written as
511 fractions @var{p}/@var{q}, where @var{p} and @var{q} are integers.
512 All rational numbers are also real, but there are real numbers that
513 are not rational, for example @m{\sqrt{2}, the square root of 2}, and
514 @m{\pi,pi}.
515
516 Guile can represent both exact and inexact rational numbers, but it
517 cannot represent precise finite irrational numbers. Exact rationals are
518 represented by storing the numerator and denominator as two exact
519 integers. Inexact rationals are stored as floating point numbers using
520 the C type @code{double}.
521
522 Exact rationals are written as a fraction of integers. There must be
523 no whitespace around the slash:
524
525 @lisp
526 1/2
527 -22/7
528 @end lisp
529
530 Even though the actual encoding of inexact rationals is in binary, it
531 may be helpful to think of it as a decimal number with a limited
532 number of significant figures and a decimal point somewhere, since
533 this corresponds to the standard notation for non-whole numbers. For
534 example:
535
536 @lisp
537 0.34
538 -0.00000142857931198
539 -5648394822220000000000.0
540 4.0
541 @end lisp
542
543 The limited precision of Guile's encoding means that any finite ``real''
544 number in Guile can be written in a rational form, by multiplying and
545 then dividing by sufficient powers of 10 (or in fact, 2). For example,
546 @samp{-0.00000142857931198} is the same as @minus{}142857931198 divided
547 by 100000000000000000. In Guile's current incarnation, therefore, the
548 @code{rational?} and @code{real?} predicates are equivalent for finite
549 numbers.
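
The following illustration shows the two predicates agreeing on a
finite inexact number and differing only for an infinity. The value
0.25 is chosen because it is exactly representable in binary.

@lisp
(real? 0.25)
@result{} #t

(rational? 0.25)
@result{} #t

(inexact->exact 0.25)
@result{} 1/4

(real? +inf.0)
@result{} #t

(rational? +inf.0)
@result{} #f
@end lisp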
550
551
Dividing by an exact zero leads to an error message, as one might expect.
However, dividing by an inexact zero does not produce an error.
Instead, the result of the division is either plus or minus infinity,
depending on the sign of the divided number and the sign of the zero
divisor (some platforms support signed zeroes @samp{-0.0} and
@samp{+0.0}; @samp{0.0} is the same as @samp{+0.0}).
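
A short sketch of these cases, assuming a platform with
@acronym{IEEE} 754 arithmetic and signed zeroes:

@lisp
(/ 1 0.0)
@result{} +inf.0

(/ -1 0.0)
@result{} -inf.0

(/ 1 -0.0)
@result{} -inf.0

;; By contrast, (/ 1 0) divides by an exact zero and signals an error.
@end lisp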
558
Dividing zero by an inexact zero yields a @acronym{NaN} (`not a number')
value, although Scheme nevertheless considers such values to be numbers.
Attempts to compare a @acronym{NaN} value with any number (including
itself) using @code{=}, @code{<}, @code{>}, @code{<=} or @code{>=}
always return @code{#f}. Although a @acronym{NaN} value is not
@code{=} to itself, it is both @code{eqv?} and @code{equal?} to itself
and other @acronym{NaN} values. However, the preferred way to test for
them is by using @code{nan?}.
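
The behavior can be sketched as follows. The variable name @code{x}
is used only for this illustration.

@lisp
(define x (/ 0.0 0.0))

(= x x)
@result{} #f

(< x 1)
@result{} #f

(eqv? x x)
@result{} #t

(equal? x x)
@result{} #t

(nan? x)
@result{} #t
@end lisp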
567
568 The real @acronym{NaN} values and infinities are written @samp{+nan.0},
569 @samp{+inf.0} and @samp{-inf.0}. This syntax is also recognized by
570 @code{read} as an extension to the usual Scheme syntax. These special
571 values are considered by Scheme to be inexact real numbers but not
572 rational. Note that non-real complex numbers may also contain
573 infinities or @acronym{NaN} values in their real or imaginary parts. To
574 test a real number to see if it is infinite, a @acronym{NaN} value, or
575 neither, use @code{inf?}, @code{nan?}, or @code{finite?}, respectively.
576 Every real number in Scheme belongs to precisely one of those three
577 classes.
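
For illustration, each of the following values satisfies exactly one of
the three predicates:

@lisp
(inf? -inf.0)
@result{} #t

(nan? +nan.0)
@result{} #t

(finite? 42.0)
@result{} #t

(finite? +inf.0)
@result{} #f

(finite? +nan.0)
@result{} #f
@end lisp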
578
On platforms that follow @acronym{IEEE} 754 for their floating point
arithmetic, the @samp{+inf.0}, @samp{-inf.0}, and @samp{+nan.0} values
are implemented using the corresponding @acronym{IEEE} 754 values.
They behave in arithmetic operations as @acronym{IEEE} 754 describes;
for example, @code{(= +nan.0 +nan.0)} @result{} @code{#f}.
584
585 @deffn {Scheme Procedure} real? obj
586 @deffnx {C Function} scm_real_p (obj)
587 Return @code{#t} if @var{obj} is a real number, else @code{#f}. Note
588 that the sets of integer and rational values form subsets of the set
589 of real numbers, so the predicate will also be fulfilled if @var{obj}
590 is an integer number or a rational number.
591 @end deffn
592
593 @deffn {Scheme Procedure} rational? x
594 @deffnx {C Function} scm_rational_p (x)
595 Return @code{#t} if @var{x} is a rational number, @code{#f} otherwise.
596 Note that the set of integer values forms a subset of the set of
597 rational numbers, i.e.@: the predicate will also be fulfilled if
598 @var{x} is an integer number.
599 @end deffn
600
601 @deffn {Scheme Procedure} rationalize x eps
602 @deffnx {C Function} scm_rationalize (x, eps)
603 Returns the @emph{simplest} rational number differing
604 from @var{x} by no more than @var{eps}.
605
606 As required by @acronym{R5RS}, @code{rationalize} only returns an
607 exact result when both its arguments are exact. Thus, you might need
608 to use @code{inexact->exact} on the arguments.
609
610 @lisp
611 (rationalize (inexact->exact 1.2) 1/100)
612 @result{} 6/5
613 @end lisp
614
615 @end deffn
616
617 @deffn {Scheme Procedure} inf? x
618 @deffnx {C Function} scm_inf_p (x)
619 Return @code{#t} if the real number @var{x} is @samp{+inf.0} or
620 @samp{-inf.0}. Otherwise return @code{#f}.
621 @end deffn
622
623 @deffn {Scheme Procedure} nan? x
624 @deffnx {C Function} scm_nan_p (x)
625 Return @code{#t} if the real number @var{x} is @samp{+nan.0}, or
626 @code{#f} otherwise.
627 @end deffn
628
629 @deffn {Scheme Procedure} finite? x
630 @deffnx {C Function} scm_finite_p (x)
631 Return @code{#t} if the real number @var{x} is neither infinite nor a
632 NaN, @code{#f} otherwise.
633 @end deffn
634
635 @deffn {Scheme Procedure} nan
636 @deffnx {C Function} scm_nan ()
637 Return @samp{+nan.0}, a @acronym{NaN} value.
638 @end deffn
639
640 @deffn {Scheme Procedure} inf
641 @deffnx {C Function} scm_inf ()
642 Return @samp{+inf.0}, positive infinity.
643 @end deffn
644
645 @deffn {Scheme Procedure} numerator x
646 @deffnx {C Function} scm_numerator (x)
647 Return the numerator of the rational number @var{x}.
648 @end deffn
649
650 @deffn {Scheme Procedure} denominator x
651 @deffnx {C Function} scm_denominator (x)
652 Return the denominator of the rational number @var{x}.
653 @end deffn
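
As a brief illustration of @code{numerator} and @code{denominator},
note that an exact fraction is reduced to lowest terms, so @code{6/4}
denotes the same number as @code{3/2}, and that an integer has
denominator 1:

@lisp
(numerator 6/4)
@result{} 3

(denominator 6/4)
@result{} 2

(numerator 5)
@result{} 5

(denominator 5)
@result{} 1
@end lisp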
654
655 @deftypefn {C Function} int scm_is_real (SCM val)
656 @deftypefnx {C Function} int scm_is_rational (SCM val)
657 Equivalent to @code{scm_is_true (scm_real_p (val))} and
658 @code{scm_is_true (scm_rational_p (val))}, respectively.
659 @end deftypefn
660
661 @deftypefn {C Function} double scm_to_double (SCM val)
662 Returns the number closest to @var{val} that is representable as a
663 @code{double}. Returns infinity for a @var{val} that is too large in
664 magnitude. The argument @var{val} must be a real number.
665 @end deftypefn
666
667 @deftypefn {C Function} SCM scm_from_double (double val)
668 Return the @code{SCM} value that represents @var{val}. The returned
669 value is inexact according to the predicate @code{inexact?}, but it
670 will be exactly equal to @var{val}.
671 @end deftypefn
672
673 @node Complex Numbers
674 @subsubsection Complex Numbers
675 @tpindex Complex numbers
676
677 @rnindex complex?
678
679 Complex numbers are the set of numbers that describe all possible points
680 in a two-dimensional space. The two coordinates of a particular point
681 in this space are known as the @dfn{real} and @dfn{imaginary} parts of
682 the complex number that describes that point.
683
684 In Guile, complex numbers are written in rectangular form as the sum of
685 their real and imaginary parts, using the symbol @code{i} to indicate
686 the imaginary part.
687
688 @lisp
689 3+4i
690 @result{}
691 3.0+4.0i
692
693 (* 3-8i 2.3+0.3i)
694 @result{}
695 9.3-17.5i
696 @end lisp
697
698 @cindex polar form
699 @noindent
Polar form can also be used, with an @samp{@@} between magnitude and
angle:
702
703 @lisp
704 1@@3.141592 @result{} -1.0 (approx)
705 -1@@1.57079 @result{} 0.0-1.0i (approx)
706 @end lisp
707
708 Guile represents a complex number as a pair of inexact reals, so the
709 real and imaginary parts of a complex number have the same properties of
710 inexactness and limited precision as single inexact real numbers.
711
712 Note that each part of a complex number may contain any inexact real
713 value, including the special values @samp{+nan.0}, @samp{+inf.0} and
714 @samp{-inf.0}, as well as either of the signed zeroes @samp{0.0} or
715 @samp{-0.0}.
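
The inexactness of the parts can be observed with the accessor
procedures @code{real-part}, @code{imag-part} and @code{magnitude}
(@pxref{Complex}). In this sketch the parts are written as exact
integers but come back as inexact reals:

@lisp
(real-part 3+4i)
@result{} 3.0

(imag-part 3+4i)
@result{} 4.0

(magnitude 3+4i)
@result{} 5.0

(make-rectangular 1.5 -2.5)
@result{} 1.5-2.5i
@end lisp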
716
717
718 @deffn {Scheme Procedure} complex? z
719 @deffnx {C Function} scm_complex_p (z)
720 Return @code{#t} if @var{z} is a complex number, @code{#f}
721 otherwise. Note that the sets of real, rational and integer
722 values form subsets of the set of complex numbers, i.e.@: the
723 predicate will also be fulfilled if @var{z} is a real,
724 rational or integer number.
725 @end deffn
726
727 @deftypefn {C Function} int scm_is_complex (SCM val)
728 Equivalent to @code{scm_is_true (scm_complex_p (val))}.
729 @end deftypefn
730
731 @node Exactness
732 @subsubsection Exact and Inexact Numbers
733 @tpindex Exact numbers
734 @tpindex Inexact numbers
735
736 @rnindex exact?
737 @rnindex inexact?
738 @rnindex exact->inexact
739 @rnindex inexact->exact
740
741 R5RS requires that, with few exceptions, a calculation involving inexact
742 numbers always produces an inexact result. To meet this requirement,
743 Guile distinguishes between an exact integer value such as @samp{5} and
744 the corresponding inexact integer value which, to the limited precision
745 available, has no fractional part, and is printed as @samp{5.0}. Guile
746 will only convert the latter value to the former when forced to do so by
747 an invocation of the @code{inexact->exact} procedure.
748
749 The only exception to the above requirement is when the values of the
750 inexact numbers do not affect the result. For example @code{(expt n 0)}
751 is @samp{1} for any value of @code{n}, therefore @code{(expt 5.0 0)} is
752 permitted to return an exact @samp{1}.
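
A short sketch of the general rule: a single inexact operand makes the
result inexact, even when the result has no fractional part, and only
an explicit call to @code{inexact->exact} converts it back.

@lisp
(+ 1 2.0)
@result{} 3.0

(* 2 0.5)
@result{} 1.0

(exact? (* 2 0.5))
@result{} #f

(inexact->exact (* 2 0.5))
@result{} 1
@end lisp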
753
754 @deffn {Scheme Procedure} exact? z
755 @deffnx {C Function} scm_exact_p (z)
756 Return @code{#t} if the number @var{z} is exact, @code{#f}
757 otherwise.
758
759 @lisp
760 (exact? 2)
761 @result{} #t
762
763 (exact? 0.5)
764 @result{} #f
765
766 (exact? (/ 2))
767 @result{} #t
768 @end lisp
769
770 @end deffn
771
772 @deftypefn {C Function} int scm_is_exact (SCM z)
Return a @code{1} if the number @var{z} is exact, and @code{0}
otherwise. This is equivalent to @code{scm_is_true (scm_exact_p (z))}.

An alternate approach to testing the exactness of a number is to
use @code{scm_is_signed_integer} or @code{scm_is_unsigned_integer},
which return @code{1} only for exact integers within the given range.
778 @end deftypefn
779
780 @deffn {Scheme Procedure} inexact? z
781 @deffnx {C Function} scm_inexact_p (z)
Return @code{#t} if the number @var{z} is inexact, @code{#f}
otherwise.
784 @end deffn
785
786 @deftypefn {C Function} int scm_is_inexact (SCM z)
787 Return a @code{1} if the number @var{z} is inexact, and @code{0}
788 otherwise. This is equivalent to @code{scm_is_true (scm_inexact_p (z))}.
789 @end deftypefn
790
791 @deffn {Scheme Procedure} inexact->exact z
792 @deffnx {C Function} scm_inexact_to_exact (z)
793 Return an exact number that is numerically closest to @var{z}, when
794 there is one. For inexact rationals, Guile returns the exact rational
795 that is numerically equal to the inexact rational. Inexact complex
796 numbers with a non-zero imaginary part cannot be made exact.
797
798 @lisp
799 (inexact->exact 0.5)
800 @result{} 1/2
801 @end lisp
802
803 The first result below is not @samp{6/5} because 12/10 is not exactly
804 representable as a @code{double} (on most platforms). However, when
805 reading a decimal number that has been marked exact with the @samp{#e}
806 prefix, Guile is able to represent it correctly.
807
808 @lisp
809 (inexact->exact 1.2)
810 @result{} 5404319552844595/4503599627370496
811
812 #e1.2
813 @result{} 6/5
814 @end lisp
815
816 @end deffn
817
818 @c begin (texi-doc-string "guile" "exact->inexact")
819 @deffn {Scheme Procedure} exact->inexact z
820 @deffnx {C Function} scm_exact_to_inexact (z)
821 Convert the number @var{z} to its inexact representation.
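
For illustration only, the following sketch uses values that are exactly
representable as a @code{double}, so the conversion can be undone
with @code{inexact->exact}:

@lisp
(exact->inexact 1/2)
@result{} 0.5

(exact->inexact 5)
@result{} 5.0

(inexact->exact (exact->inexact 1/2))
@result{} 1/2
@end lisp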
822 @end deffn
823
824
825 @node Number Syntax
826 @subsubsection Read Syntax for Numerical Data
827
828 The read syntax for integers is a string of digits, optionally
829 preceded by a minus or plus character, a code indicating the
830 base in which the integer is encoded, and a code indicating whether
831 the number is exact or inexact. The supported base codes are:
832
833 @table @code
834 @item #b
835 @itemx #B
836 the integer is written in binary (base 2)
837
838 @item #o
839 @itemx #O
840 the integer is written in octal (base 8)
841
842 @item #d
843 @itemx #D
844 the integer is written in decimal (base 10)
845
846 @item #x
847 @itemx #X
848 the integer is written in hexadecimal (base 16)
849 @end table
850
851 If the base code is omitted, the integer is assumed to be decimal. The
852 following examples show how these base codes are used.
853
854 @lisp
855 -13
856 @result{} -13
857
858 #d-13
859 @result{} -13
860
861 #x-13
862 @result{} -19
863
864 #b+1101
865 @result{} 13
866
867 #o377
868 @result{} 255
869 @end lisp
870
871 The codes for indicating exactness (which can, incidentally, be applied
872 to all numerical values) are:
873
874 @table @code
875 @item #e
876 @itemx #E
877 the number is exact
878
879 @item #i
880 @itemx #I
881 the number is inexact.
882 @end table
883
884 If the exactness indicator is omitted, the number is exact unless it
885 contains a radix point. Since Guile cannot represent exact complex
886 numbers, an error is signalled when asking for them.
887
888 @lisp
889 (exact? 1.2)
890 @result{} #f
891
892 (exact? #e1.2)
893 @result{} #t
894
895 (exact? #e+1i)
896 ERROR: Wrong type argument
897 @end lisp
898
899 Guile also understands the syntax @samp{+inf.0} and @samp{-inf.0} for
900 plus and minus infinity, respectively. These must be written
901 exactly as shown: each must always have a sign and exactly
902 one zero digit after the decimal point. Guile also understands
903 @samp{+nan.0} and @samp{-nan.0} for the special ``not-a-number'' value.
904 The sign is ignored for ``not-a-number'', and the value is always printed
905 as @samp{+nan.0}.
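
The short sketch below shows how these special values behave. It
assumes the predicates @code{inf?} and @code{nan?}, which are not
described in this section.

@lisp
(inf? +inf.0)
@result{} #t

(- +inf.0)
@result{} -inf.0

(nan? +nan.0)
@result{} #t

(= +nan.0 +nan.0)
@result{} #f
@end lisp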
906
907 @node Integer Operations
908 @subsubsection Operations on Integer Values
909 @rnindex odd?
910 @rnindex even?
911 @rnindex quotient
912 @rnindex remainder
913 @rnindex modulo
914 @rnindex gcd
915 @rnindex lcm
916
917 @deffn {Scheme Procedure} odd? n
918 @deffnx {C Function} scm_odd_p (n)
919 Return @code{#t} if @var{n} is an odd number, @code{#f}
920 otherwise.
921 @end deffn
922
923 @deffn {Scheme Procedure} even? n
924 @deffnx {C Function} scm_even_p (n)
925 Return @code{#t} if @var{n} is an even number, @code{#f}
926 otherwise.
927 @end deffn
928
929 @c begin (texi-doc-string "guile" "quotient")
930 @c begin (texi-doc-string "guile" "remainder")
931 @deffn {Scheme Procedure} quotient n d
932 @deffnx {Scheme Procedure} remainder n d
933 @deffnx {C Function} scm_quotient (n, d)
934 @deffnx {C Function} scm_remainder (n, d)
935 Return the quotient or remainder from @var{n} divided by @var{d}. The
936 quotient is rounded towards zero, and the remainder will have the same
937 sign as @var{n}. In all cases quotient and remainder satisfy
938 @math{@var{n} = @var{q}*@var{d} + @var{r}}.
939
940 @lisp
941 (remainder 13 4) @result{} 1
942 (remainder -13 4) @result{} -1
943 @end lisp
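
The corresponding quotients, together with a check of the identity
stated above, can be sketched as follows:

@lisp
(quotient 13 4)  @result{} 3
(quotient -13 4) @result{} -3

(= -13 (+ (* (quotient -13 4) 4)
          (remainder -13 4)))
@result{} #t
@end lisp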
944
945 See also @code{truncate-quotient}, @code{truncate-remainder} and
946 related operations in @ref{Arithmetic}.
947 @end deffn
948
949 @c begin (texi-doc-string "guile" "modulo")
950 @deffn {Scheme Procedure} modulo n d
951 @deffnx {C Function} scm_modulo (n, d)
952 Return the remainder from @var{n} divided by @var{d}, with the same
953 sign as @var{d}.
954
955 @lisp
956 (modulo 13 4) @result{} 1
957 (modulo -13 4) @result{} 3
958 (modulo 13 -4) @result{} -3
959 (modulo -13 -4) @result{} -1
960 @end lisp
961
962 See also @code{floor-quotient}, @code{floor-remainder} and
963 related operations in @ref{Arithmetic}.
964 @end deffn
965
966 @c begin (texi-doc-string "guile" "gcd")
967 @deffn {Scheme Procedure} gcd x@dots{}
968 @deffnx {C Function} scm_gcd (x, y)
969 Return the greatest common divisor of all arguments.
970 If called without arguments, 0 is returned.
971
972 The C function @code{scm_gcd} always takes two arguments, while the
973 Scheme function can take an arbitrary number.
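
A few illustrative calls follow. They are a sketch; the result is never
negative, as required by R5RS.

@lisp
(gcd 12 18)    @result{} 6
(gcd 12 18 27) @result{} 3
(gcd -4 6)     @result{} 2
(gcd)          @result{} 0
@end lisp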
974 @end deffn
975
976 @c begin (texi-doc-string "guile" "lcm")
977 @deffn {Scheme Procedure} lcm x@dots{}
978 @deffnx {C Function} scm_lcm (x, y)
979 Return the least common multiple of the arguments.
980 If called without arguments, 1 is returned.
981
982 The C function @code{scm_lcm} always takes two arguments, while the
983 Scheme function can take an arbitrary number.
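
For illustration, a minimal sketch:

@lisp
(lcm 4 6)   @result{} 12
(lcm 2 3 4) @result{} 12
(lcm)       @result{} 1
@end lisp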
984 @end deffn
985
986 @deffn {Scheme Procedure} modulo-expt n k m
987 @deffnx {C Function} scm_modulo_expt (n, k, m)
988 Return @var{n} raised to the integer exponent
989 @var{k}, modulo @var{m}.
990
991 @lisp
992 (modulo-expt 2 3 5)
993 @result{} 3
994 @end lisp
995 @end deffn
996
997 @deftypefn {Scheme Procedure} {} exact-integer-sqrt @var{k}
998 @deftypefnx {C Function} void scm_exact_integer_sqrt (SCM @var{k}, SCM *@var{s}, SCM *@var{r})
999 Return two exact non-negative integers @var{s} and @var{r}
1000 such that @math{@var{k} = @var{s}^2 + @var{r}} and
1001 @math{@var{s}^2 <= @var{k} < (@var{s} + 1)^2}.
1002 An error is raised if @var{k} is not an exact non-negative integer.
1003
1004 @lisp
1005 (exact-integer-sqrt 10) @result{} 3 and 1
1006 @end lisp
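
Since the Scheme procedure returns two values, a caller needs a
multiple-value receiver. One possible way, sketched here with the
standard @code{call-with-values}, is to collect @var{s} and @var{r}
into a list:

@lisp
(call-with-values
    (lambda () (exact-integer-sqrt 10))
  (lambda (s r)
    (list s r)))
@result{} (3 1)
@end lisp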
1007 @end deftypefn
1008
1009 @node Comparison
1010 @subsubsection Comparison Predicates
1011 @rnindex zero?
1012 @rnindex positive?
1013 @rnindex negative?
1014
1015 The C comparison functions below always take two arguments, while the
1016 Scheme functions can take an arbitrary number. Also keep in mind that
1017 the C functions return one of the Scheme boolean values
1018 @code{SCM_BOOL_T} or @code{SCM_BOOL_F} which are both true as far as C
1019 is concerned. Thus, always write @code{scm_is_true (scm_num_eq_p (x,
1020 y))} when testing the two Scheme numbers @code{x} and @code{y} for
1021 equality, for example.
1022
1023 @c begin (texi-doc-string "guile" "=")
1024 @deffn {Scheme Procedure} =
1025 @deffnx {C Function} scm_num_eq_p (x, y)
1026 Return @code{#t} if all parameters are numerically equal.
1027 @end deffn
1028
1029 @c begin (texi-doc-string "guile" "<")
1030 @deffn {Scheme Procedure} <
1031 @deffnx {C Function} scm_less_p (x, y)
1032 Return @code{#t} if the list of parameters is monotonically
1033 increasing.
1034 @end deffn
1035
1036 @c begin (texi-doc-string "guile" ">")
1037 @deffn {Scheme Procedure} >
1038 @deffnx {C Function} scm_gr_p (x, y)
1039 Return @code{#t} if the list of parameters is monotonically
1040 decreasing.
1041 @end deffn
1042
1043 @c begin (texi-doc-string "guile" "<=")
1044 @deffn {Scheme Procedure} <=
1045 @deffnx {C Function} scm_leq_p (x, y)
1046 Return @code{#t} if the list of parameters is monotonically
1047 non-decreasing.
1048 @end deffn
1049
1050 @c begin (texi-doc-string "guile" ">=")
1051 @deffn {Scheme Procedure} >=
1052 @deffnx {C Function} scm_geq_p (x, y)
1053 Return @code{#t} if the list of parameters is monotonically
1054 non-increasing.
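
As a sketch of how the Scheme comparison procedures in this section
treat more than two arguments, and mixed exact and inexact arguments:

@lisp
(< 1 2 3)   @result{} #t
(< 1 1 2)   @result{} #f
(<= 1 1 2)  @result{} #t
(>= 3 2 2)  @result{} #t
(= 1 1.0)   @result{} #t
@end lisp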
1055 @end deffn
1056
1057 @c begin (texi-doc-string "guile" "zero?")
1058 @deffn {Scheme Procedure} zero? z
1059 @deffnx {C Function} scm_zero_p (z)
1060 Return @code{#t} if @var{z} is an exact or inexact number equal to
1061 zero.
1062 @end deffn
1063
1064 @c begin (texi-doc-string "guile" "positive?")
1065 @deffn {Scheme Procedure} positive? x
1066 @deffnx {C Function} scm_positive_p (x)
1067 Return @code{#t} if @var{x} is an exact or inexact number greater than
1068 zero.
1069 @end deffn
1070
1071 @c begin (texi-doc-string "guile" "negative?")
1072 @deffn {Scheme Procedure} negative? x
1073 @deffnx {C Function} scm_negative_p (x)
1074 Return @code{#t} if @var{x} is an exact or inexact number less than
1075 zero.
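
The three sign predicates can be illustrated together. This is a
sketch; note that zero is neither positive nor negative.

@lisp
(zero? 0.0)     @result{} #t
(positive? 1/2) @result{} #t
(negative? -5)  @result{} #t
(positive? 0)   @result{} #f
(negative? 0)   @result{} #f
@end lisp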
1076 @end deffn
1077
1078
1079 @node Conversion
1080 @subsubsection Converting Numbers To and From Strings
1081 @rnindex number->string
1082 @rnindex string->number
1083
1084 The following procedures read and write numbers according to their
1085 external representation as defined by R5RS (@pxref{Lexical structure,
1086 R5RS Lexical Structure,, r5rs, The Revised^5 Report on the Algorithmic
1087 Language Scheme}). @xref{Number Input and Output, the @code{(ice-9
1088 i18n)} module}, for locale-dependent number parsing.
1089
1090 @deffn {Scheme Procedure} number->string n [radix]
1091 @deffnx {C Function} scm_number_to_string (n, radix)
1092 Return a string holding the external representation of the
1093 number @var{n} in the given @var{radix}. If @var{n} is
1094 inexact, a radix of 10 will be used.
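
For illustration, a sketch with exact integers (the hexadecimal digits
are shown in lower case, as Guile prints them):

@lisp
(number->string 255)    @result{} "255"
(number->string 255 16) @result{} "ff"
(number->string 255 2)  @result{} "11111111"
(number->string -19 16) @result{} "-13"
@end lisp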
1095 @end deffn
1096
1097 @deffn {Scheme Procedure} string->number string [radix]
1098 @deffnx {C Function} scm_string_to_number (string, radix)
1099 Return a number of the maximally precise representation
1100 expressed by the given @var{string}. @var{radix} must be an
1101 exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}
1102 is a default radix that may be overridden by an explicit radix
1103 prefix in @var{string} (e.g.@: @code{"#o177"}). If @var{radix} is not
1104 supplied, then the default radix is 10. If @var{string} is not a
1105 syntactically valid notation for a number, then
1106 @code{string->number} returns @code{#f}.
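
The following sketch illustrates the default radix, an explicit prefix
overriding the @var{radix} argument, and the @code{#f} result for
invalid input:

@lisp
(string->number "255")          @result{} 255
(string->number "ff" 16)        @result{} 255
(string->number "#o177" 16)     @result{} 127
(string->number "1.5")          @result{} 1.5
(string->number "not a number") @result{} #f
@end lisp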
1107 @end deffn
1108
1109 @deftypefn {C Function} SCM scm_c_locale_stringn_to_number (const char *string, size_t len, unsigned radix)
1110 As per @code{string->number} above, but taking a C string, as pointer
1111 and length. The string characters should be in the current locale
1112 encoding (@code{locale} in the name refers only to that; there is no
1113 locale-dependent parsing).
1114 @end deftypefn
1115
1116
1117 @node Complex
1118 @subsubsection Complex Number Operations
1119 @rnindex make-rectangular
1120 @rnindex make-polar
1121 @rnindex real-part
1122 @rnindex imag-part
1123 @rnindex magnitude
1124 @rnindex angle
1125
1126 @deffn {Scheme Procedure} make-rectangular real_part imaginary_part
1127 @deffnx {C Function} scm_make_rectangular (real_part, imaginary_part)
1128 Return a complex number constructed from the given @var{real_part} and @var{imaginary_part}.
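
As a sketch, the following constructs a complex number and takes it
apart again with the accessors described in this section. The variable
name @code{z} is only for illustration.

@lisp
(define z (make-rectangular 3.0 4.0))

z             @result{} 3.0+4.0i
(real-part z) @result{} 3.0
(imag-part z) @result{} 4.0
(magnitude z) @result{} 5.0
@end lisp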
1129 @end deffn
1130
1131 @deffn {Scheme Procedure} make-polar mag ang
1132 @deffnx {C Function} scm_make_polar (mag, ang)
1133 @cindex polar form
1134 Return the complex number @var{mag} * e^(i * @var{ang}).
1135 @end deffn
1136
1137 @c begin (texi-doc-string "guile" "real-part")
1138 @deffn {Scheme Procedure} real-part z
1139 @deffnx {C Function} scm_real_part (z)
1140 Return the real part of the number @var{z}.
1141 @end deffn
1142
1143 @c begin (texi-doc-string "guile" "imag-part")
1144 @deffn {Scheme Procedure} imag-part z
1145 @deffnx {C Function} scm_imag_part (z)
1146 Return the imaginary part of the number @var{z}.
1147 @end deffn
1148
1149 @c begin (texi-doc-string "guile" "magnitude")
1150 @deffn {Scheme Procedure} magnitude z
1151 @deffnx {C Function} scm_magnitude (z)
1152 Return the magnitude of the number @var{z}. This is the same as
1153 @code{abs} for real arguments, but also allows complex numbers.
1154 @end deffn
1155
1156 @c begin (texi-doc-string "guile" "angle")
1157 @deffn {Scheme Procedure} angle z
1158 @deffnx {C Function} scm_angle (z)
1159 Return the angle of the complex number @var{z}.
1160 @end deffn
1161
1162 @deftypefn {C Function} SCM scm_c_make_rectangular (double re, double im)
1163 @deftypefnx {C Function} SCM scm_c_make_polar (double mag, double ang)
1164 Like @code{scm_make_rectangular} or @code{scm_make_polar},
1165 respectively, but these functions take @code{double}s as their
1166 arguments.
1167 @end deftypefn
1168
1169 @deftypefn {C Function} double scm_c_real_part (z)
1170 @deftypefnx {C Function} double scm_c_imag_part (z)
1171 Return the real or imaginary part of @var{z} as a @code{double}.
1172 @end deftypefn
1173
1174 @deftypefn {C Function} double scm_c_magnitude (z)
1175 @deftypefnx {C Function} double scm_c_angle (z)
1176 Return the magnitude or angle of @var{z} as a @code{double}.
1177 @end deftypefn
1178
1179
1180 @node Arithmetic
1181 @subsubsection Arithmetic Functions
1182 @rnindex max
1183 @rnindex min
1184 @rnindex +
1185 @rnindex *
1186 @rnindex -
1187 @rnindex /
1188 @findex 1+
1189 @findex 1-
1190 @rnindex abs
1191 @rnindex floor
1192 @rnindex ceiling
1193 @rnindex truncate
1194 @rnindex round
1195 @rnindex euclidean/
1196 @rnindex euclidean-quotient
1197 @rnindex euclidean-remainder
1198 @rnindex floor/
1199 @rnindex floor-quotient
1200 @rnindex floor-remainder
1201 @rnindex ceiling/
1202 @rnindex ceiling-quotient
1203 @rnindex ceiling-remainder
1204 @rnindex truncate/
1205 @rnindex truncate-quotient
1206 @rnindex truncate-remainder
1207 @rnindex centered/
1208 @rnindex centered-quotient
1209 @rnindex centered-remainder
1210 @rnindex round/
1211 @rnindex round-quotient
1212 @rnindex round-remainder
1213
1214 The C arithmetic functions below always take two arguments, while the
1215 Scheme functions can take an arbitrary number. When you need to
1216 invoke them with just one argument, for example to compute the
1217 equivalent of @code{(- x)}, pass @code{SCM_UNDEFINED} as the second
1218 one: @code{scm_difference (x, SCM_UNDEFINED)}.
1219
1220 @c begin (texi-doc-string "guile" "+")
1221 @deffn {Scheme Procedure} + z1 @dots{}
1222 @deffnx {C Function} scm_sum (z1, z2)
1223 Return the sum of all parameter values. Return 0 if called without any
1224 parameters.
1225 @end deffn
1226
1227 @c begin (texi-doc-string "guile" "-")
1228 @deffn {Scheme Procedure} - z1 z2 @dots{}
1229 @deffnx {C Function} scm_difference (z1, z2)
1230 If called with one argument @var{z1}, -@var{z1} is returned. Otherwise
1231 the sum of all but the first argument is subtracted from the first
1232 argument.
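
For illustration, a minimal sketch of the one-argument and
several-argument cases:

@lisp
(- 10)       @result{} -10
(- 10 1 2 3) @result{} 4
@end lisp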
1233 @end deffn
1234
1235 @c begin (texi-doc-string "guile" "*")
1236 @deffn {Scheme Procedure} * z1 @dots{}
1237 @deffnx {C Function} scm_product (z1, z2)
1238 Return the product of all arguments. If called without arguments, 1 is
1239 returned.
1240 @end deffn
1241
1242 @c begin (texi-doc-string "guile" "/")
1243 @deffn {Scheme Procedure} / z1 z2 @dots{}
1244 @deffnx {C Function} scm_divide (z1, z2)
1245 Divide the first argument by the product of the remaining arguments. If
1246 called with one argument @var{z1}, 1/@var{z1} is returned.
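
As a sketch of the cases described above (exact arguments give an
exact rational result):

@lisp
(/ 2)      @result{} 1/2
(/ 12 2 3) @result{} 2
(/ 1 4.0)  @result{} 0.25
@end lisp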
1247 @end deffn
1248
1249 @deffn {Scheme Procedure} 1+ z
1250 @deffnx {C Function} scm_oneplus (z)
1251 Return @math{@var{z} + 1}.
1252 @end deffn
1253
1254 @deffn {Scheme Procedure} 1- z
1255 @deffnx {C Function} scm_oneminus (z)
1256 Return @math{@var{z} - 1}.
1257 @end deffn
1258
1259 @c begin (texi-doc-string "guile" "abs")
1260 @deffn {Scheme Procedure} abs x
1261 @deffnx {C Function} scm_abs (x)
1262 Return the absolute value of @var{x}.
1263
1264 @var{x} must be a number with zero imaginary part. To calculate the
1265 magnitude of a complex number, use @code{magnitude} instead.
1266 @end deffn
1267
1268 @c begin (texi-doc-string "guile" "max")
1269 @deffn {Scheme Procedure} max x1 x2 @dots{}
1270 @deffnx {C Function} scm_max (x1, x2)
1271 Return the maximum of all parameter values.
1272 @end deffn
1273
1274 @c begin (texi-doc-string "guile" "min")
1275 @deffn {Scheme Procedure} min x1 x2 @dots{}
1276 @deffnx {C Function} scm_min (x1, x2)
1277 Return the minimum of all parameter values.
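
A brief sketch of @code{max} and @code{min} follows. The last line
illustrates that if any argument is inexact, the result is inexact
too.

@lisp
(max 1 3 2) @result{} 3
(min 1 3 2) @result{} 1
(max 3 2.0) @result{} 3.0
@end lisp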
1278 @end deffn
1279
1280 @c begin (texi-doc-string "guile" "truncate")
1281 @deffn {Scheme Procedure} truncate x
1282 @deffnx {C Function} scm_truncate_number (x)
1283 Round the inexact number @var{x} towards zero.
1284 @end deffn
1285
1286 @c begin (texi-doc-string "guile" "round")
1287 @deffn {Scheme Procedure} round x
1288 @deffnx {C Function} scm_round_number (x)
1289 Round the inexact number @var{x} to the nearest integer. When exactly
1290 halfway between two integers, round to the even one.
1291 @end deffn
1292
1293 @c begin (texi-doc-string "guile" "floor")
1294 @deffn {Scheme Procedure} floor x
1295 @deffnx {C Function} scm_floor (x)
1296 Round the number @var{x} towards minus infinity.
1297 @end deffn
1298
1299 @c begin (texi-doc-string "guile" "ceiling")
1300 @deffn {Scheme Procedure} ceiling x
1301 @deffnx {C Function} scm_ceiling (x)
1302 Round the number @var{x} towards positive infinity.
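
To compare the four rounding procedures above, the following sketch
applies each of them to the same arguments:

@lisp
(truncate -2.5) @result{} -2.0
(round -2.5)    @result{} -2.0
(floor -2.5)    @result{} -3.0
(ceiling -2.5)  @result{} -2.0

(round 3.5)     @result{} 4.0
(round 7/2)     @result{} 4
@end lisp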
1303 @end deffn
1304
1305 @deftypefn {C Function} double scm_c_truncate (double x)
1306 @deftypefnx {C Function} double scm_c_round (double x)
1307 Like @code{scm_truncate_number} or @code{scm_round_number},
1308 respectively, but these functions take and return @code{double}
1309 values.
1310 @end deftypefn
1311
1312 @deftypefn {Scheme Procedure} {} euclidean/ @var{x} @var{y}
1313 @deftypefnx {Scheme Procedure} {} euclidean-quotient @var{x} @var{y}
1314 @deftypefnx {Scheme Procedure} {} euclidean-remainder @var{x} @var{y}
1315 @deftypefnx {C Function} void scm_euclidean_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1316 @deftypefnx {C Function} SCM scm_euclidean_quotient (SCM @var{x}, SCM @var{y})
1317 @deftypefnx {C Function} SCM scm_euclidean_remainder (SCM @var{x}, SCM @var{y})
1318 These procedures accept two real numbers @var{x} and @var{y}, where the
1319 divisor @var{y} must be non-zero. @code{euclidean-quotient} returns the
1320 integer @var{q} and @code{euclidean-remainder} returns the real number
1321 @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and
1322 @math{0 <= @var{r} < |@var{y}|}. @code{euclidean/} returns both @var{q} and
1323 @var{r}, and is more efficient than computing each separately. Note
1324 that when @math{@var{y} > 0}, @code{euclidean-quotient} returns
1325 @math{floor(@var{x}/@var{y})}, otherwise it returns
1326 @math{ceiling(@var{x}/@var{y})}.
1327
1328 Note that these operators are equivalent to the R6RS operators
1329 @code{div}, @code{mod}, and @code{div-and-mod}.
1330
1331 @lisp
1332 (euclidean-quotient 123 10) @result{} 12
1333 (euclidean-remainder 123 10) @result{} 3
1334 (euclidean/ 123 10) @result{} 12 and 3
1335 (euclidean/ 123 -10) @result{} -12 and 3
1336 (euclidean/ -123 10) @result{} -13 and 7
1337 (euclidean/ -123 -10) @result{} 13 and 7
1338 (euclidean/ -123.2 -63.5) @result{} 2.0 and 3.8
1339 (euclidean/ 16/3 -10/7) @result{} -3 and 22/21
1340 @end lisp
1341 @end deftypefn
1342
1343 @deftypefn {Scheme Procedure} {} floor/ @var{x} @var{y}
1344 @deftypefnx {Scheme Procedure} {} floor-quotient @var{x} @var{y}
1345 @deftypefnx {Scheme Procedure} {} floor-remainder @var{x} @var{y}
1346 @deftypefnx {C Function} void scm_floor_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1347 @deftypefnx {C Function} SCM scm_floor_quotient (@var{x}, @var{y})
1348 @deftypefnx {C Function} SCM scm_floor_remainder (@var{x}, @var{y})
1349 These procedures accept two real numbers @var{x} and @var{y}, where the
1350 divisor @var{y} must be non-zero. @code{floor-quotient} returns the
1351 integer @var{q} and @code{floor-remainder} returns the real number
1352 @var{r} such that @math{@var{q} = floor(@var{x}/@var{y})} and
1353 @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{floor/} returns
1354 both @var{q} and @var{r}, and is more efficient than computing each
1355 separately. Note that @var{r}, if non-zero, will have the same sign
1356 as @var{y}.
1357
1358 When @var{x} and @var{y} are integers, @code{floor-remainder} is
1359 equivalent to the R5RS integer-only operator @code{modulo}.
1360
1361 @lisp
1362 (floor-quotient 123 10) @result{} 12
1363 (floor-remainder 123 10) @result{} 3
1364 (floor/ 123 10) @result{} 12 and 3
1365 (floor/ 123 -10) @result{} -13 and -7
1366 (floor/ -123 10) @result{} -13 and 7
1367 (floor/ -123 -10) @result{} 12 and -3
1368 (floor/ -123.2 -63.5) @result{} 1.0 and -59.7
1369 (floor/ 16/3 -10/7) @result{} -4 and -8/21
1370 @end lisp
1371 @end deftypefn
1372
1373 @deftypefn {Scheme Procedure} {} ceiling/ @var{x} @var{y}
1374 @deftypefnx {Scheme Procedure} {} ceiling-quotient @var{x} @var{y}
1375 @deftypefnx {Scheme Procedure} {} ceiling-remainder @var{x} @var{y}
1376 @deftypefnx {C Function} void scm_ceiling_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1377 @deftypefnx {C Function} SCM scm_ceiling_quotient (@var{x}, @var{y})
1378 @deftypefnx {C Function} SCM scm_ceiling_remainder (@var{x}, @var{y})
1379 These procedures accept two real numbers @var{x} and @var{y}, where the
1380 divisor @var{y} must be non-zero. @code{ceiling-quotient} returns the
1381 integer @var{q} and @code{ceiling-remainder} returns the real number
1382 @var{r} such that @math{@var{q} = ceiling(@var{x}/@var{y})} and
1383 @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{ceiling/} returns
1384 both @var{q} and @var{r}, and is more efficient than computing each
1385 separately. Note that @var{r}, if non-zero, will have the opposite sign
1386 of @var{y}.
1387
1388 @lisp
1389 (ceiling-quotient 123 10) @result{} 13
1390 (ceiling-remainder 123 10) @result{} -7
1391 (ceiling/ 123 10) @result{} 13 and -7
1392 (ceiling/ 123 -10) @result{} -12 and 3
1393 (ceiling/ -123 10) @result{} -12 and -3
1394 (ceiling/ -123 -10) @result{} 13 and 7
1395 (ceiling/ -123.2 -63.5) @result{} 2.0 and 3.8
1396 (ceiling/ 16/3 -10/7) @result{} -3 and 22/21
1397 @end lisp
1398 @end deftypefn
1399
1400 @deftypefn {Scheme Procedure} {} truncate/ @var{x} @var{y}
1401 @deftypefnx {Scheme Procedure} {} truncate-quotient @var{x} @var{y}
1402 @deftypefnx {Scheme Procedure} {} truncate-remainder @var{x} @var{y}
1403 @deftypefnx {C Function} void scm_truncate_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1404 @deftypefnx {C Function} SCM scm_truncate_quotient (@var{x}, @var{y})
1405 @deftypefnx {C Function} SCM scm_truncate_remainder (@var{x}, @var{y})
1406 These procedures accept two real numbers @var{x} and @var{y}, where the
1407 divisor @var{y} must be non-zero. @code{truncate-quotient} returns the
1408 integer @var{q} and @code{truncate-remainder} returns the real number
1409 @var{r} such that @var{q} is @math{@var{x}/@var{y}} rounded toward zero,
1410 and @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{truncate/} returns
1411 both @var{q} and @var{r}, and is more efficient than computing each
1412 separately. Note that @var{r}, if non-zero, will have the same sign
1413 as @var{x}.
1414
1415 When @var{x} and @var{y} are integers, these operators are
1416 equivalent to the R5RS integer-only operators @code{quotient} and
1417 @code{remainder}.
1418
1419 @lisp
1420 (truncate-quotient 123 10) @result{} 12
1421 (truncate-remainder 123 10) @result{} 3
1422 (truncate/ 123 10) @result{} 12 and 3
1423 (truncate/ 123 -10) @result{} -12 and 3
1424 (truncate/ -123 10) @result{} -12 and -3
1425 (truncate/ -123 -10) @result{} 12 and -3
1426 (truncate/ -123.2 -63.5) @result{} 1.0 and -59.7
1427 (truncate/ 16/3 -10/7) @result{} -3 and 22/21
1428 @end lisp
1429 @end deftypefn
1430
1431 @deftypefn {Scheme Procedure} {} centered/ @var{x} @var{y}
1432 @deftypefnx {Scheme Procedure} {} centered-quotient @var{x} @var{y}
1433 @deftypefnx {Scheme Procedure} {} centered-remainder @var{x} @var{y}
1434 @deftypefnx {C Function} void scm_centered_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1435 @deftypefnx {C Function} SCM scm_centered_quotient (SCM @var{x}, SCM @var{y})
1436 @deftypefnx {C Function} SCM scm_centered_remainder (SCM @var{x}, SCM @var{y})
1437 These procedures accept two real numbers @var{x} and @var{y}, where the
1438 divisor @var{y} must be non-zero. @code{centered-quotient} returns the
1439 integer @var{q} and @code{centered-remainder} returns the real number
1440 @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and
1441 @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}. @code{centered/}
1442 returns both @var{q} and @var{r}, and is more efficient than computing
1443 each separately.
1444
1445 Note that @code{centered-quotient} returns @math{@var{x}/@var{y}}
1446 rounded to the nearest integer. When @math{@var{x}/@var{y}} lies
1447 exactly half-way between two integers, the tie is broken according to
1448 the sign of @var{y}. If @math{@var{y} > 0}, ties are rounded toward
1449 positive infinity, otherwise they are rounded toward negative infinity.
1450 This is a consequence of the requirement that
1451 @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}.
1452
1453 Note that these operators are equivalent to the R6RS operators
1454 @code{div0}, @code{mod0}, and @code{div0-and-mod0}.
1455
1456 @lisp
1457 (centered-quotient 123 10) @result{} 12
1458 (centered-remainder 123 10) @result{} 3
1459 (centered/ 123 10) @result{} 12 and 3
1460 (centered/ 123 -10) @result{} -12 and 3
1461 (centered/ -123 10) @result{} -12 and -3
1462 (centered/ -123 -10) @result{} 12 and -3
1463 (centered/ 125 10) @result{} 13 and -5
1464 (centered/ 127 10) @result{} 13 and -3
1465 (centered/ 135 10) @result{} 14 and -5
1466 (centered/ -123.2 -63.5) @result{} 2.0 and 3.8
1467 (centered/ 16/3 -10/7) @result{} -4 and -8/21
1468 @end lisp
1469 @end deftypefn
1470
1471 @deftypefn {Scheme Procedure} {} round/ @var{x} @var{y}
1472 @deftypefnx {Scheme Procedure} {} round-quotient @var{x} @var{y}
1473 @deftypefnx {Scheme Procedure} {} round-remainder @var{x} @var{y}
1474 @deftypefnx {C Function} void scm_round_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r})
1475 @deftypefnx {C Function} SCM scm_round_quotient (@var{x}, @var{y})
1476 @deftypefnx {C Function} SCM scm_round_remainder (@var{x}, @var{y})
1477 These procedures accept two real numbers @var{x} and @var{y}, where the
1478 divisor @var{y} must be non-zero. @code{round-quotient} returns the
1479 integer @var{q} and @code{round-remainder} returns the real number
1480 @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and
1481 @var{q} is @math{@var{x}/@var{y}} rounded to the nearest integer,
1482 with ties going to the nearest even integer. @code{round/}
1483 returns both @var{q} and @var{r}, and is more efficient than computing
1484 each separately.
1485
1486 Note that @code{round/} and @code{centered/} are almost equivalent, but
1487 their behavior differs when @math{@var{x}/@var{y}} lies exactly half-way
1488 between two integers. In this case, @code{round/} chooses the nearest
1489 even integer, whereas @code{centered/} chooses so as to satisfy
1490 the constraint @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}, which
1491 is stronger than the corresponding constraint for @code{round/},
1492 @math{-|@var{y}/2| <= @var{r} <= |@var{y}/2|}. In particular,
1493 when @var{x} and @var{y} are integers, the number of possible remainders
1494 returned by @code{centered/} is @math{|@var{y}|}, whereas the number of
1495 possible remainders returned by @code{round/} is @math{|@var{y}|+1} when
1496 @var{y} is even.
1497
1498 @lisp
1499 (round-quotient 123 10) @result{} 12
1500 (round-remainder 123 10) @result{} 3
1501 (round/ 123 10) @result{} 12 and 3
1502 (round/ 123 -10) @result{} -12 and 3
1503 (round/ -123 10) @result{} -12 and -3
1504 (round/ -123 -10) @result{} 12 and -3
1505 (round/ 125 10) @result{} 12 and 5
1506 (round/ 127 10) @result{} 13 and -3
1507 (round/ 135 10) @result{} 14 and -5
1508 (round/ -123.2 -63.5) @result{} 2.0 and 3.8
1509 (round/ 16/3 -10/7) @result{} -4 and -8/21
1510 @end lisp
1511 @end deftypefn
1512
1513 @node Scientific
1514 @subsubsection Scientific Functions
1515
1516 The following procedures accept any kind of number as arguments,
1517 including complex numbers.
1518
1519 @rnindex sqrt
1520 @c begin (texi-doc-string "guile" "sqrt")
1521 @deffn {Scheme Procedure} sqrt z
1522 Return the square root of @var{z}. Of the two possible roots
1523 (positive and negative), the one with a positive real part is
1524 returned, or if that's zero then a positive imaginary part. Thus,
1525
1526 @example
1527 (sqrt 9.0) @result{} 3.0
1528 (sqrt -9.0) @result{} 0.0+3.0i
1529 (sqrt 1.0+1.0i) @result{} 1.09868411346781+0.455089860562227i
1530 (sqrt -1.0-1.0i) @result{} 0.455089860562227-1.09868411346781i
1531 @end example
1532 @end deffn
1533
1534 @rnindex expt
1535 @c begin (texi-doc-string "guile" "expt")
1536 @deffn {Scheme Procedure} expt z1 z2
1537 Return @var{z1} raised to the power of @var{z2}.
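
For illustration, a sketch restricted to cases with simple results:

@lisp
(expt 2 10)  @result{} 1024
(expt 2 -1)  @result{} 1/2
(expt 1/2 2) @result{} 1/4
(expt 2.0 3) @result{} 8.0
@end lisp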
1538 @end deffn
1539
1540 @rnindex sin
1541 @c begin (texi-doc-string "guile" "sin")
1542 @deffn {Scheme Procedure} sin z
1543 Return the sine of @var{z}.
1544 @end deffn
1545
1546 @rnindex cos
1547 @c begin (texi-doc-string "guile" "cos")
1548 @deffn {Scheme Procedure} cos z
1549 Return the cosine of @var{z}.
1550 @end deffn
1551
1552 @rnindex tan
1553 @c begin (texi-doc-string "guile" "tan")
1554 @deffn {Scheme Procedure} tan z
1555 Return the tangent of @var{z}.
1556 @end deffn
1557
1558 @rnindex asin
1559 @c begin (texi-doc-string "guile" "asin")
1560 @deffn {Scheme Procedure} asin z
1561 Return the arcsine of @var{z}.
1562 @end deffn
1563
1564 @rnindex acos
1565 @c begin (texi-doc-string "guile" "acos")
1566 @deffn {Scheme Procedure} acos z
1567 Return the arccosine of @var{z}.
1568 @end deffn
1569
1570 @rnindex atan
1571 @c begin (texi-doc-string "guile" "atan")
1572 @deffn {Scheme Procedure} atan z
1573 @deffnx {Scheme Procedure} atan y x
1574 Return the arctangent of @var{z}, or of @math{@var{y}/@var{x}}.
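
In the two-argument form, as specified by R5RS, the signs of @var{y}
and @var{x} determine the quadrant of the result. The sketch below
checks only the sign of each result, to avoid depending on how
floating-point values are printed. Both calls have the same ratio
@math{@var{y}/@var{x} = 1}.

@lisp
(positive? (atan 1))     @result{} #t
(positive? (atan 1 1))   @result{} #t
(negative? (atan -1 -1)) @result{} #t
@end lisp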
1575 @end deffn
1576
1577 @rnindex exp
1578 @c begin (texi-doc-string "guile" "exp")
1579 @deffn {Scheme Procedure} exp z
1580 Return e to the power of @var{z}, where e is the base of natural
1581 logarithms (2.71828@dots{}).
1582 @end deffn
1583
1584 @rnindex log
1585 @c begin (texi-doc-string "guile" "log")
1586 @deffn {Scheme Procedure} log z
1587 Return the natural logarithm of @var{z}.
1588 @end deffn
1589
1590 @c begin (texi-doc-string "guile" "log10")
1591 @deffn {Scheme Procedure} log10 z
1592 Return the base 10 logarithm of @var{z}.
1593 @end deffn
1594
1595 @c begin (texi-doc-string "guile" "sinh")
1596 @deffn {Scheme Procedure} sinh z
1597 Return the hyperbolic sine of @var{z}.
1598 @end deffn
1599
1600 @c begin (texi-doc-string "guile" "cosh")
1601 @deffn {Scheme Procedure} cosh z
1602 Return the hyperbolic cosine of @var{z}.
1603 @end deffn
1604
1605 @c begin (texi-doc-string "guile" "tanh")
1606 @deffn {Scheme Procedure} tanh z
1607 Return the hyperbolic tangent of @var{z}.
1608 @end deffn
1609
1610 @c begin (texi-doc-string "guile" "asinh")
1611 @deffn {Scheme Procedure} asinh z
1612 Return the hyperbolic arcsine of @var{z}.
1613 @end deffn
1614
1615 @c begin (texi-doc-string "guile" "acosh")
1616 @deffn {Scheme Procedure} acosh z
1617 Return the hyperbolic arccosine of @var{z}.
1618 @end deffn
1619
1620 @c begin (texi-doc-string "guile" "atanh")
1621 @deffn {Scheme Procedure} atanh z
1622 Return the hyperbolic arctangent of @var{z}.
1623 @end deffn
1624
1625
1626 @node Bitwise Operations
1627 @subsubsection Bitwise Operations
1628
1629 For the following bitwise functions, negative numbers are treated as
1630 infinite precision twos-complements. For instance @math{-6} is bits
1631 @math{@dots{}111010}, with infinitely many ones on the left. It can
1632 be seen that adding 6 (binary 110) to such a bit pattern gives all
1633 zeros.
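
As a sketch of this representation, masking @math{-6} with six one
bits (using @code{logand}, described below) exposes the low bits of
its twos-complement pattern, whereas @code{number->string} prints a
sign and magnitude:

@lisp
(number->string -6 2)                    @result{} "-110"
(logand -6 #b111111)                     @result{} 58
(number->string (logand -6 #b111111) 2)  @result{} "111010"
(+ -6 6)                                 @result{} 0
@end lisp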
1634
1635 @deffn {Scheme Procedure} logand n1 n2 @dots{}
1636 @deffnx {C Function} scm_logand (n1, n2)
1637 Return the bitwise @sc{and} of the integer arguments.
1638
1639 @lisp
1640 (logand) @result{} -1
1641 (logand 7) @result{} 7
1642 (logand #b111 #b011 #b001) @result{} 1
1643 @end lisp
1644 @end deffn
1645
1646 @deffn {Scheme Procedure} logior n1 n2 @dots{}
1647 @deffnx {C Function} scm_logior (n1, n2)
1648 Return the bitwise @sc{or} of the integer arguments.
1649
1650 @lisp
1651 (logior) @result{} 0
1652 (logior 7) @result{} 7
1653 (logior #b000 #b001 #b011) @result{} 3
1654 @end lisp
1655 @end deffn
1656
1657 @deffn {Scheme Procedure} logxor n1 n2 @dots{}
1658 @deffnx {C Function} scm_logxor (n1, n2)
1659 Return the bitwise @sc{xor} of the integer arguments. A bit is
1660 set in the result if it is set in an odd number of arguments.
1661
1662 @lisp
1663 (logxor) @result{} 0
1664 (logxor 7) @result{} 7
1665 (logxor #b000 #b001 #b011) @result{} 2
1666 (logxor #b000 #b001 #b011 #b011) @result{} 1
1667 @end lisp
1668 @end deffn
1669
1670 @deffn {Scheme Procedure} lognot n
1671 @deffnx {C Function} scm_lognot (n)
1672 Return the integer which is the ones-complement of the integer
1673 argument, i.e.@: each 0 bit is changed to 1 and each 1 bit to 0.
1674
1675 @lisp
1676 (number->string (lognot #b10000000) 2)
1677 @result{} "-10000001"
1678 (number->string (lognot #b0) 2)
1679 @result{} "-1"
1680 @end lisp
1681 @end deffn
1682
1683 @deffn {Scheme Procedure} logtest j k
1684 @deffnx {C Function} scm_logtest (j, k)
1685 Test whether @var{j} and @var{k} have any 1 bits in common. This is
1686 equivalent to @code{(not (zero? (logand j k)))}, but without actually
1687 calculating the @code{logand}, just testing for non-zero.
1688
1689 @lisp
1690 (logtest #b0100 #b1011) @result{} #f
1691 (logtest #b0100 #b0111) @result{} #t
1692 @end lisp
1693 @end deffn
1694
1695 @deffn {Scheme Procedure} logbit? index j
1696 @deffnx {C Function} scm_logbit_p (index, j)
1697 Test whether bit number @var{index} in @var{j} is set. @var{index}
1698 starts from 0 for the least significant bit.
1699
1700 @lisp
1701 (logbit? 0 #b1101) @result{} #t
1702 (logbit? 1 #b1101) @result{} #f
1703 (logbit? 2 #b1101) @result{} #t
1704 (logbit? 3 #b1101) @result{} #t
1705 (logbit? 4 #b1101) @result{} #f
1706 @end lisp
1707 @end deffn
1708
1709 @deffn {Scheme Procedure} ash n count
1710 @deffnx {C Function} scm_ash (n, count)
1711 Return @math{floor(n * 2^count)}.
1712 @var{n} and @var{count} must be exact integers.
1713
1714 With @var{n} viewed as an infinite-precision twos-complement
1715 integer, @code{ash} means a left shift introducing zero bits
1716 when @var{count} is positive, or a right shift dropping bits
1717 when @var{count} is negative. This is an ``arithmetic'' shift.
1718
1719 @lisp
1720 (number->string (ash #b1 3) 2) @result{} "1000"
1721 (number->string (ash #b1010 -1) 2) @result{} "101"
1722
1723 ;; -23 is bits ...11101001, -6 is bits ...111010
1724 (ash -23 -2) @result{} -6
1725 @end lisp
1726 @end deffn
1727
1728 @deffn {Scheme Procedure} round-ash n count
1729 @deffnx {C Function} scm_round_ash (n, count)
1730 Return @math{round(n * 2^count)}.
1731 @var{n} and @var{count} must be exact integers.
1732
1733 With @var{n} viewed as an infinite-precision twos-complement
1734 integer, @code{round-ash} means a left shift introducing zero
1735 bits when @var{count} is positive, or a right shift rounding
1736 to the nearest integer (with ties going to the nearest even
1737 integer) when @var{count} is negative. This is a rounded
1738 ``arithmetic'' shift.
1739
1740 @lisp
1741 (number->string (round-ash #b1 3) 2) @result{} "1000"
1742 (number->string (round-ash #b1010 -1) 2) @result{} "101"
1743 (number->string (round-ash #b1010 -2) 2) @result{} "10"
1744 (number->string (round-ash #b1011 -2) 2) @result{} "11"
1745 (number->string (round-ash #b1101 -2) 2) @result{} "11"
1746 (number->string (round-ash #b1110 -2) 2) @result{} "100"
1747 @end lisp
1748 @end deffn
1749
1750 @deffn {Scheme Procedure} logcount n
1751 @deffnx {C Function} scm_logcount (n)
1752 Return the number of bits in integer @var{n}. If @var{n} is
1753 positive, the 1-bits in its binary representation are counted.
1754 If negative, the 0-bits in its two's-complement binary
1755 representation are counted. If zero, 0 is returned.
1756
1757 @lisp
1758 (logcount #b10101010)
1759 @result{} 4
1760 (logcount 0)
1761 @result{} 0
1762 (logcount -2)
1763 @result{} 1
1764 @end lisp
1765 @end deffn
1766
1767 @deffn {Scheme Procedure} integer-length n
1768 @deffnx {C Function} scm_integer_length (n)
1769 Return the number of bits necessary to represent @var{n}.
1770
1771 For positive @var{n} this is how many bits to the most significant one
1772 bit. For negative @var{n} it's how many bits to the most significant
1773 zero bit in twos complement form.
1774
1775 @lisp
1776 (integer-length #b10101010) @result{} 8
1777 (integer-length #b1111) @result{} 4
1778 (integer-length 0) @result{} 0
1779 (integer-length -1) @result{} 0
1780 (integer-length -256) @result{} 8
1781 (integer-length -257) @result{} 9
1782 @end lisp
1783 @end deffn
1784
1785 @deffn {Scheme Procedure} integer-expt n k
1786 @deffnx {C Function} scm_integer_expt (n, k)
1787 Return @var{n} raised to the power @var{k}. @var{k} must be an exact
1788 integer, @var{n} can be any number.
1789
1790 Negative @var{k} is supported, and results in @m{1/n^|k|, 1/n^abs(k)}
1791 in the usual way. @math{@var{n}^0} is 1, as usual, and that includes
1792 @math{0^0}, which is also 1.
1793
1794 @lisp
1795 (integer-expt 2 5) @result{} 32
1796 (integer-expt -3 3) @result{} -27
1797 (integer-expt 5 -3) @result{} 1/125
1798 (integer-expt 0 0) @result{} 1
1799 @end lisp
1800 @end deffn
1801
1802 @deffn {Scheme Procedure} bit-extract n start end
1803 @deffnx {C Function} scm_bit_extract (n, start, end)
1804 Return the integer composed of the @var{start} (inclusive)
1805 through @var{end} (exclusive) bits of @var{n}. The
1806 @var{start}th bit becomes the 0-th bit in the result.
1807
1808 @lisp
1809 (number->string (bit-extract #b1101101010 0 4) 2)
1810 @result{} "1010"
1811 (number->string (bit-extract #b1101101010 4 9) 2)
1812 @result{} "10110"
1813 @end lisp
1814 @end deffn
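
The procedures in this section combine naturally when an integer is
used as a set of flags or packed fields. The following sketch defines
three hypothetical helpers, @code{set-bit}, @code{clear-bit} and
@code{extract} (none of them part of Guile), and shows that
@code{extract} yields the same field as @code{bit-extract}.

@lisp
;; set-bit, clear-bit and extract are hypothetical helpers.
(define (set-bit n index)
  (logior n (ash 1 index)))

(define (clear-bit n index)
  (logand n (lognot (ash 1 index))))

;; Shift the field down to bit 0, then mask off (- end start) bits.
(define (extract n start end)
  (logand (ash n (- start))
          (- (ash 1 (- end start)) 1)))

(number->string (set-bit #b1000 1) 2)     ; @result{} "1010"
(number->string (clear-bit #b1010 3) 2)   ; @result{} "10"
(logbit? 1 (set-bit #b1000 1))            ; @result{} #t
(= (extract #b1101101010 4 9)
   (bit-extract #b1101101010 4 9))        ; @result{} #t
@end lisp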
1815
1816
1817 @node Random
1818 @subsubsection Random Number Generation
1819
1820 Pseudo-random numbers are generated from a random state object, which
1821 can be created with @code{seed->random-state} or
1822 @code{datum->random-state}. An external representation (i.e.@: one
1823 which can be written with @code{write} and read with @code{read}) of a
1824 random state object can be obtained via
1825 @code{random-state->datum}. The @var{state} parameter to the
1826 various functions below is optional; it defaults to the state object
1827 in the @code{*random-state*} variable.
1828
1829 @deffn {Scheme Procedure} copy-random-state [state]
1830 @deffnx {C Function} scm_copy_random_state (state)
1831 Return a copy of the random state @var{state}.
1832 @end deffn
1833
1834 @deffn {Scheme Procedure} random n [state]
1835 @deffnx {C Function} scm_random (n, state)
1836 Return a number in [0, @var{n}).
1837
1838 Accepts a positive integer or real @var{n} and returns a
1839 number of the same type between zero (inclusive) and
1840 @var{n} (exclusive). The values returned have a uniform
1841 distribution.
1842 @end deffn
1843
1844 @deffn {Scheme Procedure} random:exp [state]
1845 @deffnx {C Function} scm_random_exp (state)
1846 Return an inexact real in an exponential distribution with mean
1847 1. For an exponential distribution with mean @var{u} use @code{(*
1848 @var{u} (random:exp))}.
1849 @end deffn
1850
1851 @deffn {Scheme Procedure} random:hollow-sphere! vect [state]
1852 @deffnx {C Function} scm_random_hollow_sphere_x (vect, state)
1853 Fills @var{vect} with inexact real random numbers the sum of whose
1854 squares is equal to 1.0. Thinking of @var{vect} as coordinates in
1855 space of dimension @var{n} @math{=} @code{(vector-length @var{vect})},
1856 the coordinates are uniformly distributed over the surface of the unit
1857 n-sphere.
1858 @end deffn
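
A minimal sketch of the property described above, assuming that an
ordinary vector of reals is acceptable as @var{vect}: after the call,
the squares of the elements should sum to 1.0 up to rounding error.
The helper @code{sum-of-squares} and the seed value are hypothetical.

@lisp
(define st (seed->random-state 7))     ; arbitrary seed
(define v (make-vector 3 0.0))
(random:hollow-sphere! v st)

;; sum-of-squares is a hypothetical helper.
(define (sum-of-squares vect)
  (apply + (map (lambda (x) (* x x)) (vector->list vect))))

(< (abs (- (sum-of-squares v) 1.0)) 1e-9)   ; @result{} #t
@end lisp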
1859
1860 @deffn {Scheme Procedure} random:normal [state]
1861 @deffnx {C Function} scm_random_normal (state)
1862 Return an inexact real in a normal distribution. The distribution
1863 used has mean 0 and standard deviation 1. For a normal distribution
1864 with mean @var{m} and standard deviation @var{d} use @code{(+ @var{m}
1865 (* @var{d} (random:normal)))}.
1866 @end deffn
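
The scaling formula above can be wrapped in a small procedure. This
is only a sketch; @code{normal-sample} is a hypothetical name, and the
mean, deviation and seed are arbitrary.

@lisp
;; normal-sample is a hypothetical helper: shift and scale a
;; standard normal deviate to mean MEAN and standard deviation
;; DEVIATION.
(define (normal-sample mean deviation state)
  (+ mean (* deviation (random:normal state))))

(define st (seed->random-state 1))
(define x (normal-sample 100 15 st))
(and (real? x) (inexact? x))   ; @result{} #t
@end lisp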
1867
1868 @deffn {Scheme Procedure} random:normal-vector! vect [state]
1869 @deffnx {C Function} scm_random_normal_vector_x (vect, state)
1870 Fills @var{vect} with inexact real random numbers that are
1871 independent and standard normally distributed
1872 (i.e., with mean 0 and variance 1).
1873 @end deffn
1874
1875 @deffn {Scheme Procedure} random:solid-sphere! vect [state]
1876 @deffnx {C Function} scm_random_solid_sphere_x (vect, state)
1877 Fills @var{vect} with inexact real random numbers the sum of whose
1878 squares is less than 1.0. Thinking of @var{vect} as coordinates in
1879 space of dimension @var{n} @math{=} @code{(vector-length @var{vect})},
1880 the coordinates are uniformly distributed within the unit
1881 @var{n}-sphere.
1882 @c FIXME: What does this mean, particularly the n-sphere part?
1883 @end deffn
1884
1885 @deffn {Scheme Procedure} random:uniform [state]
1886 @deffnx {C Function} scm_random_uniform (state)
1887 Return a uniformly distributed inexact real random number in
1888 [0,1).
1889 @end deffn
1890
1891 @deffn {Scheme Procedure} seed->random-state seed
1892 @deffnx {C Function} scm_seed_to_random_state (seed)
1893 Return a new random state using @var{seed}.
1894 @end deffn
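
As a sketch of how explicit states make results reproducible, two
states created from the same seed yield the same sequence, and so
does a copy of a state. The helper @code{draw} and the seed value 42
are hypothetical.

@lisp
(define a (seed->random-state 42))
(define b (seed->random-state 42))

;; draw is a hypothetical helper: three numbers from STATE.
(define (draw state)
  (map (lambda (n) (random n state)) '(10 100 1000)))

(equal? (draw a) (draw b))   ; @result{} #t

;; A copy continues from the same point as the original.
(define c (copy-random-state a))
(equal? (draw a) (draw c))   ; @result{} #t
@end lisp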
1895
1896 @deffn {Scheme Procedure} datum->random-state datum
1897 @deffnx {C Function} scm_datum_to_random_state (datum)
1898 Return a new random state from @var{datum}, which should have been
1899 obtained by @code{random-state->datum}.
1900 @end deffn
1901
1902 @deffn {Scheme Procedure} random-state->datum state
1903 @deffnx {C Function} scm_random_state_to_datum (state)
1904 Return a datum representation of @var{state} that may be written out and
1905 read back with the Scheme reader.
1906 @end deffn
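
A minimal sketch of saving and restoring a random state with the two
procedures above. The seed is arbitrary; in a real program the datum
would typically be written to a file with @code{write} and read back
later.

@lisp
(define st (seed->random-state 2014))
(random 1000 st)                        ; advance the state a little

;; Snapshot the state, then rebuild an equivalent state from it.
(define saved (random-state->datum st))
(define restored (datum->random-state saved))

(equal? (random 1000 st) (random 1000 restored))   ; @result{} #t
@end lisp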
1907
1908 @deffn {Scheme Procedure} random-state-from-platform
1909 @deffnx {C Function} scm_random_state_from_platform ()
1910 Construct a new random state seeded from a platform-specific source of
1911 entropy, appropriate for use in non-security-critical applications.
1912 Currently @file{/dev/urandom} is tried first, or else the seed is based
1913 on the time, date, process ID, an address from a freshly allocated heap
1914 cell, an address from the local stack frame, and a high-resolution timer
1915 if available.
1916 @end deffn
1917
1918 @defvar *random-state*
1919 The global random state used by the above functions when the
1920 @var{state} parameter is not given.
1921 @end defvar
1922
1923 Note that the initial value of @code{*random-state*} is the same every
1924 time Guile starts up. Therefore, if you don't pass a @var{state}
1925 parameter to the above procedures, and you don't set
1926 @code{*random-state*} to @code{(seed->random-state your-seed)}, where
1927 @code{your-seed} is something that @emph{isn't} the same every time,
1928 you'll get the same sequence of ``random'' numbers on every run.
1929
1930 For example, unless the relevant source code has changed, @code{(map
1931 random (cdr (iota 19)))}, if the first use of random numbers since
1932 Guile started up, will always give:
1933
1934 @lisp
1935 (map random (cdr (iota 19)))
1936 @result{}
1937 (0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12)
1938 @end lisp
1939
1940 To seed the random state in a sensible way for non-security-critical
1941 applications, do this during initialization of your program:
1942
1943 @lisp
1944 (set! *random-state* (random-state-from-platform))
1945 @end lisp
1946
1947
1948 @node Characters
1949 @subsection Characters
1950 @tpindex Characters
1951
1952 In Scheme, there is a data type to describe a single character.
1953
1954 Defining what exactly a character @emph{is} can be more complicated
1955 than it seems. Guile follows the advice of R6RS and uses The Unicode
1956 Standard to help define what a character is. So, for Guile, a
1957 character is anything in the Unicode Character Database.
1958
1959 @cindex code point
1960 @cindex Unicode code point
1961
1962 The Unicode Character Database is basically a table of characters
1963 indexed using integers called ``code points''. Valid code points are in
1964 the ranges 0 to @code{#xD7FF} inclusive or @code{#xE000} to
1965 @code{#x10FFFF} inclusive, which is about 1.1 million code points.
1966
1967 @cindex designated code point
1968 @cindex code point, designated
1969
1970 Any code point that has been assigned to a character or that has
1971 otherwise been given a meaning by Unicode is called a ``designated code
1972 point''. Most of the designated code points, about 200,000 of them,
1973 indicate characters, accents or other combining marks that modify
1974 other characters, symbols, whitespace, and control characters. Some
1975 are not characters but indicators that suggest how to format or
1976 display neighboring characters.
1977
1978 @cindex reserved code point
1979 @cindex code point, reserved
1980
1981 If a code point is not a designated code point -- if it has not been
1982 assigned to a character by The Unicode Standard -- it is a ``reserved
1983 code point'', meaning that it is reserved for future use. Most of
1984 the code points, about 800,000, are reserved code points.
1985
1986 By convention, a Unicode code point is written as
1987 ``U+XXXX'' where ``XXXX'' is a hexadecimal number. Please note that
1988 this convenient notation is not valid code. Guile does not interpret
1989 ``U+XXXX'' as a character.
1990
1991 In Scheme, a character literal is written as @code{#\@var{name}} where
1992 @var{name} is the name of the character that you want. Printable
1993 characters have their usual single character name; for example,
1994 @code{#\a} is a lower case @code{a}.
1995
1996 Some of the code points are ``combining characters'' that are not meant
1997 to be printed by themselves but are instead meant to modify the
1998 appearance of the previous character. For combining characters, an
1999 alternate form of the character literal is @code{#\} followed by
2000 U+25CC (a small, dotted circle), followed by the combining character.
2001 This allows the combining character to be drawn on the circle, not on
2002 the backslash of @code{#\}.
2003
2004 Many of the non-printing characters, such as whitespace characters and
2005 control characters, also have names.
2006
2007 The most commonly used non-printing characters have long character
2008 names, described in the table below.
2009
2010 @multitable {@code{#\backspace}} {Preferred}
2011 @item Character Name @tab Codepoint
2012 @item @code{#\nul} @tab U+0000
2013 @item @code{#\alarm} @tab U+0007
2014 @item @code{#\backspace} @tab U+0008
2015 @item @code{#\tab} @tab U+0009
2016 @item @code{#\linefeed} @tab U+000A
2017 @item @code{#\newline} @tab U+000A
2018 @item @code{#\vtab} @tab U+000B
2019 @item @code{#\page} @tab U+000C
2020 @item @code{#\return} @tab U+000D
2021 @item @code{#\esc} @tab U+001B
2022 @item @code{#\space} @tab U+0020
2023 @item @code{#\delete} @tab U+007F
2024 @end multitable
2025
2026 There are also short names for all of the ``C0 control characters''
2027 (those with code points below 32). The following table lists the short
2028 name for each character.
2029
2030 @multitable @columnfractions .25 .25 .25 .25
2031 @item 0 = @code{#\nul}
2032 @tab 1 = @code{#\soh}
2033 @tab 2 = @code{#\stx}
2034 @tab 3 = @code{#\etx}
2035 @item 4 = @code{#\eot}
2036 @tab 5 = @code{#\enq}
2037 @tab 6 = @code{#\ack}
2038 @tab 7 = @code{#\bel}
2039 @item 8 = @code{#\bs}
2040 @tab 9 = @code{#\ht}
2041 @tab 10 = @code{#\lf}
2042 @tab 11 = @code{#\vt}
2043 @item 12 = @code{#\ff}
2044 @tab 13 = @code{#\cr}
2045 @tab 14 = @code{#\so}
2046 @tab 15 = @code{#\si}
2047 @item 16 = @code{#\dle}
2048 @tab 17 = @code{#\dc1}
2049 @tab 18 = @code{#\dc2}
2050 @tab 19 = @code{#\dc3}
2051 @item 20 = @code{#\dc4}
2052 @tab 21 = @code{#\nak}
2053 @tab 22 = @code{#\syn}
2054 @tab 23 = @code{#\etb}
2055 @item 24 = @code{#\can}
2056 @tab 25 = @code{#\em}
2057 @tab 26 = @code{#\sub}
2058 @tab 27 = @code{#\esc}
2059 @item 28 = @code{#\fs}
2060 @tab 29 = @code{#\gs}
2061 @tab 30 = @code{#\rs}
2062 @tab 31 = @code{#\us}
2063 @item 32 = @code{#\sp}
2064 @end multitable
2065
2066 The short name for the ``delete'' character (code point U+007F) is
2067 @code{#\del}.
2068
2069 The R7RS name for the ``escape'' character (code point U+001B) is
2070 @code{#\escape}.
2071
2072 There are also a few alternative names left over for compatibility with
2073 previous versions of Guile.
2074
2075 @multitable {@code{#\backspace}} {Preferred}
2076 @item Alternate @tab Standard
2077 @item @code{#\nl} @tab @code{#\newline}
2078 @item @code{#\np} @tab @code{#\page}
2079 @item @code{#\null} @tab @code{#\nul}
2080 @end multitable
2081
2082 Characters may also be written using their code point values. They can
2083 be written as an octal number, such as @code{#\10} for
2084 @code{#\bs} or @code{#\177} for @code{#\del}.
2085
2086 If one prefers hex to octal, there is an additional syntax for character
2087 escapes: @code{#\xHHHH} -- the letter ``x'' followed by a hexadecimal
2088 number of one to eight digits.
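
The different ways of writing a character described above all denote
the same objects, which the following sketch checks with
@code{char=?}. The particular characters are chosen only for
illustration.

@lisp
(char=? #\10 #\bs)                    ; @result{} #t, octal 10 is code point 8
(char=? #\177 #\del)                  ; @result{} #t, octal 177 is code point 127
(char=? #\x41 #\A)                    ; @result{} #t, hex 41 is code point 65
(char=? #\x3bb (integer->char 955))   ; @result{} #t, U+03BB
(char=? #\nl #\newline)               ; @result{} #t, alternate name
@end lisp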
2089
2090 @rnindex char?
2091 @deffn {Scheme Procedure} char? x
2092 @deffnx {C Function} scm_char_p (x)
2093 Return @code{#t} if @var{x} is a character, else @code{#f}.
2094 @end deffn
2095
2096 Fundamentally, the character comparison operations below are
2097 numeric comparisons of the characters' code points.
2098
2099 @rnindex char=?
2100 @deffn {Scheme Procedure} char=? x y
2101 Return @code{#t} if code point of @var{x} is equal to the code point
2102 of @var{y}, else @code{#f}.
2103 @end deffn
2104
2105 @rnindex char<?
2106 @deffn {Scheme Procedure} char<? x y
2107 Return @code{#t} if the code point of @var{x} is less than the code
2108 point of @var{y}, else @code{#f}.
2109 @end deffn
2110
2111 @rnindex char<=?
2112 @deffn {Scheme Procedure} char<=? x y
2113 Return @code{#t} if the code point of @var{x} is less than or equal
2114 to the code point of @var{y}, else @code{#f}.
2115 @end deffn
2116
2117 @rnindex char>?
2118 @deffn {Scheme Procedure} char>? x y
2119 Return @code{#t} if the code point of @var{x} is greater than the
2120 code point of @var{y}, else @code{#f}.
2121 @end deffn
2122
2123 @rnindex char>=?
2124 @deffn {Scheme Procedure} char>=? x y
2125 Return @code{#t} if the code point of @var{x} is greater than or
2126 equal to the code point of @var{y}, else @code{#f}.
2127 @end deffn
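
Because these comparisons work on code points, uppercase ASCII letters
sort before lowercase ones. A short sketch:

@lisp
(char->integer #\A)   ; @result{} 65
(char->integer #\a)   ; @result{} 97
(char<? #\A #\a)      ; @result{} #t
(char<? #\a #\B)      ; @result{} #f
(char>=? #\z #\z)     ; @result{} #t
@end lisp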
2128
2129 @cindex case folding
2130
2131 Case-insensitive character comparisons use @emph{Unicode case
2132 folding}. In case folding comparisons, if a character is lowercase
2133 and has an uppercase form that can be expressed as a single character,
2134 it is converted to uppercase before comparison. All other characters
2135 undergo no conversion before the comparison occurs. This includes the
2136 German sharp S (Eszett), which is not uppercased before comparison
2137 because its uppercase form has two characters. Unicode case folding
2138 is language independent: it uses rules that are generally true, but
2139 it cannot cover all cases for all languages.
2140
2141 @rnindex char-ci=?
2142 @deffn {Scheme Procedure} char-ci=? x y
2143 Return @code{#t} if the case-folded code point of @var{x} is the same
2144 as the case-folded code point of @var{y}, else @code{#f}.
2145 @end deffn
2146
2147 @rnindex char-ci<?
2148 @deffn {Scheme Procedure} char-ci<? x y
2149 Return @code{#t} if the case-folded code point of @var{x} is less
2150 than the case-folded code point of @var{y}, else @code{#f}.
2151 @end deffn
2152
2153 @rnindex char-ci<=?
2154 @deffn {Scheme Procedure} char-ci<=? x y
2155 Return @code{#t} if the case-folded code point of @var{x} is less
2156 than or equal to the case-folded code point of @var{y}, else
2157 @code{#f}.
2158 @end deffn
2159
2160 @rnindex char-ci>?
2161 @deffn {Scheme Procedure} char-ci>? x y
2162 Return @code{#t} if the case-folded code point of @var{x} is greater
2163 than the case-folded code point of @var{y}, else @code{#f}.
2164 @end deffn
2165
2166 @rnindex char-ci>=?
2167 @deffn {Scheme Procedure} char-ci>=? x y
2168 Return @code{#t} if the case-folded code point of @var{x} is greater
2169 than or equal to the case-folded code point of @var{y}, else
2170 @code{#f}.
2171 @end deffn
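
The following sketch contrasts the case-sensitive and case-insensitive
comparisons on ASCII letters:

@lisp
(char=? #\a #\A)       ; @result{} #f
(char-ci=? #\a #\A)    ; @result{} #t
(char<? #\a #\B)       ; @result{} #f, since 97 is not less than 66
(char-ci<? #\a #\B)    ; @result{} #t, compared after case folding
@end lisp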
2172
2173 @rnindex char-alphabetic?
2174 @deffn {Scheme Procedure} char-alphabetic? chr
2175 @deffnx {C Function} scm_char_alphabetic_p (chr)
2176 Return @code{#t} if @var{chr} is alphabetic, else @code{#f}.
2177 @end deffn
2178
2179 @rnindex char-numeric?
2180 @deffn {Scheme Procedure} char-numeric? chr
2181 @deffnx {C Function} scm_char_numeric_p (chr)
2182 Return @code{#t} if @var{chr} is numeric, else @code{#f}.
2183 @end deffn
2184
2185 @rnindex char-whitespace?
2186 @deffn {Scheme Procedure} char-whitespace? chr
2187 @deffnx {C Function} scm_char_whitespace_p (chr)
2188 Return @code{#t} if @var{chr} is whitespace, else @code{#f}.
2189 @end deffn
2190
2191 @rnindex char-upper-case?
2192 @deffn {Scheme Procedure} char-upper-case? chr
2193 @deffnx {C Function} scm_char_upper_case_p (chr)
2194 Return @code{#t} if @var{chr} is uppercase, else @code{#f}.
2195 @end deffn
2196
2197 @rnindex char-lower-case?
2198 @deffn {Scheme Procedure} char-lower-case? chr
2199 @deffnx {C Function} scm_char_lower_case_p (chr)
2200 Return @code{#t} if @var{chr} is lowercase, else @code{#f}.
2201 @end deffn
2202
2203 @deffn {Scheme Procedure} char-is-both? chr
2204 @deffnx {C Function} scm_char_is_both_p (chr)
2205 Return @code{#t} if @var{chr} is either uppercase or lowercase, else
2206 @code{#f}.
2207 @end deffn
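
A few ASCII examples of the classification predicates above:

@lisp
(char-alphabetic? #\a)       ; @result{} #t
(char-numeric? #\5)          ; @result{} #t
(char-whitespace? #\space)   ; @result{} #t
(char-upper-case? #\A)       ; @result{} #t
(char-lower-case? #\A)       ; @result{} #f
(char-is-both? #\a)          ; @result{} #t
(char-is-both? #\5)          ; @result{} #f
@end lisp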
2208
2209 @deffn {Scheme Procedure} char-general-category chr
2210 @deffnx {C Function} scm_char_general_category (chr)
2211 Return a symbol giving the two-letter name of the Unicode general
2212 category assigned to @var{chr} or @code{#f} if no named category is
2213 assigned. The following table provides a list of category names along
2214 with their meanings.
2215
2216 @multitable @columnfractions .1 .4 .1 .4
2217 @item Lu
2218 @tab Uppercase letter
2219 @tab Pf
2220 @tab Final quote punctuation
2221 @item Ll
2222 @tab Lowercase letter
2223 @tab Po
2224 @tab Other punctuation
2225 @item Lt
2226 @tab Titlecase letter
2227 @tab Sm
2228 @tab Math symbol
2229 @item Lm
2230 @tab Modifier letter
2231 @tab Sc
2232 @tab Currency symbol
2233 @item Lo
2234 @tab Other letter
2235 @tab Sk
2236 @tab Modifier symbol
2237 @item Mn
2238 @tab Non-spacing mark
2239 @tab So
2240 @tab Other symbol
2241 @item Mc
2242 @tab Combining spacing mark
2243 @tab Zs
2244 @tab Space separator
2245 @item Me
2246 @tab Enclosing mark
2247 @tab Zl
2248 @tab Line separator
2249 @item Nd
2250 @tab Decimal digit number
2251 @tab Zp
2252 @tab Paragraph separator
2253 @item Nl
2254 @tab Letter number
2255 @tab Cc
2256 @tab Control
2257 @item No
2258 @tab Other number
2259 @tab Cf
2260 @tab Format
2261 @item Pc
2262 @tab Connector punctuation
2263 @tab Cs
2264 @tab Surrogate
2265 @item Pd
2266 @tab Dash punctuation
2267 @tab Co
2268 @tab Private use
2269 @item Ps
2270 @tab Open punctuation
2271 @tab Cn
2272 @tab Unassigned
2273 @item Pe
2274 @tab Close punctuation
2275 @tab
2276 @tab
2277 @item Pi
2278 @tab Initial quote punctuation
2279 @tab
2280 @tab
2281 @end multitable
2282 @end deffn
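
For instance, the categories of a few ASCII characters are:

@lisp
(char-general-category #\A)       ; @result{} Lu
(char-general-category #\a)       ; @result{} Ll
(char-general-category #\5)       ; @result{} Nd
(char-general-category #\space)   ; @result{} Zs
(char-general-category #\nul)     ; @result{} Cc
@end lisp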
2283
2284 @rnindex char->integer
2285 @deffn {Scheme Procedure} char->integer chr
2286 @deffnx {C Function} scm_char_to_integer (chr)
2287 Return the code point of @var{chr}.
2288 @end deffn
2289
2290 @rnindex integer->char
2291 @deffn {Scheme Procedure} integer->char n
2292 @deffnx {C Function} scm_integer_to_char (n)
2293 Return the character that has code point @var{n}. The integer @var{n}
2294 must be a valid code point. Valid code points are in the ranges 0 to
2295 @code{#xD7FF} inclusive or @code{#xE000} to @code{#x10FFFF} inclusive.
2296 @end deffn
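
The valid ranges can be expressed as a predicate to guard calls to
@code{integer->char}. This is only a sketch;
@code{valid-code-point?} is a hypothetical helper, not a Guile
procedure.

@lisp
;; valid-code-point? is a hypothetical helper.
(define (valid-code-point? n)
  (and (integer? n)
       (exact? n)
       (or (<= 0 n #xD7FF)
           (<= #xE000 n #x10FFFF))))

(valid-code-point? #x41)               ; @result{} #t
(valid-code-point? #xD800)             ; @result{} #f, a surrogate
(valid-code-point? #x110000)           ; @result{} #f, out of range
(char->integer (integer->char #x41))   ; @result{} 65
@end lisp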
2297
2298 @rnindex char-upcase
2299 @deffn {Scheme Procedure} char-upcase chr
2300 @deffnx {C Function} scm_char_upcase (chr)
2301 Return the uppercase character version of @var{chr}.
2302 @end deffn
2303
2304 @rnindex char-downcase
2305 @deffn {Scheme Procedure} char-downcase chr
2306 @deffnx {C Function} scm_char_downcase (chr)
2307 Return the lowercase character version of @var{chr}.
2308 @end deffn
2309
2310 @rnindex char-titlecase
2311 @deffn {Scheme Procedure} char-titlecase chr
2312 @deffnx {C Function} scm_char_titlecase (chr)
2313 Return the titlecase character version of @var{chr} if one exists;
2314 otherwise return the uppercase version.
2315
2316 For most characters these will be the same, but the Unicode Standard
2317 includes certain digraph compatibility characters, such as @code{U+01F3}
2318 ``dz'', for which the uppercase and titlecase characters are different
2319 (@code{U+01F1} ``DZ'' and @code{U+01F2} ``Dz'' in this case,
2320 respectively).
2321 @end deffn
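
A short sketch of the three case conversions, including the digraph
mentioned above:

@lisp
(char-upcase #\a)                         ; @result{} #\A
(char-downcase #\A)                       ; @result{} #\a
(char-upcase #\5)                         ; @result{} #\5, no case to change
(char=? (char-upcase #\x1f3) #\x1f1)      ; @result{} #t
(char=? (char-titlecase #\x1f3) #\x1f2)   ; @result{} #t
@end lisp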
2322
2323 @tindex scm_t_wchar
2324 @deftypefn {C Function} scm_t_wchar scm_c_upcase (scm_t_wchar @var{c})
2325 @deftypefnx {C Function} scm_t_wchar scm_c_downcase (scm_t_wchar @var{c})
2326 @deftypefnx {C Function} scm_t_wchar scm_c_titlecase (scm_t_wchar @var{c})
2327
2328 These C functions take an integer representation of a Unicode
2329 codepoint and return the codepoint corresponding to its uppercase,
2330 lowercase, and titlecase forms respectively. The type
2331 @code{scm_t_wchar} is a signed, 32-bit integer.
2332 @end deftypefn
2333
2334 Characters also have ``formal names'', which are defined by Unicode.
2335 These names can be accessed in Guile from the @code{(ice-9 unicode)}
2336 module:
2337
2338 @example
2339 (use-modules (ice-9 unicode))
2340 @end example
2341
2342 @deffn {Scheme Procedure} char->formal-name chr
2343 Return the formal all-upper-case Unicode name of @var{chr},
2344 as a string, or @code{#f} if the character has no name.
2345 @end deffn
2346
2347 @deffn {Scheme Procedure} formal-name->char name
2348 Return the character whose formal all-upper-case Unicode name is
2349 @var{name}, or @code{#f} if no such character is known.
2350 @end deffn
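
A minimal sketch of a round trip through these two procedures,
assuming the @code{(ice-9 unicode)} module is available. The last
name is deliberately not a Unicode character name.

@lisp
(use-modules (ice-9 unicode))

(char->formal-name #\a)
;; @result{} "LATIN SMALL LETTER A"
(formal-name->char "LATIN SMALL LETTER A")
;; @result{} #\a
(formal-name->char "NO SUCH CHARACTER NAME")
;; @result{} #f
@end lisp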
2351
2352 @node Character Sets
2353 @subsection Character Sets
2354
2355 The features described in this section correspond directly to SRFI-14.
2356
2357 The data type @dfn{charset} implements sets of characters
2358 (@pxref{Characters}). Because the internal representation of
2359 character sets is not visible to the user, a lot of procedures for
2360 handling them are provided.
2361
2362 Character sets can be created, extended, tested for the membership of
2363 a character, and compared to other character sets.
2364
2365 @menu
2366 * Character Set Predicates/Comparison::
2367 * Iterating Over Character Sets:: Enumerate charset elements.
2368 * Creating Character Sets:: Making new charsets.
2369 * Querying Character Sets:: Test charsets for membership etc.
2370 * Character-Set Algebra:: Calculating new charsets.
2371 * Standard Character Sets:: Variables containing predefined charsets.
2372 @end menu
2373
2374 @node Character Set Predicates/Comparison
2375 @subsubsection Character Set Predicates/Comparison
2376
2377 Use these procedures for testing whether an object is a character set,
2378 or whether several character sets are equal or subsets of each other.
2379 @code{char-set-hash} can be used to calculate a hash value, for
2380 example for use in fast lookup procedures.
2381
2382 @deffn {Scheme Procedure} char-set? obj
2383 @deffnx {C Function} scm_char_set_p (obj)
2384 Return @code{#t} if @var{obj} is a character set, @code{#f}
2385 otherwise.
2386 @end deffn
2387
2388 @deffn {Scheme Procedure} char-set= char_set @dots{}
2389 @deffnx {C Function} scm_char_set_eq (char_sets)
2390 Return @code{#t} if all given character sets are equal.
2391 @end deffn
2392
2393 @deffn {Scheme Procedure} char-set<= char_set @dots{}
2394 @deffnx {C Function} scm_char_set_leq (char_sets)
2395 Return @code{#t} if every character set @var{char_set}i is a subset
2396 of character set @var{char_set}i+1.
2397 @end deffn
2398
2399 @deffn {Scheme Procedure} char-set-hash cs [bound]
2400 @deffnx {C Function} scm_char_set_hash (cs, bound)
2401 Compute a hash value for the character set @var{cs}. If
2402 @var{bound} is given and non-zero, it restricts the
2403 returned value to the range 0 @dots{} @var{bound} - 1.
2404 @end deffn
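
A short sketch of the predicates and comparisons above. It uses the
constructors @code{char-set} and @code{string->char-set}; the bound of
100 passed to @code{char-set-hash} is arbitrary.

@lisp
(char-set? (char-set #\a))                               ; @result{} #t
(char-set? "abc")                                        ; @result{} #f
(char-set= (char-set #\a #\b) (string->char-set "ba"))   ; @result{} #t
(char-set<= (char-set #\a) (char-set #\a #\b))           ; @result{} #t
(char-set<= (char-set #\a #\b) (char-set #\a))           ; @result{} #f

;; Equal sets hash to the same value.
(= (char-set-hash (char-set #\a #\b) 100)
   (char-set-hash (string->char-set "ba") 100))          ; @result{} #t
@end lisp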
2405
2406 @c ===================================================================
2407
2408 @node Iterating Over Character Sets
2409 @subsubsection Iterating Over Character Sets
2410
2411 Character set cursors are a means for iterating over the members of a
2412 character set. After creating a character set cursor with
2413 @code{char-set-cursor}, a cursor can be dereferenced with
2414 @code{char-set-ref} and advanced to the next member with
2415 @code{char-set-cursor-next}. Whether a cursor has moved past the last
2416 element of the set can be checked with @code{end-of-char-set?}.
2417
2418 Additionally, mapping and (un-)folding procedures for character sets are
2419 provided.
2420
2421 @deffn {Scheme Procedure} char-set-cursor cs
2422 @deffnx {C Function} scm_char_set_cursor (cs)
2423 Return a cursor into the character set @var{cs}.
2424 @end deffn
2425
2426 @deffn {Scheme Procedure} char-set-ref cs cursor
2427 @deffnx {C Function} scm_char_set_ref (cs, cursor)
2428 Return the character at the current cursor position
2429 @var{cursor} in the character set @var{cs}. It is an error to
2430 pass a cursor for which @code{end-of-char-set?} returns true.
2431 @end deffn
2432
2433 @deffn {Scheme Procedure} char-set-cursor-next cs cursor
2434 @deffnx {C Function} scm_char_set_cursor_next (cs, cursor)
2435 Advance the character set cursor @var{cursor} to the next
2436 character in the character set @var{cs}. It is an error if the
2437 cursor given satisfies @code{end-of-char-set?}.
2438 @end deffn
2439
2440 @deffn {Scheme Procedure} end-of-char-set? cursor
2441 @deffnx {C Function} scm_end_of_char_set_p (cursor)
2442 Return @code{#t} if @var{cursor} has reached the end of a
2443 character set, @code{#f} otherwise.
2444 @end deffn
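
The four cursor procedures are typically used together in a loop. The
following sketch collects the members of a set into a list;
@code{cursor->list} is a hypothetical helper. The result is sorted
here because no particular iteration order is assumed.

@lisp
;; cursor->list is a hypothetical helper.
(define (cursor->list cs)
  (let loop ((cursor (char-set-cursor cs))
             (acc '()))
    (if (end-of-char-set? cursor)
        (reverse acc)
        (loop (char-set-cursor-next cs cursor)
              (cons (char-set-ref cs cursor) acc)))))

(sort (cursor->list (char-set #\c #\a #\b)) char<?)
;; @result{} (#\a #\b #\c)
@end lisp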
2445
2446 @deffn {Scheme Procedure} char-set-fold kons knil cs
2447 @deffnx {C Function} scm_char_set_fold (kons, knil, cs)
2448 Fold the procedure @var{kons} over the character set @var{cs},
2449 initializing it with @var{knil}.
2450 @end deffn
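
For example, @var{kons} receives a character and the accumulated value,
so a set can be counted or turned into a list as sketched below. The
list is sorted because no particular iteration order is assumed.

@lisp
;; Count the members of a set.
(char-set-fold (lambda (chr count) (+ count 1))
               0
               (char-set #\a #\b #\c))
;; @result{} 3

;; Collect the members into a list.
(sort (char-set-fold cons '() (char-set #\b #\a)) char<?)
;; @result{} (#\a #\b)
@end lisp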
2451
2452 @deffn {Scheme Procedure} char-set-unfold p f g seed [base_cs]
2453 @deffnx {C Function} scm_char_set_unfold (p, f, g, seed, base_cs)
2454 This is a fundamental constructor for character sets.
2455 @itemize @bullet
2456 @item @var{g} is used to generate a series of ``seed'' values
2457 from the initial seed: @var{seed}, (@var{g} @var{seed}),
2458 (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{}
2459 @item @var{p} tells us when to stop -- when it returns true
2460 when applied to one of the seed values.
2461 @item @var{f} maps each seed value to a character. These
2462 characters are added to the base character set @var{base_cs} to
2463 form the result; @var{base_cs} defaults to the empty set.
2464 @end itemize
2465 @end deffn
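
As a sketch of how the three procedures cooperate, the set of ASCII
digits can be unfolded from integer seeds, here the code points 48
through 57. The name @code{digits} is only illustrative.

@lisp
(define digits
  (char-set-unfold (lambda (i) (> i 57))   ; p: stop once past "9"
                   integer->char           ; f: seed to character
                   (lambda (i) (+ i 1))    ; g: next seed
                   48))                    ; seed: code point of "0"

(char-set= digits (string->char-set "0123456789"))   ; @result{} #t
@end lisp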
2466
2467 @deffn {Scheme Procedure} char-set-unfold! p f g seed base_cs
2468 @deffnx {C Function} scm_char_set_unfold_x (p, f, g, seed, base_cs)
2469 This is a fundamental constructor for character sets.
2470 @itemize @bullet
2471 @item @var{g} is used to generate a series of ``seed'' values
2472 from the initial seed: @var{seed}, (@var{g} @var{seed}),
2473 (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{}
2474 @item @var{p} tells us when to stop -- when it returns true
2475 when applied to one of the seed values.
2476 @item @var{f} maps each seed value to a character. These
2477 characters are added to the base character set @var{base_cs}, which
2478 may be modified destructively, to form the result.
2479 @end itemize
2480 @end deffn
2481
2482 @deffn {Scheme Procedure} char-set-for-each proc cs
2483 @deffnx {C Function} scm_char_set_for_each (proc, cs)
2484 Apply @var{proc} to every character in the character set
2485 @var{cs}. The return value is not specified.
2486 @end deffn
2487
2488 @deffn {Scheme Procedure} char-set-map proc cs
2489 @deffnx {C Function} scm_char_set_map (proc, cs)
2490 Map the procedure @var{proc} over every character in @var{cs} and
2491 return the set of results. @var{proc} must map characters to characters.
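
For example, assuming the result is itself a character set as in
SRFI-14, mapping @code{char-upcase} over a set of lower-case letters
gives the corresponding upper-case set:

@lisp
(char-set= (char-set-map char-upcase (char-set #\a #\b))
           (char-set #\A #\B))
@result{} #t
@end lisp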
2492 @end deffn
2493
2494 @c ===================================================================
2495
2496 @node Creating Character Sets
2497 @subsubsection Creating Character Sets
2498
2499 New character sets are produced with these procedures.
2500
2501 @deffn {Scheme Procedure} char-set-copy cs
2502 @deffnx {C Function} scm_char_set_copy (cs)
2503 Return a newly allocated character set containing all
2504 characters in @var{cs}.
2505 @end deffn
2506
2507 @deffn {Scheme Procedure} char-set chr @dots{}
2508 @deffnx {C Function} scm_char_set (chrs)
2509 Return a character set containing all given characters.
2510 @end deffn
2511
2512 @deffn {Scheme Procedure} list->char-set list [base_cs]
2513 @deffnx {C Function} scm_list_to_char_set (list, base_cs)
2514 Convert the character list @var{list} to a character set. If
2515 the character set @var{base_cs} is given, the characters in this
2516 set are also included in the result.
2517 @end deffn
2518
2519 @deffn {Scheme Procedure} list->char-set! list base_cs
2520 @deffnx {C Function} scm_list_to_char_set_x (list, base_cs)
2521 Convert the character list @var{list} to a character set. The
2522 characters are added to @var{base_cs} and @var{base_cs} is
2523 returned.
2524 @end deffn
2525
2526 @deffn {Scheme Procedure} string->char-set str [base_cs]
2527 @deffnx {C Function} scm_string_to_char_set (str, base_cs)
2528 Convert the string @var{str} to a character set. If the
2529 character set @var{base_cs} is given, the characters in this
2530 set are also included in the result.
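
A short sketch: duplicate characters in the string collapse into a
single member, and an optional base set contributes its own members.

@lisp
;; "hello" contains h, e, l and o.
(char-set-size (string->char-set "hello"))
@result{} 4

(char-set-contains? (string->char-set "hello" (char-set #\!)) #\!)
@result{} #t
@end lisp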
2531 @end deffn
2532
2533 @deffn {Scheme Procedure} string->char-set! str base_cs
2534 @deffnx {C Function} scm_string_to_char_set_x (str, base_cs)
2535 Convert the string @var{str} to a character set. The
2536 characters from the string are added to @var{base_cs}, and
2537 @var{base_cs} is returned.
2538 @end deffn
2539
2540 @deffn {Scheme Procedure} char-set-filter pred cs [base_cs]
2541 @deffnx {C Function} scm_char_set_filter (pred, cs, base_cs)
2542 Return a character set containing every character from @var{cs}
2543 that satisfies @var{pred}. If provided, the characters
2544 from @var{base_cs} are added to the result.
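
For instance, keeping only the numeric characters of a small set might
look like this:

@lisp
(char-set= (char-set-filter char-numeric? (string->char-set "a1b2"))
           (char-set #\1 #\2))
@result{} #t
@end lisp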
2545 @end deffn
2546
2547 @deffn {Scheme Procedure} char-set-filter! pred cs base_cs
2548 @deffnx {C Function} scm_char_set_filter_x (pred, cs, base_cs)
2549 Return a character set containing every character from @var{cs}
2550 that satisfies @var{pred}. The characters are added to
2551 @var{base_cs} and @var{base_cs} is returned.
2552 @end deffn
2553
2554 @deffn {Scheme Procedure} ucs-range->char-set lower upper [error [base_cs]]
2555 @deffnx {C Function} scm_ucs_range_to_char_set (lower, upper, error, base_cs)
2556 Return a character set containing all characters whose
2557 character codes lie in the half-open range
2558 [@var{lower},@var{upper}).
2559
2560 If @var{error} is a true value, an error is signalled if the
2561 specified range contains characters which are not contained in
2562 the implemented character range. If @var{error} is @code{#f},
2563 these characters are silently left out of the resulting
2564 character set.
2565
2566 The characters in @var{base_cs} are added to the result, if
2567 given.
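
The half-open range can be checked with a small example; the code
points used here are arbitrary.

@lisp
;; Code points 97, 98 and 99 are a, b and c; 100 (d) is excluded.
(char-set= (ucs-range->char-set 97 100) (char-set #\a #\b #\c))
@result{} #t
@end lisp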
2568 @end deffn
2569
2570 @deffn {Scheme Procedure} ucs-range->char-set! lower upper error base_cs
2571 @deffnx {C Function} scm_ucs_range_to_char_set_x (lower, upper, error, base_cs)
2572 Return a character set containing all characters whose
2573 character codes lie in the half-open range
2574 [@var{lower},@var{upper}).
2575
2576 If @var{error} is a true value, an error is signalled if the
2577 specified range contains characters which are not contained in
2578 the implemented character range. If @var{error} is @code{#f},
2579 these characters are silently left out of the resulting
2580 character set.
2581
2582 The characters are added to @var{base_cs} and @var{base_cs} is
2583 returned.
2584 @end deffn
2585
2586 @deffn {Scheme Procedure} ->char-set x
2587 @deffnx {C Function} scm_to_char_set (x)
2588 Coerce @var{x} into a char-set. @var{x} may be a string, character or
2589 char-set. A string is converted to the set of its constituent
2590 characters; a character is converted to a singleton set; a char-set is
2591 returned as-is.
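
The three cases can be sketched as follows:

@lisp
(char-set= (->char-set "aab") (char-set #\a #\b))  @result{} #t
(char-set-size (->char-set #\x))                   @result{} 1
(eq? (->char-set char-set:digit) char-set:digit)   @result{} #t
@end lisp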
2592 @end deffn
2593
2594 @c ===================================================================
2595
2596 @node Querying Character Sets
2597 @subsubsection Querying Character Sets
2598
2599 Access the elements and other information of a character set with these
2600 procedures.
2601
2602 @deffn {Scheme Procedure} %char-set-dump cs
2603 Returns an association list containing debugging information
2604 for @var{cs}. The association list has the following entries.
2605 @table @code
2606 @item char-set
2607 The char-set itself
2608 @item len
2609 The number of groups of contiguous code points the char-set
2610 contains
2611 @item ranges
2612 A list of lists where each sublist is a range of code points
2613 and their associated characters
2614 @end table
2615 The return value of this function cannot be relied upon to be
2616 consistent between versions of Guile and should not be used in code.
2617 @end deffn
2618
2619 @deffn {Scheme Procedure} char-set-size cs
2620 @deffnx {C Function} scm_char_set_size (cs)
2621 Return the number of elements in character set @var{cs}.
2622 @end deffn
2623
2624 @deffn {Scheme Procedure} char-set-count pred cs
2625 @deffnx {C Function} scm_char_set_count (pred, cs)
2626 Return the number of elements in the character set
2627 @var{cs} which satisfy the predicate @var{pred}.
2628 @end deffn
2629
2630 @deffn {Scheme Procedure} char-set->list cs
2631 @deffnx {C Function} scm_char_set_to_list (cs)
2632 Return a list containing the elements of the character set
2633 @var{cs}.
2634 @end deffn
2635
2636 @deffn {Scheme Procedure} char-set->string cs
2637 @deffnx {C Function} scm_char_set_to_string (cs)
2638 Return a string containing the elements of the character set
2639 @var{cs}. The order in which the characters are placed in the
2640 string is not defined.
2641 @end deffn
2642
2643 @deffn {Scheme Procedure} char-set-contains? cs ch
2644 @deffnx {C Function} scm_char_set_contains_p (cs, ch)
2645 Return @code{#t} if the character @var{ch} is contained in the
2646 character set @var{cs}, or @code{#f} otherwise.
2647 @end deffn
2648
2649 @deffn {Scheme Procedure} char-set-every pred cs
2650 @deffnx {C Function} scm_char_set_every (pred, cs)
2651 Return a true value if every character in the character set
2652 @var{cs} satisfies the predicate @var{pred}.
2653 @end deffn
2654
2655 @deffn {Scheme Procedure} char-set-any pred cs
2656 @deffnx {C Function} scm_char_set_any (pred, cs)
2657 Return a true value if any character in the character set
2658 @var{cs} satisfies the predicate @var{pred}.
2659 @end deffn
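
The query procedures above can be combined as in the following sketch,
where @code{vowels} is a name chosen purely for illustration.

@lisp
(define vowels (string->char-set "aeiou"))

(char-set-contains? vowels #\e)           @result{} #t
(char-set-count char-alphabetic? vowels)  @result{} 5
(char-set-every char-lower-case? vowels)  @result{} #t
(char-set-any char-upper-case? vowels)    @result{} #f
@end lisp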
2660
2661 @c ===================================================================
2662
2663 @node Character-Set Algebra
2664 @subsubsection Character-Set Algebra
2665
2666 Character sets can be manipulated with the common set algebra operations,
2667 such as union, complement and intersection. All of these procedures
2668 provide side-effecting variants, which modify their character set
2669 argument(s).
2670
2671 @deffn {Scheme Procedure} char-set-adjoin cs chr @dots{}
2672 @deffnx {C Function} scm_char_set_adjoin (cs, chrs)
2673 Add all character arguments to the first argument, which must
2674 be a character set.
2675 @end deffn
2676
2677 @deffn {Scheme Procedure} char-set-delete cs chr @dots{}
2678 @deffnx {C Function} scm_char_set_delete (cs, chrs)
2679 Delete all character arguments from the first argument, which
2680 must be a character set.
2681 @end deffn
2682
2683 @deffn {Scheme Procedure} char-set-adjoin! cs chr @dots{}
2684 @deffnx {C Function} scm_char_set_adjoin_x (cs, chrs)
2685 Add all character arguments to the first argument, which must
2686 be a character set.
2687 @end deffn
2688
2689 @deffn {Scheme Procedure} char-set-delete! cs chr @dots{}
2690 @deffnx {C Function} scm_char_set_delete_x (cs, chrs)
2691 Delete all character arguments from the first argument, which
2692 must be a character set.
2693 @end deffn
2694
2695 @deffn {Scheme Procedure} char-set-complement cs
2696 @deffnx {C Function} scm_char_set_complement (cs)
2697 Return the complement of the character set @var{cs}.
2698 @end deffn
2699
2700 Note that the complement of a character set is likely to contain many
2701 reserved code points (code points that are not associated with
2702 characters). It may be helpful to modify the output of
2703 @code{char-set-complement} by computing its intersection with the set
2704 of designated code points, @code{char-set:designated}.
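
A minimal sketch of that suggestion, with @code{non-ascii} as a
hypothetical variable name:

@lisp
;; All designated code points outside ASCII.
(define non-ascii
  (char-set-intersection (char-set-complement char-set:ascii)
                         char-set:designated))

(char-set-contains? non-ascii #\a)                   @result{} #f
;; U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
(char-set-contains? non-ascii (integer->char #xE9))  @result{} #t
@end lisp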
2705
2706 @deffn {Scheme Procedure} char-set-union cs @dots{}
2707 @deffnx {C Function} scm_char_set_union (char_sets)
2708 Return the union of all argument character sets.
2709 @end deffn
2710
2711 @deffn {Scheme Procedure} char-set-intersection cs @dots{}
2712 @deffnx {C Function} scm_char_set_intersection (char_sets)
2713 Return the intersection of all argument character sets.
2714 @end deffn
2715
2716 @deffn {Scheme Procedure} char-set-difference cs1 cs @dots{}
2717 @deffnx {C Function} scm_char_set_difference (cs1, char_sets)
2718 Return the set of characters in @var{cs1} that are not in any of the remaining argument character sets.
2719 @end deffn
2720
2721 @deffn {Scheme Procedure} char-set-xor cs @dots{}
2722 @deffnx {C Function} scm_char_set_xor (char_sets)
2723 Return the exclusive-or of all argument character sets.
2724 @end deffn
2725
2726 @deffn {Scheme Procedure} char-set-diff+intersection cs1 cs @dots{}
2727 @deffnx {C Function} scm_char_set_diff_plus_intersection (cs1, char_sets)
2728 Return two values: the difference of @var{cs1} and the remaining
2729 character sets, and the intersection of @var{cs1} with their union.
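
Assuming the two results are delivered as multiple values, as in
SRFI-14, they can be received with @code{call-with-values}.  The names
@code{diff} and @code{int} are illustrative only.

@lisp
(call-with-values
    (lambda ()
      (char-set-diff+intersection (char-set #\a #\b)
                                  (char-set #\b #\c)))
  (lambda (diff int)
    (list (char-set->list diff) (char-set->list int))))
@result{} ((#\a) (#\b))
@end lisp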
2730 @end deffn
2731
2732 @deffn {Scheme Procedure} char-set-complement! cs
2733 @deffnx {C Function} scm_char_set_complement_x (cs)
2734 Return the complement of the character set @var{cs}.
2735 @end deffn
2736
2737 @deffn {Scheme Procedure} char-set-union! cs1 cs @dots{}
2738 @deffnx {C Function} scm_char_set_union_x (cs1, char_sets)
2739 Return the union of all argument character sets.
2740 @end deffn
2741
2742 @deffn {Scheme Procedure} char-set-intersection! cs1 cs @dots{}
2743 @deffnx {C Function} scm_char_set_intersection_x (cs1, char_sets)
2744 Return the intersection of all argument character sets.
2745 @end deffn
2746
2747 @deffn {Scheme Procedure} char-set-difference! cs1 cs @dots{}
2748 @deffnx {C Function} scm_char_set_difference_x (cs1, char_sets)
2749 Return the set of characters in @var{cs1} that are not in any of the remaining argument character sets.
2750 @end deffn
2751
2752 @deffn {Scheme Procedure} char-set-xor! cs1 cs @dots{}
2753 @deffnx {C Function} scm_char_set_xor_x (cs1, char_sets)
2754 Return the exclusive-or of all argument character sets.
2755 @end deffn
2756
2757 @deffn {Scheme Procedure} char-set-diff+intersection! cs1 cs2 cs @dots{}
2758 @deffnx {C Function} scm_char_set_diff_plus_intersection_x (cs1, cs2, char_sets)
2759 Return the difference and the intersection of all argument
2760 character sets.
2761 @end deffn
2762
2763 @c ===================================================================
2764
2765 @node Standard Character Sets
2766 @subsubsection Standard Character Sets
2767
2768 To make the character set data type and its procedures convenient to
2769 use, several predefined character set variables exist.
2770
2771 @cindex codeset
2772 @cindex charset
2773 @cindex locale
2774
2775 These character sets are locale independent and are not recomputed
2776 upon a @code{setlocale} call. They contain characters from the whole
2777 range of Unicode code points. For instance, @code{char-set:letter}
2778 contains about 100,000 characters.
2779
2780 @defvr {Scheme Variable} char-set:lower-case
2781 @defvrx {C Variable} scm_char_set_lower_case
2782 All lower-case characters.
2783 @end defvr
2784
2785 @defvr {Scheme Variable} char-set:upper-case
2786 @defvrx {C Variable} scm_char_set_upper_case
2787 All upper-case characters.
2788 @end defvr
2789
2790 @defvr {Scheme Variable} char-set:title-case
2791 @defvrx {C Variable} scm_char_set_title_case
2792 All single characters that function as if they were an upper-case
2793 letter followed by a lower-case letter.
2794 @end defvr
2795
2796 @defvr {Scheme Variable} char-set:letter
2797 @defvrx {C Variable} scm_char_set_letter
2798 All letters. This includes @code{char-set:lower-case},
2799 @code{char-set:upper-case}, @code{char-set:title-case}, and many
2800 letters that have no case at all. For example, Chinese and Japanese
2801 characters typically have no concept of case.
2802 @end defvr
2803
2804 @defvr {Scheme Variable} char-set:digit
2805 @defvrx {C Variable} scm_char_set_digit
2806 All digits.
2807 @end defvr
2808
2809 @defvr {Scheme Variable} char-set:letter+digit
2810 @defvrx {C Variable} scm_char_set_letter_and_digit
2811 The union of @code{char-set:letter} and @code{char-set:digit}.
2812 @end defvr
2813
2814 @defvr {Scheme Variable} char-set:graphic
2815 @defvrx {C Variable} scm_char_set_graphic
2816 All characters which would put ink on the paper.
2817 @end defvr
2818
2819 @defvr {Scheme Variable} char-set:printing
2820 @defvrx {C Variable} scm_char_set_printing
2821 The union of @code{char-set:graphic} and @code{char-set:whitespace}.
2822 @end defvr
2823
2824 @defvr {Scheme Variable} char-set:whitespace
2825 @defvrx {C Variable} scm_char_set_whitespace
2826 All whitespace characters.
2827 @end defvr
2828
2829 @defvr {Scheme Variable} char-set:blank
2830 @defvrx {C Variable} scm_char_set_blank
2831 All horizontal whitespace characters, which notably includes
2832 @code{#\space} and @code{#\tab}.
2833 @end defvr
2834
2835 @defvr {Scheme Variable} char-set:iso-control
2836 @defvrx {C Variable} scm_char_set_iso_control
2837 The ISO control characters are the C0 control characters (U+0000 to
2838 U+001F), delete (U+007F), and the C1 control characters (U+0080 to
2839 U+009F).
2840 @end defvr
2841
2842 @defvr {Scheme Variable} char-set:punctuation
2843 @defvrx {C Variable} scm_char_set_punctuation
2844 All punctuation characters, such as the characters
2845 @code{!"#%&'()*,-./:;?@@[\\]_@{@}}
2846 @end defvr
2847
2848 @defvr {Scheme Variable} char-set:symbol
2849 @defvrx {C Variable} scm_char_set_symbol
2850 All symbol characters, such as the characters @code{$+<=>^`|~}.
2851 @end defvr
2852
2853 @defvr {Scheme Variable} char-set:hex-digit
2854 @defvrx {C Variable} scm_char_set_hex_digit
2855 The hexadecimal digits @code{0123456789abcdefABCDEF}.
2856 @end defvr
2857
2858 @defvr {Scheme Variable} char-set:ascii
2859 @defvrx {C Variable} scm_char_set_ascii
2860 All ASCII characters.
2861 @end defvr
2862
2863 @defvr {Scheme Variable} char-set:empty
2864 @defvrx {C Variable} scm_char_set_empty
2865 The empty character set.
2866 @end defvr
2867
2868 @defvr {Scheme Variable} char-set:designated
2869 @defvrx {C Variable} scm_char_set_designated
2870 This character set contains all designated code points. This includes
2871 all the code points to which Unicode has assigned a character or other
2872 meaning.
2873 @end defvr
2874
2875 @defvr {Scheme Variable} char-set:full
2876 @defvrx {C Variable} scm_char_set_full
2877 This character set contains all possible code points. This includes
2878 both designated and reserved code points.
2879 @end defvr
2880
2881 @node Strings
2882 @subsection Strings
2883 @tpindex Strings
2884
2885 Strings are fixed-length sequences of characters. They can be created
2886 by calling constructor procedures, but they can also be entered
2887 literally at the @acronym{REPL} or in Scheme source files.
2888
2889 @c Guile provides a rich set of string processing procedures, because text
2890 @c handling is very important when Guile is used as a scripting language.
2891
2892 Strings always record how many characters they contain,
2893 so there is no special end-of-string character,
2894 like in C. That means that Scheme strings can contain any character,
2895 even the @samp{#\nul} character @samp{\0}.
2896
2897 To use strings efficiently, you need to know a bit about how Guile
2898 implements them. In Guile, a string consists of two parts, a head and
2899 the actual memory where the characters are stored. When a string (or
2900 a substring of it) is copied, only a new head gets created; the memory
2901 is usually not copied. The two heads start out pointing to the same
2902 memory.
2903
2904 When one of these two strings is modified, as with @code{string-set!},
2905 their common memory does get copied so that each string has its own
2906 memory and modifying one does not accidentally modify the other as well.
2907 Thus, Guile's strings are `copy on write'; the actual copying of their
2908 memory is delayed until one string is written to.
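
The observable effect can be sketched as follows.  The sharing itself is
invisible from Scheme; what the example shows is only that a write to
one string leaves the other untouched.  @code{string-copy} is used here
to obtain a mutable string, since string literals may be read-only.

@lisp
(define s (string-copy "hello"))
(define t (substring s 0 3))   ; may share storage with s for now

(string-set! t 0 #\J)          ; writing gives t its own storage
t @result{} "Jel"
s @result{} "hello"
@end lisp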
2909
2910 This implementation makes functions like @code{substring} very
2911 efficient in the common case that no modifications are done to the
2912 involved strings.
2913
2914 If you do know that your strings are getting modified right away, you
2915 can use @code{substring/copy} instead of @code{substring}. This
2916 function performs the copy immediately at the time of creation. This
2917 is more efficient, especially in a multi-threaded program. Also,
2918 @code{substring/copy} can avoid the problem that a short substring
2919 holds on to the memory of a very large original string that could
2920 otherwise be recycled.
2921
2922 If you want to avoid the copy altogether, so that modifications of one
2923 string show up in the other, you can use @code{substring/shared}. The
2924 strings created by this procedure are called @dfn{mutation sharing
2925 substrings} since the substring and the original string share
2926 modifications to each other.
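
By contrast with ordinary substrings, a write through a mutation sharing
substring is visible in the original string, as this small sketch
illustrates:

@lisp
(define s (string-copy "hello"))
(define t (substring/shared s 0 3))

(string-set! t 0 #\J)
t @result{} "Jel"
s @result{} "Jello"
@end lisp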
2927
2928 If you want to prevent modifications, use @code{substring/read-only}.
2929
2930 Guile provides all procedures of SRFI-13 and a few more.
2931
2932 @menu
2933 * String Syntax:: Read syntax for strings.
2934 * String Predicates:: Testing strings for certain properties.
2935 * String Constructors:: Creating new string objects.
2936 * List/String Conversion:: Converting from/to lists of characters.
2937 * String Selection:: Select portions from strings.
2938 * String Modification:: Modify parts or whole strings.
2939 * String Comparison:: Lexicographic ordering predicates.
2940 * String Searching:: Searching in strings.
2941 * Alphabetic Case Mapping:: Convert the alphabetic case of strings.
2942 * Reversing and Appending Strings:: Appending strings to form a new string.
2943 * Mapping Folding and Unfolding:: Iterating over strings.
2944 * Miscellaneous String Operations:: Replicating, insertion, parsing, ...
2945 * Representing Strings as Bytes:: Encoding and decoding strings.
2946 * Conversion to/from C::
2947 * String Internals:: The storage strategy for strings.
2948 @end menu
2949
2950 @node String Syntax
2951 @subsubsection String Read Syntax
2952
2953 @c In the following @code is used to get a good font in TeX etc, but
2954 @c is omitted for Info format, so as not to risk any confusion over
2955 @c whether surrounding ` ' quotes are part of the escape or are
2956 @c special in a string (they're not).
2957
2958 The read syntax for strings is an arbitrarily long sequence of
2959 characters enclosed in double quotes (@nicode{"}).
2960
2961 Backslash is an escape character and can be used to insert the following
2962 special characters. @nicode{\"} and @nicode{\\} are R5RS standard,
2963 @nicode{\|} is R7RS standard, the next seven are R6RS standard ---
2964 notice they follow C syntax --- and the remaining four are Guile
2965 extensions.
2966
2967 @table @asis
2968 @item @nicode{\\}
2969 Backslash character.
2970
2971 @item @nicode{\"}
2972 Double quote character (an unescaped @nicode{"} is otherwise the end
2973 of the string).
2974
2975 @item @nicode{\|}
2976 Vertical bar character.
2977
2978 @item @nicode{\a}
2979 Bell character (ASCII 7).
2980
2981 @item @nicode{\f}
2982 Formfeed character (ASCII 12).
2983
2984 @item @nicode{\n}
2985 Newline character (ASCII 10).
2986
2987 @item @nicode{\r}
2988 Carriage return character (ASCII 13).
2989
2990 @item @nicode{\t}
2991 Tab character (ASCII 9).
2992
2993 @item @nicode{\v}
2994 Vertical tab character (ASCII 11).
2995
2996 @item @nicode{\b}
2997 Backspace character (ASCII 8).
2998
2999 @item @nicode{\0}
3000 NUL character (ASCII 0).
3001
3002 @item @nicode{\} followed by newline (ASCII 10)
3003 Nothing. This way if @nicode{\} is the last character in a line, the
3004 string will continue with the first character from the next line,
3005 without a line break.
3006
3007 If the @code{hungry-eol-escapes} reader option is enabled, which is not
3008 the case by default, leading whitespace on the next line is discarded.
3009
3010 @lisp
3011 "foo\
3012   bar"
3013 @result{} "foo  bar"
3014 (read-enable 'hungry-eol-escapes)
3015 "foo\
3016   bar"
3017 @result{} "foobar"
3018 @end lisp
3019 @item @nicode{\xHH}
3020 Character code given by two hexadecimal digits. For example
3021 @nicode{\x7f} for an ASCII DEL (127).
3022
3023 @item @nicode{\uHHHH}
3024 Character code given by four hexadecimal digits. For example
3025 @nicode{\u0100} for a capital A with macron (U+0100).
3026
3027 @item @nicode{\UHHHHHH}
3028 Character code given by six hexadecimal digits. For example
3029 @nicode{\U010402}.
3030 @end table
3031
3032 @noindent
3033 The following are examples of string literals:
3034
3035 @lisp
3036 "foo"
3037 "bar plonk"
3038 "Hello World"
3039 "\"Hi\", he said."
3040 @end lisp
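
The effect of the escapes can be checked with ordinary string
procedures.  This sketch assumes the default reader options, that is,
with @code{r6rs-hex-escapes} not enabled.

@lisp
(string-length "a\tb")                   @result{} 3
(string=? "\x41\x42" "AB")               @result{} #t
(char->integer (string-ref "\u0100" 0))  @result{} 256
@end lisp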
3041
3042 The three escape sequences @code{\xHH}, @code{\uHHHH} and @code{\UHHHHHH} were
3043 chosen to not break compatibility with code written for previous versions of
3044 Guile. The R6RS specification suggests a different, incompatible syntax for hex
3045 escapes: @code{\xHHHH;}, that is, @code{\x} followed by one to eight hexadecimal
3046 digits and terminated with a semicolon. If this escape format is desired instead,
3047 it can be enabled with the reader option @code{r6rs-hex-escapes}.
3048
3049 @lisp
3050 (read-enable 'r6rs-hex-escapes)
3051 @end lisp
3052
3053 For more on reader options, see @ref{Scheme Read}.
3054
3055 @node String Predicates
3056 @subsubsection String Predicates
3057
3058 The following procedures can be used to check whether a given string
3059 fulfills some specified property.
3060
3061 @rnindex string?
3062 @deffn {Scheme Procedure} string? obj
3063 @deffnx {C Function} scm_string_p (obj)
3064 Return @code{#t} if @var{obj} is a string, else @code{#f}.
3065 @end deffn
3066
3067 @deftypefn {C Function} int scm_is_string (SCM obj)
3068 Returns @code{1} if @var{obj} is a string, @code{0} otherwise.
3069 @end deftypefn
3070
3071 @deffn {Scheme Procedure} string-null? str
3072 @deffnx {C Function} scm_string_null_p (str)
3073 Return @code{#t} if @var{str}'s length is zero, and
3074 @code{#f} otherwise.
3075 @lisp
3076 (string-null? "") @result{} #t
3077 y @result{} "foo"
3078 (string-null? y) @result{} #f
3079 @end lisp
3080 @end deffn
3081
3082 @deffn {Scheme Procedure} string-any char_pred s [start [end]]
3083 @deffnx {C Function} scm_string_any (char_pred, s, start, end)
3084 Check if @var{char_pred} is true for any character in string @var{s}.
3085
3086 @var{char_pred} can be a character to check for any equal to that, or
3087 a character set (@pxref{Character Sets}) to check for any in that set,
3088 or a predicate procedure to call.
3089
3090 For a procedure, calls @code{(@var{char_pred} c)} are made
3091 successively on the characters from @var{start} to @var{end}. If
3092 @var{char_pred} returns true (i.e.@: non-@code{#f}), @code{string-any}
3093 stops and that return value is the return from @code{string-any}. The
3094 call on the last character (i.e.@: at @math{@var{end}-1}), if that
3095 point is reached, is a tail call.
3096
3097 If there are no characters in @var{s} (i.e.@: @var{start} equals
3098 @var{end}) then the return is @code{#f}.
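
The three kinds of @var{char_pred}, and the way a procedure's true value
is passed through, can be sketched like this:

@lisp
(string-any char-numeric? "abc1")   @result{} #t
(string-any #\x "abc")              @result{} #f
(string-any char-set:digit "")      @result{} #f

;; The predicate's own true value becomes the result.
(string-any (lambda (c) (and (char-numeric? c) c)) "ab7")
@result{} #\7
@end lisp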
3099 @end deffn
3100
3101 @deffn {Scheme Procedure} string-every char_pred s [start [end]]
3102 @deffnx {C Function} scm_string_every (char_pred, s, start, end)
3103 Check if @var{char_pred} is true for every character in string
3104 @var{s}.
3105
3106 @var{char_pred} can be a character to check for every character equal
3107 to that, or a character set (@pxref{Character Sets}) to check for
3108 every character being in that set, or a predicate procedure to call.
3109
3110 For a procedure, calls @code{(@var{char_pred} c)} are made
3111 successively on the characters from @var{start} to @var{end}. If
3112 @var{char_pred} returns @code{#f}, @code{string-every} stops and
3113 returns @code{#f}. The call on the last character (i.e.@: at
3114 @math{@var{end}-1}), if that point is reached, is a tail call and the
3115 return from that call is the return from @code{string-every}.
3116
3117 If there are no characters in @var{s} (i.e.@: @var{start} equals
3118 @var{end}) then the return is @code{#t}.
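
A few illustrative calls, including the empty-string case:

@lisp
(string-every char-numeric? "2014")  @result{} #t
(string-every char-numeric? "20x4")  @result{} #f
(string-every char-numeric? "")      @result{} #t
@end lisp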
3119 @end deffn
3120
3121 @node String Constructors
3122 @subsubsection String Constructors
3123
3124 The string constructor procedures create new string objects, possibly
3125 initializing them with some specified character data. See also
3126 @ref{String Selection}, for ways to create strings from existing
3127 strings.
3128
3129 @c FIXME::martin: list->string belongs into `List/String Conversion'
3130
3131 @deffn {Scheme Procedure} string char@dots{}
3132 @rnindex string
3133 Return a newly allocated string made from the given character
3134 arguments.
3135
3136 @example
3137 (string #\x #\y #\z) @result{} "xyz"
3138 (string) @result{} ""
3139 @end example
3140 @end deffn
3141
3142 @deffn {Scheme Procedure} list->string lst
3143 @deffnx {C Function} scm_string (lst)
3144 @rnindex list->string
3145 Return a newly allocated string made from a list of characters.
3146
3147 @example
3148 (list->string '(#\a #\b #\c)) @result{} "abc"
3149 @end example
3150 @end deffn
3151
3152 @deffn {Scheme Procedure} reverse-list->string lst
3153 @deffnx {C Function} scm_reverse_list_to_string (lst)
3154 Return a newly allocated string made from a list of characters, in
3155 reverse order.
3156
3157 @example
3158 (reverse-list->string '(#\a #\B #\c)) @result{} "cBa"
3159 @end example
3160 @end deffn
3161
3162 @rnindex make-string
3163 @deffn {Scheme Procedure} make-string k [chr]
3164 @deffnx {C Function} scm_make_string (k, chr)
3165 Return a newly allocated string of
3166 length @var{k}. If @var{chr} is given, then all elements of
3167 the string are initialized to @var{chr}, otherwise the contents
3168 of the string are unspecified.
3169 @end deffn
3170
3171 @deftypefn {C Function} SCM scm_c_make_string (size_t len, SCM chr)
3172 Like @code{scm_make_string}, but expects the length as a
3173 @code{size_t}.
3174 @end deftypefn
3175
3176 @deffn {Scheme Procedure} string-tabulate proc len
3177 @deffnx {C Function} scm_string_tabulate (proc, len)
3178 @var{proc} is an integer->char procedure. Construct a string
3179 of size @var{len} by applying @var{proc} to each index to
3180 produce the corresponding string element. The order in which
3181 @var{proc} is applied to the indices is not specified.
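
As a sketch, assuming indices run from 0 to @var{len}-1, the following
builds the first five lower-case letters from their code points:

@lisp
;; 97 is the code point of #\a.
(string-tabulate (lambda (i) (integer->char (+ i 97))) 5)
@result{} "abcde"
@end lisp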
3182 @end deffn
3183
3184 @deffn {Scheme Procedure} string-join ls [delimiter [grammar]]
3185 @deffnx {C Function} scm_string_join (ls, delimiter, grammar)
3186 Append the strings in the string list @var{ls}, using the string
3187 @var{delimiter} as a delimiter between the elements of @var{ls}.
3188 @var{grammar} is a symbol which specifies how the delimiter is
3189 placed between the strings, and defaults to the symbol
3190 @code{infix}.
3191
3192 @table @code
3193 @item infix
3194 Insert the separator between list elements. An empty list
3195 will produce an empty string.
3196 @item strict-infix
3197 Like @code{infix}, but will raise an error if given the empty
3198 list.
3199 @item suffix
3200 Insert the separator after every list element.
3201 @item prefix
3202 Insert the separator before each list element.
3203 @end table
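
The grammars can be compared side by side; the path components below are
arbitrary sample data.

@lisp
(string-join '("usr" "local" "bin") "/")
@result{} "usr/local/bin"
(string-join '("usr" "local" "bin") "/" 'prefix)
@result{} "/usr/local/bin"
(string-join '("usr" "local" "bin") "/" 'suffix)
@result{} "usr/local/bin/"
(string-join '() "/")
@result{} ""
@end lisp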
3204 @end deffn
3205
3206 @node List/String Conversion
3207 @subsubsection List/String conversion
3208
3209 When processing strings, it is often convenient to first convert them
3210 into a list representation by using the procedure @code{string->list},
3211 work with the resulting list, and then convert it back into a string.
3212 These procedures are useful for similar tasks.
3213
3214 @rnindex string->list
3215 @deffn {Scheme Procedure} string->list str [start [end]]
3216 @deffnx {C Function} scm_substring_to_list (str, start, end)
3217 @deffnx {C Function} scm_string_to_list (str)
3218 Convert the string @var{str} into a list of characters.
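
For example, assuming @var{start} and @var{end} delimit a half-open
range as they do for @code{substring}:

@lisp
(string->list "abc")        @result{} (#\a #\b #\c)
(string->list "hello" 1 3)  @result{} (#\e #\l)

;; Round trip through a list to reverse a string.
(list->string (reverse (string->list "abc")))  @result{} "cba"
@end lisp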
3219 @end deffn
3220
3221 @deffn {Scheme Procedure} string-split str char_pred
3222 @deffnx {C Function} scm_string_split (str, char_pred)
3223 Split the string @var{str} into a list of substrings delimited
3224 by appearances of characters that
3225
3226 @itemize @bullet
3227 @item
3228 equal @var{char_pred}, if it is a character,
3229
3230 @item
3231 satisfy the predicate @var{char_pred}, if it is a procedure,
3232
3233 @item
3234 are in the set @var{char_pred}, if it is a character set.
3235 @end itemize
3236
3237 Note that an empty substring between separator characters will result in
3238 an empty string in the result list.
3239
3240 @lisp
3241 (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
3242 @result{}
3243 ("root" "x" "0" "0" "root" "/root" "/bin/bash")
3244
3245 (string-split "::" #\:)
3246 @result{}
3247 ("" "" "")
3248
3249 (string-split "" #\:)
3250 @result{}
3251 ("")
3252 @end lisp
3253 @end deffn
3254
3255
3256 @node String Selection
3257 @subsubsection String Selection
3258
3259 Portions of strings can be extracted by these procedures.
3260 @code{string-ref} delivers individual characters whereas
3261 @code{substring} can be used to extract substrings from longer strings.
3262
3263 @rnindex string-length
3264 @deffn {Scheme Procedure} string-length string
3265 @deffnx {C Function} scm_string_length (string)
3266 Return the number of characters in @var{string}.
3267 @end deffn
3268
3269 @deftypefn {C Function} size_t scm_c_string_length (SCM str)
3270 Return the number of characters in @var{str} as a @code{size_t}.
3271 @end deftypefn
3272
3273 @rnindex string-ref
3274 @deffn {Scheme Procedure} string-ref str k
3275 @deffnx {C Function} scm_string_ref (str, k)
3276 Return character @var{k} of @var{str} using zero-origin
3277 indexing. @var{k} must be a valid index of @var{str}.
3278 @end deffn
3279
3280 @deftypefn {C Function} SCM scm_c_string_ref (SCM str, size_t k)
3281 Return character @var{k} of @var{str} using zero-origin
3282 indexing. @var{k} must be a valid index of @var{str}.
3283 @end deftypefn
3284
3285 @rnindex string-copy
3286 @deffn {Scheme Procedure} string-copy str [start [end]]
3287 @deffnx {C Function} scm_substring_copy (str, start, end)
3288 @deffnx {C Function} scm_string_copy (str)
3289 Return a copy of the given string @var{str}.
3290
3291 The returned string shares storage with @var{str} initially, but it is
3292 copied as soon as one of the two strings is modified.
3293 @end deffn
3294
3295 @rnindex substring
3296 @deffn {Scheme Procedure} substring str start [end]
3297 @deffnx {C Function} scm_substring (str, start, end)
3298 Return a new string formed from the characters
3299 of @var{str} beginning with index @var{start} (inclusive) and
3300 ending with index @var{end} (exclusive).
3301 @var{str} must be a string, @var{start} and @var{end} must be
3302 exact integers satisfying:
3303
3304 0 <= @var{start} <= @var{end} <= @code{(string-length @var{str})}.
3305
3306 The returned string shares storage with @var{str} initially, but it is
3307 copied as soon as one of the two strings is modified.
3308 @end deffn
3309
3310 @deffn {Scheme Procedure} substring/shared str start [end]
3311 @deffnx {C Function} scm_substring_shared (str, start, end)
3312 Like @code{substring}, but the strings continue to share their storage
3313 even if they are modified. Thus, modifications to @var{str} show up
3314 in the new string, and vice versa.
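
The difference from @code{substring} shows up once the original string
is modified. The following is a sketch only: it assumes a mutable
string obtained from @code{string-copy}, and the names @code{s},
@code{copy} and @code{view} are purely illustrative.

@lisp
(define s (string-copy "hello world"))
(define copy (substring s 0 5))         ; copied when s is modified
(define view (substring/shared s 0 5))  ; keeps sharing with s
(string-set! s 0 #\J)
s    @result{} "Jello world"
copy @result{} "hello"
view @result{} "Jello"
@end lisp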
3315 @end deffn
3316
3317 @deffn {Scheme Procedure} substring/copy str start [end]
3318 @deffnx {C Function} scm_substring_copy (str, start, end)
3319 Like @code{substring}, but the storage for the new string is copied
3320 immediately.
3321 @end deffn
3322
3323 @deffn {Scheme Procedure} substring/read-only str start [end]
3324 @deffnx {C Function} scm_substring_read_only (str, start, end)
3325 Like @code{substring}, but the resulting string can not be modified.
3326 @end deffn
3327
3328 @deftypefn {C Function} SCM scm_c_substring (SCM str, size_t start, size_t end)
3329 @deftypefnx {C Function} SCM scm_c_substring_shared (SCM str, size_t start, size_t end)
3330 @deftypefnx {C Function} SCM scm_c_substring_copy (SCM str, size_t start, size_t end)
3331 @deftypefnx {C Function} SCM scm_c_substring_read_only (SCM str, size_t start, size_t end)
3332 Like @code{scm_substring}, etc., but the bounds are given as @code{size_t} values.
3333 @end deftypefn
3334
3335 @deffn {Scheme Procedure} string-take s n
3336 @deffnx {C Function} scm_string_take (s, n)
3337 Return the first @var{n} characters of @var{s}.
3338 @end deffn
3339
3340 @deffn {Scheme Procedure} string-drop s n
3341 @deffnx {C Function} scm_string_drop (s, n)
3342 Return all but the first @var{n} characters of @var{s}.
3343 @end deffn
3344
3345 @deffn {Scheme Procedure} string-take-right s n
3346 @deffnx {C Function} scm_string_take_right (s, n)
3347 Return the last @var{n} characters of @var{s}.
3348 @end deffn
3349
3350 @deffn {Scheme Procedure} string-drop-right s n
3351 @deffnx {C Function} scm_string_drop_right (s, n)
3352 Return all but the last @var{n} characters of @var{s}.
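
For comparison, here is how @code{string-take}, @code{string-drop},
@code{string-take-right} and @code{string-drop-right} behave on the
same (arbitrarily chosen) input string:

@lisp
(string-take "hello world" 5)        @result{} "hello"
(string-drop "hello world" 6)        @result{} "world"
(string-take-right "hello world" 5)  @result{} "world"
(string-drop-right "hello world" 6)  @result{} "hello"
@end lisp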
3353 @end deffn
3354
3355 @deffn {Scheme Procedure} string-pad s len [chr [start [end]]]
3356 @deffnx {Scheme Procedure} string-pad-right s len [chr [start [end]]]
3357 @deffnx {C Function} scm_string_pad (s, len, chr, start, end)
3358 @deffnx {C Function} scm_string_pad_right (s, len, chr, start, end)
3359 Take characters @var{start} to @var{end} from the string @var{s} and
3360 either pad with @var{chr} or truncate them to give @var{len}
3361 characters.
3362
3363 @code{string-pad} pads or truncates on the left, so for example
3364
3365 @example
3366 (string-pad "x" 3) @result{} " x"
3367 (string-pad "abcde" 3) @result{} "cde"
3368 @end example
3369
3370 @code{string-pad-right} pads or truncates on the right, so for example
3371
3372 @example
3373 (string-pad-right "x" 3) @result{} "x "
3374 (string-pad-right "abcde" 3) @result{} "abc"
3375 @end example
3376 @end deffn
3377
3378 @deffn {Scheme Procedure} string-trim s [char_pred [start [end]]]
3379 @deffnx {Scheme Procedure} string-trim-right s [char_pred [start [end]]]
3380 @deffnx {Scheme Procedure} string-trim-both s [char_pred [start [end]]]
3381 @deffnx {C Function} scm_string_trim (s, char_pred, start, end)
3382 @deffnx {C Function} scm_string_trim_right (s, char_pred, start, end)
3383 @deffnx {C Function} scm_string_trim_both (s, char_pred, start, end)
3384 Trim occurrences of @var{char_pred} from the ends of @var{s}.
3385
3386 @code{string-trim} trims @var{char_pred} characters from the left
3387 (start) of the string, @code{string-trim-right} trims them from the
3388 right (end) of the string, @code{string-trim-both} trims from both
3389 ends.
3390
3391 @var{char_pred} can be a character, a character set, or a predicate
3392 procedure to call on each character. If @var{char_pred} is not given
3393 the default is whitespace as per @code{char-set:whitespace}
3394 (@pxref{Standard Character Sets}).
3395
3396 @example
3397 (string-trim " x ") @result{} "x "
3398 (string-trim-right "banana" #\a) @result{} "banan"
3399 (string-trim-both ".,xy:;" char-set:punctuation)
3400 @result{} "xy"
3401 (string-trim-both "xyzzy" (lambda (c)
3402 (or (eqv? c #\x)
3403 (eqv? c #\y))))
3404 @result{} "zz"
3405 @end example
3406 @end deffn
3407
3408 @node String Modification
3409 @subsubsection String Modification
3410
3411 These procedures are for modifying strings in-place. This means that the
3412 result of the operation is not a new string; instead, the original string's
3413 memory representation is modified.
3414
3415 @rnindex string-set!
3416 @deffn {Scheme Procedure} string-set! str k chr
3417 @deffnx {C Function} scm_string_set_x (str, k, chr)
3418 Store @var{chr} in element @var{k} of @var{str} and return
3419 an unspecified value. @var{k} must be a valid index of
3420 @var{str}.
3421 @end deffn
3422
3423 @deftypefn {C Function} void scm_c_string_set_x (SCM str, size_t k, SCM chr)
3424 Like @code{scm_string_set_x}, but the index is given as a @code{size_t}.
3425 @end deftypefn
3426
3427 @rnindex string-fill!
3428 @deffn {Scheme Procedure} string-fill! str chr [start [end]]
3429 @deffnx {C Function} scm_substring_fill_x (str, chr, start, end)
3430 @deffnx {C Function} scm_string_fill_x (str, chr)
3431 Stores @var{chr} in every element of the given @var{str} and
3432 returns an unspecified value.
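
A brief sketch of in-place modification with @code{string-set!} and
@code{string-fill!}. It assumes a freshly allocated string from
@code{make-string}; the name @code{s} is illustrative only.

@lisp
(define s (make-string 3 #\a))
(string-set! s 1 #\b)
s @result{} "aba"
(string-fill! s #\z)
s @result{} "zzz"
@end lisp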
3433 @end deffn
3434
3435 @deffn {Scheme Procedure} substring-fill! str start end fill
3436 @deffnx {C Function} scm_substring_fill_x (str, start, end, fill)
3437 Change every character in @var{str} between @var{start} and
3438 @var{end} to @var{fill}.
3439
3440 @lisp
3441 (define y (string-copy "abcdefg"))
3442 (substring-fill! y 1 3 #\r)
3443 y
3444 @result{} "arrdefg"
3445 @end lisp
3446 @end deffn
3447
3448 @deffn {Scheme Procedure} substring-move! str1 start1 end1 str2 start2
3449 @deffnx {C Function} scm_substring_move_x (str1, start1, end1, str2, start2)
3450 Copy the substring of @var{str1} bounded by @var{start1} and @var{end1}
3451 into @var{str2} beginning at position @var{start2}.
3452 @var{str1} and @var{str2} can be the same string.
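
For instance, assuming a destination string @code{dst} (an illustrative
name) created with @code{make-string}:

@lisp
(define dst (make-string 6 #\.))
;; Copy characters 1, 2 and 3 of the source to positions 2, 3 and 4.
(substring-move! "abcdef" 1 4 dst 2)
dst @result{} "..bcd."
@end lisp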
3453 @end deffn
3454
3455 @deffn {Scheme Procedure} string-copy! target tstart s [start [end]]
3456 @deffnx {C Function} scm_string_copy_x (target, tstart, s, start, end)
3457 Copy the sequence of characters from index range [@var{start},
3458 @var{end}) in string @var{s} to string @var{target}, beginning
3459 at index @var{tstart}. The characters are copied left-to-right
3460 or right-to-left as needed -- the copy is guaranteed to work,
3461 even if @var{target} and @var{s} are the same string. It is an
3462 error if the copy operation runs off the end of the target
3463 string.
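
The sketch below shows a plain copy and an overlapping copy within a
single string. The names @code{target} and @code{s} are illustrative,
and both strings are assumed to be mutable.

@lisp
(define target (make-string 5 #\-))
(string-copy! target 1 "abc")
target @result{} "-abc-"

;; Source and target overlap; the result is still well defined.
(define s (string-copy "abcdef"))
(string-copy! s 2 s 0 4)
s @result{} "ababcd"
@end lisp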
3464 @end deffn
3465
3466
3467 @node String Comparison
3468 @subsubsection String Comparison
3469
3470 The procedures in this section are similar to the character ordering
3471 predicates (@pxref{Characters}), but are defined on character sequences.
3472
3473 The first set is specified in R5RS and has names that end in @code{?}.
3474 The second set is specified in SRFI-13 and has names without a trailing
3475 @code{?}.
3476
3477 The predicates ending in @code{-ci} ignore the character case
3478 when comparing strings. For now, case-insensitive comparison is done
3479 using the R5RS rules, where every lower-case character that has a
3480 single character upper-case form is converted to uppercase before
3481 comparison. @xref{Text Collation, the @code{(ice-9
3482 i18n)} module}, for locale-dependent string comparison.
3483
3484 @rnindex string=?
3485 @deffn {Scheme Procedure} string=? s1 s2 s3 @dots{}
3486 Lexicographic equality predicate; return @code{#t} if all strings are
3487 the same length and contain the same characters in the same positions,
3488 otherwise return @code{#f}.
3489
3490 The procedure @code{string-ci=?} treats upper and lower case
3491 letters as though they were the same character, but
3492 @code{string=?} treats upper and lower case as distinct
3493 characters.
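
A few illustrative calls, including one with more than two arguments:

@lisp
(string=? "abc" "abc" "abc")  @result{} #t
(string=? "abc" "ABC")        @result{} #f
(string-ci=? "abc" "ABC")     @result{} #t
@end lisp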
3494 @end deffn
3495
3496 @rnindex string<?
3497 @deffn {Scheme Procedure} string<? s1 s2 s3 @dots{}
3498 Lexicographic ordering predicate; return @code{#t} if, for every pair of
3499 consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is
3500 lexicographically less than @var{str_i+1}.
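
For example (the strings are arbitrary; note that a proper prefix
sorts before the longer string):

@lisp
(string<? "abc" "abd")    @result{} #t
(string<? "abc" "abcd")   @result{} #t
(string<? "a" "b" "c")    @result{} #t
(string<? "a" "c" "b")    @result{} #f
@end lisp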
3501 @end deffn
3502
3503 @rnindex string<=?
3504 @deffn {Scheme Procedure} string<=? s1 s2 s3 @dots{}
3505 Lexicographic ordering predicate; return @code{#t} if, for every pair of
3506 consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is
3507 lexicographically less than or equal to @var{str_i+1}.
3508 @end deffn
3509
3510 @rnindex string>?
3511 @deffn {Scheme Procedure} string>? s1 s2 s3 @dots{}
3512 Lexicographic ordering predicate; return @code{#t} if, for every pair of
3513 consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is
3514 lexicographically greater than @var{str_i+1}.
3515 @end deffn
3516
3517 @rnindex string>=?
3518 @deffn {Scheme Procedure} string>=? s1 s2 s3 @dots{}
3519 Lexicographic ordering predicate; return @code{#t} if, for every pair of
3520 consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is
3521 lexicographically greater than or equal to @var{str_i+1}.
3522 @end deffn
3523
3524 @rnindex string-ci=?
3525 @deffn {Scheme Procedure} string-ci=? s1 s2 s3 @dots{}
3526 Case-insensitive string equality predicate; return @code{#t} if
3527 all strings are the same length and their component
3528 characters match (ignoring case) at each position; otherwise
3529 return @code{#f}.
3530 @end deffn
3531
3532 @rnindex string-ci<?
3533 @deffn {Scheme Procedure} string-ci<? s1 s2 s3 @dots{}
3534 Case-insensitive lexicographic ordering predicate; return @code{#t} if,
3535 for every pair of consecutive string arguments @var{str_i} and
3536 @var{str_i+1}, @var{str_i} is lexicographically less than @var{str_i+1}
3537 regardless of case.
3538 @end deffn
3539
3540 @rnindex string-ci<=?
3541 @deffn {Scheme Procedure} string-ci<=? s1 s2 s3 @dots{}
3542 Case-insensitive lexicographic ordering predicate; return @code{#t} if,
3543 for every pair of consecutive string arguments @var{str_i} and
3544 @var{str_i+1}, @var{str_i} is lexicographically less than or equal to
3545 @var{str_i+1} regardless of case.
3546 @end deffn
3547
3548 @rnindex string-ci>?
3549 @deffn {Scheme Procedure} string-ci>? s1 s2 s3 @dots{}
3550 Case-insensitive lexicographic ordering predicate; return @code{#t} if,
3551 for every pair of consecutive string arguments @var{str_i} and
3552 @var{str_i+1}, @var{str_i} is lexicographically greater than
3553 @var{str_i+1} regardless of case.
3554 @end deffn
3555
3556 @rnindex string-ci>=?
3557 @deffn {Scheme Procedure} string-ci>=? s1 s2 s3 @dots{}
3558 Case-insensitive lexicographic ordering predicate; return @code{#t} if,
3559 for every pair of consecutive string arguments @var{str_i} and
3560 @var{str_i+1}, @var{str_i} is lexicographically greater than or equal to
3561 @var{str_i+1} regardless of case.
3562 @end deffn
3563
3564 @deffn {Scheme Procedure} string-compare s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
3565 @deffnx {C Function} scm_string_compare (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2)
3566 Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the
3567 mismatch index, depending upon whether @var{s1} is less than,
3568 equal to, or greater than @var{s2}. The mismatch index is the
3569 largest index @var{i} such that for every 0 <= @var{j} <
3570 @var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is,
3571 @var{i} is the first position that does not match.
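
To see which procedure is selected and what index it receives, the
following sketch uses a hypothetical helper @code{tag} (not part of
Guile) that builds a procedure returning its label together with the
mismatch index:

@lisp
(define (tag name)
  (lambda (i) (list name i)))

(string-compare "abcdef" "abcxyz"
                (tag 'less) (tag 'equal) (tag 'greater))
@result{} (less 3)

(string-compare "abd" "abc"
                (tag 'less) (tag 'equal) (tag 'greater))
@result{} (greater 2)
@end lisp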
3572 @end deffn
3573
3574 @deffn {Scheme Procedure} string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
3575 @deffnx {C Function} scm_string_compare_ci (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2)
3576 Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the
3577 mismatch index, depending upon whether @var{s1} is less than,
3578 equal to, or greater than @var{s2}. The mismatch index is the
3579 largest index @var{i} such that for every 0 <= @var{j} <
3580 @var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is,
3581 @var{i} is the first position where the lowercased letters
3582 do not match.
3583
3584 @end deffn
3585
3586 @deffn {Scheme Procedure} string= s1 s2 [start1 [end1 [start2 [end2]]]]
3587 @deffnx {C Function} scm_string_eq (s1, s2, start1, end1, start2, end2)
3588 Return @code{#f} if @var{s1} and @var{s2} are not equal, a true
3589 value otherwise.
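
Because these SRFI-13 predicates are only documented to return ``a true
value'', it is safest to use the result as a condition rather than
compare it with @code{#t}. A small sketch, in which the optional
indices select the substrings to compare and the result symbols are
arbitrary:

@lisp
;; Compare "foo" (from "foobar") with "foo" (from "barfoo").
(if (string= "foobar" "barfoo" 0 3 3 6) 'same 'different)
@result{} same

(if (string< "apple" "apricot") 'before 'not-before)
@result{} before
@end lisp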
3590 @end deffn
3591
3592 @deffn {Scheme Procedure} string<> s1 s2 [start1 [end1 [start2 [end2]]]]
3593 @deffnx {C Function} scm_string_neq (s1, s2, start1, end1, start2, end2)
3594 Return @code{#f} if @var{s1} and @var{s2} are equal, a true
3595 value otherwise.
3596 @end deffn
3597
3598 @deffn {Scheme Procedure} string< s1 s2 [start1 [end1 [start2 [end2]]]]
3599 @deffnx {C Function} scm_string_lt (s1, s2, start1, end1, start2, end2)
3600 Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a
3601 true value otherwise.
3602 @end deffn
3603
3604 @deffn {Scheme Procedure} string> s1 s2 [start1 [end1 [start2 [end2]]]]
3605 @deffnx {C Function} scm_string_gt (s1, s2, start1, end1, start2, end2)
3606 Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a
3607 true value otherwise.
3608 @end deffn
3609
3610 @deffn {Scheme Procedure} string<= s1 s2 [start1 [end1 [start2 [end2]]]]
3611 @deffnx {C Function} scm_string_le (s1, s2, start1, end1, start2, end2)
3612 Return @code{#f} if @var{s1} is greater than @var{s2}, a true
3613 value otherwise.
3614 @end deffn
3615
3616 @deffn {Scheme Procedure} string>= s1 s2 [start1 [end1 [start2 [end2]]]]
3617 @deffnx {C Function} scm_string_ge (s1, s2, start1, end1, start2, end2)
3618 Return @code{#f} if @var{s1} is less than @var{s2}, a true value
3619 otherwise.
3620 @end deffn
3621
3622 @deffn {Scheme Procedure} string-ci= s1 s2 [start1 [end1 [start2 [end2]]]]
3623 @deffnx {C Function} scm_string_ci_eq (s1, s2, start1, end1, start2, end2)
3624 Return @code{#f} if @var{s1} and @var{s2} are not equal, a true
3625 value otherwise. The character comparison is done
3626 case-insensitively.
3627 @end deffn
3628
3629 @deffn {Scheme Procedure} string-ci<> s1 s2 [start1 [end1 [start2 [end2]]]]
3630 @deffnx {C Function} scm_string_ci_neq (s1, s2, start1, end1, start2, end2)
3631 Return @code{#f} if @var{s1} and @var{s2} are equal, a true
3632 value otherwise. The character comparison is done
3633 case-insensitively.
3634 @end deffn
3635
3636 @deffn {Scheme Procedure} string-ci< s1 s2 [start1 [end1 [start2 [end2]]]]
3637 @deffnx {C Function} scm_string_ci_lt (s1, s2, start1, end1, start2, end2)
3638 Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a
3639 true value otherwise. The character comparison is done
3640 case-insensitively.
3641 @end deffn
3642
3643 @deffn {Scheme Procedure} string-ci> s1 s2 [start1 [end1 [start2 [end2]]]]
3644 @deffnx {C Function} scm_string_ci_gt (s1, s2, start1, end1, start2, end2)
3645 Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a
3646 true value otherwise. The character comparison is done
3647 case-insensitively.
3648 @end deffn
3649
3650 @deffn {Scheme Procedure} string-ci<= s1 s2 [start1 [end1 [start2 [end2]]]]
3651 @deffnx {C Function} scm_string_ci_le (s1, s2, start1, end1, start2, end2)
3652 Return @code{#f} if @var{s1} is greater than @var{s2}, a true
3653 value otherwise. The character comparison is done
3654 case-insensitively.
3655 @end deffn
3656
3657 @deffn {Scheme Procedure} string-ci>= s1 s2 [start1 [end1 [start2 [end2]]]]
3658 @deffnx {C Function} scm_string_ci_ge (s1, s2, start1, end1, start2, end2)
3659 Return @code{#f} if @var{s1} is less than @var{s2}, a true value
3660 otherwise. The character comparison is done
3661 case-insensitively.
3662 @end deffn
3663
3664 @deffn {Scheme Procedure} string-hash s [bound [start [end]]]
3665 @deffnx {C Function} scm_substring_hash (s, bound, start, end)
3666 Compute a hash value for @var{s}. The optional argument @var{bound} is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,bound).
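
The actual hash values are implementation details, so the sketch below
only checks properties that should hold regardless of the hash
function: the result lies within the requested range, equal strings
hash alike, and the case-insensitive variant @code{string-hash-ci} is
assumed to ignore case differences.

@lisp
(let ((h (string-hash "hello" 100)))
  (and (>= h 0) (< h 100)))
@result{} #t

(= (string-hash "hello") (string-hash (string-copy "hello")))
@result{} #t

(= (string-hash-ci "Hello") (string-hash-ci "hELLO"))
@result{} #t
@end lisp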
3667 @end deffn
3668
3669 @deffn {Scheme Procedure} string-hash-ci s [bound [start [end]]]
3670 @deffnx {C Function} scm_substring_hash_ci (s, bound, start, end)
3671 Compute a hash value for @var{s}, ignoring character case. The optional argument @var{bound} is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0, @var{bound}).
3672 @end deffn
3673
3674 Because the same visual appearance of an abstract Unicode character can
3675 be obtained via multiple sequences of Unicode characters, even the
3676 case-insensitive string comparison functions described above may return
3677 @code{#f} when presented with strings containing different
3678 representations of the same character. For example, the Unicode
3679 character ``LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE'' can be
3680 represented with a single character (U+1E69) or by the character ``LATIN
3681 SMALL LETTER S'' (U+0073) followed by the combining marks ``COMBINING
3682 DOT BELOW'' (U+0323) and ``COMBINING DOT ABOVE'' (U+0307).
3683
3684 For this reason, it is often desirable to ensure that the strings
3685 to be compared are using a mutually consistent representation for every
3686 character. The Unicode standard defines two methods of normalizing the
3687 contents of strings: Decomposition, which breaks composite characters
3688 into a set of constituent characters with an ordering defined by the
3689 Unicode Standard; and composition, which performs the converse.
3690
3691 There are two decomposition operations. ``Canonical decomposition''
3692 produces character sequences that share the same visual appearance as
3693 the original characters, while ``compatibility decomposition'' produces
3694 ones whose visual appearances may differ from the originals but which
3695 represent the same abstract character.
3696
3697 These operations are encapsulated in the following set of normalization
3698 forms:
3699
3700 @table @dfn
3701 @item NFD
3702 Characters are decomposed to their canonical forms.
3703
3704 @item NFKD
3705 Characters are decomposed to their compatibility forms.
3706
3707 @item NFC
3708 Characters are decomposed to their canonical forms, then composed.
3709
3710 @item NFKC
3711 Characters are decomposed to their compatibility forms, then composed.
3712
3713 @end table
3714
3715 The functions below put their arguments into one of the forms described
3716 above.
3717
3718 @deffn {Scheme Procedure} string-normalize-nfd s
3719 @deffnx {C Function} scm_string_normalize_nfd (s)
3720 Return the @code{NFD} normalized form of @var{s}.
3721 @end deffn
3722
3723 @deffn {Scheme Procedure} string-normalize-nfkd s
3724 @deffnx {C Function} scm_string_normalize_nfkd (s)
3725 Return the @code{NFKD} normalized form of @var{s}.
3726 @end deffn
3727
3728 @deffn {Scheme Procedure} string-normalize-nfc s
3729 @deffnx {C Function} scm_string_normalize_nfc (s)
3730 Return the @code{NFC} normalized form of @var{s}.
3731 @end deffn
3732
3733 @deffn {Scheme Procedure} string-normalize-nfkc s
3734 @deffnx {C Function} scm_string_normalize_nfkc (s)
3735 Return the @code{NFKC} normalized form of @var{s}.
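
The effect of normalization on comparison can be sketched with the
U+1E69 example given earlier. The names @code{composed} and
@code{decomposed} are illustrative only; the characters are built with
@code{integer->char} to avoid depending on the encoding of the source
file.

@lisp
(define composed
  (string (integer->char #x1e69)))
(define decomposed
  (string #\s (integer->char #x0323) (integer->char #x0307)))

;; Same abstract character, different code point sequences.
(string=? composed decomposed)                         @result{} #f
(string=? (string-normalize-nfd composed) decomposed)  @result{} #t
(string=? composed (string-normalize-nfc decomposed))  @result{} #t
@end lisp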
3736 @end deffn
3737
3738 @node String Searching
3739 @subsubsection String Searching
3740
3741 @deffn {Scheme Procedure} string-index s char_pred [start [end]]
3742 @deffnx {C Function} scm_string_index (s, char_pred, start, end)
3743 Search through the string @var{s} from left to right, returning
3744 the index of the first occurrence of a character which
3745
3746 @itemize @bullet
3747 @item
3748 equals @var{char_pred}, if it is a character,
3749
3750 @item
3751 satisfies the predicate @var{char_pred}, if it is a procedure,
3752
3753 @item
3754 is in the set @var{char_pred}, if it is a character set.
3755 @end itemize
3756
3757 Return @code{#f} if no match is found.
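
The three kinds of @var{char_pred}, and the optional @var{start}
index, can be illustrated as follows (the input string is arbitrary):

@lisp
(string-index "hello world" #\o)               @result{} 4
(string-index "hello world" char-whitespace?)  @result{} 5
(string-index "hello world" char-set:digit)    @result{} #f
;; Start searching at index 5.
(string-index "hello world" #\o 5)             @result{} 7
@end lisp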
3758 @end deffn
3759
3760 @deffn {Scheme Procedure} string-rindex s char_pred [start [end]]
3761 @deffnx {C Function} scm_string_rindex (s, char_pred, start, end)
3762 Search through the string @var{s} from right to left, returning
3763 the index of the last occurrence of a character which
3764
3765 @itemize @bullet
3766 @item
3767 equals @var{char_pred}, if it is a character,
3768
3769 @item
3770 satisfies the predicate @var{char_pred}, if it is a procedure,
3771
3772 @item
3773 is in the set @var{char_pred}, if it is a character set.
3774 @end itemize
3775
3776 Return @code{#f} if no match is found.
3777 @end deffn
3778
3779 @deffn {Scheme Procedure} string-prefix-length s1 s2 [start1 [end1 [start2 [end2]]]]
3780 @deffnx {C Function} scm_string_prefix_length (s1, s2, start1, end1, start2, end2)
3781 Return the length of the longest common prefix of the two
3782 strings.
3783 @end deffn
3784
3785 @deffn {Scheme Procedure} string-prefix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]]
3786 @deffnx {C Function} scm_string_prefix_length_ci (s1, s2, start1, end1, start2, end2)
3787 Return the length of the longest common prefix of the two
3788 strings, ignoring character case.
3789 @end deffn
3790
3791 @deffn {Scheme Procedure} string-suffix-length s1 s2 [start1 [end1 [start2 [end2]]]]
3792 @deffnx {C Function} scm_string_suffix_length (s1, s2, start1, end1, start2, end2)
3793 Return the length of the longest common suffix of the two
3794 strings.
3795 @end deffn
3796
3797 @deffn {Scheme Procedure} string-suffix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]]
3798 @deffnx {C Function} scm_string_suffix_length_ci (s1, s2, start1, end1, start2, end2)
3799 Return the length of the longest common suffix of the two
3800 strings, ignoring character case.
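
For instance, with arbitrarily chosen strings, the prefix and suffix
length procedures give:

@lisp
(string-prefix-length "prefix" "preface")     @result{} 4
(string-prefix-length-ci "PREfix" "preface")  @result{} 4
(string-suffix-length "place" "preface")      @result{} 3
@end lisp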
3801 @end deffn
3802
3803 @deffn {Scheme Procedure} string-prefix? s1 s2 [start1 [end1 [start2 [end2]]]]
3804 @deffnx {C Function} scm_string_prefix_p (s1, s2, start1, end1, start2, end2)
3805 Is @var{s1} a prefix of @var{s2}?
3806 @end deffn
3807
3808 @deffn {Scheme Procedure} string-prefix-ci? s1 s2 [start1 [end1 [start2 [end2]]]]
3809 @deffnx {C Function} scm_string_prefix_ci_p (s1, s2, start1, end1, start2, end2)
3810 Is @var{s1} a prefix of @var{s2}, ignoring character case?
3811 @end deffn
3812
3813 @deffn {Scheme Procedure} string-suffix? s1 s2 [start1 [end1 [start2 [end2]]]]
3814 @deffnx {C Function} scm_string_suffix_p (s1, s2, start1, end1, start2, end2)
3815 Is @var{s1} a suffix of @var{s2}?
3816 @end deffn
3817
3818 @deffn {Scheme Procedure} string-suffix-ci? s1 s2 [start1 [end1 [start2 [end2]]]]
3819 @deffnx {C Function} scm_string_suffix_ci_p (s1, s2, start1, end1, start2, end2)
3820 Is @var{s1} a suffix of @var{s2}, ignoring character case?
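
Note the argument order in the prefix and suffix predicates: the
candidate prefix or suffix comes first. A short illustration with
made-up strings:

@lisp
(string-prefix? "pre" "prefix")      @result{} #t
(string-prefix? "prefix" "pre")      @result{} #f
(string-suffix? ".scm" "foo.scm")    @result{} #t
(string-suffix-ci? ".SCM" "foo.scm") @result{} #t
@end lisp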
3821 @end deffn
3822
3823 @deffn {Scheme Procedure} string-index-right s char_pred [start [end]]
3824 @deffnx {C Function} scm_string_index_right (s, char_pred, start, end)
3825 Search through the string @var{s} from right to left, returning
3826 the index of the last occurrence of a character which
3827
3828 @itemize @bullet
3829 @item
3830 equals @var{char_pred}, if it is a character,
3831
3832 @item
3833 satisfies the predicate @var{char_pred}, if it is a procedure,
3834
3835 @item
3836 is in the set @var{char_pred}, if it is a character set.
3837 @end itemize
3838
3839 Return @code{#f} if no match is found.
3840 @end deffn
3841
3842 @deffn {Scheme Procedure} string-skip s char_pred [start [end]]
3843 @deffnx {C Function} scm_string_skip (s, char_pred, start, end)
3844 Search through the string @var{s} from left to right, returning
3845 the index of the first occurrence of a character which
3846
3847 @itemize @bullet
3848 @item
3849 does not equal @var{char_pred}, if it is a character,
3850
3851 @item
3852 does not satisfy the predicate @var{char_pred}, if it is a
3853 procedure,
3854
3855 @item
3856 is not in the set @var{char_pred}, if it is a character set.
3857 @end itemize
3858 @end deffn
3859
3860 @deffn {Scheme Procedure} string-skip-right s char_pred [start [end]]
3861 @deffnx {C Function} scm_string_skip_right (s, char_pred, start, end)
3862 Search through the string @var{s} from right to left, returning
3863 the index of the last occurrence of a character which
3864
3865 @itemize @bullet
3866 @item
3867 does not equal @var{char_pred}, if it is a character,
3868
3869 @item
3870 does not satisfy the predicate @var{char_pred}, if it is a
3871 procedure,
3872
3873 @item
3874 is not in the set @var{char_pred}, if it is a character set.
3875 @end itemize
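
A typical use of @code{string-skip} and @code{string-skip-right} is to
locate the first or last character that is not padding. An
illustrative sketch:

@lisp
(string-skip "   hello" #\space)        @result{} 3
(string-skip "123abc" char-set:digit)   @result{} 3
(string-skip-right "hello   " #\space)  @result{} 4
@end lisp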
3876 @end deffn
3877
3878 @deffn {Scheme Procedure} string-count s char_pred [start [end]]
3879 @deffnx {C Function} scm_string_count (s, char_pred, start, end)
3880 Return the number of characters in the string @var{s} each of
3881 which
3882
3883 @itemize @bullet
3884 @item
3885 equals @var{char_pred}, if it is a character,
3886
3887 @item
3888 satisfies the predicate @var{char_pred}, if it is a procedure,
3889
3890 @item
3891 is in the set @var{char_pred}, if it is a character set.
3892 @end itemize
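
For example, counting with a character, a predicate, and a character
set (the inputs are arbitrary):

@lisp
(string-count "banana" #\a)                         @result{} 3
(string-count "banana" (lambda (c) (char=? c #\n))) @result{} 2
(string-count "a1b22" char-set:digit)               @result{} 3
@end lisp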
3893 @end deffn
3894
3895 @deffn {Scheme Procedure} string-contains s1 s2 [start1 [end1 [start2 [end2]]]]
3896 @deffnx {C Function} scm_string_contains (s1, s2, start1, end1, start2, end2)
3897 Does string @var{s1} contain string @var{s2}? Return the index
3898 in @var{s1} where @var{s2} occurs as a substring, or @code{#f}.
3899 The optional start/end indices restrict the operation to the
3900 indicated substrings.
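
A short illustration, including the case-insensitive variant
@code{string-contains-ci}:

@lisp
(string-contains "hello world" "o w")       @result{} 4
(string-contains "hello world" "xyz")       @result{} #f
(string-contains-ci "Hello World" "WORLD")  @result{} 6
@end lisp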
3901 @end deffn
3902
3903 @deffn {Scheme Procedure} string-contains-ci s1 s2 [start1 [end1 [start2 [end2]]]]
3904 @deffnx {C Function} scm_string_contains_ci (s1, s2, start1, end1, start2, end2)
3905 Does string @var{s1} contain string @var{s2}? Return the index
3906 in @var{s1} where @var{s2} occurs as a substring, or @code{#f}.
3907 The optional start/end indices restrict the operation to the
3908 indicated substrings. Character comparison is done
3909 case-insensitively.
3910 @end deffn
3911
3912 @node Alphabetic Case Mapping
3913 @subsubsection Alphabetic Case Mapping
3914
3915 These are procedures for mapping strings to their upper- or lower-case
3916 equivalents, or for capitalizing strings.
3917
3918 They use the basic case mapping rules for Unicode characters. No
3919 special language or context rules are considered. The resulting strings
3920 are guaranteed to be the same length as the input strings.
3921
3922 @xref{Character Case Mapping, the @code{(ice-9
3923 i18n)} module}, for locale-dependent case conversions.
3924
3925 @deffn {Scheme Procedure} string-upcase str [start [end]]
3926 @deffnx {C Function} scm_substring_upcase (str, start, end)
3927 @deffnx {C Function} scm_string_upcase (str)
3928 Upcase every character in @var{str}.
3929 @end deffn
3930
3931 @deffn {Scheme Procedure} string-upcase! str [start [end]]
3932 @deffnx {C Function} scm_substring_upcase_x (str, start, end)
3933 @deffnx {C Function} scm_string_upcase_x (str)
3934 Destructively upcase every character in @var{str}.
3935
3936 @lisp
3937 (string-upcase! y)
3938 @result{} "ARRDEFG"
3939 y
3940 @result{} "ARRDEFG"
3941 @end lisp
3942 @end deffn
3943
3944 @deffn {Scheme Procedure} string-downcase str [start [end]]
3945 @deffnx {C Function} scm_substring_downcase (str, start, end)
3946 @deffnx {C Function} scm_string_downcase (str)
3947 Downcase every character in @var{str}.
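
Unlike the destructive variants, @code{string-upcase} and
@code{string-downcase} leave their argument untouched and return the
converted string. A small sketch, with @code{s} as an illustrative
name:

@lisp
(define s "Hello, World")
(string-upcase s)    @result{} "HELLO, WORLD"
(string-downcase s)  @result{} "hello, world"
s                    @result{} "Hello, World"
@end lisp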
3948 @end deffn
3949
3950 @deffn {Scheme Procedure} string-downcase! str [start [end]]
3951 @deffnx {C Function} scm_substring_downcase_x (str, start, end)
3952 @deffnx {C Function} scm_string_downcase_x (str)
3953 Destructively downcase every character in @var{str}.
3954
3955 @lisp
3956 y
3957 @result{} "ARRDEFG"
3958 (string-downcase! y)
3959 @result{} "arrdefg"
3960 y
3961 @result{} "arrdefg"
3962 @end lisp
3963 @end deffn
3964
3965 @deffn {Scheme Procedure} string-capitalize str
3966 @deffnx {C Function} scm_string_capitalize (str)
3967 Return a freshly allocated string with the characters in
3968 @var{str}, where the first character of every word is
3969 capitalized.
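
For example, with an illustrative variable @code{s}:

@lisp
(define s "hello world")
(string-capitalize s)  @result{} "Hello World"
;; The argument itself is not modified.
s                      @result{} "hello world"
@end lisp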
3970 @end deffn
3971
3972 @deffn {Scheme Procedure} string-capitalize! str
3973 @deffnx {C Function} scm_string_capitalize_x (str)
3974 Upcase the first character of every word in @var{str}
3975 destructively and return @var{str}.
3976
3977 @lisp
3978 y @result{} "hello world"
3979 (string-capitalize! y) @result{} "Hello World"
3980 y @result{} "Hello World"
3981 @end lisp
3982 @end deffn
3983
3984 @deffn {Scheme Procedure} string-titlecase str [start [end]]
3985 @deffnx {C Function} scm_string_titlecase (str, start, end)
3986 Titlecase the first character of every word in @var{str}.
3987 @end deffn
3988
3989 @deffn {Scheme Procedure} string-titlecase! str [start [end]]
3990 @deffnx {C Function} scm_string_titlecase_x (str, start, end)
3991 Destructively titlecase the first character of every word in
3992 @var{str}.
3993 @end deffn
3994
3995 @node Reversing and Appending Strings
3996 @subsubsection Reversing and Appending Strings
3997
3998 @deffn {Scheme Procedure} string-reverse str [start [end]]
3999 @deffnx {C Function} scm_string_reverse (str, start, end)
4000 Reverse the string @var{str}. The optional arguments
4001 @var{start} and @var{end} delimit the region of @var{str} to
4002 operate on.
4003 @end deffn
4004
4005 @deffn {Scheme Procedure} string-reverse! str [start [end]]
4006 @deffnx {C Function} scm_string_reverse_x (str, start, end)
4007 Reverse the string @var{str} in-place. The optional arguments
4008 @var{start} and @var{end} delimit the region of @var{str} to
4009 operate on. The return value is unspecified.
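
A minimal sketch contrasting this with @code{string-reverse}, which
returns a new string. The name @code{s} is illustrative, and the
string is assumed to be mutable.

@lisp
(string-reverse "hello")  @result{} "olleh"

(define s (string-copy "hello"))
(string-reverse! s)
s @result{} "olleh"
@end lisp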
4010 @end deffn
4011
4012 @rnindex string-append
4013 @deffn {Scheme Procedure} string-append arg @dots{}
4014 @deffnx {C Function} scm_string_append (args)
4015 Return a newly allocated string whose characters form the
4016 concatenation of the given strings, @var{arg} @enddots{}.
4017
4018 @example
4019 (let ((h "hello "))
4020 (string-append h "world"))
4021 @result{} "hello world"
4022 @end example
4023 @end deffn
4024
4025 @deffn {Scheme Procedure} string-append/shared arg @dots{}
4026 @deffnx {C Function} scm_string_append_shared (args)
4027 Like @code{string-append}, but the result may share memory
4028 with the argument strings.
4029 @end deffn
4030
4031 @deffn {Scheme Procedure} string-concatenate ls
4032 @deffnx {C Function} scm_string_concatenate (ls)
4033 Append the elements (which must be strings) of @var{ls} together into a
4034 single string. Guaranteed to return a freshly allocated string.
4035 @end deffn
4036
4037 @deffn {Scheme Procedure} string-concatenate-reverse ls [final_string [end]]
4038 @deffnx {C Function} scm_string_concatenate_reverse (ls, final_string, end)
4039 Without optional arguments, this procedure is equivalent to
4040
4041 @lisp
4042 (string-concatenate (reverse ls))
4043 @end lisp
4044
4045 If the optional argument @var{final_string} is specified, it is
4046 consed onto the beginning of @var{ls} before performing the
4047 list-reverse and string-concatenate operations. If @var{end}
4048 is given, only the characters of @var{final_string} up to index
4049 @var{end} are used.
4050
4051 Guaranteed to return a freshly allocated string.
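
This is convenient when string pieces have been accumulated in reverse
order, for example by consing onto a list. An illustrative sketch of
the three calling forms:

@lisp
(string-concatenate-reverse '("c" "b" "a"))
@result{} "abc"

;; "cd" is placed after the reversed list elements.
(string-concatenate-reverse '("b" "a") "cd")
@result{} "abcd"

;; Only the first two characters of "cdef" are used.
(string-concatenate-reverse '("b" "a") "cdef" 2)
@result{} "abcd"
@end lisp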
4052 @end deffn
4053
4054 @deffn {Scheme Procedure} string-concatenate/shared ls
4055 @deffnx {C Function} scm_string_concatenate_shared (ls)
4056 Like @code{string-concatenate}, but the result may share memory
4057 with the strings in the list @var{ls}.
4058 @end deffn
4059
4060 @deffn {Scheme Procedure} string-concatenate-reverse/shared ls [final_string [end]]
4061 @deffnx {C Function} scm_string_concatenate_reverse_shared (ls, final_string, end)
4062 Like @code{string-concatenate-reverse}, but the result may
4063 share memory with the strings in the @var{ls} arguments.
4064 @end deffn
4065
4066 @node Mapping Folding and Unfolding
4067 @subsubsection Mapping, Folding, and Unfolding
4068
4069 @deffn {Scheme Procedure} string-map proc s [start [end]]
4070 @deffnx {C Function} scm_string_map (proc, s, start, end)
4071 @var{proc} is a char->char procedure that is mapped over @var{s};
4072 the results are returned as a new string. The order in which the
4073 procedure is applied to the string elements is not specified.
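
For instance, a small sketch with arbitrary input strings:

@example
(string-map char-upcase "hello")
@result{} "HELLO"

;; With start and end, only that portion is mapped and returned.
(string-map char-upcase "hello" 1 3)
@result{} "EL"
@end example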
4074 @end deffn
4075
4076 @deffn {Scheme Procedure} string-map! proc s [start [end]]
4077 @deffnx {C Function} scm_string_map_x (proc, s, start, end)
4078 @var{proc} is a char->char procedure that is mapped over
4079 @var{s}. The order in which the procedure is applied to the
4080 string elements is not specified. The string @var{s} is
4081 modified in place; the return value is not specified.
4082 @end deffn
4083
4084 @deffn {Scheme Procedure} string-for-each proc s [start [end]]
4085 @deffnx {C Function} scm_string_for_each (proc, s, start, end)
4086 @var{proc} is mapped over @var{s} in left-to-right order. The
4087 return value is not specified.
4088 @end deffn
4089
4090 @deffn {Scheme Procedure} string-for-each-index proc s [start [end]]
4091 @deffnx {C Function} scm_string_for_each_index (proc, s, start, end)
4092 Call @code{(@var{proc} i)} for each index i in @var{s}, from left to
4093 right.
4094
4095 For example, to change characters to alternately upper and lower case,
4096
4097 @example
4098 (define str (string-copy "studly"))
4099 (string-for-each-index
4100 (lambda (i)
4101 (string-set! str i
4102 ((if (even? i) char-upcase char-downcase)
4103 (string-ref str i))))
4104 str)
4105 str @result{} "StUdLy"
4106 @end example
4107 @end deffn
4108
4109 @deffn {Scheme Procedure} string-fold kons knil s [start [end]]
4110 @deffnx {C Function} scm_string_fold (kons, knil, s, start, end)
4111 Fold @var{kons} over the characters of @var{s}, from left to right,
4112 with @var{knil} as the initial value of the accumulator. @var{kons}
4113 must expect two arguments: the current character and the result of
4114 the previous application of @var{kons} (initially @var{knil}).
4115 @end deffn
4116
4117 @deffn {Scheme Procedure} string-fold-right kons knil s [start [end]]
4118 @deffnx {C Function} scm_string_fold_right (kons, knil, s, start, end)
4119 Fold @var{kons} over the characters of @var{s}, from right to left,
4120 with @var{knil} as the initial value of the accumulator. @var{kons}
4121 must expect two arguments: the current character and the result of
4122 the previous application of @var{kons} (initially @var{knil}).
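
The difference between the two folds shows up in the order of the
accumulated result. In this sketch, @code{cons} collects the characters
into a list, and an ad hoc counting procedure (not part of the API)
tallies the occurrences of @code{#\a}:

@example
(string-fold cons '() "abc")
@result{} (#\c #\b #\a)

(string-fold-right cons '() "abc")
@result{} (#\a #\b #\c)

;; Count the occurrences of #\a.
(string-fold (lambda (c count)
               (if (char=? c #\a) (+ count 1) count))
             0
             "banana")
@result{} 3
@end example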
4123 @end deffn
4124
4125 @deffn {Scheme Procedure} string-unfold p f g seed [base [make_final]]
4126 @deffnx {C Function} scm_string_unfold (p, f, g, seed, base, make_final)
4127 @itemize @bullet
4128 @item @var{g} is used to generate a series of @emph{seed}
4129 values from the initial @var{seed}: @var{seed}, (@var{g}
4130 @var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}),
4131 @dots{}
4132 @item @var{p} tells us when to stop -- when it returns true
4133 when applied to one of these seed values.
4134 @item @var{f} maps each seed value to the corresponding
4135 character in the result string. These chars are assembled
4136 into the string in a left-to-right order.
4137 @item @var{base} is the optional initial/leftmost portion
4138 of the constructed string; it defaults to the empty
4139 string.
4140 @item @var{make_final} is applied to the terminal seed
4141 value (on which @var{p} returns true) to produce
4142 the final/rightmost portion of the constructed string.
4143 The default is nothing extra.
4144 @end itemize
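
As a minimal sketch, a list of characters can serve as the seed, with
@code{null?}, @code{car} and @code{cdr} playing the roles of @var{p},
@var{f} and @var{g}. The bracket strings are arbitrary:

@example
(string-unfold null? car cdr '(#\a #\b #\c))
@result{} "abc"

(string-unfold null? car cdr '(#\a #\b #\c)
               "<"                    ; base: leftmost portion
               (lambda (seed) ">"))   ; make_final: rightmost portion
@result{} "<abc>"
@end example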
4145 @end deffn
4146
4147 @deffn {Scheme Procedure} string-unfold-right p f g seed [base [make_final]]
4148 @deffnx {C Function} scm_string_unfold_right (p, f, g, seed, base, make_final)
4149 @itemize @bullet
4150 @item @var{g} is used to generate a series of @emph{seed}
4151 values from the initial @var{seed}: @var{seed}, (@var{g}
4152 @var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}),
4153 @dots{}
4154 @item @var{p} tells us when to stop -- when it returns true
4155 when applied to one of these seed values.
4156 @item @var{f} maps each seed value to the corresponding
4157 character in the result string. These chars are assembled
4158 into the string in a right-to-left order.
4159 @item @var{base} is the optional initial/rightmost portion
4160 of the constructed string; it defaults to the empty
4161 string.
4162 @item @var{make_final} is applied to the terminal seed
4163 value (on which @var{p} returns true) to produce
4164 the final/leftmost portion of the constructed string.
4165 It defaults to @code{(lambda (x) "")}.
4166 @end itemize
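
Because the characters are assembled from right to left, this procedure
suits computations that produce the last character first. The following
sketch renders the decimal digits of a positive integer; the choice of
1234 is arbitrary:

@example
(string-unfold-right zero?
                     ;; f: least significant digit as a character
                     (lambda (n)
                       (integer->char (+ 48 (remainder n 10))))
                     ;; g: drop the least significant digit
                     (lambda (n) (quotient n 10))
                     1234)
@result{} "1234"
@end example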
4167 @end deffn
4168
4169 @node Miscellaneous String Operations
4170 @subsubsection Miscellaneous String Operations
4171
4172 @deffn {Scheme Procedure} xsubstring s from [to [start [end]]]
4173 @deffnx {C Function} scm_xsubstring (s, from, to, start, end)
4174 This is the @emph{extended substring} procedure that implements
4175 replicated copying of a substring of some string.
4176
4177 @var{s} is a string, @var{start} and @var{end} are optional
4178 arguments that demarcate a substring of @var{s}, defaulting to
4179 0 and the length of @var{s}. Replicate this substring up and
4180 down index space, in both the positive and negative directions.
4181 @code{xsubstring} returns the substring of this string
4182 beginning at index @var{from}, and ending at @var{to}, which
4183 defaults to @var{from} + (@var{end} - @var{start}).
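
The following sketch, modelled on the examples given in SRFI-13, shows
how the replicated index space behaves:

@example
;; Rotate left by two characters.
(xsubstring "abcdef" 2)
@result{} "cdefab"

;; Rotate right by two characters.
(xsubstring "abcdef" -2)
@result{} "efabcd"

;; Replicate the string to fill seven characters.
(xsubstring "abc" 0 7)
@result{} "abcabca"
@end example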
4184 @end deffn
4185
4186 @deffn {Scheme Procedure} string-xcopy! target tstart s sfrom [sto [start [end]]]
4187 @deffnx {C Function} scm_string_xcopy_x (target, tstart, s, sfrom, sto, start, end)
4188 Exactly the same as @code{xsubstring}, but the extracted text
4189 is written into the string @var{target} starting at index
4190 @var{tstart}. The operation is not defined if @code{(eq?
4191 @var{target} @var{s})} or these arguments share storage -- you
4192 cannot copy a string on top of itself.
4193 @end deffn
4194
4195 @deffn {Scheme Procedure} string-replace s1 s2 [start1 [end1 [start2 [end2]]]]
4196 @deffnx {C Function} scm_string_replace (s1, s2, start1, end1, start2, end2)
4197 Return the string @var{s1}, but with the characters
4198 @var{start1} @dots{} @var{end1} replaced by the characters
4199 @var{start2} @dots{} @var{end2} from @var{s2}.
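
For example, with arbitrary strings chosen for illustration:

@example
;; Replace characters 6 up to (but not including) 11 of s1.
(string-replace "hello world" "there" 6 11)
@result{} "hello there"

;; An empty range in s1 turns the replacement into an insertion.
(string-replace "hello world" "brave new " 6 6)
@result{} "hello brave new world"

;; Use only characters 0 up to 2 of s2.
(string-replace "abcdef" "XYZ" 1 3 0 2)
@result{} "aXYdef"
@end example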
4200 @end deffn
4201
4202 @deffn {Scheme Procedure} string-tokenize s [token_set [start [end]]]
4203 @deffnx {C Function} scm_string_tokenize (s, token_set, start, end)
4204 Split the string @var{s} into a list of substrings, where each
4205 substring is a maximal non-empty contiguous sequence of
4206 characters from the character set @var{token_set}, which
4207 defaults to @code{char-set:graphic}.
4208 If @var{start} or @var{end} indices are provided, they restrict
4209 @code{string-tokenize} to operating on the indicated substring
4210 of @var{s}.
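
A brief sketch with arbitrary input. By default, tokens are runs of
graphic characters, so punctuation stays attached to words:

@example
(string-tokenize "hello,  world! 42")
@result{} ("hello," "world!" "42")

;; Keep only runs of digits.
(string-tokenize "a1b22c333" char-set:digit)
@result{} ("1" "22" "333")
@end example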
4211 @end deffn
4212
4213 @deffn {Scheme Procedure} string-filter char_pred s [start [end]]
4214 @deffnx {C Function} scm_string_filter (char_pred, s, start, end)
4215 Filter the string @var{s}, retaining only those characters which
4216 satisfy @var{char_pred}.
4217
4218 If @var{char_pred} is a procedure, it is applied to each character as
4219 a predicate; if it is a character, it is tested for equality; and if it
4220 is a character set, it is tested for membership.
4221 @end deffn
4222
4223 @deffn {Scheme Procedure} string-delete char_pred s [start [end]]
4224 @deffnx {C Function} scm_string_delete (char_pred, s, start, end)
4225 Delete characters satisfying @var{char_pred} from @var{s}.
4226
4227 If @var{char_pred} is a procedure, it is applied to each character as
4228 a predicate; if it is a character, it is tested for equality; and if it
4229 is a character set, it is tested for membership.
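
The three kinds of @var{char_pred} can be sketched as follows, for both
@code{string-filter} and @code{string-delete} (the input strings are
arbitrary):

@example
;; A predicate procedure.
(string-filter char-alphabetic? "a1b2c3")
@result{} "abc"

;; A single character.
(string-delete #\a "banana")
@result{} "bnn"

;; A character set.
(string-delete char-set:digit "a1b2c3")
@result{} "abc"
@end example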
4230 @end deffn
4231
4232 @node Representing Strings as Bytes
4233 @subsubsection Representing Strings as Bytes
4234
4235 Out in the cold world outside of Guile, not all strings are treated in
4236 the same way. Out there, there are only bytes, and there are many ways
4237 of representing strings (sequences of characters) as binary data
4238 (sequences of bytes).
4239
4240 As a user, usually you don't have to think about this very much. When
4241 you type on your keyboard, your system encodes your keystrokes as bytes
4242 according to the locale that you have configured on your computer.
4243 Guile uses the locale to decode those bytes back into characters --
4244 hopefully the same characters that you typed in.
4245
4246 All is not so clear when dealing with a system with multiple users, such
4247 as a web server. Your web server might get a request from one user for
4248 data encoded in the ISO-8859-1 character set, and then another request
4249 from a different user for UTF-8 data.
4250
4251 @cindex iconv
4252 @cindex character encoding
4253 Guile provides an @dfn{iconv} module for converting between strings and
4254 sequences of bytes. @xref{Bytevectors}, for more on how Guile
4255 represents raw byte sequences. This module gets its name from the
4256 common @sc{unix} command of the same name.
4257
4258 Note that often it is sufficient to just read and write strings from
4259 ports instead of using these functions. To do this, specify the port
4260 encoding using @code{set-port-encoding!}. @xref{Ports}, for more on
4261 ports and character encodings.
4262
4263 Unlike the rest of the procedures in this section, you have to load the
4264 @code{iconv} module before having access to these procedures:
4265
4266 @example
4267 (use-modules (ice-9 iconv))
4268 @end example
4269
4270 @deffn {Scheme Procedure} string->bytevector string encoding [conversion-strategy]
4271 Encode @var{string} as a sequence of bytes.
4272
4273 The string will be encoded in the character set specified by the
4274 @var{encoding} string. If the string has characters that cannot be
4275 represented in the encoding, by default this procedure raises an
4276 @code{encoding-error}. Pass a @var{conversion-strategy} argument to
4277 specify other behaviors.
4278
4279 The return value is a bytevector. @xref{Bytevectors}, for more on
4280 bytevectors. @xref{Ports}, for more on character encodings and
4281 conversion strategies.
4282 @end deffn
4283
4284 @deffn {Scheme Procedure} bytevector->string bytevector encoding [conversion-strategy]
4285 Decode @var{bytevector} into a string.
4286
4287 The bytes will be decoded from the character set specified by the
4288 @var{encoding} string. If the bytes do not form a valid encoding, by
4289 default this procedure raises a @code{decoding-error}. As with
4290 @code{string->bytevector}, pass the optional @var{conversion-strategy}
4291 argument to modify this behavior. @xref{Ports}, for more on character
4292 encodings and conversion strategies.
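
As a rough sketch of a round trip through @code{string->bytevector} and
@code{bytevector->string}, consider a one-character string holding
U+03BB (Greek small letter lambda). The encoding names are passed to the
system's iconv, so their availability is assumed here:

@example
(use-modules (ice-9 iconv))

;; U+03BB takes two bytes in UTF-8.
(string->bytevector (string #\x3bb) "UTF-8")
@result{} #vu8(206 187)

(string=? (bytevector->string #vu8(206 187) "UTF-8")
          (string #\x3bb))
@result{} #t

;; ASCII cannot represent U+03BB; substitute a question mark
;; instead of raising an error.
(string->bytevector (string #\x3bb) "ASCII" 'substitute)
@result{} #vu8(63)
@end example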
4293 @end deffn
4294
4295 @deffn {Scheme Procedure} call-with-output-encoded-string encoding proc [conversion-strategy]
4296 Like @code{call-with-output-string}, but instead of returning a string,
4297 returns an encoding of the string according to @var{encoding}, as a
4298 bytevector. This procedure can be more efficient than collecting a
4299 string and then converting it via @code{string->bytevector}.
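
A minimal sketch, in which @var{proc} receives the output port as its
only argument:

@example
(use-modules (ice-9 iconv))

(call-with-output-encoded-string "UTF-8"
  (lambda (port)
    (display "abc" port)))
@result{} #vu8(97 98 99)
@end example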
4300 @end deffn
4301
4302 @node Conversion to/from C
4303 @subsubsection Conversion to/from C
4304
4305 When creating a Scheme string from a C string or when converting a
4306 Scheme string to a C string, the concept of character encoding becomes
4307 important.
4308
4309 In C, a string is just a sequence of bytes, and the character encoding
4310 describes the relation between these bytes and the actual characters
4311 that make up the string. For Scheme strings, character encoding is not
4312 an issue (most of the time), since in Scheme you usually treat strings
4313 as character sequences, not byte sequences.
4314
4315 Converting to C and converting from C each have their own challenges.
4316
4317 When converting from C to Scheme, it is important that the sequence of
4318 bytes in the C string be valid with respect to its encoding. ASCII
4319 strings, for example, can't have any bytes greater than 127. An ASCII
4320 byte greater than 127 is considered @emph{ill-formed} and cannot be
4321 converted into a Scheme character.
4322
4323 Problems can occur in the reverse operation as well. Not all character
4324 encodings can hold all possible Scheme characters. Some encodings, like
4325 ASCII for example, can only describe a small subset of all possible
4326 characters. So, when converting to C, one must first decide what to do
4327 with Scheme characters that can't be represented in the C string.
4328
4329 Converting a Scheme string to a C string will often allocate fresh
4330 memory to hold the result. You must take care that this memory is
4331 properly freed eventually. In many cases, this can be achieved by
4332 using @code{scm_dynwind_free} inside an appropriate dynwind context
4333 (@pxref{Dynamic Wind}).
4334
4335 @deftypefn {C Function} SCM scm_from_locale_string (const char *str)
4336 @deftypefnx {C Function} SCM scm_from_locale_stringn (const char *str, size_t len)
4337 Creates a new Scheme string that has the same contents as @var{str} when
4338 interpreted in the character encoding of the current locale.
4339
4340 For @code{scm_from_locale_string}, @var{str} must be null-terminated.
4341
4342 For @code{scm_from_locale_stringn}, @var{len} specifies the length of
4343 @var{str} in bytes, and @var{str} does not need to be null-terminated.
4344 If @var{len} is @code{(size_t)-1}, then @var{str} does need to be
4345 null-terminated and the real length will be found with @code{strlen}.
4346
4347 If the C string is ill-formed, an error will be raised.
4348
4349 Note that these functions should @emph{not} be used to convert C string
4350 constants, because there is no guarantee that the current locale will
4351 match that of the execution character set, used for string and character
4352 constants. Most modern C compilers use UTF-8 by default, so to convert
4353 C string constants we recommend @code{scm_from_utf8_string}.
4354 @end deftypefn
4355
4356 @deftypefn {C Function} SCM scm_take_locale_string (char *str)
4357 @deftypefnx {C Function} SCM scm_take_locale_stringn (char *str, size_t len)
4358 Like @code{scm_from_locale_string} and @code{scm_from_locale_stringn},
4359 respectively, but also frees @var{str} with @code{free} eventually.
4360 Thus, you can use this function when you would free @var{str} anyway
4361 immediately after creating the Scheme string. In certain cases, Guile
4362 can then use @var{str} directly as its internal representation.
4363 @end deftypefn
4364
4365 @deftypefn {C Function} {char *} scm_to_locale_string (SCM str)
4366 @deftypefnx {C Function} {char *} scm_to_locale_stringn (SCM str, size_t *lenp)
4367 Returns a C string with the same contents as @var{str} in the character
4368 encoding of the current locale. The C string must be freed with
4369 @code{free} eventually, perhaps by using @code{scm_dynwind_free}
4370 (@pxref{Dynamic Wind}).
4371
4372 For @code{scm_to_locale_string}, the returned string is
4373 null-terminated and an error is signalled when @var{str} contains
4374 @code{#\nul} characters.
4375
4376 For @code{scm_to_locale_stringn} and @var{lenp} not @code{NULL},
4377 @var{str} might contain @code{#\nul} characters and the length of the
4378 returned string in bytes is stored in @code{*@var{lenp}}. The
4379 returned string will not be null-terminated in this case. If
4380 @var{lenp} is @code{NULL}, @code{scm_to_locale_stringn} behaves like
4381 @code{scm_to_locale_string}.
4382
4383 If a character in @var{str} cannot be represented in the character
4384 encoding of the current locale, the default port conversion strategy is
4385 used. @xref{Ports}, for more on conversion strategies.
4386
4387 If the conversion strategy is @code{error}, an error will be raised. If
4388 it is @code{substitute}, a replacement character, such as a question
4389 mark, will be inserted in its place. If it is @code{escape}, a hex
4390 escape will be inserted in its place.
4391 @end deftypefn
4392
4393 @deftypefn {C Function} size_t scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len)
4394 Puts @var{str} as a C string in the current locale encoding into the
4395 memory pointed to by @var{buf}. The buffer at @var{buf} has room for
4396 @var{max_len} bytes and @code{scm_to_locale_stringbuf} will never store
4397 more than that. No terminating @code{'\0'} will be stored.
4398
4399 The return value of @code{scm_to_locale_stringbuf} is the number of
4400 bytes that are needed for all of @var{str}, regardless of whether
4401 @var{buf} was large enough to hold them. Thus, when the return value
4402 is larger than @var{max_len}, only @var{max_len} bytes have been
4403 stored and you probably need to try again with a larger buffer.
4404 @end deftypefn
4405
4406 For most situations, string conversion should occur using the current
4407 locale, such as with the functions above. But there may be cases where
4408 one wants to convert strings from a character encoding other than the
4409 locale's character encoding. For these cases, the lower-level functions
4410 @code{scm_to_stringn} and @code{scm_from_stringn} are provided. These
4411 functions should seldom be necessary if one is properly using locales.
4412
4413 @deftp {C Type} scm_t_string_failed_conversion_handler
4414 This is an enumerated type that can take one of three values:
4415 @code{SCM_FAILED_CONVERSION_ERROR},
4416 @code{SCM_FAILED_CONVERSION_QUESTION_MARK}, and
4417 @code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE}. They are used to indicate
4418 a strategy for handling characters that cannot be converted to or from a
4419 given character encoding. @code{SCM_FAILED_CONVERSION_ERROR} indicates
4420 that a conversion should throw an error if some characters cannot be
4421 converted. @code{SCM_FAILED_CONVERSION_QUESTION_MARK} indicates that a
4422 conversion should replace unconvertible characters with the question
4423 mark character. Finally, @code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE}
4424 requests that a conversion replace an unconvertible character
4425 with an escape sequence.
4426
4427 While all three strategies apply when converting Scheme strings to C,
4428 only @code{SCM_FAILED_CONVERSION_ERROR} and
4429 @code{SCM_FAILED_CONVERSION_QUESTION_MARK} can be used when converting C
4430 strings to Scheme.
4431 @end deftp
4432
4433 @deftypefn {C Function} char *scm_to_stringn (SCM str, size_t *lenp, const char *encoding, scm_t_string_failed_conversion_handler handler)
4434 This function returns a newly allocated C string from the Guile string
4435 @var{str}. The length of the returned string in bytes will be returned in
4436 @var{lenp}. The character encoding of the C string is passed as the ASCII,
4437 null-terminated C string @var{encoding}. The @var{handler} parameter
4438 gives a strategy for dealing with characters that cannot be converted
4439 into @var{encoding}.
4440
4441 If @var{lenp} is @code{NULL}, this function will return a null-terminated C
4442 string. It will throw an error if the string contains a null
4443 character.
4444
4445 The Scheme interface to this function is @code{string->bytevector}, from the
4446 @code{ice-9 iconv} module. @xref{Representing Strings as Bytes}.
4447 @end deftypefn
4448
4449 @deftypefn {C Function} SCM scm_from_stringn (const char *str, size_t len, const char *encoding, scm_t_string_failed_conversion_handler handler)
4450 This function returns a Scheme string from the C string @var{str}. The
4451 length in bytes of the C string is input as @var{len}. The encoding of the C
4452 string is passed as the ASCII, null-terminated C string @var{encoding}.
4453 The @var{handler} parameter specifies a strategy for dealing with
4454 unconvertible characters.
4455
4456 The Scheme interface to this function is @code{bytevector->string}.
4457 @xref{Representing Strings as Bytes}.
4458 @end deftypefn
4459
4460 The following conversion functions are provided as a convenience for the
4461 most commonly used encodings.
4462
4463 @deftypefn {C Function} SCM scm_from_latin1_string (const char *str)
4464 @deftypefnx {C Function} SCM scm_from_utf8_string (const char *str)
4465 @deftypefnx {C Function} SCM scm_from_utf32_string (const scm_t_wchar *str)
4466 Return a Scheme string from the null-terminated C string @var{str},
4467 which is ISO-8859-1-, UTF-8-, or UTF-32-encoded. These functions should
4468 be used to convert hard-coded C string constants into Scheme strings.
4469 @end deftypefn
4470
4471 @deftypefn {C Function} SCM scm_from_latin1_stringn (const char *str, size_t len)
4472 @deftypefnx {C Function} SCM scm_from_utf8_stringn (const char *str, size_t len)
4473 @deftypefnx {C Function} SCM scm_from_utf32_stringn (const scm_t_wchar *str, size_t len)
4474 Return a Scheme string from C string @var{str}, which is ISO-8859-1-,
4475 UTF-8-, or UTF-32-encoded, of length @var{len}. @var{len} is the number
4476 of bytes pointed to by @var{str} for @code{scm_from_latin1_stringn} and
4477 @code{scm_from_utf8_stringn}; it is the number of elements (code points)
4478 in @var{str} in the case of @code{scm_from_utf32_stringn}.
4479 @end deftypefn
4480
4481 @deftypefn {C Function} char *scm_to_latin1_stringn (SCM str, size_t *lenp)
4482 @deftypefnx {C Function} char *scm_to_utf8_stringn (SCM str, size_t *lenp)
4483 @deftypefnx {C Function} scm_t_wchar *scm_to_utf32_stringn (SCM str, size_t *lenp)
4484 Return a newly allocated, ISO-8859-1-, UTF-8-, or UTF-32-encoded C string
4485 from Scheme string @var{str}. An error is thrown when @var{str}
4486 cannot be converted to the specified encoding. If @var{lenp} is
4487 @code{NULL}, the returned C string will be null terminated, and an error
4488 will be thrown if the C string would otherwise contain null
4489 characters. If @var{lenp} is not @code{NULL}, the string is not null terminated,
4490 and the length of the returned string is returned in @var{lenp}. The length
4491 returned is the number of bytes for @code{scm_to_latin1_stringn} and
4492 @code{scm_to_utf8_stringn}; it is the number of elements (code points)
4493 for @code{scm_to_utf32_stringn}.
4494 @end deftypefn
4495
4496 @node String Internals
4497 @subsubsection String Internals
4498
4499 Guile stores each string in memory as a contiguous array of Unicode code
4500 points along with an associated set of attributes. If all of the code
4501 points of a string have an integer value between 0 and 255 inclusive,
4502 the code point array is stored as one byte per code point: it is stored
4503 as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the
4504 string has an integer value greater than 255, the code point array is
4505 stored as four bytes per code point: it is stored as a UTF-32 string.
4506
4507 Conversion between the one-byte-per-code-point and
4508 four-bytes-per-code-point representations happens automatically as
4509 necessary.
4510
4511 No API is provided to set the internal representation of strings;
4512 however, there is a pair of procedures available to query it. These are
4513 debugging procedures. Using them in production code is discouraged,
4514 since the details of Guile's internal representation of strings may
4515 change from release to release.
4516
4517 @deffn {Scheme Procedure} string-bytes-per-char str
4518 @deffnx {C Function} scm_string_bytes_per_char (str)
4519 Return the number of bytes used to encode a Unicode code point in string
4520 @var{str}. The result is one or four.
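
As a debugging sketch, the two representations can be observed as
follows. The results reflect the storage scheme described above and may
change along with it:

@example
;; All code points fit in one byte.
(string-bytes-per-char "abc")
@result{} 1

;; U+03BB is greater than 255, so the wide representation is used.
(string-bytes-per-char (string #\x3bb))
@result{} 4
@end example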
4521 @end deffn
4522
4523 @deffn {Scheme Procedure} %string-dump str
4524 @deffnx {C Function} scm_sys_string_dump (str)
4525 Returns an association list containing debugging information for
4526 @var{str}. The association list has the following entries.
4527 @table @code
4528
4529 @item string
4530 The string itself.
4531
4532 @item start
4533 The start index of the string into its stringbuf
4534
4535 @item length
4536 The length of the string
4537
4538 @item shared
4539 If this string is a substring, it returns its
4540 parent string. Otherwise, it returns @code{#f}
4541
4542 @item read-only
4543 @code{#t} if the string is read-only
4544
4545 @item stringbuf-chars
4546 A new string containing this string's stringbuf's characters
4547
4548 @item stringbuf-length
4549 The number of characters in this stringbuf
4550
4551 @item stringbuf-shared
4552 @code{#t} if this stringbuf is shared
4553
4554 @item stringbuf-wide
4555 @code{#t} if this stringbuf's characters are stored in a 32-bit buffer,
4556 or @code{#f} if they are stored in an 8-bit buffer
4557 @end table
4558 @end deffn
4559
4560
4561 @node Bytevectors
4562 @subsection Bytevectors
4563
4564 @cindex bytevector
4565 @cindex R6RS
4566
4567 A @dfn{bytevector} is a raw bit string. The @code{(rnrs bytevectors)}
4568 module provides the programming interface specified by the
4569 @uref{http://www.r6rs.org/, Revised^6 Report on the Algorithmic Language
4570 Scheme (R6RS)}. It contains procedures to manipulate bytevectors and
4571 interpret their contents in a number of ways: bytevector contents can be
4572 accessed as signed or unsigned integers of various sizes and endianness,
4573 as IEEE-754 floating point numbers, or as strings. It is a useful tool
4574 to encode and decode binary data.
4575
4576 The R6RS (Section 4.3.4) specifies an external representation for
4577 bytevectors, whereby the octets (integers in the range 0--255) contained
4578 in the bytevector are represented as a list prefixed by @code{#vu8}:
4579
4580 @lisp
4581 #vu8(1 53 204)
4582 @end lisp
4583
4584 denotes a 3-byte bytevector containing the octets 1, 53, and 204. Like
4585 string literals, booleans, etc., bytevectors are ``self-quoting'', i.e.,
4586 they do not need to be quoted:
4587
4588 @lisp
4589 #vu8(1 53 204)
4590 @result{} #vu8(1 53 204)
4591 @end lisp
4592
4593 Bytevectors can be used with the binary input/output primitives of the
4594 R6RS (@pxref{R6RS I/O Ports}).
4595
4596 @menu
4597 * Bytevector Endianness:: Dealing with byte order.
4598 * Bytevector Manipulation:: Creating, copying, manipulating bytevectors.
4599 * Bytevectors as Integers:: Interpreting bytes as integers.
4600 * Bytevectors and Integer Lists:: Converting to/from an integer list.
4601 * Bytevectors as Floats:: Interpreting bytes as real numbers.
4602 * Bytevectors as Strings:: Interpreting bytes as Unicode strings.
4603 * Bytevectors as Arrays:: Guile extension to the bytevector API.
4604 * Bytevectors as Uniform Vectors:: Bytevectors and SRFI-4.
4605 @end menu
4606
4607 @node Bytevector Endianness
4608 @subsubsection Endianness
4609
4610 @cindex endianness
4611 @cindex byte order
4612 @cindex word order
4613
4614 Some of the following procedures take an @var{endianness} parameter.
4615 The @dfn{endianness} is defined as the order of bytes in multi-byte
4616 numbers: numbers encoded in @dfn{big endian} have their most
4617 significant bytes written first, whereas numbers encoded in
4618 @dfn{little endian} have their least significant bytes
4619 first@footnote{Big-endian and little-endian are the most common
4620 ``endiannesses'', but others do exist. For instance, the GNU MP
4621 library allows @dfn{word order} to be specified independently of
4622 @dfn{byte order} (@pxref{Integer Import and Export,,, gmp, The GNU
4623 Multiple Precision Arithmetic Library Manual}).}.
4624
4625 Little-endian is the native endianness of the IA32 architecture and
4626 its derivatives, while big-endian is native to SPARC and PowerPC,
4627 among others. The @code{native-endianness} procedure returns the
4628 native endianness of the machine it runs on.
4629
4630 @deffn {Scheme Procedure} native-endianness
4631 @deffnx {C Function} scm_native_endianness ()
4632 Return a value denoting the native endianness of the host machine.
4633 @end deffn
4634
4635 @deffn {Scheme Macro} endianness symbol
4636 Return an object denoting the endianness specified by @var{symbol}. If
4637 @var{symbol} is neither @code{big} nor @code{little} then an error is
4638 raised at expand-time.
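
As a small sketch, endianness objects can be compared with @code{eq?}.
The second result depends on the host machine:

@example
(use-modules (rnrs bytevectors))

(eq? (endianness big) (endianness little))
@result{} #f

;; #t on little-endian hosts such as IA32, #f on big-endian ones.
(eq? (native-endianness) (endianness little))
@end example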
4639 @end deffn
4640
4641 @defvr {C Variable} scm_endianness_big
4642 @defvrx {C Variable} scm_endianness_little
4643 The objects denoting big- and little-endianness, respectively.
4644 @end defvr
4645
4646
4647 @node Bytevector Manipulation
4648 @subsubsection Manipulating Bytevectors
4649
4650 Bytevectors can be created, copied, and analyzed with the following
4651 procedures and C functions.
4652
4653 @deffn {Scheme Procedure} make-bytevector len [fill]
4654 @deffnx {C Function} scm_make_bytevector (len, fill)
4655 @deffnx {C Function} scm_c_make_bytevector (size_t len)
4656 Return a new bytevector of @var{len} bytes. Optionally, if @var{fill}
4657 is given, fill it with @var{fill}; @var{fill} must be in the range
4658 [-128,255].
4659 @end deffn
4660
4661 @deffn {Scheme Procedure} bytevector? obj
4662 @deffnx {C Function} scm_bytevector_p (obj)
4663 Return true if @var{obj} is a bytevector.
4664 @end deffn
4665
4666 @deftypefn {C Function} int scm_is_bytevector (SCM obj)
4667 Equivalent to @code{scm_is_true (scm_bytevector_p (obj))}.
4668 @end deftypefn
4669
4670 @deffn {Scheme Procedure} bytevector-length bv
4671 @deffnx {C Function} scm_bytevector_length (bv)
4672 Return the length in bytes of bytevector @var{bv}.
4673 @end deffn
4674
4675 @deftypefn {C Function} size_t scm_c_bytevector_length (SCM bv)
4676 Likewise, return the length in bytes of bytevector @var{bv}.
4677 @end deftypefn
4678
4679 @deffn {Scheme Procedure} bytevector=? bv1 bv2
4680 @deffnx {C Function} scm_bytevector_eq_p (bv1, bv2)
4681 Return true if @var{bv1} equals @var{bv2}, that is, if they have the same
4682 length and contents.
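
A short sketch of creating, measuring and comparing bytevectors, with
arbitrary sizes and fill values:

@example
(use-modules (rnrs bytevectors))

(make-bytevector 3 7)
@result{} #vu8(7 7 7)

(bytevector-length (make-bytevector 16))
@result{} 16

(bytevector=? (make-bytevector 2 1) #vu8(1 1))
@result{} #t
@end example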
4683 @end deffn
4684
4685 @deffn {Scheme Procedure} bytevector-fill! bv fill
4686 @deffnx {C Function} scm_bytevector_fill_x (bv, fill)
4687 Fill bytevector @var{bv} with @var{fill}, a byte.
4688 @end deffn
4689
4690 @deffn {Scheme Procedure} bytevector-copy! source source-start target target-start len
4691 @deffnx {C Function} scm_bytevector_copy_x (source, source_start, target, target_start, len)
4692 Copy @var{len} bytes from @var{source} into @var{target}, reading
4693 from @var{source-start} (a non-negative index within @var{source})
4694 and writing starting at @var{target-start}. It is permitted for the
4695 @var{source} and @var{target} regions to overlap.
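
For instance, this sketch copies within a single bytevector, so the
source and target regions overlap:

@example
(use-modules (rnrs bytevectors))

(define bv (bytevector-copy #vu8(1 2 3 4 5)))

;; Copy bytes 0..2 over bytes 2..4.
(bytevector-copy! bv 0 bv 2 3)
bv
@result{} #vu8(1 2 1 2 3)
@end example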
4696 @end deffn
4697
4698 @deffn {Scheme Procedure} bytevector-copy bv
4699 @deffnx {C Function} scm_bytevector_copy (bv)
4700 Return a newly allocated copy of @var{bv}.
4701 @end deffn
4702
4703 @deftypefn {C Function} scm_t_uint8 scm_c_bytevector_ref (SCM bv, size_t index)
4704 Return the byte at @var{index} in bytevector @var{bv}.
4705 @end deftypefn
4706
4707 @deftypefn {C Function} void scm_c_bytevector_set_x (SCM bv, size_t index, scm_t_uint8 value)
4708 Set the byte at @var{index} in @var{bv} to @var{value}.
4709 @end deftypefn
4710
4711 Low-level C macros are available. They do not perform any
4712 type-checking; as such they should be used with care.
4713
4714 @deftypefn {C Macro} size_t SCM_BYTEVECTOR_LENGTH (bv)
4715 Return the length in bytes of bytevector @var{bv}.
4716 @end deftypefn
4717
4718 @deftypefn {C Macro} {signed char *} SCM_BYTEVECTOR_CONTENTS (bv)
4719 Return a pointer to the contents of bytevector @var{bv}.
4720 @end deftypefn
4721
4722
4723 @node Bytevectors as Integers
4724 @subsubsection Interpreting Bytevector Contents as Integers
4725
4726 The contents of a bytevector can be interpreted as a sequence of
4727 integers of any given size, sign, and endianness.
4728
4729 @lisp
4730 (let ((bv (make-bytevector 4)))
4731 (bytevector-u8-set! bv 0 #x12)
4732 (bytevector-u8-set! bv 1 #x34)
4733 (bytevector-u8-set! bv 2 #x56)
4734 (bytevector-u8-set! bv 3 #x78)
4735
4736 (map (lambda (number)
4737 (number->string number 16))
4738 (list (bytevector-u8-ref bv 0)
4739 (bytevector-u16-ref bv 0 (endianness big))
4740 (bytevector-u32-ref bv 0 (endianness little)))))
4741
4742 @result{} ("12" "1234" "78563412")
4743 @end lisp
4744
4745 The most generic procedures to interpret bytevector contents as integers
4746 are described below.
4747
4748 @deffn {Scheme Procedure} bytevector-uint-ref bv index endianness size
4749 @deffnx {C Function} scm_bytevector_uint_ref (bv, index, endianness, size)
4750 Return the @var{size}-byte long unsigned integer at index @var{index} in
4751 @var{bv}, decoded according to @var{endianness}.
4752 @end deffn
4753
4754 @deffn {Scheme Procedure} bytevector-sint-ref bv index endianness size
4755 @deffnx {C Function} scm_bytevector_sint_ref (bv, index, endianness, size)
4756 Return the @var{size}-byte long signed integer at index @var{index} in
4757 @var{bv}, decoded according to @var{endianness}.
4758 @end deffn
4759
4760 @deffn {Scheme Procedure} bytevector-uint-set! bv index value endianness size
4761 @deffnx {C Function} scm_bytevector_uint_set_x (bv, index, value, endianness, size)
4762 Set the @var{size}-byte long unsigned integer at @var{index} to
4763 @var{value}, encoded according to @var{endianness}.
4764 @end deffn
4765
4766 @deffn {Scheme Procedure} bytevector-sint-set! bv index value endianness size
4767 @deffnx {C Function} scm_bytevector_sint_set_x (bv, index, value, endianness, size)
4768 Set the @var{size}-byte long signed integer at @var{index} to
4769 @var{value}, encoded according to @var{endianness}.
4770 @end deffn
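
The following sketch, which assumes the @code{(rnrs bytevectors)} module
and uses an illustrative binding @code{bv}, shows how the @var{size}
argument allows integer widths that have no specialized accessor, such
as 3 bytes:

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 3 0))

;; Store #x123456 as a 3-byte big-endian unsigned integer.
(bytevector-uint-set! bv 0 #x123456 (endianness big) 3)
(bytevector->u8-list bv)
@result{} (18 52 86)

;; Reading the same 3 bytes as little-endian reverses the byte order.
(number->string (bytevector-uint-ref bv 0 (endianness little) 3) 16)
@result{} "563412"

;; -2 encoded on 2 bytes in two's complement is #xfffe.
(bytevector-sint-set! bv 0 -2 (endianness big) 2)
(bytevector->u8-list bv)
@result{} (255 254 86)

(bytevector-sint-ref bv 0 (endianness big) 2)
@result{} -2
@end lisp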
4771
4772 The following procedures are similar to the ones above, but specialized
4773 to a given integer size:
4774
4775 @deffn {Scheme Procedure} bytevector-u8-ref bv index
4776 @deffnx {Scheme Procedure} bytevector-s8-ref bv index
4777 @deffnx {Scheme Procedure} bytevector-u16-ref bv index endianness
4778 @deffnx {Scheme Procedure} bytevector-s16-ref bv index endianness
4779 @deffnx {Scheme Procedure} bytevector-u32-ref bv index endianness
4780 @deffnx {Scheme Procedure} bytevector-s32-ref bv index endianness
4781 @deffnx {Scheme Procedure} bytevector-u64-ref bv index endianness
4782 @deffnx {Scheme Procedure} bytevector-s64-ref bv index endianness
4783 @deffnx {C Function} scm_bytevector_u8_ref (bv, index)
4784 @deffnx {C Function} scm_bytevector_s8_ref (bv, index)
4785 @deffnx {C Function} scm_bytevector_u16_ref (bv, index, endianness)
4786 @deffnx {C Function} scm_bytevector_s16_ref (bv, index, endianness)
4787 @deffnx {C Function} scm_bytevector_u32_ref (bv, index, endianness)
4788 @deffnx {C Function} scm_bytevector_s32_ref (bv, index, endianness)
4789 @deffnx {C Function} scm_bytevector_u64_ref (bv, index, endianness)
4790 @deffnx {C Function} scm_bytevector_s64_ref (bv, index, endianness)
4791 Return the unsigned (or, for the @code{s} variants, signed) @var{n}-bit
4792 integer (where @var{n} is 8, 16, 32 or 64) from @var{bv} at @var{index},
4793 decoded according to @var{endianness}.
4794 @end deffn
4795
4796 @deffn {Scheme Procedure} bytevector-u8-set! bv index value
4797 @deffnx {Scheme Procedure} bytevector-s8-set! bv index value
4798 @deffnx {Scheme Procedure} bytevector-u16-set! bv index value endianness
4799 @deffnx {Scheme Procedure} bytevector-s16-set! bv index value endianness
4800 @deffnx {Scheme Procedure} bytevector-u32-set! bv index value endianness
4801 @deffnx {Scheme Procedure} bytevector-s32-set! bv index value endianness
4802 @deffnx {Scheme Procedure} bytevector-u64-set! bv index value endianness
4803 @deffnx {Scheme Procedure} bytevector-s64-set! bv index value endianness
4804 @deffnx {C Function} scm_bytevector_u8_set_x (bv, index, value)
4805 @deffnx {C Function} scm_bytevector_s8_set_x (bv, index, value)
4806 @deffnx {C Function} scm_bytevector_u16_set_x (bv, index, value, endianness)
4807 @deffnx {C Function} scm_bytevector_s16_set_x (bv, index, value, endianness)
4808 @deffnx {C Function} scm_bytevector_u32_set_x (bv, index, value, endianness)
4809 @deffnx {C Function} scm_bytevector_s32_set_x (bv, index, value, endianness)
4810 @deffnx {C Function} scm_bytevector_u64_set_x (bv, index, value, endianness)
4811 @deffnx {C Function} scm_bytevector_s64_set_x (bv, index, value, endianness)
4812 Store @var{value} as an unsigned (or signed) @var{n}-bit integer (where
4813 @var{n} is 8, 16, 32 or 64) in @var{bv} at @var{index}, encoded
4814 according to @var{endianness}.
4815 @end deffn
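
As a small illustration (a sketch assuming the @code{(rnrs bytevectors)}
module; the binding @code{bv} is illustrative only), the same bytes
yield different values depending on whether a signed or an unsigned
accessor is used:

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 2 0))

;; -2 as a 16-bit big-endian two's-complement integer is #xfffe.
(bytevector-s16-set! bv 0 -2 (endianness big))
(bytevector->u8-list bv)
@result{} (255 254)

(bytevector-u16-ref bv 0 (endianness big))
@result{} 65534

(bytevector-s16-ref bv 0 (endianness big))
@result{} -2

;; The 8-bit accessors take no endianness argument.
(bytevector-u8-ref bv 0)
@result{} 255

(bytevector-s8-ref bv 0)
@result{} -1
@end lisp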
4816
4817 Finally, a variant specialized for the host's endianness is available
4818 for each of these functions (with the exception of the @code{u8} and
4819 @code{s8} accessors, since a single byte has no byte order):
4820
4821 @deffn {Scheme Procedure} bytevector-u16-native-ref bv index
4822 @deffnx {Scheme Procedure} bytevector-s16-native-ref bv index
4823 @deffnx {Scheme Procedure} bytevector-u32-native-ref bv index
4824 @deffnx {Scheme Procedure} bytevector-s32-native-ref bv index
4825 @deffnx {Scheme Procedure} bytevector-u64-native-ref bv index
4826 @deffnx {Scheme Procedure} bytevector-s64-native-ref bv index
4827 @deffnx {C Function} scm_bytevector_u16_native_ref (bv, index)
4828 @deffnx {C Function} scm_bytevector_s16_native_ref (bv, index)
4829 @deffnx {C Function} scm_bytevector_u32_native_ref (bv, index)
4830 @deffnx {C Function} scm_bytevector_s32_native_ref (bv, index)
4831 @deffnx {C Function} scm_bytevector_u64_native_ref (bv, index)
4832 @deffnx {C Function} scm_bytevector_s64_native_ref (bv, index)
4833 Return the unsigned (or signed) @var{n}-bit integer (where @var{n} is
4834 16, 32 or 64) from @var{bv} at @var{index}, decoded according to the
4835 host's native endianness.
4836 @end deffn
4837
4838 @deffn {Scheme Procedure} bytevector-u16-native-set! bv index value
4839 @deffnx {Scheme Procedure} bytevector-s16-native-set! bv index value
4840 @deffnx {Scheme Procedure} bytevector-u32-native-set! bv index value
4841 @deffnx {Scheme Procedure} bytevector-s32-native-set! bv index value
4842 @deffnx {Scheme Procedure} bytevector-u64-native-set! bv index value
4843 @deffnx {Scheme Procedure} bytevector-s64-native-set! bv index value
4844 @deffnx {C Function} scm_bytevector_u16_native_set_x (bv, index, value)
4845 @deffnx {C Function} scm_bytevector_s16_native_set_x (bv, index, value)
4846 @deffnx {C Function} scm_bytevector_u32_native_set_x (bv, index, value)
4847 @deffnx {C Function} scm_bytevector_s32_native_set_x (bv, index, value)
4848 @deffnx {C Function} scm_bytevector_u64_native_set_x (bv, index, value)
4849 @deffnx {C Function} scm_bytevector_s64_native_set_x (bv, index, value)
4850 Store @var{value} as an unsigned (or signed) @var{n}-bit integer (where
4851 @var{n} is 16, 32 or 64) in @var{bv} at @var{index}, encoded according
4852 to the host's native endianness.
4853 @end deffn
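
A native accessor should agree with the corresponding generic accessor
when the latter is given the host's byte order. The sketch below assumes
the @code{(rnrs bytevectors)} module, including its
@code{native-endianness} procedure; the binding @code{bv} is
illustrative only:

@lisp
(use-modules (rnrs bytevectors))

(define bv (u8-list->bytevector '(1 2 3 4)))

(= (bytevector-u32-native-ref bv 0)
   (bytevector-u32-ref bv 0 (native-endianness)))
@result{} #t

;; The value itself depends on the host: in hexadecimal it reads
;; "4030201" on a little-endian machine and "1020304" on a
;; big-endian one.
(number->string (bytevector-u32-native-ref bv 0) 16)
@end lisp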
4854
4855
4856 @node Bytevectors and Integer Lists
4857 @subsubsection Converting Bytevectors to/from Integer Lists
4858
4859 Bytevector contents can readily be converted to/from lists of signed or
4860 unsigned integers:
4861
4862 @lisp
4863 (bytevector->sint-list (u8-list->bytevector (make-list 4 255))
4864 (endianness little) 2)
4865 @result{} (-1 -1)
4866 @end lisp
4867
4868 @deffn {Scheme Procedure} bytevector->u8-list bv
4869 @deffnx {C Function} scm_bytevector_to_u8_list (bv)
4870 Return a newly allocated list of unsigned 8-bit integers from the
4871 contents of @var{bv}.
4872 @end deffn
4873
4874 @deffn {Scheme Procedure} u8-list->bytevector lst
4875 @deffnx {C Function} scm_u8_list_to_bytevector (lst)
4876 Return a newly allocated bytevector consisting of the unsigned 8-bit
4877 integers listed in @var{lst}.
4878 @end deffn
4879
4880 @deffn {Scheme Procedure} bytevector->uint-list bv endianness size
4881 @deffnx {C Function} scm_bytevector_to_uint_list (bv, endianness, size)
4882 Return a list of unsigned integers of @var{size} bytes representing the
4883 contents of @var{bv}, decoded according to @var{endianness}.
4884 @end deffn
4885
4886 @deffn {Scheme Procedure} bytevector->sint-list bv endianness size
4887 @deffnx {C Function} scm_bytevector_to_sint_list (bv, endianness, size)
4888 Return a list of signed integers of @var{size} bytes representing the
4889 contents of @var{bv}, decoded according to @var{endianness}.
4890 @end deffn
4891
4892 @deffn {Scheme Procedure} uint-list->bytevector lst endianness size
4893 @deffnx {C Function} scm_uint_list_to_bytevector (lst, endianness, size)
4894 Return a new bytevector containing the unsigned integers listed in
4895 @var{lst} and encoded on @var{size} bytes according to @var{endianness}.
4896 @end deffn
4897
4898 @deffn {Scheme Procedure} sint-list->bytevector lst endianness size
4899 @deffnx {C Function} scm_sint_list_to_bytevector (lst, endianness, size)
4900 Return a new bytevector containing the signed integers listed in
4901 @var{lst} and encoded on @var{size} bytes according to @var{endianness}.
4902 @end deffn
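
The following sketch (assuming the @code{(rnrs bytevectors)} module)
shows how @var{size} and @var{endianness} determine the byte layout in
both directions:

@lisp
(use-modules (rnrs bytevectors))

;; Each integer is encoded on 2 bytes, most significant byte first.
(bytevector->u8-list
 (uint-list->bytevector '(1 2) (endianness big) 2))
@result{} (0 1 0 2)

;; Decoding the same bytes as little-endian gives different integers.
(bytevector->uint-list (u8-list->bytevector '(0 1 0 2))
                       (endianness little) 2)
@result{} (256 512)

;; Negative integers are encoded in two's complement.
(bytevector->u8-list
 (sint-list->bytevector '(-1 1) (endianness little) 2))
@result{} (255 255 1 0)
@end lisp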
4903
4904 @node Bytevectors as Floats
4905 @subsubsection Interpreting Bytevector Contents as Floating Point Numbers
4906
4907 @cindex IEEE-754 floating point numbers
4908
4909 Bytevector contents can also be accessed as IEEE-754 single- or
4910 double-precision floating point numbers (32 and 64 bits long,
4911 respectively) using the procedures described here.
4912
4913 @deffn {Scheme Procedure} bytevector-ieee-single-ref bv index endianness
4914 @deffnx {Scheme Procedure} bytevector-ieee-double-ref bv index endianness
4915 @deffnx {C Function} scm_bytevector_ieee_single_ref (bv, index, endianness)
4916 @deffnx {C Function} scm_bytevector_ieee_double_ref (bv, index, endianness)
4917 Return the IEEE-754 single- or double-precision floating point number
4918 from @var{bv} at @var{index}, decoded according to @var{endianness}.
4919 @end deffn
4920
4921 @deffn {Scheme Procedure} bytevector-ieee-single-set! bv index value endianness
4922 @deffnx {Scheme Procedure} bytevector-ieee-double-set! bv index value endianness
4923 @deffnx {C Function} scm_bytevector_ieee_single_set_x (bv, index, value, endianness)
4924 @deffnx {C Function} scm_bytevector_ieee_double_set_x (bv, index, value, endianness)
4925 Store real number @var{value} in @var{bv} at @var{index} according to
4926 @var{endianness}.
4927 @end deffn
4928
4929 Variants specialized for the host's native endianness are also available:
4930
4931 @deffn {Scheme Procedure} bytevector-ieee-single-native-ref bv index
4932 @deffnx {Scheme Procedure} bytevector-ieee-double-native-ref bv index
4933 @deffnx {C Function} scm_bytevector_ieee_single_native_ref (bv, index)
4934 @deffnx {C Function} scm_bytevector_ieee_double_native_ref (bv, index)
4935 Return the IEEE-754 single- or double-precision floating point number
4936 from @var{bv} at @var{index}, decoded according to the host's native endianness.
4937 @end deffn
4938
4939 @deffn {Scheme Procedure} bytevector-ieee-single-native-set! bv index value
4940 @deffnx {Scheme Procedure} bytevector-ieee-double-native-set! bv index value
4941 @deffnx {C Function} scm_bytevector_ieee_single_native_set_x (bv, index, value)
4942 @deffnx {C Function} scm_bytevector_ieee_double_native_set_x (bv, index, value)
4943 Store real number @var{value} in @var{bv} at @var{index} according to
4944 the host's native endianness.
4945 @end deffn
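
As a short sketch (assuming the @code{(rnrs bytevectors)} module; the
binding @code{bv} is illustrative only), storing @code{1.0} as a
big-endian double produces the IEEE-754 bit pattern
@code{#x3ff0000000000000}:

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 8 0))

(bytevector-ieee-double-set! bv 0 1.0 (endianness big))
(bytevector->u8-list bv)
@result{} (63 240 0 0 0 0 0 0)

(bytevector-ieee-double-ref bv 0 (endianness big))
@result{} 1.0

;; A single-precision number occupies only 4 bytes; 0.5 is exactly
;; representable, so it survives the round trip unchanged.
(bytevector-ieee-single-set! bv 0 0.5 (endianness little))
(bytevector-ieee-single-ref bv 0 (endianness little))
@result{} 0.5
@end lisp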
4946
4947
4948 @node Bytevectors as Strings
4949 @subsubsection Interpreting Bytevector Contents as Unicode Strings
4950
4951 @cindex Unicode string encoding
4952
4953 Bytevector contents can also be interpreted as Unicode strings encoded
4954 in one of the most commonly available encoding formats.
4955 @xref{Representing Strings as Bytes}, for a more generic interface.
4956
4957 @lisp
4958 (utf8->string (u8-list->bytevector '(99 97 102 101)))
4959 @result{} "cafe"
4960
4961 (string->utf8 "caf@'e") ;; SMALL LATIN LETTER E WITH ACUTE ACCENT
4962 @result{} #vu8(99 97 102 195 169)
4963 @end lisp
4964
4965 @deffn {Scheme Procedure} string->utf8 str
4966 @deffnx {Scheme Procedure} string->utf16 str [endianness]
4967 @deffnx {Scheme Procedure} string->utf32 str [endianness]
4968 @deffnx {C Function} scm_string_to_utf8 (str)
4969 @deffnx {C Function} scm_string_to_utf16 (str, endianness)
4970 @deffnx {C Function} scm_string_to_utf32 (str, endianness)
4971 Return a newly allocated bytevector that contains the UTF-8, UTF-16, or
4972 UTF-32 (aka. UCS-4) encoding of @var{str}. For UTF-16 and UTF-32,
4973 @var{endianness} should be the symbol @code{big} or @code{little}; when omitted,
4974 it defaults to big endian.
4975 @end deffn
4976
4977 @deffn {Scheme Procedure} utf8->string utf
4978 @deffnx {Scheme Procedure} utf16->string utf [endianness]
4979 @deffnx {Scheme Procedure} utf32->string utf [endianness]
4980 @deffnx {C Function} scm_utf8_to_string (utf)
4981 @deffnx {C Function} scm_utf16_to_string (utf, endianness)
4982 @deffnx {C Function} scm_utf32_to_string (utf, endianness)
4983 Return a newly allocated string that contains the UTF-8-, UTF-16-, or
4984 UTF-32-decoded contents of bytevector @var{utf}. For UTF-16 and UTF-32,
4985 @var{endianness} should be the symbol @code{big} or @code{little}; when omitted,
4986 it defaults to big endian.
4987 @end deffn
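
The effect of the optional @var{endianness} argument can be sketched as
follows (assuming the @code{(rnrs bytevectors)} module):

@lisp
(use-modules (rnrs bytevectors))

;; Big endian is the default: each 16-bit code unit is stored most
;; significant byte first.
(bytevector->u8-list (string->utf16 "hi"))
@result{} (0 104 0 105)

(bytevector->u8-list (string->utf16 "hi" (endianness little)))
@result{} (104 0 105 0)

;; Decoding must use the same byte order as encoding.
(utf16->string (string->utf16 "hi" (endianness little))
               (endianness little))
@result{} "hi"

;; UTF-32 uses 4 bytes per character.
(bytevector-length (string->utf32 "hi"))
@result{} 8
@end lisp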
4988
4989 @node Bytevectors as Arrays
4990 @subsubsection Accessing Bytevectors with the Array API
4991
4992 As an extension to the R6RS, Guile allows bytevectors to be manipulated
4993 with the @dfn{array} procedures (@pxref{Arrays}). When using these
4994 APIs, bytes are accessed one at a time as 8-bit unsigned integers:
4995
4996 @example
4997 (define bv #vu8(0 1 2 3))
4998
4999 (array? bv)
5000 @result{} #t
5001
5002 (array-rank bv)
5003 @result{} 1
5004
5005 (array-ref bv 2)
5006 @result{} 2
5007
5008 ;; Note the different argument order on array-set!.
5009 (array-set! bv 77 2)
5010 (array-ref bv 2)
5011 @result{} 77
5012
5013 (array-type bv)
5014 @result{} vu8
5015 @end example
5016
5017
5018 @node Bytevectors as Uniform Vectors
5019 @subsubsection Accessing Bytevectors with the SRFI-4 API
5020
5021 Bytevectors may also be accessed with the SRFI-4 API. @xref{SRFI-4 and
5022 Bytevectors}, for more information.
5023
5024
5025 @node Symbols
5026 @subsection Symbols
5027 @tpindex Symbols
5028
5029 Symbols in Scheme are widely used in three ways: as items of discrete
5030 data, as lookup keys for alists and hash tables, and to denote variable
5031 references.
5032
5033 A @dfn{symbol} is similar to a string in that it is defined by a
5034 sequence of characters. The sequence of characters is known as the
5035 symbol's @dfn{name}. In the usual case --- that is, where the symbol's
5036 name doesn't include any characters that could be confused with other
5037 elements of Scheme syntax --- a symbol is written in a Scheme program by
5038 writing the sequence of characters that make up the name, @emph{without}
5039 any quotation marks or other special syntax. For example, the symbol
5040 whose name is ``multiply-by-2'' is written, simply:
5041
5042 @lisp
5043 multiply-by-2
5044 @end lisp
5045
5046 Notice how this differs from a @emph{string} with contents
5047 ``multiply-by-2'', which is written with double quotation marks, like
5048 this:
5049
5050 @lisp
5051 "multiply-by-2"
5052 @end lisp
5053
5054 Looking beyond how they are written, symbols are different from strings
5055 in two important respects.
5056
5057 The first important difference is uniqueness. If the same-looking
5058 string is read twice from two different places in a program, the result
5059 is two @emph{different} string objects whose contents just happen to be
5060 the same. If, on the other hand, the same-looking symbol is read twice
5061 from two different places in a program, the result is the @emph{same}
5062 symbol object both times.
5063
5064 Given two read symbols, you can use @code{eq?} to test whether they are
5065 the same (that is, have the same name). @code{eq?} is the most
5066 efficient comparison operator in Scheme, and comparing two symbols like
5067 this is as fast as comparing, for example, two numbers. Given two
5068 strings, on the other hand, you must use @code{equal?} or
5069 @code{string=?}, which are much slower comparison operators, to
5070 determine whether the strings have the same contents.
5071
5072 @lisp
5073 (define sym1 (quote hello))
5074 (define sym2 (quote hello))
5075 (eq? sym1 sym2) @result{} #t
5076
5077 (define str1 "hello")
5078 (define str2 "hello")
5079 (eq? str1 str2) @result{} #f
5080 (equal? str1 str2) @result{} #t
5081 @end lisp
5082
5083 The second important difference is that symbols, unlike strings, are not
5084 self-evaluating. This is why we need the @code{(quote @dots{})}s in the
5085 example above: @code{(quote hello)} evaluates to the symbol named
5086 ``hello'' itself, whereas an unquoted @code{hello} is @emph{read} as the
5087 symbol named ``hello'' and evaluated as a variable reference @dots{} about
5088 which more below (@pxref{Symbol Variables}).
5089
5090 @menu
5091 * Symbol Data:: Symbols as discrete data.
5092 * Symbol Keys:: Symbols as lookup keys.
5093 * Symbol Variables:: Symbols as denoting variables.
5094 * Symbol Primitives:: Operations related to symbols.
5095 * Symbol Props:: Function slots and property lists.
5096 * Symbol Read Syntax:: Extended read syntax for symbols.
5097 * Symbol Uninterned:: Uninterned symbols.
5098 @end menu
5099
5100
5101 @node Symbol Data
5102 @subsubsection Symbols as Discrete Data
5103
5104 Numbers and symbols are similar to the extent that they both lend
5105 themselves to @code{eq?} comparison. But symbols are more descriptive
5106 than numbers, because a symbol's name can be used directly to describe
5107 the concept for which that symbol stands.
5108
5109 For example, imagine that you need to represent some colours in a
5110 computer program. Using numbers, you would have to choose arbitrarily
5111 some mapping between numbers and colours, and then take care to use that
5112 mapping consistently:
5113
5114 @lisp
5115 ;; 1=red, 2=green, 3=purple
5116
5117 (if (eq? (colour-of car) 1)
5118 ...)
5119 @end lisp
5120
5121 @noindent
5122 You can make the mapping more explicit and the code more readable by
5123 defining constants:
5124
5125 @lisp
5126 (define red 1)
5127 (define green 2)
5128 (define purple 3)
5129
5130 (if (eq? (colour-of car) red)
5131 ...)
5132 @end lisp
5133
5134 @noindent
5135 But the simplest and clearest approach is not to use numbers at all, but
5136 symbols whose names specify the colours that they refer to:
5137
5138 @lisp
5139 (if (eq? (colour-of car) 'red)
5140 ...)
5141 @end lisp
5142
5143 The descriptive advantages of symbols over numbers increase as the set
5144 of concepts that you want to describe grows. Suppose that a car object
5145 can have other properties as well, such as whether it has or uses:
5146
5147 @itemize @bullet
5148 @item
5149 automatic or manual transmission
5150 @item
5151 leaded or unleaded fuel
5152 @item
5153 power steering (or not).
5154 @end itemize
5155
5156 @noindent
5157 Then a car's combined property set could be naturally represented and
5158 manipulated as a list of symbols:
5159
5160 @lisp
5161 (properties-of car1)
5162 @result{}
5163 (red manual unleaded power-steering)
5164
5165 (if (memq 'power-steering (properties-of car1))
5166 (display "Unfit people can drive this car.\n")
5167 (display "You'll need strong arms to drive this car!\n"))
5168 @print{}
5169 Unfit people can drive this car.
5170 @end lisp
5171
5172 Remember, the fundamental property of symbols that we are relying on
5173 here is that an occurrence of @code{'red} in one part of a program is an
5174 @emph{indistinguishable} symbol from an occurrence of @code{'red} in
5175 another part of a program; this means that symbols can usefully be
5176 compared using @code{eq?}. At the same time, symbols have naturally
5177 descriptive names. This combination of efficiency and descriptive power
5178 makes them ideal for use as discrete data.
5179
5180
5181 @node Symbol Keys
5182 @subsubsection Symbols as Lookup Keys
5183
5184 Given their efficiency and descriptive power, it is natural to use
5185 symbols as the keys in an association list or hash table.
5186
5187 To illustrate this, consider a more structured representation of the car
5188 properties example from the preceding subsection. Rather than
5189 mixing all the properties up together in a flat list, we could use an
5190 association list like this:
5191
5192 @lisp
5193 (define car1-properties '((colour . red)
5194 (transmission . manual)
5195 (fuel . unleaded)
5196 (steering . power-assisted)))
5197 @end lisp
5198
5199 Notice how this structure is more explicit and extensible than the flat
5200 list. For example, it makes clear that @code{manual} refers to the
5201 transmission rather than, say, the windows or the locking of the car.
5202 It also allows further properties to use the same symbols among their
5203 possible values without becoming ambiguous:
5204
5205 @lisp
5206 (define car1-properties '((colour . red)
5207 (transmission . manual)
5208 (fuel . unleaded)
5209 (steering . power-assisted)
5210 (seat-colour . red)
5211 (locking . manual)))
5212 @end lisp
5213
5214 With a representation like this, it is easy to use the efficient
5215 @code{assq-XXX} family of procedures (@pxref{Association Lists}) to
5216 extract or change individual pieces of information:
5217
5218 @lisp
5219 (assq-ref car1-properties 'fuel) @result{} unleaded
5220 (assq-ref car1-properties 'transmission) @result{} manual
5221
5222 (assq-set! car1-properties 'seat-colour 'black)
5223 @result{}
5224 ((colour . red)
5225 (transmission . manual)
5226 (fuel . unleaded)
5227 (steering . power-assisted)
5228 (seat-colour . black)
5229 (locking . manual))
5230 @end lisp
5231
5232 Hash tables also have keys, and exactly the same arguments apply to the
5233 use of symbols in hash tables as in association lists. The hash value
5234 that Guile uses to decide where to add a symbol-keyed entry to a hash
5235 table can be obtained by calling the @code{symbol-hash} procedure:
5236
5237 @deffn {Scheme Procedure} symbol-hash symbol
5238 @deffnx {C Function} scm_symbol_hash (symbol)
5239 Return a hash value for @var{symbol}.
5240 @end deffn
5241
5242 @xref{Hash Tables}, for information about hash tables in general, and
5243 for why you might choose to use a hash table rather than an association
5244 list.
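
As a minimal sketch of the hash table equivalent of the association list
above (the name @code{car1-table} is illustrative only), the
@code{hashq} family of procedures compares keys with @code{eq?}, which
suits symbol keys:

@lisp
(define car1-table (make-hash-table))

(hashq-set! car1-table 'colour 'red)
(hashq-set! car1-table 'transmission 'manual)

(hashq-ref car1-table 'colour)
@result{} red

;; The optional third argument is returned when the key is absent.
(hashq-ref car1-table 'sunroof 'unknown)
@result{} unknown
@end lisp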
5245
5246
5247 @node Symbol Variables
5248 @subsubsection Symbols as Denoting Variables
5249
5250 When an unquoted symbol in a Scheme program is evaluated, it is
5251 interpreted as a variable reference, and the result of the evaluation is
5252 the appropriate variable's value.
5253
5254 For example, when the expression @code{(string-length "abcd")} is read
5255 and evaluated, the sequence of characters @code{string-length} is read
5256 as the symbol whose name is ``string-length''. This symbol is associated
5257 with a variable whose value is the procedure that implements string
5258 length calculation. Therefore evaluation of the @code{string-length}
5259 symbol results in that procedure.
5260
5261 The details of the connection between an unquoted symbol and the
5262 variable to which it refers are explained elsewhere. See @ref{Binding
5263 Constructs}, for how associations between symbols and variables are
5264 created, and @ref{Modules}, for how those associations are affected by
5265 Guile's module system.
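
The difference between a quoted and an unquoted symbol can be sketched
as follows (the variable @code{answer} is illustrative only):

@lisp
(define answer 42)

;; Unquoted: evaluated as a reference to the variable.
answer
@result{} 42

;; Quoted: evaluates to the symbol itself.
'answer
@result{} answer

;; The unquoted symbol string-length refers to a procedure, while
;; the quoted form is just a symbol.
(procedure? string-length)
@result{} #t

(symbol? 'string-length)
@result{} #t
@end lisp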
5266
5267
5268 @node Symbol Primitives
5269 @subsubsection Operations Related to Symbols
5270
5271 Given any Scheme value, you can determine whether it is a symbol using
5272 the @code{symbol?} primitive:
5273
5274 @rnindex symbol?
5275 @deffn {Scheme Procedure} symbol? obj
5276 @deffnx {C Function} scm_symbol_p (obj)
5277 Return @code{#t} if @var{obj} is a symbol, otherwise return
5278 @code{#f}.
5279 @end deffn
5280
5281 @deftypefn {C Function} int scm_is_symbol (SCM val)
5282 Equivalent to @code{scm_is_true (scm_symbol_p (val))}.
5283 @end deftypefn
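
For instance:

@lisp
(symbol? 'foo)          @result{} #t
(symbol? (car '(a b)))  @result{} #t
(symbol? "bar")         @result{} #f
(symbol? '())           @result{} #f
(symbol? #f)            @result{} #f
@end lisp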
5284
5285 Once you know that you have a symbol, you can obtain its name as a
5286 string by calling @code{symbol->string}. Note that Guile differs by
5287 default from R5RS on the details of @code{symbol->string} as regards
5288 case-sensitivity:
5289
5290 @rnindex symbol->string
5291 @deffn {Scheme Procedure} symbol->string s
5292 @deffnx {C Function} scm_symbol_to_string (s)
5293 Return the name of symbol @var{s} as a string. By default, Guile reads
5294 symbols case-sensitively, so the string returned will have the same case
5295 variation as the sequence of characters that caused @var{s} to be
5296 created.
5297
5298 If Guile is set to read symbols case-insensitively (as specified by
5299 R5RS), and @var{s} comes into being as part of a literal expression
5300 (@pxref{Literal expressions,,,r5rs, The Revised^5 Report on Scheme}) or
5301 by a call to the @code{read} or @code{string-ci->symbol} procedures,
5302 Guile converts any alphabetic characters in the symbol's name to
5303 lower case before creating the symbol object, so the string returned
5304 here will be in lower case.
5305
5306 If @var{s} was created by @code{string->symbol}, the case of characters
5307 in the string returned will be the same as that in the string that was
5308 passed to @code{string->symbol}, regardless of Guile's case-sensitivity
5309 setting at the time @var{s} was created.
5310
5311 It is an error to apply mutation procedures like @code{string-set!} to
5312 strings returned by this procedure.
5313 @end deffn
5314
5315 Most symbols are created by writing them literally in code. However, it
5316 is also possible to create symbols programmatically using the following
5317 procedures:
5318
5319 @deffn {Scheme Procedure} symbol char@dots{}
5320 @rnindex symbol
5321 Return a newly allocated symbol made from the given character arguments.
5322
5323 @example
5324 (symbol #\x #\y #\z) @result{} xyz
5325 @end example
5326 @end deffn
5327
5328 @deffn {Scheme Procedure} list->symbol lst
5329 @rnindex list->symbol
5330 Return a newly allocated symbol made from a list of characters.
5331
5332 @example
5333 (list->symbol '(#\a #\b #\c)) @result{} abc
5334 @end example
5335 @end deffn
5336
5337 @rnindex symbol-append
5338 @deffn {Scheme Procedure} symbol-append arg @dots{}
5339 Return a newly allocated symbol whose characters form the
5340 concatenation of the given symbols, @var{arg} @enddots{}.
5341
5342 @example
5343 (let ((h 'hello))
5344 (symbol-append h 'world))
5345 @result{} helloworld
5346 @end example
5347 @end deffn
5348
5349 @rnindex string->symbol
5350 @deffn {Scheme Procedure} string->symbol string
5351 @deffnx {C Function} scm_string_to_symbol (string)
5352 Return the symbol whose name is @var{string}. This procedure can create
5353 symbols with names containing special characters or letters in the
5354 non-standard case, but it is usually a bad idea to create such symbols
5355 because in some implementations of Scheme they cannot be read as
5356 themselves.
5357 @end deffn
5358
5359 @deffn {Scheme Procedure} string-ci->symbol str
5360 @deffnx {C Function} scm_string_ci_to_symbol (str)
5361 Return the symbol whose name is @var{str}. If Guile is currently
5362 reading symbols case-insensitively, @var{str} is converted to lowercase
5363 before the returned symbol is looked up or created.
5364 @end deffn
5365
5366 The following examples illustrate Guile's detailed behaviour as regards
5367 the case-sensitivity of symbols:
5368
5369 @lisp
5370 (read-enable 'case-insensitive) ; R5RS compliant behaviour
5371
5372 (symbol->string 'flying-fish) @result{} "flying-fish"
5373 (symbol->string 'Martin) @result{} "martin"
5374 (symbol->string
5375 (string->symbol "Malvina")) @result{} "Malvina"
5376
5377 (eq? 'mISSISSIppi 'mississippi) @result{} #t
5378 (string->symbol "mISSISSIppi") @result{} mISSISSIppi
5379 (eq? 'bitBlt (string->symbol "bitBlt")) @result{} #f
5380 (eq? 'LolliPop
5381 (string->symbol (symbol->string 'LolliPop))) @result{} #t
5382 (string=? "K. Harper, M.D."
5383 (symbol->string
5384 (string->symbol "K. Harper, M.D."))) @result{} #t
5385
5386 (read-disable 'case-insensitive) ; Guile default behaviour
5387
5388 (symbol->string 'flying-fish) @result{} "flying-fish"
5389 (symbol->string 'Martin) @result{} "Martin"
5390 (symbol->string
5391 (string->symbol "Malvina")) @result{} "Malvina"
5392
5393 (eq? 'mISSISSIppi 'mississippi) @result{} #f
5394 (string->symbol "mISSISSIppi") @result{} mISSISSIppi
5395 (eq? 'bitBlt (string->symbol "bitBlt")) @result{} #t
5396 (eq? 'LolliPop
5397 (string->symbol (symbol->string 'LolliPop))) @result{} #t
5398 (string=? "K. Harper, M.D."
5399 (symbol->string
5400 (string->symbol "K. Harper, M.D."))) @result{} #t
5401 @end lisp
5402
5403 From C, there are lower level functions that construct a Scheme symbol
5404 from a C string in the current locale encoding.
5405
5406 When you want to do more from C, you should convert between symbols
5407 and strings using @code{scm_symbol_to_string} and
5408 @code{scm_string_to_symbol} and work with the strings.
5409
5410 @deftypefn {C Function} SCM scm_from_latin1_symbol (const char *name)
5411 @deftypefnx {C Function} SCM scm_from_utf8_symbol (const char *name)
5412 Construct and return a Scheme symbol whose name is specified by the
5413 null-terminated C string @var{name}. These are appropriate when
5414 the C string is hard-coded in the source code.
5415 @end deftypefn
5416
5417 @deftypefn {C Function} SCM scm_from_locale_symbol (const char *name)
5418 @deftypefnx {C Function} SCM scm_from_locale_symboln (const char *name, size_t len)
5419 Construct and return a Scheme symbol whose name is specified by
5420 @var{name}. For @code{scm_from_locale_symbol}, @var{name} must be null
5421 terminated; for @code{scm_from_locale_symboln} the length of @var{name} is
5422 specified explicitly by @var{len}.
5423
5424 Note that these functions should @emph{not} be used when @var{name} is a
5425 C string constant, because there is no guarantee that the current locale
5426 will match that of the execution character set, used for string and
5427 character constants. Most modern C compilers use UTF-8 by default, so
5428 in such cases we recommend @code{scm_from_utf8_symbol}.
5429 @end deftypefn
5430
5431 @deftypefn {C Function} SCM scm_take_locale_symbol (char *str)
5432 @deftypefnx {C Function} SCM scm_take_locale_symboln (char *str, size_t len)
5433 Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln},
5434 respectively, but also frees @var{str} with @code{free} eventually.
5435 Thus, you can use these functions when you would free @var{str} anyway
5436 immediately after creating the Scheme symbol. In certain cases, Guile
5437 can then use @var{str} directly as its internal representation.
5438 @end deftypefn
5439
5440 The size of a symbol can also be obtained from C:
5441
5442 @deftypefn {C Function} size_t scm_c_symbol_length (SCM sym)
5443 Return the number of characters in @var{sym}.
5444 @end deftypefn
5445
5446 Finally, some applications, especially those that generate new Scheme
5447 code dynamically, need to generate symbols for use in the generated
5448 code. The @code{gensym} primitive meets this need:
5449
5450 @deffn {Scheme Procedure} gensym [prefix]
5451 @deffnx {C Function} scm_gensym (prefix)
5452 Create a new symbol with a name constructed from a prefix and a counter
5453 value. The string @var{prefix} can be specified as an optional
5454 argument. The default prefix is @samp{@w{ g}}. The counter is increased
5455 by 1 at each call. There is no provision for resetting the counter.
5456 @end deffn
5457
5458 The symbols generated by @code{gensym} are @emph{likely} to be unique,
5459 since their names begin with a space and it is only otherwise possible
5460 to generate such symbols if a programmer goes out of their way to do
5461 so. Uniqueness can be guaranteed by instead using uninterned symbols
5462 (@pxref{Symbol Uninterned}), though they can't be usefully written out
5463 and read back in.
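
A short sketch of @code{gensym} in use follows (the binding @code{tmp}
is illustrative only). The exact names generated depend on the current
counter value, so only properties that hold for any counter value are
shown:

@lisp
(define tmp (gensym "tmp"))

(symbol? tmp)
@result{} #t

;; The generated name starts with the requested prefix.
(string-prefix? "tmp" (symbol->string tmp))
@result{} #t

;; Each call increments the counter, so successive calls return
;; distinct symbols.
(eq? (gensym) (gensym))
@result{} #f
@end lisp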
5464
5465
5466 @node Symbol Props
5467 @subsubsection Function Slots and Property Lists
5468
5469 In traditional Lisp dialects, symbols are often understood as having
5470 three kinds of value at once:
5471
5472 @itemize @bullet
5473 @item
5474 a @dfn{variable} value, which is used when the symbol appears in
5475 code in a variable reference context
5476
5477 @item
5478 a @dfn{function} value, which is used when the symbol appears in
5479 code in a function name position (i.e.@: as the first element in an
5480 unquoted list)
5481
5482 @item
5483 a @dfn{property list} value, which is used when the symbol is given as
5484 the first argument to Lisp's @code{put} or @code{get} functions.
5485 @end itemize
5486
5487 Although Scheme (as one of its simplifications with respect to Lisp)
5488 does away with the distinction between variable and function namespaces,
5489 Guile currently retains some elements of the traditional structure in
5490 case they turn out to be useful when implementing translators for other
5491 languages, in particular Emacs Lisp.
5492
5493 Specifically, Guile symbols have two extra slots, one for a symbol's
5494 property list, and one for its ``function value.'' The following procedures
5495 are provided to access these slots.
5496
5497 @deffn {Scheme Procedure} symbol-fref symbol
5498 @deffnx {C Function} scm_symbol_fref (symbol)
5499 Return the contents of @var{symbol}'s @dfn{function slot}.
5500 @end deffn
5501
5502 @deffn {Scheme Procedure} symbol-fset! symbol value
5503 @deffnx {C Function} scm_symbol_fset_x (symbol, value)
5504 Set the contents of @var{symbol}'s function slot to @var{value}.
5505 @end deffn
5506
5507 @deffn {Scheme Procedure} symbol-pref symbol
5508 @deffnx {C Function} scm_symbol_pref (symbol)
5509 Return the @dfn{property list} currently associated with @var{symbol}.
5510 @end deffn
5511
5512 @deffn {Scheme Procedure} symbol-pset! symbol value
5513 @deffnx {C Function} scm_symbol_pset_x (symbol, value)
5514 Set @var{symbol}'s property list to @var{value}.
5515 @end deffn
5516
5517 @deffn {Scheme Procedure} symbol-property sym prop
5518 From @var{sym}'s property list, return the value for property
5519 @var{prop}. The assumption is that @var{sym}'s property list is an
5520 association list whose keys are distinguished from each other using
5521 @code{equal?}; @var{prop} should be one of the keys in that list. If
5522 the property list has no entry for @var{prop}, @code{symbol-property}
5523 returns @code{#f}.
5524 @end deffn
5525
5526 @deffn {Scheme Procedure} set-symbol-property! sym prop val
5527 In @var{sym}'s property list, set the value for property @var{prop} to
5528 @var{val}, or add a new entry for @var{prop}, with value @var{val}, if
5529 none already exists. For the structure of the property list, see
5530 @code{symbol-property}.
5531 @end deffn
5532
5533 @deffn {Scheme Procedure} symbol-property-remove! sym prop
5534 From @var{sym}'s property list, remove the entry for property
5535 @var{prop}, if there is one. For the structure of the property list,
5536 see @code{symbol-property}.
5537 @end deffn
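
The following sketch shows how the three property procedures fit
together. The symbol @code{apple} and the property names @code{color}
and @code{weight} are arbitrary and used only for illustration.

@lisp
(set-symbol-property! 'apple 'color 'red)

(symbol-property 'apple 'color)
@result{} red

(symbol-property 'apple 'weight)
@result{} #f
; No entry exists for weight, so the result is #f.

(set-symbol-property! 'apple 'color 'green)
(symbol-property 'apple 'color)
@result{} green
; Setting an existing property replaces its value.

(symbol-property-remove! 'apple 'color)
(symbol-property 'apple 'color)
@result{} #f
@end lisp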
5538
5539 Support for these extra slots may be removed in a future release, and it
5540 is probably better to avoid using them. For a more modern and Schemely
5541 approach to properties, see @ref{Object Properties}.
5542
5543
5544 @node Symbol Read Syntax
5545 @subsubsection Extended Read Syntax for Symbols
5546
5547 @cindex r7rs-symbols
5548
5549 The read syntax for a symbol is a sequence of letters, digits, and
5550 @dfn{extended alphabetic characters}, beginning with a character that
5551 cannot begin a number. In addition, the special cases of @code{+},
5552 @code{-}, and @code{...} are read as symbols even though numbers can
5553 begin with @code{+}, @code{-} or @code{.}.
5554
5555 Extended alphabetic characters may be used within identifiers as if
5556 they were letters. The set of extended alphabetic characters is:
5557
5558 @example
5559 ! $ % & * + - . / : < = > ? @@ ^ _ ~
5560 @end example
5561
5562 In addition to the standard read syntax defined above (which is taken
5563 from R5RS (@pxref{Formal syntax,,,r5rs,The Revised^5 Report on
5564 Scheme})), Guile provides an extended symbol read syntax that allows the
5565 inclusion of unusual characters such as space characters, newlines and
5566 parentheses. If (for whatever reason) you need to write a symbol
5567 containing characters not mentioned above, you can do so as follows.
5568
5569 @itemize @bullet
5570 @item
5571 Begin the symbol with the characters @code{#@{},
5572
5573 @item
5574 write the characters of the symbol and
5575
5576 @item
5577 finish the symbol with the characters @code{@}#}.
5578 @end itemize
5579
5580 Here are a few examples of this form of read syntax. The first symbol
5581 needs to use extended syntax because it contains a space character, the
5582 second because it contains a line break, and the last because it looks
5583 like a number.
5584
5585 @lisp
5586 #@{foo bar@}#
5587
5588 #@{what
5589 ever@}#
5590
5591 #@{4242@}#
5592 @end lisp
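
As a small sketch of what this syntax denotes, a symbol written in the
extended form is the same object that @code{string->symbol} returns for
the enclosed characters:

@lisp
(symbol->string '#@{foo bar@}#)
@result{} "foo bar"

(eq? '#@{foo bar@}# (string->symbol "foo bar"))
@result{} #t

(symbol? '#@{4242@}#)
@result{} #t
; Without the extended syntax, 4242 would be read as a number.
@end lisp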
5593
5594 Although Guile provides this extended read syntax for symbols,
5595 widespread usage of it is discouraged because it is not portable and not
5596 very readable.
5597
Alternatively, if you enable the @code{r7rs-symbols} read option
(@pxref{Scheme Read}), you can write arbitrary symbols using the same
5600 notation used for strings, except delimited by vertical bars instead of
5601 double quotes.
5602
5603 @example
5604 |foo bar|
5605 |\x3BB; is a greek lambda|
5606 |\| is a vertical bar|
5607 @end example
5608
5609 Note that there's also an @code{r7rs-symbols} print option
5610 (@pxref{Scheme Write}). To enable the use of this notation, evaluate
5611 one or both of the following expressions:
5612
5613 @example
5614 (read-enable 'r7rs-symbols)
5615 (print-enable 'r7rs-symbols)
5616 @end example
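
As a rough illustration of the read option, the sketch below reads a
bar-delimited symbol from a string port at run time, so that the option
is already in effect when the reader sees the text. It assumes a Guile
version that provides the @code{r7rs-symbols} read option.

@lisp
(read-enable 'r7rs-symbols)

(symbol->string (call-with-input-string "|foo bar|" read))
@result{} "foo bar"
@end lisp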
5617
5618
5619 @node Symbol Uninterned
5620 @subsubsection Uninterned Symbols
5621
5622 What makes symbols useful is that they are automatically kept unique.
5623 There are no two symbols that are distinct objects but have the same
5624 name. But of course, there is no rule without exception. In addition
5625 to the normal symbols that have been discussed up to now, you can also
5626 create special @dfn{uninterned} symbols that behave slightly
5627 differently.
5628
5629 To understand what is different about them and why they might be useful,
5630 we look at how normal symbols are actually kept unique.
5631
5632 Whenever Guile wants to find the symbol with a specific name, for
5633 example during @code{read} or when executing @code{string->symbol}, it
5634 first looks into a table of all existing symbols to find out whether a
5635 symbol with the given name already exists. When this is the case, Guile
5636 just returns that symbol. When not, a new symbol with the name is
5637 created and entered into the table so that it can be found later.
5638
Sometimes you might want to create a symbol that is guaranteed ``fresh'',
5640 i.e.@: a symbol that did not exist previously. You might also want to
5641 somehow guarantee that no one else will ever unintentionally stumble
5642 across your symbol in the future. These properties of a symbol are
5643 often needed when generating code during macro expansion. When
5644 introducing new temporary variables, you want to guarantee that they
5645 don't conflict with variables in other people's code.
5646
5647 The simplest way to arrange for this is to create a new symbol but
5648 not enter it into the global table of all symbols. That way, no one
5649 will ever get access to your symbol by chance. Symbols that are not in
5650 the table are called @dfn{uninterned}. Of course, symbols that
5651 @emph{are} in the table are called @dfn{interned}.
5652
5653 You create new uninterned symbols with the function @code{make-symbol}.
5654 You can test whether a symbol is interned or not with
5655 @code{symbol-interned?}.
5656
Uninterned symbols break the rule that the name of a symbol uniquely
identifies the symbol object. Because of this, they cannot be written
out and read back in like interned symbols. Currently, Guile has no
support for reading uninterned symbols. This is also why the function
@code{gensym} returns interned symbols rather than uninterned ones.
5662
5663 @deffn {Scheme Procedure} make-symbol name
5664 @deffnx {C Function} scm_make_symbol (name)
5665 Return a new uninterned symbol with the name @var{name}. The returned
5666 symbol is guaranteed to be unique and future calls to
5667 @code{string->symbol} will not return it.
5668 @end deffn
5669
5670 @deffn {Scheme Procedure} symbol-interned? symbol
5671 @deffnx {C Function} scm_symbol_interned_p (symbol)
5672 Return @code{#t} if @var{symbol} is interned, otherwise return
5673 @code{#f}.
5674 @end deffn
5675
5676 For example:
5677
5678 @lisp
5679 (define foo-1 (string->symbol "foo"))
5680 (define foo-2 (string->symbol "foo"))
5681 (define foo-3 (make-symbol "foo"))
5682 (define foo-4 (make-symbol "foo"))
5683
5684 (eq? foo-1 foo-2)
5685 @result{} #t
5686 ; Two interned symbols with the same name are the same object,
5687
5688 (eq? foo-1 foo-3)
5689 @result{} #f
5690 ; but a call to make-symbol with the same name returns a
5691 ; distinct object.
5692
5693 (eq? foo-3 foo-4)
5694 @result{} #f
5695 ; A call to make-symbol always returns a new object, even for
5696 ; the same name.
5697
5698 foo-3
5699 @result{} #<uninterned-symbol foo 8085290>
5700 ; Uninterned symbols print differently from interned symbols,
5701
5702 (symbol? foo-3)
5703 @result{} #t
5704 ; but they are still symbols,
5705
5706 (symbol-interned? foo-3)
5707 @result{} #f
5708 ; just not interned.
5709 @end lisp
5710
5711
5712 @node Keywords
5713 @subsection Keywords
5714 @tpindex Keywords
5715
5716 Keywords are self-evaluating objects with a convenient read syntax that
5717 makes them easy to type.
5718
5719 Guile's keyword support conforms to R5RS, and adds a (switchable) read
5720 syntax extension to permit keywords to begin with @code{:} as well as
5721 @code{#:}, or to end with @code{:}.
5722
5723 @menu
5724 * Why Use Keywords?:: Motivation for keyword usage.
5725 * Coding With Keywords:: How to use keywords.
5726 * Keyword Read Syntax:: Read syntax for keywords.
5727 * Keyword Procedures:: Procedures for dealing with keywords.
5728 @end menu
5729
5730 @node Why Use Keywords?
5731 @subsubsection Why Use Keywords?
5732
5733 Keywords are useful in contexts where a program or procedure wants to be
5734 able to accept a large number of optional arguments without making its
5735 interface unmanageable.
5736
5737 To illustrate this, consider a hypothetical @code{make-window}
5738 procedure, which creates a new window on the screen for drawing into
5739 using some graphical toolkit. There are many parameters that the caller
5740 might like to specify, but which could also be sensibly defaulted, for
5741 example:
5742
5743 @itemize @bullet
5744 @item
5745 color depth -- Default: the color depth for the screen
5746
5747 @item
5748 background color -- Default: white
5749
5750 @item
5751 width -- Default: 600
5752
5753 @item
5754 height -- Default: 400
5755 @end itemize
5756
5757 If @code{make-window} did not use keywords, the caller would have to
5758 pass in a value for each possible argument, remembering the correct
5759 argument order and using a special value to indicate the default value
5760 for that argument:
5761
5762 @lisp
(make-window 'default              ;; Color depth
             'default              ;; Background color
             800                   ;; Width
             100                   ;; Height
             @dots{})              ;; More make-window arguments
5768 @end lisp
5769
5770 With keywords, on the other hand, defaulted arguments are omitted, and
5771 non-default arguments are clearly tagged by the appropriate keyword. As
5772 a result, the invocation becomes much clearer:
5773
5774 @lisp
5775 (make-window #:width 800 #:height 100)
5776 @end lisp
5777
5778 On the other hand, for a simpler procedure with few arguments, the use
5779 of keywords would be a hindrance rather than a help. The primitive
5780 procedure @code{cons}, for example, would not be improved if it had to
5781 be invoked as
5782
5783 @lisp
5784 (cons #:car x #:cdr y)
5785 @end lisp
5786
So the decision whether to use keywords or not is purely pragmatic: use
them if they will clarify the procedure invocation at the point of call.
5789
5790 @node Coding With Keywords
5791 @subsubsection Coding With Keywords
5792
5793 If a procedure wants to support keywords, it should take a rest argument
5794 and then use whatever means is convenient to extract keywords and their
5795 corresponding arguments from the contents of that rest argument.
5796
5797 The following example illustrates the principle: the code for
5798 @code{make-window} uses a helper procedure called
5799 @code{get-keyword-value} to extract individual keyword arguments from
5800 the rest argument.
5801
5802 @lisp
(define (get-keyword-value args keyword default)
  (let ((kv (memq keyword args)))
    (if (and kv (>= (length kv) 2))
        (cadr kv)
        default)))

(define (make-window . args)
  (let ((depth  (get-keyword-value args #:depth  screen-depth))
        (bg     (get-keyword-value args #:bg     "white"))
        (width  (get-keyword-value args #:width  600))
        (height (get-keyword-value args #:height 400))
        @dots{})
    @dots{}))
5816 @end lisp
5817
5818 But you don't need to write @code{get-keyword-value}. The @code{(ice-9
5819 optargs)} module provides a set of powerful macros that you can use to
5820 implement keyword-supporting procedures like this:
5821
5822 @lisp
5823 (use-modules (ice-9 optargs))
5824
(define (make-window . args)
  (let-keywords args #f ((depth  screen-depth)
                         (bg     "white")
                         (width  600)
                         (height 400))
    ...))
5831 @end lisp
5832
5833 @noindent
5834 Or, even more economically, like this:
5835
5836 @lisp
5837 (use-modules (ice-9 optargs))
5838
(define* (make-window #:key (depth  screen-depth)
                            (bg     "white")
                            (width  600)
                            (height 400))
  ...)
5844 @end lisp
5845
5846 For further details on @code{let-keywords}, @code{define*} and other
5847 facilities provided by the @code{(ice-9 optargs)} module, see
5848 @ref{Optional Arguments}.
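
To see how a keyword-accepting procedure behaves at the call site, here
is a minimal sketch. The procedure @code{window-geometry} is
hypothetical and introduced only for illustration; it simply returns its
two keyword arguments as a list.

@lisp
(use-modules (ice-9 optargs))

(define* (window-geometry #:key (width 600) (height 400))
  (list width height))

(window-geometry)
@result{} (600 400)
; Omitted keywords take their defaults.

(window-geometry #:height 100)
@result{} (600 100)

(window-geometry #:height 100 #:width 800)
@result{} (800 100)
; Keyword arguments may be given in any order.
@end lisp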
5849
5850 To handle keyword arguments from procedures implemented in C,
5851 use @code{scm_c_bind_keyword_arguments} (@pxref{Keyword Procedures}).
5852
5853 @node Keyword Read Syntax
5854 @subsubsection Keyword Read Syntax
5855
5856 Guile, by default, only recognizes a keyword syntax that is compatible
5857 with R5RS. A token of the form @code{#:NAME}, where @code{NAME} has the
5858 same syntax as a Scheme symbol (@pxref{Symbol Read Syntax}), is the
5859 external representation of the keyword named @code{NAME}. Keyword
5860 objects print using this syntax as well, so values containing keyword
5861 objects can be read back into Guile. When used in an expression,
5862 keywords are self-quoting objects.
5863
If the @code{keywords} read option is set to @code{'prefix}, Guile also
5865 recognizes the alternative read syntax @code{:NAME}. Otherwise, tokens
5866 of the form @code{:NAME} are read as symbols, as required by R5RS.
5867
5868 @cindex SRFI-88 keyword syntax
5869
If the @code{keywords} read option is set to @code{'postfix}, Guile
5871 recognizes the SRFI-88 read syntax @code{NAME:} (@pxref{SRFI-88}).
5872 Otherwise, tokens of this form are read as symbols.
5873
To enable and disable the alternative non-R5RS keyword syntax, you use
the @code{read-set!} procedure documented in @ref{Scheme Read}. Note that
the @code{prefix} and @code{postfix} syntaxes are mutually exclusive.
5877
5878 @lisp
5879 (read-set! keywords 'prefix)
5880
5881 #:type
5882 @result{}
5883 #:type
5884
5885 :type
5886 @result{}
5887 #:type
5888
5889 (read-set! keywords 'postfix)
5890
5891 type:
5892 @result{}
5893 #:type
5894
5895 :type
5896 @result{}
5897 :type
5898
5899 (read-set! keywords #f)
5900
5901 #:type
5902 @result{}
5903 #:type
5904
5905 :type
5906 @print{}
5907 ERROR: In expression :type:
5908 ERROR: Unbound variable: :type
5909 ABORT: (unbound-variable)
5910 @end lisp
5911
5912 @node Keyword Procedures
5913 @subsubsection Keyword Procedures
5914
5915 @deffn {Scheme Procedure} keyword? obj
5916 @deffnx {C Function} scm_keyword_p (obj)
5917 Return @code{#t} if the argument @var{obj} is a keyword, else
5918 @code{#f}.
5919 @end deffn
5920
5921 @deffn {Scheme Procedure} keyword->symbol keyword
5922 @deffnx {C Function} scm_keyword_to_symbol (keyword)
5923 Return the symbol with the same name as @var{keyword}.
5924 @end deffn
5925
5926 @deffn {Scheme Procedure} symbol->keyword symbol
5927 @deffnx {C Function} scm_symbol_to_keyword (symbol)
5928 Return the keyword with the same name as @var{symbol}.
5929 @end deffn
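
A short sketch of the three procedures above, using the arbitrary name
@code{width}:

@lisp
(keyword? #:width)
@result{} #t

(keyword? 'width)
@result{} #f
; A symbol is not a keyword, even if the names match.

(keyword->symbol #:width)
@result{} width

(symbol->keyword 'width)
@result{} #:width

(eq? #:width (symbol->keyword 'width))
@result{} #t
; Keywords with the same name are the same object.
@end lisp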
5930
5931 @deftypefn {C Function} int scm_is_keyword (SCM obj)
5932 Equivalent to @code{scm_is_true (scm_keyword_p (@var{obj}))}.
5933 @end deftypefn
5934
5935 @deftypefn {C Function} SCM scm_from_locale_keyword (const char *name)
5936 @deftypefnx {C Function} SCM scm_from_locale_keywordn (const char *name, size_t len)
5937 Equivalent to @code{scm_symbol_to_keyword (scm_from_locale_symbol
5938 (@var{name}))} and @code{scm_symbol_to_keyword (scm_from_locale_symboln
5939 (@var{name}, @var{len}))}, respectively.
5940
5941 Note that these functions should @emph{not} be used when @var{name} is a
5942 C string constant, because there is no guarantee that the current locale
5943 will match that of the execution character set, used for string and
5944 character constants. Most modern C compilers use UTF-8 by default, so
5945 in such cases we recommend @code{scm_from_utf8_keyword}.
5946 @end deftypefn
5947
5948 @deftypefn {C Function} SCM scm_from_latin1_keyword (const char *name)
5949 @deftypefnx {C Function} SCM scm_from_utf8_keyword (const char *name)
5950 Equivalent to @code{scm_symbol_to_keyword (scm_from_latin1_symbol
5951 (@var{name}))} and @code{scm_symbol_to_keyword (scm_from_utf8_symbol
5952 (@var{name}))}, respectively.
5953 @end deftypefn
5954
5955 @deftypefn {C Function} void scm_c_bind_keyword_arguments (const char *subr, @
5956 SCM rest, scm_t_keyword_arguments_flags flags, @
5957 SCM keyword1, SCM *argp1, @
5958 @dots{}, @
5959 SCM keywordN, SCM *argpN, @
5960 @nicode{SCM_UNDEFINED})
5961
5962 Extract the specified keyword arguments from @var{rest}, which is not
5963 modified. If the keyword argument @var{keyword1} is present in
5964 @var{rest} with an associated value, that value is stored in the
5965 variable pointed to by @var{argp1}, otherwise the variable is left
5966 unchanged. Similarly for the other keywords and argument pointers up to
5967 @var{keywordN} and @var{argpN}. The argument list to
5968 @code{scm_c_bind_keyword_arguments} must be terminated by
5969 @code{SCM_UNDEFINED}.
5970
5971 Note that since the variables pointed to by @var{argp1} through
5972 @var{argpN} are left unchanged if the associated keyword argument is not
5973 present, they should be initialized to their default values before
5974 calling @code{scm_c_bind_keyword_arguments}. Alternatively, you can
5975 initialize them to @code{SCM_UNDEFINED} before the call, and then use
5976 @code{SCM_UNBNDP} after the call to see which ones were provided.
5977
5978 If an unrecognized keyword argument is present in @var{rest} and
5979 @var{flags} does not contain @code{SCM_ALLOW_OTHER_KEYS}, or if
5980 non-keyword arguments are present and @var{flags} does not contain
5981 @code{SCM_ALLOW_NON_KEYWORD_ARGUMENTS}, an exception is raised.
5982 @var{subr} should be the name of the procedure receiving the keyword
5983 arguments, for purposes of error reporting.
5984
5985 For example:
5986
5987 @example
SCM k_delimiter;
SCM k_grammar;
SCM sym_infix;

SCM my_string_join (SCM strings, SCM rest)
@{
  SCM delimiter = SCM_UNDEFINED;
  SCM grammar   = sym_infix;

  scm_c_bind_keyword_arguments ("my-string-join", rest, 0,
                                k_delimiter, &delimiter,
                                k_grammar, &grammar,
                                SCM_UNDEFINED);

  if (SCM_UNBNDP (delimiter))
    delimiter = scm_from_utf8_string (" ");

  return scm_string_join (strings, delimiter, grammar);
@}

void my_init ()
@{
  k_delimiter = scm_from_utf8_keyword ("delimiter");
  k_grammar   = scm_from_utf8_keyword ("grammar");
  sym_infix   = scm_from_utf8_symbol ("infix");
  scm_c_define_gsubr ("my-string-join", 1, 0, 1, my_string_join);
@}
6015 @end example
6016 @end deftypefn
6017
6018
6019 @node Other Types
6020 @subsection ``Functionality-Centric'' Data Types
6021
6022 Procedures and macros are documented in their own sections: see
6023 @ref{Procedures} and @ref{Macros}.
6024
6025 Variable objects are documented as part of the description of Guile's
6026 module system: see @ref{Variables}.
6027
6028 Asyncs, dynamic roots and fluids are described in the section on
6029 scheduling: see @ref{Scheduling}.
6030
6031 Hooks are documented in the section on general utility functions: see
6032 @ref{Hooks}.
6033
6034 Ports are described in the section on I/O: see @ref{Input and Output}.
6035
6036 Regular expressions are described in their own section: see @ref{Regular
6037 Expressions}.
6038
6039 @c Local Variables:
6040 @c TeX-master: "guile.texi"
6041 @c End: