@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007, 2008, 2009, 2010, 2011
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Simple Data Types
@section Simple Generic Data Types

This chapter describes those of Guile's simple data types which are
primarily used for their role as items of generic data. By
@dfn{simple} we mean data types that are not primarily used as
containers to hold other data --- i.e.@: pairs, lists, vectors and so on.
For the documentation of such @dfn{compound} data types, see
@ref{Compound Data Types}.

@c One of the great strengths of Scheme is that there is no straightforward
@c distinction between ``data'' and ``functionality''. For example,
@c Guile's support for dynamic linking could be described:

@c @itemize @bullet
@c @item
@c either in a ``data-centric'' way, as the behaviour and properties of the
@c ``dynamically linked object'' data type, and the operations that may be
@c applied to instances of this type

@c @item
@c or in a ``functionality-centric'' way, as the set of procedures that
@c constitute Guile's support for dynamic linking, in the context of the
@c module system.
@c @end itemize

@c The contents of this chapter are, therefore, a matter of judgment. By
@c @dfn{generic}, we mean to select those data types whose typical use as
@c @emph{data} in a wide variety of programming contexts is more important
@c than their use in the implementation of a particular piece of
@c @emph{functionality}. The last section of this chapter provides
@c references for all the data types that are documented not here but in a
@c ``functionality-centric'' way elsewhere in the manual.

@menu
* Booleans:: True/false values.
* Numbers:: Numerical data types.
* Characters:: Single characters.
* Character Sets:: Sets of characters.
* Strings:: Sequences of characters.
* Bytevectors:: Sequences of bytes.
* Symbols:: Symbols.
* Keywords:: Self-quoting, customizable display keywords.
* Other Types:: "Functionality-centric" data types.
@end menu


@node Booleans
@subsection Booleans
@tpindex Booleans

The two boolean values are @code{#t} for true and @code{#f} for false.

Boolean values are returned by predicate procedures, such as the general
equality predicates @code{eq?}, @code{eqv?} and @code{equal?}
(@pxref{Equality}) and numerical and string comparison operators like
@code{string=?} (@pxref{String Comparison}) and @code{<=}
(@pxref{Comparison}).

@lisp
(<= 3 8)
@result{} #t

(<= 3 -3)
@result{} #f

(equal? "house" "houses")
@result{} #f

(eq? #f #f)
@result{} #t
@end lisp

In test condition contexts like @code{if} and @code{cond} (@pxref{if
cond case}), where a group of subexpressions will be evaluated only if a
@var{condition} expression evaluates to ``true'', ``true'' means any
value at all except @code{#f}.

@lisp
(if #t "yes" "no")
@result{} "yes"

(if 0 "yes" "no")
@result{} "yes"

(if #f "yes" "no")
@result{} "no"
@end lisp

A result of this asymmetry is that typical Scheme source code more often
uses @code{#f} explicitly than @code{#t}: @code{#f} is necessary to
represent an @code{if} or @code{cond} false value, whereas @code{#t} is
not necessary to represent an @code{if} or @code{cond} true value.

It is important to note that @code{#f} is @strong{not} equivalent to any
other Scheme value. In particular, @code{#f} is not the same as the
number 0 (as it is in C and C++), and not the same as the ``empty list''
(as it is in some Lisp dialects).
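
The following sketch, assuming a Guile 2.0 or later REPL, illustrates both points: @code{0} and the empty list select the ``true'' branch, and a predicate such as @code{member} may return a useful non-boolean value rather than @code{#t}. The helper @code{truthy?} is hypothetical, defined only for this example.

@lisp
;; Hypothetical helper: map any value to a strict boolean,
;; following the same rule that `if' uses.
(define (truthy? x)
  (if x #t #f))

(truthy? 0)     ; @result{} #t
(truthy? '())   ; @result{} #t
(truthy? #f)    ; @result{} #f
(eq? #f '())    ; @result{} #f

;; On success, member returns the tail of the list, not #t.
(member 2 '(1 2 3))                         ; @result{} (2 3)
(if (member 2 '(1 2 3)) "found" "missing")  ; @result{} "found"
@end lisp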

In C, the two Scheme boolean values are available as the two constants
@code{SCM_BOOL_T} for @code{#t} and @code{SCM_BOOL_F} for @code{#f}.
Care must be taken with the false value @code{SCM_BOOL_F}: it is not
false when used in C conditionals. In order to test for it, use
@code{scm_is_false} or @code{scm_is_true}.

@rnindex not
@deffn {Scheme Procedure} not x
@deffnx {C Function} scm_not (x)
Return @code{#t} if @var{x} is @code{#f}, else return @code{#f}.
@end deffn

@rnindex boolean?
@deffn {Scheme Procedure} boolean? obj
@deffnx {C Function} scm_boolean_p (obj)
Return @code{#t} if @var{obj} is either @code{#t} or @code{#f}, else
return @code{#f}.
@end deffn

@deftypevr {C Macro} SCM SCM_BOOL_T
The @code{SCM} representation of the Scheme object @code{#t}.
@end deftypevr

@deftypevr {C Macro} SCM SCM_BOOL_F
The @code{SCM} representation of the Scheme object @code{#f}.
@end deftypevr

@deftypefn {C Function} int scm_is_true (SCM obj)
Return @code{0} if @var{obj} is @code{#f}, else return @code{1}.
@end deftypefn

@deftypefn {C Function} int scm_is_false (SCM obj)
Return @code{1} if @var{obj} is @code{#f}, else return @code{0}.
@end deftypefn

@deftypefn {C Function} int scm_is_bool (SCM obj)
Return @code{1} if @var{obj} is either @code{#t} or @code{#f}, else
return @code{0}.
@end deftypefn

@deftypefn {C Function} SCM scm_from_bool (int val)
Return @code{#f} if @var{val} is @code{0}, else return @code{#t}.
@end deftypefn

@deftypefn {C Function} int scm_to_bool (SCM val)
Return @code{1} if @var{val} is @code{SCM_BOOL_T}, return @code{0}
when @var{val} is @code{SCM_BOOL_F}, else signal a `wrong type' error.

You should probably use @code{scm_is_true} instead of this function
when you just want to test a @code{SCM} value for truth.
@end deftypefn

@node Numbers
@subsection Numerical data types
@tpindex Numbers

Guile supports a rich ``tower'' of numerical types --- integer,
rational, real and complex --- and provides an extensive set of
mathematical and scientific functions for operating on numerical
data. This section of the manual documents those types and functions.

You may also find it illuminating to read R5RS's presentation of numbers
in Scheme, which is particularly clear and accessible: see
@ref{Numbers,,,r5rs,R5RS}.

@menu
* Numerical Tower:: Scheme's numerical "tower".
* Integers:: Whole numbers.
* Reals and Rationals:: Real and rational numbers.
* Complex Numbers:: Complex numbers.
* Exactness:: Exactness and inexactness.
* Number Syntax:: Read syntax for numerical data.
* Integer Operations:: Operations on integer values.
* Comparison:: Comparison predicates.
* Conversion:: Converting numbers to and from strings.
* Complex:: Complex number operations.
* Arithmetic:: Arithmetic functions.
* Scientific:: Scientific functions.
* Bitwise Operations:: Logical AND, OR, NOT, and so on.
* Random:: Random number generation.
@end menu


@node Numerical Tower
@subsubsection Scheme's Numerical ``Tower''
@rnindex number?

Scheme's numerical ``tower'' consists of the following categories of
numbers:

@table @dfn
@item integers
Whole numbers, positive or negative; e.g.@: @minus{}5, 0, 18.

@item rationals
The set of numbers that can be expressed as @math{@var{p}/@var{q}}
where @var{p} and @var{q} are integers; e.g.@: @math{9/16} works, but
pi (an irrational number) doesn't. These include integers
(@math{@var{n}/1}).

@item real numbers
The set of numbers that describes all possible positions along a
one-dimensional line. This includes rationals as well as irrational
numbers.

@item complex numbers
The set of numbers that describes all possible positions in a
two-dimensional space. This includes real as well as imaginary numbers
(@math{@var{a}+@var{b}i}, where @var{a} is the @dfn{real part},
@var{b} is the @dfn{imaginary part}, and @math{i} is the square root of
@minus{}1).
@end table

It is called a tower because each category ``sits on'' the one that
follows it, in the sense that every integer is also a rational, every
rational is also real, and every real number is also a complex number
(but with zero imaginary part).
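
As a rough illustration, assuming a Guile 2.0 or later REPL, the hypothetical helper @code{tower-levels} below reports which of the four predicates accept a given number. Note that @code{1.5} counts as rational because Guile treats every finite inexact real as rational (@pxref{Reals and Rationals}).

@lisp
;; Hypothetical helper: list the results of integer?, rational?,
;; real? and complex?, from the lowest level to the highest.
(define (tower-levels x)
  (list (integer? x) (rational? x) (real? x) (complex? x)))

(tower-levels 7)      ; @result{} (#t #t #t #t)
(tower-levels 3/4)    ; @result{} (#f #t #t #t)
(tower-levels 1.5)    ; @result{} (#f #t #t #t)
(tower-levels 3+4i)   ; @result{} (#f #f #f #t)
@end lisp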

In addition to the classification into integers, rationals, reals and
complex numbers, Scheme also distinguishes between whether a number is
represented exactly or not. For example, the result of
@m{2\sin(\pi/4),2*sin(pi/4)} is exactly @m{\sqrt{2},2^(1/2)}, but Guile
can represent neither @m{\pi/4,pi/4} nor @m{\sqrt{2},2^(1/2)} exactly.
Instead, it stores an inexact approximation, using the C type
@code{double}.

Guile can represent exact rationals of any magnitude, inexact
rationals that fit into a C @code{double}, and inexact complex numbers
with @code{double} real and imaginary parts.

The @code{number?} predicate may be applied to any Scheme value to
discover whether the value is any of the supported numerical types.

@deffn {Scheme Procedure} number? obj
@deffnx {C Function} scm_number_p (obj)
Return @code{#t} if @var{obj} is any kind of number, else @code{#f}.
@end deffn

For example:

@lisp
(number? 3)
@result{} #t

(number? "hello there!")
@result{} #f

(define pi 3.141592654)
(number? pi)
@result{} #t
@end lisp

@deftypefn {C Function} int scm_is_number (SCM obj)
This is equivalent to @code{scm_is_true (scm_number_p (obj))}.
@end deftypefn

The next few subsections document each of Guile's numerical data types
in detail.

@node Integers
@subsubsection Integers

@tpindex Integer numbers

@rnindex integer?

Integers are whole numbers, that is numbers with no fractional part,
such as 2, 83, and @minus{}3789.

Integers in Guile can be arbitrarily big, as shown by the following
example.

@lisp
(define (factorial n)
  (let loop ((n n) (product 1))
    (if (= n 0)
        product
        (loop (- n 1) (* product n)))))

(factorial 3)
@result{} 6

(factorial 20)
@result{} 2432902008176640000

(- (factorial 45))
@result{} -119622220865480194561963161495657715064383733760000000000
@end lisp

Readers whose background is in programming languages where integers are
limited by the need to fit into just 4 or 8 bytes of memory may find
this surprising, or suspect that Guile's representation of integers is
inefficient. In fact, Guile achieves a near-optimal balance of
convenience and efficiency by using the host computer's native
representation of integers where possible, and a more general
representation where the required number does not fit in the native
form. Conversion between these two representations is automatic and
completely invisible to the Scheme-level programmer.

C has a host of different integer types, and Guile offers a host of
functions to convert between them and the @code{SCM} representation.
For example, a C @code{int} can be handled with @code{scm_to_int} and
@code{scm_from_int}. Guile also defines a few C integer types of its
own, to help with differences between systems.

C integer types that are not covered can be handled with the generic
@code{scm_to_signed_integer} and @code{scm_from_signed_integer} for
signed types, or with @code{scm_to_unsigned_integer} and
@code{scm_from_unsigned_integer} for unsigned types.

Scheme integers can be exact or inexact. For example, a number
written as @code{3.0} with an explicit decimal point is inexact, but
it is also an integer. The functions @code{integer?} and
@code{scm_is_integer} report true for such a number, but the functions
@code{scm_is_signed_integer} and @code{scm_is_unsigned_integer} only
allow exact integers and thus report false. Likewise, the conversion
functions like @code{scm_to_signed_integer} only accept exact
integers.

The motivation for this behavior is that the inexactness of a number
should not be lost silently. If you want to allow inexact integers,
you can explicitly insert a call to @code{inexact->exact} or to its C
equivalent @code{scm_inexact_to_exact}. (Only inexact integers will
be converted by this call into exact integers; inexact non-integers
will become exact fractions.)
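
A short sketch of the distinction at the Scheme level, assuming a Guile 2.0 or later REPL:

@lisp
(integer? 3.0)                  ; @result{} #t
(exact? 3.0)                    ; @result{} #f
(inexact->exact 3.0)            ; @result{} 3
(exact? (inexact->exact 3.0))   ; @result{} #t

;; An inexact non-integer becomes an exact fraction.
(inexact->exact 2.5)            ; @result{} 5/2
@end lisp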

@deffn {Scheme Procedure} integer? x
@deffnx {C Function} scm_integer_p (x)
Return @code{#t} if @var{x} is an exact or inexact integer number, else
@code{#f}.

@lisp
(integer? 487)
@result{} #t

(integer? 3.0)
@result{} #t

(integer? -3.4)
@result{} #f

(integer? +inf.0)
@result{} #t
@end lisp
@end deffn

@deftypefn {C Function} int scm_is_integer (SCM x)
This is equivalent to @code{scm_is_true (scm_integer_p (x))}.
@end deftypefn

@defvr {C Type} scm_t_int8
@defvrx {C Type} scm_t_uint8
@defvrx {C Type} scm_t_int16
@defvrx {C Type} scm_t_uint16
@defvrx {C Type} scm_t_int32
@defvrx {C Type} scm_t_uint32
@defvrx {C Type} scm_t_int64
@defvrx {C Type} scm_t_uint64
@defvrx {C Type} scm_t_intmax
@defvrx {C Type} scm_t_uintmax
These C types are equivalent to the corresponding ISO C types but are
defined on all platforms, with the exception of @code{scm_t_int64} and
@code{scm_t_uint64}, which are only defined when a 64-bit type is
available. For example, @code{scm_t_int8} is equivalent to
@code{int8_t}.

You can regard these definitions as a stop-gap measure until all
platforms provide these types. If you know that all the platforms
that you are interested in already provide these types, it is better
to use them directly instead of the types provided by Guile.
@end defvr

@deftypefn {C Function} int scm_is_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max)
@deftypefnx {C Function} int scm_is_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max)
Return @code{1} when @var{x} represents an exact integer that is
between @var{min} and @var{max}, inclusive; otherwise return @code{0}.

These functions can be used to check whether a @code{SCM} value will
fit into a given range, such as the range of a given C integer type.
If you just want to convert a @code{SCM} value to a given C integer
type, use one of the conversion functions directly.
@end deftypefn

@deftypefn {C Function} scm_t_intmax scm_to_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max)
@deftypefnx {C Function} scm_t_uintmax scm_to_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max)
When @var{x} represents an exact integer that is between @var{min} and
@var{max} inclusive, return that integer. Else signal an error,
either a `wrong-type' error when @var{x} is not an exact integer, or
an `out-of-range' error when it doesn't fit the given range.
@end deftypefn

@deftypefn {C Function} SCM scm_from_signed_integer (scm_t_intmax x)
@deftypefnx {C Function} SCM scm_from_unsigned_integer (scm_t_uintmax x)
Return the @code{SCM} value that represents the integer @var{x}. This
function will always succeed and will always return an exact number.
@end deftypefn

@deftypefn {C Function} char scm_to_char (SCM x)
@deftypefnx {C Function} {signed char} scm_to_schar (SCM x)
@deftypefnx {C Function} {unsigned char} scm_to_uchar (SCM x)
@deftypefnx {C Function} short scm_to_short (SCM x)
@deftypefnx {C Function} {unsigned short} scm_to_ushort (SCM x)
@deftypefnx {C Function} int scm_to_int (SCM x)
@deftypefnx {C Function} {unsigned int} scm_to_uint (SCM x)
@deftypefnx {C Function} long scm_to_long (SCM x)
@deftypefnx {C Function} {unsigned long} scm_to_ulong (SCM x)
@deftypefnx {C Function} {long long} scm_to_long_long (SCM x)
@deftypefnx {C Function} {unsigned long long} scm_to_ulong_long (SCM x)
@deftypefnx {C Function} size_t scm_to_size_t (SCM x)
@deftypefnx {C Function} ssize_t scm_to_ssize_t (SCM x)
@deftypefnx {C Function} scm_t_int8 scm_to_int8 (SCM x)
@deftypefnx {C Function} scm_t_uint8 scm_to_uint8 (SCM x)
@deftypefnx {C Function} scm_t_int16 scm_to_int16 (SCM x)
@deftypefnx {C Function} scm_t_uint16 scm_to_uint16 (SCM x)
@deftypefnx {C Function} scm_t_int32 scm_to_int32 (SCM x)
@deftypefnx {C Function} scm_t_uint32 scm_to_uint32 (SCM x)
@deftypefnx {C Function} scm_t_int64 scm_to_int64 (SCM x)
@deftypefnx {C Function} scm_t_uint64 scm_to_uint64 (SCM x)
@deftypefnx {C Function} scm_t_intmax scm_to_intmax (SCM x)
@deftypefnx {C Function} scm_t_uintmax scm_to_uintmax (SCM x)
When @var{x} represents an exact integer that fits into the indicated
C type, return that integer. Else signal an error, either a
`wrong-type' error when @var{x} is not an exact integer, or an
`out-of-range' error when it doesn't fit the range of the indicated type.

The functions @code{scm_to_long_long}, @code{scm_to_ulong_long},
@code{scm_to_int64}, and @code{scm_to_uint64} are only available when
the corresponding types are.
@end deftypefn

@deftypefn {C Function} SCM scm_from_char (char x)
@deftypefnx {C Function} SCM scm_from_schar (signed char x)
@deftypefnx {C Function} SCM scm_from_uchar (unsigned char x)
@deftypefnx {C Function} SCM scm_from_short (short x)
@deftypefnx {C Function} SCM scm_from_ushort (unsigned short x)
@deftypefnx {C Function} SCM scm_from_int (int x)
@deftypefnx {C Function} SCM scm_from_uint (unsigned int x)
@deftypefnx {C Function} SCM scm_from_long (long x)
@deftypefnx {C Function} SCM scm_from_ulong (unsigned long x)
@deftypefnx {C Function} SCM scm_from_long_long (long long x)
@deftypefnx {C Function} SCM scm_from_ulong_long (unsigned long long x)
@deftypefnx {C Function} SCM scm_from_size_t (size_t x)
@deftypefnx {C Function} SCM scm_from_ssize_t (ssize_t x)
@deftypefnx {C Function} SCM scm_from_int8 (scm_t_int8 x)
@deftypefnx {C Function} SCM scm_from_uint8 (scm_t_uint8 x)
@deftypefnx {C Function} SCM scm_from_int16 (scm_t_int16 x)
@deftypefnx {C Function} SCM scm_from_uint16 (scm_t_uint16 x)
@deftypefnx {C Function} SCM scm_from_int32 (scm_t_int32 x)
@deftypefnx {C Function} SCM scm_from_uint32 (scm_t_uint32 x)
@deftypefnx {C Function} SCM scm_from_int64 (scm_t_int64 x)
@deftypefnx {C Function} SCM scm_from_uint64 (scm_t_uint64 x)
@deftypefnx {C Function} SCM scm_from_intmax (scm_t_intmax x)
@deftypefnx {C Function} SCM scm_from_uintmax (scm_t_uintmax x)
Return the @code{SCM} value that represents the integer @var{x}.
These functions will always succeed and will always return an exact
number.
@end deftypefn

@deftypefn {C Function} void scm_to_mpz (SCM val, mpz_t rop)
Assign @var{val} to the multiple precision integer @var{rop}.
@var{val} must be an exact integer, otherwise an error will be
signalled. @var{rop} must have been initialized with @code{mpz_init}
before this function is called. When @var{rop} is no longer needed
the occupied space must be freed with @code{mpz_clear}.
@xref{Initializing Integers,,, gmp, GNU MP Manual}, for details.
@end deftypefn

@deftypefn {C Function} SCM scm_from_mpz (mpz_t val)
Return the @code{SCM} value that represents @var{val}.
@end deftypefn

@node Reals and Rationals
@subsubsection Real and Rational Numbers
@tpindex Real numbers
@tpindex Rational numbers

@rnindex real?
@rnindex rational?

Mathematically, the real numbers are the set of numbers that describe
all possible points along a continuous, infinite, one-dimensional line.
The rational numbers are the set of all numbers that can be written as
fractions @var{p}/@var{q}, where @var{p} and @var{q} are integers.
All rational numbers are also real, but there are real numbers that
are not rational, for example @m{\sqrt2, the square root of 2} and
@m{\pi,pi}.

Guile can represent both exact and inexact rational numbers, but it
cannot represent precise finite irrational numbers. Exact rationals are
represented by storing the numerator and denominator as two exact
integers. Inexact rationals are stored as floating point numbers using
the C type @code{double}.

Exact rationals are written as a fraction of integers. There must be
no whitespace around the slash:

@lisp
1/2
-22/7
@end lisp
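
Arithmetic on exact rationals stays exact, and results are kept in lowest terms. A few illustrative expressions, assuming a Guile 2.0 or later REPL:

@lisp
(/ 6 4)            ; @result{} 3/2
(+ 1/3 1/6)        ; @result{} 1/2
(* 3/2 2)          ; @result{} 3
(exact? (/ 6 4))   ; @result{} #t
@end lisp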

Even though the actual encoding of inexact rationals is in binary, it
may be helpful to think of it as a decimal number with a limited
number of significant figures and a decimal point somewhere, since
this corresponds to the standard notation for non-whole numbers. For
example:

@lisp
0.34
-0.00000142857931198
-5648394822220000000000.0
4.0
@end lisp

The limited precision of Guile's encoding means that any finite ``real''
number in Guile can be written in a rational form, by multiplying and
then dividing by sufficient powers of 10 (or in fact, 2). For example,
@samp{-0.00000142857931198} is the same as @minus{}142857931198 divided
by 100000000000000000. In Guile's current incarnation, therefore, the
@code{rational?} and @code{real?} predicates are equivalent for finite
numbers.
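
For instance, @code{inexact->exact} exposes the exact binary fraction behind a decimal literal; the denominator below is 2 to the 55th power. A sketch, assuming a Guile 2.0 or later REPL on a platform with @acronym{IEEE} 754 doubles:

@lisp
(rational? 0.1)                 ; @result{} #t
(inexact->exact 0.1)
;; @result{} 3602879701896397/36028797018963968
(= (inexact->exact 0.1) 1/10)   ; @result{} #f

;; Infinities are real but not rational.
(rational? +inf.0)              ; @result{} #f
@end lisp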

Dividing by an exact zero leads to an error message, as one might expect.
However, dividing by an inexact zero does not produce an error.
Instead, the result of the division is either plus or minus infinity,
depending on the sign of the divided number and the sign of the zero
divisor (some platforms support signed zeroes @samp{-0.0} and
@samp{+0.0}; @samp{0.0} is the same as @samp{+0.0}).
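
The following sketch, assuming a Guile 2.0 or later REPL on an @acronym{IEEE} 754 platform, contrasts the two cases. The @code{catch} with key @code{#t} intercepts whatever error division by an exact zero signals, so the example does not depend on the particular error key.

@lisp
(/ 1 0.0)    ; @result{} +inf.0
(/ -1 0.0)   ; @result{} -inf.0
(/ 1 -0.0)   ; @result{} -inf.0, where signed zeroes are supported

;; Division by an exact zero signals an error instead.
(catch #t
  (lambda () (/ 1 0) 'no-error)
  (lambda (key . args) 'error))
;; @result{} error
@end lisp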

Dividing zero by an inexact zero yields a @acronym{NaN} (`not a number')
value, which, despite its name, Scheme considers to be a number.
Attempts to compare a @acronym{NaN} value with any number (including
itself) using @code{=}, @code{<}, @code{>}, @code{<=} or @code{>=}
always return @code{#f}. Although a @acronym{NaN} value is not
@code{=} to itself, it is both @code{eqv?} and @code{equal?} to itself
and other @acronym{NaN} values. However, the preferred way to test for
them is by using @code{nan?}.
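
A brief sketch of these comparison rules, assuming a Guile 2.0 or later REPL on an @acronym{IEEE} 754 platform; @code{not-a-number} is just an illustrative variable name:

@lisp
(define not-a-number (/ 0.0 0.0))

(nan? not-a-number)              ; @result{} #t
(= not-a-number not-a-number)    ; @result{} #f
(< not-a-number 1.0)             ; @result{} #f
(eqv? not-a-number +nan.0)       ; @result{} #t
@end lisp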

The real @acronym{NaN} values and infinities are written @samp{+nan.0},
@samp{+inf.0} and @samp{-inf.0}. This syntax is also recognized by
@code{read} as an extension to the usual Scheme syntax. These special
values are considered by Scheme to be inexact real numbers but not
rational. Note that non-real complex numbers may also contain
infinities or @acronym{NaN} values in their real or imaginary parts. To
test a real number to see if it is infinite, a @acronym{NaN} value, or
neither, use @code{inf?}, @code{nan?}, or @code{finite?}, respectively.
Every real number in Scheme belongs to precisely one of those three
classes.
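
The three-way classification can be sketched as follows. The helper @code{classify-real} is hypothetical, and the sketch assumes a Guile version that provides @code{finite?}, @code{inf?} and @code{nan?} as documented below.

@lisp
;; Hypothetical helper: place a real number in exactly one of
;; the three classes.  No `else' clause is needed, because every
;; real number satisfies exactly one of the tests.
(define (classify-real x)
  (cond ((finite? x) 'finite)
        ((inf? x) 'infinite)
        ((nan? x) 'nan)))

(map classify-real (list 42 -1.5 +inf.0 -inf.0 +nan.0))
;; @result{} (finite finite infinite infinite nan)
@end lisp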

On platforms that follow @acronym{IEEE} 754 for their floating point
arithmetic, the @samp{+inf.0}, @samp{-inf.0}, and @samp{+nan.0} values
are implemented using the corresponding @acronym{IEEE} 754 values.
They behave in arithmetic operations as @acronym{IEEE} 754 describes,
i.e., @code{(= +nan.0 +nan.0)} @result{} @code{#f}.

@deffn {Scheme Procedure} real? obj
@deffnx {C Function} scm_real_p (obj)
Return @code{#t} if @var{obj} is a real number, else @code{#f}. Note
that the sets of integer and rational values form subsets of the set
of real numbers, so the predicate will also be fulfilled if @var{obj}
is an integer number or a rational number.
@end deffn

@deffn {Scheme Procedure} rational? x
@deffnx {C Function} scm_rational_p (x)
Return @code{#t} if @var{x} is a rational number, @code{#f} otherwise.
Note that the set of integer values forms a subset of the set of
rational numbers, i.e.@: the predicate will also be fulfilled if
@var{x} is an integer number.
@end deffn

@deffn {Scheme Procedure} rationalize x eps
@deffnx {C Function} scm_rationalize (x, eps)
Return the @emph{simplest} rational number differing
from @var{x} by no more than @var{eps}.

As required by @acronym{R5RS}, @code{rationalize} only returns an
exact result when both its arguments are exact. Thus, you might need
to use @code{inexact->exact} on the arguments.

@lisp
(rationalize (inexact->exact 1.2) 1/100)
@result{} 6/5
@end lisp

@end deffn

@deffn {Scheme Procedure} inf? x
@deffnx {C Function} scm_inf_p (x)
Return @code{#t} if the real number @var{x} is @samp{+inf.0} or
@samp{-inf.0}. Otherwise return @code{#f}.
@end deffn

@deffn {Scheme Procedure} nan? x
@deffnx {C Function} scm_nan_p (x)
Return @code{#t} if the real number @var{x} is @samp{+nan.0}, or
@code{#f} otherwise.
@end deffn

@deffn {Scheme Procedure} finite? x
@deffnx {C Function} scm_finite_p (x)
Return @code{#t} if the real number @var{x} is neither infinite nor a
@acronym{NaN}, @code{#f} otherwise.
@end deffn

@deffn {Scheme Procedure} nan
@deffnx {C Function} scm_nan ()
Return @samp{+nan.0}, a @acronym{NaN} value.
@end deffn

@deffn {Scheme Procedure} inf
@deffnx {C Function} scm_inf ()
Return @samp{+inf.0}, positive infinity.
@end deffn

@deffn {Scheme Procedure} numerator x
@deffnx {C Function} scm_numerator (x)
Return the numerator of the rational number @var{x}.
@end deffn

@deffn {Scheme Procedure} denominator x
@deffnx {C Function} scm_denominator (x)
Return the denominator of the rational number @var{x}.
@end deffn
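
Exact rationals are kept in lowest terms, so the numerator and denominator reflect the reduced fraction, and for an inexact argument the result is also inexact. A sketch, assuming a Guile 2.0 or later REPL:

@lisp
(numerator 6/4)     ; @result{} 3
(denominator 6/4)   ; @result{} 2
(denominator 5)     ; @result{} 1
(numerator 0.5)     ; @result{} 1.0
@end lisp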

@deftypefn {C Function} int scm_is_real (SCM val)
@deftypefnx {C Function} int scm_is_rational (SCM val)
Equivalent to @code{scm_is_true (scm_real_p (val))} and
@code{scm_is_true (scm_rational_p (val))}, respectively.
@end deftypefn

@deftypefn {C Function} double scm_to_double (SCM val)
Return the number closest to @var{val} that is representable as a
@code{double}. Return infinity for a @var{val} that is too large in
magnitude. The argument @var{val} must be a real number.
@end deftypefn

@deftypefn {C Function} SCM scm_from_double (double val)
Return the @code{SCM} value that represents @var{val}. The returned
value is inexact according to the predicate @code{inexact?}, but it
will be exactly equal to @var{val}.
@end deftypefn

@node Complex Numbers
@subsubsection Complex Numbers
@tpindex Complex numbers

@rnindex complex?

Complex numbers are the set of numbers that describe all possible points
in a two-dimensional space. The two coordinates of a particular point
in this space are known as the @dfn{real} and @dfn{imaginary} parts of
the complex number that describes that point.

In Guile, complex numbers are written in rectangular form as the sum of
their real and imaginary parts, using the symbol @code{i} to indicate
the imaginary part.

@lisp
3+4i
@result{}
3.0+4.0i

(* 3-8i 2.3+0.3i)
@result{}
9.3-17.5i
@end lisp

@cindex polar form
@noindent
Polar form can also be used, with an @samp{@@} between magnitude and
angle:

@lisp
1@@3.141592 @result{} -1.0 (approx)
-1@@1.57079 @result{} 0.0-1.0i (approx)
@end lisp

Guile represents a complex number with a non-zero imaginary part as a
pair of inexact rationals, so the real and imaginary parts of a
complex number have the same properties of inexactness and limited
precision as single inexact rational numbers. Guile cannot represent
exact complex numbers with non-zero imaginary parts.
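
The inexactness of the parts can be observed with the standard accessors @code{real-part}, @code{imag-part} and @code{magnitude} (@pxref{Complex}). A sketch, assuming a Guile 2.0 or later REPL:

@lisp
(define z 3+4i)

(real-part z)   ; @result{} 3.0
(imag-part z)   ; @result{} 4.0
(magnitude z)   ; @result{} 5.0
(exact? z)      ; @result{} #f
(real? z)       ; @result{} #f
@end lisp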

@deffn {Scheme Procedure} complex? z
@deffnx {C Function} scm_complex_p (z)
Return @code{#t} if @var{z} is a complex number, @code{#f}
otherwise. Note that the sets of real, rational and integer
values form subsets of the set of complex numbers, i.e.@: the
predicate will also be fulfilled if @var{z} is a real,
rational or integer number.
@end deffn

@deftypefn {C Function} int scm_is_complex (SCM val)
Equivalent to @code{scm_is_true (scm_complex_p (val))}.
@end deftypefn

@node Exactness
@subsubsection Exact and Inexact Numbers
@tpindex Exact numbers
@tpindex Inexact numbers

@rnindex exact?
@rnindex inexact?
@rnindex exact->inexact
@rnindex inexact->exact

R5RS requires that a calculation involving inexact numbers always
produces an inexact result. To meet this requirement, Guile
distinguishes between an exact integer value such as @samp{5} and the
corresponding inexact real value which, to the limited precision
available, has no fractional part, and is printed as @samp{5.0}. Guile
will only convert the latter value to the former when forced to do so by
an invocation of the @code{inexact->exact} procedure.
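
The following sketch, assuming a Guile 2.0 or later REPL, shows inexactness propagating through arithmetic, and @code{inexact->exact} converting back explicitly:

@lisp
(+ 1/2 1/2)            ; @result{} 1
(+ 1/2 0.5)            ; @result{} 1.0
(exact? (+ 1/2 0.5))   ; @result{} #f
(inexact->exact 1.0)   ; @result{} 1
@end lisp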
722
723@deffn {Scheme Procedure} exact? z
724@deffnx {C Function} scm_exact_p (z)
725Return @code{#t} if the number @var{z} is exact, @code{#f}
726otherwise.
727
728@lisp
729(exact? 2)
730@result{} #t
731
732(exact? 0.5)
733@result{} #f
734
735(exact? (/ 2))
736@result{} #t
737@end lisp
738
739@end deffn
740
741@deffn {Scheme Procedure} inexact? z
742@deffnx {C Function} scm_inexact_p (z)
743Return @code{#t} if the number @var{z} is inexact, @code{#f}
744else.
745@end deffn
746
747@deffn {Scheme Procedure} inexact->exact z
748@deffnx {C Function} scm_inexact_to_exact (z)
749Return an exact number that is numerically closest to @var{z}, when
750there is one. For inexact rationals, Guile returns the exact rational
751that is numerically equal to the inexact rational. Inexact complex
752numbers with a non-zero imaginary part can not be made exact.
753
754@lisp
755(inexact->exact 0.5)
756@result{} 1/2
757@end lisp
758
759The following happens because 12/10 is not exactly representable as a
760@code{double} (on most platforms). However, when reading a decimal
761number that has been marked exact with the ``#e'' prefix, Guile is
762able to represent it correctly.
763
764@lisp
765(inexact->exact 1.2)
766@result{} 5404319552844595/4503599627370496
767
768#e1.2
769@result{} 6/5
770@end lisp
771
772@end deffn
773
774@c begin (texi-doc-string "guile" "exact->inexact")
775@deffn {Scheme Procedure} exact->inexact z
776@deffnx {C Function} scm_exact_to_inexact (z)
777Convert the number @var{z} to its inexact representation.
778@end deffn
779

@node Number Syntax
@subsubsection Read Syntax for Numerical Data

The read syntax for integers is a string of digits, optionally
preceded by a minus or plus character, a code indicating the
base in which the integer is encoded, and a code indicating whether
the number is exact or inexact. The supported base codes are:

@table @code
@item #b
@itemx #B
the integer is written in binary (base 2)

@item #o
@itemx #O
the integer is written in octal (base 8)

@item #d
@itemx #D
the integer is written in decimal (base 10)

@item #x
@itemx #X
the integer is written in hexadecimal (base 16)
@end table

If the base code is omitted, the integer is assumed to be decimal. The
following examples show how these base codes are used.

@lisp
-13
@result{} -13

#d-13
@result{} -13

#x-13
@result{} -19

#b+1101
@result{} 13

#o377
@result{} 255
@end lisp

The codes for indicating exactness (which can, incidentally, be applied
to all numerical values) are:

@table @code
@item #e
@itemx #E
the number is exact

@item #i
@itemx #I
the number is inexact
@end table

If the exactness indicator is omitted, the number is exact unless it
contains a radix point or an exponent. Since Guile cannot represent
exact complex numbers, an error is signalled when asking for them.

@lisp
(exact? 1.2)
@result{} #f

(exact? #e1.2)
@result{} #t

(exact? #e+1i)
ERROR: Wrong type argument
@end lisp
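The same prefixes are accepted at run time by @code{string->number} (@pxref{Conversion}). This makes it easy to check how a particular spelling is read. The calls below are illustrative and show that radix and exactness prefixes may be combined in either order:

@lisp
;; Radix and exactness prefixes may be combined, in either order.
(string->number "#x-13")     ; @result{} -19
(string->number "#e#x10")    ; @result{} 16
(string->number "#x#e10")    ; @result{} 16
(string->number "#i#b101")   ; @result{} 5.0
;; Like a radix point, an exponent makes a number inexact,
(string->number "1e3")       ; @result{} 1000.0
;; unless #e is given explicitly.
(string->number "#e1e3")     ; @result{} 1000
@end lisp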

Guile also understands the syntax @samp{+inf.0} and @samp{-inf.0} for
plus and minus infinity, respectively. These values must be written
exactly as shown: they must always have a sign and exactly one zero
digit after the decimal point. Guile also understands @samp{+nan.0}
and @samp{-nan.0} for the special ``not-a-number'' value. The sign is
ignored for ``not-a-number'', and the value is always printed as
@samp{+nan.0}.

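A not-a-number value is not numerically equal to anything, including itself, so comparing against @code{+nan.0} with @code{=} will not detect it. Guile's @code{inf?} and @code{nan?} predicates are the reliable test. A short sketch:

@lisp
(inf? +inf.0)                   ; @result{} #t
(inf? -inf.0)                   ; @result{} #t
(nan? +nan.0)                   ; @result{} #t
(nan? (- +inf.0 +inf.0))        ; @result{} #t
;; A NaN is never = to anything, not even itself.
(= +nan.0 +nan.0)               ; @result{} #f
;; The infinities order correctly against ordinary reals.
(< -inf.0 -1e308 1e308 +inf.0)  ; @result{} #t
@end lisp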
@node Integer Operations
@subsubsection Operations on Integer Values
@rnindex odd?
@rnindex even?
@rnindex quotient
@rnindex remainder
@rnindex modulo
@rnindex gcd
@rnindex lcm

@deffn {Scheme Procedure} odd? n
@deffnx {C Function} scm_odd_p (n)
Return @code{#t} if @var{n} is an odd number, @code{#f}
otherwise.
@end deffn

@deffn {Scheme Procedure} even? n
@deffnx {C Function} scm_even_p (n)
Return @code{#t} if @var{n} is an even number, @code{#f}
otherwise.
@end deffn

@c begin (texi-doc-string "guile" "quotient")
@c begin (texi-doc-string "guile" "remainder")
@deffn {Scheme Procedure} quotient n d
@deffnx {Scheme Procedure} remainder n d
@deffnx {C Function} scm_quotient (n, d)
@deffnx {C Function} scm_remainder (n, d)
Return the quotient or remainder from @var{n} divided by @var{d}. The
quotient is rounded towards zero, and the remainder will have the same
sign as @var{n}. In all cases quotient and remainder satisfy
@math{@var{n} = @var{q}*@var{d} + @var{r}}.

@lisp
(remainder 13 4) @result{} 1
(remainder -13 4) @result{} -1
@end lisp

See also @code{euclidean-quotient}, @code{euclidean-remainder} and
related operations in @ref{Arithmetic}.
@end deffn

@c begin (texi-doc-string "guile" "modulo")
@deffn {Scheme Procedure} modulo n d
@deffnx {C Function} scm_modulo (n, d)
Return the remainder from @var{n} divided by @var{d}, with the same
sign as @var{d}.

@lisp
(modulo 13 4) @result{} 1
(modulo -13 4) @result{} 3
(modulo 13 -4) @result{} -3
(modulo -13 -4) @result{} -1
@end lisp

See also @code{euclidean-quotient}, @code{euclidean-remainder} and
related operations in @ref{Arithmetic}.
@end deffn

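@code{remainder} and @code{modulo} differ only in which operand's sign the result follows. The sketch below uses a hypothetical helper, @code{modulo-via-remainder}, which is not part of Guile. It derives @code{modulo} from @code{remainder} by adding @var{d} whenever a non-zero remainder has the wrong sign, then checks the result against the built-in procedure for all four sign combinations and for an exact multiple:

@lisp
;; Hypothetical helper, for illustration only: compute (modulo n d)
;; from (remainder n d).  A non-zero remainder whose sign differs
;; from that of d is shifted by d, which gives it the sign of d.
(define (modulo-via-remainder n d)
  (let ((r (remainder n d)))
    (if (and (not (zero? r))
             (not (eq? (negative? r) (negative? d))))
        (+ r d)
        r)))

;; Compare with the built-in modulo.
(define (check-modulo n d)
  (= (modulo-via-remainder n d) (modulo n d)))

(map check-modulo '(13 -13 13 -13 12) '(4 4 -4 -4 -4))
;; @result{} (#t #t #t #t #t)
@end lisp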
@c begin (texi-doc-string "guile" "gcd")
@deffn {Scheme Procedure} gcd x@dots{}
@deffnx {C Function} scm_gcd (x, y)
Return the greatest common divisor of all arguments.
If called without arguments, 0 is returned.

The C function @code{scm_gcd} always takes two arguments, while the
Scheme function can take an arbitrary number.
@end deffn

@c begin (texi-doc-string "guile" "lcm")
@deffn {Scheme Procedure} lcm x@dots{}
@deffnx {C Function} scm_lcm (x, y)
Return the least common multiple of the arguments.
If called without arguments, 1 is returned.

The C function @code{scm_lcm} always takes two arguments, while the
Scheme function can take an arbitrary number.
@end deffn

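The Scheme procedures are variadic, so @code{apply} can use them directly on a list of integers. For two arguments they are also linked by the familiar identity @math{gcd(a,b) * lcm(a,b) = abs(a*b)}. A short sketch follows; @code{gcd-lcm-identity?} is a hypothetical name used only here:

@lisp
;; gcd and lcm accept any number of arguments.
(apply gcd '(12 18 24))   ; @result{} 6
(apply lcm '(4 6 10))     ; @result{} 60
(gcd)                     ; @result{} 0
(lcm)                     ; @result{} 1

;; Hypothetical check: for two integers, gcd times lcm equals the
;; absolute value of their product.
(define (gcd-lcm-identity? a b)
  (= (* (gcd a b) (lcm a b)) (abs (* a b))))

(gcd-lcm-identity? 12 -18) ; @result{} #t
@end lisp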
@deffn {Scheme Procedure} modulo-expt n k m
@deffnx {C Function} scm_modulo_expt (n, k, m)
Return @var{n} raised to the integer exponent
@var{k}, modulo @var{m}.

@lisp
(modulo-expt 2 3 5)
 @result{} 3
@end lisp
@end deffn
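A modular power can be computed without first forming the full power @math{@var{n}^@var{k}}. The usual technique is square-and-multiply. The following Scheme sketch of that technique assumes a non-negative exponent and a positive modulus. It is not Guile's own implementation, and @code{modulo-expt-sketch} is a hypothetical name:

@lisp
;; Hypothetical square-and-multiply sketch, for illustration only.
;; Assumes k >= 0 and m > 0.  Each step squares the base and halves
;; the exponent, multiplying the base into the accumulator whenever
;; the current low bit of k is set.
(define (modulo-expt-sketch n k m)
  (let loop ((base (modulo n m))
             (k k)
             (acc (modulo 1 m)))
    (cond ((zero? k) acc)
          ((odd? k)
           (loop (modulo (* base base) m) (ash k -1)
                 (modulo (* acc base) m)))
          (else
           (loop (modulo (* base base) m) (ash k -1) acc)))))

(modulo-expt-sketch 2 3 5)      ; @result{} 3
(= (modulo-expt-sketch 7 560 561)
   (modulo-expt 7 560 561))     ; @result{} #t
@end lisp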

@node Comparison
@subsubsection Comparison Predicates
@rnindex zero?
@rnindex positive?
@rnindex negative?

The C comparison functions below always take two arguments, while the
Scheme functions can take an arbitrary number. Also keep in mind that
the C functions return one of the Scheme boolean values
@code{SCM_BOOL_T} or @code{SCM_BOOL_F}, which are both true as far as C
is concerned. Thus, always write @code{scm_is_true (scm_num_eq_p (x,
y))} when testing the two Scheme numbers @code{x} and @code{y} for
equality, for example.

@c begin (texi-doc-string "guile" "=")
@deffn {Scheme Procedure} =
@deffnx {C Function} scm_num_eq_p (x, y)
Return @code{#t} if all parameters are numerically equal.
@end deffn

@c begin (texi-doc-string "guile" "<")
@deffn {Scheme Procedure} <
@deffnx {C Function} scm_less_p (x, y)
Return @code{#t} if the list of parameters is monotonically
increasing.
@end deffn

@c begin (texi-doc-string "guile" ">")
@deffn {Scheme Procedure} >
@deffnx {C Function} scm_gr_p (x, y)
Return @code{#t} if the list of parameters is monotonically
decreasing.
@end deffn

@c begin (texi-doc-string "guile" "<=")
@deffn {Scheme Procedure} <=
@deffnx {C Function} scm_leq_p (x, y)
Return @code{#t} if the list of parameters is monotonically
non-decreasing.
@end deffn

@c begin (texi-doc-string "guile" ">=")
@deffn {Scheme Procedure} >=
@deffnx {C Function} scm_geq_p (x, y)
Return @code{#t} if the list of parameters is monotonically
non-increasing.
@end deffn

@c begin (texi-doc-string "guile" "zero?")
@deffn {Scheme Procedure} zero? z
@deffnx {C Function} scm_zero_p (z)
Return @code{#t} if @var{z} is an exact or inexact number equal to
zero.
@end deffn

@c begin (texi-doc-string "guile" "positive?")
@deffn {Scheme Procedure} positive? x
@deffnx {C Function} scm_positive_p (x)
Return @code{#t} if @var{x} is an exact or inexact number greater than
zero.
@end deffn

@c begin (texi-doc-string "guile" "negative?")
@deffn {Scheme Procedure} negative? x
@deffnx {C Function} scm_negative_p (x)
Return @code{#t} if @var{x} is an exact or inexact number less than
zero.
@end deffn

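Because the Scheme predicates take any number of arguments, a single call can check that a whole sequence is ordered. Each adjacent pair is compared in turn. A few illustrative calls:

@lisp
;; With more than two arguments, each adjacent pair is compared.
(< 1 2 3)        ; @result{} #t
(< 1 3 2)        ; @result{} #f
(<= 1 1 2)       ; @result{} #t
(>= 3 3 1)       ; @result{} #t
;; Numeric equality ignores exactness.
(= 1 1.0 2/2)    ; @result{} #t
(zero? -0.0)     ; @result{} #t
(positive? 1/3)  ; @result{} #t
@end lisp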

@node Conversion
@subsubsection Converting Numbers To and From Strings
@rnindex number->string
@rnindex string->number

The following procedures read and write numbers according to their
external representation as defined by R5RS (@pxref{Lexical structure,
R5RS Lexical Structure,, r5rs, The Revised^5 Report on the Algorithmic
Language Scheme}). @xref{Number Input and Output, the @code{(ice-9
i18n)} module}, for locale-dependent number parsing.

@deffn {Scheme Procedure} number->string n [radix]
@deffnx {C Function} scm_number_to_string (n, radix)
Return a string holding the external representation of the
number @var{n} in the given @var{radix}. If @var{n} is
inexact, a radix of 10 will be used.
@end deffn

@deffn {Scheme Procedure} string->number string [radix]
@deffnx {C Function} scm_string_to_number (string, radix)
Return a number of the maximally precise representation
expressed by the given @var{string}. @var{radix} must be an
exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}
is a default radix that may be overridden by an explicit radix
prefix in @var{string} (e.g.@: @code{"#o177"}). If @var{radix} is not
supplied, then the default radix is 10. If @var{string} is not a
syntactically valid notation for a number, then
@code{string->number} returns @code{#f}.
@end deffn

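The following calls sketch a round trip between exact integers and their string representations in a non-decimal radix. They also cover the case where an explicit prefix in the string overrides the @var{radix} argument:

@lisp
(number->string 255 16)        ; @result{} "ff"
(number->string -10 2)         ; @result{} "-1010"
(string->number "ff" 16)       ; @result{} 255
;; An explicit prefix overrides the radix argument.
(string->number "#o177" 16)    ; @result{} 127
;; Invalid syntax yields #f rather than an error.
(string->number "12abc")       ; @result{} #f

;; Writing and re-reading an exact integer in the same radix
;; gives back the same number.
(= (string->number (number->string 123456789 2) 2) 123456789)
;; @result{} #t
@end lisp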
@deftypefn {C Function} SCM scm_c_locale_stringn_to_number (const char *string, size_t len, unsigned radix)
As per @code{string->number} above, but taking a C string, as pointer
and length. The string characters should be in the current locale
encoding (the @code{locale} in the name refers only to that; there is
no locale-dependent parsing).
@end deftypefn


@node Complex
@subsubsection Complex Number Operations
@rnindex make-rectangular
@rnindex make-polar
@rnindex real-part
@rnindex imag-part
@rnindex magnitude
@rnindex angle

@deffn {Scheme Procedure} make-rectangular real_part imaginary_part
@deffnx {C Function} scm_make_rectangular (real_part, imaginary_part)
Return a complex number constructed from the real part @var{real_part}
and the imaginary part @var{imaginary_part}.
@end deffn

@deffn {Scheme Procedure} make-polar x y
@deffnx {C Function} scm_make_polar (x, y)
@cindex polar form
Return the complex number @var{x} * e^(i * @var{y}).
@end deffn

@c begin (texi-doc-string "guile" "real-part")
@deffn {Scheme Procedure} real-part z
@deffnx {C Function} scm_real_part (z)
Return the real part of the number @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "imag-part")
@deffn {Scheme Procedure} imag-part z
@deffnx {C Function} scm_imag_part (z)
Return the imaginary part of the number @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "magnitude")
@deffn {Scheme Procedure} magnitude z
@deffnx {C Function} scm_magnitude (z)
Return the magnitude of the number @var{z}. This is the same as
@code{abs} for real arguments, but also allows complex numbers.
@end deffn

@c begin (texi-doc-string "guile" "angle")
@deffn {Scheme Procedure} angle z
@deffnx {C Function} scm_angle (z)
Return the angle of the complex number @var{z}.
@end deffn

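Guile cannot represent exact complex numbers (@pxref{Number Syntax}), so a complex number built from exact parts comes back with inexact parts. The following sketch shows the accessors applied to a rectangular value. It then checks that @code{magnitude} and @code{angle} undo @code{make-polar}, using a tolerance of @code{1e-12} chosen arbitrarily for illustration:

@lisp
;; The parts come back inexact.
(define z (make-rectangular 3 4))
(real-part z)              ; @result{} 3.0
(imag-part z)              ; @result{} 4.0
(magnitude z)              ; @result{} 5.0

;; make-polar is undone by magnitude and angle, up to rounding.
(define w (make-polar 2.0 1.0))
(< (abs (- (magnitude w) 2.0)) 1e-12)   ; @result{} #t
(< (abs (- (angle w) 1.0)) 1e-12)       ; @result{} #t
@end lisp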
@deftypefn {C Function} SCM scm_c_make_rectangular (double re, double im)
@deftypefnx {C Function} SCM scm_c_make_polar (double x, double y)
Like @code{scm_make_rectangular} or @code{scm_make_polar},
respectively, but these functions take @code{double}s as their
arguments.
@end deftypefn

@deftypefn {C Function} double scm_c_real_part (SCM z)
@deftypefnx {C Function} double scm_c_imag_part (SCM z)
Return the real or imaginary part of @var{z} as a @code{double}.
@end deftypefn

@deftypefn {C Function} double scm_c_magnitude (SCM z)
@deftypefnx {C Function} double scm_c_angle (SCM z)
Return the magnitude or angle of @var{z} as a @code{double}.
@end deftypefn


@node Arithmetic
@subsubsection Arithmetic Functions
@rnindex max
@rnindex min
@rnindex +
@rnindex *
@rnindex -
@rnindex /
@findex 1+
@findex 1-
@rnindex abs
@rnindex floor
@rnindex ceiling
@rnindex truncate
@rnindex round
@rnindex euclidean/
@rnindex euclidean-quotient
@rnindex euclidean-remainder
@rnindex centered/
@rnindex centered-quotient
@rnindex centered-remainder

The C arithmetic functions below always take two arguments, while the
Scheme functions can take an arbitrary number. When you need to
invoke them with just one argument, for example to compute the
equivalent of @code{(- x)}, pass @code{SCM_UNDEFINED} as the second
one: @code{scm_difference (x, SCM_UNDEFINED)}.

@c begin (texi-doc-string "guile" "+")
@deffn {Scheme Procedure} + z1 @dots{}
@deffnx {C Function} scm_sum (z1, z2)
Return the sum of all parameter values. Return 0 if called without any
parameters.
@end deffn

@c begin (texi-doc-string "guile" "-")
@deffn {Scheme Procedure} - z1 z2 @dots{}
@deffnx {C Function} scm_difference (z1, z2)
If called with one argument @var{z1}, -@var{z1} is returned. Otherwise,
the sum of all but the first argument is subtracted from the first
argument.
@end deffn

@c begin (texi-doc-string "guile" "*")
@deffn {Scheme Procedure} * z1 @dots{}
@deffnx {C Function} scm_product (z1, z2)
Return the product of all arguments. If called without arguments, 1 is
returned.
@end deffn

@c begin (texi-doc-string "guile" "/")
@deffn {Scheme Procedure} / z1 z2 @dots{}
@deffnx {C Function} scm_divide (z1, z2)
Divide the first argument by the product of the remaining arguments. If
called with one argument @var{z1}, 1/@var{z1} is returned.
@end deffn

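The following calls illustrate the zero- and one-argument cases described above. They also show how the remaining arguments of @code{-} and @code{/} are combined, and how exactness carries through:

@lisp
(+)              ; @result{} 0
(*)              ; @result{} 1
(- 5)            ; @result{} -5
(- 10 1 2 3)     ; @result{} 4, that is 10 - (1 + 2 + 3)
(/ 2)            ; @result{} 1/2
(/ 60 2 3)       ; @result{} 10, that is 60 / (2 * 3)
;; Exact operands give an exact result; an inexact operand makes
;; the result inexact.
(/ 1 3)          ; @result{} 1/3
(/ 1.0 4)        ; @result{} 0.25
@end lisp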
@deffn {Scheme Procedure} 1+ z
@deffnx {C Function} scm_oneplus (z)
Return @math{@var{z} + 1}.
@end deffn

@deffn {Scheme Procedure} 1- z
@deffnx {C Function} scm_oneminus (z)
Return @math{@var{z} - 1}.
@end deffn

@c begin (texi-doc-string "guile" "abs")
@deffn {Scheme Procedure} abs x
@deffnx {C Function} scm_abs (x)
Return the absolute value of @var{x}.

@var{x} must be a number with zero imaginary part. To calculate the
magnitude of a complex number, use @code{magnitude} instead.
@end deffn

@c begin (texi-doc-string "guile" "max")
@deffn {Scheme Procedure} max x1 x2 @dots{}
@deffnx {C Function} scm_max (x1, x2)
Return the maximum of all parameter values.
@end deffn

@c begin (texi-doc-string "guile" "min")
@deffn {Scheme Procedure} min x1 x2 @dots{}
@deffnx {C Function} scm_min (x1, x2)
Return the minimum of all parameter values.
@end deffn

@c begin (texi-doc-string "guile" "truncate")
@deffn {Scheme Procedure} truncate x
@deffnx {C Function} scm_truncate_number (x)
Round the number @var{x} towards zero.
@end deffn

@c begin (texi-doc-string "guile" "round")
@deffn {Scheme Procedure} round x
@deffnx {C Function} scm_round_number (x)
Round the number @var{x} to the nearest integer. When exactly
halfway between two integers, round to the even one.
@end deffn

@c begin (texi-doc-string "guile" "floor")
@deffn {Scheme Procedure} floor x
@deffnx {C Function} scm_floor (x)
Round the number @var{x} towards minus infinity.
@end deffn

@c begin (texi-doc-string "guile" "ceiling")
@deffn {Scheme Procedure} ceiling x
@deffnx {C Function} scm_ceiling (x)
Round the number @var{x} towards positive infinity.
@end deffn

@deftypefn {C Function} double scm_c_truncate (double x)
@deftypefnx {C Function} double scm_c_round (double x)
Like @code{scm_truncate_number} or @code{scm_round_number},
respectively, but these functions take and return @code{double}
values.
@end deftypefn

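The four rounding procedures differ only in their direction and in how they break ties. Applying each one to the same inputs makes the differences visible. Note that an exact argument such as @code{7/2} gives an exact result:

@lisp
;; How the four procedures treat halves and negative values.
(map floor    '(2.5 -2.5 3.5 7/2))  ; @result{} (2.0 -3.0 3.0 3)
(map ceiling  '(2.5 -2.5 3.5 7/2))  ; @result{} (3.0 -2.0 4.0 4)
(map truncate '(2.5 -2.5 3.5 7/2))  ; @result{} (2.0 -2.0 3.0 3)
(map round    '(2.5 -2.5 3.5 7/2))  ; @result{} (2.0 -2.0 4.0 4)
@end lisp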
@deffn {Scheme Procedure} euclidean/ x y
@deffnx {Scheme Procedure} euclidean-quotient x y
@deffnx {Scheme Procedure} euclidean-remainder x y
@deffnx {C Function} scm_euclidean_divide (x, y)
@deffnx {C Function} scm_euclidean_quotient (x, y)
@deffnx {C Function} scm_euclidean_remainder (x, y)
These procedures accept two real numbers @var{x} and @var{y}, where the
divisor @var{y} must be non-zero. @code{euclidean-quotient} returns the
integer @var{q} and @code{euclidean-remainder} returns the real number
@var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and
@math{0 <= @var{r} < abs(@var{y})}. @code{euclidean/} returns both @var{q} and
@var{r} as two values, and is more efficient than computing each
separately. Note that when @math{@var{y} > 0}, @code{euclidean-quotient}
returns @math{floor(@var{x}/@var{y})}, otherwise it returns
@math{ceiling(@var{x}/@var{y})}.

These operators are equivalent to the R6RS operators
@code{div}, @code{mod}, and @code{div-and-mod}.

@lisp
(euclidean-quotient 123 10) @result{} 12
(euclidean-remainder 123 10) @result{} 3
(euclidean/ 123 10) @result{} 12 and 3
(euclidean/ 123 -10) @result{} -12 and 3
(euclidean/ -123 10) @result{} -13 and 7
(euclidean/ -123 -10) @result{} 13 and 7
(euclidean/ -123.2 -63.5) @result{} 2.0 and 3.8
(euclidean/ 16/3 -10/7) @result{} -3 and 22/21
@end lisp
@end deffn

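For exact arguments, the definition above can be written directly in terms of @code{floor} and @code{ceiling}. The following sketch does this and compares the result with the built-in @code{euclidean/} on the examples above. It only illustrates the definition; the built-in procedures also accept inexact arguments. @code{euclidean-divide-sketch} and @code{agrees?} are hypothetical names, not part of Guile:

@lisp
;; Hypothetical sketch, for illustration only: Euclidean division
;; of exact rationals, using floor for y > 0 and ceiling otherwise.
(define (euclidean-divide-sketch x y)
  (let ((q (if (> y 0)
               (floor (/ x y))
               (ceiling (/ x y)))))
    (values q (- x (* q y)))))

;; Compare with the built-in euclidean/ and check 0 <= r < |y|.
(define (agrees? x y)
  (call-with-values (lambda () (euclidean-divide-sketch x y))
    (lambda (q1 r1)
      (call-with-values (lambda () (euclidean/ x y))
        (lambda (q2 r2)
          (and (= q1 q2) (= r1 r2) (<= 0 r1) (< r1 (abs y))))))))

(map agrees? '(123 123 -123 -123 16/3) '(10 -10 10 -10 -10/7))
;; @result{} (#t #t #t #t #t)
@end lisp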
@deffn {Scheme Procedure} centered/ x y
@deffnx {Scheme Procedure} centered-quotient x y
@deffnx {Scheme Procedure} centered-remainder x y
@deffnx {C Function} scm_centered_divide (x, y)
@deffnx {C Function} scm_centered_quotient (x, y)
@deffnx {C Function} scm_centered_remainder (x, y)
These procedures accept two real numbers @var{x} and @var{y}, where the
divisor @var{y} must be non-zero. @code{centered-quotient} returns the
integer @var{q} and @code{centered-remainder} returns the real number
@var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and
@math{-abs(@var{y}/2) <= @var{r} < abs(@var{y}/2)}. @code{centered/}
returns both @var{q} and @var{r} as two values, and is more efficient
than computing each separately.

Note that @code{centered-quotient} returns @math{@var{x}/@var{y}}
rounded to the nearest integer. When @math{@var{x}/@var{y}} lies
exactly half-way between two integers, the tie is broken according to
the sign of @var{y}. If @math{@var{y} > 0}, ties are rounded toward
positive infinity, otherwise they are rounded toward negative infinity.
This is a consequence of the requirement that
@math{-abs(@var{y}/2) <= @var{r} < abs(@var{y}/2)}.

These operators are equivalent to the R6RS operators
@code{div0}, @code{mod0}, and @code{div0-and-mod0}.

@lisp
(centered-quotient 123 10) @result{} 12
(centered-remainder 123 10) @result{} 3
(centered/ 123 10) @result{} 12 and 3
(centered/ 123 -10) @result{} -12 and 3
(centered/ -123 10) @result{} -12 and -3
(centered/ -123 -10) @result{} 12 and -3
(centered/ -123.2 -63.5) @result{} 2.0 and 3.8
(centered/ 16/3 -10/7) @result{} -4 and -8/21
@end lisp
@end deffn

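Solving @math{-abs(@var{y}/2) <= @var{x} - @var{q}*@var{y} < abs(@var{y}/2)} for @var{q} gives @math{floor(@var{x}/@var{y} + 1/2)} when @math{@var{y} > 0}, and @math{ceiling(@var{x}/@var{y} - 1/2)} when @math{@var{y} < 0}. This matches the tie-breaking rule described above. The following sketch applies that formula to exact arguments and compares it with the built-in @code{centered/}, including two exact ties. @code{centered-divide-sketch} and @code{centered-agrees?} are hypothetical names:

@lisp
;; Hypothetical sketch, for illustration only: centered division
;; of exact rationals.
(define (centered-divide-sketch x y)
  (let ((q (if (> y 0)
               (floor (+ (/ x y) 1/2))
               (ceiling (- (/ x y) 1/2)))))
    (values q (- x (* q y)))))

(define (centered-agrees? x y)
  (call-with-values (lambda () (centered-divide-sketch x y))
    (lambda (q1 r1)
      (call-with-values (lambda () (centered/ x y))
        (lambda (q2 r2)
          (and (= q1 q2) (= r1 r2)))))))

;; The last two pairs are exact ties: 15/10 and 15/-10.
(map centered-agrees? '(123 123 -123 16/3 15 15) '(10 -10 10 -10/7 10 -10))
;; @result{} (#t #t #t #t #t #t)
@end lisp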
@node Scientific
@subsubsection Scientific Functions

The following procedures accept any kind of number as arguments,
including complex numbers.

@rnindex sqrt
@c begin (texi-doc-string "guile" "sqrt")
@deffn {Scheme Procedure} sqrt z
Return the square root of @var{z}. Of the two possible roots
(positive and negative), the one with a positive real part is
returned or, if the real part is zero, the one with a positive
imaginary part. Thus,

@example
(sqrt 9.0) @result{} 3.0
(sqrt -9.0) @result{} 0.0+3.0i
(sqrt 1.0+1.0i) @result{} 1.09868411346781+0.455089860562227i
(sqrt -1.0-1.0i) @result{} 0.455089860562227-1.09868411346781i
@end example
@end deffn

@rnindex expt
@c begin (texi-doc-string "guile" "expt")
@deffn {Scheme Procedure} expt z1 z2
Return @var{z1} raised to the power of @var{z2}.
@end deffn

@rnindex sin
@c begin (texi-doc-string "guile" "sin")
@deffn {Scheme Procedure} sin z
Return the sine of @var{z}.
@end deffn

@rnindex cos
@c begin (texi-doc-string "guile" "cos")
@deffn {Scheme Procedure} cos z
Return the cosine of @var{z}.
@end deffn

@rnindex tan
@c begin (texi-doc-string "guile" "tan")
@deffn {Scheme Procedure} tan z
Return the tangent of @var{z}.
@end deffn

@rnindex asin
@c begin (texi-doc-string "guile" "asin")
@deffn {Scheme Procedure} asin z
Return the arcsine of @var{z}.
@end deffn

@rnindex acos
@c begin (texi-doc-string "guile" "acos")
@deffn {Scheme Procedure} acos z
Return the arccosine of @var{z}.
@end deffn

@rnindex atan
@c begin (texi-doc-string "guile" "atan")
@deffn {Scheme Procedure} atan z
@deffnx {Scheme Procedure} atan y x
Return the arctangent of @var{z}, or of @math{@var{y}/@var{x}}. With
two arguments, the signs of @var{y} and @var{x} determine the quadrant
of the result.
@end deffn

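The two-argument form of @code{atan} is not the same as applying the one-argument form to @math{@var{y}/@var{x}}. The signs of @var{y} and @var{x} select the quadrant of the result, much as the C library's @code{atan2} does. A short sketch follows; the local definition of @code{pi} is introduced here and is not a Guile binding:

@lisp
;; pi is defined locally for this example.
(define pi (* 4 (atan 1)))
(atan 1 1)        ; @result{} pi/4, about 0.785
(atan -1 -1)      ; @result{} -3pi/4, about -2.356
;; Dividing first loses the signs, and with them the quadrant.
(atan (/ -1 -1))  ; @result{} pi/4, about 0.785
(< (abs (- (atan -1 -1) (* -3/4 pi))) 1e-12)  ; @result{} #t
@end lisp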
@rnindex exp
@c begin (texi-doc-string "guile" "exp")
@deffn {Scheme Procedure} exp z
Return e to the power of @var{z}, where e is the base of natural
logarithms (2.71828@dots{}).
@end deffn

@rnindex log
@c begin (texi-doc-string "guile" "log")
@deffn {Scheme Procedure} log z
Return the natural logarithm of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "log10")
@deffn {Scheme Procedure} log10 z
Return the base 10 logarithm of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "sinh")
@deffn {Scheme Procedure} sinh z
Return the hyperbolic sine of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "cosh")
@deffn {Scheme Procedure} cosh z
Return the hyperbolic cosine of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "tanh")
@deffn {Scheme Procedure} tanh z
Return the hyperbolic tangent of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "asinh")
@deffn {Scheme Procedure} asinh z
Return the hyperbolic arcsine of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "acosh")
@deffn {Scheme Procedure} acosh z
Return the hyperbolic arccosine of @var{z}.
@end deffn

@c begin (texi-doc-string "guile" "atanh")
@deffn {Scheme Procedure} atanh z
Return the hyperbolic arctangent of @var{z}.
@end deffn


@node Bitwise Operations
@subsubsection Bitwise Operations

For the following bitwise functions, negative numbers are treated as
infinite precision two's complement values. For instance @math{-6} is bits
@math{@dots{}111010}, with infinitely many ones on the left. It can
be seen that adding 6 (binary 110) to such a bit pattern gives all
zeros.

@deffn {Scheme Procedure} logand n1 n2 @dots{}
@deffnx {C Function} scm_logand (n1, n2)
Return the bitwise @sc{and} of the integer arguments.

@lisp
(logand) @result{} -1
(logand 7) @result{} 7
(logand #b111 #b011 #b001) @result{} 1
@end lisp
@end deffn

@deffn {Scheme Procedure} logior n1 n2 @dots{}
@deffnx {C Function} scm_logior (n1, n2)
Return the bitwise @sc{or} of the integer arguments.

@lisp
(logior) @result{} 0
(logior 7) @result{} 7
(logior #b000 #b001 #b011) @result{} 3
@end lisp
@end deffn

@deffn {Scheme Procedure} logxor n1 n2 @dots{}
@deffnx {C Function} scm_logxor (n1, n2)
Return the bitwise @sc{xor} of the integer arguments. A bit is
set in the result if it is set in an odd number of arguments.

@lisp
(logxor) @result{} 0
(logxor 7) @result{} 7
(logxor #b000 #b001 #b011) @result{} 2
(logxor #b000 #b001 #b011 #b011) @result{} 1
@end lisp
@end deffn

@deffn {Scheme Procedure} lognot n
@deffnx {C Function} scm_lognot (n)
Return the integer which is the ones-complement of the integer
argument, i.e.@: each 0 bit is changed to 1 and each 1 bit to 0.

@lisp
(number->string (lognot #b10000000) 2)
 @result{} "-10000001"
(number->string (lognot #b0) 2)
 @result{} "-1"
@end lisp
@end deffn

@deffn {Scheme Procedure} logtest j k
@deffnx {C Function} scm_logtest (j, k)
Test whether @var{j} and @var{k} have any 1 bits in common. This is
equivalent to @code{(not (zero? (logand j k)))}, but without actually
calculating the @code{logand}, just testing for non-zero.

@lisp
(logtest #b0100 #b1011) @result{} #f
(logtest #b0100 #b0111) @result{} #t
@end lisp
@end deffn

@deffn {Scheme Procedure} logbit? index j
@deffnx {C Function} scm_logbit_p (index, j)
Test whether bit number @var{index} in @var{j} is set. @var{index}
starts from 0 for the least significant bit.

@lisp
(logbit? 0 #b1101) @result{} #t
(logbit? 1 #b1101) @result{} #f
(logbit? 2 #b1101) @result{} #t
(logbit? 3 #b1101) @result{} #t
(logbit? 4 #b1101) @result{} #f
@end lisp
@end deffn

@deffn {Scheme Procedure} ash n cnt
@deffnx {C Function} scm_ash (n, cnt)
Return @var{n} shifted left by @var{cnt} bits, or shifted right if
@var{cnt} is negative. This is an ``arithmetic'' shift.

This is effectively a multiplication by @m{2^{cnt}, 2^@var{cnt}}, and
when @var{cnt} is negative it's a division, rounded towards negative
infinity. (This is not the same rounding as @code{quotient} uses.)

With @var{n} viewed as an infinite precision two's complement,
@code{ash} means a left shift introducing zero bits, or a right shift
dropping bits.

@lisp
(number->string (ash #b1 3) 2) @result{} "1000"
(number->string (ash #b1010 -1) 2) @result{} "101"

;; -23 is bits ...11101001, -6 is bits ...111010
(ash -23 -2) @result{} -6
@end lisp
@end deffn

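The different rounding of a right shift and @code{quotient} only shows up for negative @var{n}. The calls below illustrate this and also show the multiplication reading of a left shift:

@lisp
;; A right shift rounds towards negative infinity, like floor,
;; whereas quotient rounds towards zero.
(ash -7 -1)          ; @result{} -4
(floor (/ -7 2))     ; @result{} -4
(quotient -7 2)      ; @result{} -3
;; A left shift by cnt is a multiplication by 2^cnt.
(= (ash 5 10) (* 5 (expt 2 10)))   ; @result{} #t
@end lisp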
@deffn {Scheme Procedure} logcount n
@deffnx {C Function} scm_logcount (n)
Return the number of bits in integer @var{n}. If @var{n} is
positive, the 1-bits in its binary representation are counted.
If negative, the 0-bits in its two's-complement binary
representation are counted. If zero, 0 is returned.

@lisp
(logcount #b10101010)
 @result{} 4
(logcount 0)
 @result{} 0
(logcount -2)
 @result{} 1
@end lisp
@end deffn

@deffn {Scheme Procedure} integer-length n
@deffnx {C Function} scm_integer_length (n)
Return the number of bits necessary to represent @var{n}.

For positive @var{n} this is how many bits to the most significant one
bit. For negative @var{n} it's how many bits to the most significant
zero bit in two's complement form.

@lisp
(integer-length #b10101010) @result{} 8
(integer-length #b1111) @result{} 4
(integer-length 0) @result{} 0
(integer-length -1) @result{} 0
(integer-length -256) @result{} 8
(integer-length -257) @result{} 9
@end lisp
@end deffn

@deffn {Scheme Procedure} integer-expt n k
@deffnx {C Function} scm_integer_expt (n, k)
Return @var{n} raised to the power @var{k}. @var{k} must be an exact
integer; @var{n} can be any number.

Negative @var{k} is supported, and results in @m{1/n^|k|, 1/n^abs(k)}
in the usual way. @math{@var{n}^0} is 1, as usual, including
@math{0^0}.

@lisp
(integer-expt 2 5) @result{} 32
(integer-expt -3 3) @result{} -27
(integer-expt 5 -3) @result{} 1/125
(integer-expt 0 0) @result{} 1
@end lisp
@end deffn

@deffn {Scheme Procedure} bit-extract n start end
@deffnx {C Function} scm_bit_extract (n, start, end)
Return the integer composed of the @var{start} (inclusive)
through @var{end} (exclusive) bits of @var{n}. The
@var{start}th bit becomes the 0-th bit in the result.

@lisp
(number->string (bit-extract #b1101101010 0 4) 2)
 @result{} "1010"
(number->string (bit-extract #b1101101010 4 9) 2)
 @result{} "10110"
@end lisp
@end deffn

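Using the two's complement view described at the start of this section, @code{bit-extract} can be expressed with @code{ash} and @code{logand}. Shift the wanted field down to bit 0, then mask off everything above its width. The following sketch illustrates that idea and is not Guile's implementation; @code{bit-extract-sketch} is a hypothetical name:

@lisp
;; Hypothetical sketch, for illustration only.  The mask has
;; (end - start) low one bits.
(define (bit-extract-sketch n start end)
  (logand (ash n (- start))
          (- (ash 1 (- end start)) 1)))

(number->string (bit-extract-sketch #b1101101010 0 4) 2) ; @result{} "1010"
(number->string (bit-extract-sketch #b1101101010 4 9) 2) ; @result{} "10110"
;; Negative n works too, since ash and logand use two's complement.
(bit-extract-sketch -1 0 8)                              ; @result{} 255
@end lisp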

@node Random
@subsubsection Random Number Generation

Pseudo-random numbers are generated from a random state object, which
can be created with @code{seed->random-state} or
@code{datum->random-state}. An external representation (i.e.@: one
which can be written with @code{write} and read with @code{read}) of a
random state object can be obtained via
@code{random-state->datum}. The @var{state} parameter to the
various functions below is optional; it defaults to the state object
in the @code{*random-state*} variable.

@deffn {Scheme Procedure} copy-random-state [state]
@deffnx {C Function} scm_copy_random_state (state)
Return a copy of the random state @var{state}.
@end deffn

@deffn {Scheme Procedure} random n [state]
@deffnx {C Function} scm_random (n, state)
Return a number in [0, @var{n}).

Accepts a positive integer or real @var{n} and returns a
number of the same type between zero (inclusive) and
@var{n} (exclusive). The values returned have a uniform
distribution.
@end deffn

@deffn {Scheme Procedure} random:exp [state]
@deffnx {C Function} scm_random_exp (state)
Return an inexact real in an exponential distribution with mean 1.
For an exponential distribution with mean @var{u} use @code{(*
@var{u} (random:exp))}.
@end deffn

@deffn {Scheme Procedure} random:hollow-sphere! vect [state]
@deffnx {C Function} scm_random_hollow_sphere_x (vect, state)
Fill @var{vect} with inexact real random numbers the sum of whose
squares is equal to 1.0. Thinking of @var{vect} as coordinates in
space of dimension @var{n} @math{=} @code{(vector-length @var{vect})},
the coordinates are uniformly distributed over the surface of the unit
@var{n}-sphere.
@end deffn

@deffn {Scheme Procedure} random:normal [state]
@deffnx {C Function} scm_random_normal (state)
Return an inexact real in a normal distribution. The distribution
used has mean 0 and standard deviation 1. For a normal distribution
with mean @var{m} and standard deviation @var{d} use @code{(+ @var{m}
(* @var{d} (random:normal)))}.
@end deffn

@deffn {Scheme Procedure} random:normal-vector! vect [state]
@deffnx {C Function} scm_random_normal_vector_x (vect, state)
Fill @var{vect} with inexact real random numbers that are
independent and standard normally distributed
(i.e., with mean 0 and variance 1).
@end deffn

@deffn {Scheme Procedure} random:solid-sphere! vect [state]
@deffnx {C Function} scm_random_solid_sphere_x (vect, state)
Fill @var{vect} with inexact real random numbers the sum of whose
squares is less than 1.0. Thinking of @var{vect} as coordinates in
space of dimension @var{n} @math{=} @code{(vector-length @var{vect})},
the coordinates are uniformly distributed within the unit
@var{n}-sphere.
@c FIXME: What does this mean, particularly the n-sphere part?
@end deffn

@deffn {Scheme Procedure} random:uniform [state]
@deffnx {C Function} scm_random_uniform (state)
Return a uniformly distributed inexact real random number in
[0,1).
@end deffn

@deffn {Scheme Procedure} seed->random-state seed
@deffnx {C Function} scm_seed_to_random_state (seed)
Return a new random state using @var{seed}.
@end deffn

@deffn {Scheme Procedure} datum->random-state datum
@deffnx {C Function} scm_datum_to_random_state (datum)
Return a new random state from @var{datum}, which should have been
obtained by @code{random-state->datum}.
@end deffn

@deffn {Scheme Procedure} random-state->datum state
@deffnx {C Function} scm_random_state_to_datum (state)
Return a datum representation of @var{state} that may be written out and
read back with the Scheme reader.
@end deffn

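Explicit random states make a sequence reproducible. The sketch below draws from two states seeded identically. It then draws from a state restored with @code{datum->random-state} and expects the restored state to continue exactly where the original left off. @code{draw-n} is a hypothetical helper, and the seeds 42 and 7 are arbitrary:

@lisp
;; Hypothetical helper, for illustration only: draw COUNT numbers
;; in [0, 1000) from STATE, in order.
(define (draw-n state count)
  (let loop ((i 0) (acc '()))
    (if (= i count)
        (reverse acc)
        (loop (+ i 1) (cons (random 1000 state) acc)))))

;; Two states built from the same seed yield the same sequence.
(equal? (draw-n (seed->random-state 42) 5)
        (draw-n (seed->random-state 42) 5))        ; @result{} #t

;; Save a state as a datum, restore it, and continue from both.
(define original (seed->random-state 7))
(draw-n original 3)                                ; advance the state
(define restored (datum->random-state (random-state->datum original)))
(equal? (draw-n original 5) (draw-n restored 5))   ; @result{} #t
@end lisp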
@defvar *random-state*
The global random state used by the above functions when the
@var{state} parameter is not given.
@end defvar

1693Note that the initial value of @code{*random-state*} is the same every
1694time Guile starts up. Therefore, if you don't pass a @var{state}
1695parameter to the above procedures, and you don't set
1696@code{*random-state*} to @code{(seed->random-state your-seed)}, where
1697@code{your-seed} is something that @emph{isn't} the same every time,
1698you'll get the same sequence of ``random'' numbers on every run.
1699
For example, unless the relevant source code has changed, @code{(map
random (cdr (iota 19)))}, if it is the first use of random numbers since
Guile started up, will always give:

@lisp
(map random (cdr (iota 19)))
@result{}
(0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12)
@end lisp

To use the time of day as the random seed, you can use code like this:

@lisp
(let ((time (gettimeofday)))
  (set! *random-state*
        (seed->random-state (+ (car time)
                               (cdr time)))))
@end lisp

@noindent
And then (depending on the time of day, of course):

@lisp
(map random (cdr (iota 19)))
@result{}
(0 0 1 0 2 4 5 4 5 5 9 3 10 1 8 3 14 17)
@end lisp

For security applications, such as password generation, you should use
more bits of seed. Otherwise an open source password generator could
be attacked by guessing the seed@dots{} but that's a subject for
another manual.


@node Characters
@subsection Characters
@tpindex Characters

In Scheme, there is a data type to describe a single character.

Defining what exactly a character @emph{is} can be more complicated
than it seems. Guile follows the advice of R6RS and uses The Unicode
Standard to help define what a character is. So, for Guile, a
character is anything in the Unicode Character Database.

@cindex code point
@cindex Unicode code point

The Unicode Character Database is basically a table of characters
indexed using integers called @dfn{code points}. Valid code points are
in the ranges 0 to @code{#xD7FF} inclusive or @code{#xE000} to
@code{#x10FFFF} inclusive, which is about 1.1 million code points.

@cindex designated code point
@cindex code point, designated

Any code point that has been assigned to a character or that has
otherwise been given a meaning by Unicode is called a @dfn{designated
code point}. Most of the designated code points, about 200,000 of them,
indicate characters, accents or other combining marks that modify
other characters, symbols, whitespace, and control characters. Some
are not characters but indicators that suggest how to format or
display neighboring characters.

@cindex reserved code point
@cindex code point, reserved

If a code point is not a designated code point (that is, if it has not
been assigned a meaning by The Unicode Standard), it is a @dfn{reserved
code point}, meaning that it is reserved for future use. Most of the
code points, about 800,000, are reserved code points.

By convention, a Unicode code point is written as
``U+XXXX'' where ``XXXX'' is a hexadecimal number. Please note that
this convenient notation is not valid Scheme syntax: Guile does not
interpret ``U+XXXX'' as a character.

In Scheme, a character literal is written as @code{#\@var{name}} where
@var{name} is the name of the character that you want. Printable
characters have their usual single character name; for example,
@code{#\a} is a lower case @code{a}.

Some of the code points are @dfn{combining characters} that are not meant
to be printed by themselves but are instead meant to modify the
appearance of the previous character. For combining characters, an
alternate form of the character literal is @code{#\} followed by
U+25CC (a small, dotted circle), followed by the combining character.
This allows the combining character to be drawn on the circle, not on
the backslash of @code{#\}.

Many of the non-printing characters, such as whitespace characters and
control characters, also have names.

The most commonly used non-printing characters have long character
names, described in the table below.

@multitable {@code{#\backspace}} {Preferred}
@headitem Character Name @tab Code Point
@item @code{#\nul} @tab U+0000
@item @code{#\alarm} @tab U+0007
@item @code{#\backspace} @tab U+0008
@item @code{#\tab} @tab U+0009
@item @code{#\linefeed} @tab U+000A
@item @code{#\newline} @tab U+000A
@item @code{#\vtab} @tab U+000B
@item @code{#\page} @tab U+000C
@item @code{#\return} @tab U+000D
@item @code{#\esc} @tab U+001B
@item @code{#\space} @tab U+0020
@item @code{#\delete} @tab U+007F
@end multitable

There are also short names for all of the ``C0 control characters''
(those with code points below 32), as well as for the space character.
The following table lists the short name for each of these characters.

@multitable @columnfractions .25 .25 .25 .25
@item 0 = @code{#\nul}
 @tab 1 = @code{#\soh}
 @tab 2 = @code{#\stx}
 @tab 3 = @code{#\etx}
@item 4 = @code{#\eot}
 @tab 5 = @code{#\enq}
 @tab 6 = @code{#\ack}
 @tab 7 = @code{#\bel}
@item 8 = @code{#\bs}
 @tab 9 = @code{#\ht}
 @tab 10 = @code{#\lf}
 @tab 11 = @code{#\vt}
@item 12 = @code{#\ff}
 @tab 13 = @code{#\cr}
 @tab 14 = @code{#\so}
 @tab 15 = @code{#\si}
@item 16 = @code{#\dle}
 @tab 17 = @code{#\dc1}
 @tab 18 = @code{#\dc2}
 @tab 19 = @code{#\dc3}
@item 20 = @code{#\dc4}
 @tab 21 = @code{#\nak}
 @tab 22 = @code{#\syn}
 @tab 23 = @code{#\etb}
@item 24 = @code{#\can}
 @tab 25 = @code{#\em}
 @tab 26 = @code{#\sub}
 @tab 27 = @code{#\esc}
@item 28 = @code{#\fs}
 @tab 29 = @code{#\gs}
 @tab 30 = @code{#\rs}
 @tab 31 = @code{#\us}
@item 32 = @code{#\sp}
@end multitable

The short name for the ``delete'' character (code point U+007F) is
@code{#\del}.

There are also a few alternative names left over for compatibility with
previous versions of Guile.

@multitable {@code{#\backspace}} {Preferred}
@headitem Alternate @tab Standard
@item @code{#\nl} @tab @code{#\newline}
@item @code{#\np} @tab @code{#\page}
@item @code{#\null} @tab @code{#\nul}
@end multitable

Characters may also be written using their code point values. They can
be written as an octal number, such as @code{#\10} for
@code{#\bs} or @code{#\177} for @code{#\del}.

If one prefers hex to octal, there is an additional syntax for character
escapes: @code{#\xHHHH}, that is, the letter @samp{x} followed by a
hexadecimal number of one to eight digits.
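
The following comparisons sketch how the named, octal and hexadecimal
spellings relate; each pair denotes the same character because the
code points agree.

@lisp
(char=? #\A #\x41)        ; @result{} #t
(char=? #\A #\101)        ; @result{} #t, since octal 101 is 65
(char=? #\space #\x20)    ; @result{} #t
(char=? #\del #\177)      ; @result{} #t
(char->integer #\x3bb)    ; @result{} 955
@end lisp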

@rnindex char?
@deffn {Scheme Procedure} char? x
@deffnx {C Function} scm_char_p (x)
Return @code{#t} iff @var{x} is a character, else @code{#f}.
@end deffn

Fundamentally, the character comparison operations below are
numeric comparisons of the characters' code points.

@rnindex char=?
@deffn {Scheme Procedure} char=? x y
Return @code{#t} iff the code point of @var{x} is equal to the code point
of @var{y}, else @code{#f}.
@end deffn

@rnindex char<?
@deffn {Scheme Procedure} char<? x y
Return @code{#t} iff the code point of @var{x} is less than the code
point of @var{y}, else @code{#f}.
@end deffn

@rnindex char<=?
@deffn {Scheme Procedure} char<=? x y
Return @code{#t} iff the code point of @var{x} is less than or equal
to the code point of @var{y}, else @code{#f}.
@end deffn

@rnindex char>?
@deffn {Scheme Procedure} char>? x y
Return @code{#t} iff the code point of @var{x} is greater than the
code point of @var{y}, else @code{#f}.
@end deffn

@rnindex char>=?
@deffn {Scheme Procedure} char>=? x y
Return @code{#t} iff the code point of @var{x} is greater than or
equal to the code point of @var{y}, else @code{#f}.
@end deffn
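
A few examples of these code point comparisons. Note that upper-case
ASCII letters have smaller code points than lower-case ones.

@lisp
(char<? #\a #\b)               ; @result{} #t
(char<? #\Z #\a)               ; @result{} #t, U+005A < U+0061
(char>=? #\a #\A)              ; @result{} #t
(char=? #\newline #\linefeed)  ; @result{} #t, both are U+000A
@end lisp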

@cindex case folding

Case-insensitive character comparisons use @emph{Unicode case
folding}. In case folding comparisons, if a character is lowercase
and has an uppercase form that can be expressed as a single character,
it is converted to uppercase before comparison. All other characters
undergo no conversion before the comparison occurs. This includes the
German sharp S (Eszett), which is not uppercased before comparison
because its uppercase form has two characters. Unicode case folding
is language independent: it uses rules that are generally true, but
it cannot cover all cases for all languages.

@rnindex char-ci=?
@deffn {Scheme Procedure} char-ci=? x y
Return @code{#t} iff the case-folded code point of @var{x} is the same
as the case-folded code point of @var{y}, else @code{#f}.
@end deffn

@rnindex char-ci<?
@deffn {Scheme Procedure} char-ci<? x y
Return @code{#t} iff the case-folded code point of @var{x} is less
than the case-folded code point of @var{y}, else @code{#f}.
@end deffn

@rnindex char-ci<=?
@deffn {Scheme Procedure} char-ci<=? x y
Return @code{#t} iff the case-folded code point of @var{x} is less
than or equal to the case-folded code point of @var{y}, else
@code{#f}.
@end deffn

@rnindex char-ci>?
@deffn {Scheme Procedure} char-ci>? x y
Return @code{#t} iff the case-folded code point of @var{x} is greater
than the case-folded code point of @var{y}, else @code{#f}.
@end deffn

@rnindex char-ci>=?
@deffn {Scheme Procedure} char-ci>=? x y
Return @code{#t} iff the case-folded code point of @var{x} is greater
than or equal to the case-folded code point of @var{y}, else
@code{#f}.
@end deffn
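
The following sketch contrasts the case-sensitive and case-insensitive
comparisons. @code{char<?} places @code{#\B} before @code{#\a}
because of their code points, whereas @code{char-ci<?} compares the
case-folded characters.

@lisp
(char=? #\a #\A)      ; @result{} #f
(char-ci=? #\a #\A)   ; @result{} #t
(char<? #\a #\B)      ; @result{} #f, U+0061 > U+0042
(char-ci<? #\a #\B)   ; @result{} #t
@end lisp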

@rnindex char-alphabetic?
@deffn {Scheme Procedure} char-alphabetic? chr
@deffnx {C Function} scm_char_alphabetic_p (chr)
Return @code{#t} iff @var{chr} is alphabetic, else @code{#f}.
@end deffn

@rnindex char-numeric?
@deffn {Scheme Procedure} char-numeric? chr
@deffnx {C Function} scm_char_numeric_p (chr)
Return @code{#t} iff @var{chr} is numeric, else @code{#f}.
@end deffn

@rnindex char-whitespace?
@deffn {Scheme Procedure} char-whitespace? chr
@deffnx {C Function} scm_char_whitespace_p (chr)
Return @code{#t} iff @var{chr} is whitespace, else @code{#f}.
@end deffn

@rnindex char-upper-case?
@deffn {Scheme Procedure} char-upper-case? chr
@deffnx {C Function} scm_char_upper_case_p (chr)
Return @code{#t} iff @var{chr} is uppercase, else @code{#f}.
@end deffn

@rnindex char-lower-case?
@deffn {Scheme Procedure} char-lower-case? chr
@deffnx {C Function} scm_char_lower_case_p (chr)
Return @code{#t} iff @var{chr} is lowercase, else @code{#f}.
@end deffn

@deffn {Scheme Procedure} char-is-both? chr
@deffnx {C Function} scm_char_is_both_p (chr)
Return @code{#t} iff @var{chr} is either uppercase or lowercase, else
@code{#f}.
@end deffn
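
Some illustrative calls to these predicates:

@lisp
(char-alphabetic? #\a)     ; @result{} #t
(char-numeric? #\7)        ; @result{} #t
(char-whitespace? #\tab)   ; @result{} #t
(char-upper-case? #\a)     ; @result{} #f
(char-lower-case? #\a)     ; @result{} #t
(char-is-both? #\7)        ; @result{} #f, digits have no case
@end lisp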

@deffn {Scheme Procedure} char-general-category chr
@deffnx {C Function} scm_char_general_category (chr)
Return a symbol giving the two-letter name of the Unicode general
category assigned to @var{chr} or @code{#f} if no named category is
assigned. The following table provides a list of category names along
with their meanings.

@multitable @columnfractions .1 .4 .1 .4
@item Lu
 @tab Uppercase letter
 @tab Pf
 @tab Final quote punctuation
@item Ll
 @tab Lowercase letter
 @tab Po
 @tab Other punctuation
@item Lt
 @tab Titlecase letter
 @tab Sm
 @tab Math symbol
@item Lm
 @tab Modifier letter
 @tab Sc
 @tab Currency symbol
@item Lo
 @tab Other letter
 @tab Sk
 @tab Modifier symbol
@item Mn
 @tab Non-spacing mark
 @tab So
 @tab Other symbol
@item Mc
 @tab Combining spacing mark
 @tab Zs
 @tab Space separator
@item Me
 @tab Enclosing mark
 @tab Zl
 @tab Line separator
@item Nd
 @tab Decimal digit number
 @tab Zp
 @tab Paragraph separator
@item Nl
 @tab Letter number
 @tab Cc
 @tab Control
@item No
 @tab Other number
 @tab Cf
 @tab Format
@item Pc
 @tab Connector punctuation
 @tab Cs
 @tab Surrogate
@item Pd
 @tab Dash punctuation
 @tab Co
 @tab Private use
@item Ps
 @tab Open punctuation
 @tab Cn
 @tab Unassigned
@item Pe
 @tab Close punctuation
 @tab
 @tab
@item Pi
 @tab Initial quote punctuation
 @tab
 @tab
@end multitable
@end deffn
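
For example, a few ASCII characters and their general categories:

@lisp
(char-general-category #\a)       ; @result{} Ll
(char-general-category #\A)       ; @result{} Lu
(char-general-category #\5)       ; @result{} Nd
(char-general-category #\space)   ; @result{} Zs
@end lisp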

@rnindex char->integer
@deffn {Scheme Procedure} char->integer chr
@deffnx {C Function} scm_char_to_integer (chr)
Return the code point of @var{chr}.
@end deffn

@rnindex integer->char
@deffn {Scheme Procedure} integer->char n
@deffnx {C Function} scm_integer_to_char (n)
Return the character that has code point @var{n}. The integer @var{n}
must be a valid code point. Valid code points are in the ranges 0 to
@code{#xD7FF} inclusive or @code{#xE000} to @code{#x10FFFF} inclusive.
@end deffn
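
For example, converting between characters and code points:

@lisp
(char->integer #\A)                        ; @result{} 65
(integer->char 97)                         ; @result{} #\a
(char->integer (integer->char #x10FFFF))   ; @result{} 1114111
;; (integer->char #xD800) is an error, because #xD800 lies
;; outside the valid code point ranges.
@end lisp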

@rnindex char-upcase
@deffn {Scheme Procedure} char-upcase chr
@deffnx {C Function} scm_char_upcase (chr)
Return the uppercase character version of @var{chr}.
@end deffn

@rnindex char-downcase
@deffn {Scheme Procedure} char-downcase chr
@deffnx {C Function} scm_char_downcase (chr)
Return the lowercase character version of @var{chr}.
@end deffn

@rnindex char-titlecase
@deffn {Scheme Procedure} char-titlecase chr
@deffnx {C Function} scm_char_titlecase (chr)
Return the titlecase character version of @var{chr} if one exists;
otherwise return the uppercase version.

For most characters these will be the same, but the Unicode Standard
includes certain digraph compatibility characters, such as @code{U+01F3}
``dz'', for which the uppercase and titlecase characters are different
(@code{U+01F1} ``DZ'' and @code{U+01F2} ``Dz'' in this case,
respectively).
@end deffn
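
The digraph example above can be checked directly. This sketch builds
the character from its code point with @code{integer->char}; the name
@code{dz} is arbitrary.

@lisp
(char-upcase #\a)     ; @result{} #\A
(char-downcase #\A)   ; @result{} #\a

;; U+01F3 has distinct uppercase (U+01F1) and titlecase (U+01F2) forms.
(define dz (integer->char #x01F3))
(char->integer (char-upcase dz))      ; @result{} 497, i.e. #x01F1
(char->integer (char-titlecase dz))   ; @result{} 498, i.e. #x01F2
@end lisp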

@tindex scm_t_wchar
@deftypefn {C Function} scm_t_wchar scm_c_upcase (scm_t_wchar @var{c})
@deftypefnx {C Function} scm_t_wchar scm_c_downcase (scm_t_wchar @var{c})
@deftypefnx {C Function} scm_t_wchar scm_c_titlecase (scm_t_wchar @var{c})

These C functions take an integer representation of a Unicode
code point and return the code point corresponding to its uppercase,
lowercase, and titlecase forms respectively. The type
@code{scm_t_wchar} is a signed, 32-bit integer.
@end deftypefn

@node Character Sets
@subsection Character Sets

The features described in this section correspond directly to SRFI-14.

The data type @dfn{charset} implements sets of characters
(@pxref{Characters}). Because the internal representation of
character sets is not visible to the user, many procedures for
handling them are provided.

Character sets can be created, extended, tested for the membership of
characters, and compared with other character sets.

@menu
* Character Set Predicates/Comparison::
* Iterating Over Character Sets:: Enumerate charset elements.
* Creating Character Sets:: Making new charsets.
* Querying Character Sets:: Test charsets for membership etc.
* Character-Set Algebra:: Calculating new charsets.
* Standard Character Sets:: Variables containing predefined charsets.
@end menu

@node Character Set Predicates/Comparison
@subsubsection Character Set Predicates/Comparison

Use these procedures for testing whether an object is a character set,
or whether several character sets are equal or subsets of each other.
@code{char-set-hash} can be used for calculating a hash value, for
example for use in hash tables or other fast lookup structures.

@deffn {Scheme Procedure} char-set? obj
@deffnx {C Function} scm_char_set_p (obj)
Return @code{#t} if @var{obj} is a character set, @code{#f}
otherwise.
@end deffn

@deffn {Scheme Procedure} char-set= . char_sets
@deffnx {C Function} scm_char_set_eq (char_sets)
Return @code{#t} if all given character sets are equal.
@end deffn

@deffn {Scheme Procedure} char-set<= . char_sets
@deffnx {C Function} scm_char_set_leq (char_sets)
Return @code{#t} if each character set in @var{char_sets} is a subset
of the character set that follows it.
@end deffn

@deffn {Scheme Procedure} char-set-hash cs [bound]
@deffnx {C Function} scm_char_set_hash (cs, bound)
Compute a hash value for the character set @var{cs}. If
@var{bound} is given and non-zero, it restricts the
returned value to the range 0 @dots{} @var{bound} - 1.
@end deffn
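
A small sketch of these predicates and of @code{char-set-hash}; the
variable @code{abc} is arbitrary.

@lisp
(define abc (string->char-set "abc"))

(char-set? abc)                                   ; @result{} #t
(char-set= abc (char-set #\c #\b #\a))            ; @result{} #t
;; Each set is a subset of the next one.
(char-set<= (char-set #\a) abc char-set:letter)   ; @result{} #t
;; Equal sets hash to the same value, here in the range 0 to 63.
(= (char-set-hash abc 64)
   (char-set-hash (char-set #\c #\b #\a) 64))     ; @result{} #t
@end lisp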

@c ===================================================================

@node Iterating Over Character Sets
@subsubsection Iterating Over Character Sets

Character set cursors are a means for iterating over the members of a
character set. After creating a character set cursor with
@code{char-set-cursor}, a cursor can be dereferenced with
@code{char-set-ref} and advanced to the next member with
@code{char-set-cursor-next}. Whether a cursor has moved past the last
element of the set can be checked with @code{end-of-char-set?}.

Additionally, mapping and (un-)folding procedures for character sets are
provided.

@deffn {Scheme Procedure} char-set-cursor cs
@deffnx {C Function} scm_char_set_cursor (cs)
Return a cursor into the character set @var{cs}.
@end deffn

@deffn {Scheme Procedure} char-set-ref cs cursor
@deffnx {C Function} scm_char_set_ref (cs, cursor)
Return the character at the current cursor position
@var{cursor} in the character set @var{cs}. It is an error to
pass a cursor for which @code{end-of-char-set?} returns true.
@end deffn

@deffn {Scheme Procedure} char-set-cursor-next cs cursor
@deffnx {C Function} scm_char_set_cursor_next (cs, cursor)
Advance the character set cursor @var{cursor} to the next
character in the character set @var{cs}. It is an error if the
cursor given satisfies @code{end-of-char-set?}.
@end deffn

@deffn {Scheme Procedure} end-of-char-set? cursor
@deffnx {C Function} scm_end_of_char_set_p (cursor)
Return @code{#t} if @var{cursor} has reached the end of a
character set, @code{#f} otherwise.
@end deffn
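
As an illustration of the cursor protocol, the following sketch walks a
set and collects its members into a list. The procedure
@code{members-via-cursor} is hypothetical; it does by hand what
@code{char-set->list} (@pxref{Querying Character Sets}) already provides.

@lisp
;; members-via-cursor (hypothetical helper): collect every member
;; of CS into a list by walking a cursor.
(define (members-via-cursor cs)
  (let loop ((cursor (char-set-cursor cs)) (acc '()))
    (if (end-of-char-set? cursor)
        (reverse acc)
        ;; Read the current character before advancing, since
        ;; char-set-cursor-next may update CURSOR in place.
        (let ((ch (char-set-ref cs cursor)))
          (loop (char-set-cursor-next cs cursor) (cons ch acc))))))

;; "hello" has four distinct characters: h, e, l and o.
(length (members-via-cursor (string->char-set "hello")))   ; @result{} 4
@end lisp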
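
@deffn {Scheme Procedure} char-set-fold kons knil cs
@deffnx {C Function} scm_char_set_fold (kons, knil, cs)
Fold the procedure @var{kons} over the character set @var{cs},
initializing it with @var{knil}. @var{kons} is called as
@code{(@var{kons} @var{char} @var{acc})} for each character in
@var{cs}, where @var{acc} is @var{knil} for the first call and the
previous result of @var{kons} thereafter; the final result is returned.
@end deffn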
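
For instance, folding can count or collect the members of a set. In
this sketch @var{kons} receives each character together with the value
accumulated so far.

@lisp
;; Count the distinct characters of "hello": h, e, l and o.
(char-set-fold (lambda (ch count) (+ count 1))
               0
               (string->char-set "hello"))   ; @result{} 4

;; Collect the members into a list, in no particular order.
(char-set-fold cons '() (string->char-set "abc"))
@end lisp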

@deffn {Scheme Procedure} char-set-unfold p f g seed [base_cs]
@deffnx {C Function} scm_char_set_unfold (p, f, g, seed, base_cs)
This is a fundamental constructor for character sets.
@itemize @bullet
@item @var{g} is used to generate a series of ``seed'' values
from the initial seed: @var{seed}, (@var{g} @var{seed}),
(@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{}
@item @var{p} tells us when to stop: generation ends as soon as @var{p}
returns true when applied to one of the seed values.
@item @var{f} maps each seed value to a character. These
characters are added to the base character set @var{base_cs} to
form the result; @var{base_cs} defaults to the empty set.
@end itemize
@end deffn
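
For example, the following sketch unfolds over the code points 97 to
101 to build the set of the characters @code{a} through @code{e}; the
name @code{a-to-e} is arbitrary.

@lisp
(define a-to-e
  (char-set-unfold (lambda (i) (> i 101))   ; p: stop once past #\e
                   integer->char            ; f: seed to character
                   (lambda (i) (+ i 1))     ; g: next seed
                   97))                     ; initial seed, #\a

(char-set= a-to-e (string->char-set "abcde"))   ; @result{} #t
@end lisp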

@deffn {Scheme Procedure} char-set-unfold! p f g seed base_cs
@deffnx {C Function} scm_char_set_unfold_x (p, f, g, seed, base_cs)
This is a fundamental constructor for character sets.
@itemize @bullet
@item @var{g} is used to generate a series of ``seed'' values
from the initial seed: @var{seed}, (@var{g} @var{seed}),
(@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{}
@item @var{p} tells us when to stop: generation ends as soon as @var{p}
returns true when applied to one of the seed values.
@item @var{f} maps each seed value to a character. These
characters are added to the character set @var{base_cs}, and
@var{base_cs} is returned.
@end itemize
@end deffn

@deffn {Scheme Procedure} char-set-for-each proc cs
@deffnx {C Function} scm_char_set_for_each (proc, cs)
Apply @var{proc} to every character in the character set
@var{cs}. The return value is not specified.
@end deffn

@deffn {Scheme Procedure} char-set-map proc cs
@deffnx {C Function} scm_char_set_map (proc, cs)
Map the procedure @var{proc} over every character in @var{cs}, and
return a new character set containing the results.
@var{proc} must be a character -> character procedure.
@end deffn
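
A short sketch of @code{char-set-map} and @code{char-set-for-each}. The
order in which @code{char-set-for-each} visits the characters is not
specified here, so no particular output is shown for it.

@lisp
(define lower (string->char-set "abc"))

(char-set= (char-set-map char-upcase lower)
           (string->char-set "ABC"))   ; @result{} #t

;; Called only for its side effect: display each member.
(char-set-for-each (lambda (ch) (display ch)) lower)
@end lisp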

@c ===================================================================

@node Creating Character Sets
@subsubsection Creating Character Sets

New character sets are produced with these procedures.

@deffn {Scheme Procedure} char-set-copy cs
@deffnx {C Function} scm_char_set_copy (cs)
Return a newly allocated character set containing all
characters in @var{cs}.
@end deffn

@deffn {Scheme Procedure} char-set . rest
@deffnx {C Function} scm_char_set (rest)
Return a character set containing all given characters.
@end deffn

@deffn {Scheme Procedure} list->char-set list [base_cs]
@deffnx {C Function} scm_list_to_char_set (list, base_cs)
Convert the character list @var{list} to a character set. If
the character set @var{base_cs} is given, the characters in this
set are also included in the result.
@end deffn

@deffn {Scheme Procedure} list->char-set! list base_cs
@deffnx {C Function} scm_list_to_char_set_x (list, base_cs)
Convert the character list @var{list} to a character set. The
characters are added to @var{base_cs} and @var{base_cs} is
returned.
@end deffn

@deffn {Scheme Procedure} string->char-set str [base_cs]
@deffnx {C Function} scm_string_to_char_set (str, base_cs)
Convert the string @var{str} to a character set. If the
character set @var{base_cs} is given, the characters in this
set are also included in the result.
@end deffn

@deffn {Scheme Procedure} string->char-set! str base_cs
@deffnx {C Function} scm_string_to_char_set_x (str, base_cs)
Convert the string @var{str} to a character set. The
characters from the string are added to @var{base_cs}, and
@var{base_cs} is returned.
@end deffn

@deffn {Scheme Procedure} char-set-filter pred cs [base_cs]
@deffnx {C Function} scm_char_set_filter (pred, cs, base_cs)
Return a character set containing every character from @var{cs}
that satisfies @var{pred}. If provided, the characters
from @var{base_cs} are added to the result.
@end deffn

@deffn {Scheme Procedure} char-set-filter! pred cs base_cs
@deffnx {C Function} scm_char_set_filter_x (pred, cs, base_cs)
Return a character set containing every character from @var{cs}
that satisfies @var{pred}. The characters are added to
@var{base_cs} and @var{base_cs} is returned.
@end deffn
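
The following sketch exercises a few of these constructors, including
the optional base set:

@lisp
(define greeting (string->char-set "Hello World"))

;; Keep only the upper-case members.
(char-set= (char-set-filter char-upper-case? greeting)
           (char-set #\H #\W))                         ; @result{} #t

;; The characters of the optional base set are merged into the result.
(char-set= (list->char-set (list #\x #\y) (char-set #\z))
           (string->char-set "xyz"))                   ; @result{} #t
@end lisp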

@deffn {Scheme Procedure} ucs-range->char-set lower upper [error [base_cs]]
@deffnx {C Function} scm_ucs_range_to_char_set (lower, upper, error, base_cs)
Return a character set containing all characters whose
character codes lie in the half-open range
[@var{lower},@var{upper}).

If @var{error} is a true value, an error is signalled if the
specified range contains characters which are not contained in
the implemented character range. If @var{error} is @code{#f},
these characters are silently left out of the resulting
character set.

The characters in @var{base_cs} are added to the result, if
given.
@end deffn
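
Because the range is half-open, @var{upper} itself is excluded:

@lisp
;; Code points 97, 98 and 99; 100 (#\d) is not included.
(char-set= (ucs-range->char-set 97 100)
           (string->char-set "abc"))   ; @result{} #t
@end lisp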

@deffn {Scheme Procedure} ucs-range->char-set! lower upper error base_cs
@deffnx {C Function} scm_ucs_range_to_char_set_x (lower, upper, error, base_cs)
Return a character set containing all characters whose
character codes lie in the half-open range
[@var{lower},@var{upper}).

If @var{error} is a true value, an error is signalled if the
specified range contains characters which are not contained in
the implemented character range. If @var{error} is @code{#f},
these characters are silently left out of the resulting
character set.

The characters are added to @var{base_cs} and @var{base_cs} is
returned.
@end deffn

@deffn {Scheme Procedure} ->char-set x
@deffnx {C Function} scm_to_char_set (x)
Coerce @var{x} into a char-set. @var{x} may be a string, character or
char-set. A string is converted to the set of its constituent
characters; a character is converted to a singleton set; a char-set is
returned as-is.
@end deffn
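
For example, each kind of argument accepted by @code{->char-set}:

@lisp
(char-set= (->char-set "aab") (char-set #\a #\b))   ; @result{} #t
(char-set= (->char-set #\x) (char-set #\x))         ; @result{} #t
;; A char-set argument is returned as-is.
(let ((cs (char-set #\q)))
  (eq? (->char-set cs) cs))                         ; @result{} #t
@end lisp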

@c ===================================================================

@node Querying Character Sets
@subsubsection Querying Character Sets

Access the elements and other information of a character set with these
procedures.

@deffn {Scheme Procedure} %char-set-dump cs
Return an association list containing debugging information
for @var{cs}. The association list has the following entries.
@table @code
@item char-set
The char-set itself
@item len
The number of groups of contiguous code points the char-set
contains
@item ranges
A list of lists where each sublist is a range of code points
and their associated characters
@end table
The return value of this function cannot be relied upon to be
consistent between versions of Guile and should not be used in code.
@end deffn

@deffn {Scheme Procedure} char-set-size cs
@deffnx {C Function} scm_char_set_size (cs)
Return the number of elements in character set @var{cs}.
@end deffn

@deffn {Scheme Procedure} char-set-count pred cs
@deffnx {C Function} scm_char_set_count (pred, cs)
Return the number of elements in the character set
@var{cs} that satisfy the predicate @var{pred}.
@end deffn

@deffn {Scheme Procedure} char-set->list cs
@deffnx {C Function} scm_char_set_to_list (cs)
Return a list containing the elements of the character set
@var{cs}.
@end deffn

@deffn {Scheme Procedure} char-set->string cs
@deffnx {C Function} scm_char_set_to_string (cs)
Return a string containing the elements of the character set
@var{cs}. The order in which the characters are placed in the
string is not defined.
@end deffn

@deffn {Scheme Procedure} char-set-contains? cs ch
@deffnx {C Function} scm_char_set_contains_p (cs, ch)
Return @code{#t} iff the character @var{ch} is contained in the
character set @var{cs}.
@end deffn

@deffn {Scheme Procedure} char-set-every pred cs
@deffnx {C Function} scm_char_set_every (pred, cs)
Return a true value if every character in the character set
@var{cs} satisfies the predicate @var{pred}.
@end deffn

@deffn {Scheme Procedure} char-set-any pred cs
@deffnx {C Function} scm_char_set_any (pred, cs)
Return a true value if any character in the character set
@var{cs} satisfies the predicate @var{pred}.
@end deffn
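
A few illustrative queries on a small set; the variable @code{vowels} is
arbitrary.

@lisp
(define vowels (string->char-set "aeiou"))

(char-set-size vowels)                        ; @result{} 5
(char-set-contains? vowels #\e)               ; @result{} #t
(char-set-contains? vowels #\y)               ; @result{} #f
;; Only #\a and #\e precede #\i.
(char-set-count (lambda (ch) (char<? ch #\i))
                vowels)                       ; @result{} 2
(char-set-every char-lower-case? vowels)      ; @result{} #t
(char-set-any char-upper-case? vowels)        ; @result{} #f
@end lisp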

@c ===================================================================

@node Character-Set Algebra
@subsubsection Character-Set Algebra

Character sets can be manipulated with the common set algebra operations,
such as union, complement, intersection, etc. All of these procedures
have side-effecting variants, whose names end in @code{!}, which modify
their character set argument(s).

@deffn {Scheme Procedure} char-set-adjoin cs . rest
@deffnx {C Function} scm_char_set_adjoin (cs, rest)
Add all character arguments to the first argument, which must
be a character set.
@end deffn

@deffn {Scheme Procedure} char-set-delete cs . rest
@deffnx {C Function} scm_char_set_delete (cs, rest)
Delete all character arguments from the first argument, which
must be a character set.
@end deffn

@deffn {Scheme Procedure} char-set-adjoin! cs . rest
@deffnx {C Function} scm_char_set_adjoin_x (cs, rest)
Add all character arguments to the first argument, which must
be a character set.
@end deffn

@deffn {Scheme Procedure} char-set-delete! cs . rest
@deffnx {C Function} scm_char_set_delete_x (cs, rest)
Delete all character arguments from the first argument, which
must be a character set.
@end deffn
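
For example, the procedures without @code{!} return new character sets
and leave their first argument unchanged:

@lisp
(define ab (char-set #\a #\b))

(char-set= (char-set-adjoin ab #\c #\d)
           (string->char-set "abcd"))   ; @result{} #t
(char-set= (char-set-delete ab #\a)
           (char-set #\b))              ; @result{} #t
;; AB itself still contains exactly #\a and #\b.
(char-set= ab (char-set #\a #\b))       ; @result{} #t
@end lisp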
2459
2460@deffn {Scheme Procedure} char-set-complement cs
2461@deffnx {C Function} scm_char_set_complement (cs)
2462Return the complement of the character set @var{cs}.
2463@end deffn
2464
be3eb25c
MG
2465Note that the complement of a character set is likely to contain many
2466reserved code points (code points that are not associated with
2467characters). It may be helpful to modify the output of
2468@code{char-set-complement} by computing its intersection with the set
of designated code points, @code{char-set:designated}.

@deffn {Scheme Procedure} char-set-union . rest
@deffnx {C Function} scm_char_set_union (rest)
Return the union of all argument character sets.
@end deffn

@deffn {Scheme Procedure} char-set-intersection . rest
@deffnx {C Function} scm_char_set_intersection (rest)
Return the intersection of all argument character sets.
@end deffn

@deffn {Scheme Procedure} char-set-difference cs1 . rest
@deffnx {C Function} scm_char_set_difference (cs1, rest)
Return the characters of @var{cs1} that are not in any of the other
argument character sets.
@end deffn

@deffn {Scheme Procedure} char-set-xor . rest
@deffnx {C Function} scm_char_set_xor (rest)
Return the exclusive-or of all argument character sets.
@end deffn

@deffn {Scheme Procedure} char-set-diff+intersection cs1 . rest
@deffnx {C Function} scm_char_set_diff_plus_intersection (cs1, rest)
Return two values, the difference and the intersection of all argument
character sets.
@end deffn

@deffn {Scheme Procedure} char-set-complement! cs
@deffnx {C Function} scm_char_set_complement_x (cs)
Return the complement of the character set @var{cs}. @var{cs} may be
modified to produce the result.
@end deffn

@deffn {Scheme Procedure} char-set-union! cs1 . rest
@deffnx {C Function} scm_char_set_union_x (cs1, rest)
Return the union of all argument character sets. @var{cs1} may be
modified to produce the result.
@end deffn

@deffn {Scheme Procedure} char-set-intersection! cs1 . rest
@deffnx {C Function} scm_char_set_intersection_x (cs1, rest)
Return the intersection of all argument character sets. @var{cs1} may
be modified to produce the result.
@end deffn

@deffn {Scheme Procedure} char-set-difference! cs1 . rest
@deffnx {C Function} scm_char_set_difference_x (cs1, rest)
Return the characters of @var{cs1} that are not in any of the other
argument character sets. @var{cs1} may be modified to produce the
result.
@end deffn

@deffn {Scheme Procedure} char-set-xor! cs1 . rest
@deffnx {C Function} scm_char_set_xor_x (cs1, rest)
Return the exclusive-or of all argument character sets. @var{cs1} may
be modified to produce the result.
@end deffn

@deffn {Scheme Procedure} char-set-diff+intersection! cs1 cs2 . rest
@deffnx {C Function} scm_char_set_diff_plus_intersection_x (cs1, cs2, rest)
Return two values, the difference and the intersection of all argument
character sets. @var{cs1} and @var{cs2} may be modified to produce the
results.
@end deffn
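
As an illustration, the following sketch applies the set operations above to two small character sets built with @code{string->char-set}. The names @code{letters} and @code{vowels} are arbitrary. Because the order in which a set's members would be listed is not something to rely on, each result is compared with @code{char-set=} instead of being printed; the expected values assume the standard SRFI-14 semantics.

@lisp
;; Two small, arbitrary character sets.
(define letters (string->char-set "abcde"))
(define vowels (string->char-set "aeiou"))

(char-set= (char-set-union letters vowels)
           (string->char-set "abcdeiou"))   ; @result{} #t
(char-set= (char-set-intersection letters vowels)
           (string->char-set "ae"))         ; @result{} #t
(char-set= (char-set-difference letters vowels)
           (string->char-set "bcd"))        ; @result{} #t
(char-set= (char-set-xor letters vowels)
           (string->char-set "bcdiou"))     ; @result{} #t

;; char-set-diff+intersection returns its two results as
;; multiple values; here we only count their members.
(call-with-values
    (lambda () (char-set-diff+intersection letters vowels))
  (lambda (diff common)
    (list (char-set-size diff) (char-set-size common))))
                                            ; @result{} (3 2)
@end lisp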

@c ===================================================================

@node Standard Character Sets
@subsubsection Standard Character Sets

To make the character set data type and procedures more useful, Guile
provides several predefined character set variables.

@cindex codeset
@cindex charset
@cindex locale

These character sets are locale-independent and are not recomputed
upon a @code{setlocale} call. They contain characters from the whole
range of Unicode code points. For instance, @code{char-set:letter}
contains about 94,000 characters.

@defvr {Scheme Variable} char-set:lower-case
@defvrx {C Variable} scm_char_set_lower_case
All lower-case characters.
@end defvr

@defvr {Scheme Variable} char-set:upper-case
@defvrx {C Variable} scm_char_set_upper_case
All upper-case characters.
@end defvr

@defvr {Scheme Variable} char-set:title-case
@defvrx {C Variable} scm_char_set_title_case
All single characters that function as if they were an upper-case
letter followed by a lower-case letter.
@end defvr

@defvr {Scheme Variable} char-set:letter
@defvrx {C Variable} scm_char_set_letter
All letters. This includes @code{char-set:lower-case},
@code{char-set:upper-case}, @code{char-set:title-case}, and many
letters that have no case at all. For example, Chinese and Japanese
characters typically have no concept of case.
@end defvr

@defvr {Scheme Variable} char-set:digit
@defvrx {C Variable} scm_char_set_digit
All digits.
@end defvr

@defvr {Scheme Variable} char-set:letter+digit
@defvrx {C Variable} scm_char_set_letter_and_digit
The union of @code{char-set:letter} and @code{char-set:digit}.
@end defvr

@defvr {Scheme Variable} char-set:graphic
@defvrx {C Variable} scm_char_set_graphic
All characters that would put ink on paper.
@end defvr

@defvr {Scheme Variable} char-set:printing
@defvrx {C Variable} scm_char_set_printing
The union of @code{char-set:graphic} and @code{char-set:whitespace}.
@end defvr

@defvr {Scheme Variable} char-set:whitespace
@defvrx {C Variable} scm_char_set_whitespace
All whitespace characters.
@end defvr

@defvr {Scheme Variable} char-set:blank
@defvrx {C Variable} scm_char_set_blank
All horizontal whitespace characters, which notably includes
@code{#\space} and @code{#\tab}.
@end defvr

@defvr {Scheme Variable} char-set:iso-control
@defvrx {C Variable} scm_char_set_iso_control
The ISO control characters are the C0 control characters (U+0000 to
U+001F), delete (U+007F), and the C1 control characters (U+0080 to
U+009F).
@end defvr

@defvr {Scheme Variable} char-set:punctuation
@defvrx {C Variable} scm_char_set_punctuation
All punctuation characters, such as the characters
@code{!"#%&'()*,-./:;?@@[\]_@{@}}.
@end defvr

@defvr {Scheme Variable} char-set:symbol
@defvrx {C Variable} scm_char_set_symbol
All symbol characters, such as the characters @code{$+<=>^`|~}.
@end defvr

@defvr {Scheme Variable} char-set:hex-digit
@defvrx {C Variable} scm_char_set_hex_digit
The hexadecimal digits @code{0123456789abcdefABCDEF}.
@end defvr

@defvr {Scheme Variable} char-set:ascii
@defvrx {C Variable} scm_char_set_ascii
All ASCII characters.
@end defvr

@defvr {Scheme Variable} char-set:empty
@defvrx {C Variable} scm_char_set_empty
The empty character set.
@end defvr

@defvr {Scheme Variable} char-set:designated
@defvrx {C Variable} scm_char_set_designated
This character set contains all designated code points. This includes
all the code points to which Unicode has assigned a character or other
meaning.
@end defvr

@defvr {Scheme Variable} char-set:full
@defvrx {C Variable} scm_char_set_full
This character set contains all possible code points. This includes
both designated and reserved code points.
@end defvr
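
The standard sets are most often used in membership tests and as arguments to the string procedures that accept a character set. The sketch below relies only on the variables documented above, on @code{char-set-contains?}, and on @code{string-every} (@pxref{String Predicates}); @code{hex-string?} is a hypothetical helper for illustration, not part of Guile.

@lisp
(char-set-contains? char-set:letter #\a)           ; @result{} #t
;; #\x3bb is U+03BB, GREEK SMALL LETTER LAMDA.
(char-set-contains? char-set:letter #\x3bb)        ; @result{} #t
(char-set-contains? char-set:ascii #\x3bb)         ; @result{} #f
(char-set-contains? char-set:whitespace #\newline) ; @result{} #t
;; char-set:blank only holds horizontal whitespace.
(char-set-contains? char-set:blank #\newline)      ; @result{} #f

;; Hypothetical helper: is S a non-empty string of hex digits?
(define (hex-string? s)
  (and (not (string-null? s))
       (string-every char-set:hex-digit s)))

(hex-string? "7fFF")  ; @result{} #t
(hex-string? "0x1F")  ; @result{} #f
@end lisp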

@node Strings
@subsection Strings
@tpindex Strings

Strings are fixed-length sequences of characters. They can be created
by calling constructor procedures, or they can be entered literally at
the @acronym{REPL} or in Scheme source files.

@c Guile provides a rich set of string processing procedures, because text
@c handling is very important when Guile is used as a scripting language.

Strings always record how many characters they contain, so, unlike C,
there is no special end-of-string character. This means that Scheme
strings can contain any character, even the NUL character
@samp{#\nul}, written @samp{\0} inside a string literal.

To use strings efficiently, you need to know a bit about how Guile
implements them. In Guile, a string consists of two parts, a head and
the actual memory where the characters are stored. When a string (or
a substring of it) is copied, only a new head is created; the memory
is usually not copied. The two heads start out pointing to the same
memory.

When one of these two strings is modified, as with @code{string-set!},
their common memory does get copied so that each string has its own
memory and modifying one does not accidentally modify the other as well.
Thus, Guile's strings are `copy on write'; the actual copying of their
memory is delayed until one string is written to.

This implementation makes functions like @code{substring} very
efficient in the common case where neither of the involved strings is
modified.

If you know that your strings will be modified right away, you
can use @code{substring/copy} instead of @code{substring}. This
function performs the copy immediately at the time of creation. This
is more efficient, especially in a multi-threaded program. Also,
@code{substring/copy} can avoid the problem that a short substring
holds on to the memory of a very large original string that could
otherwise be recycled.

If you want to avoid the copy altogether, so that modifications of one
string show up in the other, you can use @code{substring/shared}. The
strings created by this procedure are called @dfn{mutation sharing
substrings} since the substring and the original string share
modifications to each other.

If you want to prevent modifications, use @code{substring/read-only}.
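
A minimal sketch of the difference between copy-on-write and mutation-sharing substrings follows. It starts from @code{string-copy} on the assumption that string literals may be read-only in compiled code; the variable names are illustrative, and the results in the comments are those implied by the copying behavior described above.

@lisp
;; Start from a fresh string, since literals may be read-only.
(define str (string-copy "hello world"))

(define copied (substring str 0 5))         ; copy-on-write
(define shared (substring/shared str 0 5))  ; mutation-sharing

(string-set! str 0 #\j)

copied  ; @result{} "hello"  (unaffected by the change)
shared  ; @result{} "jello"  (sees the change)
@end lisp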

Guile provides all the procedures of SRFI-13 and a few more.

@menu
* String Syntax:: Read syntax for strings.
* String Predicates:: Testing strings for certain properties.
* String Constructors:: Creating new string objects.
* List/String Conversion:: Converting from/to lists of characters.
* String Selection:: Select portions from strings.
* String Modification:: Modify parts or whole strings.
* String Comparison:: Lexicographic ordering predicates.
* String Searching:: Searching in strings.
* Alphabetic Case Mapping:: Convert the alphabetic case of strings.
* Reversing and Appending Strings:: Appending strings to form a new string.
* Mapping Folding and Unfolding:: Iterating over strings.
* Miscellaneous String Operations:: Replicating, insertion, parsing, ...
* Conversion to/from C::
* String Internals:: The storage strategy for strings.
@end menu

@node String Syntax
@subsubsection String Read Syntax

@c In the following @code is used to get a good font in TeX etc, but
@c is omitted for Info format, so as not to risk any confusion over
@c whether surrounding ` ' quotes are part of the escape or are
@c special in a string (they're not).

The read syntax for strings is an arbitrarily long sequence of
characters enclosed in double quotes (@nicode{"}).

Backslash is an escape character and can be used to insert the following
special characters. @nicode{\"} and @nicode{\\} are R5RS standard, the
next seven are R6RS standard (notice that they follow C syntax), and the
remaining ones are Guile extensions.

@table @asis
@item @nicode{\\}
Backslash character.

@item @nicode{\"}
Double quote character (an unescaped @nicode{"} is otherwise the end
of the string).

@item @nicode{\a}
Bell character (ASCII 7).

@item @nicode{\f}
Formfeed character (ASCII 12).

@item @nicode{\n}
Newline character (ASCII 10).

@item @nicode{\r}
Carriage return character (ASCII 13).

@item @nicode{\t}
Tab character (ASCII 9).

@item @nicode{\v}
Vertical tab character (ASCII 11).

@item @nicode{\b}
Backspace character (ASCII 8).

@item @nicode{\0}
NUL character (ASCII 0).

@item @nicode{\} followed by newline (ASCII 10)
Nothing. This way if @nicode{\} is the last character in a line, the
string will continue with the first character from the next line,
without a line break.

If the @code{hungry-eol-escapes} reader option is enabled, which is not
the case by default, leading whitespace on the next line is discarded.

@lisp
"foo\
 bar"
@result{} "foo bar"
(read-enable 'hungry-eol-escapes)
"foo\
 bar"
@result{} "foobar"
@end lisp
@item @nicode{\xHH}
Character code given by two hexadecimal digits. For example,
@nicode{\x7f} for an ASCII DEL (127).

@item @nicode{\uHHHH}
Character code given by four hexadecimal digits. For example,
@nicode{\u0100} for a capital A with macron (U+0100).

@item @nicode{\UHHHHHH}
Character code given by six hexadecimal digits. For example,
@nicode{\U010402}.
@end table

@noindent
The following are examples of string literals:

@lisp
"foo"
"bar plonk"
"Hello World"
"\"Hi\", he said."
@end lisp

The three escape sequences @code{\xHH}, @code{\uHHHH} and @code{\UHHHHHH} were
chosen so as not to break compatibility with code written for previous versions
of Guile. The R6RS specification suggests a different, incompatible syntax for
hex escapes: @code{\xHHHH;}, that is, @code{\x} followed by one to eight
hexadecimal digits and terminated with a semicolon. If this escape format is
desired instead, it can be enabled with the reader option @code{r6rs-hex-escapes}.

@lisp
(read-enable 'r6rs-hex-escapes)
@end lisp

@xref{Scheme Read}, for more on reader options.

@node String Predicates
@subsubsection String Predicates

The following procedures can be used to check whether a given string
fulfills some specified property.

@rnindex string?
@deffn {Scheme Procedure} string? obj
@deffnx {C Function} scm_string_p (obj)
Return @code{#t} if @var{obj} is a string, else @code{#f}.
@end deffn

@deftypefn {C Function} int scm_is_string (SCM obj)
Return @code{1} if @var{obj} is a string, @code{0} otherwise.
@end deftypefn

@deffn {Scheme Procedure} string-null? str
@deffnx {C Function} scm_string_null_p (str)
Return @code{#t} if @var{str}'s length is zero, and
@code{#f} otherwise.
@lisp
(string-null? "") @result{} #t
(define y "foo")
(string-null? y) @result{} #f
@end lisp
@end deffn

@deffn {Scheme Procedure} string-any char_pred s [start [end]]
@deffnx {C Function} scm_string_any (char_pred, s, start, end)
Check if @var{char_pred} is true for any character in string @var{s}.

@var{char_pred} can be a character to check for any equal to that, or
a character set (@pxref{Character Sets}) to check for any in that set,
or a predicate procedure to call.

For a procedure, calls @code{(@var{char_pred} c)} are made
successively on the characters from @var{start} to @var{end}. If
@var{char_pred} returns true (i.e.@: non-@code{#f}), @code{string-any}
stops and that return value is the return from @code{string-any}. The
call on the last character (i.e.@: at @math{@var{end}-1}), if that
point is reached, is a tail call.

If there are no characters in @var{s} (i.e.@: @var{start} equals
@var{end}) then the return is @code{#f}.
@end deffn
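
For example, with each of the three kinds of @var{char_pred}; the strings are arbitrary, and the expected results follow from the description above.

@lisp
(string-any #\n "banana")             ; @result{} #t
(string-any char-set:digit "banana")  ; @result{} #f
(string-any char-numeric? "r2d2")     ; @result{} #t

;; A procedure's first true return value is the result.
(string-any (lambda (c) (and (char-upper-case? c) c))
            "helLo")                  ; @result{} #\L

;; Only characters from START (inclusive) to END (exclusive)
;; are examined; index 2 of "banana" is #\n.
(string-any #\a "banana" 2 3)         ; @result{} #f
@end lisp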

@deffn {Scheme Procedure} string-every char_pred s [start [end]]
@deffnx {C Function} scm_string_every (char_pred, s, start, end)
Check if @var{char_pred} is true for every character in string
@var{s}.

@var{char_pred} can be a character to check for every character equal
to that, or a character set (@pxref{Character Sets}) to check for
every character being in that set, or a predicate procedure to call.

For a procedure, calls @code{(@var{char_pred} c)} are made
successively on the characters from @var{start} to @var{end}. If
@var{char_pred} returns @code{#f}, @code{string-every} stops and
returns @code{#f}. The call on the last character (i.e.@: at
@math{@var{end}-1}), if that point is reached, is a tail call and the
return from that call is the return from @code{string-every}.

If there are no characters in @var{s} (i.e.@: @var{start} equals
@var{end}) then the return is @code{#t}.
@end deffn
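
The corresponding sketch for @code{string-every}, including the optional @var{start} argument and the value returned when @var{char_pred} is a procedure (the inputs are arbitrary):

@lisp
(string-every char-set:digit "2011")     ; @result{} #t
(string-every char-alphabetic? "abc1")   ; @result{} #f
(string-every char-numeric? "abc123" 3)  ; @result{} #t
(string-every #\a "")                    ; @result{} #t

;; The value of the last call is returned: 65 for #\A,
;; then 66 for #\B.
(string-every char->integer "AB")        ; @result{} 66
@end lisp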

@node String Constructors
@subsubsection String Constructors

The string constructor procedures create new string objects, possibly
initializing them with some specified character data.
@xref{String Selection}, for ways to create strings from existing
strings.

@c FIXME::martin: list->string belongs into `List/String Conversion'

@deffn {Scheme Procedure} string char@dots{}
@rnindex string
Return a newly allocated string made from the given character
arguments.

@example
(string #\x #\y #\z) @result{} "xyz"
(string) @result{} ""
@end example
@end deffn

@deffn {Scheme Procedure} list->string lst
@deffnx {C Function} scm_string (lst)
@rnindex list->string
Return a newly allocated string made from a list of characters.

@example
(list->string '(#\a #\b #\c)) @result{} "abc"
@end example
@end deffn

@deffn {Scheme Procedure} reverse-list->string lst
@deffnx {C Function} scm_reverse_list_to_string (lst)
Return a newly allocated string made from a list of characters, in
reverse order.

@example
(reverse-list->string '(#\a #\B #\c)) @result{} "cBa"
@end example
@end deffn

@rnindex make-string
@deffn {Scheme Procedure} make-string k [chr]
@deffnx {C Function} scm_make_string (k, chr)
Return a newly allocated string of
length @var{k}. If @var{chr} is given, then all elements of
the string are initialized to @var{chr}, otherwise the contents
of the string are unspecified.
@end deffn

@deftypefn {C Function} SCM scm_c_make_string (size_t len, SCM chr)
Like @code{scm_make_string}, but expects the length as a
@code{size_t}.
@end deftypefn

@deffn {Scheme Procedure} string-tabulate proc len
@deffnx {C Function} scm_string_tabulate (proc, len)
@var{proc} is a procedure that maps an integer index to a character.
Construct a string of size @var{len} by applying @var{proc} to each
index to produce the corresponding string element. The order in which
@var{proc} is applied to the indices is not specified.
@end deffn
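
A short sketch of @code{make-string} and @code{string-tabulate}; the index-to-character procedures are arbitrary examples. Because the order in which @code{string-tabulate} calls @var{proc} is unspecified, these procedures depend only on the index and have no side effects.

@lisp
(make-string 3 #\z)              ; @result{} "zzz"
(string-length (make-string 4))  ; @result{} 4

;; Map index I to the I-th lower-case letter.
(string-tabulate
 (lambda (i) (integer->char (+ i (char->integer #\a))))
 5)                              ; @result{} "abcde"

;; Alternate two characters by index parity.
(string-tabulate (lambda (i) (if (even? i) #\- #\+)) 4)
                                 ; @result{} "-+-+"
@end lisp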

@deffn {Scheme Procedure} string-join ls [delimiter [grammar]]
@deffnx {C Function} scm_string_join (ls, delimiter, grammar)
Append the strings in the string list @var{ls}, using the string
@var{delimiter} as a delimiter between the elements of @var{ls}.
@var{delimiter} defaults to a single space.
@var{grammar} is a symbol which specifies how the delimiter is
placed between the strings, and defaults to the symbol
@code{infix}.

@table @code
@item infix
Insert the separator between list elements. An empty list
will produce an empty string.
@item string-infix
Like @code{infix}, but will raise an error if given the empty
list.
@item suffix
Insert the separator after every list element.
@item prefix
Insert the separator before each list element.
@end table
@end deffn
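
The following calls sketch the effect of the default delimiter and of the @var{grammar} argument; the list contents are arbitrary.

@lisp
(string-join '("a" "b" "c"))                  ; @result{} "a b c"
(string-join '("a" "b" "c") ", ")             ; @result{} "a, b, c"
(string-join '("usr" "local" "bin") "/" 'prefix)
                                              ; @result{} "/usr/local/bin"
(string-join '("x" "y") ";" 'suffix)          ; @result{} "x;y;"
(string-join '() ", ")                        ; @result{} ""
@end lisp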

@node List/String Conversion
@subsubsection List/String conversion

When processing strings, it is often convenient to first convert them
into a list representation by using the procedure @code{string->list},
work with the resulting list, and then convert it back into a string.
These procedures are useful for similar tasks.

@rnindex string->list
@deffn {Scheme Procedure} string->list str [start [end]]
@deffnx {C Function} scm_substring_to_list (str, start, end)
@deffnx {C Function} scm_string_to_list (str)
Convert the string @var{str} into a list of characters. If @var{start}
and @var{end} are given, only the characters from @var{start}
(inclusive) to @var{end} (exclusive) are converted.
@end deffn
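
For instance, using the optional bounds, and following the convert, process, convert back pattern described at the start of this section (the last line reverses a string, which could also be written with @code{reverse-list->string}):

@lisp
(string->list "hello")      ; @result{} (#\h #\e #\l #\l #\o)
(string->list "hello" 1 3)  ; @result{} (#\e #\l)

;; Convert, work on the list, convert back: here, reverse.
(list->string (reverse (string->list "abc")))  ; @result{} "cba"
@end lisp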

@deffn {Scheme Procedure} string-split str chr
@deffnx {C Function} scm_string_split (str, chr)
Split the string @var{str} into a list of the substrings delimited
by appearances of the character @var{chr}. Note that an empty substring
between separator characters will result in an empty string in the
result list.

@lisp
(string-split "root:x:0:0:root:/root:/bin/bash" #\:)
@result{}
("root" "x" "0" "0" "root" "/root" "/bin/bash")

(string-split "::" #\:)
@result{}
("" "" "")

(string-split "" #\:)
@result{}
("")
@end lisp
@end deffn


@node String Selection
@subsubsection String Selection

Portions of strings can be extracted by these procedures.
@code{string-ref} returns individual characters whereas
@code{substring} can be used to extract substrings from longer strings.

@rnindex string-length
@deffn {Scheme Procedure} string-length string
@deffnx {C Function} scm_string_length (string)
Return the number of characters in @var{string}.
@end deffn

@deftypefn {C Function} size_t scm_c_string_length (SCM str)
Return the number of characters in @var{str} as a @code{size_t}.
@end deftypefn

@rnindex string-ref
@deffn {Scheme Procedure} string-ref str k
@deffnx {C Function} scm_string_ref (str, k)
Return character @var{k} of @var{str} using zero-origin
indexing. @var{k} must be a valid index of @var{str}.
@end deffn

@deftypefn {C Function} SCM scm_c_string_ref (SCM str, size_t k)
Return character @var{k} of @var{str} using zero-origin
indexing. @var{k} must be a valid index of @var{str}.
@end deftypefn

@rnindex string-copy
@deffn {Scheme Procedure} string-copy str [start [end]]
@deffnx {C Function} scm_substring_copy (str, start, end)
@deffnx {C Function} scm_string_copy (str)
Return a copy of the given string @var{str}, or of its characters from
@var{start} (inclusive) to @var{end} (exclusive) if those are given.

The returned string shares storage with @var{str} initially, but it is
copied as soon as one of the two strings is modified.
@end deffn

@rnindex substring
@deffn {Scheme Procedure} substring str start [end]
@deffnx {C Function} scm_substring (str, start, end)
Return a new string formed from the characters
of @var{str} beginning with index @var{start} (inclusive) and
ending with index @var{end} (exclusive). If @var{end} is omitted,
it defaults to the length of @var{str}.
@var{str} must be a string, @var{start} and @var{end} must be
exact integers satisfying:

0 <= @var{start} <= @var{end} <= @code{(string-length @var{str})}.

The returned string shares storage with @var{str} initially, but it is
copied as soon as one of the two strings is modified.
@end deffn

@deffn {Scheme Procedure} substring/shared str start [end]
@deffnx {C Function} scm_substring_shared (str, start, end)
Like @code{substring}, but the strings continue to share their storage
even if they are modified. Thus, modifications to @var{str} show up
in the new string, and vice versa.
@end deffn

@deffn {Scheme Procedure} substring/copy str start [end]
@deffnx {C Function} scm_substring_copy (str, start, end)
Like @code{substring}, but the storage for the new string is copied
immediately.
@end deffn

@deffn {Scheme Procedure} substring/read-only str start [end]
@deffnx {C Function} scm_substring_read_only (str, start, end)
Like @code{substring}, but the resulting string cannot be modified.
@end deffn

@deftypefn {C Function} SCM scm_c_substring (SCM str, size_t start, size_t end)
@deftypefnx {C Function} SCM scm_c_substring_shared (SCM str, size_t start, size_t end)
@deftypefnx {C Function} SCM scm_c_substring_copy (SCM str, size_t start, size_t end)
@deftypefnx {C Function} SCM scm_c_substring_read_only (SCM str, size_t start, size_t end)
Like @code{scm_substring}, etc. but the bounds are given as a @code{size_t}.
@end deftypefn

@deffn {Scheme Procedure} string-take s n
@deffnx {C Function} scm_string_take (s, n)
Return the first @var{n} characters of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-drop s n
@deffnx {C Function} scm_string_drop (s, n)
Return all but the first @var{n} characters of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-take-right s n
@deffnx {C Function} scm_string_take_right (s, n)
Return the last @var{n} characters of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-drop-right s n
@deffnx {C Function} scm_string_drop_right (s, n)
Return all but the last @var{n} characters of @var{s}.
@end deffn
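
A few illustrative calls on an arbitrary string. For valid arguments, @code{(string-take s n)} returns the same characters as @code{(substring s 0 n)}, and @code{(string-drop s n)} the same as @code{(substring s n)}.

@lisp
(string-take "guile" 2)        ; @result{} "gu"
(string-drop "guile" 2)        ; @result{} "ile"
(string-take-right "guile" 2)  ; @result{} "le"
(string-drop-right "guile" 2)  ; @result{} "gui"
@end lisp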

@deffn {Scheme Procedure} string-pad s len [chr [start [end]]]
@deffnx {Scheme Procedure} string-pad-right s len [chr [start [end]]]
@deffnx {C Function} scm_string_pad (s, len, chr, start, end)
@deffnx {C Function} scm_string_pad_right (s, len, chr, start, end)
Take characters @var{start} to @var{end} from the string @var{s} and
either pad with @var{chr} or truncate them to give @var{len}
characters. If @var{chr} is not given, spaces are used for padding.

@code{string-pad} pads or truncates on the left, so for example

@example
(string-pad "x" 3)     @result{} "  x"
(string-pad "abcde" 3) @result{} "cde"
@end example

@code{string-pad-right} pads or truncates on the right, so for example

@example
(string-pad-right "x" 3)     @result{} "x  "
(string-pad-right "abcde" 3) @result{} "abc"
@end example
@end deffn

@deffn {Scheme Procedure} string-trim s [char_pred [start [end]]]
@deffnx {Scheme Procedure} string-trim-right s [char_pred [start [end]]]
@deffnx {Scheme Procedure} string-trim-both s [char_pred [start [end]]]
@deffnx {C Function} scm_string_trim (s, char_pred, start, end)
@deffnx {C Function} scm_string_trim_right (s, char_pred, start, end)
@deffnx {C Function} scm_string_trim_both (s, char_pred, start, end)
Trim characters matching @var{char_pred} from the ends of @var{s}.

@code{string-trim} trims @var{char_pred} characters from the left
(start) of the string, @code{string-trim-right} trims them from the
right (end) of the string, @code{string-trim-both} trims from both
ends.

@var{char_pred} can be a character, a character set, or a predicate
procedure to call on each character. If @var{char_pred} is not given
the default is whitespace as per @code{char-set:whitespace}
(@pxref{Standard Character Sets}).

@example
(string-trim " x ") @result{} "x "
(string-trim-right "banana" #\a) @result{} "banan"
(string-trim-both ".,xy:;" char-set:punctuation)
  @result{} "xy"
(string-trim-both "xyzzy" (lambda (c)
                            (or (eqv? c #\x)
                                (eqv? c #\y))))
  @result{} "zz"
@end example
@end deffn

@node String Modification
@subsubsection String Modification

These procedures are for modifying strings in-place. This means that the
result of the operation is not a new string; instead, the original string's
memory representation is modified.

@rnindex string-set!
@deffn {Scheme Procedure} string-set! str k chr
@deffnx {C Function} scm_string_set_x (str, k, chr)
Store @var{chr} in element @var{k} of @var{str} and return
an unspecified value. @var{k} must be a valid index of
@var{str}.
@end deffn

@deftypefn {C Function} void scm_c_string_set_x (SCM str, size_t k, SCM chr)
Like @code{scm_string_set_x}, but the index is given as a @code{size_t}.
@end deftypefn

@rnindex string-fill!
@deffn {Scheme Procedure} string-fill! str chr [start [end]]
@deffnx {C Function} scm_substring_fill_x (str, chr, start, end)
@deffnx {C Function} scm_string_fill_x (str, chr)
Store @var{chr} in every element of @var{str} from @var{start}
(inclusive) to @var{end} (exclusive), or in every element if no
bounds are given, and return an unspecified value.
@end deffn
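
A small sketch of in-place modification. It uses a string from @code{make-string}, which is newly allocated and therefore mutable; the contents are arbitrary.

@lisp
;; make-string returns a newly allocated, mutable string.
(define s (make-string 5 #\-))

(string-fill! s #\* 1 4)  ; fill indices 1, 2 and 3
s                         ; @result{} "-***-"

(string-set! s 0 #\[)
(string-set! s 4 #\])
s                         ; @result{} "[***]"
@end lisp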

@deffn {Scheme Procedure} substring-fill! str start end fill
@deffnx {C Function} scm_substring_fill_x (str, start, end, fill)
Change every character in @var{str} from @var{start} (inclusive) to
@var{end} (exclusive) to @var{fill}.

@lisp
(define y (string-copy "abcdefg"))
(substring-fill! y 1 3 #\r)
y
@result{} "arrdefg"
@end lisp
@end deffn

@deffn {Scheme Procedure} substring-move! str1 start1 end1 str2 start2
@deffnx {C Function} scm_substring_move_x (str1, start1, end1, str2, start2)
Copy the substring of @var{str1} bounded by @var{start1} and @var{end1}
into @var{str2} beginning at position @var{start2}.
@var{str1} and @var{str2} can be the same string.
@end deffn

@deffn {Scheme Procedure} string-copy! target tstart s [start [end]]
@deffnx {C Function} scm_string_copy_x (target, tstart, s, start, end)
Copy the sequence of characters from index range [@var{start},
@var{end}) in string @var{s} to string @var{target}, beginning
at index @var{tstart}. The characters are copied left-to-right
or right-to-left as needed; the copy is guaranteed to work,
even if @var{target} and @var{s} are the same string. It is an
error if the copy operation runs off the end of the target
string.
@end deffn
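
A sketch of @code{string-copy!}, first into a different string and then within a single string, shifting characters to the left. The example starts from @code{string-copy} so that each target is mutable; the strings are arbitrary.

@lisp
;; string-copy gives a mutable target.
(define buf (string-copy "hello world"))
(string-copy! buf 6 "WORLD")
buf                    ; @result{} "hello WORLD"

;; Within one string: copy "cdef" (indices 2 to 6)
;; to the front.
(define t (string-copy "abcdef"))
(string-copy! t 0 t 2 6)
t                      ; @result{} "cdefef"
@end lisp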
3213
07d83abe
MV
3214
3215@node String Comparison
3216@subsubsection String Comparison
3217
3218The procedures in this section are similar to the character ordering
3219predicates (@pxref{Characters}), but are defined on character sequences.
07d83abe 3220
5676b4fa 3221The first set is specified in R5RS and has names that end in @code{?}.
28cc8dac 3222The second set is specified in SRFI-13 and the names have not ending
67af975c 3223@code{?}.
28cc8dac
MG
3224
3225The predicates ending in @code{-ci} ignore the character case
3226when comparing strings. For now, case-insensitive comparison is done
3227using the R5RS rules, where every lower-case character that has a
3228single character upper-case form is converted to uppercase before
3229comparison. See @xref{Text Collation, the @code{(ice-9
b89c4943 3230i18n)} module}, for locale-dependent string comparison.
07d83abe
MV
3231
3232@rnindex string=?
3323ec06
NJ
3233@deffn {Scheme Procedure} string=? [s1 [s2 . rest]]
3234@deffnx {C Function} scm_i_string_equal_p (s1, s2, rest)
07d83abe
MV
3235Lexicographic equality predicate; return @code{#t} if the two
3236strings are the same length and contain the same characters in
3237the same positions, otherwise return @code{#f}.
3238
3239The procedure @code{string-ci=?} treats upper and lower case
3240letters as though they were the same character, but
3241@code{string=?} treats upper and lower case as distinct
3242characters.
3243@end deffn
3244
3245@rnindex string<?
3323ec06
NJ
3246@deffn {Scheme Procedure} string<? [s1 [s2 . rest]]
3247@deffnx {C Function} scm_i_string_less_p (s1, s2, rest)
07d83abe
MV
3248Lexicographic ordering predicate; return @code{#t} if @var{s1}
3249is lexicographically less than @var{s2}.
3250@end deffn
3251
3252@rnindex string<=?
3323ec06
NJ
3253@deffn {Scheme Procedure} string<=? [s1 [s2 . rest]]
3254@deffnx {C Function} scm_i_string_leq_p (s1, s2, rest)
07d83abe
MV
3255Lexicographic ordering predicate; return @code{#t} if @var{s1}
3256is lexicographically less than or equal to @var{s2}.
3257@end deffn
3258
3259@rnindex string>?
3323ec06
NJ
3260@deffn {Scheme Procedure} string>? [s1 [s2 . rest]]
3261@deffnx {C Function} scm_i_string_gr_p (s1, s2, rest)
07d83abe
MV
3262Lexicographic ordering predicate; return @code{#t} if @var{s1}
3263is lexicographically greater than @var{s2}.
3264@end deffn
3265
3266@rnindex string>=?
3323ec06
NJ
3267@deffn {Scheme Procedure} string>=? [s1 [s2 . rest]]
3268@deffnx {C Function} scm_i_string_geq_p (s1, s2, rest)
07d83abe
MV
3269Lexicographic ordering predicate; return @code{#t} if @var{s1}
3270is lexicographically greater than or equal to @var{s2}.
3271@end deffn
3272
3273@rnindex string-ci=?
3323ec06
NJ
3274@deffn {Scheme Procedure} string-ci=? [s1 [s2 . rest]]
3275@deffnx {C Function} scm_i_string_ci_equal_p (s1, s2, rest)
07d83abe
MV
3276Case-insensitive string equality predicate; return @code{#t} if
3277the two strings are the same length and their component
3278characters match (ignoring case) at each position; otherwise
3279return @code{#f}.
3280@end deffn
3281
5676b4fa 3282@rnindex string-ci<?
3323ec06
NJ
3283@deffn {Scheme Procedure} string-ci<? [s1 [s2 . rest]]
3284@deffnx {C Function} scm_i_string_ci_less_p (s1, s2, rest)
07d83abe
MV
3285Case insensitive lexicographic ordering predicate; return
3286@code{#t} if @var{s1} is lexicographically less than @var{s2}
3287regardless of case.
3288@end deffn
3289
@rnindex string-ci<=?
@deffn {Scheme Procedure} string-ci<=? [s1 [s2 . rest]]
@deffnx {C Function} scm_i_string_ci_leq_p (s1, s2, rest)
Case-insensitive lexicographic ordering predicate; return
@code{#t} if @var{s1} is lexicographically less than or equal
to @var{s2} regardless of case.
@end deffn

@rnindex string-ci>?
@deffn {Scheme Procedure} string-ci>? [s1 [s2 . rest]]
@deffnx {C Function} scm_i_string_ci_gr_p (s1, s2, rest)
Case-insensitive lexicographic ordering predicate; return
@code{#t} if @var{s1} is lexicographically greater than
@var{s2} regardless of case.
@end deffn

@rnindex string-ci>=?
@deffn {Scheme Procedure} string-ci>=? [s1 [s2 . rest]]
@deffnx {C Function} scm_i_string_ci_geq_p (s1, s2, rest)
Case-insensitive lexicographic ordering predicate; return
@code{#t} if @var{s1} is lexicographically greater than or
equal to @var{s2} regardless of case.
@end deffn
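The following sketch, assuming an interactive Guile session, illustrates these predicates.  Characters are ordered by code point, so an upper-case ASCII letter sorts before every lower-case one; with more than two arguments, each adjacent pair of strings must satisfy the ordering.

@lisp
;; #\Z (code point 90) precedes #\a (code point 97).
(string<? "Zebra" "apple")       ; @result{} #t
(string-ci<? "Zebra" "apple")    ; @result{} #f
(string-ci=? "Guile" "GUILE")    ; @result{} #t

;; With more than two arguments, each adjacent pair must be ordered.
(string<? "ant" "bee" "cat")     ; @result{} #t
(string<? "ant" "cat" "bee")     ; @result{} #f
@end lisp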

@deffn {Scheme Procedure} string-compare s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_compare (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2)
Apply @var{proc_lt}, @var{proc_eq}, or @var{proc_gt} to the
mismatch index, depending upon whether @var{s1} is less than,
equal to, or greater than @var{s2}, and return the result of
that call.  The mismatch index is the largest index @var{i} such
that for every 0 <= @var{j} < @var{i}, @var{s1}[@var{j}] =
@var{s2}[@var{j}]; that is, @var{i} is the first position that
does not match.  The optional start/end indices restrict the
comparison to the indicated substrings.
@end deffn

@deffn {Scheme Procedure} string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_compare_ci (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2)
Apply @var{proc_lt}, @var{proc_eq}, or @var{proc_gt} to the
mismatch index, depending upon whether @var{s1} is less than,
equal to, or greater than @var{s2} when case is ignored, and
return the result of that call.  The mismatch index is the
largest index @var{i} such that for every 0 <= @var{j} <
@var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] ignoring case;
that is, @var{i} is the first position where the lowercased
letters do not match.
@end deffn
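As a sketch of how the three procedures are used, the hypothetical helper @code{describe} below tags its result with the procedure that was called and the mismatch index that procedure received:

@lisp
;; describe is a hypothetical helper, not part of Guile.
(define (describe s1 s2)
  (string-compare s1 s2
                  (lambda (i) (list 'less i))
                  (lambda (i) (list 'equal i))
                  (lambda (i) (list 'greater i))))

(describe "abcd" "abxy")   ; @result{} (less 2)
(describe "abxy" "abcd")   ; @result{} (greater 2)
(describe "ab" "abc")      ; @result{} (less 2)
(describe "abc" "abc")     ; @result{} (equal 3)
@end lisp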

@deffn {Scheme Procedure} string= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_eq (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} and @var{s2} are not equal, a true
value otherwise.
@end deffn

@deffn {Scheme Procedure} string<> s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_neq (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} and @var{s2} are equal, a true
value otherwise.
@end deffn

@deffn {Scheme Procedure} string< s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_lt (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a
true value otherwise.
@end deffn

@deffn {Scheme Procedure} string> s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_gt (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a
true value otherwise.
@end deffn

@deffn {Scheme Procedure} string<= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_le (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is greater than @var{s2}, a true
value otherwise.
@end deffn

@deffn {Scheme Procedure} string>= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ge (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is less than @var{s2}, a true value
otherwise.
@end deffn

@deffn {Scheme Procedure} string-ci= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_eq (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} and @var{s2} are not equal, a true
value otherwise.  The character comparison is done
case-insensitively.
@end deffn

@deffn {Scheme Procedure} string-ci<> s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_neq (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} and @var{s2} are equal, a true
value otherwise.  The character comparison is done
case-insensitively.
@end deffn

@deffn {Scheme Procedure} string-ci< s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_lt (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a
true value otherwise.  The character comparison is done
case-insensitively.
@end deffn

@deffn {Scheme Procedure} string-ci> s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_gt (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a
true value otherwise.  The character comparison is done
case-insensitively.
@end deffn

@deffn {Scheme Procedure} string-ci<= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_le (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is greater than @var{s2}, a true
value otherwise.  The character comparison is done
case-insensitively.
@end deffn

@deffn {Scheme Procedure} string-ci>= s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_ci_ge (s1, s2, start1, end1, start2, end2)
Return @code{#f} if @var{s1} is less than @var{s2}, a true value
otherwise.  The character comparison is done
case-insensitively.
@end deffn
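These procedures accept index ranges and return some true value that need not be @code{#t}, so the sketch below, assuming a Guile REPL, tests their results with @code{if} rather than comparing against @code{#t}:

@lisp
;; Compare "bar" (indices 3 to 6 of "foobar") with
;; "bar" (indices 2 to 5 of "xxbar").
(if (string= "foobar" "xxbar" 3 6 2 5) 'same 'different)
;; @result{} same

(if (string< "apple" "apricot") 'before 'not-before)
;; @result{} before

;; The strings are equal once case is ignored, so string-ci<> returns #f.
(if (string-ci<> "Guile" "GUILE") 'differ 'match)
;; @result{} match
@end lisp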

@deffn {Scheme Procedure} string-hash s [bound [start [end]]]
@deffnx {C Function} scm_substring_hash (s, bound, start, end)
Compute a hash value for @var{s}.  The optional argument @var{bound} is
a non-negative exact integer specifying the range of the hash function.
A positive value restricts the return value to the range [0,@var{bound}).
The optional arguments @var{start} and @var{end} delimit the portion of
@var{s} that is hashed.
@end deffn

@deffn {Scheme Procedure} string-hash-ci s [bound [start [end]]]
@deffnx {C Function} scm_substring_hash_ci (s, bound, start, end)
Compute a hash value for @var{s}, ignoring character case.  The optional
argument @var{bound} is a non-negative exact integer specifying the range
of the hash function.  A positive value restricts the return value to the
range [0,@var{bound}).  The optional arguments @var{start} and @var{end}
delimit the portion of @var{s} that is hashed.
@end deffn
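The hash values themselves are implementation details, so this sketch only checks properties that follow from the descriptions above:

@lisp
;; With bound 100, the hash lies in [0, 100).
(define h (string-hash "hello" 100))
(and (exact? h) (<= 0 h) (< h 100))                          ; @result{} #t

;; string-hash-ci ignores case.
(= (string-hash-ci "Hello" 100) (string-hash-ci "hELLO" 100)) ; @result{} #t
@end lisp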

Because the same visual appearance of an abstract Unicode character can
be obtained via multiple sequences of Unicode characters, even the
case-insensitive string comparison functions described above may return
@code{#f} when presented with strings containing different
representations of the same character. For example, the Unicode
character ``LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE'' can be
represented with a single character (U+1E69) or by the character ``LATIN
SMALL LETTER S'' (U+0073) followed by the combining marks ``COMBINING
DOT BELOW'' (U+0323) and ``COMBINING DOT ABOVE'' (U+0307).

For this reason, it is often desirable to ensure that the strings
to be compared use a mutually consistent representation for every
character. The Unicode standard defines two methods of normalizing the
contents of strings: decomposition, which breaks composite characters
into a set of constituent characters with an ordering defined by the
Unicode Standard; and composition, which performs the converse.

There are two decomposition operations. ``Canonical decomposition''
produces character sequences that share the same visual appearance as
the original characters, while ``compatibility decomposition'' produces
ones whose visual appearances may differ from the originals but which
represent the same abstract character.

These operations are encapsulated in the following set of normalization
forms:

@table @dfn
@item NFD
Characters are decomposed to their canonical forms.

@item NFKD
Characters are decomposed to their compatibility forms.

@item NFC
Characters are decomposed to their canonical forms, then composed.

@item NFKC
Characters are decomposed to their compatibility forms, then composed.

@end table

The functions below put their arguments into one of the forms described
above.

@deffn {Scheme Procedure} string-normalize-nfd s
@deffnx {C Function} scm_string_normalize_nfd (s)
Return the @code{NFD} normalized form of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-normalize-nfkd s
@deffnx {C Function} scm_string_normalize_nfkd (s)
Return the @code{NFKD} normalized form of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-normalize-nfc s
@deffnx {C Function} scm_string_normalize_nfc (s)
Return the @code{NFC} normalized form of @var{s}.
@end deffn

@deffn {Scheme Procedure} string-normalize-nfkc s
@deffnx {C Function} scm_string_normalize_nfkc (s)
Return the @code{NFKC} normalized form of @var{s}.
@end deffn
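The following sketch builds both representations of the character from the example above and compares them before and after normalization; @code{composed} and @code{decomposed} are only illustrative names:

@lisp
;; U+1E69 LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
(define composed (string (integer->char #x1E69)))
;; U+0073, then COMBINING DOT BELOW, then COMBINING DOT ABOVE
(define decomposed
  (string #\s (integer->char #x0323) (integer->char #x0307)))

(string=? composed decomposed)                         ; @result{} #f
(string=? (string-normalize-nfc decomposed) composed)  ; @result{} #t
(string=? (string-normalize-nfd composed) decomposed)  ; @result{} #t
(string-length (string-normalize-nfd composed))        ; @result{} 3
@end lisp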

@node String Searching
@subsubsection String Searching

@deffn {Scheme Procedure} string-index s char_pred [start [end]]
@deffnx {C Function} scm_string_index (s, char_pred, start, end)
Search through the string @var{s} from left to right, returning
the index of the first occurrence of a character which

@itemize @bullet
@item
equals @var{char_pred}, if it is a character,

@item
satisfies the predicate @var{char_pred}, if it is a procedure,

@item
is in the set @var{char_pred}, if it is a character set.
@end itemize

Return @code{#f} if no match is found.
@end deffn

@deffn {Scheme Procedure} string-rindex s char_pred [start [end]]
@deffnx {C Function} scm_string_rindex (s, char_pred, start, end)
Search through the string @var{s} from right to left, returning
the index of the last occurrence of a character which

@itemize @bullet
@item
equals @var{char_pred}, if it is a character,

@item
satisfies the predicate @var{char_pred}, if it is a procedure,

@item
is in the set @var{char_pred}, if it is a character set.
@end itemize

Return @code{#f} if no match is found.
@end deffn
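A few calls showing the three kinds of @var{char_pred} and an explicit @var{start} index, assuming a Guile REPL (the name @code{s} is illustrative):

@lisp
(define s "hello world")

(string-index s #\o)                  ; @result{} 4
(string-index s char-whitespace?)     ; @result{} 5
(string-index s (char-set #\w #\r))   ; @result{} 6
(string-index s #\o 5)                ; search from index 5: @result{} 7
(string-rindex s #\o)                 ; @result{} 7
(string-index s #\z)                  ; @result{} #f
@end lisp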

@deffn {Scheme Procedure} string-prefix-length s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_prefix_length (s1, s2, start1, end1, start2, end2)
Return the length of the longest common prefix of the two
strings.
@end deffn

@deffn {Scheme Procedure} string-prefix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_prefix_length_ci (s1, s2, start1, end1, start2, end2)
Return the length of the longest common prefix of the two
strings, ignoring character case.
@end deffn

@deffn {Scheme Procedure} string-suffix-length s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_suffix_length (s1, s2, start1, end1, start2, end2)
Return the length of the longest common suffix of the two
strings.
@end deffn

@deffn {Scheme Procedure} string-suffix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_suffix_length_ci (s1, s2, start1, end1, start2, end2)
Return the length of the longest common suffix of the two
strings, ignoring character case.
@end deffn

@deffn {Scheme Procedure} string-prefix? s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_prefix_p (s1, s2, start1, end1, start2, end2)
Is @var{s1} a prefix of @var{s2}?
@end deffn

@deffn {Scheme Procedure} string-prefix-ci? s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_prefix_ci_p (s1, s2, start1, end1, start2, end2)
Is @var{s1} a prefix of @var{s2}, ignoring character case?
@end deffn

@deffn {Scheme Procedure} string-suffix? s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_suffix_p (s1, s2, start1, end1, start2, end2)
Is @var{s1} a suffix of @var{s2}?
@end deffn

@deffn {Scheme Procedure} string-suffix-ci? s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_suffix_ci_p (s1, s2, start1, end1, start2, end2)
Is @var{s1} a suffix of @var{s2}, ignoring character case?
@end deffn
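A short sketch of the prefix and suffix procedures; the expected results follow directly from the definitions above:

@lisp
(string-prefix-length "prefix-foo" "prefix-bar")  ; @result{} 7
(string-suffix-length "walking" "talking")        ; @result{} 6
(string-prefix-length-ci "GUILE" "guile-2.0")     ; @result{} 5
(string-prefix? "ab" "abc")                       ; @result{} #t
(string-prefix? "abc" "ab")                       ; @result{} #f
(string-suffix-ci? "ING" "walking")               ; @result{} #t
@end lisp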

@deffn {Scheme Procedure} string-index-right s char_pred [start [end]]
@deffnx {C Function} scm_string_index_right (s, char_pred, start, end)
Search through the string @var{s} from right to left, returning
the index of the last occurrence of a character which

@itemize @bullet
@item
equals @var{char_pred}, if it is a character,

@item
satisfies the predicate @var{char_pred}, if it is a procedure,

@item
is in the set @var{char_pred}, if it is a character set.
@end itemize

Return @code{#f} if no match is found.
@end deffn

@deffn {Scheme Procedure} string-skip s char_pred [start [end]]
@deffnx {C Function} scm_string_skip (s, char_pred, start, end)
Search through the string @var{s} from left to right, returning
the index of the first occurrence of a character which

@itemize @bullet
@item
does not equal @var{char_pred}, if it is a character,

@item
does not satisfy the predicate @var{char_pred}, if it is a
procedure,

@item
is not in the set @var{char_pred}, if it is a character set.
@end itemize

Return @code{#f} if no such character is found.
@end deffn

@deffn {Scheme Procedure} string-skip-right s char_pred [start [end]]
@deffnx {C Function} scm_string_skip_right (s, char_pred, start, end)
Search through the string @var{s} from right to left, returning
the index of the last occurrence of a character which

@itemize @bullet
@item
does not equal @var{char_pred}, if it is a character,

@item
does not satisfy the predicate @var{char_pred}, if it is a
procedure,

@item
is not in the set @var{char_pred}, if it is a character set.
@end itemize

Return @code{#f} if no such character is found.
@end deffn
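The skip procedures are handy for locating the non-blank part of a string.  This sketch (the name @code{padded} is illustrative) computes the bounds by hand; it would fail on an all-blank string, for which @code{string-skip} returns @code{#f}.

@lisp
(define padded "   hello   ")

(string-skip padded #\space)                 ; @result{} 3
(string-skip-right padded char-whitespace?)  ; @result{} 7

;; The two indices delimit the non-blank part.
(substring padded
           (string-skip padded #\space)
           (+ 1 (string-skip-right padded #\space)))  ; @result{} "hello"
@end lisp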

@deffn {Scheme Procedure} string-count s char_pred [start [end]]
@deffnx {C Function} scm_string_count (s, char_pred, start, end)
Return the count of the number of characters in the string
@var{s} which

@itemize @bullet
@item
equals @var{char_pred}, if it is a character,

@item
satisfies the predicate @var{char_pred}, if it is a procedure,

@item
is in the set @var{char_pred}, if it is a character set.
@end itemize
@end deffn

@deffn {Scheme Procedure} string-contains s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_contains (s1, s2, start1, end1, start2, end2)
Does string @var{s1} contain string @var{s2}?  Return the index
in @var{s1} where @var{s2} occurs as a substring, or @code{#f}.
The optional start/end indices restrict the operation to the
indicated substrings.
@end deffn

@deffn {Scheme Procedure} string-contains-ci s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_contains_ci (s1, s2, start1, end1, start2, end2)
Does string @var{s1} contain string @var{s2}?  Return the index
in @var{s1} where @var{s2} occurs as a substring, or @code{#f}.
The optional start/end indices restrict the operation to the
indicated substrings.  Character comparison is done
case-insensitively.
@end deffn
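A brief sketch of counting and substring search, assuming a Guile REPL:

@lisp
(string-count "banana" #\a)                  ; @result{} 3
(string-count "a1b2c3" char-numeric?)        ; @result{} 3
(string-contains "hello world" "o w")        ; @result{} 4
(string-contains "hello world" "xyz")        ; @result{} #f
(string-contains-ci "Hello World" "WORLD")   ; @result{} 6
@end lisp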

@node Alphabetic Case Mapping
@subsubsection Alphabetic Case Mapping

These are procedures for mapping strings to their upper- or lower-case
equivalents, respectively, or for capitalizing strings.

They use the basic case mapping rules for Unicode characters. No
special language or context rules are considered. The resulting strings
are guaranteed to be the same length as the input strings.

@xref{Character Case Mapping, the @code{(ice-9
i18n)} module}, for locale-dependent case conversions.

@deffn {Scheme Procedure} string-upcase str [start [end]]
@deffnx {C Function} scm_substring_upcase (str, start, end)
@deffnx {C Function} scm_string_upcase (str)
Upcase every character in @var{str}.
@end deffn

@deffn {Scheme Procedure} string-upcase! str [start [end]]
@deffnx {C Function} scm_substring_upcase_x (str, start, end)
@deffnx {C Function} scm_string_upcase_x (str)
Destructively upcase every character in @var{str}.

@lisp
(define y (string-copy "arrdefg"))
(string-upcase! y)
@result{} "ARRDEFG"
y
@result{} "ARRDEFG"
@end lisp
@end deffn

@deffn {Scheme Procedure} string-downcase str [start [end]]
@deffnx {C Function} scm_substring_downcase (str, start, end)
@deffnx {C Function} scm_string_downcase (str)
Downcase every character in @var{str}.
@end deffn

@deffn {Scheme Procedure} string-downcase! str [start [end]]
@deffnx {C Function} scm_substring_downcase_x (str, start, end)
@deffnx {C Function} scm_string_downcase_x (str)
Destructively downcase every character in @var{str}.

@lisp
y
@result{} "ARRDEFG"
(string-downcase! y)
@result{} "arrdefg"
y
@result{} "arrdefg"
@end lisp
@end deffn

@deffn {Scheme Procedure} string-capitalize str
@deffnx {C Function} scm_string_capitalize (str)
Return a freshly allocated string with the characters in
@var{str}, where the first character of every word is
capitalized.
@end deffn

@deffn {Scheme Procedure} string-capitalize! str
@deffnx {C Function} scm_string_capitalize_x (str)
Upcase the first character of every word in @var{str}
destructively and return @var{str}.

@lisp
y @result{} "hello world"
(string-capitalize! y) @result{} "Hello World"
y @result{} "Hello World"
@end lisp
@end deffn

@deffn {Scheme Procedure} string-titlecase str [start [end]]
@deffnx {C Function} scm_string_titlecase (str, start, end)
Titlecase the first character of every word in @var{str}.
@end deffn

@deffn {Scheme Procedure} string-titlecase! str [start [end]]
@deffnx {C Function} scm_string_titlecase_x (str, start, end)
Destructively titlecase the first character of every word in
@var{str}.
@end deffn
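The following sketch contrasts the case-mapping procedures.  The @code{!} variants modify their argument, so it passes a fresh copy made with @code{string-copy} rather than a string literal, which Guile may treat as read-only (@code{greeting} is an illustrative name):

@lisp
(string-upcase "Hello, World")     ; @result{} "HELLO, WORLD"
(string-downcase "Hello, World")   ; @result{} "hello, world"
(string-titlecase "hello world")   ; @result{} "Hello World"

(define greeting (string-copy "hello world"))
(string-titlecase! greeting)
greeting                           ; @result{} "Hello World"
@end lisp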

@node Reversing and Appending Strings
@subsubsection Reversing and Appending Strings

@deffn {Scheme Procedure} string-reverse str [start [end]]
@deffnx {C Function} scm_string_reverse (str, start, end)
Reverse the string @var{str}. The optional arguments
@var{start} and @var{end} delimit the region of @var{str} to
operate on.
@end deffn

@deffn {Scheme Procedure} string-reverse! str [start [end]]
@deffnx {C Function} scm_string_reverse_x (str, start, end)
Reverse the string @var{str} in-place. The optional arguments
@var{start} and @var{end} delimit the region of @var{str} to
operate on. The return value is unspecified.
@end deffn

@rnindex string-append
@deffn {Scheme Procedure} string-append . args
@deffnx {C Function} scm_string_append (args)
Return a newly allocated string whose characters form the
concatenation of the given strings, @var{args}.

@example
(let ((h "hello "))
  (string-append h "world"))
@result{} "hello world"
@end example
@end deffn

@deffn {Scheme Procedure} string-append/shared . rest
@deffnx {C Function} scm_string_append_shared (rest)
Like @code{string-append}, but the result may share memory
with the argument strings.
@end deffn

@deffn {Scheme Procedure} string-concatenate ls
@deffnx {C Function} scm_string_concatenate (ls)
Append the elements of @var{ls} (which must be strings)
together into a single string. Guaranteed to return a freshly
allocated string.
@end deffn

@deffn {Scheme Procedure} string-concatenate-reverse ls [final_string [end]]
@deffnx {C Function} scm_string_concatenate_reverse (ls, final_string, end)
Without optional arguments, this procedure is equivalent to

@lisp
(string-concatenate (reverse ls))
@end lisp

If the optional argument @var{final_string} is specified, it is
consed onto the beginning of @var{ls} before performing the
list-reverse and string-concatenate operations, so it ends up last
in the result.  If @var{end} is given, only the characters of
@var{final_string} up to index @var{end} are used.

Guaranteed to return a freshly allocated string.
@end deffn

@deffn {Scheme Procedure} string-concatenate/shared ls
@deffnx {C Function} scm_string_concatenate_shared (ls)
Like @code{string-concatenate}, but the result may share memory
with the strings in the list @var{ls}.
@end deffn

@deffn {Scheme Procedure} string-concatenate-reverse/shared ls [final_string [end]]
@deffnx {C Function} scm_string_concatenate_reverse_shared (ls, final_string, end)
Like @code{string-concatenate-reverse}, but the result may
share memory with the strings in the @var{ls} arguments.
@end deffn
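A sketch of the reversing and concatenating procedures.  @code{string-concatenate-reverse} is convenient when a string has been accumulated as a list of pieces consed on in reverse order:

@lisp
(string-reverse "hello")                         ; @result{} "olleh"
(string-concatenate '("ab" "cd" "ef"))           ; @result{} "abcdef"
(string-concatenate-reverse '("ef" "cd" "ab"))   ; @result{} "abcdef"

;; final_string goes last; end keeps only its first 2 characters.
(string-concatenate-reverse '("cd" "ab") "efgh" 2)  ; @result{} "abcdef"
@end lisp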

@node Mapping Folding and Unfolding
@subsubsection Mapping, Folding, and Unfolding

@deffn {Scheme Procedure} string-map proc s [start [end]]
@deffnx {C Function} scm_string_map (proc, s, start, end)
@var{proc} is a char->char procedure; it is mapped over
@var{s}, and a new string of the results is returned.  The
order in which the procedure is applied to the string elements
is not specified.
@end deffn

@deffn {Scheme Procedure} string-map! proc s [start [end]]
@deffnx {C Function} scm_string_map_x (proc, s, start, end)
@var{proc} is a char->char procedure; it is mapped over
@var{s}.  The order in which the procedure is applied to the
string elements is not specified.  The string @var{s} is
modified in place; the return value is not specified.
@end deffn

@deffn {Scheme Procedure} string-for-each proc s [start [end]]
@deffnx {C Function} scm_string_for_each (proc, s, start, end)
@var{proc} is mapped over @var{s} in left-to-right order.  The
return value is not specified.
@end deffn
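A sketch of the mapping procedures; @code{string-map!} needs a mutable string, and @code{string-for-each} is used here only for its side effect (the names @code{s2} and @code{seen} are illustrative):

@lisp
(string-map char-upcase "abc")      ; @result{} "ABC"

(define s2 (string-copy "abc"))
(string-map! char-upcase s2)
s2                                  ; @result{} "ABC"

;; Characters are visited left to right, so consing reverses them.
(define seen '())
(string-for-each (lambda (c) (set! seen (cons c seen))) "abc")
seen                                ; @result{} (#\c #\b #\a)
@end lisp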

@deffn {Scheme Procedure} string-for-each-index proc s [start [end]]
@deffnx {C Function} scm_string_for_each_index (proc, s, start, end)
Call @code{(@var{proc} i)} for each index i in @var{s}, from left to
right.

For example, to change characters to alternately upper and lower case,

@example
(define str (string-copy "studly"))
(string-for-each-index
 (lambda (i)
   (string-set! str i
     ((if (even? i) char-upcase char-downcase)
      (string-ref str i))))
 str)
str @result{} "StUdLy"
@end example
@end deffn

@deffn {Scheme Procedure} string-fold kons knil s [start [end]]
@deffnx {C Function} scm_string_fold (kons, knil, s, start, end)
Fold @var{kons} over the characters of @var{s}, with @var{knil}
as the terminating element, from left to right.  @var{kons}
must expect two arguments: the current character and the result
of the previous application of @var{kons} (initially @var{knil}).
@end deffn

@deffn {Scheme Procedure} string-fold-right kons knil s [start [end]]
@deffnx {C Function} scm_string_fold_right (kons, knil, s, start, end)
Fold @var{kons} over the characters of @var{s}, with @var{knil}
as the terminating element, from right to left.  @var{kons}
must expect two arguments: the current character and the result
of the previous application of @var{kons} (initially @var{knil}).
@end deffn
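The sketch below counts vowels with @code{string-fold}, then uses @code{cons} as @var{kons} to make the traversal order of both folds visible:

@lisp
;; The running count is threaded through kons, starting at knil = 0.
(string-fold (lambda (c count)
               (if (memv c '(#\a #\e #\i #\o #\u)) (+ count 1) count))
             0
             "education")                 ; @result{} 5

(string-fold cons '() "abc")              ; @result{} (#\c #\b #\a)
(string-fold-right cons '() "abc")        ; @result{} (#\a #\b #\c)
@end lisp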

@deffn {Scheme Procedure} string-unfold p f g seed [base [make_final]]
@deffnx {C Function} scm_string_unfold (p, f, g, seed, base, make_final)
@itemize @bullet
@item @var{g} is used to generate a series of @emph{seed}
values from the initial @var{seed}: @var{seed}, (@var{g}
@var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}),
@dots{}
@item @var{p} tells us when to stop: generation ends at the
first seed value for which @var{p} returns true.
@item @var{f} maps each seed value to the corresponding
character in the result string.  These chars are assembled
into the string in a left-to-right order.
@item @var{base} is the optional initial/leftmost portion
of the constructed string; it defaults to the empty
string.
@item @var{make_final} is applied to the terminal seed
value (on which @var{p} returns true) to produce
the final/rightmost portion of the constructed string.
The default is nothing extra.
@end itemize
@end deffn

@deffn {Scheme Procedure} string-unfold-right p f g seed [base [make_final]]
@deffnx {C Function} scm_string_unfold_right (p, f, g, seed, base, make_final)
@itemize @bullet
@item @var{g} is used to generate a series of @emph{seed}
values from the initial @var{seed}: @var{seed}, (@var{g}
@var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}),
@dots{}
@item @var{p} tells us when to stop: generation ends at the
first seed value for which @var{p} returns true.
@item @var{f} maps each seed value to the corresponding
character in the result string.  These chars are assembled
into the string in a right-to-left order.
@item @var{base} is the optional initial/rightmost portion
of the constructed string; it defaults to the empty
string.
@item @var{make_final} is applied to the terminal seed
value (on which @var{p} returns true) to produce
the final/leftmost portion of the constructed string.
The default is nothing extra.
@end itemize
@end deffn
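In the sketch below the seeds are lists of characters, so @code{null?}, @code{car}, and @code{cdr} serve as @var{p}, @var{f}, and @var{g}; the last call uses integer seeds instead:

@lisp
(string-unfold null? car cdr '(#\a #\b #\c))        ; @result{} "abc"
(string-unfold-right null? car cdr '(#\a #\b #\c))  ; @result{} "cba"

;; base supplies a prefix; make_final turns the terminal seed into a suffix.
(string-unfold null? car cdr '(#\b #\c) "a"
               (lambda (seed) "!"))                 ; @result{} "abc!"

;; Count from 1, stopping once n exceeds 5.
(string-unfold (lambda (n) (> n 5))
               (lambda (n) (integer->char (+ n (char->integer #\0))))
               (lambda (n) (+ n 1))
               1)                                   ; @result{} "12345"
@end lisp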

@node Miscellaneous String Operations
@subsubsection Miscellaneous String Operations

@deffn {Scheme Procedure} xsubstring s from [to [start [end]]]
@deffnx {C Function} scm_xsubstring (s, from, to, start, end)
This is the @emph{extended substring} procedure that implements
replicated copying of a substring of some string.

@var{s} is a string, and @var{start} and @var{end} are optional
arguments that demarcate a substring of @var{s}, defaulting to
0 and the length of @var{s}.  Replicate this substring up and
down index space, in both the positive and negative directions.
@code{xsubstring} returns the substring of this string
beginning at index @var{from}, and ending at @var{to}, which
defaults to @var{from} + (@var{end} - @var{start}).
@end deffn
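For example, with the default @var{start} and @var{end}, @code{xsubstring} can rotate a string or repeat it cyclically:

@lisp
;; to defaults to from + (end - start), one full turn.
(xsubstring "abcdef" 2)      ; @result{} "cdefab"
;; Indices past the end wrap around to the beginning.
(xsubstring "abc" 0 7)       ; @result{} "abcabca"
@end lisp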

@deffn {Scheme Procedure} string-xcopy! target tstart s sfrom [sto [start [end]]]
@deffnx {C Function} scm_string_xcopy_x (target, tstart, s, sfrom, sto, start, end)
Exactly the same as @code{xsubstring}, but the extracted text
is written into the string @var{target} starting at index
@var{tstart}.  The operation is not defined if @code{(eq?
@var{target} @var{s})} or these arguments share storage; you
cannot copy a string on top of itself.
@end deffn

@deffn {Scheme Procedure} string-replace s1 s2 [start1 [end1 [start2 [end2]]]]
@deffnx {C Function} scm_string_replace (s1, s2, start1, end1, start2, end2)
Return the string @var{s1}, but with the characters
@var{start1} @dots{} @var{end1} replaced by the characters
@var{start2} @dots{} @var{end2} from @var{s2}.
@end deffn
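A sketch of @code{string-replace}; the result is a new string, and an empty range in @var{s1} simply inserts the selected part of @var{s2}:

@lisp
(string-replace "The quick fox" "brown" 4 9)      ; @result{} "The brown fox"
;; start2 and end2 select part of s2, here "row".
(string-replace "The quick fox" "brown" 4 9 1 4)  ; @result{} "The row fox"
;; start1 = end1 inserts s2 at that position.
(string-replace "abef" "cd" 2 2)                  ; @result{} "abcdef"
@end lisp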

@deffn {Scheme Procedure} string-tokenize s [token_set [start [end]]]
@deffnx {C Function} scm_string_tokenize (s, token_set, start, end)
Split the string @var{s} into a list of substrings, where each
substring is a maximal non-empty contiguous sequence of
characters from the character set @var{token_set}, which
defaults to @code{char-set:graphic}.
If @var{start} or @var{end} indices are provided, they restrict
@code{string-tokenize} to operating on the indicated substring
of @var{s}.
@end deffn
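With the default @var{token_set}, whitespace separates tokens.  Passing the complement of a delimiter set splits on that delimiter instead, although empty fields disappear because tokens are non-empty:

@lisp
(string-tokenize "  the quick  brown fox ")
;; @result{} ("the" "quick" "brown" "fox")

(string-tokenize "a,b,,c" (char-set-complement (char-set #\,)))
;; @result{} ("a" "b" "c")
@end lisp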

@deffn {Scheme Procedure} string-filter char_pred s [start [end]]
@deffnx {C Function} scm_string_filter (char_pred, s, start, end)
Filter the string @var{s}, retaining only those characters which
satisfy @var{char_pred}.

If @var{char_pred} is a procedure, it is applied to each character as
a predicate; if it is a character, it is tested for equality; and if it
is a character set, it is tested for membership.
@end deffn

@deffn {Scheme Procedure} string-delete char_pred s [start [end]]
@deffnx {C Function} scm_string_delete (char_pred, s, start, end)
Delete characters satisfying @var{char_pred} from @var{s}.

If @var{char_pred} is a procedure, it is applied to each character as
a predicate; if it is a character, it is tested for equality; and if it
is a character set, it is tested for membership.
@end deffn
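A sketch showing each kind of @var{char_pred} accepted by @code{string-filter} and @code{string-delete}; note that the predicate comes first:

@lisp
(string-filter char-alphabetic? "a1b2c3")     ; @result{} "abc"
(string-filter (char-set #\a #\n) "banana")   ; @result{} "anana"
(string-delete #\a "banana")                  ; @result{} "bnn"
(string-delete char-numeric? "a1b2c3")        ; @result{} "abc"
@end lisp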

@node Conversion to/from C
@subsubsection Conversion to/from C

When creating a Scheme string from a C string or when converting a
Scheme string to a C string, the concept of character encoding becomes
important.

In C, a string is just a sequence of bytes, and the character encoding
describes the relation between these bytes and the actual characters
that make up the string. For Scheme strings, character encoding is
not an issue (most of the time), since in Scheme you never get to see
the bytes, only the characters.

Converting to C and converting from C each have their own challenges.

When converting from C to Scheme, it is important that the sequence of
bytes in the C string be valid with respect to its encoding. ASCII
strings, for example, can't have any bytes greater than 127. An ASCII
byte greater than 127 is considered @emph{ill-formed} and cannot be
converted into a Scheme character.

Problems can occur in the reverse operation as well. Not all character
encodings can hold all possible Scheme characters. Some encodings, like
ASCII for example, can only describe a small subset of all possible
characters. So, when converting to C, one must first decide what to do
with Scheme characters that can't be represented in the C string.

Converting a Scheme string to a C string will often allocate fresh
memory to hold the result. You must take care that this memory is
properly freed eventually. In many cases, this can be achieved by
using @code{scm_dynwind_free} inside an appropriate dynwind context
(@pxref{Dynamic Wind}).

@deftypefn {C Function} SCM scm_from_locale_string (const char *str)
@deftypefnx {C Function} SCM scm_from_locale_stringn (const char *str, size_t len)
Creates a new Scheme string that has the same contents as @var{str} when
interpreted in the locale character encoding of the
@code{current-input-port}.

For @code{scm_from_locale_string}, @var{str} must be null-terminated.

For @code{scm_from_locale_stringn}, @var{len} specifies the length of
@var{str} in bytes, and @var{str} does not need to be null-terminated.
If @var{len} is @code{(size_t)-1}, then @var{str} does need to be
null-terminated and the real length will be found with @code{strlen}.

If the C string is ill-formed, an error will be raised.
@end deftypefn

@deftypefn {C Function} SCM scm_take_locale_string (char *str)
@deftypefnx {C Function} SCM scm_take_locale_stringn (char *str, size_t len)
Like @code{scm_from_locale_string} and @code{scm_from_locale_stringn},
respectively, but also frees @var{str} with @code{free} eventually.
Thus, you can use these functions when you would free @var{str} anyway
immediately after creating the Scheme string. In certain cases, Guile
can then use @var{str} directly as its internal representation.
@end deftypefn

@deftypefn {C Function} {char *} scm_to_locale_string (SCM str)
@deftypefnx {C Function} {char *} scm_to_locale_stringn (SCM str, size_t *lenp)
Returns a C string with the same contents as @var{str} in the locale
encoding of the @code{current-output-port}. The C string must be freed
with @code{free} eventually, for example by using
@code{scm_dynwind_free} (@pxref{Dynamic Wind}).

For @code{scm_to_locale_string}, the returned string is
null-terminated and an error is signalled when @var{str} contains
@code{#\nul} characters.

For @code{scm_to_locale_stringn} and @var{lenp} not @code{NULL},
@var{str} might contain @code{#\nul} characters and the length of the
returned string in bytes is stored in @code{*@var{lenp}}. The
returned string will not be null-terminated in this case. If
@var{lenp} is @code{NULL}, @code{scm_to_locale_stringn} behaves like
@code{scm_to_locale_string}.

If a character in @var{str} cannot be represented in the locale encoding
of the current output port, the port conversion strategy of the current
output port will determine the result (@pxref{Ports}). If the output
port's conversion strategy is @code{error}, an error will be raised. If
it is @code{substitute}, a replacement character, such as a question
mark, will be inserted in its place. If it is @code{escape}, a hex
escape will be inserted in its place.
@end deftypefn

@deftypefn {C Function} size_t scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len)
Puts @var{str} as a C string in the current locale encoding into the
memory pointed to by @var{buf}. The buffer at @var{buf} has room for
@var{max_len} bytes and @code{scm_to_locale_stringbuf} will never store
more than that. No terminating @code{'\0'} will be stored.

The return value of @code{scm_to_locale_stringbuf} is the number of
bytes that are needed for all of @var{str}, regardless of whether
@var{buf} was large enough to hold them. Thus, when the return value
is larger than @var{max_len}, only @var{max_len} bytes have been
stored and you probably need to try again with a larger buffer.
@end deftypefn
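
The retry protocol described above can be sketched in plain C. Because
this example does not link against libguile, @code{demo_to_stringbuf} is
a hypothetical stand-in that takes a C string instead of an @code{SCM}
value but follows the same contract: it stores at most @var{max_len}
bytes, never stores a terminating @code{'\0'}, and returns the number of
bytes the whole string needs. @code{demo_to_cstring}, also hypothetical,
shows the calling side.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same contract as
   scm_to_locale_stringbuf: store at most MAX_LEN bytes of SRC into BUF,
   never store a terminating '\0', and return the number of bytes the
   whole string needs.  */
size_t
demo_to_stringbuf (const char *src, char *buf, size_t max_len)
{
  size_t needed = strlen (src);

  memcpy (buf, src, needed < max_len ? needed : max_len);
  return needed;
}

/* Try a small buffer first and retry once with a buffer of the reported
   size.  Returns a malloc'd, null-terminated copy, or NULL if memory
   runs out.  */
char *
demo_to_cstring (const char *src)
{
  size_t cap = 8;
  char *buf = malloc (cap + 1);   /* +1 leaves room for our own '\0' */
  size_t needed;

  if (buf == NULL)
    return NULL;
  needed = demo_to_stringbuf (src, buf, cap);
  if (needed > cap)
    {
      char *bigger = realloc (buf, needed + 1);
      if (bigger == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;
      cap = needed;
      demo_to_stringbuf (src, buf, cap);   /* second attempt fits */
    }
  buf[needed] = '\0';   /* the stringbuf call never writes this */
  return buf;
}
```

With real Guile code, @code{scm_to_locale_stringbuf} would take the place
of @code{demo_to_stringbuf}; since it stores no terminating
@code{'\0'}, the caller allocates one extra byte and adds it.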

For most situations, string conversion should occur using the current
locale, such as with the functions above. But there may be cases where
one wants to convert strings to or from a character encoding other than
the locale's character encoding. For these cases, the lower-level
functions @code{scm_to_stringn} and @code{scm_from_stringn} are
provided. These functions should seldom be necessary if one is properly
using locales.

@deftp {C Type} scm_t_string_failed_conversion_handler
This is an enumerated type that can take one of three values:
@code{SCM_FAILED_CONVERSION_ERROR},
@code{SCM_FAILED_CONVERSION_QUESTION_MARK}, and
@code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE}. They are used to indicate
a strategy for handling characters that cannot be converted to or from a
given character encoding. @code{SCM_FAILED_CONVERSION_ERROR} indicates
that a conversion should throw an error if some characters cannot be
converted. @code{SCM_FAILED_CONVERSION_QUESTION_MARK} indicates that a
conversion should replace unconvertible characters with the question
mark character. @code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE} requests
that a conversion should replace an unconvertible character with an
escape sequence.

While all three strategies apply when converting Scheme strings to C,
only @code{SCM_FAILED_CONVERSION_ERROR} and
@code{SCM_FAILED_CONVERSION_QUESTION_MARK} can be used when converting C
strings to Scheme.
@end deftp
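
As an illustration of the error and question-mark strategies, the
following plain C sketch encodes an array of code points into a
single-byte encoding. The enumeration @code{demo_strategy} and the
function @code{demo_encode_8bit} are hypothetical and only mirror the
intent of @code{SCM_FAILED_CONVERSION_ERROR} and
@code{SCM_FAILED_CONVERSION_QUESTION_MARK}; the escape-sequence strategy
is left out because its exact output format is not specified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strategies mirroring two values of
   scm_t_string_failed_conversion_handler.  */
enum demo_strategy { DEMO_ERROR, DEMO_QUESTION_MARK };

/* Encode the code points CPS[0..N) into OUT, one byte per code point,
   for a single-byte target encoding whose largest code point is MAX_CP
   (127 for ASCII, 255 for Latin-1; MAX_CP must not exceed 255).
   Returns 0 on success, or -1 as soon as a code point does not fit and
   STRATEGY is DEMO_ERROR.  */
int
demo_encode_8bit (const uint32_t *cps, size_t n, uint32_t max_cp,
                  enum demo_strategy strategy, unsigned char *out)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (cps[i] <= max_cp)
        out[i] = (unsigned char) cps[i];
      else if (strategy == DEMO_QUESTION_MARK)
        out[i] = '?';              /* substitute a question mark */
      else
        return -1;                 /* report the failure */
    }
  return 0;
}
```

Encoding ``caf@'e'' to ASCII fails under the error strategy and yields
@samp{caf?} under the question-mark strategy, while encoding it to
Latin-1 succeeds because U+00E9 fits in one byte.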

@deftypefn {C Function} {char *} scm_to_stringn (SCM str, size_t *lenp, const char *encoding, scm_t_string_failed_conversion_handler handler)
This function returns a newly allocated C string from the Guile string
@var{str}. The length of the returned string in bytes is stored in
@code{*@var{lenp}}. The character encoding of the C string is passed as
the ASCII, null-terminated C string @var{encoding}. The @var{handler}
parameter gives a strategy for dealing with characters that cannot be
converted into @var{encoding}.

If @var{lenp} is @code{NULL}, this function will return a null-terminated
C string. It will throw an error if the string contains a null
character.
@end deftypefn

@deftypefn {C Function} SCM scm_from_stringn (const char *str, size_t len, const char *encoding, scm_t_string_failed_conversion_handler handler)
This function returns a Scheme string from the C string @var{str}. The
length of the C string in bytes is given by @var{len}. The encoding of
the C string is passed as the ASCII, null-terminated C string
@var{encoding}. The @var{handler} parameter specifies a strategy for
dealing with unconvertible characters.
@end deftypefn

ISO-8859-1 is a widely used 8-bit character encoding, also referred to
as the Latin-1 encoding. The following functions convert between Guile
strings and C strings encoded in Latin-1, UTF-8, or UTF-32.

@deftypefn {C Function} SCM scm_from_latin1_stringn (const char *str, size_t len)
@deftypefnx {C Function} SCM scm_from_utf8_stringn (const char *str, size_t len)
@deftypefnx {C Function} SCM scm_from_utf32_stringn (const scm_t_wchar *str, size_t len)
Return a Scheme string from C string @var{str}, which is ISO-8859-1-,
UTF-8-, or UTF-32-encoded, of length @var{len}. @var{len} is the number
of bytes pointed to by @var{str} for @code{scm_from_latin1_stringn} and
@code{scm_from_utf8_stringn}; it is the number of elements (code points)
in @var{str} in the case of @code{scm_from_utf32_stringn}.
@end deftypefn
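
The difference between the two meanings of @var{len} is easiest to see by
counting. This plain C sketch (the function name is hypothetical, and it
assumes well-formed UTF-8 input) counts the code points in a UTF-8
buffer by skipping continuation bytes, which have the bit pattern
@code{10xxxxxx}.

```c
#include <stddef.h>

/* Count the code points in a well-formed UTF-8 buffer of LEN bytes:
   every byte except a continuation byte (10xxxxxx) starts a new code
   point.  */
size_t
utf8_count_code_points (const unsigned char *buf, size_t len)
{
  size_t i, count = 0;

  for (i = 0; i < len; i++)
    if ((buf[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```

For the five bytes @code{63 61 66 C3 A9} (``caf@'e'' in UTF-8) it
returns 4, so @code{scm_from_utf8_stringn} would be passed 5 as
@var{len}, whereas @code{scm_from_utf32_stringn} would be passed 4 for
the same text.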

@deftypefn {C Function} {char *} scm_to_latin1_stringn (SCM str, size_t *lenp)
@deftypefnx {C Function} {char *} scm_to_utf8_stringn (SCM str, size_t *lenp)
@deftypefnx {C Function} {scm_t_wchar *} scm_to_utf32_stringn (SCM str, size_t *lenp)
Return a newly allocated, ISO-8859-1-, UTF-8-, or UTF-32-encoded C string
from Scheme string @var{str}. An error is thrown when @var{str}
cannot be converted to the specified encoding. If @var{lenp} is
@code{NULL}, the returned C string will be null terminated, and an error
will be thrown if the C string would otherwise contain null
characters. If @var{lenp} is not @code{NULL}, the length of the string
is stored in @code{*@var{lenp}}, and the string is not null terminated.
@end deftypefn

@node String Internals
@subsubsection String Internals

Guile stores each string in memory as a contiguous array of Unicode code
points along with an associated set of attributes. If all of the code
points of a string have integer values between 0 and 255 inclusive,
the code point array is stored as one byte per code point: it is stored
as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the
string has an integer value greater than 255, the code point array is
stored as four bytes per code point: it is stored as a UTF-32 string.

Conversion between the one-byte-per-code-point and
four-bytes-per-code-point representations happens automatically as
necessary.
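
The choice between the two representations can be modelled in a few
lines of plain C. This is a sketch of the rule as stated above, not
Guile's implementation, and @code{bytes_per_char_for} is a hypothetical
name; for a freshly built string it is meant to match what
@code{string-bytes-per-char} reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the storage rule: if every code point in CPS[0..N) is at
   most 255, one byte per code point (Latin-1) suffices; otherwise four
   bytes per code point (UTF-32) are needed.  */
int
bytes_per_char_for (const uint32_t *cps, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (cps[i] > 255)
      return 4;
  return 1;
}
```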

No API is provided to set the internal representation of strings;
however, there is a pair of procedures available to query it. These are
debugging procedures. Using them in production code is discouraged,
since the details of Guile's internal representation of strings may
change from release to release.

@deffn {Scheme Procedure} string-bytes-per-char str
@deffnx {C Function} scm_string_bytes_per_char (str)
Return the number of bytes used to encode a Unicode code point in string
@var{str}. The result is one or four.
@end deffn

@deffn {Scheme Procedure} %string-dump str
@deffnx {C Function} scm_sys_string_dump (str)
Return an association list containing debugging information for
@var{str}. The association list has the following entries.
@table @code

@item string
The string itself.

@item start
The start index of the string into its stringbuf.

@item length
The length of the string.

@item shared
If this string is a substring, its parent string; otherwise
@code{#f}.

@item read-only
@code{#t} if the string is read-only.

@item stringbuf-chars
A new string containing this string's stringbuf's characters.

@item stringbuf-length
The number of characters in this stringbuf.

@item stringbuf-shared
@code{#t} if this stringbuf is shared.

@item stringbuf-wide
@code{#t} if this stringbuf's characters are stored in a 32-bit buffer,
or @code{#f} if they are stored in an 8-bit buffer.
@end table
@end deffn


@node Bytevectors
@subsection Bytevectors

@cindex bytevector
@cindex R6RS

A @dfn{bytevector} is a fixed-length sequence of raw bytes. The
@code{(rnrs bytevectors)} module provides the programming interface
specified by the
@uref{http://www.r6rs.org/, Revised^6 Report on the Algorithmic Language
Scheme (R6RS)}. It contains procedures to manipulate bytevectors and
interpret their contents in a number of ways: bytevector contents can be
accessed as signed or unsigned integers of various sizes and endianness,
as IEEE-754 floating point numbers, or as strings. It is a useful tool
to encode and decode binary data.

The R6RS (Section 4.3.4) specifies an external representation for
bytevectors, whereby the octets (integers in the range 0--255) contained
in the bytevector are represented as a list prefixed by @code{#vu8}:

@lisp
#vu8(1 53 204)
@end lisp

denotes a 3-byte bytevector containing the octets 1, 53, and 204. Like
string literals, booleans, etc., bytevectors are ``self-quoting'', i.e.,
they do not need to be quoted:

@lisp
#vu8(1 53 204)
@result{} #vu8(1 53 204)
@end lisp

Bytevectors can be used with the binary input/output primitives of the
R6RS (@pxref{R6RS I/O Ports}).

@menu
* Bytevector Endianness:: Dealing with byte order.
* Bytevector Manipulation:: Creating, copying, manipulating bytevectors.
* Bytevectors as Integers:: Interpreting bytes as integers.
* Bytevectors and Integer Lists:: Converting to/from an integer list.
* Bytevectors as Floats:: Interpreting bytes as real numbers.
* Bytevectors as Strings:: Interpreting bytes as Unicode strings.
* Bytevectors as Generalized Vectors:: Guile extension to the bytevector API.
* Bytevectors as Uniform Vectors:: Bytevectors and SRFI-4.
@end menu

@node Bytevector Endianness
@subsubsection Endianness

@cindex endianness
@cindex byte order
@cindex word order

Some of the following procedures take an @var{endianness} parameter.
@dfn{Endianness} is the order of bytes in multi-byte
numbers: numbers encoded in @dfn{big endian} have their most
significant bytes written first, whereas numbers encoded in
@dfn{little endian} have their least significant bytes
first@footnote{Big-endian and little-endian are the most common
``endiannesses'', but others do exist. For instance, the GNU MP
library allows @dfn{word order} to be specified independently of
@dfn{byte order} (@pxref{Integer Import and Export,,, gmp, The GNU
Multiple Precision Arithmetic Library Manual}).}.

Little-endian is the native endianness of the IA32 architecture and
its derivatives, while big-endian is native to SPARC and PowerPC,
among others. The @code{native-endianness} procedure returns the
native endianness of the machine it runs on.
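
The two byte orders, and a common way of detecting the host's own order,
can be sketched in plain C. The function names below are hypothetical;
@code{demo_host_is_little_endian} assumes the host is either big- or
little-endian, which, as the footnote above notes, is not the only
possibility.

```c
#include <stdint.h>
#include <string.h>

/* Store V into OUT[0..3], most significant byte first (big endian).  */
void
demo_put_u32_big (uint32_t v, unsigned char out[4])
{
  out[0] = (unsigned char) (v >> 24);
  out[1] = (unsigned char) (v >> 16);
  out[2] = (unsigned char) (v >> 8);
  out[3] = (unsigned char) v;
}

/* Store V into OUT[0..3], least significant byte first (little endian).  */
void
demo_put_u32_little (uint32_t v, unsigned char out[4])
{
  out[0] = (unsigned char) v;
  out[1] = (unsigned char) (v >> 8);
  out[2] = (unsigned char) (v >> 16);
  out[3] = (unsigned char) (v >> 24);
}

/* Return 1 if the host is little-endian and 0 if it is big-endian, by
   looking at the first byte in memory of a uint16_t holding 1.  */
int
demo_host_is_little_endian (void)
{
  uint16_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 1;
}
```

The value @code{#x12345678} becomes the bytes @code{12 34 56 78} in big
endian and @code{78 56 34 12} in little endian.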

@deffn {Scheme Procedure} native-endianness
@deffnx {C Function} scm_native_endianness ()
Return a value denoting the native endianness of the host machine.
@end deffn

@deffn {Scheme Macro} endianness symbol
Return an object denoting the endianness specified by @var{symbol}. If
@var{symbol} is neither @code{big} nor @code{little} then an error is
raised at expand-time.
@end deffn

@defvr {C Variable} scm_endianness_big
@defvrx {C Variable} scm_endianness_little
The objects denoting big- and little-endianness, respectively.
@end defvr


@node Bytevector Manipulation
@subsubsection Manipulating Bytevectors

Bytevectors can be created, copied, and analyzed with the following
procedures and C functions.

@deffn {Scheme Procedure} make-bytevector len [fill]
@deffnx {C Function} scm_make_bytevector (len, fill)
@deffnx {C Function} scm_c_make_bytevector (size_t len)
Return a new bytevector of @var{len} bytes. Optionally, if @var{fill}
is given, fill it with @var{fill}; @var{fill} must be in the range
[-128,255].
@end deffn

@deffn {Scheme Procedure} bytevector? obj
@deffnx {C Function} scm_bytevector_p (obj)
Return true if @var{obj} is a bytevector.
@end deffn

@deftypefn {C Function} int scm_is_bytevector (SCM obj)
Equivalent to @code{scm_is_true (scm_bytevector_p (obj))}.
@end deftypefn

@deffn {Scheme Procedure} bytevector-length bv
@deffnx {C Function} scm_bytevector_length (bv)
Return the length in bytes of bytevector @var{bv}.
@end deffn

@deftypefn {C Function} size_t scm_c_bytevector_length (SCM bv)
Likewise, return the length in bytes of bytevector @var{bv}.
@end deftypefn

@deffn {Scheme Procedure} bytevector=? bv1 bv2
@deffnx {C Function} scm_bytevector_eq_p (bv1, bv2)
Return true if @var{bv1} equals @var{bv2}, i.e., if they have the same
length and contents.
@end deffn

@deffn {Scheme Procedure} bytevector-fill! bv fill
@deffnx {C Function} scm_bytevector_fill_x (bv, fill)
Fill bytevector @var{bv} with @var{fill}, a byte.
@end deffn

@deffn {Scheme Procedure} bytevector-copy! source source-start target target-start len
@deffnx {C Function} scm_bytevector_copy_x (source, source_start, target, target_start, len)
Copy @var{len} bytes from @var{source} into @var{target}, reading from
index @var{source-start} (a non-negative index within @var{source}) and
writing from index @var{target-start} within @var{target}.
@end deffn
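
R6RS requires @code{bytevector-copy!} to work even when @var{source} and
@var{target} are the same bytevector and the two ranges overlap. In C
the corresponding tool is @code{memmove} rather than @code{memcpy}. The
following sketch models the procedure on plain arrays; the function name
is hypothetical and bounds checking is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Model of bytevector-copy! on plain C arrays: copy LEN bytes from
   SOURCE + SOURCE_START to TARGET + TARGET_START.  memmove is used so
   that the result is correct even when SOURCE and TARGET are the same
   buffer and the ranges overlap.  No bounds checking is done.  */
void
demo_bytevector_copy (const unsigned char *source, size_t source_start,
                      unsigned char *target, size_t target_start,
                      size_t len)
{
  memmove (target + target_start, source + source_start, len);
}
```

Copying the first four bytes of the buffer (1 2 3 4 5 6) onto index 2 of
the same buffer yields (1 2 1 2 3 4); with @code{memcpy} the behavior of
such an overlapping copy would be undefined.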

@deffn {Scheme Procedure} bytevector-copy bv
@deffnx {C Function} scm_bytevector_copy (bv)
Return a newly allocated copy of @var{bv}.
@end deffn

@deftypefn {C Function} scm_t_uint8 scm_c_bytevector_ref (SCM bv, size_t index)
Return the byte at @var{index} in bytevector @var{bv}.
@end deftypefn

@deftypefn {C Function} void scm_c_bytevector_set_x (SCM bv, size_t index, scm_t_uint8 value)
Set the byte at @var{index} in @var{bv} to @var{value}.
@end deftypefn

Low-level C macros are available. They do not perform any
type-checking; as such they should be used with care.

@deftypefn {C Macro} size_t SCM_BYTEVECTOR_LENGTH (bv)
Return the length in bytes of bytevector @var{bv}.
@end deftypefn

@deftypefn {C Macro} {signed char *} SCM_BYTEVECTOR_CONTENTS (bv)
Return a pointer to the contents of bytevector @var{bv}.
@end deftypefn


@node Bytevectors as Integers
@subsubsection Interpreting Bytevector Contents as Integers

The contents of a bytevector can be interpreted as a sequence of
integers of any given size, sign, and endianness.

@lisp
(let ((bv (make-bytevector 4)))
  (bytevector-u8-set! bv 0 #x12)
  (bytevector-u8-set! bv 1 #x34)
  (bytevector-u8-set! bv 2 #x56)
  (bytevector-u8-set! bv 3 #x78)

  (map (lambda (number)
         (number->string number 16))
       (list (bytevector-u8-ref bv 0)
             (bytevector-u16-ref bv 0 (endianness big))
             (bytevector-u32-ref bv 0 (endianness little)))))

@result{} ("12" "1234" "78563412")
@end lisp

The most generic procedures to interpret bytevector contents as integers
are described below.

@deffn {Scheme Procedure} bytevector-uint-ref bv index endianness size
@deffnx {Scheme Procedure} bytevector-sint-ref bv index endianness size
@deffnx {C Function} scm_bytevector_uint_ref (bv, index, endianness, size)
@deffnx {C Function} scm_bytevector_sint_ref (bv, index, endianness, size)
Return the @var{size}-byte long unsigned (resp. signed) integer at
index @var{index} in @var{bv}, decoded according to @var{endianness}.
@end deffn
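
What @code{bytevector-uint-ref} and @code{bytevector-sint-ref} compute can
be sketched in plain C for sizes whose results fit in 64 bits (the Scheme
procedures accept larger sizes). The names @code{demo_uint_ref} and
@code{demo_sint_ref} are hypothetical, @var{big_endian} stands in for the
@var{endianness} argument, and the signed variant applies the usual
two's-complement rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the SIZE-byte unsigned integer (1 <= SIZE <= 8) starting at
   BV[INDEX]; the first byte is the most significant one if BIG_ENDIAN
   is nonzero, the least significant one otherwise.  */
uint64_t
demo_uint_ref (const unsigned char *bv, size_t index, int big_endian,
               size_t size)
{
  uint64_t v = 0;
  size_t i;

  for (i = 0; i < size; i++)
    {
      /* Visit the bytes from most to least significant.  */
      size_t k = big_endian ? i : size - 1 - i;
      v = (v << 8) | bv[index + k];
    }
  return v;
}

/* Same, but interpret the bytes as a two's-complement signed integer.
   Limited to 1 <= SIZE <= 7 so that the adjustment fits in int64_t.  */
int64_t
demo_sint_ref (const unsigned char *bv, size_t index, int big_endian,
               size_t size)
{
  uint64_t u = demo_uint_ref (bv, index, big_endian, size);
  uint64_t sign_bit = (uint64_t) 1 << (8 * size - 1);

  if (u & sign_bit)
    return (int64_t) u - (int64_t) (sign_bit << 1);
  return (int64_t) u;
}
```

Applied to the bytes @code{12 34 56 78}, the unsigned reader returns
@code{#x12}, @code{#x1234} and @code{#x78563412} for the three reads in
the Scheme example above, and the signed reader decodes the two bytes
@code{FF FF} as -1.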

@deffn {Scheme Procedure} bytevector-uint-set! bv index value endianness size
@deffnx {Scheme Procedure} bytevector-sint-set! bv index value endianness size
@deffnx {C Function} scm_bytevector_uint_set_x (bv, index, value, endianness, size)
@deffnx {C Function} scm_bytevector_sint_set_x (bv, index, value, endianness, size)
Set the @var{size}-byte long unsigned (resp. signed) integer at
@var{index} to @var{value}, encoded according to @var{endianness}.
@end deffn

The following procedures are similar to the ones above, but specialized
to a given integer size:

@deffn {Scheme Procedure} bytevector-u8-ref bv index
@deffnx {Scheme Procedure} bytevector-s8-ref bv index
@deffnx {Scheme Procedure} bytevector-u16-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-s16-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-u32-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-s32-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-u64-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-s64-ref bv index endianness
@deffnx {C Function} scm_bytevector_u8_ref (bv, index)
@deffnx {C Function} scm_bytevector_s8_ref (bv, index)
@deffnx {C Function} scm_bytevector_u16_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_s16_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_u32_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_s32_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_u64_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_s64_ref (bv, index, endianness)
Return the unsigned (resp. signed) @var{n}-bit integer (where @var{n} is
8, 16, 32 or 64) from @var{bv} at @var{index}, decoded according to
@var{endianness}.
@end deffn

@deffn {Scheme Procedure} bytevector-u8-set! bv index value
@deffnx {Scheme Procedure} bytevector-s8-set! bv index value
@deffnx {Scheme Procedure} bytevector-u16-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-s16-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-u32-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-s32-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-u64-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-s64-set! bv index value endianness
@deffnx {C Function} scm_bytevector_u8_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_s8_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_u16_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_s16_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_u32_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_s32_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_u64_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_s64_set_x (bv, index, value, endianness)
Store @var{value} as an unsigned (resp. signed) @var{n}-bit integer
(where @var{n} is 8, 16, 32 or 64) in @var{bv} at @var{index}, encoded
according to @var{endianness}.
@end deffn

Finally, a variant specialized for the host's endianness is available
for each of these functions (with the exception of the 8-bit
accessors, for which byte order is irrelevant):

@deffn {Scheme Procedure} bytevector-u16-native-ref bv index
@deffnx {Scheme Procedure} bytevector-s16-native-ref bv index
@deffnx {Scheme Procedure} bytevector-u32-native-ref bv index
@deffnx {Scheme Procedure} bytevector-s32-native-ref bv index
@deffnx {Scheme Procedure} bytevector-u64-native-ref bv index
@deffnx {Scheme Procedure} bytevector-s64-native-ref bv index
@deffnx {C Function} scm_bytevector_u16_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_s16_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_u32_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_s32_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_u64_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_s64_native_ref (bv, index)
Return the unsigned (resp. signed) @var{n}-bit integer (where @var{n} is
16, 32 or 64) from @var{bv} at @var{index}, decoded according to the
host's native endianness.
@end deffn
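
In C, native-endian access is usually written with @code{memcpy} into a
fixed-width integer, which interprets the bytes in the host's byte order
and also sidesteps alignment problems. The sketch below (hypothetical
names, plain C rather than libguile) illustrates the semantics of the
@code{-native-} accessors; it is not meant to show how Guile implements
them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 4 bytes at BV[INDEX] as a uint32_t in host byte order.
   memcpy works for any alignment of BV + INDEX.  */
uint32_t
demo_u32_native_ref (const unsigned char *bv, size_t index)
{
  uint32_t v;

  memcpy (&v, bv + index, sizeof v);
  return v;
}

/* Store VALUE at BV[INDEX] in host byte order.  */
void
demo_u32_native_set (unsigned char *bv, size_t index, uint32_t value)
{
  memcpy (bv + index, &value, sizeof value);
}
```

A value stored with @code{demo_u32_native_set} always reads back
unchanged with @code{demo_u32_native_ref} on the same machine, but the
individual bytes differ between big- and little-endian hosts.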

@deffn {Scheme Procedure} bytevector-u16-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-s16-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-u32-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-s32-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-u64-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-s64-native-set! bv index value
@deffnx {C Function} scm_bytevector_u16_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_s16_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_u32_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_s32_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_u64_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_s64_native_set_x (bv, index, value)
Store @var{value} as an unsigned (resp. signed) @var{n}-bit integer
(where @var{n} is 16, 32 or 64) in @var{bv} at @var{index}, encoded
according to the host's native endianness.
@end deffn


@node Bytevectors and Integer Lists
@subsubsection Converting Bytevectors to/from Integer Lists

Bytevector contents can readily be converted to/from lists of signed or
unsigned integers:

@lisp
(bytevector->sint-list (u8-list->bytevector (make-list 4 255))
                       (endianness little) 2)
@result{} (-1 -1)
@end lisp

@deffn {Scheme Procedure} bytevector->u8-list bv
@deffnx {C Function} scm_bytevector_to_u8_list (bv)
Return a newly allocated list of unsigned 8-bit integers from the
contents of @var{bv}.
@end deffn

@deffn {Scheme Procedure} u8-list->bytevector lst
@deffnx {C Function} scm_u8_list_to_bytevector (lst)
Return a newly allocated bytevector consisting of the unsigned 8-bit
integers listed in @var{lst}.
@end deffn

@deffn {Scheme Procedure} bytevector->uint-list bv endianness size
@deffnx {Scheme Procedure} bytevector->sint-list bv endianness size
@deffnx {C Function} scm_bytevector_to_uint_list (bv, endianness, size)
@deffnx {C Function} scm_bytevector_to_sint_list (bv, endianness, size)
Return a list of unsigned (resp. signed) integers of @var{size} bytes
representing the contents of @var{bv}, decoded according to
@var{endianness}.
@end deffn

@deffn {Scheme Procedure} uint-list->bytevector lst endianness size
@deffnx {Scheme Procedure} sint-list->bytevector lst endianness size
@deffnx {C Function} scm_uint_list_to_bytevector (lst, endianness, size)
@deffnx {C Function} scm_sint_list_to_bytevector (lst, endianness, size)
Return a new bytevector containing the unsigned (resp. signed) integers
listed in @var{lst} and encoded on @var{size} bytes according to
@var{endianness}.
@end deffn
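
The encoding direction, as performed by @code{uint-list->bytevector},
can be sketched in plain C. @code{demo_uint_list_to_bytes} is a
hypothetical name, values are limited to 64 bits, and range checking is
omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the N unsigned VALUES into OUT, SIZE bytes each
   (1 <= SIZE <= 8), big-endian if BIG_ENDIAN is nonzero and
   little-endian otherwise.  OUT must hold N * SIZE bytes, and each value
   is assumed to fit in SIZE bytes.  */
void
demo_uint_list_to_bytes (const uint64_t *values, size_t n, int big_endian,
                         size_t size, unsigned char *out)
{
  size_t i, j;

  for (i = 0; i < n; i++)
    for (j = 0; j < size; j++)
      {
        /* J counts bytes from the least significant end.  */
        unsigned char byte = (unsigned char) (values[i] >> (8 * j));
        size_t pos = big_endian ? size - 1 - j : j;
        out[i * size + pos] = byte;
      }
}
```

Encoding the values @code{#x0102} and @code{#xFFFF} on 2 bytes yields
the bytes @code{01 02 FF FF} in big endian and @code{02 01 FF FF} in
little endian.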

@node Bytevectors as Floats
@subsubsection Interpreting Bytevector Contents as Floating Point Numbers

@cindex IEEE-754 floating point numbers

Bytevector contents can also be accessed as IEEE-754 single- or
double-precision floating point numbers (respectively 32 and 64 bits
long) using the procedures described here.

@deffn {Scheme Procedure} bytevector-ieee-single-ref bv index endianness
@deffnx {Scheme Procedure} bytevector-ieee-double-ref bv index endianness
@deffnx {C Function} scm_bytevector_ieee_single_ref (bv, index, endianness)
@deffnx {C Function} scm_bytevector_ieee_double_ref (bv, index, endianness)
Return the IEEE-754 single-precision (resp. double-precision) floating
point number from @var{bv} at @var{index} according to @var{endianness}.
@end deffn
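
Decoding an IEEE-754 value amounts to assembling its bit pattern in the
right byte order and reinterpreting those bits. The plain C sketch below
does this for a big-endian single-precision number; the name is
hypothetical, and it assumes the host's @code{float} is IEEE-754
binary32, as it is on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the big-endian IEEE-754 single-precision number at BV[INDEX]:
   assemble the 32 bits most significant byte first, then reinterpret
   them as a float with memcpy (which avoids type-punning problems).  */
float
demo_ieee_single_big_ref (const unsigned char *bv, size_t index)
{
  uint32_t bits = ((uint32_t) bv[index] << 24)
                  | ((uint32_t) bv[index + 1] << 16)
                  | ((uint32_t) bv[index + 2] << 8)
                  | (uint32_t) bv[index + 3];
  float f;

  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, the bytes @code{3F 80 00 00} decode to 1.0 and
@code{C0 00 00 00} to -2.0.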

@deffn {Scheme Procedure} bytevector-ieee-single-set! bv index value endianness
@deffnx {Scheme Procedure} bytevector-ieee-double-set! bv index value endianness
@deffnx {C Function} scm_bytevector_ieee_single_set_x (bv, index, value, endianness)
@deffnx {C Function} scm_bytevector_ieee_double_set_x (bv, index, value, endianness)
Store real number @var{value} in @var{bv} at @var{index} according to
@var{endianness}.
@end deffn

Variants specialized for the host's native endianness are also
available:

@deffn {Scheme Procedure} bytevector-ieee-single-native-ref bv index
@deffnx {Scheme Procedure} bytevector-ieee-double-native-ref bv index
@deffnx {C Function} scm_bytevector_ieee_single_native_ref (bv, index)
@deffnx {C Function} scm_bytevector_ieee_double_native_ref (bv, index)
Return the IEEE-754 single-precision (resp. double-precision) floating
point number from @var{bv} at @var{index} according to the host's native
endianness.
@end deffn

@deffn {Scheme Procedure} bytevector-ieee-single-native-set! bv index value
@deffnx {Scheme Procedure} bytevector-ieee-double-native-set! bv index value
@deffnx {C Function} scm_bytevector_ieee_single_native_set_x (bv, index, value)
@deffnx {C Function} scm_bytevector_ieee_double_native_set_x (bv, index, value)
Store real number @var{value} in @var{bv} at @var{index} according to
the host's native endianness.
@end deffn


@node Bytevectors as Strings
@subsubsection Interpreting Bytevector Contents as Unicode Strings

@cindex Unicode string encoding

Bytevector contents can also be interpreted as Unicode strings encoded
in UTF-8, UTF-16, or UTF-32.

@lisp
(utf8->string (u8-list->bytevector '(99 97 102 101)))
@result{} "cafe"

(string->utf8 "caf@'e") ;; LATIN SMALL LETTER E WITH ACUTE
@result{} #vu8(99 97 102 195 169)
@end lisp

@deffn {Scheme Procedure} string->utf8 str
@deffnx {Scheme Procedure} string->utf16 str [endianness]
@deffnx {Scheme Procedure} string->utf32 str [endianness]
@deffnx {C Function} scm_string_to_utf8 (str)
@deffnx {C Function} scm_string_to_utf16 (str, endianness)
@deffnx {C Function} scm_string_to_utf32 (str, endianness)
Return a newly allocated bytevector that contains the UTF-8, UTF-16, or
UTF-32 (aka UCS-4) encoding of @var{str}. For UTF-16 and UTF-32,
@var{endianness} should be the symbol @code{big} or @code{little}; when
omitted, it defaults to big endian.
@end deffn
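
The example above shows @code{string->utf8} turning the accented letter
of ``caf@'e'' (U+00E9) into the bytes 195 and 169. The plain C sketch
below reproduces that computation, along with the big-endian UTF-16
encoding that @code{string->utf16} uses by default. The function names
are hypothetical, and input validation (rejecting surrogates and values
above U+10FFFF) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the code point CP as UTF-8 into OUT (room for 4 bytes) and
   return the number of bytes written.  */
size_t
demo_utf8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Encode CP as big-endian UTF-16 into OUT (room for 4 bytes); code
   points above U+FFFF become a surrogate pair.  Returns the number of
   bytes written.  */
size_t
demo_utf16be_encode (uint32_t cp, unsigned char out[4])
{
  uint32_t hi, lo;

  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (cp >> 8);
      out[1] = (unsigned char) cp;
      return 2;
    }
  hi = 0xD800 + ((cp - 0x10000) >> 10);
  lo = 0xDC00 + ((cp - 0x10000) & 0x3FF);
  out[0] = (unsigned char) (hi >> 8);
  out[1] = (unsigned char) hi;
  out[2] = (unsigned char) (lo >> 8);
  out[3] = (unsigned char) lo;
  return 4;
}
```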

@deffn {Scheme Procedure} utf8->string utf
@deffnx {Scheme Procedure} utf16->string utf [endianness]
@deffnx {Scheme Procedure} utf32->string utf [endianness]
@deffnx {C Function} scm_utf8_to_string (utf)
@deffnx {C Function} scm_utf16_to_string (utf, endianness)
@deffnx {C Function} scm_utf32_to_string (utf, endianness)
Return a newly allocated string containing the UTF-8-, UTF-16-, or
UTF-32-decoded contents of bytevector @var{utf}. For UTF-16 and UTF-32,
@var{endianness} should be the symbol @code{big} or @code{little}; when
omitted, it defaults to big endian.
@end deffn

@node Bytevectors as Generalized Vectors
@subsubsection Accessing Bytevectors with the Generalized Vector API

As an extension to the R6RS, Guile allows bytevectors to be manipulated
with the @dfn{generalized vector} procedures (@pxref{Generalized
Vectors}). This also allows bytevectors to be accessed using the
generic @dfn{array} procedures (@pxref{Array Procedures}). When using
these APIs, bytes are accessed one at a time as 8-bit unsigned integers:

@example
(define bv #vu8(0 1 2 3))

(generalized-vector? bv)
@result{} #t

(generalized-vector-ref bv 2)
@result{} 2

(generalized-vector-set! bv 2 77)
(array-ref bv 2)
@result{} 77

(array-type bv)
@result{} vu8
@end example


@node Bytevectors as Uniform Vectors
@subsubsection Accessing Bytevectors with the SRFI-4 API

Bytevectors may also be accessed with the SRFI-4 API. @xref{SRFI-4 and
Bytevectors}, for more information.


@node Symbols
@subsection Symbols
@tpindex Symbols

Symbols in Scheme are widely used in three ways: as items of discrete
data, as lookup keys for alists and hash tables, and to denote variable
references.

A @dfn{symbol} is similar to a string in that it is defined by a
sequence of characters. The sequence of characters is known as the
symbol's @dfn{name}. In the usual case (that is, where the symbol's
name doesn't include any characters that could be confused with other
elements of Scheme syntax), a symbol is written in a Scheme program by
writing the sequence of characters that make up the name, @emph{without}
any quotation marks or other special syntax. For example, the symbol
whose name is ``multiply-by-2'' is written, simply:

@lisp
multiply-by-2
@end lisp

Notice how this differs from a @emph{string} with contents
``multiply-by-2'', which is written with double quotation marks, like
this:

@lisp
"multiply-by-2"
@end lisp

Looking beyond how they are written, symbols are different from strings
in two important respects.

The first important difference is uniqueness. If the same-looking
string is read twice from two different places in a program, the result
is two @emph{different} string objects whose contents just happen to be
the same. If, on the other hand, the same-looking symbol is read twice
from two different places in a program, the result is the @emph{same}
symbol object both times.
4705
4706Given two read symbols, you can use @code{eq?} to test whether they are
4707the same (that is, have the same name). @code{eq?} is the most
4708efficient comparison operator in Scheme, and comparing two symbols like
4709this is as fast as comparing, for example, two numbers. Given two
4710strings, on the other hand, you must use @code{equal?} or
4711@code{string=?}, which are much slower comparison operators, to
4712determine whether the strings have the same contents.
4713
4714@lisp
4715(define sym1 (quote hello))
4716(define sym2 (quote hello))
4717(eq? sym1 sym2) @result{} #t
4718
4719(define str1 "hello")
4720(define str2 "hello")
4721(eq? str1 str2) @result{} #f
4722(equal? str1 str2) @result{} #t
4723@end lisp
4724
4725The second important difference is that symbols, unlike strings, are not
4726self-evaluating. This is why we need the @code{(quote @dots{})}s in the
4727example above: @code{(quote hello)} evaluates to the symbol named
4728"hello" itself, whereas an unquoted @code{hello} is @emph{read} as the
4729symbol named "hello" and evaluated as a variable reference @dots{} about
4730which more below (@pxref{Symbol Variables}).
4731
4732@menu
4733* Symbol Data:: Symbols as discrete data.
4734* Symbol Keys:: Symbols as lookup keys.
4735* Symbol Variables:: Symbols as denoting variables.
4736* Symbol Primitives:: Operations related to symbols.
4737* Symbol Props:: Function slots and property lists.
4738* Symbol Read Syntax:: Extended read syntax for symbols.
4739* Symbol Uninterned:: Uninterned symbols.
4740@end menu
4741

@node Symbol Data
@subsubsection Symbols as Discrete Data

Numbers and symbols are similar to the extent that they both lend
themselves to @code{eq?} comparison. But symbols are more descriptive
than numbers, because a symbol's name can be used directly to describe
the concept for which that symbol stands.

For example, imagine that you need to represent some colours in a
computer program. Using numbers, you would have to choose arbitrarily
some mapping between numbers and colours, and then take care to use that
mapping consistently:

@lisp
;; 1=red, 2=green, 3=purple

(if (eq? (colour-of car) 1)
    ...)
@end lisp

@noindent
You can make the mapping more explicit and the code more readable by
defining constants:

@lisp
(define red 1)
(define green 2)
(define purple 3)

(if (eq? (colour-of car) red)
    ...)
@end lisp

@noindent
But the simplest and clearest approach is not to use numbers at all, but
symbols whose names specify the colours that they refer to:

@lisp
(if (eq? (colour-of car) 'red)
    ...)
@end lisp

The descriptive advantages of symbols over numbers increase as the set
of concepts that you want to describe grows. Suppose that a car object
can have other properties as well, such as whether it has or uses:

@itemize @bullet
@item
automatic or manual transmission
@item
leaded or unleaded fuel
@item
power steering (or not).
@end itemize

@noindent
Then a car's combined property set could be naturally represented and
manipulated as a list of symbols:

@lisp
(properties-of car1)
@result{}
(red manual unleaded power-steering)

(if (memq 'power-steering (properties-of car1))
    (display "Unfit people can drive this car.\n")
    (display "You'll need strong arms to drive this car!\n"))
@print{}
Unfit people can drive this car.
@end lisp

Remember, the fundamental property of symbols that we are relying on
here is that an occurrence of @code{'red} in one part of a program is
the @emph{same} symbol as an occurrence of @code{'red} in another part
of the program; this means that symbols can usefully be compared using
@code{eq?}. At the same time, symbols have naturally descriptive names.
This combination of efficiency and descriptive power makes them ideal
for use as discrete data.
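
As a small illustration, the procedure @code{colour->rgb} below is a
hypothetical helper (not part of Guile) that dispatches on a colour
symbol with @code{case}. @code{case} compares its key with @code{eqv?},
which behaves like @code{eq?} for symbols, so each clause matches
exactly one symbol. The RGB triples are arbitrary example values:

@lisp
;; Hypothetical helper, not part of Guile: map a colour symbol
;; to an RGB triple.  case compares the key using eqv?.
(define (colour->rgb colour)
  (case colour
    ((red)    '(255 0 0))
    ((green)  '(0 255 0))
    ((purple) '(128 0 128))
    (else     #f)))          ; unknown colours yield #f

(colour->rgb 'green)   ; @result{} (0 255 0)
(colour->rgb 'mauve)   ; @result{} #f
@end lisp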


@node Symbol Keys
@subsubsection Symbols as Lookup Keys

Given their efficiency and descriptive power, it is natural to use
symbols as the keys in an association list or hash table.

To illustrate this, consider a more structured representation of the car
properties example from the preceding subsection. Rather than
mixing all the properties up together in a flat list, we could use an
association list like this:

@lisp
(define car1-properties '((colour . red)
                          (transmission . manual)
                          (fuel . unleaded)
                          (steering . power-assisted)))
@end lisp

Notice how this structure is more explicit and extensible than the flat
list. For example, it makes clear that @code{manual} refers to the
transmission rather than, say, the windows or the locking of the car.
It also allows further properties to use the same symbols among their
possible values without becoming ambiguous:

@lisp
(define car1-properties '((colour . red)
                          (transmission . manual)
                          (fuel . unleaded)
                          (steering . power-assisted)
                          (seat-colour . red)
                          (locking . manual)))
@end lisp

With a representation like this, it is easy to use the efficient
@code{assq-XXX} family of procedures (@pxref{Association Lists}) to
extract or change individual pieces of information:

@lisp
(assq-ref car1-properties 'fuel) @result{} unleaded
(assq-ref car1-properties 'transmission) @result{} manual

(assq-set! car1-properties 'seat-colour 'black)
@result{}
((colour . red)
 (transmission . manual)
 (fuel . unleaded)
 (steering . power-assisted)
 (seat-colour . black)
 (locking . manual))
@end lisp

Note that @code{assq-set!} modifies the association list in place.
Since R5RS makes it an error to modify a quoted literal, a real program
should build a mutable alist (for example with @code{list} and
@code{cons}, or by copying the literal with @code{copy-tree}) before
updating it; the quoted literal above just keeps the example short.

Hash tables also have keys, and exactly the same arguments apply to the
use of symbols in hash tables as in association lists. The hash value
that Guile uses to decide where to add a symbol-keyed entry to a hash
table can be obtained by calling the @code{symbol-hash} procedure:

@deffn {Scheme Procedure} symbol-hash symbol
@deffnx {C Function} scm_symbol_hash (symbol)
Return a hash value for @var{symbol}.
@end deffn

See @ref{Hash Tables} for information about hash tables in general, and
for why you might choose to use a hash table rather than an association
list.
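
The same idea carries over to hash tables. The following sketch keeps
the car properties in a table created with @code{make-hash-table} and
accessed with @code{hashq-set!} and @code{hashq-ref}, which compare keys
with @code{eq?}; the name @code{car1-table} is made up for the example:

@lisp
;; A symbol-keyed hash table.  The hashq- procedures compare keys
;; with eq?, which is exactly the right test for symbols.
(define car1-table (make-hash-table))

(hashq-set! car1-table 'colour 'red)
(hashq-set! car1-table 'fuel 'unleaded)

(hashq-ref car1-table 'fuel)            ; @result{} unleaded
(hashq-ref car1-table 'steering)        ; @result{} #f
(hashq-ref car1-table 'steering 'none)  ; @result{} none

;; symbol-hash is deterministic for a given symbol.
(= (symbol-hash 'fuel) (symbol-hash 'fuel))   ; @result{} #t
@end lisp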


@node Symbol Variables
@subsubsection Symbols as Denoting Variables

When an unquoted symbol in a Scheme program is evaluated, it is
interpreted as a variable reference, and the result of the evaluation is
the appropriate variable's value.

For example, when the expression @code{(string-length "abcd")} is read
and evaluated, the sequence of characters @code{string-length} is read
as the symbol whose name is ``string-length''. This symbol is associated
with a variable whose value is the procedure that implements string
length calculation. Therefore, evaluating the @code{string-length}
symbol yields that procedure.
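
To see the symbol and the variable it denotes as separate things, you
can build the expression @code{(string-length "abcd")} at run time from
a symbol and hand it to @code{eval}. This sketch assumes it is evaluated
at top level in a module where @code{string-length} has its standard
binding, such as the default @code{(guile-user)} module; in Guile,
@code{interaction-environment} denotes the current module:

@lisp
;; Obtain the symbol string-length without writing it literally.
(define proc-name (string->symbol "string-length"))

(eq? proc-name 'string-length)                          ; @result{} #t

;; Evaluating the symbol looks up the variable it denotes...
(procedure? (eval proc-name (interaction-environment)))  ; @result{} #t

;; ...and a list whose first element is that symbol is a call.
(eval (list proc-name "abcd") (interaction-environment)) ; @result{} 4
@end lisp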

The details of the connection between an unquoted symbol and the
variable to which it refers are explained elsewhere. See @ref{Binding
Constructs}, for how associations between symbols and variables are
created, and @ref{Modules}, for how those associations are affected by
Guile's module system.


@node Symbol Primitives
@subsubsection Operations Related to Symbols

Given any Scheme value, you can determine whether it is a symbol using
the @code{symbol?} primitive:

@rnindex symbol?
@deffn {Scheme Procedure} symbol? obj
@deffnx {C Function} scm_symbol_p (obj)
Return @code{#t} if @var{obj} is a symbol, otherwise return
@code{#f}.
@end deffn

@deftypefn {C Function} int scm_is_symbol (SCM val)
Equivalent to @code{scm_is_true (scm_symbol_p (val))}.
@end deftypefn
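
For example, @code{symbol?} distinguishes a symbol from a string or a
keyword with a similar-looking name. In Guile, keywords are a separate
type, so @code{symbol?} returns @code{#f} for them:

@lisp
(map symbol? (list 'hello "hello" #:hello 42))
;; @result{} (#t #f #f #f)

;; A symbol built from a string is still a symbol.
(symbol? (string->symbol "hello"))   ; @result{} #t
@end lisp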

Once you know that you have a symbol, you can obtain its name as a
string by calling @code{symbol->string}. Note that Guile differs by
default from R5RS on the details of @code{symbol->string} as regards
case-sensitivity:

@rnindex symbol->string
@deffn {Scheme Procedure} symbol->string s
@deffnx {C Function} scm_symbol_to_string (s)
Return the name of symbol @var{s} as a string. By default, Guile reads
symbols case-sensitively, so the string returned will have the same case
variation as the sequence of characters that caused @var{s} to be
created.

If Guile is set to read symbols case-insensitively (as specified by
R5RS), and @var{s} comes into being as part of a literal expression
(@pxref{Literal expressions,,,r5rs, The Revised^5 Report on Scheme}) or
by a call to the @code{read} or @code{string-ci->symbol} procedures,
Guile converts any alphabetic characters in the symbol's name to
lower case before creating the symbol object, so the string returned
here will be in lower case.

If @var{s} was created by @code{string->symbol}, the case of characters
in the string returned will be the same as that in the string that was
passed to @code{string->symbol}, regardless of Guile's case-sensitivity
setting at the time @var{s} was created.

It is an error to apply mutation procedures like @code{string-set!} to
strings returned by this procedure.
@end deffn
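
If you need a modifiable version of a symbol's name, copy the string
first. A minimal sketch using the standard @code{string-copy} procedure:

@lisp
;; Copy the name before mutating it; the string returned by
;; symbol->string itself must not be modified.
(define name (string-copy (symbol->string 'hello)))
(string-set! name 0 #\j)

name                      ; @result{} "jello"
(symbol->string 'hello)   ; @result{} "hello"
@end lisp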

Most symbols are created by writing them literally in code. However, it
is also possible to create symbols programmatically using the following
procedures:

@deffn {Scheme Procedure} symbol char@dots{}
@rnindex symbol
Return the symbol whose name consists of the given character arguments.

@example
(symbol #\x #\y #\z) @result{} xyz
@end example
@end deffn

@deffn {Scheme Procedure} list->symbol lst
@rnindex list->symbol
Return the symbol whose name consists of the characters in the list
@var{lst}.

@example
(list->symbol '(#\a #\b #\c)) @result{} abc
@end example
@end deffn

@rnindex symbol-append
@deffn {Scheme Procedure} symbol-append . args
Return the symbol whose name is the concatenation of the names of the
given symbols, @var{args}.

@example
(let ((h 'hello))
  (symbol-append h 'world))
@result{} helloworld
@end example
@end deffn

@rnindex string->symbol
@deffn {Scheme Procedure} string->symbol string
@deffnx {C Function} scm_string_to_symbol (string)
Return the symbol whose name is @var{string}. This procedure can create
symbols with names containing special characters or letters in the
non-standard case, but it is usually a bad idea to create such symbols
because in some implementations of Scheme they cannot be read as
themselves.
@end deffn

@deffn {Scheme Procedure} string-ci->symbol str
@deffnx {C Function} scm_string_ci_to_symbol (str)
Return the symbol whose name is @var{str}. If Guile is currently
reading symbols case-insensitively, @var{str} is converted to lowercase
before the returned symbol is looked up or created.
@end deffn

The following examples illustrate Guile's detailed behaviour as regards
the case-sensitivity of symbols:

@lisp
(read-enable 'case-insensitive)   ; R5RS compliant behaviour

(symbol->string 'flying-fish)    @result{} "flying-fish"
(symbol->string 'Martin)         @result{} "martin"
(symbol->string
  (string->symbol "Malvina"))    @result{} "Malvina"

(eq? 'mISSISSIppi 'mississippi)  @result{} #t
(string->symbol "mISSISSIppi")   @result{} mISSISSIppi
(eq? 'bitBlt (string->symbol "bitBlt")) @result{} #f
(eq? 'LolliPop
  (string->symbol (symbol->string 'LolliPop))) @result{} #t
(string=? "K. Harper, M.D."
  (symbol->string
    (string->symbol "K. Harper, M.D."))) @result{} #t

(read-disable 'case-insensitive)   ; Guile default behaviour

(symbol->string 'flying-fish)    @result{} "flying-fish"
(symbol->string 'Martin)         @result{} "Martin"
(symbol->string
  (string->symbol "Malvina"))    @result{} "Malvina"

(eq? 'mISSISSIppi 'mississippi)  @result{} #f
(string->symbol "mISSISSIppi")   @result{} mISSISSIppi
(eq? 'bitBlt (string->symbol "bitBlt")) @result{} #t
(eq? 'LolliPop
  (string->symbol (symbol->string 'LolliPop))) @result{} #t
(string=? "K. Harper, M.D."
  (symbol->string
    (string->symbol "K. Harper, M.D."))) @result{} #t
@end lisp

From C, there are lower-level functions that construct a Scheme symbol
from a C string in the current locale encoding.

When you want to do more from C, you should convert between symbols
and strings using @code{scm_symbol_to_string} and
@code{scm_string_to_symbol} and work with the strings.

@deftypefn {C Function} SCM scm_from_locale_symbol (const char *name)
@deftypefnx {C Function} SCM scm_from_locale_symboln (const char *name, size_t len)
Construct and return a Scheme symbol whose name is specified by
@var{name}. For @code{scm_from_locale_symbol}, @var{name} must be
null-terminated; for @code{scm_from_locale_symboln} the length of
@var{name} is specified explicitly by @var{len}.
@end deftypefn

@deftypefn {C Function} SCM scm_take_locale_symbol (char *str)
@deftypefnx {C Function} SCM scm_take_locale_symboln (char *str, size_t len)
Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln},
respectively, but also frees @var{str} with @code{free} eventually.
Thus, you can use these functions when you would free @var{str} anyway
immediately after creating the Scheme symbol. In certain cases, Guile
can then use @var{str} directly as its internal representation.
@end deftypefn

The length of a symbol's name can also be obtained from C:

@deftypefn {C Function} size_t scm_c_symbol_length (SCM sym)
Return the number of characters in the name of @var{sym}.
@end deftypefn

Finally, some applications, especially those that generate new Scheme
code dynamically, need to generate symbols for use in the generated
code. The @code{gensym} primitive meets this need:

@deffn {Scheme Procedure} gensym [prefix]
@deffnx {C Function} scm_gensym (prefix)
Create a new symbol with a name constructed from a prefix and a counter
value. The string @var{prefix} can be specified as an optional
argument. The default prefix is @samp{@w{ g}}. The counter is increased
by 1 at each call. There is no provision for resetting the counter.
@end deffn

The symbols generated by @code{gensym} are @emph{likely} to be unique,
since, with the default prefix, their names begin with a space, and it
is only otherwise possible to generate such symbols if a programmer goes
out of their way to do so. Uniqueness can be guaranteed by instead using
uninterned symbols (@pxref{Symbol Uninterned}), though they can't be
usefully written out and read back in.
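
A short sketch of what this means in practice: each call to
@code{gensym} produces a different symbol, even with the same prefix,
and the result is an ordinary interned symbol. The counter values in the
generated names vary from session to session, so the example only checks
properties that do not depend on them:

@lisp
;; Both names start with "tmp" but end in different counter values.
(define t1 (gensym "tmp"))
(define t2 (gensym "tmp"))

(eq? t1 t2)                                 ; @result{} #f
(string-prefix? "tmp" (symbol->string t1))  ; @result{} #t
(symbol-interned? t1)                       ; @result{} #t
@end lisp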


@node Symbol Props
@subsubsection Function Slots and Property Lists

In traditional Lisp dialects, symbols are often understood as having
three kinds of value at once:

@itemize @bullet
@item
a @dfn{variable} value, which is used when the symbol appears in
code in a variable reference context

@item
a @dfn{function} value, which is used when the symbol appears in
code in a function name position (i.e.@: as the first element in an
unquoted list)

@item
a @dfn{property list} value, which is used when the symbol is given as
the first argument to Lisp's @code{put} or @code{get} functions.
@end itemize

Although Scheme (as one of its simplifications with respect to Lisp)
does away with the distinction between variable and function namespaces,
Guile currently retains some elements of the traditional structure in
case they turn out to be useful when implementing translators for other
languages, in particular Emacs Lisp.

Specifically, Guile symbols have two extra slots: one for a symbol's
property list, and one for its ``function value.'' The following
procedures are provided to access these slots.

@deffn {Scheme Procedure} symbol-fref symbol
@deffnx {C Function} scm_symbol_fref (symbol)
Return the contents of @var{symbol}'s @dfn{function slot}.
@end deffn

@deffn {Scheme Procedure} symbol-fset! symbol value
@deffnx {C Function} scm_symbol_fset_x (symbol, value)
Set the contents of @var{symbol}'s function slot to @var{value}.
@end deffn

@deffn {Scheme Procedure} symbol-pref symbol
@deffnx {C Function} scm_symbol_pref (symbol)
Return the @dfn{property list} currently associated with @var{symbol}.
@end deffn

@deffn {Scheme Procedure} symbol-pset! symbol value
@deffnx {C Function} scm_symbol_pset_x (symbol, value)
Set @var{symbol}'s property list to @var{value}.
@end deffn

@deffn {Scheme Procedure} symbol-property sym prop
From @var{sym}'s property list, return the value for property
@var{prop}. The assumption is that @var{sym}'s property list is an
association list whose keys are distinguished from each other using
@code{equal?}; @var{prop} should be one of the keys in that list. If
the property list has no entry for @var{prop}, @code{symbol-property}
returns @code{#f}.
@end deffn

@deffn {Scheme Procedure} set-symbol-property! sym prop val
In @var{sym}'s property list, set the value for property @var{prop} to
@var{val}, or add a new entry for @var{prop}, with value @var{val}, if
none already exists. For the structure of the property list, see
@code{symbol-property}.
@end deffn

@deffn {Scheme Procedure} symbol-property-remove! sym prop
From @var{sym}'s property list, remove the entry for property
@var{prop}, if there is one. For the structure of the property list,
see @code{symbol-property}.
@end deffn

Support for these extra slots may be removed in a future release, and it
is probably better to avoid using them. For a more modern and Schemely
approach to properties, see @ref{Object Properties}.


@node Symbol Read Syntax
@subsubsection Extended Read Syntax for Symbols

The read syntax for a symbol is a sequence of letters, digits, and
@dfn{extended alphabetic characters}, beginning with a character that
cannot begin a number. In addition, the special cases of @code{+},
@code{-}, and @code{...} are read as symbols even though numbers can
begin with @code{+}, @code{-} or @code{.}.

Extended alphabetic characters may be used within identifiers as if
they were letters. The set of extended alphabetic characters is:

@example
! $ % & * + - . / : < = > ? @@ ^ _ ~
@end example

In addition to the standard read syntax defined above (which is taken
from R5RS (@pxref{Formal syntax,,,r5rs,The Revised^5 Report on
Scheme})), Guile provides an extended symbol read syntax that allows the
inclusion of unusual characters such as space characters, newlines and
parentheses. If (for whatever reason) you need to write a symbol
containing characters not mentioned above, you can do so as follows.

@itemize @bullet
@item
Begin the symbol with the characters @code{#@{},

@item
write the characters of the symbol and

@item
finish the symbol with the characters @code{@}#}.
@end itemize

Here are a few examples of this form of read syntax. The first symbol
needs to use extended syntax because it contains a space character, the
second because it contains a line break, and the last because it looks
like a number.

@lisp
#@{foo bar@}#

#@{what
ever@}#

#@{4242@}#
@end lisp

Although Guile provides this extended read syntax for symbols,
widespread usage of it is discouraged because it is not portable and not
very readable.


@node Symbol Uninterned
@subsubsection Uninterned Symbols

What makes symbols useful is that they are automatically kept unique.
No two symbols are distinct objects yet have the same name. But of
course, there is no rule without exception. In addition to the normal
symbols that have been discussed up to now, you can also create special
@dfn{uninterned} symbols that behave slightly differently.

To understand what is different about them and why they might be useful,
we look at how normal symbols are actually kept unique.

Whenever Guile wants to find the symbol with a specific name, for
example during @code{read} or when executing @code{string->symbol}, it
first looks into a table of all existing symbols to find out whether a
symbol with the given name already exists. When this is the case, Guile
just returns that symbol. When not, a new symbol with the name is
created and entered into the table so that it can be found later.

Sometimes you might want to create a symbol that is guaranteed `fresh',
i.e.@: a symbol that did not exist previously. You might also want to
somehow guarantee that no one else will ever unintentionally stumble
across your symbol in the future. These properties of a symbol are
often needed when generating code during macro expansion. When
introducing new temporary variables, you want to guarantee that they
don't conflict with variables in other people's code.

The simplest way to arrange for this is to create a new symbol but
not enter it into the global table of all symbols. That way, no one
will ever get access to your symbol by chance. Symbols that are not in
the table are called @dfn{uninterned}. Of course, symbols that
@emph{are} in the table are called @dfn{interned}.

You create new uninterned symbols with the function @code{make-symbol}.
You can test whether a symbol is interned or not with
@code{symbol-interned?}.

Uninterned symbols break the rule that the name of a symbol uniquely
identifies the symbol object. Because of this, they cannot be written
out and read back in like interned symbols. Currently, Guile has no
support for reading uninterned symbols. Note that the function
@code{gensym} does not return uninterned symbols for this reason.

@deffn {Scheme Procedure} make-symbol name
@deffnx {C Function} scm_make_symbol (name)
Return a new uninterned symbol with the name @var{name}. The returned
symbol is guaranteed to be unique and future calls to
@code{string->symbol} will not return it.
@end deffn

@deffn {Scheme Procedure} symbol-interned? symbol
@deffnx {C Function} scm_symbol_interned_p (symbol)
Return @code{#t} if @var{symbol} is interned, otherwise return
@code{#f}.
@end deffn

For example:

@lisp
(define foo-1 (string->symbol "foo"))
(define foo-2 (string->symbol "foo"))
(define foo-3 (make-symbol "foo"))
(define foo-4 (make-symbol "foo"))

(eq? foo-1 foo-2)
@result{} #t
; Two interned symbols with the same name are the same object,

(eq? foo-1 foo-3)
@result{} #f
; but a call to make-symbol with the same name returns a
; distinct object.

(eq? foo-3 foo-4)
@result{} #f
; A call to make-symbol always returns a new object, even for
; the same name.

foo-3
@result{} #<uninterned-symbol foo 8085290>
; Uninterned symbols print differently from interned symbols,

(symbol? foo-3)
@result{} #t
; but they are still symbols,

(symbol-interned? foo-3)
@result{} #f
; just not interned.
@end lisp


@node Keywords
@subsection Keywords
@tpindex Keywords

Keywords are self-evaluating objects with a convenient read syntax that
makes them easy to type.

Guile's keyword support is compatible with R5RS, and adds a (switchable)
read syntax extension to permit keywords to begin with @code{:} as well
as @code{#:}, or to end with @code{:}.

@menu
* Why Use Keywords?:: Motivation for keyword usage.
* Coding With Keywords:: How to use keywords.
* Keyword Read Syntax:: Read syntax for keywords.
* Keyword Procedures:: Procedures for dealing with keywords.
@end menu

@node Why Use Keywords?
@subsubsection Why Use Keywords?

Keywords are useful in contexts where a program or procedure wants to be
able to accept a large number of optional arguments without making its
interface unmanageable.

To illustrate this, consider a hypothetical @code{make-window}
procedure, which creates a new window on the screen for drawing into
using some graphical toolkit. There are many parameters that the caller
might like to specify, but which could also be sensibly defaulted, for
example:

@itemize @bullet
@item
color depth -- Default: the color depth for the screen

@item
background color -- Default: white

@item
width -- Default: 600

@item
height -- Default: 400
@end itemize

If @code{make-window} did not use keywords, the caller would have to
pass in a value for each possible argument, remembering the correct
argument order and using a special value to indicate the default value
for that argument:

@lisp
(make-window 'default    ;; Color depth
             'default    ;; Background color
             800         ;; Width
             100         ;; Height
             @dots{})      ;; More make-window arguments
@end lisp

With keywords, on the other hand, defaulted arguments are omitted, and
non-default arguments are clearly tagged by the appropriate keyword. As
a result, the invocation becomes much clearer:

@lisp
(make-window #:width 800 #:height 100)
@end lisp

On the other hand, for a simpler procedure with few arguments, the use
of keywords would be a hindrance rather than a help. The primitive
procedure @code{cons}, for example, would not be improved if it had to
be invoked as

@lisp
(cons #:car x #:cdr y)
@end lisp

So the decision whether to use keywords or not is purely pragmatic: use
them if they will clarify the procedure invocation at the point of call.

@node Coding With Keywords
@subsubsection Coding With Keywords

If a procedure wants to support keywords, it should take a rest argument
and then use whatever means is convenient to extract keywords and their
corresponding arguments from the contents of that rest argument.

The following example illustrates the principle: the code for
@code{make-window} uses a helper procedure called
@code{get-keyword-value} to extract individual keyword arguments from
the rest argument.

@lisp
(define (get-keyword-value args keyword default)
  (let ((kv (memq keyword args)))
    (if (and kv (>= (length kv) 2))
        (cadr kv)
        default)))

(define (make-window . args)
  (let ((depth  (get-keyword-value args #:depth screen-depth))
        (bg     (get-keyword-value args #:bg "white"))
        (width  (get-keyword-value args #:width 600))
        (height (get-keyword-value args #:height 400))
        @dots{})
    @dots{}))
@end lisp

But you don't need to write @code{get-keyword-value}. The @code{(ice-9
optargs)} module provides a set of powerful macros that you can use to
implement keyword-supporting procedures like this:

@lisp
(use-modules (ice-9 optargs))

(define (make-window . args)
  (let-keywords args #f ((depth screen-depth)
                         (bg "white")
                         (width 600)
                         (height 400))
    @dots{}))
@end lisp

@noindent
Or, even more economically, like this:

@lisp
(use-modules (ice-9 optargs))

(define* (make-window #:key (depth screen-depth)
                            (bg "white")
                            (width 600)
                            (height 400))
  @dots{})
@end lisp

For further details on @code{let-keywords}, @code{define*} and other
facilities provided by the @code{(ice-9 optargs)} module, see
@ref{Optional Arguments}.
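
For experimentation, here is a self-contained variant of the
@code{define*} version. It is only a sketch: @code{screen-depth} is
given an arbitrary stand-in value of 24, and instead of opening a window
the procedure simply returns the settings it received, so you can see
which defaults were applied:

@lisp
(use-modules (ice-9 optargs))

;; Stand-in for the real screen depth (an assumption for this sketch).
(define screen-depth 24)

(define* (make-window #:key (depth screen-depth)
                            (bg "white")
                            (width 600)
                            (height 400))
  ;; Return the effective settings instead of creating a window.
  (list depth bg width height))

(make-window)                            ; @result{} (24 "white" 600 400)
(make-window #:height 300 #:bg "black")  ; @result{} (24 "black" 600 300)
@end lisp

Keyword arguments may be given in any order, and any that are omitted
take their default values.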


@node Keyword Read Syntax
@subsubsection Keyword Read Syntax

Guile, by default, only recognizes a keyword syntax that is compatible
with R5RS. A token of the form @code{#:NAME}, where @code{NAME} has the
same syntax as a Scheme symbol (@pxref{Symbol Read Syntax}), is the
external representation of the keyword named @code{NAME}. Keyword
objects print using this syntax as well, so values containing keyword
objects can be read back into Guile. When used in an expression,
keywords are self-quoting objects.

If the @code{keywords} read option is set to @code{'prefix}, Guile also
recognizes the alternative read syntax @code{:NAME}. Otherwise, tokens
of the form @code{:NAME} are read as symbols, as required by R5RS.

@cindex SRFI-88 keyword syntax

If the @code{keywords} read option is set to @code{'postfix}, Guile
recognizes the SRFI-88 read syntax @code{NAME:} (@pxref{SRFI-88}).
Otherwise, tokens of this form are read as symbols.

To enable and disable the alternative non-R5RS keyword syntax, you use
the @code{read-set!} procedure documented in @ref{Scheme Read}. Note
that the @code{prefix} and @code{postfix} syntaxes are mutually
exclusive.

@lisp
(read-set! keywords 'prefix)

#:type
@result{}
#:type

:type
@result{}
#:type

(read-set! keywords 'postfix)

type:
@result{}
#:type

:type
@result{}
:type

(read-set! keywords #f)

#:type
@result{}
#:type

:type
@print{}
ERROR: In expression :type:
ERROR: Unbound variable: :type
ABORT: (unbound-variable)
@end lisp

@node Keyword Procedures
@subsubsection Keyword Procedures

@deffn {Scheme Procedure} keyword? obj
@deffnx {C Function} scm_keyword_p (obj)
Return @code{#t} if the argument @var{obj} is a keyword, else
@code{#f}.
@end deffn

@deffn {Scheme Procedure} keyword->symbol keyword
@deffnx {C Function} scm_keyword_to_symbol (keyword)
Return the symbol with the same name as @var{keyword}.
@end deffn

@deffn {Scheme Procedure} symbol->keyword symbol
@deffnx {C Function} scm_symbol_to_keyword (symbol)
Return the keyword with the same name as @var{symbol}.
@end deffn
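
These procedures let you move between a keyword and the symbol that has
the same name. A brief sketch:

@lisp
(keyword? #:width)                       ; @result{} #t
(keyword? 'width)                        ; @result{} #f
(keyword->symbol #:width)                ; @result{} width
(symbol->keyword 'width)                 ; @result{} #:width

;; Keywords, like symbols, are unique, so eq? can compare them.
(eq? (symbol->keyword 'width) #:width)   ; @result{} #t
@end lisp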

@deftypefn {C Function} int scm_is_keyword (SCM obj)
Equivalent to @code{scm_is_true (scm_keyword_p (@var{obj}))}.
@end deftypefn

@deftypefn {C Function} SCM scm_from_locale_keyword (const char *str)
@deftypefnx {C Function} SCM scm_from_locale_keywordn (const char *str, size_t len)
Equivalent to @code{scm_symbol_to_keyword (scm_from_locale_symbol
(@var{str}))} and @code{scm_symbol_to_keyword (scm_from_locale_symboln
(@var{str}, @var{len}))}, respectively.
@end deftypefn

@node Other Types
@subsection ``Functionality-Centric'' Data Types

Procedures and macros are documented in their own sections: see
@ref{Procedures} and @ref{Macros}.

Variable objects are documented as part of the description of Guile's
module system: see @ref{Variables}.

Asyncs, dynamic roots and fluids are described in the section on
scheduling: see @ref{Scheduling}.

Hooks are documented in the section on general utility functions: see
@ref{Hooks}.

Ports are described in the section on I/O: see @ref{Input and Output}.

Regular expressions are described in their own section: see @ref{Regular
Expressions}.

@c Local Variables:
@c TeX-master: "guile.texi"
@c End: