Commit | Line | Data |
---|---|---|
07d83abe MV |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
7facc08a | 3 | @c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007, |
f659df44 | 4 | @c 2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc. |
07d83abe MV |
5 | @c See the file guile.texi for copying conditions. |
6 | ||
07d83abe MV |
7 | @node Simple Data Types |
8 | @section Simple Generic Data Types | |
9 | ||
10 | This chapter describes those of Guile's simple data types which are | |
11 | primarily used for their role as items of generic data. By | |
12 | @dfn{simple} we mean data types that are not primarily used as | |
13 | containers to hold other data, unlike pairs, lists, vectors and so on. | |
14 | For the documentation of such @dfn{compound} data types, see | |
15 | @ref{Compound Data Types}. | |
16 | ||
17 | @c One of the great strengths of Scheme is that there is no straightforward | |
18 | @c distinction between ``data'' and ``functionality''. For example, | |
19 | @c Guile's support for dynamic linking could be described: | |
20 | ||
21 | @c @itemize @bullet | |
22 | @c @item | |
23 | @c either in a ``data-centric'' way, as the behaviour and properties of the | |
24 | @c ``dynamically linked object'' data type, and the operations that may be | |
25 | @c applied to instances of this type | |
26 | ||
27 | @c @item | |
28 | @c or in a ``functionality-centric'' way, as the set of procedures that | |
29 | @c constitute Guile's support for dynamic linking, in the context of the | |
30 | @c module system. | |
31 | @c @end itemize | |
32 | ||
33 | @c The contents of this chapter are, therefore, a matter of judgment. By | |
34 | @c @dfn{generic}, we mean to select those data types whose typical use as | |
35 | @c @emph{data} in a wide variety of programming contexts is more important | |
36 | @c than their use in the implementation of a particular piece of | |
37 | @c @emph{functionality}. The last section of this chapter provides | |
38 | @c references for all the data types that are documented not here but in a | |
39 | @c ``functionality-centric'' way elsewhere in the manual. | |
40 | ||
41 | @menu | |
42 | * Booleans:: True/false values. | |
43 | * Numbers:: Numerical data types. | |
050ab45f MV |
44 | * Characters:: Single characters. |
45 | * Character Sets:: Sets of characters. | |
46 | * Strings:: Sequences of characters. | |
b242715b | 47 | * Bytevectors:: Sequences of bytes. |
07d83abe MV |
48 | * Symbols:: Symbols. |
49 | * Keywords:: Self-quoting, customizable display keywords. | |
50 | * Other Types:: "Functionality-centric" data types. | |
51 | @end menu | |
52 | ||
53 | ||
54 | @node Booleans | |
55 | @subsection Booleans | |
56 | @tpindex Booleans | |
57 | ||
58 | The two boolean values are @code{#t} for true and @code{#f} for false. | |
7a329029 | 59 | They can also be written as @code{#true} and @code{#false}, as per R7RS. |
07d83abe MV |
60 | |
61 | Boolean values are returned by predicate procedures, such as the general | |
62 | equality predicates @code{eq?}, @code{eqv?} and @code{equal?} | |
63 | (@pxref{Equality}) and numerical and string comparison operators like | |
64 | @code{string=?} (@pxref{String Comparison}) and @code{<=} | |
65 | (@pxref{Comparison}). | |
66 | ||
67 | @lisp | |
68 | (<= 3 8) | |
69 | @result{} #t | |
70 | ||
71 | (<= 3 -3) | |
72 | @result{} #f | |
73 | ||
74 | (equal? "house" "houses") | |
75 | @result{} #f | |
76 | ||
77 | (eq? #f #f) | |
78 | @result{} | |
79 | #t | |
80 | @end lisp | |
81 | ||
9accf3d9 AW |
82 | In test condition contexts like @code{if} and @code{cond} |
83 | (@pxref{Conditionals}), where a group of subexpressions will be | |
84 | evaluated only if a @var{condition} expression evaluates to ``true'', | |
85 | ``true'' means any value at all except @code{#f}. | |
07d83abe MV |
86 | |
87 | @lisp | |
88 | (if #t "yes" "no") | |
89 | @result{} "yes" | |
90 | ||
91 | (if 0 "yes" "no") | |
92 | @result{} "yes" | |
93 | ||
94 | (if #f "yes" "no") | |
95 | @result{} "no" | |
96 | @end lisp | |
97 | ||
98 | A result of this asymmetry is that typical Scheme source code more often | |
99 | uses @code{#f} explicitly than @code{#t}: @code{#f} is necessary to | |
100 | represent an @code{if} or @code{cond} false value, whereas @code{#t} is | |
101 | not necessary to represent an @code{if} or @code{cond} true value. | |
102 | ||
103 | It is important to note that @code{#f} is @strong{not} equivalent to any | |
104 | other Scheme value. In particular, @code{#f} is not the same as the | |
105 | number 0 (as in C and C++), and not the same as the ``empty list'' | |
106 | (as in some Lisp dialects). | |
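
As a brief illustration of this point (a sketch only; the results shown
are those expected from a standard Guile session), neither the number 0
nor the empty list is the same object as @code{#f}, and both count as
``true'' in a test:

@lisp
(eq? #f 0)
@result{} #f

(eq? #f '())
@result{} #f

(if '() "yes" "no")
@result{} "yes"

(not 0)
@result{} #f

(not #f)
@result{} #t
@end lisp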
107 | ||
108 | In C, the two Scheme boolean values are available as the two constants | |
109 | @code{SCM_BOOL_T} for @code{#t} and @code{SCM_BOOL_F} for @code{#f}. | |
110 | Care must be taken with the false value @code{SCM_BOOL_F}: it is not | |
111 | false when used in C conditionals. In order to test for it, use | |
112 | @code{scm_is_false} or @code{scm_is_true}. | |
113 | ||
114 | @rnindex not | |
115 | @deffn {Scheme Procedure} not x | |
116 | @deffnx {C Function} scm_not (x) | |
117 | Return @code{#t} if @var{x} is @code{#f}, else return @code{#f}. | |
118 | @end deffn | |
119 | ||
120 | @rnindex boolean? | |
121 | @deffn {Scheme Procedure} boolean? obj | |
122 | @deffnx {C Function} scm_boolean_p (obj) | |
123 | Return @code{#t} if @var{obj} is either @code{#t} or @code{#f}, else | |
124 | return @code{#f}. | |
125 | @end deffn | |
126 | ||
127 | @deftypevr {C Macro} SCM SCM_BOOL_T | |
128 | The @code{SCM} representation of the Scheme object @code{#t}. | |
129 | @end deftypevr | |
130 | ||
131 | @deftypevr {C Macro} SCM SCM_BOOL_F | |
132 | The @code{SCM} representation of the Scheme object @code{#f}. | |
133 | @end deftypevr | |
134 | ||
135 | @deftypefn {C Function} int scm_is_true (SCM obj) | |
136 | Return @code{0} if @var{obj} is @code{#f}, else return @code{1}. | |
137 | @end deftypefn | |
138 | ||
139 | @deftypefn {C Function} int scm_is_false (SCM obj) | |
140 | Return @code{1} if @var{obj} is @code{#f}, else return @code{0}. | |
141 | @end deftypefn | |
142 | ||
143 | @deftypefn {C Function} int scm_is_bool (SCM obj) | |
144 | Return @code{1} if @var{obj} is either @code{#t} or @code{#f}, else | |
145 | return @code{0}. | |
146 | @end deftypefn | |
147 | ||
148 | @deftypefn {C Function} SCM scm_from_bool (int val) | |
149 | Return @code{#f} if @var{val} is @code{0}, else return @code{#t}. | |
150 | @end deftypefn | |
151 | ||
152 | @deftypefn {C Function} int scm_to_bool (SCM val) | |
153 | Return @code{1} if @var{val} is @code{SCM_BOOL_T}, return @code{0} | |
154 | when @var{val} is @code{SCM_BOOL_F}, else signal a `wrong type' error. | |
155 | ||
156 | You should probably use @code{scm_is_true} instead of this function | |
157 | when you just want to test a @code{SCM} value for trueness. | |
158 | @end deftypefn | |
159 | ||
160 | @node Numbers | |
161 | @subsection Numerical data types | |
162 | @tpindex Numbers | |
163 | ||
164 | Guile supports a rich ``tower'' of numerical types --- integer, | |
165 | rational, real and complex --- and provides an extensive set of | |
166 | mathematical and scientific functions for operating on numerical | |
167 | data. This section of the manual documents those types and functions. | |
168 | ||
169 | You may also find it illuminating to read R5RS's presentation of numbers | |
170 | in Scheme, which is particularly clear and accessible: see | |
171 | @ref{Numbers,,,r5rs,R5RS}. | |
172 | ||
173 | @menu | |
174 | * Numerical Tower:: Scheme's numerical "tower". | |
175 | * Integers:: Whole numbers. | |
176 | * Reals and Rationals:: Real and rational numbers. | |
177 | * Complex Numbers:: Complex numbers. | |
178 | * Exactness:: Exactness and inexactness. | |
179 | * Number Syntax:: Read syntax for numerical data. | |
180 | * Integer Operations:: Operations on integer values. | |
181 | * Comparison:: Comparison predicates. | |
182 | * Conversion:: Converting numbers to and from strings. | |
183 | * Complex:: Complex number operations. | |
184 | * Arithmetic:: Arithmetic functions. | |
185 | * Scientific:: Scientific functions. | |
07d83abe MV |
186 | * Bitwise Operations:: Logical AND, OR, NOT, and so on. |
187 | * Random:: Random number generation. | |
188 | @end menu | |
189 | ||
190 | ||
191 | @node Numerical Tower | |
192 | @subsubsection Scheme's Numerical ``Tower'' | |
193 | @rnindex number? | |
194 | ||
195 | Scheme's numerical ``tower'' consists of the following categories of | |
196 | numbers: | |
197 | ||
198 | @table @dfn | |
199 | @item integers | |
200 | Whole numbers, positive or negative; e.g.@: @minus{}5, 0, 18. | |
201 | ||
202 | @item rationals | |
203 | The set of numbers that can be expressed as @math{@var{p}/@var{q}} | |
204 | where @var{p} and @var{q} are integers; e.g.@: @math{9/16} works, but | |
205 | pi (an irrational number) doesn't. These include integers | |
206 | (@math{@var{n}/1}). | |
207 | ||
208 | @item real numbers | |
209 | The set of numbers that describes all possible positions along a | |
210 | one-dimensional line. This includes rationals as well as irrational | |
211 | numbers. | |
212 | ||
213 | @item complex numbers | |
214 | The set of numbers that describes all possible positions in a two | |
215 | dimensional space. This includes real as well as imaginary numbers | |
216 | (@math{@var{a}+@var{b}i}, where @var{a} is the @dfn{real part}, | |
217 | @var{b} is the @dfn{imaginary part}, and @math{i} is the square root of | |
218 | @minus{}1.) | |
219 | @end table | |
220 | ||
221 | It is called a tower because each category ``sits on'' the one that | |
222 | follows it, in the sense that every integer is also a rational, every | |
223 | rational is also real, and every real number is also a complex number | |
224 | (but with zero imaginary part). | |
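
The containment between categories can be observed with the type
predicates documented in the following subsections.  This is a small
sketch rather than an exhaustive test; @code{complex?} is described
later in this section.

@lisp
(integer? 5)
@result{} #t

(rational? 5)
@result{} #t

(real? 5)
@result{} #t

(complex? 5)
@result{} #t

(integer? 1/2)
@result{} #f

(rational? 1/2)
@result{} #t

(real? 1+2i)
@result{} #f
@end lisp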
225 | ||
226 | In addition to the classification into integers, rationals, reals and | |
227 | complex numbers, Scheme also distinguishes between whether a number is | |
228 | represented exactly or not. For example, the result of | |
9f1ba6a9 NJ |
229 | @m{2\sin(\pi/4),2*sin(pi/4)} is exactly @m{\sqrt{2},2^(1/2)}, but Guile |
230 | can represent neither @m{\pi/4,pi/4} nor @m{\sqrt{2},2^(1/2)} exactly. | |
07d83abe MV |
231 | Instead, it stores an inexact approximation, using the C type |
232 | @code{double}. | |
233 | ||
234 | Guile can represent exact rationals of any magnitude, inexact | |
235 | rationals that fit into a C @code{double}, and inexact complex numbers | |
236 | with @code{double} real and imaginary parts. | |
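
The distinction can be checked with the @code{exact?} and
@code{inexact?} predicates (@pxref{Exactness}).  The following sketch
assumes only those two predicates and the standard @code{sqrt} and
@code{expt} procedures:

@lisp
(exact? 1/3)
@result{} #t

(exact? (expt 10 30))
@result{} #t

(inexact? 0.1)
@result{} #t

(inexact? (sqrt 2))
@result{} #t
@end lisp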
237 | ||
238 | The @code{number?} predicate may be applied to any Scheme value to | |
239 | discover whether the value is any of the supported numerical types. | |
240 | ||
241 | @deffn {Scheme Procedure} number? obj | |
242 | @deffnx {C Function} scm_number_p (obj) | |
243 | Return @code{#t} if @var{obj} is any kind of number, else @code{#f}. | |
244 | @end deffn | |
245 | ||
246 | For example: | |
247 | ||
248 | @lisp | |
249 | (number? 3) | |
250 | @result{} #t | |
251 | ||
252 | (number? "hello there!") | |
253 | @result{} #f | |
254 | ||
255 | (define pi 3.141592654) | |
256 | (number? pi) | |
257 | @result{} #t | |
258 | @end lisp | |
259 | ||
5615f696 MV |
260 | @deftypefn {C Function} int scm_is_number (SCM obj) |
261 | This is equivalent to @code{scm_is_true (scm_number_p (obj))}. | |
262 | @end deftypefn | |
263 | ||
07d83abe MV |
264 | The next few subsections document each of Guile's numerical data types |
265 | in detail. | |
266 | ||
267 | @node Integers | |
268 | @subsubsection Integers | |
269 | ||
270 | @tpindex Integer numbers | |
271 | ||
272 | @rnindex integer? | |
273 | ||
274 | Integers are whole numbers, that is, numbers with no fractional part, | |
275 | such as 2, 83, and @minus{}3789. | |
276 | ||
277 | Integers in Guile can be arbitrarily big, as shown by the following | |
278 | example. | |
279 | ||
280 | @lisp | |
281 | (define (factorial n) | |
282 | (let loop ((n n) (product 1)) | |
283 | (if (= n 0) | |
284 | product | |
285 | (loop (- n 1) (* product n))))) | |
286 | ||
287 | (factorial 3) | |
288 | @result{} 6 | |
289 | ||
290 | (factorial 20) | |
291 | @result{} 2432902008176640000 | |
292 | ||
293 | (- (factorial 45)) | |
294 | @result{} -119622220865480194561963161495657715064383733760000000000 | |
295 | @end lisp | |
296 | ||
297 | Readers whose background is in programming languages where integers are | |
298 | limited by the need to fit into just 4 or 8 bytes of memory may find | |
299 | this surprising, or suspect that Guile's representation of integers is | |
300 | inefficient. In fact, Guile achieves a near optimal balance of | |
301 | convenience and efficiency by using the host computer's native | |
302 | representation of integers where possible, and a more general | |
303 | representation where the required number does not fit in the native | |
304 | form. Conversion between these two representations is automatic and | |
305 | completely invisible to the Scheme level programmer. | |
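
For instance, a value that exceeds 64 bits behaves like any other exact
integer.  In this sketch the variable name @code{big} is chosen purely
for illustration:

@lisp
(define big (expt 2 64))

big
@result{} 18446744073709551616

(exact? big)
@result{} #t

(- big big)
@result{} 0

(= (* big big) (expt 2 128))
@result{} #t
@end lisp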
306 | ||
07d83abe MV |
307 | C has a host of different integer types, and Guile offers a host of |
308 | functions to convert between them and the @code{SCM} representation. | |
309 | For example, a C @code{int} can be handled with @code{scm_to_int} and | |
310 | @code{scm_from_int}. Guile also defines a few C integer types of its | |
311 | own, to help with differences between systems. | |
312 | ||
313 | C integer types that are not covered can be handled with the generic | |
314 | @code{scm_to_signed_integer} and @code{scm_from_signed_integer} for | |
315 | signed types, or with @code{scm_to_unsigned_integer} and | |
316 | @code{scm_from_unsigned_integer} for unsigned types. | |
317 | ||
318 | Scheme integers can be either exact or inexact.  For example, a number | |
319 | written as @code{3.0} with an explicit decimal-point is inexact, but | |
320 | it is also an integer. The functions @code{integer?} and | |
321 | @code{scm_is_integer} report true for such a number, but the functions | |
900a897c MW |
322 | @code{exact-integer?}, @code{scm_is_exact_integer}, |
323 | @code{scm_is_signed_integer}, and @code{scm_is_unsigned_integer} only | |
07d83abe MV |
324 | allow exact integers and thus report false. Likewise, the conversion |
325 | functions like @code{scm_to_signed_integer} only accept exact | |
326 | integers. | |
327 | ||
328 | The motivation for this behavior is that the inexactness of a number | |
329 | should not be lost silently. If you want to allow inexact integers, | |
877f06c3 | 330 | you can explicitly insert a call to @code{inexact->exact} or to its C |
07d83abe MV |
331 | equivalent @code{scm_inexact_to_exact}. (Only inexact integers will |
332 | be converted by this call into exact integers; inexact non-integers | |
333 | will become exact fractions.) | |
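
A short sketch of that conversion at the Scheme level follows.  The
value 0.5 is used for the non-integer case because it is exactly
representable in binary floating point:

@lisp
(exact-integer? 3.0)
@result{} #f

(inexact->exact 3.0)
@result{} 3

(exact-integer? (inexact->exact 3.0))
@result{} #t

(inexact->exact 0.5)
@result{} 1/2
@end lisp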
334 | ||
335 | @deffn {Scheme Procedure} integer? x | |
336 | @deffnx {C Function} scm_integer_p (x) | |
909fcc97 | 337 | Return @code{#t} if @var{x} is an exact or inexact integer number, else |
900a897c | 338 | return @code{#f}. |
07d83abe MV |
339 | |
340 | @lisp | |
341 | (integer? 487) | |
342 | @result{} #t | |
343 | ||
344 | (integer? 3.0) | |
345 | @result{} #t | |
346 | ||
347 | (integer? -3.4) | |
348 | @result{} #f | |
349 | ||
350 | (integer? +inf.0) | |
f659df44 | 351 | @result{} #f |
07d83abe MV |
352 | @end lisp |
353 | @end deffn | |
354 | ||
355 | @deftypefn {C Function} int scm_is_integer (SCM x) | |
356 | This is equivalent to @code{scm_is_true (scm_integer_p (x))}. | |
357 | @end deftypefn | |
358 | ||
900a897c MW |
359 | @deffn {Scheme Procedure} exact-integer? x |
360 | @deffnx {C Function} scm_exact_integer_p (x) | |
361 | Return @code{#t} if @var{x} is an exact integer number, else | |
362 | return @code{#f}. | |
363 | ||
364 | @lisp | |
365 | (exact-integer? 37) | |
366 | @result{} #t | |
367 | ||
368 | (exact-integer? 3.0) | |
369 | @result{} #f | |
370 | @end lisp | |
371 | @end deffn | |
372 | ||
373 | @deftypefn {C Function} int scm_is_exact_integer (SCM x) | |
374 | This is equivalent to @code{scm_is_true (scm_exact_integer_p (x))}. | |
375 | @end deftypefn | |
376 | ||
07d83abe MV |
377 | @defvr {C Type} scm_t_int8 |
378 | @defvrx {C Type} scm_t_uint8 | |
379 | @defvrx {C Type} scm_t_int16 | |
380 | @defvrx {C Type} scm_t_uint16 | |
381 | @defvrx {C Type} scm_t_int32 | |
382 | @defvrx {C Type} scm_t_uint32 | |
383 | @defvrx {C Type} scm_t_int64 | |
384 | @defvrx {C Type} scm_t_uint64 | |
385 | @defvrx {C Type} scm_t_intmax | |
386 | @defvrx {C Type} scm_t_uintmax | |
387 | The C types are equivalent to the corresponding ISO C types but are | |
388 | defined on all platforms, with the exception of @code{scm_t_int64} and | |
389 | @code{scm_t_uint64}, which are only defined when a 64-bit type is | |
390 | available. For example, @code{scm_t_int8} is equivalent to | |
391 | @code{int8_t}. | |
392 | ||
393 | You can regard these definitions as a stop-gap measure until all | |
394 | platforms provide these types. If you know that all the platforms | |
395 | that you are interested in already provide these types, it is better | |
396 | to use them directly instead of the types provided by Guile. | |
397 | @end defvr | |
398 | ||
399 | @deftypefn {C Function} int scm_is_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max) | |
400 | @deftypefnx {C Function} int scm_is_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max) | |
401 | Return @code{1} when @var{x} represents an exact integer that is | |
402 | between @var{min} and @var{max}, inclusive; otherwise return @code{0}. | |
403 | ||
404 | These functions can be used to check whether a @code{SCM} value will | |
405 | fit into a given range, such as the range of a given C integer type. | |
406 | If you just want to convert a @code{SCM} value to a given C integer | |
407 | type, use one of the conversion functions directly. | |
408 | @end deftypefn | |
409 | ||
410 | @deftypefn {C Function} scm_t_intmax scm_to_signed_integer (SCM x, scm_t_intmax min, scm_t_intmax max) | |
411 | @deftypefnx {C Function} scm_t_uintmax scm_to_unsigned_integer (SCM x, scm_t_uintmax min, scm_t_uintmax max) | |
412 | When @var{x} represents an exact integer that is between @var{min} and | |
413 | @var{max} inclusive, return that integer. Else signal an error, | |
414 | either a `wrong-type' error when @var{x} is not an exact integer, or | |
415 | an `out-of-range' error when it doesn't fit the given range. | |
416 | @end deftypefn | |
417 | ||
418 | @deftypefn {C Function} SCM scm_from_signed_integer (scm_t_intmax x) | |
419 | @deftypefnx {C Function} SCM scm_from_unsigned_integer (scm_t_uintmax x) | |
420 | Return the @code{SCM} value that represents the integer @var{x}. This | |
421 | function will always succeed and will always return an exact number. | |
422 | @end deftypefn | |
423 | ||
424 | @deftypefn {C Function} char scm_to_char (SCM x) | |
425 | @deftypefnx {C Function} {signed char} scm_to_schar (SCM x) | |
426 | @deftypefnx {C Function} {unsigned char} scm_to_uchar (SCM x) | |
427 | @deftypefnx {C Function} short scm_to_short (SCM x) | |
428 | @deftypefnx {C Function} {unsigned short} scm_to_ushort (SCM x) | |
429 | @deftypefnx {C Function} int scm_to_int (SCM x) | |
430 | @deftypefnx {C Function} {unsigned int} scm_to_uint (SCM x) | |
431 | @deftypefnx {C Function} long scm_to_long (SCM x) | |
432 | @deftypefnx {C Function} {unsigned long} scm_to_ulong (SCM x) | |
433 | @deftypefnx {C Function} {long long} scm_to_long_long (SCM x) | |
434 | @deftypefnx {C Function} {unsigned long long} scm_to_ulong_long (SCM x) | |
435 | @deftypefnx {C Function} size_t scm_to_size_t (SCM x) | |
436 | @deftypefnx {C Function} ssize_t scm_to_ssize_t (SCM x) | |
7facc08a | 437 | @deftypefnx {C Function} scm_t_ptrdiff scm_to_ptrdiff_t (SCM x) |
07d83abe MV |
438 | @deftypefnx {C Function} scm_t_int8 scm_to_int8 (SCM x) |
439 | @deftypefnx {C Function} scm_t_uint8 scm_to_uint8 (SCM x) | |
440 | @deftypefnx {C Function} scm_t_int16 scm_to_int16 (SCM x) | |
441 | @deftypefnx {C Function} scm_t_uint16 scm_to_uint16 (SCM x) | |
442 | @deftypefnx {C Function} scm_t_int32 scm_to_int32 (SCM x) | |
443 | @deftypefnx {C Function} scm_t_uint32 scm_to_uint32 (SCM x) | |
444 | @deftypefnx {C Function} scm_t_int64 scm_to_int64 (SCM x) | |
445 | @deftypefnx {C Function} scm_t_uint64 scm_to_uint64 (SCM x) | |
446 | @deftypefnx {C Function} scm_t_intmax scm_to_intmax (SCM x) | |
447 | @deftypefnx {C Function} scm_t_uintmax scm_to_uintmax (SCM x) | |
448 | When @var{x} represents an exact integer that fits into the indicated | |
449 | C type, return that integer. Else signal an error, either a | |
450 | `wrong-type' error when @var{x} is not an exact integer, or an | |
451 | `out-of-range' error when it doesn't fit the given range. | |
452 | ||
453 | The functions @code{scm_to_long_long}, @code{scm_to_ulong_long}, | |
454 | @code{scm_to_int64}, and @code{scm_to_uint64} are only available when | |
455 | the corresponding types are. | |
456 | @end deftypefn | |
457 | ||
458 | @deftypefn {C Function} SCM scm_from_char (char x) | |
459 | @deftypefnx {C Function} SCM scm_from_schar (signed char x) | |
460 | @deftypefnx {C Function} SCM scm_from_uchar (unsigned char x) | |
461 | @deftypefnx {C Function} SCM scm_from_short (short x) | |
462 | @deftypefnx {C Function} SCM scm_from_ushort (unsigned short x) | |
463 | @deftypefnx {C Function} SCM scm_from_int (int x) | |
464 | @deftypefnx {C Function} SCM scm_from_uint (unsigned int x) | |
465 | @deftypefnx {C Function} SCM scm_from_long (long x) | |
466 | @deftypefnx {C Function} SCM scm_from_ulong (unsigned long x) | |
467 | @deftypefnx {C Function} SCM scm_from_long_long (long long x) | |
468 | @deftypefnx {C Function} SCM scm_from_ulong_long (unsigned long long x) | |
469 | @deftypefnx {C Function} SCM scm_from_size_t (size_t x) | |
470 | @deftypefnx {C Function} SCM scm_from_ssize_t (ssize_t x) | |
7facc08a | 471 | @deftypefnx {C Function} SCM scm_from_ptrdiff_t (scm_t_ptrdiff x) |
07d83abe MV |
472 | @deftypefnx {C Function} SCM scm_from_int8 (scm_t_int8 x) |
473 | @deftypefnx {C Function} SCM scm_from_uint8 (scm_t_uint8 x) | |
474 | @deftypefnx {C Function} SCM scm_from_int16 (scm_t_int16 x) | |
475 | @deftypefnx {C Function} SCM scm_from_uint16 (scm_t_uint16 x) | |
476 | @deftypefnx {C Function} SCM scm_from_int32 (scm_t_int32 x) | |
477 | @deftypefnx {C Function} SCM scm_from_uint32 (scm_t_uint32 x) | |
478 | @deftypefnx {C Function} SCM scm_from_int64 (scm_t_int64 x) | |
479 | @deftypefnx {C Function} SCM scm_from_uint64 (scm_t_uint64 x) | |
480 | @deftypefnx {C Function} SCM scm_from_intmax (scm_t_intmax x) | |
481 | @deftypefnx {C Function} SCM scm_from_uintmax (scm_t_uintmax x) | |
482 | Return the @code{SCM} value that represents the integer @var{x}. | |
483 | These functions will always succeed and will always return an exact | |
484 | number. | |
485 | @end deftypefn | |
486 | ||
08962922 MV |
487 | @deftypefn {C Function} void scm_to_mpz (SCM val, mpz_t rop) |
488 | Assign @var{val} to the multiple precision integer @var{rop}. | |
489 | @var{val} must be an exact integer, otherwise an error will be | |
490 | signalled. @var{rop} must have been initialized with @code{mpz_init} | |
491 | before this function is called. When @var{rop} is no longer needed | |
492 | the occupied space must be freed with @code{mpz_clear}. | |
493 | @xref{Initializing Integers,,, gmp, GNU MP Manual}, for details. | |
494 | @end deftypefn | |
495 | ||
9f1ba6a9 | 496 | @deftypefn {C Function} SCM scm_from_mpz (mpz_t val) |
08962922 MV |
497 | Return the @code{SCM} value that represents @var{val}. |
498 | @end deftypefn | |
499 | ||
07d83abe MV |
500 | @node Reals and Rationals |
501 | @subsubsection Real and Rational Numbers | |
502 | @tpindex Real numbers | |
503 | @tpindex Rational numbers | |
504 | ||
505 | @rnindex real? | |
506 | @rnindex rational? | |
507 | ||
508 | Mathematically, the real numbers are the set of numbers that describe | |
509 | all possible points along a continuous, infinite, one-dimensional line. | |
510 | The rational numbers are the set of all numbers that can be written as | |
511 | fractions @var{p}/@var{q}, where @var{p} and @var{q} are integers. | |
512 | All rational numbers are also real, but there are real numbers that | |
995953e5 | 513 | are not rational, for example @m{\sqrt{2}, the square root of 2}, and |
34942993 | 514 | @m{\pi,pi}. |
07d83abe MV |
515 | |
516 | Guile can represent both exact and inexact rational numbers, but it | |
c960e556 MW |
517 | cannot represent precise finite irrational numbers. Exact rationals are |
518 | represented by storing the numerator and denominator as two exact | |
519 | integers. Inexact rationals are stored as floating point numbers using | |
520 | the C type @code{double}. | |
07d83abe MV |
521 | |
522 | Exact rationals are written as a fraction of integers. There must be | |
523 | no whitespace around the slash: | |
524 | ||
525 | @lisp | |
526 | 1/2 | |
527 | -22/7 | |
528 | @end lisp | |
529 | ||
530 | Even though the actual encoding of inexact rationals is in binary, it | |
531 | may be helpful to think of it as a decimal number with a limited | |
532 | number of significant figures and a decimal point somewhere, since | |
533 | this corresponds to the standard notation for non-whole numbers. For | |
534 | example: | |
535 | ||
536 | @lisp | |
537 | 0.34 | |
538 | -0.00000142857931198 | |
539 | -5648394822220000000000.0 | |
540 | 4.0 | |
541 | @end lisp | |
542 | ||
c960e556 MW |
543 | The limited precision of Guile's encoding means that any finite ``real'' |
544 | number in Guile can be written in a rational form, by multiplying and | |
545 | then dividing by sufficient powers of 10 (or in fact, 2). For example, | |
546 | @samp{-0.00000142857931198} is the same as @minus{}142857931198 divided | |
547 | by 100000000000000000. In Guile's current incarnation, therefore, the | |
548 | @code{rational?} and @code{real?} predicates are equivalent for finite | |
549 | numbers. | |
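
The following sketch shows the two predicates agreeing on a finite
inexact number and differing on an infinity, in line with the
description of the special values given later in this subsection:

@lisp
(real? 0.34)
@result{} #t

(rational? 0.34)
@result{} #t

(real? +inf.0)
@result{} #t

(rational? +inf.0)
@result{} #f
@end lisp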
07d83abe | 550 | |
07d83abe | 551 | |
c960e556 MW |
552 | Dividing by an exact zero leads to an error message, as one might expect. | |
553 | However, dividing by an inexact zero does not produce an error. | |
554 | Instead, the result of the division is either plus or minus infinity, | |
555 | depending on the sign of the divided number and the sign of the zero | |
556 | divisor (some platforms support signed zeroes @samp{-0.0} and | |
557 | @samp{+0.0}; @samp{0.0} is the same as @samp{+0.0}). | |
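
For example, on a platform with @acronym{IEEE} 754 arithmetic and
signed zeroes (an assumption of this sketch), one would expect:

@lisp
(/ 1 0.0)
@result{} +inf.0

(/ -1 0.0)
@result{} -inf.0

(/ 1 -0.0)
@result{} -inf.0

;; By contrast, (/ 1 0) divides by an exact zero and signals an error.
@end lisp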
558 | ||
559 | Dividing zero by an inexact zero yields a @acronym{NaN} (`not a number') | |
560 | value, although Scheme nevertheless considers such values to be numbers. | |
561 | Comparing a @acronym{NaN} value with any number (including | |
562 | itself) using @code{=}, @code{<}, @code{>}, @code{<=} or @code{>=} | |
563 | always returns @code{#f}.  Although a @acronym{NaN} value is not | |
564 | @code{=} to itself, it is both @code{eqv?} and @code{equal?} to itself | |
565 | and other @acronym{NaN} values. However, the preferred way to test for | |
566 | them is by using @code{nan?}. | |
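
These rules can be sketched as follows.  The variable name @code{x} is
illustrative only, and the @acronym{NaN} is produced here by dividing
an inexact zero by an inexact zero:

@lisp
(define x (/ 0.0 0.0))

(nan? x)
@result{} #t

(= x x)
@result{} #f

(< x 1)
@result{} #f

(eqv? x x)
@result{} #t

(equal? x x)
@result{} #t
@end lisp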
567 | ||
568 | The real @acronym{NaN} values and infinities are written @samp{+nan.0}, | |
569 | @samp{+inf.0} and @samp{-inf.0}. This syntax is also recognized by | |
570 | @code{read} as an extension to the usual Scheme syntax. These special | |
571 | values are considered by Scheme to be inexact real numbers but not | |
572 | rational. Note that non-real complex numbers may also contain | |
573 | infinities or @acronym{NaN} values in their real or imaginary parts. To | |
574 | test a real number to see if it is infinite, a @acronym{NaN} value, or | |
575 | neither, use @code{inf?}, @code{nan?}, or @code{finite?}, respectively. | |
576 | Every real number in Scheme belongs to precisely one of those three | |
577 | classes. | |
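
As an illustration, the three predicates can be combined into a small
classifier.  The procedure @code{classify} is a hypothetical helper
written for this example and is not part of Guile:

@lisp
;; Return a symbol naming the class of the real number X.
(define (classify x)
  (cond ((nan? x) 'nan)
        ((inf? x) 'infinite)
        (else 'finite)))

(classify 1/3)
@result{} finite

(classify -inf.0)
@result{} infinite

(classify +nan.0)
@result{} nan

(finite? 1/3)
@result{} #t
@end lisp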
07d83abe MV |
578 | |
579 | On platforms that follow @acronym{IEEE} 754 for their floating point | |
580 | arithmetic, the @samp{+inf.0}, @samp{-inf.0}, and @samp{+nan.0} values | |
581 | are implemented using the corresponding @acronym{IEEE} 754 values. | |
582 | They behave in arithmetic operations as @acronym{IEEE} 754 describes; | |
583 | for example, @code{(= +nan.0 +nan.0)} @result{} @code{#f}. | |
584 | ||
07d83abe MV |
585 | @deffn {Scheme Procedure} real? obj |
586 | @deffnx {C Function} scm_real_p (obj) | |
587 | Return @code{#t} if @var{obj} is a real number, else @code{#f}. Note | |
588 | that the sets of integer and rational values form subsets of the set | |
589 | of real numbers, so the predicate will also be fulfilled if @var{obj} | |
590 | is an integer number or a rational number. | |
591 | @end deffn | |
592 | ||
593 | @deffn {Scheme Procedure} rational? x | |
594 | @deffnx {C Function} scm_rational_p (x) | |
595 | Return @code{#t} if @var{x} is a rational number, @code{#f} otherwise. | |
596 | Note that the set of integer values forms a subset of the set of | |
995953e5 | 597 | rational numbers, i.e.@: the predicate will also be fulfilled if |
07d83abe | 598 | @var{x} is an integer number. |
07d83abe MV |
599 | @end deffn |
600 | ||
601 | @deffn {Scheme Procedure} rationalize x eps | |
602 | @deffnx {C Function} scm_rationalize (x, eps) | |
603 | Return the @emph{simplest} rational number differing | |
604 | from @var{x} by no more than @var{eps}. | |
605 | ||
606 | As required by @acronym{R5RS}, @code{rationalize} only returns an | |
607 | exact result when both its arguments are exact. Thus, you might need | |
608 | to use @code{inexact->exact} on the arguments. | |
609 | ||
610 | @lisp | |
611 | (rationalize (inexact->exact 1.2) 1/100) | |
612 | @result{} 6/5 | |
613 | @end lisp | |
614 | ||
615 | @end deffn | |
616 | ||
d3df9759 MV |
617 | @deffn {Scheme Procedure} inf? x |
618 | @deffnx {C Function} scm_inf_p (x) | |
10391e06 AW |
619 | Return @code{#t} if the real number @var{x} is @samp{+inf.0} or |
620 | @samp{-inf.0}. Otherwise return @code{#f}. | |
07d83abe MV |
621 | @end deffn |
622 | ||
623 | @deffn {Scheme Procedure} nan? x | |
d3df9759 | 624 | @deffnx {C Function} scm_nan_p (x) |
10391e06 AW |
625 | Return @code{#t} if the real number @var{x} is @samp{+nan.0}, or |
626 | @code{#f} otherwise. | |
07d83abe MV |
627 | @end deffn |
628 | ||
7112615f MW |
629 | @deffn {Scheme Procedure} finite? x |
630 | @deffnx {C Function} scm_finite_p (x) | |
10391e06 AW |
631 | Return @code{#t} if the real number @var{x} is neither infinite nor a |
632 | NaN, @code{#f} otherwise. | |
7112615f MW |
633 | @end deffn |
634 | ||
cdf1ad3b MV |
635 | @deffn {Scheme Procedure} nan |
636 | @deffnx {C Function} scm_nan () | |
c960e556 | 637 | Return @samp{+nan.0}, a @acronym{NaN} value. |
cdf1ad3b MV |
638 | @end deffn |
639 | ||
640 | @deffn {Scheme Procedure} inf | |
641 | @deffnx {C Function} scm_inf () | |
c960e556 | 642 | Return @samp{+inf.0}, positive infinity. |
cdf1ad3b MV |
643 | @end deffn |
644 | ||
d3df9759 MV |
645 | @deffn {Scheme Procedure} numerator x |
646 | @deffnx {C Function} scm_numerator (x) | |
647 | Return the numerator of the rational number @var{x}. | |
648 | @end deffn | |
649 | ||
650 | @deffn {Scheme Procedure} denominator x | |
651 | @deffnx {C Function} scm_denominator (x) | |
652 | Return the denominator of the rational number @var{x}. | |
653 | @end deffn | |
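
A brief sketch of @code{numerator} and @code{denominator} follows.  It
relies on the fact that exact rationals are kept in lowest terms, so
@code{6/4} is read as @code{3/2}:

@lisp
(numerator 6/4)
@result{} 3

(denominator 6/4)
@result{} 2

(numerator 5)
@result{} 5

(denominator 5)
@result{} 1
@end lisp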
654 | ||
655 | @deftypefn {C Function} int scm_is_real (SCM val) | |
656 | @deftypefnx {C Function} int scm_is_rational (SCM val) | |
657 | Equivalent to @code{scm_is_true (scm_real_p (val))} and | |
658 | @code{scm_is_true (scm_rational_p (val))}, respectively. | |
659 | @end deftypefn | |
660 | ||
661 | @deftypefn {C Function} double scm_to_double (SCM val) | |
662 | Return the number closest to @var{val} that is representable as a | |
663 | @code{double}.  Return infinity for a @var{val} that is too large in | |
664 | magnitude. The argument @var{val} must be a real number. | |
665 | @end deftypefn | |
666 | ||
667 | @deftypefn {C Function} SCM scm_from_double (double val) | |
be3eb25c | 668 | Return the @code{SCM} value that represents @var{val}. The returned |
d3df9759 MV |
669 | value is inexact according to the predicate @code{inexact?}, but it |
670 | will be exactly equal to @var{val}. | |
671 | @end deftypefn | |
672 | ||
07d83abe MV |
673 | @node Complex Numbers |
674 | @subsubsection Complex Numbers | |
675 | @tpindex Complex numbers | |
676 | ||
677 | @rnindex complex? | |
678 | ||
679 | Complex numbers are the set of numbers that describe all possible points | |
680 | in a two-dimensional space. The two coordinates of a particular point | |
681 | in this space are known as the @dfn{real} and @dfn{imaginary} parts of | |
682 | the complex number that describes that point. | |
683 | ||
684 | In Guile, complex numbers are written in rectangular form as the sum of | |
685 | their real and imaginary parts, using the symbol @code{i} to indicate | |
686 | the imaginary part. | |
687 | ||
688 | @lisp | |
689 | 3+4i | |
690 | @result{} | |
691 | 3.0+4.0i | |
692 | ||
693 | (* 3-8i 2.3+0.3i) | |
694 | @result{} | |
695 | 9.3-17.5i | |
696 | @end lisp | |
697 | ||
34942993 KR |
698 | @cindex polar form |
699 | @noindent | |
700 | Polar form can also be used, with an @samp{@@} between magnitude and | |
701 | angle: | |
702 | ||
703 | @lisp | |
704 | 1@@3.141592 @result{} -1.0 (approx) | |
705 | -1@@1.57079 @result{} 0.0-1.0i (approx) | |
706 | @end lisp | |
707 | ||
c7218482 MW |
708 | Guile represents a complex number as a pair of inexact reals, so the |
709 | real and imaginary parts of a complex number have the same properties of | |
710 | inexactness and limited precision as single inexact real numbers. | |
711 | ||
712 | Note that each part of a complex number may contain any inexact real | |
713 | value, including the special values @samp{+nan.0}, @samp{+inf.0} and | |
714 | @samp{-inf.0}, as well as either of the signed zeroes @samp{0.0} or | |
715 | @samp{-0.0}. | |
716 | ||
07d83abe | 717 | |
5615f696 MV |
718 | @deffn {Scheme Procedure} complex? z |
719 | @deffnx {C Function} scm_complex_p (z) | |
64de6db5 | 720 | Return @code{#t} if @var{z} is a complex number, @code{#f} |
07d83abe | 721 | otherwise. Note that the sets of real, rational and integer |
679cceed | 722 | values form subsets of the set of complex numbers, i.e.@: the |
64de6db5 | 723 | predicate will also be fulfilled if @var{z} is a real, |
07d83abe MV |
724 | rational or integer number. |
725 | @end deffn | |
726 | ||
c9dc8c6c MV |
727 | @deftypefn {C Function} int scm_is_complex (SCM val) |
728 | Equivalent to @code{scm_is_true (scm_complex_p (val))}. | |
729 | @end deftypefn | |
730 | ||
07d83abe MV |
731 | @node Exactness |
732 | @subsubsection Exact and Inexact Numbers | |
733 | @tpindex Exact numbers | |
734 | @tpindex Inexact numbers | |
735 | ||
736 | @rnindex exact? | |
737 | @rnindex inexact? | |
738 | @rnindex exact->inexact | |
739 | @rnindex inexact->exact | |
740 | ||
654b2823 MW |
741 | R5RS requires that, with few exceptions, a calculation involving inexact |
742 | numbers always produces an inexact result. To meet this requirement, | |
743 | Guile distinguishes between an exact integer value such as @samp{5} and | |
744 | the corresponding inexact integer value which, to the limited precision | |
07d83abe MV |
745 | available, has no fractional part, and is printed as @samp{5.0}. Guile |
746 | will only convert the latter value to the former when forced to do so by | |
747 | an invocation of the @code{inexact->exact} procedure. | |
748 | ||
654b2823 MW |
749 | The only exception to the above requirement is when the values of the |
750 | inexact numbers do not affect the result. For example @code{(expt n 0)} | |
751 | is @samp{1} for any value of @code{n}, therefore @code{(expt 5.0 0)} is | |
752 | permitted to return an exact @samp{1}. | |
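
A short sketch of how inexactness propagates through ordinary
arithmetic, and of the explicit conversion mentioned above:

@lisp
(+ 1 2)                @result{} 3
(+ 1 2.0)              @result{} 3.0
(exact? (* 2 0.5))     @result{} #f
(inexact->exact 5.0)   @result{} 5
@end lisp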
753 | ||
07d83abe MV |
754 | @deffn {Scheme Procedure} exact? z |
755 | @deffnx {C Function} scm_exact_p (z) | |
756 | Return @code{#t} if the number @var{z} is exact, @code{#f} | |
757 | otherwise. | |
758 | ||
759 | @lisp | |
760 | (exact? 2) | |
761 | @result{} #t | |
762 | ||
763 | (exact? 0.5) | |
764 | @result{} #f | |
765 | ||
766 | (exact? (/ 2)) | |
767 | @result{} #t | |
768 | @end lisp | |
769 | ||
770 | @end deffn | |
771 | ||
022dda69 MG |
772 | @deftypefn {C Function} int scm_is_exact (SCM z) |
773 | Return a @code{1} if the number @var{z} is exact, and @code{0} | |
774 | otherwise. This is equivalent to @code{scm_is_true (scm_exact_p (z))}. | |
775 | ||
776 | An alternate approach, for integers only, is to use | |
777 | @code{scm_is_signed_integer} or @code{scm_is_unsigned_integer}, which test for an exact integer within a given range. | |
778 | @end deftypefn | |
779 | ||
07d83abe MV |
780 | @deffn {Scheme Procedure} inexact? z |
781 | @deffnx {C Function} scm_inexact_p (z) | |
782 | Return @code{#t} if the number @var{z} is inexact, @code{#f} | |
783 | otherwise. | |
784 | @end deffn | |
785 | ||
022dda69 MG |
786 | @deftypefn {C Function} int scm_is_inexact (SCM z) |
787 | Return a @code{1} if the number @var{z} is inexact, and @code{0} | |
788 | otherwise. This is equivalent to @code{scm_is_true (scm_inexact_p (z))}. | |
789 | @end deftypefn | |
790 | ||
07d83abe MV |
791 | @deffn {Scheme Procedure} inexact->exact z |
792 | @deffnx {C Function} scm_inexact_to_exact (z) | |
793 | Return an exact number that is numerically closest to @var{z}, when | |
794 | there is one. For inexact rationals, Guile returns the exact rational | |
795 | that is numerically equal to the inexact rational. Inexact complex | |
796 | numbers with a non-zero imaginary part cannot be made exact. | |
797 | ||
798 | @lisp | |
799 | (inexact->exact 0.5) | |
800 | @result{} 1/2 | |
801 | @end lisp | |
802 | ||
803 | The first result below is not @samp{6/5} because 12/10 is not exactly | |
804 | representable as a @code{double} (on most platforms). However, when reading a decimal | |
805 | number that has been marked exact with the @samp{#e} prefix, Guile is | |
806 | able to represent it correctly. | |
807 | ||
808 | @lisp | |
809 | (inexact->exact 1.2) | |
810 | @result{} 5404319552844595/4503599627370496 | |
811 | ||
812 | #e1.2 | |
813 | @result{} 6/5 | |
814 | @end lisp | |
815 | ||
816 | @end deffn | |
817 | ||
818 | @c begin (texi-doc-string "guile" "exact->inexact") | |
819 | @deffn {Scheme Procedure} exact->inexact z | |
820 | @deffnx {C Function} scm_exact_to_inexact (z) | |
821 | Convert the number @var{z} to its inexact representation. | |
822 | @end deffn | |
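
As a small sketch, values that are exactly representable as a
@code{double} survive a round trip through both conversions:

@lisp
(exact->inexact 1/2)                    @result{} 0.5
(exact->inexact 3)                      @result{} 3.0
(inexact->exact (exact->inexact 1/4))   @result{} 1/4
(inexact? (exact->inexact 1/3))         @result{} #t
@end lisp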
823 | ||
824 | ||
825 | @node Number Syntax | |
826 | @subsubsection Read Syntax for Numerical Data | |
827 | ||
828 | The read syntax for integers is a string of digits, optionally | |
829 | preceded by a minus or plus character, a code indicating the | |
830 | base in which the integer is encoded, and a code indicating whether | |
831 | the number is exact or inexact. The supported base codes are: | |
832 | ||
833 | @table @code | |
834 | @item #b | |
835 | @itemx #B | |
836 | the integer is written in binary (base 2) | |
837 | ||
838 | @item #o | |
839 | @itemx #O | |
840 | the integer is written in octal (base 8) | |
841 | ||
842 | @item #d | |
843 | @itemx #D | |
844 | the integer is written in decimal (base 10) | |
845 | ||
846 | @item #x | |
847 | @itemx #X | |
848 | the integer is written in hexadecimal (base 16) | |
849 | @end table | |
850 | ||
851 | If the base code is omitted, the integer is assumed to be decimal. The | |
852 | following examples show how these base codes are used. | |
853 | ||
854 | @lisp | |
855 | -13 | |
856 | @result{} -13 | |
857 | ||
858 | #d-13 | |
859 | @result{} -13 | |
860 | ||
861 | #x-13 | |
862 | @result{} -19 | |
863 | ||
864 | #b+1101 | |
865 | @result{} 13 | |
866 | ||
867 | #o377 | |
868 | @result{} 255 | |
869 | @end lisp | |
870 | ||
871 | The codes for indicating exactness (which can, incidentally, be applied | |
872 | to all numerical values) are: | |
873 | ||
874 | @table @code | |
875 | @item #e | |
876 | @itemx #E | |
877 | the number is exact | |
878 | ||
879 | @item #i | |
880 | @itemx #I | |
881 | the number is inexact | |
882 | @end table | |
883 | ||
884 | If the exactness indicator is omitted, the number is exact unless it | |
885 | contains a radix point. Since Guile cannot represent exact complex | |
886 | numbers, an error is signalled when asking for them. | |
887 | ||
888 | @lisp | |
889 | (exact? 1.2) | |
890 | @result{} #f | |
891 | ||
892 | (exact? #e1.2) | |
893 | @result{} #t | |
894 | ||
895 | (exact? #e+1i) | |
896 | ERROR: Wrong type argument | |
897 | @end lisp | |
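
The base and exactness codes can also be combined. The sketch below
assumes that Guile follows the R5RS prefix grammar, which allows the
two codes in either order:

@lisp
#e#x10    @result{} 16
#x#e10    @result{} 16
#i#b101   @result{} 5.0
#e1.5     @result{} 3/2
@end lisp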
898 | ||
899 | Guile also understands the syntax @samp{+inf.0} and @samp{-inf.0} for | |
900 | plus and minus infinity, respectively. These values must be written | |
901 | exactly as shown; that is, they must always have a sign and exactly | |
902 | one zero digit after the decimal point. Guile also understands | |
903 | @samp{+nan.0} and @samp{-nan.0} for the special `not-a-number' value. | |
904 | The sign is ignored for `not-a-number' and the value is always printed | |
905 | as @samp{+nan.0}. | |
906 | ||
907 | @node Integer Operations | |
908 | @subsubsection Operations on Integer Values | |
909 | @rnindex odd? | |
910 | @rnindex even? | |
911 | @rnindex quotient | |
912 | @rnindex remainder | |
913 | @rnindex modulo | |
914 | @rnindex gcd | |
915 | @rnindex lcm | |
916 | ||
917 | @deffn {Scheme Procedure} odd? n | |
918 | @deffnx {C Function} scm_odd_p (n) | |
919 | Return @code{#t} if @var{n} is an odd number, @code{#f} | |
920 | otherwise. | |
921 | @end deffn | |
922 | ||
923 | @deffn {Scheme Procedure} even? n | |
924 | @deffnx {C Function} scm_even_p (n) | |
925 | Return @code{#t} if @var{n} is an even number, @code{#f} | |
926 | otherwise. | |
927 | @end deffn | |
928 | ||
929 | @c begin (texi-doc-string "guile" "quotient") | |
930 | @c begin (texi-doc-string "guile" "remainder") | |
931 | @deffn {Scheme Procedure} quotient n d | |
932 | @deffnx {Scheme Procedure} remainder n d | |
933 | @deffnx {C Function} scm_quotient (n, d) | |
934 | @deffnx {C Function} scm_remainder (n, d) | |
935 | Return the quotient or remainder from @var{n} divided by @var{d}. The | |
936 | quotient is rounded towards zero, and the remainder will have the same | |
937 | sign as @var{n}. In all cases the quotient @var{q} and remainder @var{r} satisfy | |
938 | @math{@var{n} = @var{q}*@var{d} + @var{r}}. | |
939 | ||
940 | @lisp | |
941 | (remainder 13 4) @result{} 1 | |
942 | (remainder -13 4) @result{} -1 | |
943 | @end lisp | |
ff62c168 | 944 | |
8f9da340 | 945 | See also @code{truncate-quotient}, @code{truncate-remainder} and |
ff62c168 | 946 | related operations in @ref{Arithmetic}. |
07d83abe MV |
947 | @end deffn |
948 | ||
949 | @c begin (texi-doc-string "guile" "modulo") | |
950 | @deffn {Scheme Procedure} modulo n d | |
951 | @deffnx {C Function} scm_modulo (n, d) | |
952 | Return the remainder from @var{n} divided by @var{d}, with the same | |
953 | sign as @var{d}. | |
954 | ||
955 | @lisp | |
956 | (modulo 13 4) @result{} 1 | |
957 | (modulo -13 4) @result{} 3 | |
958 | (modulo 13 -4) @result{} -3 | |
959 | (modulo -13 -4) @result{} -1 | |
960 | @end lisp | |
ff62c168 | 961 | |
8f9da340 | 962 | See also @code{floor-quotient}, @code{floor-remainder} and |
ff62c168 | 963 | related operations in @ref{Arithmetic}. |
07d83abe MV |
964 | @end deffn |
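
To contrast the three operations on the same operands, here is a short
sketch; the last expression checks the identity
@math{@var{n} = @var{q}*@var{d} + @var{r}} for @code{quotient} and
@code{remainder}:

@lisp
(quotient -13 4)    @result{} -3
(remainder -13 4)   @result{} -1
(modulo -13 4)      @result{} 3

(+ (* (quotient -13 4) 4) (remainder -13 4))   @result{} -13
@end lisp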
965 | ||
966 | @c begin (texi-doc-string "guile" "gcd") | |
fd8a1df5 | 967 | @deffn {Scheme Procedure} gcd x@dots{} |
07d83abe MV |
968 | @deffnx {C Function} scm_gcd (x, y) |
969 | Return the greatest common divisor of all arguments. | |
970 | If called without arguments, 0 is returned. | |
971 | ||
972 | The C function @code{scm_gcd} always takes two arguments, while the | |
973 | Scheme function can take an arbitrary number. | |
974 | @end deffn | |
975 | ||
976 | @c begin (texi-doc-string "guile" "lcm") | |
fd8a1df5 | 977 | @deffn {Scheme Procedure} lcm x@dots{} |
07d83abe MV |
978 | @deffnx {C Function} scm_lcm (x, y) |
979 | Return the least common multiple of the arguments. | |
980 | If called without arguments, 1 is returned. | |
981 | ||
982 | The C function @code{scm_lcm} always takes two arguments, while the | |
983 | Scheme function can take an arbitrary number. | |
984 | @end deffn | |
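
A few illustrative calls to the Scheme procedures, including the
zero-argument cases described above:

@lisp
(gcd 12 18)      @result{} 6
(gcd 12 18 27)   @result{} 3
(gcd)            @result{} 0
(lcm 4 6)        @result{} 12
(lcm 4 6 10)     @result{} 60
(lcm)            @result{} 1
@end lisp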
985 | ||
cdf1ad3b MV |
986 | @deffn {Scheme Procedure} modulo-expt n k m |
987 | @deffnx {C Function} scm_modulo_expt (n, k, m) | |
988 | Return @var{n} raised to the integer exponent | |
989 | @var{k}, modulo @var{m}. | |
990 | ||
991 | @lisp | |
992 | (modulo-expt 2 3 5) | |
993 | @result{} 3 | |
994 | @end lisp | |
995 | @end deffn | |
07d83abe | 996 | |
882c8963 MW |
997 | @deftypefn {Scheme Procedure} {} exact-integer-sqrt @var{k} |
998 | @deftypefnx {C Function} void scm_exact_integer_sqrt (SCM @var{k}, SCM *@var{s}, SCM *@var{r}) | |
999 | Return two exact non-negative integers @var{s} and @var{r} | |
1000 | such that @math{@var{k} = @var{s}^2 + @var{r}} and | |
1001 | @math{@var{s}^2 <= @var{k} < (@var{s} + 1)^2}. | |
1002 | An error is raised if @var{k} is not an exact non-negative integer. | |
1003 | ||
1004 | @lisp | |
1005 | (exact-integer-sqrt 10) @result{} 3 and 1 | |
1006 | @end lisp | |
1007 | @end deftypefn | |
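
Because @code{exact-integer-sqrt} returns two values, a caller has to
receive both. One way to do so, sketched here with the standard
@code{call-with-values}, is:

@lisp
(call-with-values
    (lambda () (exact-integer-sqrt 10))
  (lambda (s r)
    ;; s is the integer square root, r the remainder
    (list s r)))
@result{} (3 1)
@end lisp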
1008 | ||
07d83abe MV |
1009 | @node Comparison |
1010 | @subsubsection Comparison Predicates | |
1011 | @rnindex zero? | |
1012 | @rnindex positive? | |
1013 | @rnindex negative? | |
1014 | ||
1015 | The C comparison functions below always take two arguments, while the | |
1016 | Scheme functions can take an arbitrary number. Also keep in mind that | |
1017 | the C functions return one of the Scheme boolean values | |
1018 | @code{SCM_BOOL_T} or @code{SCM_BOOL_F} which are both true as far as C | |
1019 | is concerned. Thus, always write @code{scm_is_true (scm_num_eq_p (x, | |
1020 | y))} when testing the two Scheme numbers @code{x} and @code{y} for | |
1021 | equality, for example. | |
1022 | ||
1023 | @c begin (texi-doc-string "guile" "=") | |
1024 | @deffn {Scheme Procedure} = | |
1025 | @deffnx {C Function} scm_num_eq_p (x, y) | |
1026 | Return @code{#t} if all parameters are numerically equal. | |
1027 | @end deffn | |
1028 | ||
1029 | @c begin (texi-doc-string "guile" "<") | |
1030 | @deffn {Scheme Procedure} < | |
1031 | @deffnx {C Function} scm_less_p (x, y) | |
1032 | Return @code{#t} if the list of parameters is monotonically | |
1033 | increasing. | |
1034 | @end deffn | |
1035 | ||
1036 | @c begin (texi-doc-string "guile" ">") | |
1037 | @deffn {Scheme Procedure} > | |
1038 | @deffnx {C Function} scm_gr_p (x, y) | |
1039 | Return @code{#t} if the list of parameters is monotonically | |
1040 | decreasing. | |
1041 | @end deffn | |
1042 | ||
1043 | @c begin (texi-doc-string "guile" "<=") | |
1044 | @deffn {Scheme Procedure} <= | |
1045 | @deffnx {C Function} scm_leq_p (x, y) | |
1046 | Return @code{#t} if the list of parameters is monotonically | |
1047 | non-decreasing. | |
1048 | @end deffn | |
1049 | ||
1050 | @c begin (texi-doc-string "guile" ">=") | |
1051 | @deffn {Scheme Procedure} >= | |
1052 | @deffnx {C Function} scm_geq_p (x, y) | |
1053 | Return @code{#t} if the list of parameters is monotonically | |
1054 | non-increasing. | |
1055 | @end deffn | |
1056 | ||
1057 | @c begin (texi-doc-string "guile" "zero?") | |
1058 | @deffn {Scheme Procedure} zero? z | |
1059 | @deffnx {C Function} scm_zero_p (z) | |
1060 | Return @code{#t} if @var{z} is an exact or inexact number equal to | |
1061 | zero. | |
1062 | @end deffn | |
1063 | ||
1064 | @c begin (texi-doc-string "guile" "positive?") | |
1065 | @deffn {Scheme Procedure} positive? x | |
1066 | @deffnx {C Function} scm_positive_p (x) | |
1067 | Return @code{#t} if @var{x} is an exact or inexact number greater than | |
1068 | zero. | |
1069 | @end deffn | |
1070 | ||
1071 | @c begin (texi-doc-string "guile" "negative?") | |
1072 | @deffn {Scheme Procedure} negative? x | |
1073 | @deffnx {C Function} scm_negative_p (x) | |
1074 | Return @code{#t} if @var{x} is an exact or inexact number less than | |
1075 | zero. | |
1076 | @end deffn | |
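
The following sketch shows the Scheme predicates applied to more than
two arguments and to a mix of exact and inexact values:

@lisp
(< 1 2 3)         @result{} #t
(< 1 3 2)         @result{} #f
(< 1 1 2)         @result{} #f
(<= 1 1 2)        @result{} #t
(= 1 1.0)         @result{} #t
(zero? 0.0)       @result{} #t
(zero? -0.0)      @result{} #t
(positive? 1/2)   @result{} #t
(negative? -3)    @result{} #t
@end lisp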
1077 | ||
1078 | ||
1079 | @node Conversion | |
1080 | @subsubsection Converting Numbers To and From Strings | |
1081 | @rnindex number->string | |
1082 | @rnindex string->number | |
1083 | ||
b89c4943 LC |
1084 | The following procedures read and write numbers according to their |
1085 | external representation as defined by R5RS (@pxref{Lexical structure, | |
1086 | R5RS Lexical Structure,, r5rs, The Revised^5 Report on the Algorithmic | |
a2f00b9b | 1087 | Language Scheme}). @xref{Number Input and Output, the @code{(ice-9 |
b89c4943 LC |
1088 | i18n)} module}, for locale-dependent number parsing. |
1089 | ||
07d83abe MV |
1090 | @deffn {Scheme Procedure} number->string n [radix] |
1091 | @deffnx {C Function} scm_number_to_string (n, radix) | |
1092 | Return a string holding the external representation of the | |
1093 | number @var{n} in the given @var{radix}. If @var{n} is | |
1094 | inexact, a radix of 10 will be used. | |
1095 | @end deffn | |
1096 | ||
1097 | @deffn {Scheme Procedure} string->number string [radix] | |
1098 | @deffnx {C Function} scm_string_to_number (string, radix) | |
1099 | Return a number of the maximally precise representation | |
1100 | expressed by the given @var{string}. @var{radix} must be an | |
1101 | exact integer, either 2, 8, 10, or 16. If supplied, @var{radix} | |
1102 | is a default radix that may be overridden by an explicit radix | |
679cceed | 1103 | prefix in @var{string} (e.g.@: "#o177"). If @var{radix} is not |
07d83abe MV |
1104 | supplied, then the default radix is 10. If @var{string} is not a |
1105 | syntactically valid notation for a number, then | |
1106 | @code{string->number} returns @code{#f}. | |
1107 | @end deffn | |
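
As an illustration of the radix handling described above, note in
particular that a radix prefix in the string takes precedence over the
@var{radix} argument:

@lisp
(number->string 255)            @result{} "255"
(number->string 255 16)         @result{} "ff"
(number->string 255 2)          @result{} "11111111"
(string->number "ff" 16)        @result{} 255
(string->number "#o177" 16)     @result{} 127
(string->number "1e3")          @result{} 1000.0
(string->number "12abc")        @result{} #f
@end lisp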
1108 | ||
1b09b607 KR |
1109 | @deftypefn {C Function} SCM scm_c_locale_stringn_to_number (const char *string, size_t len, unsigned radix) |
1110 | As per @code{string->number} above, but taking a C string, as pointer | |
1111 | and length. The string characters should be in the current locale | |
1112 | encoding (@code{locale} in the name refers only to that, there's no | |
1113 | locale-dependent parsing). | |
1114 | @end deftypefn | |
1115 | ||
07d83abe MV |
1116 | |
1117 | @node Complex | |
1118 | @subsubsection Complex Number Operations | |
1119 | @rnindex make-rectangular | |
1120 | @rnindex make-polar | |
1121 | @rnindex real-part | |
1122 | @rnindex imag-part | |
1123 | @rnindex magnitude | |
1124 | @rnindex angle | |
1125 | ||
3323ec06 NJ |
1126 | @deffn {Scheme Procedure} make-rectangular real_part imaginary_part |
1127 | @deffnx {C Function} scm_make_rectangular (real_part, imaginary_part) | |
1128 | Return the complex number whose real part is @var{real_part} and whose imaginary part is @var{imaginary_part}. | |
07d83abe MV |
1129 | @end deffn |
1130 | ||
c7218482 MW |
1131 | @deffn {Scheme Procedure} make-polar mag ang |
1132 | @deffnx {C Function} scm_make_polar (mag, ang) | |
34942993 | 1133 | @cindex polar form |
c7218482 | 1134 | Return the complex number @var{mag} * e^(i * @var{ang}). |
07d83abe MV |
1135 | @end deffn |
1136 | ||
1137 | @c begin (texi-doc-string "guile" "real-part") | |
1138 | @deffn {Scheme Procedure} real-part z | |
1139 | @deffnx {C Function} scm_real_part (z) | |
1140 | Return the real part of the number @var{z}. | |
1141 | @end deffn | |
1142 | ||
1143 | @c begin (texi-doc-string "guile" "imag-part") | |
1144 | @deffn {Scheme Procedure} imag-part z | |
1145 | @deffnx {C Function} scm_imag_part (z) | |
1146 | Return the imaginary part of the number @var{z}. | |
1147 | @end deffn | |
1148 | ||
1149 | @c begin (texi-doc-string "guile" "magnitude") | |
1150 | @deffn {Scheme Procedure} magnitude z | |
1151 | @deffnx {C Function} scm_magnitude (z) | |
1152 | Return the magnitude of the number @var{z}. This is the same as | |
1153 | @code{abs} for real arguments, but also allows complex numbers. | |
1154 | @end deffn | |
1155 | ||
1156 | @c begin (texi-doc-string "guile" "angle") | |
1157 | @deffn {Scheme Procedure} angle z | |
1158 | @deffnx {C Function} scm_angle (z) | |
1159 | Return the angle of the complex number @var{z}. | |
1160 | @end deffn | |
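
A small sketch of the accessors above; the variable @code{z} is only
an illustrative name, and the printed results assume that complex
numbers are stored as pairs of inexact reals, as described in
@ref{Complex Numbers}:

@lisp
(define z (make-rectangular 3 4))

z                @result{} 3.0+4.0i
(real-part z)    @result{} 3.0
(imag-part z)    @result{} 4.0
(magnitude z)    @result{} 5.0
(magnitude -7)   @result{} 7
@end lisp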
1161 | ||
5615f696 MV |
1162 | @deftypefn {C Function} SCM scm_c_make_rectangular (double re, double im) |
1163 | @deftypefnx {C Function} SCM scm_c_make_polar (double mag, double ang) | |
1164 | Like @code{scm_make_rectangular} or @code{scm_make_polar}, | |
1165 | respectively, but these functions take @code{double}s as their | |
1166 | arguments. | |
1167 | @end deftypefn | |
1168 | ||
1169 | @deftypefn {C Function} double scm_c_real_part (SCM z) | |
1170 | @deftypefnx {C Function} double scm_c_imag_part (SCM z) | |
1171 | Returns the real or imaginary part of @var{z} as a @code{double}. | |
1172 | @end deftypefn | |
1173 | ||
1174 | @deftypefn {C Function} double scm_c_magnitude (SCM z) | |
1175 | @deftypefnx {C Function} double scm_c_angle (SCM z) | |
1176 | Returns the magnitude or angle of @var{z} as a @code{double}. | |
1177 | @end deftypefn | |
1178 | ||
07d83abe MV |
1179 | |
1180 | @node Arithmetic | |
1181 | @subsubsection Arithmetic Functions | |
1182 | @rnindex max | |
1183 | @rnindex min | |
1184 | @rnindex + | |
1185 | @rnindex * | |
1186 | @rnindex - | |
1187 | @rnindex / | |
b1f57ea4 LC |
1188 | @findex 1+ |
1189 | @findex 1- | |
07d83abe MV |
1190 | @rnindex abs |
1191 | @rnindex floor | |
1192 | @rnindex ceiling | |
1193 | @rnindex truncate | |
1194 | @rnindex round | |
ff62c168 MW |
1195 | @rnindex euclidean/ |
1196 | @rnindex euclidean-quotient | |
1197 | @rnindex euclidean-remainder | |
8f9da340 MW |
1198 | @rnindex floor/ |
1199 | @rnindex floor-quotient | |
1200 | @rnindex floor-remainder | |
1201 | @rnindex ceiling/ | |
1202 | @rnindex ceiling-quotient | |
1203 | @rnindex ceiling-remainder | |
1204 | @rnindex truncate/ | |
1205 | @rnindex truncate-quotient | |
1206 | @rnindex truncate-remainder | |
ff62c168 MW |
1207 | @rnindex centered/ |
1208 | @rnindex centered-quotient | |
1209 | @rnindex centered-remainder | |
8f9da340 MW |
1210 | @rnindex round/ |
1211 | @rnindex round-quotient | |
1212 | @rnindex round-remainder | |
07d83abe MV |
1213 | |
1214 | The C arithmetic functions below always take two arguments, while the | |
1215 | Scheme functions can take an arbitrary number. When you need to | |
1216 | invoke them with just one argument, for example to compute the | |
ecb87335 | 1217 | equivalent of @code{(- x)}, pass @code{SCM_UNDEFINED} as the second |
07d83abe MV |
1218 | one: @code{scm_difference (x, SCM_UNDEFINED)}. |
1219 | ||
1220 | @c begin (texi-doc-string "guile" "+") | |
1221 | @deffn {Scheme Procedure} + z1 @dots{} | |
1222 | @deffnx {C Function} scm_sum (z1, z2) | |
1223 | Return the sum of all parameter values. Return 0 if called without any | |
1224 | parameters. | |
1225 | @end deffn | |
1226 | ||
1227 | @c begin (texi-doc-string "guile" "-") | |
1228 | @deffn {Scheme Procedure} - z1 z2 @dots{} | |
1229 | @deffnx {C Function} scm_difference (z1, z2) | |
1230 | If called with one argument @var{z1}, -@var{z1} is returned. Otherwise | |
1231 | the sum of all but the first argument is subtracted from the first | |
1232 | argument. | |
1233 | @end deffn | |
1234 | ||
1235 | @c begin (texi-doc-string "guile" "*") | |
1236 | @deffn {Scheme Procedure} * z1 @dots{} | |
1237 | @deffnx {C Function} scm_product (z1, z2) | |
1238 | Return the product of all arguments. If called without arguments, 1 is | |
1239 | returned. | |
1240 | @end deffn | |
1241 | ||
1242 | @c begin (texi-doc-string "guile" "/") | |
1243 | @deffn {Scheme Procedure} / z1 z2 @dots{} | |
1244 | @deffnx {C Function} scm_divide (z1, z2) | |
1245 | Divide the first argument by the product of the remaining arguments. If | |
1246 | called with one argument @var{z1}, 1/@var{z1} is returned. | |
1247 | @end deffn | |
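
The argument-count rules for the four basic operators can be
summarized with a few sample calls:

@lisp
(+)           @result{} 0
(*)           @result{} 1
(- 5)         @result{} -5
(- 10 1 2)    @result{} 7
(/ 2)         @result{} 1/2
(/ 12 2 3)    @result{} 2
@end lisp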
1248 | ||
b1f57ea4 LC |
1249 | @deffn {Scheme Procedure} 1+ z |
1250 | @deffnx {C Function} scm_oneplus (z) | |
1251 | Return @math{@var{z} + 1}. | |
1252 | @end deffn | |
1253 | ||
1254 | @deffn {Scheme Procedure} 1- z | |
1255 | @deffnx {C Function} scm_oneminus (z) | |
1256 | Return @math{@var{z} - 1}. | |
1257 | @end deffn | |
1258 | ||
07d83abe MV |
1259 | @c begin (texi-doc-string "guile" "abs") |
1260 | @deffn {Scheme Procedure} abs x | |
1261 | @deffnx {C Function} scm_abs (x) | |
1262 | Return the absolute value of @var{x}. | |
1263 | ||
1264 | @var{x} must be a number with zero imaginary part. To calculate the | |
1265 | magnitude of a complex number, use @code{magnitude} instead. | |
1266 | @end deffn | |
1267 | ||
1268 | @c begin (texi-doc-string "guile" "max") | |
1269 | @deffn {Scheme Procedure} max x1 x2 @dots{} | |
1270 | @deffnx {C Function} scm_max (x1, x2) | |
1271 | Return the maximum of all parameter values. | |
1272 | @end deffn | |
1273 | ||
1274 | @c begin (texi-doc-string "guile" "min") | |
1275 | @deffn {Scheme Procedure} min x1 x2 @dots{} | |
1276 | @deffnx {C Function} scm_min (x1, x2) | |
1277 | Return the minimum of all parameter values. | |
1278 | @end deffn | |
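
For example, following the R5RS convention that an inexact argument
makes the result of @code{max} or @code{min} inexact as well:

@lisp
(abs -7)       @result{} 7
(max 3 4)      @result{} 4
(max 3.9 4)    @result{} 4.0
(min 1 2 3)    @result{} 1
@end lisp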
1279 | ||
1280 | @c begin (texi-doc-string "guile" "truncate") | |
fd8a1df5 | 1281 | @deffn {Scheme Procedure} truncate x |
07d83abe MV |
1282 | @deffnx {C Function} scm_truncate_number (x) |
1283 | Round the inexact number @var{x} towards zero. | |
1284 | @end deffn | |
1285 | ||
1286 | @c begin (texi-doc-string "guile" "round") | |
1287 | @deffn {Scheme Procedure} round x | |
1288 | @deffnx {C Function} scm_round_number (x) | |
1289 | Round the inexact number @var{x} to the nearest integer. When exactly | |
1290 | halfway between two integers, round to the even one. | |
1291 | @end deffn | |
1292 | ||
1293 | @c begin (texi-doc-string "guile" "floor") | |
1294 | @deffn {Scheme Procedure} floor x | |
1295 | @deffnx {C Function} scm_floor (x) | |
1296 | Round the number @var{x} towards minus infinity. | |
1297 | @end deffn | |
1298 | ||
1299 | @c begin (texi-doc-string "guile" "ceiling") | |
1300 | @deffn {Scheme Procedure} ceiling x | |
1301 | @deffnx {C Function} scm_ceiling (x) | |
1302 | Round the number @var{x} towards positive infinity. | |
1303 | @end deffn | |
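
The four rounding procedures differ mainly on negative and halfway
values, as this short sketch shows:

@lisp
(truncate -2.7)   @result{} -2.0
(floor -2.7)      @result{} -3.0
(ceiling -2.7)    @result{} -2.0
(round 2.5)       @result{} 2.0
(round 3.5)       @result{} 4.0
(round 7/2)       @result{} 4
@end lisp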
1304 | ||
35da08ee MV |
1305 | @deftypefn {C Function} double scm_c_truncate (double x) |
1306 | @deftypefnx {C Function} double scm_c_round (double x) | |
1307 | Like @code{scm_truncate_number} or @code{scm_round_number}, | |
1308 | respectively, but these functions take and return @code{double} | |
1309 | values. | |
1310 | @end deftypefn | |
07d83abe | 1311 | |
5fbf680b MW |
1312 | @deftypefn {Scheme Procedure} {} euclidean/ @var{x} @var{y} |
1313 | @deftypefnx {Scheme Procedure} {} euclidean-quotient @var{x} @var{y} | |
1314 | @deftypefnx {Scheme Procedure} {} euclidean-remainder @var{x} @var{y} | |
1315 | @deftypefnx {C Function} void scm_euclidean_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1316 | @deftypefnx {C Function} SCM scm_euclidean_quotient (SCM @var{x}, SCM @var{y}) | |
1317 | @deftypefnx {C Function} SCM scm_euclidean_remainder (SCM @var{x}, SCM @var{y}) | |
ff62c168 MW |
1318 | These procedures accept two real numbers @var{x} and @var{y}, where the |
1319 | divisor @var{y} must be non-zero. @code{euclidean-quotient} returns the | |
1320 | integer @var{q} and @code{euclidean-remainder} returns the real number | |
1321 | @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and | |
5fbf680b | 1322 | @math{0 <= @var{r} < |@var{y}|}. @code{euclidean/} returns both @var{q} and |
ff62c168 MW |
1323 | @var{r}, and is more efficient than computing each separately. Note |
1324 | that when @math{@var{y} > 0}, @code{euclidean-quotient} returns | |
1325 | @math{floor(@var{x}/@var{y})}, otherwise it returns | |
1326 | @math{ceiling(@var{x}/@var{y})}. | |
1327 | ||
1328 | Note that @code{euclidean-quotient}, @code{euclidean-remainder} and @code{euclidean/} are equivalent to the R6RS operators | |
1329 | @code{div}, @code{mod}, and @code{div-and-mod}, respectively. | |
1330 | ||
1331 | @lisp | |
1332 | (euclidean-quotient 123 10) @result{} 12 | |
1333 | (euclidean-remainder 123 10) @result{} 3 | |
1334 | (euclidean/ 123 10) @result{} 12 and 3 | |
1335 | (euclidean/ 123 -10) @result{} -12 and 3 | |
1336 | (euclidean/ -123 10) @result{} -13 and 7 | |
1337 | (euclidean/ -123 -10) @result{} 13 and 7 | |
1338 | (euclidean/ -123.2 -63.5) @result{} 2.0 and 3.8 | |
1339 | (euclidean/ 16/3 -10/7) @result{} -3 and 22/21 | |
1340 | @end lisp | |
5fbf680b | 1341 | @end deftypefn |
ff62c168 | 1342 | |
8f9da340 MW |
1343 | @deftypefn {Scheme Procedure} {} floor/ @var{x} @var{y} |
1344 | @deftypefnx {Scheme Procedure} {} floor-quotient @var{x} @var{y} | |
1345 | @deftypefnx {Scheme Procedure} {} floor-remainder @var{x} @var{y} | |
1346 | @deftypefnx {C Function} void scm_floor_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1347 | @deftypefnx {C Function} SCM scm_floor_quotient (SCM @var{x}, SCM @var{y}) | |
1348 | @deftypefnx {C Function} SCM scm_floor_remainder (SCM @var{x}, SCM @var{y}) | |
1349 | These procedures accept two real numbers @var{x} and @var{y}, where the | |
1350 | divisor @var{y} must be non-zero. @code{floor-quotient} returns the | |
1351 | integer @var{q} and @code{floor-remainder} returns the real number | |
1352 | @var{r} such that @math{@var{q} = floor(@var{x}/@var{y})} and | |
1353 | @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{floor/} returns | |
1354 | both @var{q} and @var{r}, and is more efficient than computing each | |
1355 | separately. Note that @var{r}, if non-zero, will have the same sign | |
1356 | as @var{y}. | |
1357 | ||
ce606606 | 1358 | When @var{x} and @var{y} are integers, @code{floor-remainder} is |
8f9da340 MW |
1359 | equivalent to the R5RS integer-only operator @code{modulo}. |
1360 | ||
1361 | @lisp | |
1362 | (floor-quotient 123 10) @result{} 12 | |
1363 | (floor-remainder 123 10) @result{} 3 | |
1364 | (floor/ 123 10) @result{} 12 and 3 | |
1365 | (floor/ 123 -10) @result{} -13 and -7 | |
1366 | (floor/ -123 10) @result{} -13 and 7 | |
1367 | (floor/ -123 -10) @result{} 12 and -3 | |
1368 | (floor/ -123.2 -63.5) @result{} 1.0 and -59.7 | |
1369 | (floor/ 16/3 -10/7) @result{} -4 and -8/21 | |
1370 | @end lisp | |
1371 | @end deftypefn | |
1372 | ||
1373 | @deftypefn {Scheme Procedure} {} ceiling/ @var{x} @var{y} | |
1374 | @deftypefnx {Scheme Procedure} {} ceiling-quotient @var{x} @var{y} | |
1375 | @deftypefnx {Scheme Procedure} {} ceiling-remainder @var{x} @var{y} | |
1376 | @deftypefnx {C Function} void scm_ceiling_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1377 | @deftypefnx {C Function} SCM scm_ceiling_quotient (SCM @var{x}, SCM @var{y}) | |
1378 | @deftypefnx {C Function} SCM scm_ceiling_remainder (SCM @var{x}, SCM @var{y}) | |
1379 | These procedures accept two real numbers @var{x} and @var{y}, where the | |
1380 | divisor @var{y} must be non-zero. @code{ceiling-quotient} returns the | |
1381 | integer @var{q} and @code{ceiling-remainder} returns the real number | |
1382 | @var{r} such that @math{@var{q} = ceiling(@var{x}/@var{y})} and | |
1383 | @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{ceiling/} returns | |
1384 | both @var{q} and @var{r}, and is more efficient than computing each | |
1385 | separately. Note that @var{r}, if non-zero, will have the opposite sign | |
1386 | of @var{y}. | |
1387 | ||
1388 | @lisp | |
1389 | (ceiling-quotient 123 10) @result{} 13 | |
1390 | (ceiling-remainder 123 10) @result{} -7 | |
1391 | (ceiling/ 123 10) @result{} 13 and -7 | |
1392 | (ceiling/ 123 -10) @result{} -12 and 3 | |
1393 | (ceiling/ -123 10) @result{} -12 and -3 | |
1394 | (ceiling/ -123 -10) @result{} 13 and 7 | |
1395 | (ceiling/ -123.2 -63.5) @result{} 2.0 and 3.8 | |
1396 | (ceiling/ 16/3 -10/7) @result{} -3 and 22/21 | |
1397 | @end lisp | |
1398 | @end deftypefn | |
1399 | ||
1400 | @deftypefn {Scheme Procedure} {} truncate/ @var{x} @var{y} | |
1401 | @deftypefnx {Scheme Procedure} {} truncate-quotient @var{x} @var{y} | |
1402 | @deftypefnx {Scheme Procedure} {} truncate-remainder @var{x} @var{y} | |
1403 | @deftypefnx {C Function} void scm_truncate_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1404 | @deftypefnx {C Function} SCM scm_truncate_quotient (@var{x}, @var{y}) | |
1405 | @deftypefnx {C Function} SCM scm_truncate_remainder (@var{x}, @var{y}) | |
1406 | These procedures accept two real numbers @var{x} and @var{y}, where the | |
1407 | divisor @var{y} must be non-zero. @code{truncate-quotient} returns the | |
1408 | integer @var{q} and @code{truncate-remainder} returns the real number | |
1409 | @var{r} such that @var{q} is @math{@var{x}/@var{y}} rounded toward zero, | |
1410 | and @math{@var{x} = @var{q}*@var{y} + @var{r}}. @code{truncate/} returns | |
1411 | both @var{q} and @var{r}, and is more efficient than computing each | |
1412 | separately. Note that @var{r}, if non-zero, will have the same sign | |
1413 | as @var{x}. | |
1414 | ||
ce606606 | 1415 | When @var{x} and @var{y} are integers, these operators are |
a6b087be MW |
1416 | equivalent to the R5RS integer-only operators @code{quotient} and |
1417 | @code{remainder}. | |
8f9da340 MW |
1418 | |
1419 | @lisp | |
1420 | (truncate-quotient 123 10) @result{} 12 | |
1421 | (truncate-remainder 123 10) @result{} 3 | |
1422 | (truncate/ 123 10) @result{} 12 and 3 | |
1423 | (truncate/ 123 -10) @result{} -12 and 3 | |
1424 | (truncate/ -123 10) @result{} -12 and -3 | |
1425 | (truncate/ -123 -10) @result{} 12 and -3 | |
1426 | (truncate/ -123.2 -63.5) @result{} 1.0 and -59.7 | |
1427 | (truncate/ 16/3 -10/7) @result{} -3 and 22/21 | |
1428 | @end lisp | |
1429 | @end deftypefn | |
1430 | ||
5fbf680b MW |
1431 | @deftypefn {Scheme Procedure} {} centered/ @var{x} @var{y} |
1432 | @deftypefnx {Scheme Procedure} {} centered-quotient @var{x} @var{y} | |
1433 | @deftypefnx {Scheme Procedure} {} centered-remainder @var{x} @var{y} | |
1434 | @deftypefnx {C Function} void scm_centered_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1435 | @deftypefnx {C Function} SCM scm_centered_quotient (SCM @var{x}, SCM @var{y}) | |
1436 | @deftypefnx {C Function} SCM scm_centered_remainder (SCM @var{x}, SCM @var{y}) | |
ff62c168 MW |
1437 | These procedures accept two real numbers @var{x} and @var{y}, where the |
1438 | divisor @var{y} must be non-zero. @code{centered-quotient} returns the | |
1439 | integer @var{q} and @code{centered-remainder} returns the real number | |
1440 | @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and | |
5fbf680b | 1441 | @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}. @code{centered/} |
ff62c168 MW |
1442 | returns both @var{q} and @var{r}, and is more efficient than computing |
1443 | each separately. | |
1444 | ||
1445 | Note that @code{centered-quotient} returns @math{@var{x}/@var{y}} | |
1446 | rounded to the nearest integer. When @math{@var{x}/@var{y}} lies | |
1447 | exactly half-way between two integers, the tie is broken according to | |
1448 | the sign of @var{y}. If @math{@var{y} > 0}, ties are rounded toward | |
1449 | positive infinity, otherwise they are rounded toward negative infinity. | |
5fbf680b MW |
1450 | This is a consequence of the requirement that |
1451 | @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}. | |
ff62c168 MW |
1452 | |
1453 | Note that these operators are equivalent to the R6RS operators | |
1454 | @code{div0}, @code{mod0}, and @code{div0-and-mod0}. | |
1455 | ||
1456 | @lisp | |
1457 | (centered-quotient 123 10) @result{} 12 | |
1458 | (centered-remainder 123 10) @result{} 3 | |
1459 | (centered/ 123 10) @result{} 12 and 3 | |
1460 | (centered/ 123 -10) @result{} -12 and 3 | |
1461 | (centered/ -123 10) @result{} -12 and -3 | |
1462 | (centered/ -123 -10) @result{} 12 and -3 | |
8f9da340 MW |
1463 | (centered/ 125 10) @result{} 13 and -5 |
1464 | (centered/ 127 10) @result{} 13 and -3 | |
1465 | (centered/ 135 10) @result{} 14 and -5 | |
ff62c168 MW |
1466 | (centered/ -123.2 -63.5) @result{} 2.0 and 3.8 |
1467 | (centered/ 16/3 -10/7) @result{} -4 and -8/21 | |
1468 | @end lisp | |
5fbf680b | 1469 | @end deftypefn |
ff62c168 | 1470 | |
8f9da340 MW |
1471 | @deftypefn {Scheme Procedure} {} round/ @var{x} @var{y} |
1472 | @deftypefnx {Scheme Procedure} {} round-quotient @var{x} @var{y} | |
1473 | @deftypefnx {Scheme Procedure} {} round-remainder @var{x} @var{y} | |
1474 | @deftypefnx {C Function} void scm_round_divide (SCM @var{x}, SCM @var{y}, SCM *@var{q}, SCM *@var{r}) | |
1475 | @deftypefnx {C Function} SCM scm_round_quotient (@var{x}, @var{y}) | |
1476 | @deftypefnx {C Function} SCM scm_round_remainder (@var{x}, @var{y}) | |
1477 | These procedures accept two real numbers @var{x} and @var{y}, where the | |
1478 | divisor @var{y} must be non-zero. @code{round-quotient} returns the | |
1479 | integer @var{q} and @code{round-remainder} returns the real number | |
1480 | @var{r} such that @math{@var{x} = @var{q}*@var{y} + @var{r}} and | |
1481 | @var{q} is @math{@var{x}/@var{y}} rounded to the nearest integer, | |
1482 | with ties going to the nearest even integer. @code{round/} | |
1483 | returns both @var{q} and @var{r}, and is more efficient than computing | |
1484 | each separately. | |
1485 | ||
1486 | Note that @code{round/} and @code{centered/} are almost equivalent, but | |
1487 | their behavior differs when @math{@var{x}/@var{y}} lies exactly half-way | |
1488 | between two integers. In this case, @code{round/} chooses the nearest | |
1489 | even integer, whereas @code{centered/} chooses in such a way as to satisfy | |
1490 | the constraint @math{-|@var{y}/2| <= @var{r} < |@var{y}/2|}, which | |
1491 | is stronger than the corresponding constraint for @code{round/}, | |
1492 | @math{-|@var{y}/2| <= @var{r} <= |@var{y}/2|}. In particular, | |
1493 | when @var{x} and @var{y} are integers, the number of possible remainders | |
1494 | returned by @code{centered/} is @math{|@var{y}|}, whereas the number of | |
1495 | possible remainders returned by @code{round/} is @math{|@var{y}|+1} when | |
1496 | @var{y} is even. | |
1497 | ||
1498 | @lisp | |
1499 | (round-quotient 123 10) @result{} 12 | |
1500 | (round-remainder 123 10) @result{} 3 | |
1501 | (round/ 123 10) @result{} 12 and 3 | |
1502 | (round/ 123 -10) @result{} -12 and 3 | |
1503 | (round/ -123 10) @result{} -12 and -3 | |
1504 | (round/ -123 -10) @result{} 12 and -3 | |
1505 | (round/ 125 10) @result{} 12 and 5 | |
1506 | (round/ 127 10) @result{} 13 and -3 | |
1507 | (round/ 135 10) @result{} 14 and -5 | |
1508 | (round/ -123.2 -63.5) @result{} 2.0 and 3.8 | |
1509 | (round/ 16/3 -10/7) @result{} -4 and -8/21 | |
1510 | @end lisp | |
1511 | @end deftypefn | |
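
The five families of division operators above differ only in how the
quotient is rounded.  As an illustrative sketch (the helper name
@code{division-identity-holds?} is ours and is not part of Guile), the
following checks the shared identity
@math{@var{x} = @var{q}*@var{y} + @var{r}} and contrasts the operators
on one input whose quotient is an exact tie:

@lisp
;; Hypothetical helper.  DIV/ is any of floor/, ceiling/,
;; truncate/, centered/ or round/.
(define (division-identity-holds? div/ x y)
  (call-with-values (lambda () (div/ x y))
    (lambda (q r)
      (= x (+ (* q y) r)))))

(division-identity-holds? floor/ -7 2)   ; @result{} #t

(floor/ -7 2)      ; @result{} -4 and 1
(ceiling/ -7 2)    ; @result{} -3 and -1
(truncate/ -7 2)   ; @result{} -3 and -1
(centered/ -7 2)   ; @result{} -3 and -1
(round/ -7 2)      ; @result{} -4 and 1
@end lisp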
1512 | ||
07d83abe MV |
1513 | @node Scientific |
1514 | @subsubsection Scientific Functions | |
1515 | ||
1516 | The following procedures accept any kind of number as arguments, | |
1517 | including complex numbers. | |
1518 | ||
1519 | @rnindex sqrt | |
1520 | @c begin (texi-doc-string "guile" "sqrt") | |
1521 | @deffn {Scheme Procedure} sqrt z | |
40296bab | 1522 | Return the square root of @var{z}. Of the two possible roots |
ecb87335 | 1523 | (positive and negative), the one with a positive real part is |
40296bab KR |
1524 | returned, or if that's zero then a positive imaginary part. Thus, |
1525 | ||
1526 | @example | |
1527 | (sqrt 9.0) @result{} 3.0 | |
1528 | (sqrt -9.0) @result{} 0.0+3.0i | |
1529 | (sqrt 1.0+1.0i) @result{} 1.09868411346781+0.455089860562227i | |
1530 | (sqrt -1.0-1.0i) @result{} 0.455089860562227-1.09868411346781i | |
1531 | @end example | |
07d83abe MV |
1532 | @end deffn |
1533 | ||
1534 | @rnindex expt | |
1535 | @c begin (texi-doc-string "guile" "expt") | |
1536 | @deffn {Scheme Procedure} expt z1 z2 | |
1537 | Return @var{z1} raised to the power of @var{z2}. | |
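
With an exact base and an exact integer exponent the result is exact.
A few illustrative cases:

@lisp
(expt 2 10)     ; @result{} 1024
(expt 2 -2)     ; @result{} 1/4
(expt 1/2 3)    ; @result{} 1/8
(expt 2.0 3)    ; @result{} 8.0
@end lisp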
1538 | @end deffn | |
1539 | ||
1540 | @rnindex sin | |
1541 | @c begin (texi-doc-string "guile" "sin") | |
1542 | @deffn {Scheme Procedure} sin z | |
1543 | Return the sine of @var{z}. | |
1544 | @end deffn | |
1545 | ||
1546 | @rnindex cos | |
1547 | @c begin (texi-doc-string "guile" "cos") | |
1548 | @deffn {Scheme Procedure} cos z | |
1549 | Return the cosine of @var{z}. | |
1550 | @end deffn | |
1551 | ||
1552 | @rnindex tan | |
1553 | @c begin (texi-doc-string "guile" "tan") | |
1554 | @deffn {Scheme Procedure} tan z | |
1555 | Return the tangent of @var{z}. | |
1556 | @end deffn | |
1557 | ||
1558 | @rnindex asin | |
1559 | @c begin (texi-doc-string "guile" "asin") | |
1560 | @deffn {Scheme Procedure} asin z | |
1561 | Return the arcsine of @var{z}. | |
1562 | @end deffn | |
1563 | ||
1564 | @rnindex acos | |
1565 | @c begin (texi-doc-string "guile" "acos") | |
1566 | @deffn {Scheme Procedure} acos z | |
1567 | Return the arccosine of @var{z}. | |
1568 | @end deffn | |
1569 | ||
1570 | @rnindex atan | |
1571 | @c begin (texi-doc-string "guile" "atan") | |
1572 | @deffn {Scheme Procedure} atan z | |
1573 | @deffnx {Scheme Procedure} atan y x | |
1574 | Return the arctangent of @var{z}, or of @math{@var{y}/@var{x}}, using the signs of @var{y} and @var{x} to choose the quadrant. | |
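
The two-argument form is commonly used to recover an angle from
Cartesian coordinates.  A minimal sketch, assuming real arguments (the
names @code{pi} and @code{close?} are defined here only for the
illustration):

@lisp
(define pi (* 4 (atan 1)))

;; Compare inexact results with a small tolerance.
(define (close? a b)
  (< (abs (- a b)) 1e-12))

(close? (atan 1 1)   (* 1/4 pi))    ; @result{} #t, first quadrant
(close? (atan 1 -1)  (* 3/4 pi))    ; @result{} #t, second quadrant
(close? (atan -1 -1) (* -3/4 pi))   ; @result{} #t, third quadrant
(close? (atan -1 1)  (* -1/4 pi))   ; @result{} #t, fourth quadrant
@end lisp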
1575 | @end deffn | |
1576 | ||
1577 | @rnindex exp | |
1578 | @c begin (texi-doc-string "guile" "exp") | |
1579 | @deffn {Scheme Procedure} exp z | |
1580 | Return e to the power of @var{z}, where e is the base of natural | |
1581 | logarithms (2.71828@dots{}). | |
1582 | @end deffn | |
1583 | ||
1584 | @rnindex log | |
1585 | @c begin (texi-doc-string "guile" "log") | |
1586 | @deffn {Scheme Procedure} log z | |
1587 | Return the natural logarithm of @var{z}. | |
1588 | @end deffn | |
1589 | ||
1590 | @c begin (texi-doc-string "guile" "log10") | |
1591 | @deffn {Scheme Procedure} log10 z | |
1592 | Return the base 10 logarithm of @var{z}. | |
1593 | @end deffn | |
1594 | ||
1595 | @c begin (texi-doc-string "guile" "sinh") | |
1596 | @deffn {Scheme Procedure} sinh z | |
1597 | Return the hyperbolic sine of @var{z}. | |
1598 | @end deffn | |
1599 | ||
1600 | @c begin (texi-doc-string "guile" "cosh") | |
1601 | @deffn {Scheme Procedure} cosh z | |
1602 | Return the hyperbolic cosine of @var{z}. | |
1603 | @end deffn | |
1604 | ||
1605 | @c begin (texi-doc-string "guile" "tanh") | |
1606 | @deffn {Scheme Procedure} tanh z | |
1607 | Return the hyperbolic tangent of @var{z}. | |
1608 | @end deffn | |
1609 | ||
1610 | @c begin (texi-doc-string "guile" "asinh") | |
1611 | @deffn {Scheme Procedure} asinh z | |
1612 | Return the hyperbolic arcsine of @var{z}. | |
1613 | @end deffn | |
1614 | ||
1615 | @c begin (texi-doc-string "guile" "acosh") | |
1616 | @deffn {Scheme Procedure} acosh z | |
1617 | Return the hyperbolic arccosine of @var{z}. | |
1618 | @end deffn | |
1619 | ||
1620 | @c begin (texi-doc-string "guile" "atanh") | |
1621 | @deffn {Scheme Procedure} atanh z | |
1622 | Return the hyperbolic arctangent of @var{z}. | |
1623 | @end deffn | |
1624 | ||
1625 | ||
07d83abe MV |
1626 | @node Bitwise Operations |
1627 | @subsubsection Bitwise Operations | |
1628 | ||
1629 | For the following bitwise functions, negative numbers are treated as | |
1630 | infinite precision twos-complements. For instance @math{-6} is bits | |
1631 | @math{@dots{}111010}, with infinitely many ones on the left. It can | |
1632 | be seen that adding 6 (binary 110) to such a bit pattern gives all | |
1633 | zeros. | |
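
As a small illustration, masking off the low six bits with
@code{logand} (described below) makes the pattern for @math{-6}
visible:

@lisp
;; Keep only the low six bits of -6.
(number->string (logand -6 #b111111) 2)   ; @result{} "111010"

;; Adding 6 clears those bits.
(logand (+ -6 6) #b111111)                ; @result{} 0
@end lisp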
1634 | ||
1635 | @deffn {Scheme Procedure} logand n1 n2 @dots{} | |
1636 | @deffnx {C Function} scm_logand (n1, n2) | |
1637 | Return the bitwise @sc{and} of the integer arguments. | |
1638 | ||
1639 | @lisp | |
1640 | (logand) @result{} -1 | |
1641 | (logand 7) @result{} 7 | |
1642 | (logand #b111 #b011 #b001) @result{} 1 | |
1643 | @end lisp | |
1644 | @end deffn | |
1645 | ||
1646 | @deffn {Scheme Procedure} logior n1 n2 @dots{} | |
1647 | @deffnx {C Function} scm_logior (n1, n2) | |
1648 | Return the bitwise @sc{or} of the integer arguments. | |
1649 | ||
1650 | @lisp | |
1651 | (logior) @result{} 0 | |
1652 | (logior 7) @result{} 7 | |
1653 | (logior #b000 #b001 #b011) @result{} 3 | |
1654 | @end lisp | |
1655 | @end deffn | |
1656 | ||
1657 | @deffn {Scheme Procedure} logxor n1 n2 @dots{} | |
1658 | @deffnx {C Function} scm_logxor (n1, n2) | |
1659 | Return the bitwise @sc{xor} of the integer arguments. A bit is | |
1660 | set in the result if it is set in an odd number of arguments. | |
1661 | ||
1662 | @lisp | |
1663 | (logxor) @result{} 0 | |
1664 | (logxor 7) @result{} 7 | |
1665 | (logxor #b000 #b001 #b011) @result{} 2 | |
1666 | (logxor #b000 #b001 #b011 #b011) @result{} 1 | |
1667 | @end lisp | |
1668 | @end deffn | |
1669 | ||
1670 | @deffn {Scheme Procedure} lognot n | |
1671 | @deffnx {C Function} scm_lognot (n) | |
1672 | Return the integer which is the ones-complement of the integer | |
1673 | argument, i.e.@: each 0 bit is changed to 1 and each 1 bit to 0. | |
1674 | ||
1675 | @lisp | |
1676 | (number->string (lognot #b10000000) 2) | |
1677 | @result{} "-10000001" | |
1678 | (number->string (lognot #b0) 2) | |
1679 | @result{} "-1" | |
1680 | @end lisp | |
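
Under the twos-complement view, @code{(lognot @var{n})} equals
@math{-@var{n} - 1}, and applying it twice gives back @var{n}.  For
example:

@lisp
(lognot 5)                         ; @result{} -6
(lognot -6)                        ; @result{} 5
(= (lognot 12345) (- -1 12345))    ; @result{} #t
@end lisp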
1681 | @end deffn | |
1682 | ||
1683 | @deffn {Scheme Procedure} logtest j k | |
1684 | @deffnx {C Function} scm_logtest (j, k) | |
a46648ac KR |
1685 | Test whether @var{j} and @var{k} have any 1 bits in common. This is |
1686 | equivalent to @code{(not (zero? (logand j k)))}, but without actually | |
1687 | calculating the @code{logand}, just testing for non-zero. | |
07d83abe | 1688 | |
a46648ac | 1689 | @lisp |
07d83abe MV |
1690 | (logtest #b0100 #b1011) @result{} #f |
1691 | (logtest #b0100 #b0111) @result{} #t | |
1692 | @end lisp | |
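
A typical use is testing flag bits.  The flag names below are
hypothetical and serve only as an illustration:

@lisp
;; Hypothetical permission flags, one bit each.
(define flag-read    #b001)
(define flag-write   #b010)
(define flag-execute #b100)

(define permissions (logior flag-read flag-execute))

;; Is the write bit set?
(logtest permissions flag-write)               ; @result{} #f

;; Is at least one of write or execute set?
(logtest permissions
         (logior flag-write flag-execute))     ; @result{} #t
@end lisp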
1693 | @end deffn | |
1694 | ||
1695 | @deffn {Scheme Procedure} logbit? index j | |
1696 | @deffnx {C Function} scm_logbit_p (index, j) | |
a46648ac KR |
1697 | Test whether bit number @var{index} in @var{j} is set. @var{index} |
1698 | starts from 0 for the least significant bit. | |
07d83abe | 1699 | |
a46648ac | 1700 | @lisp |
07d83abe MV |
1701 | (logbit? 0 #b1101) @result{} #t |
1702 | (logbit? 1 #b1101) @result{} #f | |
1703 | (logbit? 2 #b1101) @result{} #t | |
1704 | (logbit? 3 #b1101) @result{} #t | |
1705 | (logbit? 4 #b1101) @result{} #f | |
1706 | @end lisp | |
1707 | @end deffn | |
1708 | ||
e08a12b5 MW |
1709 | @deffn {Scheme Procedure} ash n count |
1710 | @deffnx {C Function} scm_ash (n, count) | |
912f5f34 | 1711 | Return @math{floor(n * 2^count)}. |
e08a12b5 | 1712 | @var{n} and @var{count} must be exact integers. |
07d83abe | 1713 | |
e08a12b5 MW |
1714 | With @var{n} viewed as an infinite-precision twos-complement |
1715 | integer, @code{ash} means a left shift introducing zero bits | |
1716 | when @var{count} is positive, or a right shift dropping bits | |
1717 | when @var{count} is negative. This is an ``arithmetic'' shift. | |
07d83abe MV |
1718 | |
1719 | @lisp | |
1720 | (number->string (ash #b1 3) 2) @result{} "1000" | |
1721 | (number->string (ash #b1010 -1) 2) @result{} "101" | |
1722 | ||
1723 | ;; -23 is bits ...11101001, -6 is bits ...111010 | |
1724 | (ash -23 -2) @result{} -6 | |
1725 | @end lisp | |
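
The definition @math{floor(n * 2^count)} can be spelled out with
ordinary arithmetic.  The procedure below is a hypothetical reference
sketch for comparison, not how Guile implements @code{ash}:

@lisp
;; Hypothetical helper: multiply for a left shift, floor-divide
;; for a right shift.
(define (ash-by-arithmetic n count)
  (if (>= count 0)
      (* n (expt 2 count))
      (floor-quotient n (expt 2 (- count)))))

(ash-by-arithmetic 1 3)                        ; @result{} 8
(ash-by-arithmetic -23 -2)                     ; @result{} -6
(= (ash -23 -2) (ash-by-arithmetic -23 -2))    ; @result{} #t
@end lisp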
1726 | @end deffn | |
1727 | ||
e08a12b5 MW |
1728 | @deffn {Scheme Procedure} round-ash n count |
1729 | @deffnx {C Function} scm_round_ash (n, count) | |
912f5f34 | 1730 | Return @math{round(n * 2^count)}. |
e08a12b5 MW |
1731 | @var{n} and @var{count} must be exact integers. |
1732 | ||
1733 | With @var{n} viewed as an infinite-precision twos-complement | |
1734 | integer, @code{round-ash} means a left shift introducing zero | |
1735 | bits when @var{count} is positive, or a right shift rounding | |
1736 | to the nearest integer (with ties going to the nearest even | |
1737 | integer) when @var{count} is negative. This is a rounded | |
1738 | ``arithmetic'' shift. | |
1739 | ||
1740 | @lisp | |
1741 | (number->string (round-ash #b1 3) 2) @result{} "1000" | |
1742 | (number->string (round-ash #b1010 -1) 2) @result{} "101" | |
1743 | (number->string (round-ash #b1010 -2) 2) @result{} "10" | |
1744 | (number->string (round-ash #b1011 -2) 2) @result{} "11" | |
1745 | (number->string (round-ash #b1101 -2) 2) @result{} "11" | |
1746 | (number->string (round-ash #b1110 -2) 2) @result{} "100" | |
1747 | @end lisp | |
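
The difference from @code{ash} shows up only on right shifts that
drop non-zero bits.  A brief side-by-side illustration:

@lisp
(ash 7 -1)          ; @result{} 3, the floor of 3.5
(round-ash 7 -1)    ; @result{} 4, 3.5 rounds to the even 4
(ash 5 -1)          ; @result{} 2, the floor of 2.5
(round-ash 5 -1)    ; @result{} 2, 2.5 rounds to the even 2
(ash -5 -1)         ; @result{} -3, the floor of -2.5
(round-ash -5 -1)   ; @result{} -2, -2.5 rounds to the even -2
@end lisp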
1748 | @end deffn | |
1749 | ||
07d83abe MV |
1750 | @deffn {Scheme Procedure} logcount n |
1751 | @deffnx {C Function} scm_logcount (n) | |
a46648ac | 1752 | Return the number of bits in integer @var{n}. If @var{n} is |
07d83abe MV |
1753 | positive, the 1-bits in its binary representation are counted. |
1754 | If negative, the 0-bits in its two's-complement binary | |
a46648ac | 1755 | representation are counted. If zero, 0 is returned. |
07d83abe MV |
1756 | |
1757 | @lisp | |
1758 | (logcount #b10101010) | |
1759 | @result{} 4 | |
1760 | (logcount 0) | |
1761 | @result{} 0 | |
1762 | (logcount -2) | |
1763 | @result{} 1 | |
1764 | @end lisp | |
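
For negative @var{n}, counting the 0-bits gives the same answer as
counting the 1-bits of @code{(lognot @var{n})}.  For instance:

@lisp
(lognot -2)               ; @result{} 1
(logcount (lognot -2))    ; @result{} 1, same as (logcount -2)

;; -256 is bits ...1100000000, with eight 0-bits.
(logcount -256)           ; @result{} 8
(logcount (lognot -256))  ; @result{} 8, since (lognot -256) is 255
@end lisp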
1765 | @end deffn | |
1766 | ||
1767 | @deffn {Scheme Procedure} integer-length n | |
1768 | @deffnx {C Function} scm_integer_length (n) | |
1769 | Return the number of bits necessary to represent @var{n}. | |
1770 | ||
1771 | For positive @var{n} this is how many bits to the most significant one | |
1772 | bit. For negative @var{n} it's how many bits to the most significant | |
1773 | zero bit in twos complement form. | |
1774 | ||
1775 | @lisp | |
1776 | (integer-length #b10101010) @result{} 8 | |
1777 | (integer-length #b1111) @result{} 4 | |
1778 | (integer-length 0) @result{} 0 | |
1779 | (integer-length -1) @result{} 0 | |
1780 | (integer-length -256) @result{} 8 | |
1781 | (integer-length -257) @result{} 9 | |
1782 | @end lisp | |
1783 | @end deffn | |
1784 | ||
1785 | @deffn {Scheme Procedure} integer-expt n k | |
1786 | @deffnx {C Function} scm_integer_expt (n, k) | |
a46648ac KR |
1787 | Return @var{n} raised to the power @var{k}. @var{k} must be an exact |
1788 | integer, @var{n} can be any number. | |
1789 | ||
1790 | Negative @var{k} is supported, and results in @m{1/n^|k|, 1/n^abs(k)} | |
1791 | in the usual way. @math{@var{n}^0} is 1, as usual, and that includes | |
1792 | @math{0^0}. | |
07d83abe MV |
1793 | |
1794 | @lisp | |
a46648ac KR |
1795 | (integer-expt 2 5) @result{} 32 |
1796 | (integer-expt -3 3) @result{} -27 | |
1797 | (integer-expt 5 -3) @result{} 1/125 | |
1798 | (integer-expt 0 0) @result{} 1 | |
07d83abe MV |
1799 | @end lisp |
1800 | @end deffn | |
1801 | ||
1802 | @deffn {Scheme Procedure} bit-extract n start end | |
1803 | @deffnx {C Function} scm_bit_extract (n, start, end) | |
1804 | Return the integer composed of the @var{start} (inclusive) | |
1805 | through @var{end} (exclusive) bits of @var{n}. The | |
1806 | @var{start}th bit becomes the 0-th bit in the result. | |
1807 | ||
1808 | @lisp | |
1809 | (number->string (bit-extract #b1101101010 0 4) 2) | |
1810 | @result{} "1010" | |
1811 | (number->string (bit-extract #b1101101010 4 9) 2) | |
1812 | @result{} "10110" | |
1813 | @end lisp | |
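
The same field can be obtained by shifting and masking.  The procedure
below is a hypothetical sketch for comparison, assuming
@math{0 <= @var{start} <= @var{end}}:

@lisp
;; Hypothetical helper: shift the field down to bit 0, then keep
;; only (- end start) low bits.
(define (bit-extract-by-shift n start end)
  (logand (ash n (- start))
          (- (ash 1 (- end start)) 1)))

(number->string (bit-extract-by-shift #b1101101010 0 4) 2)
     ; @result{} "1010"
(number->string (bit-extract-by-shift #b1101101010 4 9) 2)
     ; @result{} "10110"
@end lisp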
1814 | @end deffn | |
1815 | ||
1816 | ||
1817 | @node Random | |
1818 | @subsubsection Random Number Generation | |
1819 | ||
1820 | Pseudo-random numbers are generated from a random state object, which | |
77b13912 | 1821 | can be created with @code{seed->random-state} or |
679cceed | 1822 | @code{datum->random-state}. An external representation (i.e.@: one |
77b13912 AR |
1823 | which can be written with @code{write} and read with @code{read}) of a | |
1824 | random state object can be obtained via | |
1d454874 | 1825 | @code{random-state->datum}. The @var{state} parameter to the |
77b13912 AR |
1826 | various functions below is optional; it defaults to the state object | |
1827 | in the @code{*random-state*} variable. | |
07d83abe MV |
1828 | |
1829 | @deffn {Scheme Procedure} copy-random-state [state] | |
1830 | @deffnx {C Function} scm_copy_random_state (state) | |
1831 | Return a copy of the random state @var{state}. | |
1832 | @end deffn | |
1833 | ||
1834 | @deffn {Scheme Procedure} random n [state] | |
1835 | @deffnx {C Function} scm_random (n, state) | |
1836 | Return a number in [0, @var{n}). | |
1837 | ||
1838 | Accepts a positive integer or real @var{n} and returns a | |
1839 | number of the same type between zero (inclusive) and | |
1840 | @var{n} (exclusive). The values returned have a uniform | |
1841 | distribution. | |
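
A short sketch of typical calls.  The procedure name @code{roll-die}
and the seed 42 are arbitrary choices made for the illustration:

@lisp
;; Roll a six-sided die: an exact integer from 1 to 6.
(define (roll-die)
  (+ 1 (random 6)))

;; An inexact real in [0, 2.5).
(random 2.5)

;; Draw from an explicit state instead of *random-state*.
(random 100 (seed->random-state 42))
@end lisp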
1842 | @end deffn | |
1843 | ||
1844 | @deffn {Scheme Procedure} random:exp [state] | |
1845 | @deffnx {C Function} scm_random_exp (state) | |
1846 | Return an inexact real in an exponential distribution with mean | |
1847 | 1. For an exponential distribution with mean @var{u} use @code{(* | |
1848 | @var{u} (random:exp))}. | |
1849 | @end deffn | |
1850 | ||
1851 | @deffn {Scheme Procedure} random:hollow-sphere! vect [state] | |
1852 | @deffnx {C Function} scm_random_hollow_sphere_x (vect, state) | |
1853 | Fills @var{vect} with inexact real random numbers the sum of whose | |
1854 | squares is equal to 1.0. Thinking of @var{vect} as coordinates in | |
1855 | space of dimension @var{n} @math{=} @code{(vector-length @var{vect})}, | |
1856 | the coordinates are uniformly distributed over the surface of the unit | |
1857 | @var{n}-sphere. | |
1858 | @end deffn | |
1859 | ||
1860 | @deffn {Scheme Procedure} random:normal [state] | |
1861 | @deffnx {C Function} scm_random_normal (state) | |
1862 | Return an inexact real in a normal distribution. The distribution | |
1863 | used has mean 0 and standard deviation 1. For a normal distribution | |
1864 | with mean @var{m} and standard deviation @var{d} use @code{(+ @var{m} | |
1865 | (* @var{d} (random:normal)))}. | |
1866 | @end deffn | |
1867 | ||
1868 | @deffn {Scheme Procedure} random:normal-vector! vect [state] | |
1869 | @deffnx {C Function} scm_random_normal_vector_x (vect, state) | |
1870 | Fills @var{vect} with inexact real random numbers that are | |
1871 | independent and standard normally distributed | |
1872 | (i.e., with mean 0 and variance 1). | |
1873 | @end deffn | |
1874 | ||
1875 | @deffn {Scheme Procedure} random:solid-sphere! vect [state] | |
1876 | @deffnx {C Function} scm_random_solid_sphere_x (vect, state) | |
1877 | Fills @var{vect} with inexact real random numbers the sum of whose | |
1878 | squares is less than 1.0. Thinking of @var{vect} as coordinates in | |
1879 | space of dimension @var{n} @math{=} @code{(vector-length @var{vect})}, | |
1880 | the coordinates are uniformly distributed within the unit | |
4497bd2f | 1881 | @var{n}-sphere. |
07d83abe MV |
1882 | @c FIXME: What does this mean, particularly the n-sphere part? |
1883 | @end deffn | |
1884 | ||
1885 | @deffn {Scheme Procedure} random:uniform [state] | |
1886 | @deffnx {C Function} scm_random_uniform (state) | |
1887 | Return a uniformly distributed inexact real random number in | |
1888 | [0,1). | |
1889 | @end deffn | |
1890 | ||
1891 | @deffn {Scheme Procedure} seed->random-state seed | |
1892 | @deffnx {C Function} scm_seed_to_random_state (seed) | |
1893 | Return a new random state using @var{seed}. | |
1894 | @end deffn | |
1895 | ||
1d454874 AW |
1896 | @deffn {Scheme Procedure} datum->random-state datum |
1897 | @deffnx {C Function} scm_datum_to_random_state (datum) | |
1898 | Return a new random state from @var{datum}, which should have been | |
1899 | obtained by @code{random-state->datum}. | |
77b13912 AR |
1900 | @end deffn |
1901 | ||
1d454874 AW |
1902 | @deffn {Scheme Procedure} random-state->datum state |
1903 | @deffnx {C Function} scm_random_state_to_datum (state) | |
1904 | Return a datum representation of @var{state} that may be written out and | |
1905 | read back with the Scheme reader. | |
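
As an illustrative sketch, a state can be saved as a datum and later
rebuilt with @code{datum->random-state} to replay the same sequence.
The seed 42 and the variable names are arbitrary:

@lisp
(define state (seed->random-state 42))
(define saved (random-state->datum state))

;; Draw five numbers, advancing STATE.
(define first-run
  (map (lambda (i) (random 100 state)) (iota 5)))

;; Rebuild a state from the saved datum and draw again.
(define restored (datum->random-state saved))
(define second-run
  (map (lambda (i) (random 100 restored)) (iota 5)))

(equal? first-run second-run)   ; @result{} #t
@end lisp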
77b13912 AR |
1906 | @end deffn |
1907 | ||
d47db067 MW |
1908 | @deffn {Scheme Procedure} random-state-from-platform |
1909 | @deffnx {C Function} scm_random_state_from_platform () | |
1910 | Construct a new random state seeded from a platform-specific source of | |
1911 | entropy, appropriate for use in non-security-critical applications. | |
1912 | Currently @file{/dev/urandom} is tried first, or else the seed is based | |
1913 | on the time, date, process ID, an address from a freshly allocated heap | |
1914 | cell, an address from the local stack frame, and a high-resolution timer | |
1915 | if available. | |
1916 | @end deffn | |
1917 | ||
07d83abe MV |
1918 | @defvar *random-state* |
1919 | The global random state used by the above functions when the | |
1920 | @var{state} parameter is not given. | |
1921 | @end defvar | |
1922 | ||
8c726cf0 NJ |
1923 | Note that the initial value of @code{*random-state*} is the same every |
1924 | time Guile starts up. Therefore, if you don't pass a @var{state} | |
1925 | parameter to the above procedures, and you don't set | |
1926 | @code{*random-state*} to @code{(seed->random-state your-seed)}, where | |
1927 | @code{your-seed} is something that @emph{isn't} the same every time, | |
1928 | you'll get the same sequence of ``random'' numbers on every run. | |
1929 | ||
1930 | For example, unless the relevant source code has changed, @code{(map | |
1931 | random (cdr (iota 19)))}, if the first use of random numbers since | |
1932 | Guile started up, will always give: | |
1933 | ||
1934 | @lisp | |
1935 | (map random (cdr (iota 19))) | |
1936 | @result{} | |
1937 | (0 1 1 2 2 2 1 2 6 7 10 0 5 3 12 5 5 12) | |
1938 | @end lisp | |
1939 | ||
d47db067 MW |
1940 | To seed the random state in a sensible way for non-security-critical |
1941 | applications, do this during initialization of your program: | |
8c726cf0 NJ |
1942 | |
1943 | @lisp | |
d47db067 | 1944 | (set! *random-state* (random-state-from-platform)) |
8c726cf0 NJ |
1945 | @end lisp |
1946 | ||
07d83abe MV |
1947 | |
1948 | @node Characters | |
1949 | @subsection Characters | |
1950 | @tpindex Characters | |
1951 | ||
3f12aedb MG |
1952 | In Scheme, there is a data type to describe a single character. |
1953 | ||
1954 | Defining what exactly a character @emph{is} can be more complicated | |
bb15a36c MG |
1955 | than it seems. Guile follows the advice of R6RS and uses The Unicode |
1956 | Standard to help define what a character is. So, for Guile, a | |
1957 | character is anything in the Unicode Character Database. | |
1958 | ||
1959 | @cindex code point | |
1960 | @cindex Unicode code point | |
1961 | ||
1962 | The Unicode Character Database is basically a table of characters | |
1963 | indexed using integers called 'code points'. Valid code points are in | |
1964 | the ranges 0 to @code{#xD7FF} inclusive or @code{#xE000} to | |
1965 | @code{#x10FFFF} inclusive, which is about 1.1 million code points. | |
1966 | ||
1967 | @cindex designated code point | |
1968 | @cindex code point, designated | |
1969 | ||
1970 | Any code point that has been assigned to a character or that has | |
1971 | otherwise been given a meaning by Unicode is called a 'designated code | |
1972 | point'. Most of the designated code points, about 200,000 of them, | |
1973 | indicate characters, accents or other combining marks that modify | |
1974 | other characters, symbols, whitespace, and control characters. Some | |
1975 | are not characters but indicators that suggest how to format or | |
1976 | display neighboring characters. | |
1977 | ||
1978 | @cindex reserved code point | |
1979 | @cindex code point, reserved | |
1980 | ||
1981 | If a code point is not a designated code point -- if it has not been | |
1982 | assigned to a character by The Unicode Standard -- it is a 'reserved | |
1983 | code point', meaning that it is reserved for future use. Most of | |
1984 | the code points, about 800,000, are 'reserved code points'. | |
1985 | ||
1986 | By convention, a Unicode code point is written as | |
1987 | ``U+XXXX'' where ``XXXX'' is a hexadecimal number. Please note that | |
1988 | this convenient notation is not valid code. Guile does not interpret | |
1989 | ``U+XXXX'' as a character. | |
3f12aedb | 1990 | |
050ab45f MV |
1991 | In Scheme, a character literal is written as @code{#\@var{name}} where |
1992 | @var{name} is the name of the character that you want. Printable | |
1993 | characters have their usual single character name; for example, | |
bb15a36c MG |
1994 | @code{#\a} is a lower case @code{a}. |
1995 | ||
1996 | Some of the code points are 'combining characters' that are not meant | |
1997 | to be printed by themselves but are instead meant to modify the | |
1998 | appearance of the previous character. For combining characters, an | |
1999 | alternate form of the character literal is @code{#\} followed by | |
2000 | U+25CC (a small, dotted circle), followed by the combining character. | |
2001 | This allows the combining character to be drawn on the circle, not on | |
2002 | the backslash of @code{#\}. | |
2003 | ||
2004 | Many of the non-printing characters, such as whitespace characters and | |
2005 | control characters, also have names. | |
07d83abe | 2006 | |
15b6a6b2 MG |
2007 | The most commonly used non-printing characters have long character |
2008 | names, described in the table below. | |
2009 | ||
2010 | @multitable {@code{#\backspace}} {Preferred} | |
2011 | @item Character Name @tab Codepoint | |
2012 | @item @code{#\nul} @tab U+0000 | |
2013 | @item @code{#\alarm} @tab U+0007 | |
2014 | @item @code{#\backspace} @tab U+0008 | |
2015 | @item @code{#\tab} @tab U+0009 | |
2016 | @item @code{#\linefeed} @tab U+000A | |
2017 | @item @code{#\newline} @tab U+000A | |
2018 | @item @code{#\vtab} @tab U+000B | |
2019 | @item @code{#\page} @tab U+000C | |
2020 | @item @code{#\return} @tab U+000D | |
2021 | @item @code{#\esc} @tab U+001B | |
2022 | @item @code{#\space} @tab U+0020 | |
2023 | @item @code{#\delete} @tab U+007F | |
2024 | @end multitable | |
2025 | ||
2026 | There are also short names for all of the ``C0 control characters'' | |
2027 | (those with code points below 32). The following table lists the short | |
2028 | name for each character, and for the space character (code point 32). | |
07d83abe MV |
2029 | |
2030 | @multitable @columnfractions .25 .25 .25 .25 | |
2031 | @item 0 = @code{#\nul} | |
2032 | @tab 1 = @code{#\soh} | |
2033 | @tab 2 = @code{#\stx} | |
2034 | @tab 3 = @code{#\etx} | |
2035 | @item 4 = @code{#\eot} | |
2036 | @tab 5 = @code{#\enq} | |
2037 | @tab 6 = @code{#\ack} | |
2038 | @tab 7 = @code{#\bel} | |
2039 | @item 8 = @code{#\bs} | |
2040 | @tab 9 = @code{#\ht} | |
6ea30487 | 2041 | @tab 10 = @code{#\lf} |
07d83abe | 2042 | @tab 11 = @code{#\vt} |
3f12aedb | 2043 | @item 12 = @code{#\ff} |
07d83abe MV |
2044 | @tab 13 = @code{#\cr} |
2045 | @tab 14 = @code{#\so} | |
2046 | @tab 15 = @code{#\si} | |
2047 | @item 16 = @code{#\dle} | |
2048 | @tab 17 = @code{#\dc1} | |
2049 | @tab 18 = @code{#\dc2} | |
2050 | @tab 19 = @code{#\dc3} | |
2051 | @item 20 = @code{#\dc4} | |
2052 | @tab 21 = @code{#\nak} | |
2053 | @tab 22 = @code{#\syn} | |
2054 | @tab 23 = @code{#\etb} | |
2055 | @item 24 = @code{#\can} | |
2056 | @tab 25 = @code{#\em} | |
2057 | @tab 26 = @code{#\sub} | |
2058 | @tab 27 = @code{#\esc} | |
2059 | @item 28 = @code{#\fs} | |
2060 | @tab 29 = @code{#\gs} | |
2061 | @tab 30 = @code{#\rs} | |
2062 | @tab 31 = @code{#\us} | |
2063 | @item 32 = @code{#\sp} | |
2064 | @end multitable | |
2065 | ||
15b6a6b2 MG |
2066 | The short name for the ``delete'' character (code point U+007F) is |
2067 | @code{#\del}. | |
07d83abe | 2068 | |
394449d5 MW |
2069 | The R7RS name for the ``escape'' character (code point U+001B) is |
2070 | @code{#\escape}. | |
2071 | ||
15b6a6b2 MG |
2072 | There are also a few alternative names left over for compatibility with |
2073 | previous versions of Guile. | |
07d83abe | 2074 | |
3f12aedb MG |
2075 | @multitable {@code{#\backspace}} {Preferred} |
2076 | @item Alternate @tab Standard | |
3f12aedb | 2077 | @item @code{#\nl} @tab @code{#\newline} |
15b6a6b2 | 2078 | @item @code{#\np} @tab @code{#\page} |
07d83abe MV |
2079 | @item @code{#\null} @tab @code{#\nul} |
2080 | @end multitable | |
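
The different names for one code point all read as the same character.
A few illustrative checks using names from the tables above:

@lisp
(char->integer #\newline)        ; @result{} 10
(char=? #\linefeed #\newline)    ; @result{} #t
(char=? #\nl #\newline)          ; @result{} #t
(char=? #\null #\nul)            ; @result{} #t
(char->integer #\esc)            ; @result{} 27
(char->integer #\del)            ; @result{} 127
@end lisp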
2081 | ||
bb15a36c MG |
2082 | Characters may also be written using their code point values. They can |
2083 | be written as an octal number, such as @code{#\10} for | |
2084 | @code{#\bs} or @code{#\177} for @code{#\del}. | |
3f12aedb | 2085 | |
0f3a70cf MG |
2086 | If one prefers hex to octal, there is an additional syntax for character |
2087 | escapes: @code{#\xHHHH} -- the letter 'x' followed by a hexadecimal | |
2088 | number of one to eight digits. | |
6ea30487 | 2089 | |
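As an illustration, the alternative spellings described above can be
compared at the REPL. This is a minimal sketch, assuming a Guile
version that accepts the octal, @code{#\xHHHH} and short-name notations
listed in this section:

@lisp
(char=? #\10 #\bs)         ; @result{} #t  (octal 10 is code point 8)
(char=? #\177 #\del)       ; @result{} #t  (octal 177 is code point 127)
(char=? #\x41 #\A)         ; @result{} #t  (hex 41 is code point 65)
(char=? #\nl #\newline)    ; @result{} #t  (compatibility name)
(char->integer #\esc)      ; @result{} 27
@end lisp
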
07d83abe MV |
2090 | @rnindex char? |
2091 | @deffn {Scheme Procedure} char? x | |
2092 | @deffnx {C Function} scm_char_p (x) | |
a4b4fbbd | 2093 | Return @code{#t} if @var{x} is a character, else @code{#f}. |
07d83abe MV |
2094 | @end deffn |
2095 | ||
bb15a36c | 2096 | Fundamentally, the character comparison operations below are |
3f12aedb MG |
2097 | numeric comparisons of the characters' code points. | |
2098 | ||
07d83abe MV |
2099 | @rnindex char=? |
2100 | @deffn {Scheme Procedure} char=? x y | |
a4b4fbbd | 2101 | Return @code{#t} if the code point of @var{x} is equal to the code point |
3f12aedb | 2102 | of @var{y}, else @code{#f}. |
07d83abe MV |
2103 | @end deffn |
2104 | ||
2105 | @rnindex char<? | |
2106 | @deffn {Scheme Procedure} char<? x y | |
a4b4fbbd | 2107 | Return @code{#t} if the code point of @var{x} is less than the code |
3f12aedb | 2108 | point of @var{y}, else @code{#f}. |
07d83abe MV |
2109 | @end deffn |
2110 | ||
2111 | @rnindex char<=? | |
2112 | @deffn {Scheme Procedure} char<=? x y | |
a4b4fbbd | 2113 | Return @code{#t} if the code point of @var{x} is less than or equal |
3f12aedb | 2114 | to the code point of @var{y}, else @code{#f}. |
07d83abe MV |
2115 | @end deffn |
2116 | ||
2117 | @rnindex char>? | |
2118 | @deffn {Scheme Procedure} char>? x y | |
a4b4fbbd | 2119 | Return @code{#t} if the code point of @var{x} is greater than the |
3f12aedb | 2120 | code point of @var{y}, else @code{#f}. |
07d83abe MV |
2121 | @end deffn |
2122 | ||
2123 | @rnindex char>=? | |
2124 | @deffn {Scheme Procedure} char>=? x y | |
a4b4fbbd | 2125 | Return @code{#t} if the code point of @var{x} is greater than or |
3f12aedb | 2126 | equal to the code point of @var{y}, else @code{#f}. |
07d83abe MV |
2127 | @end deffn |
2128 | ||
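Because these comparisons are numeric, upper-case ASCII letters sort
before lower-case ones. A short sketch, with the relevant ASCII code
points noted in the comments:

@lisp
(char<? #\A #\a)       ; @result{} #t  (65 < 97)
(char>? #\a #\B)       ; @result{} #t  (97 > 66)
(char=? #\a #\A)       ; @result{} #f
(char<=? #\a #\a)      ; @result{} #t
@end lisp
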
bb15a36c MG |
2129 | @cindex case folding |
2130 | ||
2131 | Case-insensitive character comparisons use @emph{Unicode case | |
2132 | folding}. In case folding comparisons, if a character is lowercase | |
2133 | and has an uppercase form that can be expressed as a single character, | |
2134 | it is converted to uppercase before comparison. All other characters | |
2135 | undergo no conversion before the comparison occurs. This includes the | |
2136 | German sharp S (Eszett), which is not uppercased before comparison | |
2137 | because its uppercase form has two characters. Unicode case folding | |
2138 | is language independent: it uses rules that are generally true, but | |
2139 | it cannot cover all cases for all languages. | |
3f12aedb | 2140 | |
07d83abe MV |
2141 | @rnindex char-ci=? |
2142 | @deffn {Scheme Procedure} char-ci=? x y | |
a4b4fbbd | 2143 | Return @code{#t} if the case-folded code point of @var{x} is the same |
3f12aedb | 2144 | as the case-folded code point of @var{y}, else @code{#f}. |
07d83abe MV |
2145 | @end deffn |
2146 | ||
2147 | @rnindex char-ci<? | |
2148 | @deffn {Scheme Procedure} char-ci<? x y | |
a4b4fbbd | 2149 | Return @code{#t} if the case-folded code point of @var{x} is less |
3f12aedb | 2150 | than the case-folded code point of @var{y}, else @code{#f}. |
07d83abe MV |
2151 | @end deffn |
2152 | ||
2153 | @rnindex char-ci<=? | |
2154 | @deffn {Scheme Procedure} char-ci<=? x y | |
a4b4fbbd | 2155 | Return @code{#t} if the case-folded code point of @var{x} is less |
3f12aedb MG |
2156 | than or equal to the case-folded code point of @var{y}, else |
2157 | @code{#f}. | |
07d83abe MV |
2158 | @end deffn |
2159 | ||
2160 | @rnindex char-ci>? | |
2161 | @deffn {Scheme Procedure} char-ci>? x y | |
a4b4fbbd | 2162 | Return @code{#t} if the case-folded code point of @var{x} is greater |
3f12aedb | 2163 | than the case-folded code point of @var{y}, else @code{#f}. |
07d83abe MV |
2164 | @end deffn |
2165 | ||
2166 | @rnindex char-ci>=? | |
2167 | @deffn {Scheme Procedure} char-ci>=? x y | |
a4b4fbbd | 2168 | Return @code{#t} if the case-folded code point of @var{x} is greater |
3f12aedb MG |
2169 | than or equal to the case-folded code point of @var{y}, else |
2170 | @code{#f}. | |
07d83abe MV |
2171 | @end deffn |
2172 | ||
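For ASCII letters, the effect of case folding can be seen by comparing
the case-sensitive and case-insensitive procedures on the same
arguments. This is only an illustrative sketch:

@lisp
(char-ci=? #\a #\A)    ; @result{} #t
(char=? #\a #\A)       ; @result{} #f
(char-ci<? #\a #\B)    ; @result{} #t
(char<? #\a #\B)       ; @result{} #f  (97 is not less than 66)
@end lisp
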
2173 | @rnindex char-alphabetic? | |
2174 | @deffn {Scheme Procedure} char-alphabetic? chr | |
2175 | @deffnx {C Function} scm_char_alphabetic_p (chr) | |
a4b4fbbd | 2176 | Return @code{#t} if @var{chr} is alphabetic, else @code{#f}. |
07d83abe MV |
2177 | @end deffn |
2178 | ||
2179 | @rnindex char-numeric? | |
2180 | @deffn {Scheme Procedure} char-numeric? chr | |
2181 | @deffnx {C Function} scm_char_numeric_p (chr) | |
a4b4fbbd | 2182 | Return @code{#t} if @var{chr} is numeric, else @code{#f}. |
07d83abe MV |
2183 | @end deffn |
2184 | ||
2185 | @rnindex char-whitespace? | |
2186 | @deffn {Scheme Procedure} char-whitespace? chr | |
2187 | @deffnx {C Function} scm_char_whitespace_p (chr) | |
a4b4fbbd | 2188 | Return @code{#t} if @var{chr} is whitespace, else @code{#f}. |
07d83abe MV |
2189 | @end deffn |
2190 | ||
2191 | @rnindex char-upper-case? | |
2192 | @deffn {Scheme Procedure} char-upper-case? chr | |
2193 | @deffnx {C Function} scm_char_upper_case_p (chr) | |
a4b4fbbd | 2194 | Return @code{#t} if @var{chr} is uppercase, else @code{#f}. |
07d83abe MV |
2195 | @end deffn |
2196 | ||
2197 | @rnindex char-lower-case? | |
2198 | @deffn {Scheme Procedure} char-lower-case? chr | |
2199 | @deffnx {C Function} scm_char_lower_case_p (chr) | |
a4b4fbbd | 2200 | Return @code{#t} if @var{chr} is lowercase, else @code{#f}. |
07d83abe MV |
2201 | @end deffn |
2202 | ||
2203 | @deffn {Scheme Procedure} char-is-both? chr | |
2204 | @deffnx {C Function} scm_char_is_both_p (chr) | |
a4b4fbbd | 2205 | Return @code{#t} if @var{chr} is either uppercase or lowercase, else |
5676b4fa | 2206 | @code{#f}. |
07d83abe MV |
2207 | @end deffn |
2208 | ||
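The classification predicates above can be tried on a few ASCII
characters. A minimal sketch, using characters chosen purely for
illustration:

@lisp
(char-alphabetic? #\a)        ; @result{} #t
(char-numeric? #\7)           ; @result{} #t
(char-whitespace? #\space)    ; @result{} #t
(char-upper-case? #\a)        ; @result{} #f
(char-lower-case? #\a)        ; @result{} #t
(char-is-both? #\7)           ; @result{} #f  (a digit has no case)
@end lisp
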
0ca3a342 JG |
2209 | @deffn {Scheme Procedure} char-general-category chr |
2210 | @deffnx {C Function} scm_char_general_category (chr) | |
2211 | Return a symbol giving the two-letter name of the Unicode general | |
2212 | category assigned to @var{chr} or @code{#f} if no named category is | |
2213 | assigned. The following table provides a list of category names along | |
2214 | with their meanings. | |
2215 | ||
2216 | @multitable @columnfractions .1 .4 .1 .4 | |
2217 | @item Lu | |
2218 | @tab Uppercase letter | |
2219 | @tab Pf | |
2220 | @tab Final quote punctuation | |
2221 | @item Ll | |
2222 | @tab Lowercase letter | |
2223 | @tab Po | |
2224 | @tab Other punctuation | |
2225 | @item Lt | |
2226 | @tab Titlecase letter | |
2227 | @tab Sm | |
2228 | @tab Math symbol | |
2229 | @item Lm | |
2230 | @tab Modifier letter | |
2231 | @tab Sc | |
2232 | @tab Currency symbol | |
2233 | @item Lo | |
2234 | @tab Other letter | |
2235 | @tab Sk | |
2236 | @tab Modifier symbol | |
2237 | @item Mn | |
2238 | @tab Non-spacing mark | |
2239 | @tab So | |
2240 | @tab Other symbol | |
2241 | @item Mc | |
2242 | @tab Combining spacing mark | |
2243 | @tab Zs | |
2244 | @tab Space separator | |
2245 | @item Me | |
2246 | @tab Enclosing mark | |
2247 | @tab Zl | |
2248 | @tab Line separator | |
2249 | @item Nd | |
2250 | @tab Decimal digit number | |
2251 | @tab Zp | |
2252 | @tab Paragraph separator | |
2253 | @item Nl | |
2254 | @tab Letter number | |
2255 | @tab Cc | |
2256 | @tab Control | |
2257 | @item No | |
2258 | @tab Other number | |
2259 | @tab Cf | |
2260 | @tab Format | |
2261 | @item Pc | |
2262 | @tab Connector punctuation | |
2263 | @tab Cs | |
2264 | @tab Surrogate | |
2265 | @item Pd | |
2266 | @tab Dash punctuation | |
2267 | @tab Co | |
2268 | @tab Private use | |
2269 | @item Ps | |
2270 | @tab Open punctuation | |
2271 | @tab Cn | |
2272 | @tab Unassigned | |
2273 | @item Pe | |
2274 | @tab Close punctuation | |
2275 | @tab | |
2276 | @tab | |
2277 | @item Pi | |
2278 | @tab Initial quote punctuation | |
2279 | @tab | |
2280 | @tab | |
2281 | @end multitable | |
2282 | @end deffn | |
2283 | ||
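As an illustration, here are the categories returned for a few ASCII
characters. The results follow the Unicode general category
assignments for these code points:

@lisp
(char-general-category #\a)        ; @result{} Ll
(char-general-category #\A)        ; @result{} Lu
(char-general-category #\5)        ; @result{} Nd
(char-general-category #\space)    ; @result{} Zs
(char-general-category #\nul)      ; @result{} Cc
@end lisp
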
07d83abe MV |
2284 | @rnindex char->integer |
2285 | @deffn {Scheme Procedure} char->integer chr | |
2286 | @deffnx {C Function} scm_char_to_integer (chr) | |
3f12aedb | 2287 | Return the code point of @var{chr}. |
07d83abe MV |
2288 | @end deffn |
2289 | ||
2290 | @rnindex integer->char | |
2291 | @deffn {Scheme Procedure} integer->char n | |
2292 | @deffnx {C Function} scm_integer_to_char (n) | |
3f12aedb MG |
2293 | Return the character that has code point @var{n}. The integer @var{n} |
2294 | must be a valid code point. Valid code points are in the ranges 0 to | |
2295 | @code{#xD7FF} inclusive or @code{#xE000} to @code{#x10FFFF} inclusive. | |
07d83abe MV |
2296 | @end deffn |
2297 | ||
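The two conversions are inverses of each other for valid code points.
A minimal sketch:

@lisp
(char->integer #\A)                      ; @result{} 65
(integer->char 97)                       ; @result{} #\a
(char->integer (integer->char #x3BB))    ; @result{} 955
@end lisp
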
2298 | @rnindex char-upcase | |
2299 | @deffn {Scheme Procedure} char-upcase chr | |
2300 | @deffnx {C Function} scm_char_upcase (chr) | |
2301 | Return the uppercase character version of @var{chr}. | |
2302 | @end deffn | |
2303 | ||
2304 | @rnindex char-downcase | |
2305 | @deffn {Scheme Procedure} char-downcase chr | |
2306 | @deffnx {C Function} scm_char_downcase (chr) | |
2307 | Return the lowercase character version of @var{chr}. | |
2308 | @end deffn | |
2309 | ||
820f33aa JG |
2310 | @rnindex char-titlecase |
2311 | @deffn {Scheme Procedure} char-titlecase chr | |
2312 | @deffnx {C Function} scm_char_titlecase (chr) | |
2313 | Return the titlecase character version of @var{chr} if one exists; | |
2314 | otherwise return the uppercase version. | |
2315 | ||
2316 | For most characters these will be the same, but the Unicode Standard | |
2317 | includes certain digraph compatibility characters, such as @code{U+01F3} | |
2318 | ``dz'', for which the uppercase and titlecase characters are different | |
2319 | (@code{U+01F1} ``DZ'' and @code{U+01F2} ``Dz'' in this case, | |
2320 | respectively). | |
2321 | @end deffn | |
2322 | ||
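The following sketch illustrates the case conversion procedures,
including the digraph example mentioned above. The digraph characters
are written with the @code{#\xHHHH} syntax so that the example does not
depend on the encoding of the source file:

@lisp
(char-upcase #\a)                          ; @result{} #\A
(char-downcase #\A)                        ; @result{} #\a
(char-upcase #\7)                          ; @result{} #\7  (no case, unchanged)
(char=? (char-upcase #\x1F3) #\x1F1)       ; @result{} #t
(char=? (char-titlecase #\x1F3) #\x1F2)    ; @result{} #t
@end lisp
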
a1dcb961 MG |
2323 | @tindex scm_t_wchar |
2324 | @deftypefn {C Function} scm_t_wchar scm_c_upcase (scm_t_wchar @var{c}) | |
2325 | @deftypefnx {C Function} scm_t_wchar scm_c_downcase (scm_t_wchar @var{c}) | |
2326 | @deftypefnx {C Function} scm_t_wchar scm_c_titlecase (scm_t_wchar @var{c}) | |
2327 | ||
2328 | These C functions take an integer representation of a Unicode | |
2329 | code point and return the code point corresponding to its uppercase, | |
2330 | lowercase, and titlecase forms respectively. The type | |
2331 | @code{scm_t_wchar} is a signed, 32-bit integer. | |
2332 | @end deftypefn | |
2333 | ||
050ab45f MV |
2334 | @node Character Sets |
2335 | @subsection Character Sets | |
07d83abe | 2336 | |
050ab45f MV |
2337 | The features described in this section correspond directly to SRFI-14. |
2338 | ||
2339 | The data type @dfn{charset} implements sets of characters | |
2340 | (@pxref{Characters}). Because the internal representation of | |
2341 | character sets is not visible to the user, many procedures for | |
2342 | handling them are provided. | |
2343 | ||
2344 | Character sets can be created, extended, tested for the membership of a | |
2345 | character, and compared to other character sets. | |
2346 | ||
050ab45f MV |
2347 | @menu |
2348 | * Character Set Predicates/Comparison:: | |
2349 | * Iterating Over Character Sets:: Enumerate charset elements. | |
2350 | * Creating Character Sets:: Making new charsets. | |
2351 | * Querying Character Sets:: Test charsets for membership etc. | |
2352 | * Character-Set Algebra:: Calculating new charsets. | |
2353 | * Standard Character Sets:: Variables containing predefined charsets. | |
2354 | @end menu | |
2355 | ||
2356 | @node Character Set Predicates/Comparison | |
2357 | @subsubsection Character Set Predicates/Comparison | |
2358 | ||
2359 | Use these procedures for testing whether an object is a character set, | |
2360 | or whether several character sets are equal or subsets of each other. | |
2361 | @code{char-set-hash} can be used for calculating a hash value, for | |
2362 | example for use in fast lookup procedures. | |
2363 | ||
2364 | @deffn {Scheme Procedure} char-set? obj | |
2365 | @deffnx {C Function} scm_char_set_p (obj) | |
2366 | Return @code{#t} if @var{obj} is a character set, @code{#f} | |
2367 | otherwise. | |
2368 | @end deffn | |
2369 | ||
df0a1002 | 2370 | @deffn {Scheme Procedure} char-set= char_set @dots{} |
050ab45f MV |
2371 | @deffnx {C Function} scm_char_set_eq (char_sets) |
2372 | Return @code{#t} if all given character sets are equal. | |
2373 | @end deffn | |
2374 | ||
df0a1002 | 2375 | @deffn {Scheme Procedure} char-set<= char_set @dots{} |
050ab45f | 2376 | @deffnx {C Function} scm_char_set_leq (char_sets) |
64de6db5 BT |
2377 | Return @code{#t} if every character set @var{char_set}i is a subset |
2378 | of character set @var{char_set}i+1. | |
050ab45f MV |
2379 | @end deffn |
2380 | ||
2381 | @deffn {Scheme Procedure} char-set-hash cs [bound] | |
2382 | @deffnx {C Function} scm_char_set_hash (cs, bound) | |
2383 | Compute a hash value for the character set @var{cs}. If | |
2384 | @var{bound} is given and non-zero, it restricts the | |
df0a1002 | 2385 | returned value to the range 0 @dots{} @var{bound} - 1. |
050ab45f MV |
2386 | @end deffn |
2387 | ||
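A short sketch of these predicates follows. The sets are built with
@code{char-set} and @code{string->char-set}, and the bound 100 passed
to @code{char-set-hash} is an arbitrary value chosen for illustration:

@lisp
(char-set? (char-set #\a))                                ; @result{} #t
(char-set? "a")                                           ; @result{} #f
(char-set= (char-set #\a #\b) (string->char-set "ba"))    ; @result{} #t
(char-set<= (char-set #\a) (char-set #\a #\b))            ; @result{} #t
(char-set<= (char-set #\a #\b) (char-set #\a))            ; @result{} #f

;; With a bound, the hash value lies in the range 0 to bound - 1.
(let ((h (char-set-hash (char-set #\a) 100)))
  (and (>= h 0) (< h 100)))                               ; @result{} #t
@end lisp
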
2388 | @c =================================================================== | |
2389 | ||
2390 | @node Iterating Over Character Sets | |
2391 | @subsubsection Iterating Over Character Sets | |
2392 | ||
2393 | Character set cursors are a means for iterating over the members of a | |
2394 | character set. After creating a character set cursor with | |
2395 | @code{char-set-cursor}, a cursor can be dereferenced with | |
2396 | @code{char-set-ref} and advanced to the next member with | |
2397 | @code{char-set-cursor-next}. Whether a cursor has moved past the last | |
2398 | element of the set can be checked with @code{end-of-char-set?}. | |
2399 | ||
2400 | Additionally, mapping and (un-)folding procedures for character sets are | |
2401 | provided. | |
2402 | ||
2403 | @deffn {Scheme Procedure} char-set-cursor cs | |
2404 | @deffnx {C Function} scm_char_set_cursor (cs) | |
2405 | Return a cursor into the character set @var{cs}. | |
2406 | @end deffn | |
2407 | ||
2408 | @deffn {Scheme Procedure} char-set-ref cs cursor | |
2409 | @deffnx {C Function} scm_char_set_ref (cs, cursor) | |
2410 | Return the character at the current cursor position | |
2411 | @var{cursor} in the character set @var{cs}. It is an error to | |
2412 | pass a cursor for which @code{end-of-char-set?} returns true. | |
2413 | @end deffn | |
2414 | ||
2415 | @deffn {Scheme Procedure} char-set-cursor-next cs cursor | |
2416 | @deffnx {C Function} scm_char_set_cursor_next (cs, cursor) | |
2417 | Advance the character set cursor @var{cursor} to the next | |
2418 | character in the character set @var{cs}. It is an error if the | |
2419 | cursor given satisfies @code{end-of-char-set?}. | |
2420 | @end deffn | |
2421 | ||
2422 | @deffn {Scheme Procedure} end-of-char-set? cursor | |
2423 | @deffnx {C Function} scm_end_of_char_set_p (cursor) | |
2424 | Return @code{#t} if @var{cursor} has reached the end of a | |
2425 | character set, @code{#f} otherwise. | |
2426 | @end deffn | |
2427 | ||
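The cursor procedures are typically combined in a loop. The following
is a minimal sketch; @code{char-set-members} is a hypothetical helper
name introduced only for this example, and the order of the resulting
list should not be relied upon:

@lisp
(define (char-set-members cs)
  ;; Walk CS with a cursor, consing each member onto ACC until
  ;; the cursor has moved past the last element.
  (let loop ((cursor (char-set-cursor cs))
             (acc '()))
    (if (end-of-char-set? cursor)
        acc
        (loop (char-set-cursor-next cs cursor)
              (cons (char-set-ref cs cursor) acc)))))

(length (char-set-members (string->char-set "abc")))    ; @result{} 3
@end lisp
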
2428 | @deffn {Scheme Procedure} char-set-fold kons knil cs | |
2429 | @deffnx {C Function} scm_char_set_fold (kons, knil, cs) | |
2430 | Fold the procedure @var{kons} over the character set @var{cs}, | |
2431 | initializing it with @var{knil}. | |
2432 | @end deffn | |
2433 | ||
2434 | @deffn {Scheme Procedure} char-set-unfold p f g seed [base_cs] | |
2435 | @deffnx {C Function} scm_char_set_unfold (p, f, g, seed, base_cs) | |
2436 | This is a fundamental constructor for character sets. | |
2437 | @itemize @bullet | |
2438 | @item @var{g} is used to generate a series of ``seed'' values | |
2439 | from the initial seed: @var{seed}, (@var{g} @var{seed}), | |
2440 | (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{} | |
2441 | @item @var{p} tells us when to stop -- when it returns true | |
2442 | when applied to one of the seed values. | |
2443 | @item @var{f} maps each seed value to a character. These | |
2444 | characters are added to the base character set @var{base_cs} to | |
2445 | form the result; @var{base_cs} defaults to the empty set. | |
2446 | @end itemize | |
2447 | @end deffn | |
2448 | ||
2449 | @deffn {Scheme Procedure} char-set-unfold! p f g seed base_cs | |
2450 | @deffnx {C Function} scm_char_set_unfold_x (p, f, g, seed, base_cs) | |
2451 | This is a fundamental constructor for character sets. | |
2452 | @itemize @bullet | |
2453 | @item @var{g} is used to generate a series of ``seed'' values | |
2454 | from the initial seed: @var{seed}, (@var{g} @var{seed}), | |
2455 | (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{} | |
2456 | @item @var{p} tells us when to stop -- when it returns true | |
2457 | when applied to one of the seed values. | |
2458 | @item @var{f} maps each seed value to a character. These | |
2459 | characters are added to the character set @var{base_cs}, and | |
2460 | @var{base_cs} is returned. | |
2461 | @end itemize | |
2462 | @end deffn | |
2463 | ||
2464 | @deffn {Scheme Procedure} char-set-for-each proc cs | |
2465 | @deffnx {C Function} scm_char_set_for_each (proc, cs) | |
2466 | Apply @var{proc} to every character in the character set | |
2467 | @var{cs}. The return value is not specified. | |
2468 | @end deffn | |
2469 | ||
2470 | @deffn {Scheme Procedure} char-set-map proc cs | |
2471 | @deffnx {C Function} scm_char_set_map (proc, cs) | |
2472 | Map the procedure @var{proc} over every character in @var{cs}. | |
2473 | @var{proc} must be a character -> character procedure. | |
2474 | @end deffn | |
2475 | ||
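As an illustration of folding, unfolding and mapping, consider the
sketch below. It assumes that @var{kons} is called with a character
and the accumulated value, in that order, as in SRFI-14. The results
are compared with @code{char-set=} so that they do not depend on
iteration order:

@lisp
;; Count the members of a set by folding.
(char-set-fold (lambda (ch count) (+ count 1))
               0
               (string->char-set "abc"))              ; @result{} 3

;; Unfold over a list: stop on the empty list, take the car as
;; the character, and continue with the cdr as the next seed.
(char-set= (char-set-unfold null? car cdr (list #\a #\b))
           (char-set #\a #\b))                        ; @result{} #t

;; Map a character -> character procedure over a set.
(char-set= (char-set-map char-upcase (char-set #\a #\b))
           (char-set #\A #\B))                        ; @result{} #t
@end lisp
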
2476 | @c =================================================================== | |
2477 | ||
2478 | @node Creating Character Sets | |
2479 | @subsubsection Creating Character Sets | |
2480 | ||
2481 | New character sets are produced with these procedures. | |
2482 | ||
2483 | @deffn {Scheme Procedure} char-set-copy cs | |
2484 | @deffnx {C Function} scm_char_set_copy (cs) | |
2485 | Return a newly allocated character set containing all | |
2486 | characters in @var{cs}. | |
2487 | @end deffn | |
2488 | ||
df0a1002 BT |
2489 | @deffn {Scheme Procedure} char-set chr @dots{} |
2490 | @deffnx {C Function} scm_char_set (chrs) | |
050ab45f MV |
2491 | Return a character set containing all given characters. |
2492 | @end deffn | |
2493 | ||
2494 | @deffn {Scheme Procedure} list->char-set list [base_cs] | |
2495 | @deffnx {C Function} scm_list_to_char_set (list, base_cs) | |
2496 | Convert the character list @var{list} to a character set. If | |
2497 | the character set @var{base_cs} is given, the characters in this | |
2498 | set are also included in the result. | |
2499 | @end deffn | |
2500 | ||
2501 | @deffn {Scheme Procedure} list->char-set! list base_cs | |
2502 | @deffnx {C Function} scm_list_to_char_set_x (list, base_cs) | |
2503 | Convert the character list @var{list} to a character set. The | |
2504 | characters are added to @var{base_cs} and @var{base_cs} is | |
2505 | returned. | |
2506 | @end deffn | |
2507 | ||
2508 | @deffn {Scheme Procedure} string->char-set str [base_cs] | |
2509 | @deffnx {C Function} scm_string_to_char_set (str, base_cs) | |
2510 | Convert the string @var{str} to a character set. If the | |
2511 | character set @var{base_cs} is given, the characters in this | |
2512 | set are also included in the result. | |
2513 | @end deffn | |
2514 | ||
2515 | @deffn {Scheme Procedure} string->char-set! str base_cs | |
2516 | @deffnx {C Function} scm_string_to_char_set_x (str, base_cs) | |
2517 | Convert the string @var{str} to a character set. The | |
2518 | characters from the string are added to @var{base_cs}, and | |
2519 | @var{base_cs} is returned. | |
2520 | @end deffn | |
2521 | ||
2522 | @deffn {Scheme Procedure} char-set-filter pred cs [base_cs] | |
2523 | @deffnx {C Function} scm_char_set_filter (pred, cs, base_cs) | |
2524 | Return a character set containing every character from @var{cs} | |
2525 | that satisfies @var{pred}. If provided, the characters | |
2526 | from @var{base_cs} are added to the result. | |
2527 | @end deffn | |
2528 | ||
2529 | @deffn {Scheme Procedure} char-set-filter! pred cs base_cs | |
2530 | @deffnx {C Function} scm_char_set_filter_x (pred, cs, base_cs) | |
2531 | Return a character set containing every character from @var{cs} | |
2532 | that satisfies @var{pred}. The characters are added to | |
2533 | @var{base_cs} and @var{base_cs} is returned. | |
2534 | @end deffn | |
2535 | ||
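The constructors above can be sketched as follows. Since a character
set holds each character at most once, the set built from
@code{"hello"} has four members:

@lisp
(char-set-size (string->char-set "hello"))            ; @result{} 4

(char-set= (list->char-set '(#\a) (char-set #\b))
           (char-set #\a #\b))                        ; @result{} #t

(char-set= (char-set-filter char-upper-case?
                            (string->char-set "Hello World"))
           (char-set #\H #\W))                        ; @result{} #t
@end lisp
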
2536 | @deffn {Scheme Procedure} ucs-range->char-set lower upper [error [base_cs]] | |
2537 | @deffnx {C Function} scm_ucs_range_to_char_set (lower, upper, error, base_cs) | |
2538 | Return a character set containing all characters whose | |
2539 | character codes lie in the half-open range | |
2540 | [@var{lower},@var{upper}). | |
2541 | ||
2542 | If @var{error} is a true value, an error is signalled if the | |
2543 | specified range contains characters which are not contained in | |
2544 | the implemented character range. If @var{error} is @code{#f}, | |
be3eb25c | 2545 | these characters are silently left out of the resulting |
050ab45f MV |
2546 | character set. |
2547 | ||
2548 | The characters in @var{base_cs} are added to the result, if | |
2549 | given. | |
2550 | @end deffn | |
2551 | ||
2552 | @deffn {Scheme Procedure} ucs-range->char-set! lower upper error base_cs | |
2553 | @deffnx {C Function} scm_ucs_range_to_char_set_x (lower, upper, error, base_cs) | |
2554 | Return a character set containing all characters whose | |
2555 | character codes lie in the half-open range | |
2556 | [@var{lower},@var{upper}). | |
2557 | ||
2558 | If @var{error} is a true value, an error is signalled if the | |
2559 | specified range contains characters which are not contained in | |
2560 | the implemented character range. If @var{error} is @code{#f}, | |
be3eb25c | 2561 | these characters are silently left out of the resulting |
050ab45f MV |
2562 | character set. |
2563 | ||
2564 | The characters are added to @var{base_cs} and @var{base_cs} is | |
2565 | returned. | |
2566 | @end deffn | |
2567 | ||
2568 | @deffn {Scheme Procedure} ->char-set x | |
2569 | @deffnx {C Function} scm_to_char_set (x) | |
be3eb25c MG |
2570 | Coerce @var{x} into a char-set. @var{x} may be a string, character or | |
2571 | char-set. A string is converted to the set of its constituent | |
2572 | characters; a character is converted to a singleton set; a char-set is | |
2573 | returned as-is. | |
050ab45f MV |
2574 | @end deffn |
2575 | ||
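A brief sketch of range construction and coercion follows. It assumes
the half-open range behaviour described above, so that the range from
97 to 123 covers the 26 letters @code{#\a} through @code{#\z}:

@lisp
(char-set-size (ucs-range->char-set 97 123))              ; @result{} 26
(char-set-contains? (ucs-range->char-set 97 123) #\z)     ; @result{} #t

(char-set= (->char-set "ab") (char-set #\a #\b))          ; @result{} #t
(char-set= (->char-set #\a) (char-set #\a))               ; @result{} #t
@end lisp
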
2576 | @c =================================================================== | |
2577 | ||
2578 | @node Querying Character Sets | |
2579 | @subsubsection Querying Character Sets | |
2580 | ||
2581 | Access the elements and other information of a character set with these | |
2582 | procedures. | |
2583 | ||
be3eb25c MG |
2584 | @deffn {Scheme Procedure} %char-set-dump cs |
2585 | Return an association list containing debugging information | |
2586 | for @var{cs}. The association list has the following entries. | |
2587 | @table @code | |
2588 | @item char-set | |
2589 | The char-set itself | |
2590 | @item len | |
2591 | The number of groups of contiguous code points the char-set | |
2592 | contains | |
2593 | @item ranges | |
2594 | A list of lists where each sublist is a range of code points | |
2595 | and their associated characters | |
2596 | @end table | |
2597 | The return value of this function cannot be relied upon to be | |
2598 | consistent between versions of Guile and should not be used in code. | |
2599 | @end deffn | |
2600 | ||
050ab45f MV |
2601 | @deffn {Scheme Procedure} char-set-size cs |
2602 | @deffnx {C Function} scm_char_set_size (cs) | |
2603 | Return the number of elements in character set @var{cs}. | |
2604 | @end deffn | |
2605 | ||
2606 | @deffn {Scheme Procedure} char-set-count pred cs | |
2607 | @deffnx {C Function} scm_char_set_count (pred, cs) | |
2608 | Return the number of elements in the character set | |
2609 | @var{cs} which satisfy the predicate @var{pred}. | |
2610 | @end deffn | |
2611 | ||
2612 | @deffn {Scheme Procedure} char-set->list cs | |
2613 | @deffnx {C Function} scm_char_set_to_list (cs) | |
2614 | Return a list containing the elements of the character set | |
2615 | @var{cs}. | |
2616 | @end deffn | |
2617 | ||
2618 | @deffn {Scheme Procedure} char-set->string cs | |
2619 | @deffnx {C Function} scm_char_set_to_string (cs) | |
2620 | Return a string containing the elements of the character set | |
2621 | @var{cs}. The order in which the characters are placed in the | |
2622 | string is not defined. | |
2623 | @end deffn | |
2624 | ||
2625 | @deffn {Scheme Procedure} char-set-contains? cs ch | |
2626 | @deffnx {C Function} scm_char_set_contains_p (cs, ch) | |
a4b4fbbd JE |
2627 | Return @code{#t} if the character @var{ch} is contained in the |
2628 | character set @var{cs}, or @code{#f} otherwise. | |
050ab45f MV |
2629 | @end deffn |
2630 | ||
2631 | @deffn {Scheme Procedure} char-set-every pred cs | |
2632 | @deffnx {C Function} scm_char_set_every (pred, cs) | |
2633 | Return a true value if every character in the character set | |
2634 | @var{cs} satisfies the predicate @var{pred}. | |
2635 | @end deffn | |
2636 | ||
2637 | @deffn {Scheme Procedure} char-set-any pred cs | |
2638 | @deffnx {C Function} scm_char_set_any (pred, cs) | |
2639 | Return a true value if any character in the character set | |
2640 | @var{cs} satisfies the predicate @var{pred}. | |
2641 | @end deffn | |
2642 | ||
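The querying procedures can be illustrated with a small set. In this
sketch, @code{vowels} is a name introduced only for the example:

@lisp
(define vowels (string->char-set "aeiou"))

(char-set-size vowels)                        ; @result{} 5
(char-set-count char-lower-case? vowels)      ; @result{} 5
(char-set-contains? vowels #\e)               ; @result{} #t
(char-set-contains? vowels #\z)               ; @result{} #f
(char-set-every char-alphabetic? vowels)      ; @result{} #t
(char-set-any char-numeric? vowels)           ; @result{} #f
(length (char-set->list vowels))              ; @result{} 5
(string-length (char-set->string vowels))     ; @result{} 5
@end lisp
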
2643 | @c =================================================================== | |
2644 | ||
2645 | @node Character-Set Algebra | |
2646 | @subsubsection Character-Set Algebra | |
2647 | ||
2648 | Character sets can be manipulated with the common set algebra operations, | |
2649 | such as union, complement, intersection etc. All of these procedures | |
2650 | provide side-effecting variants, which modify their character set | |
2651 | argument(s). | |
2652 | ||
df0a1002 BT |
2653 | @deffn {Scheme Procedure} char-set-adjoin cs chr @dots{} |
2654 | @deffnx {C Function} scm_char_set_adjoin (cs, chrs) | |
050ab45f MV |
2655 | Add all character arguments to the first argument, which must |
2656 | be a character set. | |
2657 | @end deffn | |
2658 | ||
df0a1002 BT |
2659 | @deffn {Scheme Procedure} char-set-delete cs chr @dots{} |
2660 | @deffnx {C Function} scm_char_set_delete (cs, chrs) | |
050ab45f MV |
2661 | Delete all character arguments from the first argument, which |
2662 | must be a character set. | |
2663 | @end deffn | |
2664 | ||
df0a1002 BT |
2665 | @deffn {Scheme Procedure} char-set-adjoin! cs chr @dots{} |
2666 | @deffnx {C Function} scm_char_set_adjoin_x (cs, chrs) | |
050ab45f MV |
2667 | Add all character arguments to the first argument, which must |
2668 | be a character set. | |
2669 | @end deffn | |
2670 | ||
df0a1002 BT |
2671 | @deffn {Scheme Procedure} char-set-delete! cs chr @dots{} |
2672 | @deffnx {C Function} scm_char_set_delete_x (cs, chrs) | |
050ab45f MV |
2673 | Delete all character arguments from the first argument, which |
2674 | must be a character set. | |
2675 | @end deffn | |
2676 | ||
2677 | @deffn {Scheme Procedure} char-set-complement cs | |
2678 | @deffnx {C Function} scm_char_set_complement (cs) | |
2679 | Return the complement of the character set @var{cs}. | |
2680 | @end deffn | |
2681 | ||
be3eb25c MG |
2682 | Note that the complement of a character set is likely to contain many |
2683 | reserved code points (code points that are not associated with | |
2684 | characters). It may be helpful to modify the output of | |
2685 | @code{char-set-complement} by computing its intersection with the set | |
2686 | of designated code points, @code{char-set:designated}. | |
2687 | ||
df0a1002 BT |
2688 | @deffn {Scheme Procedure} char-set-union cs @dots{} |
2689 | @deffnx {C Function} scm_char_set_union (char_sets) | |
050ab45f MV |
2690 | Return the union of all argument character sets. |
2691 | @end deffn | |
2692 | ||
df0a1002 BT |
2693 | @deffn {Scheme Procedure} char-set-intersection cs @dots{} |
2694 | @deffnx {C Function} scm_char_set_intersection (char_sets) | |
050ab45f MV |
2695 | Return the intersection of all argument character sets. |
2696 | @end deffn | |
2697 | ||
df0a1002 BT |
2698 | @deffn {Scheme Procedure} char-set-difference cs1 cs @dots{} |
2699 | @deffnx {C Function} scm_char_set_difference (cs1, char_sets) | |
050ab45f MV |
2700 | Return the difference of all argument character sets. |
2701 | @end deffn | |
2702 | ||
df0a1002 BT |
2703 | @deffn {Scheme Procedure} char-set-xor cs @dots{} |
2704 | @deffnx {C Function} scm_char_set_xor (char_sets) | |
050ab45f MV |
2705 | Return the exclusive-or of all argument character sets. |
2706 | @end deffn | |
2707 | ||
df0a1002 BT |
2708 | @deffn {Scheme Procedure} char-set-diff+intersection cs1 cs @dots{} |
2709 | @deffnx {C Function} scm_char_set_diff_plus_intersection (cs1, char_sets) | |
050ab45f MV |
2710 | Return the difference and the intersection of all argument |
2711 | character sets. | |
2712 | @end deffn | |
2713 | ||
2714 | @deffn {Scheme Procedure} char-set-complement! cs | |
2715 | @deffnx {C Function} scm_char_set_complement_x (cs) | |
2716 | Return the complement of the character set @var{cs}. | |
2717 | @end deffn | |
2718 | ||
df0a1002 BT |
2719 | @deffn {Scheme Procedure} char-set-union! cs1 cs @dots{} |
2720 | @deffnx {C Function} scm_char_set_union_x (cs1, char_sets) | |
050ab45f MV |
2721 | Return the union of all argument character sets. |
2722 | @end deffn | |
2723 | ||
df0a1002 BT |
2724 | @deffn {Scheme Procedure} char-set-intersection! cs1 cs @dots{} |
2725 | @deffnx {C Function} scm_char_set_intersection_x (cs1, char_sets) | |
050ab45f MV |
2726 | Return the intersection of all argument character sets. |
2727 | @end deffn | |
2728 | ||
df0a1002 BT |
2729 | @deffn {Scheme Procedure} char-set-difference! cs1 cs @dots{} |
2730 | @deffnx {C Function} scm_char_set_difference_x (cs1, char_sets) | |
050ab45f MV |
2731 | Return the difference of all argument character sets. |
2732 | @end deffn | |
2733 | ||
df0a1002 BT |
2734 | @deffn {Scheme Procedure} char-set-xor! cs1 cs @dots{} |
2735 | @deffnx {C Function} scm_char_set_xor_x (cs1, char_sets) | |
050ab45f MV |
2736 | Return the exclusive-or of all argument character sets. |
2737 | @end deffn | |
2738 | ||
df0a1002 BT |
2739 | @deffn {Scheme Procedure} char-set-diff+intersection! cs1 cs2 cs @dots{} |
2740 | @deffnx {C Function} scm_char_set_diff_plus_intersection_x (cs1, cs2, char_sets) | |
050ab45f MV |
2741 | Return the difference and the intersection of all argument |
2742 | character sets. | |
2743 | @end deffn | |
2744 | ||
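The non-destructive operations can be sketched on two small sets. The
names @code{ab} and @code{bc} are introduced only for this example,
and results are checked with @code{char-set=} rather than printed:

@lisp
(define ab (char-set #\a #\b))
(define bc (char-set #\b #\c))

(char-set= (char-set-union ab bc) (char-set #\a #\b #\c))      ; @result{} #t
(char-set= (char-set-intersection ab bc) (char-set #\b))       ; @result{} #t
(char-set= (char-set-difference ab bc) (char-set #\a))         ; @result{} #t
(char-set= (char-set-xor ab bc) (char-set #\a #\c))            ; @result{} #t
(char-set= (char-set-adjoin ab #\c) (char-set #\a #\b #\c))    ; @result{} #t
(char-set-contains? (char-set-complement ab) #\c)              ; @result{} #t

;; char-set-diff+intersection returns two values.
(call-with-values
    (lambda () (char-set-diff+intersection ab bc))
  (lambda (diff int)
    (list (char-set->list diff) (char-set->list int))))
;; @result{} ((#\a) (#\b))
@end lisp
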
2745 | @c =================================================================== | |
2746 | ||
2747 | @node Standard Character Sets | |
2748 | @subsubsection Standard Character Sets | |
2749 | ||
2750 | To make the character set data type and its procedures more | |
2751 | convenient to use, several predefined character set variables exist. | |
2752 | ||
49dec04b LC |
2753 | @cindex codeset |
2754 | @cindex charset | |
2755 | @cindex locale | |
2756 | ||
be3eb25c MG |
2757 | These character sets are locale independent and are not recomputed |
2758 | upon a @code{setlocale} call. They contain characters from the whole | |
2759 | range of Unicode code points. For instance, @code{char-set:letter} | |
bf8d8454 | 2760 | contains about 100,000 characters. |
49dec04b | 2761 | |
c9dc8c6c MV |
2762 | @defvr {Scheme Variable} char-set:lower-case |
2763 | @defvrx {C Variable} scm_char_set_lower_case | |
050ab45f | 2764 | All lower-case characters. |
c9dc8c6c | 2765 | @end defvr |
050ab45f | 2766 | |
c9dc8c6c MV |
2767 | @defvr {Scheme Variable} char-set:upper-case |
2768 | @defvrx {C Variable} scm_char_set_upper_case | |
050ab45f | 2769 | All upper-case characters. |
c9dc8c6c | 2770 | @end defvr |
050ab45f | 2771 | |
c9dc8c6c MV |
2772 | @defvr {Scheme Variable} char-set:title-case |
2773 | @defvrx {C Variable} scm_char_set_title_case | |
be3eb25c MG |
2774 | All single characters that function as if they were an upper-case |
2775 | letter followed by a lower-case letter. | |
c9dc8c6c | 2776 | @end defvr |
050ab45f | 2777 | |
c9dc8c6c MV |
2778 | @defvr {Scheme Variable} char-set:letter |
2779 | @defvrx {C Variable} scm_char_set_letter | |
be3eb25c MG |
2780 | All letters. This includes @code{char-set:lower-case}, |
2781 | @code{char-set:upper-case}, @code{char-set:title-case}, and many | |
2782 | letters that have no case at all. For example, Chinese and Japanese | |
2783 | characters typically have no concept of case. | |
c9dc8c6c | 2784 | @end defvr |
050ab45f | 2785 | |
c9dc8c6c MV |
2786 | @defvr {Scheme Variable} char-set:digit |
2787 | @defvrx {C Variable} scm_char_set_digit | |
050ab45f | 2788 | All digits. |
c9dc8c6c | 2789 | @end defvr |
050ab45f | 2790 | |
c9dc8c6c MV |
2791 | @defvr {Scheme Variable} char-set:letter+digit |
2792 | @defvrx {C Variable} scm_char_set_letter_and_digit | |
050ab45f | 2793 | The union of @code{char-set:letter} and @code{char-set:digit}. |
c9dc8c6c | 2794 | @end defvr |
050ab45f | 2795 | |
c9dc8c6c MV |
2796 | @defvr {Scheme Variable} char-set:graphic |
2797 | @defvrx {C Variable} scm_char_set_graphic | |
050ab45f | 2798 | All characters which would put ink on the paper. |
c9dc8c6c | 2799 | @end defvr |
050ab45f | 2800 | |
c9dc8c6c MV |
2801 | @defvr {Scheme Variable} char-set:printing |
2802 | @defvrx {C Variable} scm_char_set_printing | |
050ab45f | 2803 | The union of @code{char-set:graphic} and @code{char-set:whitespace}. |
c9dc8c6c | 2804 | @end defvr |
050ab45f | 2805 | |
c9dc8c6c MV |
2806 | @defvr {Scheme Variable} char-set:whitespace |
2807 | @defvrx {C Variable} scm_char_set_whitespace | |
050ab45f | 2808 | All whitespace characters. |
c9dc8c6c | 2809 | @end defvr |
050ab45f | 2810 | |
c9dc8c6c MV |
2811 | @defvr {Scheme Variable} char-set:blank |
2812 | @defvrx {C Variable} scm_char_set_blank | |
be3eb25c MG |
2813 | All horizontal whitespace characters, which notably includes |
2814 | @code{#\space} and @code{#\tab}. | |
c9dc8c6c | 2815 | @end defvr |
050ab45f | 2816 | |
c9dc8c6c MV |
2817 | @defvr {Scheme Variable} char-set:iso-control |
2818 | @defvrx {C Variable} scm_char_set_iso_control | |
be3eb25c MG |
2819 | The ISO control characters are the C0 control characters (U+0000 to |
2820 | U+001F), delete (U+007F), and the C1 control characters (U+0080 to | |
2821 | U+009F). | |
c9dc8c6c | 2822 | @end defvr |
050ab45f | 2823 | |
c9dc8c6c MV |
2824 | @defvr {Scheme Variable} char-set:punctuation |
2825 | @defvrx {C Variable} scm_char_set_punctuation | |
be3eb25c MG |
2826 | All punctuation characters, such as the characters |
2827 | @code{!"#%&'()*,-./:;?@@[\\]_@{@}} | |
c9dc8c6c | 2828 | @end defvr |
050ab45f | 2829 | |
c9dc8c6c MV |
2830 | @defvr {Scheme Variable} char-set:symbol |
2831 | @defvrx {C Variable} scm_char_set_symbol | |
be3eb25c | 2832 | All symbol characters, such as the characters @code{$+<=>^`|~}. |
c9dc8c6c | 2833 | @end defvr |
050ab45f | 2834 | |
c9dc8c6c MV |
2835 | @defvr {Scheme Variable} char-set:hex-digit |
2836 | @defvrx {C Variable} scm_char_set_hex_digit | |
050ab45f | 2837 | The hexadecimal digits @code{0123456789abcdefABCDEF}. |
c9dc8c6c | 2838 | @end defvr |
050ab45f | 2839 | |
c9dc8c6c MV |
2840 | @defvr {Scheme Variable} char-set:ascii |
2841 | @defvrx {C Variable} scm_char_set_ascii | |
050ab45f | 2842 | All ASCII characters. |
c9dc8c6c | 2843 | @end defvr |
050ab45f | 2844 | |
c9dc8c6c MV |
2845 | @defvr {Scheme Variable} char-set:empty |
2846 | @defvrx {C Variable} scm_char_set_empty | |
050ab45f | 2847 | The empty character set. |
c9dc8c6c | 2848 | @end defvr |
050ab45f | 2849 | |
be3eb25c MG |
2850 | @defvr {Scheme Variable} char-set:designated |
2851 | @defvrx {C Variable} scm_char_set_designated | |
2852 | This character set contains all designated code points. This includes | |
2853 | all the code points to which Unicode has assigned a character or other | |
2854 | meaning. | |
2855 | @end defvr | |
2856 | ||
c9dc8c6c MV |
2857 | @defvr {Scheme Variable} char-set:full |
2858 | @defvrx {C Variable} scm_char_set_full | |
be3eb25c MG |
2859 | This character set contains all possible code points. This includes |
2860 | both designated and reserved code points. | |
c9dc8c6c | 2861 | @end defvr |
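
As a minimal sketch of how these predefined sets can be queried, the
following uses @code{char-set-contains?} and @code{char-set-size} from the
character set procedures. The variable names are illustrative only and are
not part of Guile.

```scheme
;; Membership tests against a few of the predefined sets.
(define digit?      (char-set-contains? char-set:digit #\7))      ; => #t
(define hex-g?      (char-set-contains? char-set:hex-digit #\g))  ; => #f
(define tab-blank?  (char-set-contains? char-set:blank #\tab))    ; => #t

;; Small fixed-size sets can be counted directly.
(define hex-count   (char-set-size char-set:hex-digit))           ; => 22
(define ascii-count (char-set-size char-set:ascii))               ; => 128

;; U+03BB (Greek small letter lambda) is a letter outside ASCII.
(define lam (integer->char #x3bb))
(define lam-letter? (char-set-contains? char-set:letter lam))     ; => #t
(define lam-ascii?  (char-set-contains? char-set:ascii lam))      ; => #f
```

Because the sets are locale independent, these results should not change
after a @code{setlocale} call.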
07d83abe MV |
2862 | |
2863 | @node Strings | |
2864 | @subsection Strings | |
2865 | @tpindex Strings | |
2866 | ||
2867 | Strings are fixed-length sequences of characters. They can be created | |
2868 | by calling constructor procedures, but they can also be written as | |
2869 | literals at the @acronym{REPL} or in Scheme source files. | |
2870 | ||
2871 | @c Guile provides a rich set of string processing procedures, because text | |
2872 | @c handling is very important when Guile is used as a scripting language. | |
2873 | ||
2874 | Strings always record how many characters they contain, | |
2875 | so there is no special end-of-string character, | |
2876 | as there is in C. That means that Scheme strings can contain any character, | |
c48c62d0 MV |
2877 | even the @samp{#\nul} character @samp{\0}. |
2878 | ||
2879 | To use strings efficiently, you need to know a bit about how Guile | |
2880 | implements them. In Guile, a string consists of two parts, a head and | |
2881 | the actual memory where the characters are stored. When a string (or | |
2882 | a substring of it) is copied, only a new head gets created; the memory | |
2883 | is usually not copied. The two heads start out pointing to the same | |
2884 | memory. | |
2885 | ||
2886 | When one of these two strings is modified, as with @code{string-set!}, | |
2887 | their common memory does get copied so that each string has its own | |
be3eb25c | 2888 | memory and modifying one does not accidentally modify the other as well. |
c48c62d0 MV |
2889 | Thus, Guile's strings are `copy on write'; the actual copying of their |
2890 | memory is delayed until one string is written to. | |
2891 | ||
2892 | This implementation makes functions like @code{substring} very | |
2893 | efficient in the common case that no modifications are done to the | |
2894 | involved strings. | |
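
The copy-on-write behaviour described above can be observed from Scheme.
This is only a sketch: the names @code{original} and @code{part} are
illustrative, and the sharing itself is an implementation detail that is
not directly visible, only its effect is.

```scheme
;; Start from a fresh, mutable string.
(define original (string-copy "hello world"))

;; part initially shares its character storage with original.
(define part (substring original 0 5))

;; Writing to part forces the delayed copy, so original is untouched.
(string-set! part 0 #\J)

part      ; => "Jello"
original  ; => "hello world"
```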
2895 | ||
2896 | If you do know that your strings are getting modified right away, you | |
2897 | can use @code{substring/copy} instead of @code{substring}. This | |
2898 | function performs the copy immediately at the time of creation. This | |
2899 | is more efficient, especially in a multi-threaded program. Also, | |
2900 | @code{substring/copy} can avoid the problem that a short substring | |
2901 | holds on to the memory of a very large original string that could | |
2902 | otherwise be recycled. | |
2903 | ||
2904 | If you want to avoid the copy altogether, so that modifications of one | |
2905 | string show up in the other, you can use @code{substring/shared}. The | |
2906 | strings created by this procedure are called @dfn{mutation sharing | |
2907 | substrings} since the substring and the original string share | |
2908 | modifications to each other. | |
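
A short sketch of a mutation sharing substring, assuming a Guile version
that provides @code{substring/shared} as documented here. The names
@code{base} and @code{view} are illustrative only.

```scheme
(define base (string-copy "abcdef"))

;; view covers the characters "cd" and keeps sharing storage with base.
(define view (substring/shared base 2 4))

;; A write through view is visible in base as well.
(string-set! view 0 #\X)

view  ; => "Xd"
base  ; => "abXdef"
```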
07d83abe | 2909 | |
05256760 MV |
2910 | If you want to prevent modifications, use @code{substring/read-only}. |
2911 | ||
c9dc8c6c MV |
2912 | Guile provides all procedures of SRFI-13 and a few more. |
2913 | ||
07d83abe | 2914 | @menu |
5676b4fa MV |
2915 | * String Syntax:: Read syntax for strings. |
2916 | * String Predicates:: Testing strings for certain properties. | |
2917 | * String Constructors:: Creating new string objects. | |
2918 | * List/String Conversion:: Converting from/to lists of characters. | |
2919 | * String Selection:: Select portions from strings. | |
2920 | * String Modification:: Modify parts or whole strings. | |
2921 | * String Comparison:: Lexicographic ordering predicates. | |
2922 | * String Searching:: Searching in strings. | |
2923 | * Alphabetic Case Mapping:: Convert the alphabetic case of strings. | |
2924 | * Reversing and Appending Strings:: Appending strings to form a new string. | |
2925 | * Mapping Folding and Unfolding:: Iterating over strings. | |
2926 | * Miscellaneous String Operations:: Replicating, insertion, parsing, ... | |
f05bb849 | 2927 | * Representing Strings as Bytes:: Encoding and decoding strings. |
67af975c | 2928 | * Conversion to/from C:: |
5b6b22e8 | 2929 | * String Internals:: The storage strategy for strings. |
07d83abe MV |
2930 | @end menu |
2931 | ||
2932 | @node String Syntax | |
2933 | @subsubsection String Read Syntax | |
2934 | ||
2935 | @c In the following @code is used to get a good font in TeX etc, but | |
2936 | @c is omitted for Info format, so as not to risk any confusion over | |
2937 | @c whether surrounding ` ' quotes are part of the escape or are | |
2938 | @c special in a string (they're not). | |
2939 | ||
2940 | The read syntax for strings is an arbitrarily long sequence of | |
c48c62d0 | 2941 | characters enclosed in double quotes (@nicode{"}). |
07d83abe | 2942 | |
67af975c | 2943 | Backslash is an escape character and can be used to insert the following |
6579c330 MW |
2944 | special characters. @nicode{\"} and @nicode{\\} are R5RS standard, |
2945 | @nicode{\|} is R7RS standard, the next seven are R6RS standard --- | |
2946 | notice they follow C syntax --- and the remaining four are Guile | |
2947 | extensions. | |
07d83abe MV |
2948 | |
2949 | @table @asis | |
2950 | @item @nicode{\\} | |
2951 | Backslash character. | |
2952 | ||
2953 | @item @nicode{\"} | |
2954 | Double quote character (an unescaped @nicode{"} is otherwise the end | |
2955 | of the string). | |
2956 | ||
6579c330 MW |
2957 | @item @nicode{\|} |
2958 | Vertical bar character. | |
2959 | ||
07d83abe MV |
2960 | @item @nicode{\a} |
2961 | Bell character (ASCII 7). | |
2962 | ||
2963 | @item @nicode{\f} | |
2964 | Formfeed character (ASCII 12). | |
2965 | ||
2966 | @item @nicode{\n} | |
2967 | Newline character (ASCII 10). | |
2968 | ||
2969 | @item @nicode{\r} | |
2970 | Carriage return character (ASCII 13). | |
2971 | ||
2972 | @item @nicode{\t} | |
2973 | Tab character (ASCII 9). | |
2974 | ||
2975 | @item @nicode{\v} | |
2976 | Vertical tab character (ASCII 11). | |
2977 | ||
67a4a16d MG |
2978 | @item @nicode{\b} |
2979 | Backspace character (ASCII 8). | |
2980 | ||
67af975c MG |
2981 | @item @nicode{\0} |
2982 | NUL character (ASCII 0). | |
2983 | ||
c869f0c1 AW |
2984 | @item @nicode{\} followed by newline (ASCII 10) |
2985 | Nothing. This way if @nicode{\} is the last character in a line, the | |
2986 | string will continue with the first character from the next line, | |
2987 | without a line break. | |
2988 | ||
2989 | If the @code{hungry-eol-escapes} reader option is enabled, which is not | |
2990 | the case by default, leading whitespace on the next line is discarded. | |
2991 | ||
2992 | @lisp | |
2993 | "foo\ | |
2994 | bar" | |
2995 | @result{} "foo bar" | |
2996 | (read-enable 'hungry-eol-escapes) | |
2997 | "foo\ | |
2998 | bar" | |
2999 | @result{} "foobar" | |
3000 | @end lisp | |
07d83abe MV |
3001 | @item @nicode{\xHH} |
3002 | Character code given by two hexadecimal digits. For example | |
3003 | @nicode{\x7f} for an ASCII DEL (127). | |
28cc8dac MG |
3004 | |
3005 | @item @nicode{\uHHHH} | |
3006 | Character code given by four hexadecimal digits. For example | |
3007 | @nicode{\u0100} for a capital A with macron (U+0100). | |
3008 | ||
3009 | @item @nicode{\UHHHHHH} | |
3010 | Character code given by six hexadecimal digits. For example | |
3011 | @nicode{\U010402}. | |
07d83abe MV |
3012 | @end table |
3013 | ||
3014 | @noindent | |
3015 | The following are examples of string literals: | |
3016 | ||
3017 | @lisp | |
3018 | "foo" | |
3019 | "bar plonk" | |
3020 | "Hello World" | |
3021 | "\"Hi\", he said." | |
3022 | @end lisp | |
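
The escapes from the table above can be checked by looking at the strings
they produce. This sketch assumes the default reader options, in
particular that @code{r6rs-hex-escapes} is not enabled. The variable names
are illustrative.

```scheme
;; "\t" is a single tab character, so this string has three characters.
(define tab-length (string-length "a\tb"))                      ; => 3

;; "\x7f" is the ASCII DEL character, code 127.
(define del-code (char->integer (string-ref "\x7f" 0)))         ; => 127

;; Two-digit hex escapes can spell ordinary characters too.
(define hex-ab? (string=? "\x41\x42" "AB"))                     ; => #t

;; "\u0100" is one character, U+0100.
(define macron-code (char->integer (string-ref "\u0100" 0)))    ; => 256

;; Escaped double quotes count as ordinary characters.
(define quoted-length (string-length "say \"hi\""))             ; => 8
```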
3023 | ||
6ea30487 MG |
3024 | The three escape sequences @code{\xHH}, @code{\uHHHH} and @code{\UHHHHHH} were |
3025 | chosen to not break compatibility with code written for previous versions of | |
3026 | Guile. The R6RS specification suggests a different, incompatible syntax for hex | |
3027 | escapes: @code{\xHHHH;} -- that is, @code{\x} followed by one to eight hexadecimal | |
3028 | digits terminated with a semicolon. If this escape format is desired instead, | |
3029 | it can be enabled with the reader option @code{r6rs-hex-escapes}. | |
3030 | ||
3031 | @lisp | |
3032 | (read-enable 'r6rs-hex-escapes) | |
3033 | @end lisp | |
3034 | ||
1518f649 | 3035 | For more on reader options, see @ref{Scheme Read}. |
07d83abe MV |
3036 | |
3037 | @node String Predicates | |
3038 | @subsubsection String Predicates | |
3039 | ||
3040 | The following procedures can be used to check whether a given string | |
3041 | fulfills some specified property. | |
3042 | ||
3043 | @rnindex string? | |
3044 | @deffn {Scheme Procedure} string? obj | |
3045 | @deffnx {C Function} scm_string_p (obj) | |
3046 | Return @code{#t} if @var{obj} is a string, else @code{#f}. | |
3047 | @end deffn | |
3048 | ||
91210d62 MV |
3049 | @deftypefn {C Function} int scm_is_string (SCM obj) |
3050 | Returns @code{1} if @var{obj} is a string, @code{0} otherwise. | |
3051 | @end deftypefn | |
3052 | ||
07d83abe MV |
3053 | @deffn {Scheme Procedure} string-null? str |
3054 | @deffnx {C Function} scm_string_null_p (str) | |
3055 | Return @code{#t} if @var{str}'s length is zero, and | |
3056 | @code{#f} otherwise. | |
3057 | @lisp | |
3058 | (string-null? "") @result{} #t | |
3059 | y @result{} "foo" | |
3060 | (string-null? y) @result{} #f | |
3061 | @end lisp | |
3062 | @end deffn | |
3063 | ||
5676b4fa MV |
3064 | @deffn {Scheme Procedure} string-any char_pred s [start [end]] |
3065 | @deffnx {C Function} scm_string_any (char_pred, s, start, end) | |
c100a12c | 3066 | Check if @var{char_pred} is true for any character in string @var{s}. |
5676b4fa | 3067 | |
c100a12c KR |
3068 | @var{char_pred} can be a character to check for any equal to that, or |
3069 | a character set (@pxref{Character Sets}) to check for any in that set, | |
3070 | or a predicate procedure to call. | |
5676b4fa | 3071 | |
c100a12c KR |
3072 | For a procedure, calls @code{(@var{char_pred} c)} are made |
3073 | successively on the characters from @var{start} to @var{end}. If | |
3074 | @var{char_pred} returns true (i.e.@: non-@code{#f}), @code{string-any} | |
3075 | stops and that return value is the return from @code{string-any}. The | |
3076 | call on the last character (i.e.@: at @math{@var{end}-1}), if that | |
3077 | point is reached, is a tail call. | |
3078 | ||
3079 | If there are no characters in @var{s} (i.e.@: @var{start} equals | |
3080 | @var{end}) then the return is @code{#f}. | |
5676b4fa MV |
3081 | @end deffn |
3082 | ||
3083 | @deffn {Scheme Procedure} string-every char_pred s [start [end]] | |
3084 | @deffnx {C Function} scm_string_every (char_pred, s, start, end) | |
c100a12c KR |
3085 | Check if @var{char_pred} is true for every character in string |
3086 | @var{s}. | |
5676b4fa | 3087 | |
c100a12c KR |
3088 | @var{char_pred} can be a character to check for every character equal |
3089 | to that, or a character set (@pxref{Character Sets}) to check for | |
3090 | every character being in that set, or a predicate procedure to call. | |
3091 | ||
3092 | For a procedure, calls @code{(@var{char_pred} c)} are made | |
3093 | successively on the characters from @var{start} to @var{end}. If | |
3094 | @var{char_pred} returns @code{#f}, @code{string-every} stops and | |
3095 | returns @code{#f}. The call on the last character (i.e.@: at | |
3096 | @math{@var{end}-1}), if that point is reached, is a tail call and the | |
3097 | return from that call is the return from @code{string-every}. | |
5676b4fa MV |
3098 | |
3099 | If there are no characters in @var{s} (i.e.@: @var{start} equals | |
3100 | @var{end}) then the return is @code{#t}. | |
5676b4fa MV |
3101 | @end deffn |
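
The three kinds of @var{char_pred} and the empty-string cases of
@code{string-any} and @code{string-every} can be sketched as follows. The
variable names are illustrative only.

```scheme
;; A predicate procedure: is there any digit?
(define has-digit (string-any char-numeric? "abc1"))            ; => #t

;; A character: is there any #\z?
(define has-z (string-any #\z "abc"))                           ; => #f

;; string-any returns the predicate's own true value, here the
;; first digit found.
(define first-digit
  (string-any (lambda (c) (and (char-numeric? c) c)) "ab7c9"))  ; => #\7

;; A character set: is every character a letter?
(define all-letters (string-every char-set:letter "abc"))       ; => #t

;; start and end restrict the test to the characters "12".
(define digits-in-range (string-every char-numeric? "ab12cd" 2 4)) ; => #t

;; With no characters to test, string-any gives #f and
;; string-every gives #t.
(define empty-any   (string-any char-alphabetic? ""))           ; => #f
(define empty-every (string-every char-alphabetic? ""))         ; => #t
```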
3102 | ||
07d83abe MV |
3103 | @node String Constructors |
3104 | @subsubsection String Constructors | |
3105 | ||
3106 | The string constructor procedures create new string objects, possibly | |
c48c62d0 MV |
3107 | initializing them with some specified character data. See also |
3108 | @ref{String Selection}, for ways to create strings from existing | |
3109 | strings. | |
07d83abe MV |
3110 | |
3111 | @c FIXME::martin: list->string belongs into `List/String Conversion' | |
3112 | ||
bba26c32 | 3113 | @deffn {Scheme Procedure} string char@dots{} |
07d83abe | 3114 | @rnindex string |
bba26c32 KR |
3115 | Return a newly allocated string made from the given character |
3116 | arguments. | |
3117 | ||
3118 | @example | |
3119 | (string #\x #\y #\z) @result{} "xyz" | |
3120 | (string) @result{} "" | |
3121 | @end example | |
3122 | @end deffn | |
3123 | ||
3124 | @deffn {Scheme Procedure} list->string lst | |
3125 | @deffnx {C Function} scm_string (lst) | |
07d83abe | 3126 | @rnindex list->string |
bba26c32 KR |
3127 | Return a newly allocated string made from a list of characters. |
3128 | ||
3129 | @example | |
3130 | (list->string '(#\a #\b #\c)) @result{} "abc" | |
3131 | @end example | |
3132 | @end deffn | |
3133 | ||
3134 | @deffn {Scheme Procedure} reverse-list->string lst | |
3135 | @deffnx {C Function} scm_reverse_list_to_string (lst) | |
3136 | Return a newly allocated string made from a list of characters, in | |
3137 | reverse order. | |
3138 | ||
3139 | @example | |
3140 | (reverse-list->string '(#\a #\B #\c)) @result{} "cBa" | |
3141 | @end example | |
07d83abe MV |
3142 | @end deffn |
3143 | ||
3144 | @rnindex make-string | |
3145 | @deffn {Scheme Procedure} make-string k [chr] | |
3146 | @deffnx {C Function} scm_make_string (k, chr) | |
3147 | Return a newly allocated string of | |
3148 | length @var{k}. If @var{chr} is given, then all elements of | |
3149 | the string are initialized to @var{chr}, otherwise the contents | |
64de6db5 | 3150 | of the string are unspecified. |
07d83abe MV |
3151 | @end deffn |
3152 | ||
c48c62d0 MV |
3153 | @deftypefn {C Function} SCM scm_c_make_string (size_t len, SCM chr) |
3154 | Like @code{scm_make_string}, but expects the length as a | |
3155 | @code{size_t}. | |
3156 | @end deftypefn | |
3157 | ||
5676b4fa MV |
3158 | @deffn {Scheme Procedure} string-tabulate proc len |
3159 | @deffnx {C Function} scm_string_tabulate (proc, len) | |
3160 | @var{proc} is an integer->char procedure. Construct a string | |
3161 | of size @var{len} by applying @var{proc} to each index to | |
3162 | produce the corresponding string element. The order in which | |
3163 | @var{proc} is applied to the indices is not specified. | |
3164 | @end deffn | |
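
As a brief sketch of @code{make-string} and @code{string-tabulate}, the
following builds two small strings. The variable names are illustrative.

```scheme
;; Three copies of the same character.
(define dashes (make-string 3 #\-))                      ; => "---"

;; The procedure maps each index 0..4 to a character, here the
;; letters starting at #\a.
(define first-letters
  (string-tabulate
   (lambda (i) (integer->char (+ i (char->integer #\a))))
   5))                                                   ; => "abcde"
```

Since the order in which the procedure is applied to the indices is not
specified, it should not rely on side effects.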
3165 | ||
5676b4fa MV |
3166 | @deffn {Scheme Procedure} string-join ls [delimiter [grammar]] |
3167 | @deffnx {C Function} scm_string_join (ls, delimiter, grammar) | |
3168 | Append the strings in the string list @var{ls}, using the string | |
64de6db5 | 3169 | @var{delimiter} as a delimiter between the elements of @var{ls}. |
5676b4fa MV |
3170 | @var{grammar} is a symbol which specifies how the delimiter is |
3171 | placed between the strings, and defaults to the symbol | |
3172 | @code{infix}. | |
3173 | ||
3174 | @table @code | |
3175 | @item infix | |
3176 | Insert the separator between list elements. An empty string | |
3177 | will produce an empty list. | |
3b80c358 | 3178 | @item strict-infix |
5676b4fa MV |
3179 | Like @code{infix}, but will raise an error if given the empty |
3180 | list. | |
3181 | @item suffix | |
3182 | Insert the separator after every list element. | |
3183 | @item prefix | |
3184 | Insert the separator before each list element. | |
3185 | @end table | |
3186 | @end deffn | |
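
The effect of the @var{grammar} argument of @code{string-join} can be
sketched like this. The variable names are illustrative.

```scheme
(define parts '("usr" "local" "bin"))

;; The default grammar is infix: delimiter between elements only.
(define infix-joined  (string-join parts "/"))           ; => "usr/local/bin"
(define prefix-joined (string-join parts "/" 'prefix))   ; => "/usr/local/bin"
(define suffix-joined (string-join parts "/" 'suffix))   ; => "usr/local/bin/"

;; With infix, an empty list gives an empty string.
(define empty-joined  (string-join '() "/"))             ; => ""
```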
3187 | ||
07d83abe MV |
3188 | @node List/String Conversion |
3189 | @subsubsection List/String conversion | |
3190 | ||
3191 | When processing strings, it is often convenient to first convert them | |
3192 | into a list representation by using the procedure @code{string->list}, | |
3193 | work with the resulting list, and then convert it back into a string. | |
3194 | These procedures are useful for similar tasks. | |
3195 | ||
3196 | @rnindex string->list | |
5676b4fa MV |
3197 | @deffn {Scheme Procedure} string->list str [start [end]] |
3198 | @deffnx {C Function} scm_substring_to_list (str, start, end) | |
07d83abe | 3199 | @deffnx {C Function} scm_string_to_list (str) |
5676b4fa | 3200 | Convert the string @var{str} into a list of characters. |
07d83abe MV |
3201 | @end deffn |
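
A small sketch of the convert, process, convert back pattern described at
the start of this section. The helper @code{reverse-via-list} is a
hypothetical name used only for illustration.

```scheme
(define chars     (string->list "abc"))        ; => (#\a #\b #\c)
(define mid-chars (string->list "hello" 1 3))  ; => (#\e #\l)

;; Hypothetical helper: reverse a string by going through a list.
(define (reverse-via-list s)
  (list->string (reverse (string->list s))))

(define reversed (reverse-via-list "abc"))     ; => "cba"
```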
3202 | ||
5f085775 DH |
3203 | @deffn {Scheme Procedure} string-split str char_pred |
3204 | @deffnx {C Function} scm_string_split (str, char_pred) | |
ecb87335 | 3205 | Split the string @var{str} into a list of substrings delimited |
5f085775 DH |
3206 | by appearances of characters that |
3207 | ||
3208 | @itemize @bullet | |
3209 | @item | |
3210 | equal @var{char_pred}, if it is a character, | |
3211 | ||
3212 | @item | |
3213 | satisfy the predicate @var{char_pred}, if it is a procedure, | |
3214 | ||
3215 | @item | |
3216 | are in the set @var{char_pred}, if it is a character set. | |
3217 | @end itemize | |
3218 | ||
3219 | Note that an empty substring between separator characters will result in | |
3220 | an empty string in the result list. | |
07d83abe MV |
3221 | |
3222 | @lisp | |
3223 | (string-split "root:x:0:0:root:/root:/bin/bash" #\:) | |
3224 | @result{} | |
3225 | ("root" "x" "0" "0" "root" "/root" "/bin/bash") | |
3226 | ||
3227 | (string-split "::" #\:) | |
3228 | @result{} | |
3229 | ("" "" "") | |
3230 | ||
3231 | (string-split "" #\:) | |
3232 | @result{} | |
3233 | ("") | |
3234 | @end lisp | |
3235 | @end deffn | |
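
The examples above all split on a single character. As a sketch of the
other two forms of @var{char_pred}, assuming a Guile version that accepts
them as described here:

```scheme
;; A character set: split on either comma or semicolon.
(define by-set (string-split "a,b;c" (char-set #\, #\;)))
;; => ("a" "b" "c")

;; A predicate: split on digits.  The two adjacent digits "22"
;; leave an empty string between them.
(define by-pred (string-split "a1b22c" char-numeric?))
;; => ("a" "b" "" "c")
```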
3236 | ||
3237 | ||
3238 | @node String Selection | |
3239 | @subsubsection String Selection | |
3240 | ||
3241 | Portions of strings can be extracted by these procedures. | |
3242 | @code{string-ref} delivers individual characters whereas | |
3243 | @code{substring} can be used to extract substrings from longer strings. | |
3244 | ||
3245 | @rnindex string-length | |
3246 | @deffn {Scheme Procedure} string-length string | |
3247 | @deffnx {C Function} scm_string_length (string) | |
3248 | Return the number of characters in @var{string}. | |
3249 | @end deffn | |
3250 | ||
c48c62d0 MV |
3251 | @deftypefn {C Function} size_t scm_c_string_length (SCM str) |
3252 | Return the number of characters in @var{str} as a @code{size_t}. | |
3253 | @end deftypefn | |
3254 | ||
07d83abe MV |
3255 | @rnindex string-ref |
3256 | @deffn {Scheme Procedure} string-ref str k | |
3257 | @deffnx {C Function} scm_string_ref (str, k) | |
3258 | Return character @var{k} of @var{str} using zero-origin | |
3259 | indexing. @var{k} must be a valid index of @var{str}. | |
3260 | @end deffn | |
3261 | ||
c48c62d0 MV |
3262 | @deftypefn {C Function} SCM scm_c_string_ref (SCM str, size_t k) |
3263 | Return character @var{k} of @var{str} using zero-origin | |
3264 | indexing. @var{k} must be a valid index of @var{str}. | |
3265 | @end deftypefn | |
3266 | ||
07d83abe | 3267 | @rnindex string-copy |
5676b4fa MV |
3268 | @deffn {Scheme Procedure} string-copy str [start [end]] |
3269 | @deffnx {C Function} scm_substring_copy (str, start, end) | |
07d83abe | 3270 | @deffnx {C Function} scm_string_copy (str) |
5676b4fa | 3271 | Return a copy of the given string @var{str}. |
c48c62d0 MV |
3272 | |
3273 | The returned string shares storage with @var{str} initially, but it is | |
3274 | copied as soon as one of the two strings is modified. | |
07d83abe MV |
3275 | @end deffn |
3276 | ||
3277 | @rnindex substring | |
3278 | @deffn {Scheme Procedure} substring str start [end] | |
3279 | @deffnx {C Function} scm_substring (str, start, end) | |
c48c62d0 | 3280 | Return a new string formed from the characters |
07d83abe MV |
3281 | of @var{str} beginning with index @var{start} (inclusive) and |
3282 | ending with index @var{end} (exclusive). | |
3283 | @var{str} must be a string, @var{start} and @var{end} must be | |
3284 | exact integers satisfying: | |
3285 | ||
3286 | 0 <= @var{start} <= @var{end} <= @code{(string-length @var{str})}. | |
c48c62d0 MV |
3287 | |
3288 | The returned string shares storage with @var{str} initially, but it is | |
3289 | copied as soon as one of the two strings is modified. | |
3290 | @end deffn | |
3291 | ||
3292 | @deffn {Scheme Procedure} substring/shared str start [end] | |
3293 | @deffnx {C Function} scm_substring_shared (str, start, end) | |
3294 | Like @code{substring}, but the strings continue to share their storage | |
3295 | even if they are modified. Thus, modifications to @var{str} show up | |
3296 | in the new string, and vice versa. | |
3297 | @end deffn | |
3298 | ||
3299 | @deffn {Scheme Procedure} substring/copy str start [end] | |
3300 | @deffnx {C Function} scm_substring_copy (str, start, end) | |
3301 | Like @code{substring}, but the storage for the new string is copied | |
3302 | immediately. | |
07d83abe MV |
3303 | @end deffn |
3304 | ||
05256760 MV |
3305 | @deffn {Scheme Procedure} substring/read-only str start [end] |
3306 | @deffnx {C Function} scm_substring_read_only (str, start, end) | |
3307 | Like @code{substring}, but the resulting string cannot be modified. | |
3308 | @end deffn | |
3309 | ||
c48c62d0 MV |
3310 | @deftypefn {C Function} SCM scm_c_substring (SCM str, size_t start, size_t end) |
3311 | @deftypefnx {C Function} SCM scm_c_substring_shared (SCM str, size_t start, size_t end) | |
3312 | @deftypefnx {C Function} SCM scm_c_substring_copy (SCM str, size_t start, size_t end) | |
05256760 | 3313 | @deftypefnx {C Function} SCM scm_c_substring_read_only (SCM str, size_t start, size_t end) |
c48c62d0 MV |
3314 | Like @code{scm_substring}, etc. but the bounds are given as a @code{size_t}. |
3315 | @end deftypefn | |
3316 | ||
5676b4fa MV |
3317 | @deffn {Scheme Procedure} string-take s n |
3318 | @deffnx {C Function} scm_string_take (s, n) | |
3319 | Return the @var{n} first characters of @var{s}. | |
3320 | @end deffn | |
3321 | ||
3322 | @deffn {Scheme Procedure} string-drop s n | |
3323 | @deffnx {C Function} scm_string_drop (s, n) | |
3324 | Return all but the first @var{n} characters of @var{s}. | |
3325 | @end deffn | |
3326 | ||
3327 | @deffn {Scheme Procedure} string-take-right s n | |
3328 | @deffnx {C Function} scm_string_take_right (s, n) | |
3329 | Return the @var{n} last characters of @var{s}. | |
3330 | @end deffn | |
3331 | ||
3332 | @deffn {Scheme Procedure} string-drop-right s n | |
3333 | @deffnx {C Function} scm_string_drop_right (s, n) | |
3334 | Return all but the last @var{n} characters of @var{s}. | |
3335 | @end deffn | |
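
The selection procedures above can be compared side by side on one input.
This is only an illustrative sketch, and the variable names are not part
of Guile.

```scheme
(define s "hello world")

(define head     (string-take s 5))        ; => "hello"
(define tail     (string-drop s 6))        ; => "world"
(define last-5   (string-take-right s 5))  ; => "world"
(define but-last (string-drop-right s 6))  ; => "hello"

;; start is inclusive, end is exclusive.
(define middle   (substring s 3 8))        ; => "lo wo"
```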
3336 | ||
3337 | @deffn {Scheme Procedure} string-pad s len [chr [start [end]]] | |
6337e7fb | 3338 | @deffnx {Scheme Procedure} string-pad-right s len [chr [start [end]]] |
5676b4fa | 3339 | @deffnx {C Function} scm_string_pad (s, len, chr, start, end) |
5676b4fa | 3340 | @deffnx {C Function} scm_string_pad_right (s, len, chr, start, end) |
6337e7fb | 3341 | Take characters @var{start} to @var{end} from the string @var{s} and |
64de6db5 | 3342 | either pad with @var{chr} or truncate them to give @var{len} |
6337e7fb KR |
3343 | characters. |
3344 | ||
3345 | @code{string-pad} pads or truncates on the left, so for example | |
3346 | ||
3347 | @example | |
3348 | (string-pad "x" 3) @result{} " x" | |
3349 | (string-pad "abcde" 3) @result{} "cde" | |
3350 | @end example | |
3351 | ||
3352 | @code{string-pad-right} pads or truncates on the right, so for example | |
3353 | ||
3354 | @example | |
3355 | (string-pad-right "x" 3) @result{} "x " | |
3356 | (string-pad-right "abcde" 3) @result{} "abc" | |
3357 | @end example | |
5676b4fa MV |
3358 | @end deffn |
3359 | ||
3360 | @deffn {Scheme Procedure} string-trim s [char_pred [start [end]]] | |
dc297bb7 KR |
3361 | @deffnx {Scheme Procedure} string-trim-right s [char_pred [start [end]]] |
3362 | @deffnx {Scheme Procedure} string-trim-both s [char_pred [start [end]]] | |
5676b4fa | 3363 | @deffnx {C Function} scm_string_trim (s, char_pred, start, end) |
5676b4fa | 3364 | @deffnx {C Function} scm_string_trim_right (s, char_pred, start, end) |
5676b4fa | 3365 | @deffnx {C Function} scm_string_trim_both (s, char_pred, start, end) |
be3eb25c | 3366 | Trim occurrences of @var{char_pred} from the ends of @var{s}. |
5676b4fa | 3367 | |
dc297bb7 KR |
3368 | @code{string-trim} trims @var{char_pred} characters from the left |
3369 | (start) of the string, @code{string-trim-right} trims them from the | |
3370 | right (end) of the string, @code{string-trim-both} trims from both | |
3371 | ends. | |
5676b4fa | 3372 | |
dc297bb7 KR |
3373 | @var{char_pred} can be a character, a character set, or a predicate |
3374 | procedure to call on each character. If @var{char_pred} is not given | |
3375 | the default is whitespace as per @code{char-set:whitespace} | |
3376 | (@pxref{Standard Character Sets}). | |
5676b4fa | 3377 | |
dc297bb7 KR |
3378 | @example |
3379 | (string-trim " x ") @result{} "x " | |
3380 | (string-trim-right "banana" #\a) @result{} "banan" | |
3381 | (string-trim-both ".,xy:;" char-set:punctuation) | |
3382 | @result{} "xy" | |
3383 | (string-trim-both "xyzzy" (lambda (c) | |
3384 | (or (eqv? c #\x) | |
3385 | (eqv? c #\y)))) | |
3386 | @result{} "zz" | |
3387 | @end example | |
5676b4fa MV |
3388 | @end deffn |
3389 | ||
07d83abe MV |
3390 | @node String Modification |
3391 | @subsubsection String Modification | |
3392 | ||
3393 | These procedures are for modifying strings in-place. This means that the | |
3394 | result of the operation is not a new string; instead, the original string's | |
3395 | memory representation is modified. | |
3396 | ||
3397 | @rnindex string-set! | |
3398 | @deffn {Scheme Procedure} string-set! str k chr | |
3399 | @deffnx {C Function} scm_string_set_x (str, k, chr) | |
3400 | Store @var{chr} in element @var{k} of @var{str} and return | |
3401 | an unspecified value. @var{k} must be a valid index of | |
3402 | @var{str}. | |
3403 | @end deffn | |
3404 | ||
c48c62d0 MV |
3405 | @deftypefn {C Function} void scm_c_string_set_x (SCM str, size_t k, SCM chr) |
3406 | Like @code{scm_string_set_x}, but the index is given as a @code{size_t}. | |
3407 | @end deftypefn | |
3408 | ||
07d83abe | 3409 | @rnindex string-fill! |
5676b4fa MV |
3410 | @deffn {Scheme Procedure} string-fill! str chr [start [end]] |
3411 | @deffnx {C Function} scm_substring_fill_x (str, chr, start, end) | |
07d83abe | 3412 | @deffnx {C Function} scm_string_fill_x (str, chr) |
5676b4fa MV |
3413 | Store @var{chr} in every element of the given @var{str} and | |
3414 | return an unspecified value. | |
07d83abe MV |
3415 | @end deffn |
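
A minimal sketch of in-place modification. It starts from
@code{string-copy} so that the string being modified is a fresh, mutable
one rather than a literal. The variable names are illustrative.

```scheme
(define s (string-copy "cat"))

;; Replace the first character in place.
(string-set! s 0 #\b)
(define after-set (string-copy s))   ; => "bat"

;; Overwrite every character in place.
(string-fill! s #\z)
(define after-fill (string-copy s))  ; => "zzz"
```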
3416 | ||
3417 | @deffn {Scheme Procedure} substring-fill! str start end fill | |
3418 | @deffnx {C Function} scm_substring_fill_x (str, start, end, fill) | |
3419 | Change every character in @var{str} between @var{start} and | |
3420 | @var{end} to @var{fill}. | |
3421 | ||
3422 | @lisp | |
4dbd29a9 | 3423 | (define y (string-copy "abcdefg")) |
07d83abe MV |
3424 | (substring-fill! y 1 3 #\r) |
3425 | y | |
3426 | @result{} "arrdefg" | |
3427 | @end lisp | |
3428 | @end deffn | |
3429 | ||
3430 | @deffn {Scheme Procedure} substring-move! str1 start1 end1 str2 start2 | |
3431 | @deffnx {C Function} scm_substring_move_x (str1, start1, end1, str2, start2) | |
3432 | Copy the substring of @var{str1} bounded by @var{start1} and @var{end1} | |
3433 | into @var{str2} beginning at position @var{start2}. | |
3434 | @var{str1} and @var{str2} can be the same string. | |
3435 | @end deffn | |
3436 | ||
5676b4fa MV |
3437 | @deffn {Scheme Procedure} string-copy! target tstart s [start [end]] |
3438 | @deffnx {C Function} scm_string_copy_x (target, tstart, s, start, end) | |
3439 | Copy the sequence of characters from index range [@var{start}, | |
3440 | @var{end}) in string @var{s} to string @var{target}, beginning | |
3441 | at index @var{tstart}. The characters are copied left-to-right | |
3442 | or right-to-left as needed -- the copy is guaranteed to work, | |
3443 | even if @var{target} and @var{s} are the same string. It is an | |
3444 | error if the copy operation runs off the end of the target | |
3445 | string. | |
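
The following sketch shows a plain copy and an overlapping copy within
a single string.  The names @code{target} and @code{s} are chosen only
for this example.

@lisp
(define target (string-copy "0123456789"))
;; Copy all of "abc" into target, starting at index 2.
(string-copy! target 2 "abc")
target
@result{} "01abc56789"

(define s (string-copy "abcdef"))
;; Source and target are the same string and the ranges overlap.
(string-copy! s 2 s 0 4)
s
@result{} "ababcd"
@end lisp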
3446 | @end deffn | |
3447 | ||
07d83abe MV |
3448 | |
3449 | @node String Comparison | |
3450 | @subsubsection String Comparison | |
3451 | ||
3452 | The procedures in this section are similar to the character ordering | |
3453 | predicates (@pxref{Characters}), but are defined on character sequences. | |
07d83abe | 3454 | |
5676b4fa | 3455 | The first set is specified in R5RS and has names that end in @code{?}. |
28cc8dac | 3456 | The second set is specified in SRFI-13 and has names that do not end in |
67af975c | 3457 | @code{?}. |
28cc8dac MG |
3458 | |
3459 | The predicates ending in @code{-ci} ignore the character case | |
3460 | when comparing strings. For now, case-insensitive comparison is done | |
3461 | using the R5RS rules, where every lower-case character that has a | |
3462 | single-character upper-case form is converted to upper case before | |
3463 | comparison. @xref{Text Collation, the @code{(ice-9 | |
b89c4943 | 3464 | i18n)} module}, for locale-dependent string comparison. |
07d83abe MV |
3465 | |
3466 | @rnindex string=? | |
df0a1002 | 3467 | @deffn {Scheme Procedure} string=? s1 s2 s3 @dots{} |
df0a1002 BT |
3468 | Lexicographic equality predicate; return @code{#t} if all strings are |
3469 | the same length and contain the same characters in the same positions, | |
3470 | otherwise return @code{#f}. | |
07d83abe MV |
3471 | |
3472 | The procedure @code{string-ci=?} treats upper and lower case | |
3473 | letters as though they were the same character, but | |
3474 | @code{string=?} treats upper and lower case as distinct | |
3475 | characters. | |
3476 | @end deffn | |
3477 | ||
3478 | @rnindex string<? | |
df0a1002 | 3479 | @deffn {Scheme Procedure} string<? s1 s2 s3 @dots{} |
df0a1002 BT |
3480 | Lexicographic ordering predicate; return @code{#t} if, for every pair of |
3481 | consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is | |
3482 | lexicographically less than @var{str_i+1}. | |
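
For instance, with three arbitrary sample strings, the result depends
on every consecutive pair being in order:

@lisp
(string<? "apple" "banana" "cherry")
@result{} #t
;; "cherry" is not less than "banana", so the chain fails.
(string<? "apple" "cherry" "banana")
@result{} #f
@end lisp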
07d83abe MV |
3483 | @end deffn |
3484 | ||
3485 | @rnindex string<=? | |
df0a1002 | 3486 | @deffn {Scheme Procedure} string<=? s1 s2 s3 @dots{} |
df0a1002 BT |
3487 | Lexicographic ordering predicate; return @code{#t} if, for every pair of |
3488 | consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is | |
3489 | lexicographically less than or equal to @var{str_i+1}. | |
07d83abe MV |
3490 | @end deffn |
3491 | ||
3492 | @rnindex string>? | |
df0a1002 | 3493 | @deffn {Scheme Procedure} string>? s1 s2 s3 @dots{} |
df0a1002 BT |
3494 | Lexicographic ordering predicate; return @code{#t} if, for every pair of |
3495 | consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is | |
3496 | lexicographically greater than @var{str_i+1}. | |
07d83abe MV |
3497 | @end deffn |
3498 | ||
3499 | @rnindex string>=? | |
df0a1002 | 3500 | @deffn {Scheme Procedure} string>=? s1 s2 s3 @dots{} |
df0a1002 BT |
3501 | Lexicographic ordering predicate; return @code{#t} if, for every pair of |
3502 | consecutive string arguments @var{str_i} and @var{str_i+1}, @var{str_i} is | |
3503 | lexicographically greater than or equal to @var{str_i+1}. | |
07d83abe MV |
3504 | @end deffn |
3505 | ||
3506 | @rnindex string-ci=? | |
df0a1002 | 3507 | @deffn {Scheme Procedure} string-ci=? s1 s2 s3 @dots{} |
07d83abe | 3508 | Case-insensitive string equality predicate; return @code{#t} if |
df0a1002 | 3509 | all strings are the same length and their component |
07d83abe MV |
3510 | characters match (ignoring case) at each position; otherwise |
3511 | return @code{#f}. | |
3512 | @end deffn | |
3513 | ||
5676b4fa | 3514 | @rnindex string-ci<? |
df0a1002 | 3515 | @deffn {Scheme Procedure} string-ci<? s1 s2 s3 @dots{} |
df0a1002 BT |
3516 | Case insensitive lexicographic ordering predicate; return @code{#t} if, |
3517 | for every pair of consecutive string arguments @var{str_i} and | |
3518 | @var{str_i+1}, @var{str_i} is lexicographically less than @var{str_i+1} | |
07d83abe MV |
3519 | regardless of case. |
3520 | @end deffn | |
3521 | ||
3522 | @rnindex string-ci<=? | |
df0a1002 | 3523 | @deffn {Scheme Procedure} string-ci<=? s1 s2 s3 @dots{} |
df0a1002 BT |
3524 | Case insensitive lexicographic ordering predicate; return @code{#t} if, |
3525 | for every pair of consecutive string arguments @var{str_i} and | |
3526 | @var{str_i+1}, @var{str_i} is lexicographically less than or equal to | |
3527 | @var{str_i+1} regardless of case. | |
07d83abe MV |
3528 | @end deffn |
3529 | ||
3530 | @rnindex string-ci>? | |
df0a1002 | 3531 | @deffn {Scheme Procedure} string-ci>? s1 s2 s3 @dots{} |
df0a1002 BT |
3532 | Case insensitive lexicographic ordering predicate; return @code{#t} if, |
3533 | for every pair of consecutive string arguments @var{str_i} and | |
3534 | @var{str_i+1}, @var{str_i} is lexicographically greater than | |
3535 | @var{str_i+1} regardless of case. | |
07d83abe MV |
3536 | @end deffn |
3537 | ||
3538 | @rnindex string-ci>=? | |
df0a1002 | 3539 | @deffn {Scheme Procedure} string-ci>=? s1 s2 s3 @dots{} |
df0a1002 BT |
3540 | Case insensitive lexicographic ordering predicate; return @code{#t} if, |
3541 | for every pair of consecutive string arguments @var{str_i} and | |
3542 | @var{str_i+1}, @var{str_i} is lexicographically greater than or equal to | |
3543 | @var{str_i+1} regardless of case. | |
07d83abe MV |
3544 | @end deffn |
3545 | ||
5676b4fa MV |
3546 | @deffn {Scheme Procedure} string-compare s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]] |
3547 | @deffnx {C Function} scm_string_compare (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2) | |
3548 | Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the | |
3549 | mismatch index, depending upon whether @var{s1} is less than, | |
3550 | equal to, or greater than @var{s2}. The mismatch index is the | |
3551 | largest index @var{i} such that for every 0 <= @var{j} < | |
3552 | @var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is, | |
3553 | @var{i} is the first position that does not match. | |
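
A small sketch of how the three procedures might be used; the
@code{lambda} expressions and the tags @code{less}, @code{equal} and
@code{greater} are illustrative only.  Here the strings first differ at
index 2, where @code{#\c} sorts before @code{#\x}:

@lisp
(string-compare "abcd" "abxd"
                (lambda (i) (list 'less i))
                (lambda (i) (list 'equal i))
                (lambda (i) (list 'greater i)))
@result{} (less 2)
@end lisp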
3554 | @end deffn | |
3555 | ||
3556 | @deffn {Scheme Procedure} string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]] | |
3557 | @deffnx {C Function} scm_string_compare_ci (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2) | |
3558 | Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the | |
3559 | mismatch index, depending upon whether @var{s1} is less than, | |
3560 | equal to, or greater than @var{s2}. The mismatch index is the | |
3561 | largest index @var{i} such that for every 0 <= @var{j} < | |
3562 | @var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is, | |
3323ec06 NJ |
3563 | @var{i} is the first position where the lowercased letters |
3564 | do not match. | |
3565 | ||
5676b4fa MV |
3566 | @end deffn |
3567 | ||
3568 | @deffn {Scheme Procedure} string= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3569 | @deffnx {C Function} scm_string_eq (s1, s2, start1, end1, start2, end2) | |
3570 | Return @code{#f} if @var{s1} and @var{s2} are not equal, a true | |
3571 | value otherwise. | |
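
The optional indices select the substrings to compare.  Since only ``a
true value'' is promised, this sketch tests the result with @code{if}
rather than showing it directly; the symbols @code{same} and
@code{different} are arbitrary.

@lisp
;; Compare "foo" (indices 0..3 of s1) with "foo" (indices 3..6 of s2).
(if (string= "foobar" "barfoo" 0 3 3 6) 'same 'different)
@result{} same
(if (string= "foobar" "barfoo") 'same 'different)
@result{} different
@end lisp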
3572 | @end deffn | |
3573 | ||
3574 | @deffn {Scheme Procedure} string<> s1 s2 [start1 [end1 [start2 [end2]]]] | |
3575 | @deffnx {C Function} scm_string_neq (s1, s2, start1, end1, start2, end2) | |
3576 | Return @code{#f} if @var{s1} and @var{s2} are equal, a true | |
3577 | value otherwise. | |
3578 | @end deffn | |
3579 | ||
3580 | @deffn {Scheme Procedure} string< s1 s2 [start1 [end1 [start2 [end2]]]] | |
3581 | @deffnx {C Function} scm_string_lt (s1, s2, start1, end1, start2, end2) | |
3582 | Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a | |
3583 | true value otherwise. | |
3584 | @end deffn | |
3585 | ||
3586 | @deffn {Scheme Procedure} string> s1 s2 [start1 [end1 [start2 [end2]]]] | |
3587 | @deffnx {C Function} scm_string_gt (s1, s2, start1, end1, start2, end2) | |
3588 | Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a | |
3589 | true value otherwise. | |
3590 | @end deffn | |
3591 | ||
3592 | @deffn {Scheme Procedure} string<= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3593 | @deffnx {C Function} scm_string_le (s1, s2, start1, end1, start2, end2) | |
3594 | Return @code{#f} if @var{s1} is greater than @var{s2}, a true | |
3595 | value otherwise. | |
3596 | @end deffn | |
3597 | ||
3598 | @deffn {Scheme Procedure} string>= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3599 | @deffnx {C Function} scm_string_ge (s1, s2, start1, end1, start2, end2) | |
3600 | Return @code{#f} if @var{s1} is less than @var{s2}, a true value | |
3601 | otherwise. | |
3602 | @end deffn | |
3603 | ||
3604 | @deffn {Scheme Procedure} string-ci= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3605 | @deffnx {C Function} scm_string_ci_eq (s1, s2, start1, end1, start2, end2) | |
3606 | Return @code{#f} if @var{s1} and @var{s2} are not equal, a true | |
3607 | value otherwise. The character comparison is done | |
3608 | case-insensitively. | |
3609 | @end deffn | |
3610 | ||
3611 | @deffn {Scheme Procedure} string-ci<> s1 s2 [start1 [end1 [start2 [end2]]]] | |
3612 | @deffnx {C Function} scm_string_ci_neq (s1, s2, start1, end1, start2, end2) | |
3613 | Return @code{#f} if @var{s1} and @var{s2} are equal, a true | |
3614 | value otherwise. The character comparison is done | |
3615 | case-insensitively. | |
3616 | @end deffn | |
3617 | ||
3618 | @deffn {Scheme Procedure} string-ci< s1 s2 [start1 [end1 [start2 [end2]]]] | |
3619 | @deffnx {C Function} scm_string_ci_lt (s1, s2, start1, end1, start2, end2) | |
3620 | Return @code{#f} if @var{s1} is greater than or equal to @var{s2}, a | |
3621 | true value otherwise. The character comparison is done | |
3622 | case-insensitively. | |
3623 | @end deffn | |
3624 | ||
3625 | @deffn {Scheme Procedure} string-ci> s1 s2 [start1 [end1 [start2 [end2]]]] | |
3626 | @deffnx {C Function} scm_string_ci_gt (s1, s2, start1, end1, start2, end2) | |
3627 | Return @code{#f} if @var{s1} is less than or equal to @var{s2}, a | |
3628 | true value otherwise. The character comparison is done | |
3629 | case-insensitively. | |
3630 | @end deffn | |
3631 | ||
3632 | @deffn {Scheme Procedure} string-ci<= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3633 | @deffnx {C Function} scm_string_ci_le (s1, s2, start1, end1, start2, end2) | |
3634 | Return @code{#f} if @var{s1} is greater than @var{s2}, a true | |
3635 | value otherwise. The character comparison is done | |
3636 | case-insensitively. | |
3637 | @end deffn | |
3638 | ||
3639 | @deffn {Scheme Procedure} string-ci>= s1 s2 [start1 [end1 [start2 [end2]]]] | |
3640 | @deffnx {C Function} scm_string_ci_ge (s1, s2, start1, end1, start2, end2) | |
3641 | Return @code{#f} if @var{s1} is less than @var{s2}, a true value | |
3642 | otherwise. The character comparison is done | |
3643 | case-insensitively. | |
3644 | @end deffn | |
3645 | ||
3646 | @deffn {Scheme Procedure} string-hash s [bound [start [end]]] | |
3647 | @deffnx {C Function} scm_substring_hash (s, bound, start, end) | |
64de6db5 | 3648 | Compute a hash value for @var{s}. The optional argument @var{bound} is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,@var{bound}). |
5676b4fa MV |
3649 | @end deffn |
3650 | ||
3651 | @deffn {Scheme Procedure} string-hash-ci s [bound [start [end]]] | |
3652 | @deffnx {C Function} scm_substring_hash_ci (s, bound, start, end) | |
64de6db5 | 3653 | Compute a hash value for @var{s}, ignoring character case. The optional argument @var{bound} is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,@var{bound}). |
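
The actual hash values depend on the implementation, so this sketch
only checks properties that should hold for any value; the string
@code{"guile"} and the bound 100 are arbitrary.

@lisp
;; A positive bound keeps the result below that bound.
(< (string-hash "guile" 100) 100)
@result{} #t
;; Hashing the same string twice gives the same value.
(= (string-hash "guile" 100) (string-hash "guile" 100))
@result{} #t
;; The case-insensitive variant ignores differences in case.
(= (string-hash-ci "Guile" 100) (string-hash-ci "GUILE" 100))
@result{} #t
@end lisp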
5676b4fa | 3654 | @end deffn |
07d83abe | 3655 | |
edb7bb47 JG |
3656 | Because the same visual appearance of an abstract Unicode character can |
3657 | be obtained via multiple sequences of Unicode characters, even the | |
3658 | case-insensitive string comparison functions described above may return | |
3659 | @code{#f} when presented with strings containing different | |
3660 | representations of the same character. For example, the Unicode | |
3661 | character ``LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE'' can be | |
3662 | represented with a single character (U+1E69) or by the character ``LATIN | |
3663 | SMALL LETTER S'' (U+0073) followed by the combining marks ``COMBINING | |
3664 | DOT BELOW'' (U+0323) and ``COMBINING DOT ABOVE'' (U+0307). | |
3665 | ||
3666 | For this reason, it is often desirable to ensure that the strings | |
3667 | to be compared are using a mutually consistent representation for every | |
3668 | character. The Unicode standard defines two methods of normalizing the | |
3669 | contents of strings: Decomposition, which breaks composite characters | |
3670 | into a set of constituent characters with an ordering defined by the | |
3671 | Unicode Standard; and composition, which performs the converse. | |
3672 | ||
3673 | There are two decomposition operations. ``Canonical decomposition'' | |
3674 | produces character sequences that share the same visual appearance as | |
ecb87335 | 3675 | the original characters, while ``compatibility decomposition'' produces |
edb7bb47 JG |
3676 | ones whose visual appearances may differ from the originals but which |
3677 | represent the same abstract character. | |
3678 | ||
3679 | These operations are encapsulated in the following set of normalization | |
3680 | forms: | |
3681 | ||
3682 | @table @dfn | |
3683 | @item NFD | |
3684 | Characters are decomposed to their canonical forms. | |
3685 | ||
3686 | @item NFKD | |
3687 | Characters are decomposed to their compatibility forms. | |
3688 | ||
3689 | @item NFC | |
3690 | Characters are decomposed to their canonical forms, then composed. | |
3691 | ||
3692 | @item NFKC | |
3693 | Characters are decomposed to their compatibility forms, then composed. | |
3694 | ||
3695 | @end table | |
3696 | ||
3697 | The functions below put their arguments into one of the forms described | |
3698 | above. | |
3699 | ||
3700 | @deffn {Scheme Procedure} string-normalize-nfd s | |
3701 | @deffnx {C Function} scm_string_normalize_nfd (s) | |
3702 | Return the @code{NFD} normalized form of @var{s}. | |
3703 | @end deffn | |
3704 | ||
3705 | @deffn {Scheme Procedure} string-normalize-nfkd s | |
3706 | @deffnx {C Function} scm_string_normalize_nfkd (s) | |
3707 | Return the @code{NFKD} normalized form of @var{s}. | |
3708 | @end deffn | |
3709 | ||
3710 | @deffn {Scheme Procedure} string-normalize-nfc s | |
3711 | @deffnx {C Function} scm_string_normalize_nfc (s) | |
3712 | Return the @code{NFC} normalized form of @var{s}. | |
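
As a sketch, the two representations of ``LATIN SMALL LETTER S WITH DOT
BELOW AND DOT ABOVE'' mentioned earlier compare as different until both
are normalized.  The variable names @code{composed} and
@code{decomposed} are chosen only for this example.

@lisp
(define composed (string (integer->char #x1e69)))
(define decomposed
  (string #\s (integer->char #x0323) (integer->char #x0307)))

(string=? composed decomposed)
@result{} #f
(string=? (string-normalize-nfc composed)
          (string-normalize-nfc decomposed))
@result{} #t
;; Canonical decomposition yields the base letter plus two marks.
(string-length (string-normalize-nfd composed))
@result{} 3
@end lisp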
3713 | @end deffn | |
3714 | ||
3715 | @deffn {Scheme Procedure} string-normalize-nfkc s | |
3716 | @deffnx {C Function} scm_string_normalize_nfkc (s) | |
3717 | Return the @code{NFKC} normalized form of @var{s}. | |
3718 | @end deffn | |
3719 | ||
07d83abe MV |
3720 | @node String Searching |
3721 | @subsubsection String Searching | |
3722 | ||
5676b4fa MV |
3723 | @deffn {Scheme Procedure} string-index s char_pred [start [end]] |
3724 | @deffnx {C Function} scm_string_index (s, char_pred, start, end) | |
3725 | Search through the string @var{s} from left to right, returning | |
be3eb25c | 3726 | the index of the first occurrence of a character which |
07d83abe | 3727 | |
5676b4fa MV |
3728 | @itemize @bullet |
3729 | @item | |
3730 | equals @var{char_pred}, if it is a character, | |
07d83abe | 3731 | |
5676b4fa | 3732 | @item |
be3eb25c | 3733 | satisfies the predicate @var{char_pred}, if it is a procedure, |
07d83abe | 3734 | |
5676b4fa MV |
3735 | @item |
3736 | is in the set @var{char_pred}, if it is a character set. | |
3737 | @end itemize | |
bf7c2e96 LC |
3738 | |
3739 | Return @code{#f} if no match is found. | |
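
For example, using arbitrary sample strings and one argument of each
accepted kind:

@lisp
(string-index "hello world" #\o)               ; a character
@result{} 4
(string-index "hello world" char-whitespace?)  ; a predicate
@result{} 5
(string-index "abc123" char-set:digit)         ; a character set
@result{} 3
(string-index "hello world" #\z)               ; no match
@result{} #f
@end lisp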
5676b4fa | 3740 | @end deffn |
07d83abe | 3741 | |
5676b4fa MV |
3742 | @deffn {Scheme Procedure} string-rindex s char_pred [start [end]] |
3743 | @deffnx {C Function} scm_string_rindex (s, char_pred, start, end) | |
3744 | Search through the string @var{s} from right to left, returning | |
be3eb25c | 3745 | the index of the last occurrence of a character which |
5676b4fa MV |
3746 | |
3747 | @itemize @bullet | |
3748 | @item | |
3749 | equals @var{char_pred}, if it is a character, | |
3750 | ||
3751 | @item | |
be3eb25c | 3752 | satisfies the predicate @var{char_pred}, if it is a procedure, |
5676b4fa MV |
3753 | |
3754 | @item | |
3755 | is in the set @var{char_pred}, if it is a character set. | |
3756 | @end itemize | |
bf7c2e96 LC |
3757 | |
3758 | Return @code{#f} if no match is found. | |
07d83abe MV |
3759 | @end deffn |
3760 | ||
5676b4fa MV |
3761 | @deffn {Scheme Procedure} string-prefix-length s1 s2 [start1 [end1 [start2 [end2]]]] |
3762 | @deffnx {C Function} scm_string_prefix_length (s1, s2, start1, end1, start2, end2) | |
3763 | Return the length of the longest common prefix of the two | |
3764 | strings. | |
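
For example, with arbitrary sample strings:

@lisp
;; "guil" is shared; the strings differ at index 4.
(string-prefix-length "guile" "guild")
@result{} 4
(string-prefix-length "abc" "xyz")
@result{} 0
@end lisp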
3765 | @end deffn | |
07d83abe | 3766 | |
5676b4fa MV |
3767 | @deffn {Scheme Procedure} string-prefix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]] |
3768 | @deffnx {C Function} scm_string_prefix_length_ci (s1, s2, start1, end1, start2, end2) | |
3769 | Return the length of the longest common prefix of the two | |
3770 | strings, ignoring character case. | |
3771 | @end deffn | |
07d83abe | 3772 | |
5676b4fa MV |
3773 | @deffn {Scheme Procedure} string-suffix-length s1 s2 [start1 [end1 [start2 [end2]]]] |
3774 | @deffnx {C Function} scm_string_suffix_length (s1, s2, start1, end1, start2, end2) | |
3775 | Return the length of the longest common suffix of the two | |
3776 | strings. | |
3777 | @end deffn | |
07d83abe | 3778 | |
5676b4fa MV |
3779 | @deffn {Scheme Procedure} string-suffix-length-ci s1 s2 [start1 [end1 [start2 [end2]]]] |
3780 | @deffnx {C Function} scm_string_suffix_length_ci (s1, s2, start1, end1, start2, end2) | |
3781 | Return the length of the longest common suffix of the two | |
3782 | strings, ignoring character case. | |
3783 | @end deffn | |
3784 | ||
3785 | @deffn {Scheme Procedure} string-prefix? s1 s2 [start1 [end1 [start2 [end2]]]] | |
3786 | @deffnx {C Function} scm_string_prefix_p (s1, s2, start1, end1, start2, end2) | |
3787 | Is @var{s1} a prefix of @var{s2}? | |
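
A short sketch with arbitrary strings; the last call uses the optional
indices to restrict @var{s2} to its final three characters:

@lisp
(string-prefix? "foo" "foobar")
@result{} #t
(string-prefix? "bar" "foobar")
@result{} #f
;; Compare "bar" against the region "bar" (indices 3..6) of "foobar".
(string-prefix? "bar" "foobar" 0 3 3 6)
@result{} #t
@end lisp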
3788 | @end deffn | |
3789 | ||
3790 | @deffn {Scheme Procedure} string-prefix-ci? s1 s2 [start1 [end1 [start2 [end2]]]] | |
3791 | @deffnx {C Function} scm_string_prefix_ci_p (s1, s2, start1, end1, start2, end2) | |
3792 | Is @var{s1} a prefix of @var{s2}, ignoring character case? | |
3793 | @end deffn | |
3794 | ||
3795 | @deffn {Scheme Procedure} string-suffix? s1 s2 [start1 [end1 [start2 [end2]]]] | |
3796 | @deffnx {C Function} scm_string_suffix_p (s1, s2, start1, end1, start2, end2) | |
3797 | Is @var{s1} a suffix of @var{s2}? | |
3798 | @end deffn | |
3799 | ||
3800 | @deffn {Scheme Procedure} string-suffix-ci? s1 s2 [start1 [end1 [start2 [end2]]]] | |
3801 | @deffnx {C Function} scm_string_suffix_ci_p (s1, s2, start1, end1, start2, end2) | |
3802 | Is @var{s1} a suffix of @var{s2}, ignoring character case? | |
3803 | @end deffn | |
3804 | ||
3805 | @deffn {Scheme Procedure} string-index-right s char_pred [start [end]] | |
3806 | @deffnx {C Function} scm_string_index_right (s, char_pred, start, end) | |
3807 | Search through the string @var{s} from right to left, returning | |
be3eb25c | 3808 | the index of the last occurrence of a character which |
5676b4fa MV |
3809 | |
3810 | @itemize @bullet | |
3811 | @item | |
3812 | equals @var{char_pred}, if it is a character, | |
3813 | ||
3814 | @item | |
be3eb25c | 3815 | satisfies the predicate @var{char_pred}, if it is a procedure, |
5676b4fa MV |
3816 | |
3817 | @item | |
3818 | is in the set @var{char_pred}, if it is a character set. | |
3819 | @end itemize | |
bf7c2e96 LC |
3820 | |
3821 | Return @code{#f} if no match is found. | |
5676b4fa MV |
3822 | @end deffn |
3823 | ||
3824 | @deffn {Scheme Procedure} string-skip s char_pred [start [end]] | |
3825 | @deffnx {C Function} scm_string_skip (s, char_pred, start, end) | |
3826 | Search through the string @var{s} from left to right, returning | |
be3eb25c | 3827 | the index of the first occurrence of a character which |
5676b4fa MV |
3828 | |
3829 | @itemize @bullet | |
3830 | @item | |
3831 | does not equal @var{char_pred}, if it is a character, | |
3832 | ||
3833 | @item | |
be3eb25c | 3834 | does not satisfy the predicate @var{char_pred}, if it is a |
5676b4fa MV |
3835 | procedure, |
3836 | ||
3837 | @item | |
3838 | is not in the set @var{char_pred}, if it is a character set. | |
3839 | @end itemize | |
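
For example, to find where leading characters of one kind stop (the
sample strings are arbitrary):

@lisp
;; Skip the three leading spaces.
(string-skip "   abc" #\space)
@result{} 3
;; Skip the leading letters.
(string-skip "abc123" char-alphabetic?)
@result{} 3
@end lisp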
3840 | @end deffn | |
3841 | ||
3842 | @deffn {Scheme Procedure} string-skip-right s char_pred [start [end]] | |
3843 | @deffnx {C Function} scm_string_skip_right (s, char_pred, start, end) | |
3844 | Search through the string @var{s} from right to left, returning | |
be3eb25c | 3845 | the index of the last occurrence of a character which |
5676b4fa MV |
3846 | |
3847 | @itemize @bullet | |
3848 | @item | |
3849 | does not equal @var{char_pred}, if it is a character, | |
3850 | ||
3851 | @item | |
3852 | does not satisfy the predicate @var{char_pred}, if it is a | |
3853 | procedure, | |
3854 | ||
3855 | @item | |
3856 | is not in the set @var{char_pred}, if it is a character set. | |
3857 | @end itemize | |
3858 | @end deffn | |
3859 | ||
3860 | @deffn {Scheme Procedure} string-count s char_pred [start [end]] | |
3861 | @deffnx {C Function} scm_string_count (s, char_pred, start, end) | |
3862 | Return the number of characters in the string | |
3863 | @var{s} that | |
3864 | ||
3865 | @itemize @bullet | |
3866 | @item | |
3867 | equal @var{char_pred}, if it is a character, | |
3868 | ||
3869 | @item | |
be3eb25c | 3870 | satisfy the predicate @var{char_pred}, if it is a procedure, |
5676b4fa MV |
3871 | 
3872 | @item | |
3873 | are in the set @var{char_pred}, if it is a character set. | |
3874 | @end itemize | |
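
For example, with arbitrary sample strings and one argument of each
accepted kind:

@lisp
(string-count "banana" #\a)
@result{} 3
(string-count "banana" (lambda (c) (char=? c #\n)))
@result{} 2
(string-count "a1b2c3" char-set:digit)
@result{} 3
@end lisp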
3875 | @end deffn | |
3876 | ||
3877 | @deffn {Scheme Procedure} string-contains s1 s2 [start1 [end1 [start2 [end2]]]] | |
3878 | @deffnx {C Function} scm_string_contains (s1, s2, start1, end1, start2, end2) | |
3879 | Does string @var{s1} contain string @var{s2}? Return the index | |
3880 | in @var{s1} where @var{s2} occurs as a substring, or @code{#f}. | |
3881 | The optional start/end indices restrict the operation to the | |
3882 | indicated substrings. | |
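
For example, with arbitrary sample strings:

@lisp
(string-contains "hello world" "wor")
@result{} 6
(string-contains "hello world" "xyz")
@result{} #f
;; Start searching s1 at index 2; the result is still an index
;; into the whole of s1.
(string-contains "abcabc" "bc" 2)
@result{} 4
@end lisp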
3883 | @end deffn | |
3884 | ||
3885 | @deffn {Scheme Procedure} string-contains-ci s1 s2 [start1 [end1 [start2 [end2]]]] | |
3886 | @deffnx {C Function} scm_string_contains_ci (s1, s2, start1, end1, start2, end2) | |
3887 | Does string @var{s1} contain string @var{s2}? Return the index | |
3888 | in @var{s1} where @var{s2} occurs as a substring, or @code{#f}. | |
3889 | The optional start/end indices restrict the operation to the | |
3890 | indicated substrings. Character comparison is done | |
3891 | case-insensitively. | |
07d83abe MV |
3892 | @end deffn |
3893 | ||
3894 | @node Alphabetic Case Mapping | |
3895 | @subsubsection Alphabetic Case Mapping | |
3896 | ||
3897 | These are procedures for mapping strings to their upper- or lower-case | |
3898 | equivalents, respectively, or for capitalizing strings. | |
3899 | ||
67af975c MG |
3900 | They use the basic case mapping rules for Unicode characters. No |
3901 | special language or context rules are considered. The resulting strings | |
3902 | are guaranteed to be the same length as the input strings. | |
3903 | ||
3904 | @xref{Character Case Mapping, the @code{(ice-9 | |
3905 | i18n)} module}, for locale-dependent case conversions. | |
3906 | ||
5676b4fa MV |
3907 | @deffn {Scheme Procedure} string-upcase str [start [end]] |
3908 | @deffnx {C Function} scm_substring_upcase (str, start, end) | |
07d83abe | 3909 | @deffnx {C Function} scm_string_upcase (str) |
5676b4fa | 3910 | Upcase every character in @var{str}. |
07d83abe MV |
3911 | @end deffn |
3912 | ||
5676b4fa MV |
3913 | @deffn {Scheme Procedure} string-upcase! str [start [end]] |
3914 | @deffnx {C Function} scm_substring_upcase_x (str, start, end) | |
07d83abe | 3915 | @deffnx {C Function} scm_string_upcase_x (str) |
5676b4fa MV |
3916 | Destructively upcase every character in @var{str}. |
3917 | ||
07d83abe | 3918 | @lisp |
5676b4fa MV |
3919 | (string-upcase! y) |
3920 | @result{} "ARRDEFG" | |
3921 | y | |
3922 | @result{} "ARRDEFG" | |
07d83abe MV |
3923 | @end lisp |
3924 | @end deffn | |
3925 | ||
5676b4fa MV |
3926 | @deffn {Scheme Procedure} string-downcase str [start [end]] |
3927 | @deffnx {C Function} scm_substring_downcase (str, start, end) | |
07d83abe | 3928 | @deffnx {C Function} scm_string_downcase (str) |
5676b4fa | 3929 | Downcase every character in @var{str}. |
07d83abe MV |
3930 | @end deffn |
3931 | ||
5676b4fa MV |
3932 | @deffn {Scheme Procedure} string-downcase! str [start [end]] |
3933 | @deffnx {C Function} scm_substring_downcase_x (str, start, end) | |
07d83abe | 3934 | @deffnx {C Function} scm_string_downcase_x (str) |
5676b4fa MV |
3935 | Destructively downcase every character in @var{str}. |
3936 | ||
07d83abe | 3937 | @lisp |
5676b4fa MV |
3938 | y |
3939 | @result{} "ARRDEFG" | |
3940 | (string-downcase! y) | |
3941 | @result{} "arrdefg" | |
3942 | y | |
3943 | @result{} "arrdefg" | |
07d83abe MV |
3944 | @end lisp |
3945 | @end deffn | |
3946 | ||
3947 | @deffn {Scheme Procedure} string-capitalize str | |
3948 | @deffnx {C Function} scm_string_capitalize (str) | |
3949 | Return a freshly allocated string with the characters in | |
3950 | @var{str}, where the first character of every word is | |
3951 | capitalized. | |
3952 | @end deffn | |
3953 | ||
3954 | @deffn {Scheme Procedure} string-capitalize! str | |
3955 | @deffnx {C Function} scm_string_capitalize_x (str) | |
3956 | Upcase the first character of every word in @var{str} | |
3957 | destructively and return @var{str}. | |
3958 | ||
3959 | @lisp | |
3960 | y @result{} "hello world" | |
3961 | (string-capitalize! y) @result{} "Hello World" | |
3962 | y @result{} "Hello World" | |
3963 | @end lisp | |
3964 | @end deffn | |
3965 | ||
5676b4fa MV |
3966 | @deffn {Scheme Procedure} string-titlecase str [start [end]] |
3967 | @deffnx {C Function} scm_string_titlecase (str, start, end) | |
3968 | Titlecase every first character in a word in @var{str}. | |
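
For example, with an arbitrary mixed-case input.  Note that the
remaining letters of each word come out in lower case:

@lisp
(string-titlecase "hello wORLD")
@result{} "Hello World"
@end lisp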
3969 | @end deffn | |
07d83abe | 3970 | |
5676b4fa MV |
3971 | @deffn {Scheme Procedure} string-titlecase! str [start [end]] |
3972 | @deffnx {C Function} scm_string_titlecase_x (str, start, end) | |
3973 | Destructively titlecase every first character in a word in | |
3974 | @var{str}. | |
3975 | @end deffn | |
3976 | ||
3977 | @node Reversing and Appending Strings | |
3978 | @subsubsection Reversing and Appending Strings | |
07d83abe | 3979 | |
5676b4fa MV |
3980 | @deffn {Scheme Procedure} string-reverse str [start [end]] |
3981 | @deffnx {C Function} scm_string_reverse (str, start, end) | |
3982 | Reverse the string @var{str}. The optional arguments | |
3983 | @var{start} and @var{end} delimit the region of @var{str} to | |
3984 | operate on. | |
3985 | @end deffn | |
3986 | ||
3987 | @deffn {Scheme Procedure} string-reverse! str [start [end]] | |
3988 | @deffnx {C Function} scm_string_reverse_x (str, start, end) | |
3989 | Reverse the string @var{str} in-place. The optional arguments | |
3990 | @var{start} and @var{end} delimit the region of @var{str} to | |
3991 | operate on. The return value is unspecified. | |
3992 | @end deffn | |
07d83abe MV |
3993 | |
3994 | @rnindex string-append | |
df0a1002 | 3995 | @deffn {Scheme Procedure} string-append arg @dots{} |
07d83abe MV |
3996 | @deffnx {C Function} scm_string_append (args) |
3997 | Return a newly allocated string whose characters form the | |
df0a1002 | 3998 | concatenation of the given strings, @var{arg} @enddots{}. |
07d83abe MV |
3999 | |
4000 | @example | |
4001 | (let ((h "hello ")) | |
4002 | (string-append h "world")) | |
4003 | @result{} "hello world" | |
4004 | @end example | |
4005 | @end deffn | |
4006 | ||
df0a1002 BT |
4007 | @deffn {Scheme Procedure} string-append/shared arg @dots{} |
4008 | @deffnx {C Function} scm_string_append_shared (args) | |
5676b4fa MV |
4009 | Like @code{string-append}, but the result may share memory |
4010 | with the argument strings. | |
4011 | @end deffn | |
4012 | ||
4013 | @deffn {Scheme Procedure} string-concatenate ls | |
4014 | @deffnx {C Function} scm_string_concatenate (ls) | |
df0a1002 BT |
4015 | Append the elements (which must be strings) of @var{ls} together into a |
4016 | single string. Guaranteed to return a freshly allocated string. | |
5676b4fa MV |
4017 | @end deffn |
4018 | ||
4019 | @deffn {Scheme Procedure} string-concatenate-reverse ls [final_string [end]] | |
4020 | @deffnx {C Function} scm_string_concatenate_reverse (ls, final_string, end) | |
4021 | Without optional arguments, this procedure is equivalent to | |
4022 | ||
aba0dff5 | 4023 | @lisp |
5676b4fa | 4024 | (string-concatenate (reverse ls)) |
aba0dff5 | 4025 | @end lisp |
5676b4fa MV |
4026 | |
4027 | If the optional argument @var{final_string} is specified, it is | |
4028 | consed onto the beginning of @var{ls} before performing the | |
4029 | list-reverse and string-concatenate operations. If @var{end} | |
4030 | is given, only the characters of @var{final_string} up to index | |
4031 | @var{end} are used. | |
4032 | ||
4033 | Guaranteed to return a freshly allocated string. | |
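
The following sketch, using arbitrary one- and two-character strings,
walks through the three argument combinations:

@lisp
(string-concatenate-reverse '("c" "b" "a"))
@result{} "abc"
;; "cd" is consed onto the front of the list before reversing.
(string-concatenate-reverse '("b" "a") "cd")
@result{} "abcd"
;; Only the first character of "cd" is used.
(string-concatenate-reverse '("b" "a") "cd" 1)
@result{} "abc"
@end lisp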
4034 | @end deffn | |
4035 | ||
4036 | @deffn {Scheme Procedure} string-concatenate/shared ls | |
4037 | @deffnx {C Function} scm_string_concatenate_shared (ls) | |
4038 | Like @code{string-concatenate}, but the result may share memory | |
4039 | with the strings in the list @var{ls}. | |
4040 | @end deffn | |
4041 | ||
4042 | @deffn {Scheme Procedure} string-concatenate-reverse/shared ls [final_string [end]] | |
4043 | @deffnx {C Function} scm_string_concatenate_reverse_shared (ls, final_string, end) | |
4044 | Like @code{string-concatenate-reverse}, but the result may | |
72b3aa56 | 4045 | share memory with the strings in the list @var{ls}. |
5676b4fa MV |
4046 | @end deffn |
4047 | ||
4048 | @node Mapping Folding and Unfolding | |
4049 | @subsubsection Mapping, Folding, and Unfolding | |
4050 | ||
4051 | @deffn {Scheme Procedure} string-map proc s [start [end]] | |
4052 | @deffnx {C Function} scm_string_map (proc, s, start, end) | |
4053 | @var{proc} is a char->char procedure; it is mapped over | |
4054 | @var{s}. The order in which the procedure is applied to the | |
4055 | string elements is not specified. | |
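
For example (the @code{lambda} expression is an illustrative procedure
written for this sketch):

@lisp
(string-map char-upcase "hello")
@result{} "HELLO"
;; Replace every space with a hyphen.
(string-map (lambda (c) (if (char=? c #\space) #\- c)) "a b c")
@result{} "a-b-c"
@end lisp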
4056 | @end deffn | |
4057 | ||
4058 | @deffn {Scheme Procedure} string-map! proc s [start [end]] | |
4059 | @deffnx {C Function} scm_string_map_x (proc, s, start, end) | |
4060 | @var{proc} is a char->char procedure; it is mapped over | |
4061 | @var{s}. The order in which the procedure is applied to the | |
4062 | string elements is not specified. The string @var{s} is | |
4063 | modified in place; the return value is not specified. | |
4064 | @end deffn | |
4065 | ||
4066 | @deffn {Scheme Procedure} string-for-each proc s [start [end]] | |
4067 | @deffnx {C Function} scm_string_for_each (proc, s, start, end) | |
4068 | @var{proc} is mapped over @var{s} in left-to-right order. The | |
4069 | return value is not specified. | |
4070 | @end deffn | |
4071 | ||
4072 | @deffn {Scheme Procedure} string-for-each-index proc s [start [end]] | |
4073 | @deffnx {C Function} scm_string_for_each_index (proc, s, start, end) | |
2a7820f2 KR |
4074 | Call @code{(@var{proc} i)} for each index i in @var{s}, from left to |
4075 | right. | |
4076 | ||
4077 | For example, to change characters to alternately upper and lower case, | |
4078 | ||
4079 | @example | |
4080 | (define str (string-copy "studly")) | |
45867c2a NJ |
4081 | (string-for-each-index |
4082 | (lambda (i) | |
4083 | (string-set! str i | |
4084 | ((if (even? i) char-upcase char-downcase) | |
4085 | (string-ref str i)))) | |
4086 | str) | |
2a7820f2 KR |
4087 | str @result{} "StUdLy" |
4088 | @end example | |
5676b4fa MV |
4089 | @end deffn |
4090 | ||
4091 | @deffn {Scheme Procedure} string-fold kons knil s [start [end]] | |
4092 | @deffnx {C Function} scm_string_fold (kons, knil, s, start, end) | |
4093 | Fold @var{kons} over the characters of @var{s}, from left to right, | |
4094 | with @var{knil} as the initial value. @var{kons} must accept two | |
4095 | arguments: the current character and the result of the previous | |
4096 | application of @var{kons} (or @var{knil} for the first character processed). | |
4097 | @end deffn | |
4098 | ||
4099 | @deffn {Scheme Procedure} string-fold-right kons knil s [start [end]] | |
4100 | @deffnx {C Function} scm_string_fold_right (kons, knil, s, start, end) | |
4101 | Fold @var{kons} over the characters of @var{s}, from right to left, | |
4102 | with @var{knil} as the initial value. @var{kons} must accept two | |
4103 | arguments: the current character and the result of the previous | |
4104 | application of @var{kons} (or @var{knil} for the first character processed). | |
4105 | @end deffn | |
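The direction of the fold determines the order in which @var{kons} sees the characters. The sketch below uses @code{cons} as @var{kons} to make that order visible. @code{count-upper} is a hypothetical helper added only for illustration.

```scheme
;; Left fold: kons sees #\a first, so consing reverses the order.
(define reversed (string-fold cons '() "abc"))
;; reversed => (#\c #\b #\a)

;; Right fold: kons sees #\c first, so the original order is kept.
(define in-order (string-fold-right cons '() "abc"))
;; in-order => (#\a #\b #\c)

;; Counting the characters that satisfy a predicate.
(define (count-upper s)
  (string-fold (lambda (ch n) (if (char-upper-case? ch) (+ n 1) n))
               0 s))
;; (count-upper "Guile Scheme") => 2
```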
4106 | ||
4107 | @deffn {Scheme Procedure} string-unfold p f g seed [base [make_final]] | |
4108 | @deffnx {C Function} scm_string_unfold (p, f, g, seed, base, make_final) | |
4109 | @itemize @bullet | |
4110 | @item @var{g} is used to generate a series of @emph{seed} | |
4111 | values from the initial @var{seed}: @var{seed}, (@var{g} | |
4112 | @var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), | |
4113 | @dots{} | |
4114 | @item @var{p} tells us when to stop -- when it returns true | |
4115 | when applied to one of these seed values. | |
4116 | @item @var{f} maps each seed value to the corresponding | |
4117 | character in the result string. These chars are assembled | |
4118 | into the string in a left-to-right order. | |
4119 | @item @var{base} is the optional initial/leftmost portion | |
4120 | of the constructed string; it defaults to the empty | |
4121 | string. | |
4122 | @item @var{make_final} is applied to the terminal seed | |
4123 | value (on which @var{p} returns true) to produce | |
4124 | the final/rightmost portion of the constructed string. | |
9a18d8d4 | 4125 | The default is nothing extra. |
5676b4fa MV |
4126 | @end itemize |
4127 | @end deffn | |
4128 | ||
4129 | @deffn {Scheme Procedure} string-unfold-right p f g seed [base [make_final]] | |
4130 | @deffnx {C Function} scm_string_unfold_right (p, f, g, seed, base, make_final) | |
4131 | @itemize @bullet | |
4132 | @item @var{g} is used to generate a series of @emph{seed} | |
4133 | values from the initial @var{seed}: @var{seed}, (@var{g} | |
4134 | @var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), | |
4135 | @dots{} | |
4136 | @item @var{p} tells us when to stop -- when it returns true | |
4137 | when applied to one of these seed values. | |
4138 | @item @var{f} maps each seed value to the corresponding | |
4139 | character in the result string. These chars are assembled | |
4140 | into the string in a right-to-left order. | |
4141 | @item @var{base} is the optional initial/rightmost portion | |
4142 | of the constructed string; it defaults to the empty | |
4143 | string. | |
4144 | @item @var{make_final} is applied to the terminal seed | |
4145 | value (on which @var{p} returns true) to produce | |
4146 | the final/leftmost portion of the constructed string. | |
4147 | It defaults to @code{(lambda (x) "")}. | |
4148 | @end itemize | |
4149 | @end deffn | |
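A small sketch of how the arguments fit together, using a list of characters as the seed. Here @var{p} is @code{null?}, @var{f} is @code{car} and @var{g} is @code{cdr}. The variable names are illustrative only.

```scheme
;; Walk a list of characters: stop at the empty list, take the car as
;; the next character, and continue with the cdr.
(define abc (string-unfold null? car cdr '(#\a #\b #\c)))
;; abc => "abc"

;; base goes on the left; make_final supplies the right-hand end.
(define wrapped
  (string-unfold null? car cdr '(#\a #\b) "<" (lambda (seed) ">")))
;; wrapped => "<ab>"

;; string-unfold-right assembles the characters right to left.
(define cba (string-unfold-right null? car cdr '(#\a #\b #\c)))
;; cba => "cba"
```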
4150 | ||
4151 | @node Miscellaneous String Operations | |
4152 | @subsubsection Miscellaneous String Operations | |
4153 | ||
4154 | @deffn {Scheme Procedure} xsubstring s from [to [start [end]]] | |
4155 | @deffnx {C Function} scm_xsubstring (s, from, to, start, end) | |
4156 | This is the @emph{extended substring} procedure that implements | |
4157 | replicated copying of a substring of some string. | |
4158 | ||
4159 | @var{s} is a string, @var{start} and @var{end} are optional | |
4160 | arguments that demarcate a substring of @var{s}, defaulting to | |
4161 | 0 and the length of @var{s}. Replicate this substring up and | |
4162 | down index space, in both the positive and negative directions. | |
4163 | @code{xsubstring} returns the substring of this string | |
4164 | beginning at index @var{from}, and ending at @var{to}, which | |
4165 | defaults to @var{from} + (@var{end} - @var{start}). | |
4166 | @end deffn | |
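One way to picture this is to imagine the source substring repeated endlessly in both directions, with @var{from} and @var{to} selecting a window on that infinite text. The sketch below shows two positive-index cases. The variable names are illustrative only.

```scheme
;; Picture "abcdef" repeated endlessly: ...abcdefabcdefabcdef...
;; With only FROM given, the result has the same length as the
;; source substring, so this amounts to a rotation.
(define rotated (xsubstring "abcdef" 2))
;; rotated => "cdefab"

;; An explicit TO may run past the end; the text wraps around.
(define repeated (xsubstring "abc" 0 7))
;; repeated => "abcabca"
```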
4167 | ||
4168 | @deffn {Scheme Procedure} string-xcopy! target tstart s sfrom [sto [start [end]]] | |
4169 | @deffnx {C Function} scm_string_xcopy_x (target, tstart, s, sfrom, sto, start, end) | |
4170 | Exactly the same as @code{xsubstring}, but the extracted text | |
4171 | is written into the string @var{target} starting at index | |
4172 | @var{tstart}. The operation is not defined if @code{(eq? | |
4173 | @var{target} @var{s})} or these arguments share storage -- you | |
4174 | cannot copy a string on top of itself. | |
4175 | @end deffn | |
4176 | ||
4177 | @deffn {Scheme Procedure} string-replace s1 s2 [start1 [end1 [start2 [end2]]]] | |
4178 | @deffnx {C Function} scm_string_replace (s1, s2, start1, end1, start2, end2) | |
4179 | Return the string @var{s1}, but with the characters | |
4180 | @var{start1} @dots{} @var{end1} replaced by the characters | |
4181 | @var{start2} @dots{} @var{end2} from @var{s2}. | |
4182 | @end deffn | |
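As an illustration, the following sketch replaces a slice of the first string, first with all of the second string and then with a slice of it. The strings and variable names are made up for the example.

```scheme
;; Characters 6..10 of s1 ("world") are replaced by all of s2.
(define r1 (string-replace "hello world" "there" 6 11))
;; r1 => "hello there"

;; start2/end2 select a slice of s2 ("Guile" here).
(define r2 (string-replace "hello world" "GNU Guile" 6 11 4 9))
;; r2 => "hello Guile"
```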
4183 | ||
4184 | @deffn {Scheme Procedure} string-tokenize s [token_set [start [end]]] | |
4185 | @deffnx {C Function} scm_string_tokenize (s, token_set, start, end) | |
4186 | Split the string @var{s} into a list of substrings, where each | |
4187 | substring is a maximal non-empty contiguous sequence of | |
4188 | characters from the character set @var{token_set}, which | |
4189 | defaults to @code{char-set:graphic}. | |
4190 | If @var{start} or @var{end} indices are provided, they restrict | |
4191 | @code{string-tokenize} to operating on the indicated substring | |
4192 | of @var{s}. | |
4193 | @end deffn | |
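The effect of @var{token_set} is easiest to see by comparing the default with a narrower character set. This sketch assumes the standard SRFI-14 set @code{char-set:letter}, which Guile provides by default.

```scheme
;; The default token set is char-set:graphic, so only whitespace
;; separates tokens and punctuation stays attached.
(define words (string-tokenize "Help make programs run, run, RUN!"))
;; words => ("Help" "make" "programs" "run," "run," "RUN!")

;; With char-set:letter, digits and punctuation become separators too.
(define letters (string-tokenize "a1,b2;c3" char-set:letter))
;; letters => ("a" "b" "c")
```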
4194 | ||
9fe717e2 AW |
4195 | @deffn {Scheme Procedure} string-filter char_pred s [start [end]] |
4196 | @deffnx {C Function} scm_string_filter (char_pred, s, start, end) | |
08de3e24 | 4197 | Filter the string @var{s}, retaining only those characters which |
a88e2a96 | 4198 | satisfy @var{char_pred}. |
08de3e24 KR |
4199 | |
4200 | If @var{char_pred} is a procedure, it is applied to each character as | |
4201 | a predicate; if it is a character, each character is tested for equality | |
4202 | with it; and if it is a character set, each character is tested for membership. | |
5676b4fa MV |
4203 | @end deffn |
4204 | ||
9fe717e2 AW |
4205 | @deffn {Scheme Procedure} string-delete char_pred s [start [end]] |
4206 | @deffnx {C Function} scm_string_delete (char_pred, s, start, end) | |
a88e2a96 | 4207 | Delete characters satisfying @var{char_pred} from @var{s}. |
08de3e24 KR |
4208 | |
4209 | If @var{char_pred} is a procedure, it is applied to each character as | |
4210 | a predicate; if it is a character, each character is tested for equality | |
4211 | with it; and if it is a character set, each character is tested for membership. | |
5676b4fa MV |
4212 | @end deffn |
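The three accepted forms of @var{char_pred} can be sketched as follows. The example assumes the argument order shown in the signatures above (predicate first, string second). The variable names are illustrative only.

```scheme
;; char_pred as a procedure: keep only alphabetic characters.
(define only-letters (string-filter char-alphabetic? "a1b2c3"))
;; only-letters => "abc"

;; char_pred as a character: drop every #\l.
(define no-ls (string-delete #\l "hello world"))
;; no-ls => "heo word"

;; char_pred as a character set: drop every digit.
(define no-digits (string-delete char-set:digit "a1b2c3"))
;; no-digits => "abc"
```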
4213 | ||
f05bb849 AW |
4214 | @node Representing Strings as Bytes |
4215 | @subsubsection Representing Strings as Bytes | |
4216 | ||
4217 | Out in the cold world outside of Guile, not all strings are treated in | |
4218 | the same way. Out there, there are only bytes, and there are many ways | |
4219 | of representing a string (a sequence of characters) as binary data | |
4220 | (a sequence of bytes). | |
4221 | ||
4222 | As a user, usually you don't have to think about this very much. When | |
4223 | you type on your keyboard, your system encodes your keystrokes as bytes | |
4224 | according to the locale that you have configured on your computer. | |
4225 | Guile uses the locale to decode those bytes back into characters -- | |
4226 | hopefully the same characters that you typed in. | |
4227 | ||
4228 | All is not so clear when dealing with a system with multiple users, such | |
4229 | as a web server. Your web server might get a request from one user for | |
4230 | data encoded in the ISO-8859-1 character set, and then another request | |
4231 | from a different user for UTF-8 data. | |
4232 | ||
4233 | @cindex iconv | |
4234 | @cindex character encoding | |
4235 | Guile provides an @dfn{iconv} module for converting between strings and | |
4236 | sequences of bytes. @xref{Bytevectors}, for more on how Guile | |
4237 | represents raw byte sequences. This module gets its name from the | |
4238 | common @sc{unix} command of the same name. | |
4239 | ||
5ed4ea90 AW |
4240 | Note that often it is sufficient to just read and write strings from |
4241 | ports instead of using these functions. To do this, specify the port | |
4242 | encoding using @code{set-port-encoding!}. @xref{Ports}, for more on | |
4243 | ports and character encodings. | |
4244 | ||
f05bb849 AW |
4245 | Unlike the rest of the procedures in this section, you have to load the |
4246 | @code{iconv} module before having access to these procedures: | |
4247 | ||
4248 | @example | |
4249 | (use-modules (ice-9 iconv)) | |
4250 | @end example | |
4251 | ||
36929486 | 4252 | @deffn {Scheme Procedure} string->bytevector string encoding [conversion-strategy] |
f05bb849 AW |
4253 | Encode @var{string} as a sequence of bytes. |
4254 | ||
4255 | The string will be encoded in the character set specified by the | |
4256 | @var{encoding} string. If the string has characters that cannot be | |
4257 | represented in the encoding, by default this procedure raises an | |
5ed4ea90 AW |
4258 | @code{encoding-error}. Pass a @var{conversion-strategy} argument to |
4259 | specify other behaviors. | |
f05bb849 AW |
4260 | |
4261 | The return value is a bytevector. @xref{Bytevectors}, for more on | |
4262 | bytevectors. @xref{Ports}, for more on character encodings and | |
4263 | conversion strategies. | |
4264 | @end deffn | |
4265 | ||
36929486 | 4266 | @deffn {Scheme Procedure} bytevector->string bytevector encoding [conversion-strategy] |
f05bb849 AW |
4267 | Decode @var{bytevector} into a string. |
4268 | ||
4269 | The bytes will be decoded from the character set specified by the @var{encoding} | |
4270 | string. If the bytes do not form a valid encoding, by default this | |
5ed4ea90 AW |
4271 | procedure raises a @code{decoding-error}. As with |
4272 | @code{string->bytevector}, pass the optional @var{conversion-strategy} | |
4273 | argument to modify this behavior. @xref{Ports}, for more on character | |
4274 | encodings and conversion strategies. | |
f05bb849 AW |
4275 | @end deffn |
4276 | ||
36929486 | 4277 | @deffn {Scheme Procedure} call-with-output-encoded-string encoding proc [conversion-strategy] |
f05bb849 AW |
4278 | Like @code{call-with-output-string}, but instead of returning a string, |
4279 | returns an encoding of the string according to @var{encoding}, as a | |
4280 | bytevector. This procedure can be more efficient than collecting a | |
4281 | string and then converting it via @code{string->bytevector}. | |
4282 | @end deffn | |
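A short round trip through these procedures might look like the sketch below. The characters are written as code point escapes so that the example does not depend on the encoding of the source file. The variable names are illustrative only.

```scheme
(use-modules (ice-9 iconv))

;; U+03BB (Greek small letter lambda) takes two bytes in UTF-8.
(define lam (string #\x3bb))
(define lam-bytes (string->bytevector lam "UTF-8"))
;; lam-bytes => #vu8(206 187)

;; Decoding with the same encoding gives the original string back.
(define round-trip (bytevector->string lam-bytes "UTF-8"))
;; (string=? round-trip lam) => #t

;; U+00E9 fits in a single byte in ISO-8859-1.
(define e-acute-bytes (string->bytevector (string #\xe9) "ISO-8859-1"))
;; e-acute-bytes => #vu8(233)

;; Collect port output directly as encoded bytes.
(define abc-bytes
  (call-with-output-encoded-string
   "UTF-8"
   (lambda (port) (display "abc" port))))
;; abc-bytes => #vu8(97 98 99)
```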
4283 | ||
91210d62 MV |
4284 | @node Conversion to/from C |
4285 | @subsubsection Conversion to/from C | |
4286 | ||
4287 | When creating a Scheme string from a C string or when converting a | |
4288 | Scheme string to a C string, the concept of character encoding becomes | |
4289 | important. | |
4290 | ||
4291 | In C, a string is just a sequence of bytes, and the character encoding | |
4292 | describes the relation between these bytes and the actual characters | |
f05bb849 AW |
4293 | that make up the string. For Scheme strings, character encoding is not |
4294 | an issue (most of the time), since in Scheme you usually treat strings | |
4295 | as character sequences, not byte sequences. | |
91210d62 | 4296 | |
67af975c MG |
4297 | Converting to C and converting from C each have their own challenges. |
4298 | ||
4299 | When converting from C to Scheme, it is important that the sequence of | |
4300 | bytes in the C string be valid with respect to its encoding. ASCII | |
4301 | strings, for example, can't have any bytes greater than 127. An ASCII | |
4302 | byte greater than 127 is considered @emph{ill-formed} and cannot be | |
4303 | converted into a Scheme character. | |
4304 | ||
4305 | Problems can occur in the reverse operation as well. Not all character | |
4306 | encodings can hold all possible Scheme characters. Some encodings, like | |
4307 | ASCII for example, can only describe a small subset of all possible | |
4308 | characters. So, when converting to C, one must first decide what to do | |
4309 | with Scheme characters that can't be represented in the C string. | |
91210d62 | 4310 | |
c88453e8 MV |
4311 | Converting a Scheme string to a C string will often allocate fresh |
4312 | memory to hold the result. You must take care that this memory is | |
4313 | properly freed eventually. In many cases, this can be achieved by | |
661ae7ab MV |
4314 | using @code{scm_dynwind_free} inside an appropriate dynwind context |
4315 | (@pxref{Dynamic Wind}). | |
91210d62 MV |
4316 | |
4317 | @deftypefn {C Function} SCM scm_from_locale_string (const char *str) | |
4318 | @deftypefnx {C Function} SCM scm_from_locale_stringn (const char *str, size_t len) | |
67af975c | 4319 | Creates a new Scheme string that has the same contents as @var{str} when |
95f5e303 | 4320 | interpreted in the character encoding of the current locale. |
91210d62 MV |
4321 | |
4322 | For @code{scm_from_locale_string}, @var{str} must be null-terminated. | |
4323 | ||
4324 | For @code{scm_from_locale_stringn}, @var{len} specifies the length of | |
4325 | @var{str} in bytes, and @var{str} does not need to be null-terminated. | |
4326 | If @var{len} is @code{(size_t)-1}, then @var{str} does need to be | |
4327 | null-terminated and the real length will be found with @code{strlen}. | |
67af975c MG |
4328 | |
4329 | If the C string is ill-formed, an error will be raised. | |
ce3ce21c MW |
4330 | |
4331 | Note that these functions should @emph{not} be used to convert C string | |
4332 | constants, because there is no guarantee that the current locale will | |
a71e79c3 MW |
4333 | match that of the execution character set, used for string and character |
4334 | constants. Most modern C compilers use UTF-8 by default, so to convert | |
4335 | C string constants we recommend @code{scm_from_utf8_string}. | |
91210d62 MV |
4336 | @end deftypefn |
4337 | ||
4338 | @deftypefn {C Function} SCM scm_take_locale_string (char *str) | |
4339 | @deftypefnx {C Function} SCM scm_take_locale_stringn (char *str, size_t len) | |
4340 | Like @code{scm_from_locale_string} and @code{scm_from_locale_stringn}, | |
4341 | respectively, but also frees @var{str} with @code{free} eventually. | |
4342 | Thus, you can use this function when you would free @var{str} anyway | |
4343 | immediately after creating the Scheme string. In certain cases, Guile | |
4344 | can then use @var{str} directly as its internal representation. | |
4345 | @end deftypefn | |
4346 | ||
4846ae2c KR |
4347 | @deftypefn {C Function} {char *} scm_to_locale_string (SCM str) |
4348 | @deftypefnx {C Function} {char *} scm_to_locale_stringn (SCM str, size_t *lenp) | |
95f5e303 AW |
4349 | Returns a C string with the same contents as @var{str} in the character |
4350 | encoding of the current locale. The C string must be freed with | |
4351 | @code{free} eventually, for example by using @code{scm_dynwind_free} | |
67af975c | 4352 | (@pxref{Dynamic Wind}). |
91210d62 MV |
4353 | |
4354 | For @code{scm_to_locale_string}, the returned string is | |
4355 | null-terminated and an error is signalled when @var{str} contains | |
4356 | @code{#\nul} characters. | |
4357 | ||
4358 | For @code{scm_to_locale_stringn} and @var{lenp} not @code{NULL}, | |
4359 | @var{str} might contain @code{#\nul} characters and the length of the | |
4360 | returned string in bytes is stored in @code{*@var{lenp}}. The | |
4361 | returned string will not be null-terminated in this case. If | |
4362 | @var{lenp} is @code{NULL}, @code{scm_to_locale_stringn} behaves like | |
4363 | @code{scm_to_locale_string}. | |
67af975c | 4364 | |
95f5e303 AW |
4365 | If a character in @var{str} cannot be represented in the character |
4366 | encoding of the current locale, the default port conversion strategy is | |
4367 | used. @xref{Ports}, for more on conversion strategies. | |
4368 | ||
4369 | If the conversion strategy is @code{error}, an error will be raised. If | |
4370 | it is @code{substitute}, a replacement character, such as a question | |
4371 | mark, will be inserted in its place. If it is @code{escape}, a hex | |
4372 | escape will be inserted in its place. | |
91210d62 MV |
4373 | @end deftypefn |
4374 | ||
4375 | @deftypefn {C Function} size_t scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len) | |
4376 | Puts @var{str} as a C string in the current locale encoding into the | |
4377 | memory pointed to by @var{buf}. The buffer at @var{buf} has room for | |
4378 | @var{max_len} bytes and @code{scm_to_locale_stringbuf} will never store | |
4379 | more than that. No terminating @code{'\0'} will be stored. | |
4380 | ||
4381 | The return value of @code{scm_to_locale_stringbuf} is the number of | |
4382 | bytes that are needed for all of @var{str}, regardless of whether | |
4383 | @var{buf} was large enough to hold them. Thus, when the return value | |
4384 | is larger than @var{max_len}, only @var{max_len} bytes have been | |
4385 | stored and you probably need to try again with a larger buffer. | |
4386 | @end deftypefn | |
cf313a94 MG |
4387 | |
4388 | For most situations, string conversion should occur using the current | |
4389 | locale, such as with the functions above. But there may be cases where | |
4390 | one wants to convert strings from a character encoding other than the | |
4391 | locale's character encoding. For these cases, the lower-level functions | |
4392 | @code{scm_to_stringn} and @code{scm_from_stringn} are provided. These | |
4393 | functions should seldom be necessary if one is properly using locales. | |
4394 | ||
4395 | @deftp {C Type} scm_t_string_failed_conversion_handler | |
4396 | This is an enumerated type that can take one of three values: | |
4397 | @code{SCM_FAILED_CONVERSION_ERROR}, | |
4398 | @code{SCM_FAILED_CONVERSION_QUESTION_MARK}, and | |
4399 | @code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE}. They are used to indicate | |
4400 | a strategy for handling characters that cannot be converted to or from a | |
4401 | given character encoding. @code{SCM_FAILED_CONVERSION_ERROR} indicates | |
4402 | that a conversion should throw an error if some characters cannot be | |
4403 | converted. @code{SCM_FAILED_CONVERSION_QUESTION_MARK} indicates that a | |
4404 | conversion should replace unconvertible characters with the question | |
4405 | mark character. Finally, @code{SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE} | |
4406 | requests that a conversion replace an unconvertible character | |
4407 | with an escape sequence. | |
4408 | ||
4409 | While all three strategies apply when converting Scheme strings to C, | |
4410 | only @code{SCM_FAILED_CONVERSION_ERROR} and | |
4411 | @code{SCM_FAILED_CONVERSION_QUESTION_MARK} can be used when converting C | |
4412 | strings to Scheme. | |
4413 | @end deftp | |
4414 | ||
4415 | @deftypefn {C Function} char *scm_to_stringn (SCM str, size_t *lenp, const char *encoding, scm_t_string_failed_conversion_handler handler) | |
4416 | This function returns a newly allocated C string from the Guile string | |
68a78738 MW |
4417 | @var{str}. The length of the returned string in bytes will be returned in |
4418 | @var{lenp}. The character encoding of the C string is passed as the ASCII, | |
cf313a94 MG |
4419 | null-terminated C string @var{encoding}. The @var{handler} parameter |
4420 | gives a strategy for dealing with characters that cannot be converted | |
4421 | into @var{encoding}. | |
4422 | ||
68a78738 | 4423 | If @var{lenp} is @code{NULL}, this function will return a null-terminated C |
cf313a94 MG |
4424 | string. It will throw an error if the string contains a null |
4425 | character. | |
f05bb849 | 4426 | |
5ed4ea90 | 4427 | The Scheme interface to this function is @code{string->bytevector}, from the |
f05bb849 | 4428 | @code{ice-9 iconv} module. @xref{Representing Strings as Bytes}. |
cf313a94 MG |
4429 | @end deftypefn |
4430 | ||
4431 | @deftypefn {C Function} SCM scm_from_stringn (const char *str, size_t len, const char *encoding, scm_t_string_failed_conversion_handler handler) | |
4432 | This function returns a Scheme string from the C string @var{str}. The | |
c3d8450c | 4433 | length in bytes of the C string is input as @var{len}. The encoding of the C |
cf313a94 MG |
4434 | string is passed as the ASCII, null-terminated C string @var{encoding}. |
4435 | The @var{handler} parameter gives a strategy for dealing with | |
4436 | unconvertible characters. | |
f05bb849 | 4437 | |
5ed4ea90 | 4438 | The Scheme interface to this function is @code{bytevector->string}. |
f05bb849 | 4439 | @xref{Representing Strings as Bytes}. |
cf313a94 MG |
4440 | @end deftypefn |
4441 | ||
ce3ce21c MW |
4442 | The following conversion functions are provided as a convenience for the |
4443 | most commonly used encodings. | |
4444 | ||
4445 | @deftypefn {C Function} SCM scm_from_latin1_string (const char *str) | |
4446 | @deftypefnx {C Function} SCM scm_from_utf8_string (const char *str) | |
4447 | @deftypefnx {C Function} SCM scm_from_utf32_string (const scm_t_wchar *str) | |
4448 | Return a Scheme string from the null-terminated C string @var{str}, | |
4449 | which is ISO-8859-1-, UTF-8-, or UTF-32-encoded. These functions should | |
4450 | be used to convert hard-coded C string constants into Scheme strings. | |
4451 | @end deftypefn | |
cf313a94 MG |
4452 | |
4453 | @deftypefn {C Function} SCM scm_from_latin1_stringn (const char *str, size_t len) | |
647dc1ac LC |
4454 | @deftypefnx {C Function} SCM scm_from_utf8_stringn (const char *str, size_t len) |
4455 | @deftypefnx {C Function} SCM scm_from_utf32_stringn (const scm_t_wchar *str, size_t len) | |
4456 | Return a Scheme string from the C string @var{str}, which is ISO-8859-1-, | |
4457 | UTF-8-, or UTF-32-encoded, of length @var{len}. @var{len} is the number | |
4458 | of bytes pointed to by @var{str} for @code{scm_from_latin1_stringn} and | |
4459 | @code{scm_from_utf8_stringn}; it is the number of elements (code points) | |
4460 | in @var{str} in the case of @code{scm_from_utf32_stringn}. | |
cf313a94 MG |
4461 | @end deftypefn |
4462 | ||
647dc1ac LC |
4463 | @deftypefn {C Function} char *scm_to_latin1_stringn (SCM str, size_t *lenp) |
4464 | @deftypefnx {C Function} char *scm_to_utf8_stringn (SCM str, size_t *lenp) | |
4465 | @deftypefnx {C Function} scm_t_wchar *scm_to_utf32_stringn (SCM str, size_t *lenp) | |
4466 | Return a newly allocated, ISO-8859-1-, UTF-8-, or UTF-32-encoded C string | |
4467 | from Scheme string @var{str}. An error is thrown when @var{str} | |
68a78738 | 4468 | cannot be converted to the specified encoding. If @var{lenp} is |
cf313a94 MG |
4469 | @code{NULL}, the returned C string will be null terminated, and an error |
4470 | will be thrown if the C string would otherwise contain null | |
68a78738 MW |
4471 | characters. If @var{lenp} is not @code{NULL}, the string is not null terminated, |
4472 | and the length of the returned string is returned in @var{lenp}. The length | |
4473 | returned is the number of bytes for @code{scm_to_latin1_stringn} and | |
4474 | @code{scm_to_utf8_stringn}; it is the number of elements (code points) | |
4475 | for @code{scm_to_utf32_stringn}. | |
cf313a94 | 4476 | @end deftypefn |
07d83abe | 4477 | |
08467a7e AW |
4478 | It is not often the case, but sometimes when you are dealing with the |
4479 | implementation details of a port, you need to encode and decode strings | |
4480 | according to the encoding and conversion strategy of the port. There | |
4481 | are some convenience functions for that purpose as well. | |
4482 | ||
4483 | @deftypefn {C Function} SCM scm_from_port_string (const char *str, SCM port) | |
4484 | @deftypefnx {C Function} SCM scm_from_port_stringn (const char *str, size_t len, SCM port) | |
4485 | @deftypefnx {C Function} char* scm_to_port_string (SCM str, SCM port) | |
4486 | @deftypefnx {C Function} char* scm_to_port_stringn (SCM str, size_t *lenp, SCM port) | |
4487 | Like @code{scm_from_stringn} and friends, except they take their | |
4488 | encoding and conversion strategy from a given port object. | |
4489 | @end deftypefn | |
4490 | ||
5b6b22e8 MG |
4491 | @node String Internals |
4492 | @subsubsection String Internals | |
4493 | ||
4494 | Guile stores each string in memory as a contiguous array of Unicode code | |
4495 | points along with an associated set of attributes. If all of the code | |
4496 | points of a string have an integer value between 0 and 255 inclusive, | |
4497 | the code point array is stored as one byte per code point: it is stored | |
4498 | as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the | |
4499 | string has an integer value greater than 255, the code point array is | |
4500 | stored as four bytes per code point: it is stored as a UTF-32 string. | |
4501 | ||
4502 | Conversion between the one-byte-per-code-point and | |
4503 | four-bytes-per-code-point representations happens automatically as | |
4504 | necessary. | |
4505 | ||
4506 | No API is provided to set the internal representation of strings; | |
4507 | however, there is a pair of procedures available to query it. These are | |
4508 | debugging procedures. Using them in production code is discouraged, | |
4509 | since the details of Guile's internal representation of strings may | |
4510 | change from release to release. | |
4511 | ||
4512 | @deffn {Scheme Procedure} string-bytes-per-char str | |
4513 | @deffnx {C Function} scm_string_bytes_per_char (str) | |
4514 | Return the number of bytes used to encode a Unicode code point in string | |
4515 | @var{str}. The result is one or four. | |
4516 | @end deffn | |
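The two representations described above can be observed with a sketch like the following. Since this is a debugging procedure, the results reflect the representation described in this section and may differ in other releases. The variable names are illustrative only.

```scheme
;; Every code point is at most 255, so the narrow representation is used.
(define narrow (string-bytes-per-char "abc"))
;; narrow => 1

;; A single code point above 255 (U+03BB) makes the whole string wide.
(define wide
  (string-bytes-per-char (string-append "abc" (string #\x3bb))))
;; wide => 4
```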
4517 | ||
4518 | @deffn {Scheme Procedure} %string-dump str | |
4519 | @deffnx {C Function} scm_sys_string_dump (str) | |
4520 | Returns an association list containing debugging information for | |
4521 | @var{str}. The association list has the following entries. | |
4522 | @table @code | |
4523 | ||
4524 | @item string | |
4525 | The string itself. | |
4526 | ||
4527 | @item start | |
4528 | The start index of the string into its stringbuf | |
4529 | ||
4530 | @item length | |
4531 | The length of the string | |
4532 | ||
4533 | @item shared | |
4534 | If this string is a substring, it returns its | |
4535 | parent string. Otherwise, it returns @code{#f} | |
4536 | ||
4537 | @item read-only | |
4538 | @code{#t} if the string is read-only | |
4539 | ||
4540 | @item stringbuf-chars | |
4541 | A new string containing this string's stringbuf's characters | |
4542 | ||
4543 | @item stringbuf-length | |
4544 | The number of characters in this stringbuf | |
4545 | ||
4546 | @item stringbuf-shared | |
4547 | @code{#t} if this stringbuf is shared | |
4548 | ||
4549 | @item stringbuf-wide | |
4550 | @code{#t} if this stringbuf's characters are stored in a 32-bit buffer, | |
4551 | or @code{#f} if they are stored in an 8-bit buffer | |
4552 | @end table | |
4553 | @end deffn | |
4554 | ||
4555 | ||
b242715b LC |
4556 | @node Bytevectors |
4557 | @subsection Bytevectors | |
4558 | ||
4559 | @cindex bytevector | |
4560 | @cindex R6RS | |
4561 | ||
07d22c02 | 4562 | A @dfn{bytevector} is a raw bit string. The @code{(rnrs bytevectors)} |
b242715b | 4563 | module provides the programming interface specified by the |
5fa2deb3 | 4564 | @uref{http://www.r6rs.org/, Revised^6 Report on the Algorithmic Language |
b242715b LC |
4565 | Scheme (R6RS)}. It contains procedures to manipulate bytevectors and |
4566 | interpret their contents in a number of ways: bytevector contents can be | |
4567 | accessed as signed or unsigned integers of various sizes and endianness, | |
4568 | as IEEE-754 floating point numbers, or as strings. It is a useful tool | |
4569 | to encode and decode binary data. | |
4570 | ||
4571 | The R6RS (Section 4.3.4) specifies an external representation for | |
4572 | bytevectors, whereby the octets (integers in the range 0--255) contained | |
4573 | in the bytevector are represented as a list prefixed by @code{#vu8}: | |
4574 | ||
4575 | @lisp | |
4576 | #vu8(1 53 204) | |
4577 | @end lisp | |
4578 | ||
4579 | denotes a 3-byte bytevector containing the octets 1, 53, and 204. Like | |
4580 | string literals, booleans, etc., bytevectors are ``self-quoting'', i.e., | |
4581 | they do not need to be quoted: | |
4582 | ||
4583 | @lisp | |
4584 | #vu8(1 53 204) | |
4585 | @result{} #vu8(1 53 204) | |
4586 | @end lisp | |
4587 | ||
4588 | Bytevectors can be used with the binary input/output primitives of the | |
4589 | R6RS (@pxref{R6RS I/O Ports}). | |
4590 | ||
4591 | @menu | |
4592 | * Bytevector Endianness:: Dealing with byte order. | |
4593 | * Bytevector Manipulation:: Creating, copying, manipulating bytevectors. | |
4594 | * Bytevectors as Integers:: Interpreting bytes as integers. | |
4595 | * Bytevectors and Integer Lists:: Converting to/from an integer list. | |
4596 | * Bytevectors as Floats:: Interpreting bytes as real numbers. | |
4597 | * Bytevectors as Strings:: Interpreting bytes as Unicode strings. | |
118ff892 | 4598 | * Bytevectors as Arrays:: Guile extension to the bytevector API. |
27219b32 | 4599 | * Bytevectors as Uniform Vectors:: Bytevectors and SRFI-4. |
b242715b LC |
4600 | @end menu |
4601 | ||
4602 | @node Bytevector Endianness | |
4603 | @subsubsection Endianness | |
4604 | ||
4605 | @cindex endianness | |
4606 | @cindex byte order | |
4607 | @cindex word order | |
4608 | ||
4609 | Some of the following procedures take an @var{endianness} parameter. | |
5fa2deb3 AW |
4610 | The @dfn{endianness} is defined as the order of bytes in multi-byte |
4611 | numbers: numbers encoded in @dfn{big endian} have their most | |
4612 | significant bytes written first, whereas numbers encoded in | |
4613 | @dfn{little endian} have their least significant bytes | |
4614 | first@footnote{Big-endian and little-endian are the most common | |
4615 | ``endiannesses'', but others do exist. For instance, the GNU MP | |
4616 | library allows @dfn{word order} to be specified independently of | |
4617 | @dfn{byte order} (@pxref{Integer Import and Export,,, gmp, The GNU | |
4618 | Multiple Precision Arithmetic Library Manual}).}. | |
4619 | ||
4620 | Little-endian is the native endianness of the IA32 architecture and | |
4621 | its derivatives, while big-endian is native to SPARC and PowerPC, | |
4622 | among others. The @code{native-endianness} procedure returns the | |
4623 | native endianness of the machine it runs on. | |
b242715b LC |
4624 | |
4625 | @deffn {Scheme Procedure} native-endianness | |
4626 | @deffnx {C Function} scm_native_endianness () | |
4627 | Return a value denoting the native endianness of the host machine. | |
4628 | @end deffn | |
4629 | ||
4630 | @deffn {Scheme Macro} endianness symbol | |
4631 | Return an object denoting the endianness specified by @var{symbol}. If | |
5fa2deb3 AW |
4632 | @var{symbol} is neither @code{big} nor @code{little} then an error is |
4633 | raised at expand-time. | |
b242715b LC |
4634 | @end deffn |
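To see what the @var{endianness} argument changes in practice, the sketch below reads the same two bytes as a 16-bit unsigned integer in both byte orders. It uses @code{bytevector-u16-ref}, an R6RS procedure from the same @code{(rnrs bytevectors)} module. The variable names are illustrative only.

```scheme
(use-modules (rnrs bytevectors))

;; The same two bytes read as a 16-bit unsigned integer.
(define bv #vu8(1 2))

(define as-big (bytevector-u16-ref bv 0 (endianness big)))
;; as-big => 258      ; #x0102, most significant byte first

(define as-little (bytevector-u16-ref bv 0 (endianness little)))
;; as-little => 513   ; #x0201, least significant byte first

;; Reading in native order matches one of the two, depending on the host.
(define as-native (bytevector-u16-ref bv 0 (native-endianness)))
```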
4635 | ||
4636 | @defvr {C Variable} scm_endianness_big | |
4637 | @defvrx {C Variable} scm_endianness_little | |
5fa2deb3 | 4638 | The objects denoting big- and little-endianness, respectively. |
b242715b LC |
4639 | @end defvr |
4640 | ||
4641 | ||
4642 | @node Bytevector Manipulation | |
4643 | @subsubsection Manipulating Bytevectors | |
4644 | ||
4645 | Bytevectors can be created, copied, and analyzed with the following | |
404bb5f8 | 4646 | procedures and C functions. |
b242715b LC |
4647 | |
4648 | @deffn {Scheme Procedure} make-bytevector len [fill] | |
4649 | @deffnx {C Function} scm_make_bytevector (len, fill) | |
2d34e924 | 4650 | @deffnx {C Function} scm_c_make_bytevector (size_t len) |
b242715b | 4651 | Return a new bytevector of @var{len} bytes. Optionally, if @var{fill} |
d64fc8b0 LC |
4652 | is given, fill it with @var{fill}; @var{fill} must be in the range |
4653 | [-128,255]. | |
b242715b LC |
4654 | @end deffn |
4655 | ||
4656 | @deffn {Scheme Procedure} bytevector? obj | |
4657 | @deffnx {C Function} scm_bytevector_p (obj) | |
4658 | Return true if @var{obj} is a bytevector. | |
4659 | @end deffn | |
4660 | ||
404bb5f8 LC |
4661 | @deftypefn {C Function} int scm_is_bytevector (SCM obj) |
4662 | Equivalent to @code{scm_is_true (scm_bytevector_p (obj))}. | |
4663 | @end deftypefn | |
4664 | ||
b242715b LC |
4665 | @deffn {Scheme Procedure} bytevector-length bv |
4666 | @deffnx {C Function} scm_bytevector_length (bv) | |
4667 | Return the length in bytes of bytevector @var{bv}. | |
4668 | @end deffn | |
4669 | ||
404bb5f8 LC |
4670 | @deftypefn {C Function} size_t scm_c_bytevector_length (SCM bv) |
4671 | Likewise, return the length in bytes of bytevector @var{bv}. | |
4672 | @end deftypefn | |
4673 | ||
b242715b LC |
4674 | @deffn {Scheme Procedure} bytevector=? bv1 bv2 |
4675 | @deffnx {C Function} scm_bytevector_eq_p (bv1, bv2) | |
4676 | Return true if @var{bv1} is equal to @var{bv2}, i.e.@: if they have the | |
4677 | same length and contents. | |
4678 | @end deffn | |
4679 | ||
4680 | @deffn {Scheme Procedure} bytevector-fill! bv fill | |
4681 | @deffnx {C Function} scm_bytevector_fill_x (bv, fill) | |
4682 | Fill bytevector @var{bv} with @var{fill}, a byte. | |
4683 | @end deffn | |
4684 | ||
4685 | @deffn {Scheme Procedure} bytevector-copy! source source-start target target-start len | |
4686 | @deffnx {C Function} scm_bytevector_copy_x (source, source_start, target, target_start, len) | |
4687 | Copy @var{len} bytes from @var{source} into @var{target}, reading | |
4688 | from index @var{source-start} (a non-negative index within @var{source}) | |
80719649 LC |
4689 | and writing from index @var{target-start}. It is permitted for the |
4690 | @var{source} and @var{target} regions to overlap. | |
b242715b LC |
4691 | @end deffn |
4692 | ||
4693 | @deffn {Scheme Procedure} bytevector-copy bv | |
4694 | @deffnx {C Function} scm_bytevector_copy (bv) | |
4695 | Return a newly allocated copy of @var{bv}. | |
4696 | @end deffn | |
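
As a hedged illustration of the Scheme procedures above, the following
sketch copies bytes between two bytevectors and then within a single
one.  The names @code{src}, @code{dst} and @code{dup} are made up for
the example, and @code{(rnrs bytevectors)} is assumed to be available.

@lisp
(use-modules (rnrs bytevectors))

(define src (u8-list->bytevector '(1 2 3 4 5)))
(define dst (make-bytevector 5 0))

;; Copy 3 bytes, from index 1 of src to index 0 of dst.
(bytevector-copy! src 1 dst 0 3)
(bytevector->u8-list dst)               ; @result{} (2 3 4 0 0)

;; Source and target regions may overlap.
(bytevector-copy! src 0 src 2 3)
(bytevector->u8-list src)               ; @result{} (1 2 1 2 3)

;; A copy is a distinct object with equal contents.
(define dup (bytevector-copy dst))
(bytevector=? dup dst)                  ; @result{} #t
(bytevector-fill! dup 255)
(bytevector=? dup dst)                  ; @result{} #f
(bytevector-length dup)                 ; @result{} 5
@end lisp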
4697 | ||
404bb5f8 LC |
4698 | @deftypefn {C Function} scm_t_uint8 scm_c_bytevector_ref (SCM bv, size_t index) |
4699 | Return the byte at @var{index} in bytevector @var{bv}. | |
4700 | @end deftypefn | |
4701 | ||
4702 | @deftypefn {C Function} void scm_c_bytevector_set_x (SCM bv, size_t index, scm_t_uint8 value) | |
4703 | Set the byte at @var{index} in @var{bv} to @var{value}. | |
4704 | @end deftypefn | |
4705 | ||
b242715b LC |
4706 | Low-level C macros are available. They do not perform any |
4707 | type-checking; as such they should be used with care. | |
4708 | ||
4709 | @deftypefn {C Macro} size_t SCM_BYTEVECTOR_LENGTH (bv) | |
4710 | Return the length in bytes of bytevector @var{bv}. | |
4711 | @end deftypefn | |
4712 | ||
4713 | @deftypefn {C Macro} {signed char *} SCM_BYTEVECTOR_CONTENTS (bv) | |
4714 | Return a pointer to the contents of bytevector @var{bv}. | |
4715 | @end deftypefn | |
4716 | ||
4717 | ||
4718 | @node Bytevectors as Integers | |
4719 | @subsubsection Interpreting Bytevector Contents as Integers | |
4720 | ||
4721 | The contents of a bytevector can be interpreted as a sequence of | |
4722 | integers of any given size, sign, and endianness. | |
4723 | ||
4724 | @lisp | |
4725 | (let ((bv (make-bytevector 4))) | |
4726 | (bytevector-u8-set! bv 0 #x12) | |
4727 | (bytevector-u8-set! bv 1 #x34) | |
4728 | (bytevector-u8-set! bv 2 #x56) | |
4729 | (bytevector-u8-set! bv 3 #x78) | |
4730 | ||
4731 | (map (lambda (number) | |
4732 | (number->string number 16)) | |
4733 | (list (bytevector-u8-ref bv 0) | |
4734 | (bytevector-u16-ref bv 0 (endianness big)) | |
4735 | (bytevector-u32-ref bv 0 (endianness little))))) | |
4736 | ||
4737 | @result{} ("12" "1234" "78563412") | |
4738 | @end lisp | |
4739 | ||
4740 | The most generic procedures to interpret bytevector contents as integers | |
4741 | are described below. | |
4742 | ||
4743 | @deffn {Scheme Procedure} bytevector-uint-ref bv index endianness size | |
b242715b | 4744 | @deffnx {C Function} scm_bytevector_uint_ref (bv, index, endianness, size) |
4827afeb NJ |
4745 | Return the @var{size}-byte long unsigned integer at index @var{index} in |
4746 | @var{bv}, decoded according to @var{endianness}. | |
4747 | @end deffn | |
4748 | ||
4749 | @deffn {Scheme Procedure} bytevector-sint-ref bv index endianness size | |
b242715b | 4750 | @deffnx {C Function} scm_bytevector_sint_ref (bv, index, endianness, size) |
4827afeb NJ |
4751 | Return the @var{size}-byte long signed integer at index @var{index} in |
4752 | @var{bv}, decoded according to @var{endianness}. | |
b242715b LC |
4753 | @end deffn |
4754 | ||
4755 | @deffn {Scheme Procedure} bytevector-uint-set! bv index value endianness size | |
b242715b | 4756 | @deffnx {C Function} scm_bytevector_uint_set_x (bv, index, value, endianness, size) |
4827afeb NJ |
4757 | Set the @var{size}-byte long unsigned integer at @var{index} to |
4758 | @var{value}, encoded according to @var{endianness}. | |
4759 | @end deffn | |
4760 | ||
4761 | @deffn {Scheme Procedure} bytevector-sint-set! bv index value endianness size | |
b242715b | 4762 | @deffnx {C Function} scm_bytevector_sint_set_x (bv, index, value, endianness, size) |
4827afeb NJ |
4763 | Set the @var{size}-byte long signed integer at @var{index} to |
4764 | @var{value}, encoded according to @var{endianness}. | |
b242715b LC |
4765 | @end deffn |
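
Because @var{size} is given in bytes, these generic accessors also
handle widths with no dedicated procedure, such as 3 bytes.  The sketch
below is illustrative only; it assumes @code{(rnrs bytevectors)} is
available, and the name @code{bv} is arbitrary.

@lisp
(use-modules (rnrs bytevectors))

(define bv (u8-list->bytevector '(#xff #xff #xfe)))

;; The same three bytes, read as unsigned and as signed.
(bytevector-uint-ref bv 0 (endianness big) 3)   ; @result{} 16777214
(bytevector-sint-ref bv 0 (endianness big) 3)   ; @result{} -2

;; Store a 3-byte value, least significant byte first.
(bytevector-uint-set! bv 0 #x010203 (endianness little) 3)
(bytevector->u8-list bv)                        ; @result{} (3 2 1)
@end lisp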
4766 | ||
4767 | The following procedures are similar to the ones above, but specialized | |
4768 | to a given integer size: | |
4769 | ||
4770 | @deffn {Scheme Procedure} bytevector-u8-ref bv index | |
4771 | @deffnx {Scheme Procedure} bytevector-s8-ref bv index | |
4772 | @deffnx {Scheme Procedure} bytevector-u16-ref bv index endianness | |
4773 | @deffnx {Scheme Procedure} bytevector-s16-ref bv index endianness | |
4774 | @deffnx {Scheme Procedure} bytevector-u32-ref bv index endianness | |
4775 | @deffnx {Scheme Procedure} bytevector-s32-ref bv index endianness | |
4776 | @deffnx {Scheme Procedure} bytevector-u64-ref bv index endianness | |
4777 | @deffnx {Scheme Procedure} bytevector-s64-ref bv index endianness | |
4778 | @deffnx {C Function} scm_bytevector_u8_ref (bv, index) | |
4779 | @deffnx {C Function} scm_bytevector_s8_ref (bv, index) | |
4780 | @deffnx {C Function} scm_bytevector_u16_ref (bv, index, endianness) | |
4781 | @deffnx {C Function} scm_bytevector_s16_ref (bv, index, endianness) | |
4782 | @deffnx {C Function} scm_bytevector_u32_ref (bv, index, endianness) | |
4783 | @deffnx {C Function} scm_bytevector_s32_ref (bv, index, endianness) | |
4784 | @deffnx {C Function} scm_bytevector_u64_ref (bv, index, endianness) | |
4785 | @deffnx {C Function} scm_bytevector_s64_ref (bv, index, endianness) | |
4786 | Return the unsigned (or signed) @var{n}-bit integer (where @var{n} is 8, | |
4787 | 16, 32 or 64) from @var{bv} at @var{index}, decoded according to | |
4788 | @var{endianness}. | |
4789 | @end deffn | |
4790 | ||
4791 | @deffn {Scheme Procedure} bytevector-u8-set! bv index value | |
4792 | @deffnx {Scheme Procedure} bytevector-s8-set! bv index value | |
4793 | @deffnx {Scheme Procedure} bytevector-u16-set! bv index value endianness | |
4794 | @deffnx {Scheme Procedure} bytevector-s16-set! bv index value endianness | |
4795 | @deffnx {Scheme Procedure} bytevector-u32-set! bv index value endianness | |
4796 | @deffnx {Scheme Procedure} bytevector-s32-set! bv index value endianness | |
4797 | @deffnx {Scheme Procedure} bytevector-u64-set! bv index value endianness | |
4798 | @deffnx {Scheme Procedure} bytevector-s64-set! bv index value endianness | |
4799 | @deffnx {C Function} scm_bytevector_u8_set_x (bv, index, value) | |
4800 | @deffnx {C Function} scm_bytevector_s8_set_x (bv, index, value) | |
4801 | @deffnx {C Function} scm_bytevector_u16_set_x (bv, index, value, endianness) | |
4802 | @deffnx {C Function} scm_bytevector_s16_set_x (bv, index, value, endianness) | |
4803 | @deffnx {C Function} scm_bytevector_u32_set_x (bv, index, value, endianness) | |
4804 | @deffnx {C Function} scm_bytevector_s32_set_x (bv, index, value, endianness) | |
4805 | @deffnx {C Function} scm_bytevector_u64_set_x (bv, index, value, endianness) | |
4806 | @deffnx {C Function} scm_bytevector_s64_set_x (bv, index, value, endianness) | |
4807 | Store @var{value} as an unsigned (or signed) @var{n}-bit integer (where | |
4808 | @var{n} is 8, 16, 32 or 64) in @var{bv} at @var{index}, encoded according | |
4809 | to @var{endianness}. | |
4810 | @end deffn | |
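
The signed and unsigned accessors differ only in how they interpret the
stored bits.  As a small sketch (assuming @code{(rnrs bytevectors)} is
available; @code{bv} is an arbitrary name), a value written with a
signed setter can be read back either way:

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 2 0))

;; -2 is stored in two's complement as the bytes #xff #xfe.
(bytevector-s16-set! bv 0 -2 (endianness big))
(bytevector->u8-list bv)                        ; @result{} (255 254)

(bytevector-s16-ref bv 0 (endianness big))      ; @result{} -2
(bytevector-u16-ref bv 0 (endianness big))      ; @result{} 65534

;; The 8-bit accessors take no endianness argument.
(bytevector-u8-ref bv 0)                        ; @result{} 255
(bytevector-s8-ref bv 0)                        ; @result{} -1
@end lisp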
4811 | ||
4812 | Finally, a variant specialized for the host's endianness is available | |
4813 | for each of these functions (with the exception of the @code{u8} and | |
4814 | @code{s8} accessors, since byte order is irrelevant to a single byte): | |
4815 | ||
4816 | @deffn {Scheme Procedure} bytevector-u16-native-ref bv index | |
4817 | @deffnx {Scheme Procedure} bytevector-s16-native-ref bv index | |
4818 | @deffnx {Scheme Procedure} bytevector-u32-native-ref bv index | |
4819 | @deffnx {Scheme Procedure} bytevector-s32-native-ref bv index | |
4820 | @deffnx {Scheme Procedure} bytevector-u64-native-ref bv index | |
4821 | @deffnx {Scheme Procedure} bytevector-s64-native-ref bv index | |
4822 | @deffnx {C Function} scm_bytevector_u16_native_ref (bv, index) | |
4823 | @deffnx {C Function} scm_bytevector_s16_native_ref (bv, index) | |
4824 | @deffnx {C Function} scm_bytevector_u32_native_ref (bv, index) | |
4825 | @deffnx {C Function} scm_bytevector_s32_native_ref (bv, index) | |
4826 | @deffnx {C Function} scm_bytevector_u64_native_ref (bv, index) | |
4827 | @deffnx {C Function} scm_bytevector_s64_native_ref (bv, index) | |
4828 | Return the unsigned (or signed) @var{n}-bit integer (where @var{n} is 16, | |
4829 | 32 or 64) from @var{bv} at @var{index}, decoded according to the | |
4830 | host's native endianness. | |
4831 | @end deffn | |
4832 | ||
4833 | @deffn {Scheme Procedure} bytevector-u16-native-set! bv index value | |
4834 | @deffnx {Scheme Procedure} bytevector-s16-native-set! bv index value | |
4835 | @deffnx {Scheme Procedure} bytevector-u32-native-set! bv index value | |
4836 | @deffnx {Scheme Procedure} bytevector-s32-native-set! bv index value | |
4837 | @deffnx {Scheme Procedure} bytevector-u64-native-set! bv index value | |
4838 | @deffnx {Scheme Procedure} bytevector-s64-native-set! bv index value | |
4839 | @deffnx {C Function} scm_bytevector_u16_native_set_x (bv, index, value) | |
4840 | @deffnx {C Function} scm_bytevector_s16_native_set_x (bv, index, value) | |
4841 | @deffnx {C Function} scm_bytevector_u32_native_set_x (bv, index, value) | |
4842 | @deffnx {C Function} scm_bytevector_s32_native_set_x (bv, index, value) | |
4843 | @deffnx {C Function} scm_bytevector_u64_native_set_x (bv, index, value) | |
4844 | @deffnx {C Function} scm_bytevector_s64_native_set_x (bv, index, value) | |
4845 | Store @var{value} as an unsigned (or signed) @var{n}-bit integer (where | |
4846 | @var{n} is 16, 32 or 64) in @var{bv} at @var{index}, encoded according to | |
4847 | the host's native endianness. | |
4848 | @end deffn | |
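
A native accessor should agree with the corresponding generic accessor
when the latter is passed the result of @code{native-endianness}.  The
following sketch illustrates this under the assumption that
@code{(rnrs bytevectors)} is available; @code{bv} is an arbitrary name.

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 4 0))

(bytevector-u32-native-set! bv 0 #xdeadbeef)
(bytevector-u32-native-ref bv 0)                ; @result{} 3735928559

;; Same result as the generic accessor with the host's byte order.
(= (bytevector-u32-native-ref bv 0)
   (bytevector-u32-ref bv 0 (native-endianness)))   ; @result{} #t
@end lisp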
4849 | ||
4850 | ||
4851 | @node Bytevectors and Integer Lists | |
4852 | @subsubsection Converting Bytevectors to/from Integer Lists | |
4853 | ||
4854 | Bytevector contents can readily be converted to/from lists of signed or | |
4855 | unsigned integers: | |
4856 | ||
4857 | @lisp | |
4858 | (bytevector->sint-list (u8-list->bytevector (make-list 4 255)) | |
4859 | (endianness little) 2) | |
4860 | @result{} (-1 -1) | |
4861 | @end lisp | |
4862 | ||
4863 | @deffn {Scheme Procedure} bytevector->u8-list bv | |
4864 | @deffnx {C Function} scm_bytevector_to_u8_list (bv) | |
4865 | Return a newly allocated list of unsigned 8-bit integers from the | |
4866 | contents of @var{bv}. | |
4867 | @end deffn | |
4868 | ||
4869 | @deffn {Scheme Procedure} u8-list->bytevector lst | |
4870 | @deffnx {C Function} scm_u8_list_to_bytevector (lst) | |
4871 | Return a newly allocated bytevector consisting of the unsigned 8-bit | |
4872 | integers listed in @var{lst}. | |
4873 | @end deffn | |
4874 | ||
4875 | @deffn {Scheme Procedure} bytevector->uint-list bv endianness size | |
b242715b | 4876 | @deffnx {C Function} scm_bytevector_to_uint_list (bv, endianness, size) |
4827afeb NJ |
4877 | Return a list of unsigned integers of @var{size} bytes representing the |
4878 | contents of @var{bv}, decoded according to @var{endianness}. | |
4879 | @end deffn | |
4880 | ||
4881 | @deffn {Scheme Procedure} bytevector->sint-list bv endianness size | |
b242715b | 4882 | @deffnx {C Function} scm_bytevector_to_sint_list (bv, endianness, size) |
4827afeb NJ |
4883 | Return a list of signed integers of @var{size} bytes representing the |
4884 | contents of @var{bv}, decoded according to @var{endianness}. | |
b242715b LC |
4885 | @end deffn |
4886 | ||
4887 | @deffn {Scheme Procedure} uint-list->bytevector lst endianness size | |
b242715b | 4888 | @deffnx {C Function} scm_uint_list_to_bytevector (lst, endianness, size) |
4827afeb NJ |
4889 | Return a new bytevector containing the unsigned integers listed in |
4890 | @var{lst} and encoded on @var{size} bytes according to @var{endianness}. | |
4891 | @end deffn | |
4892 | ||
4893 | @deffn {Scheme Procedure} sint-list->bytevector lst endianness size | |
b242715b | 4894 | @deffnx {C Function} scm_sint_list_to_bytevector (lst, endianness, size) |
4827afeb NJ |
4895 | Return a new bytevector containing the signed integers listed in |
4896 | @var{lst} and encoded on @var{size} bytes according to @var{endianness}. | |
b242715b LC |
4897 | @end deffn |
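
The list conversions can also be used to re-read the same bytes under a
different byte order.  This is a sketch only, assuming
@code{(rnrs bytevectors)} is available; the name @code{bv} is arbitrary.

@lisp
(use-modules (rnrs bytevectors))

;; Encode two 16-bit integers, most significant byte first.
(define bv (uint-list->bytevector '(1 258) (endianness big) 2))
(bytevector->u8-list bv)                          ; @result{} (0 1 1 2)

;; Decoding the same bytes as little endian gives different numbers.
(bytevector->uint-list bv (endianness little) 2)  ; @result{} (256 513)

;; Signed integers survive a round trip with matching parameters.
(bytevector->sint-list
 (sint-list->bytevector '(-1 -2) (endianness little) 2)
 (endianness little) 2)                           ; @result{} (-1 -2)
@end lisp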
4898 | ||
4899 | @node Bytevectors as Floats | |
4900 | @subsubsection Interpreting Bytevector Contents as Floating Point Numbers | |
4901 | ||
4902 | @cindex IEEE-754 floating point numbers | |
4903 | ||
4904 | Bytevector contents can also be accessed as IEEE-754 single- or | |
4905 | double-precision floating point numbers (32 and 64 bits long, | |
4906 | respectively) using the procedures described here. | |
4907 | ||
4908 | @deffn {Scheme Procedure} bytevector-ieee-single-ref bv index endianness | |
4909 | @deffnx {Scheme Procedure} bytevector-ieee-double-ref bv index endianness | |
4910 | @deffnx {C Function} scm_bytevector_ieee_single_ref (bv, index, endianness) | |
4911 | @deffnx {C Function} scm_bytevector_ieee_double_ref (bv, index, endianness) | |
4912 | Return the IEEE-754 single- or double-precision floating point number | |
4913 | from @var{bv} at @var{index}, decoded according to @var{endianness}. | |
4914 | @end deffn | |
4915 | ||
4916 | @deffn {Scheme Procedure} bytevector-ieee-single-set! bv index value endianness | |
4917 | @deffnx {Scheme Procedure} bytevector-ieee-double-set! bv index value endianness | |
4918 | @deffnx {C Function} scm_bytevector_ieee_single_set_x (bv, index, value, endianness) | |
4919 | @deffnx {C Function} scm_bytevector_ieee_double_set_x (bv, index, value, endianness) | |
4920 | Store real number @var{value} as an IEEE-754 single- or double-precision | |
4921 | number in @var{bv} at @var{index}, encoded according to @var{endianness}. | |
4922 | @end deffn | |
4923 | ||
4924 | Variants specialized for the host's native endianness are also available: | |
4925 | ||
4926 | @deffn {Scheme Procedure} bytevector-ieee-single-native-ref bv index | |
4927 | @deffnx {Scheme Procedure} bytevector-ieee-double-native-ref bv index | |
4928 | @deffnx {C Function} scm_bytevector_ieee_single_native_ref (bv, index) | |
4929 | @deffnx {C Function} scm_bytevector_ieee_double_native_ref (bv, index) | |
4930 | Return the IEEE-754 single- or double-precision floating point number | |
4931 | from @var{bv} at @var{index}, decoded according to the host's native endianness. | |
4932 | @end deffn | |
4933 | ||
4934 | @deffn {Scheme Procedure} bytevector-ieee-single-native-set! bv index value | |
4935 | @deffnx {Scheme Procedure} bytevector-ieee-double-native-set! bv index value | |
4936 | @deffnx {C Function} scm_bytevector_ieee_single_native_set_x (bv, index, value) | |
4937 | @deffnx {C Function} scm_bytevector_ieee_double_native_set_x (bv, index, value) | |
4938 | Store real number @var{value} as an IEEE-754 single- or double-precision | |
4939 | number in @var{bv} at @var{index}, encoded according to the host's native endianness. | |
4940 | @end deffn | |
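
The byte layout of a stored number, and the precision lost when a value
is narrowed to single precision, can be observed directly.  The
following is an illustrative sketch, assuming @code{(rnrs bytevectors)}
is available; @code{bv} is an arbitrary name.

@lisp
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 8 0))

;; 1.0 as a big-endian double is #x3ff0000000000000.
(bytevector-ieee-double-set! bv 0 1.0 (endianness big))
(bytevector->u8-list bv)            ; @result{} (63 240 0 0 0 0 0 0)
(bytevector-ieee-double-ref bv 0 (endianness big))      ; @result{} 1.0

;; 0.5 is exactly representable in single precision...
(bytevector-ieee-single-set! bv 0 0.5 (endianness little))
(bytevector-ieee-single-ref bv 0 (endianness little))   ; @result{} 0.5

;; ...whereas 0.1 is rounded when stored in 32 bits.
(bytevector-ieee-single-set! bv 0 0.1 (endianness little))
(= 0.1 (bytevector-ieee-single-ref bv 0 (endianness little)))
                                                        ; @result{} #f
@end lisp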
4941 | ||
4942 | ||
4943 | @node Bytevectors as Strings | |
4944 | @subsubsection Interpreting Bytevector Contents as Unicode Strings | |
4945 | ||
4946 | @cindex Unicode string encoding | |
4947 | ||
4948 | Bytevector contents can also be interpreted as Unicode strings encoded | |
d3b5628c | 4949 | in one of the most commonly available encoding formats. |
f05bb849 | 4950 | @xref{Representing Strings as Bytes}, for a more generic interface. |
b242715b LC |
4951 | |
4952 | @lisp | |
4953 | (utf8->string (u8-list->bytevector '(99 97 102 101))) | |
4954 | @result{} "cafe" | |
4955 | ||
4956 | (string->utf8 "caf@'e") ;; LATIN SMALL LETTER E WITH ACUTE | |
4957 | @result{} #vu8(99 97 102 195 169) | |
4958 | @end lisp | |
4959 | ||
4960 | @deffn {Scheme Procedure} string->utf8 str | |
524aa8ae LC |
4961 | @deffnx {Scheme Procedure} string->utf16 str [endianness] |
4962 | @deffnx {Scheme Procedure} string->utf32 str [endianness] | |
b242715b | 4963 | @deffnx {C Function} scm_string_to_utf8 (str) |
524aa8ae LC |
4964 | @deffnx {C Function} scm_string_to_utf16 (str, endianness) |
4965 | @deffnx {C Function} scm_string_to_utf32 (str, endianness) | |
b242715b | 4966 | Return a newly allocated bytevector that contains the UTF-8, UTF-16, or |
524aa8ae LC |
4967 | UTF-32 (also known as UCS-4) encoding of @var{str}. For UTF-16 and UTF-32, |
4968 | @var{endianness} should be the symbol @code{big} or @code{little}; when omitted, | |
4969 | it defaults to big endian. | |
b242715b LC |
4970 | @end deffn |
4971 | ||
4972 | @deffn {Scheme Procedure} utf8->string utf | |
524aa8ae LC |
4973 | @deffnx {Scheme Procedure} utf16->string utf [endianness] |
4974 | @deffnx {Scheme Procedure} utf32->string utf [endianness] | |
b242715b | 4975 | @deffnx {C Function} scm_utf8_to_string (utf) |
524aa8ae LC |
4976 | @deffnx {C Function} scm_utf16_to_string (utf, endianness) |
4977 | @deffnx {C Function} scm_utf32_to_string (utf, endianness) | |
b242715b | 4978 | Return a newly allocated string that contains the UTF-8-, UTF-16-, |
524aa8ae LC |
4979 | or UTF-32-decoded contents of bytevector @var{utf}. For UTF-16 and UTF-32, |
4980 | @var{endianness} should be the symbol @code{big} or @code{little}; when omitted, | |
4981 | it defaults to big endian. | |
b242715b LC |
4982 | @end deffn |
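
The optional @var{endianness} argument changes the byte layout of the
UTF-16 and UTF-32 encodings, so the same value must be used when
decoding.  The sketch below is illustrative only and assumes
@code{(rnrs bytevectors)} is available.

@lisp
(use-modules (rnrs bytevectors))

;; Big endian is the default.
(string->utf16 "hi")                        ; @result{} #vu8(0 104 0 105)
(string->utf16 "hi" (endianness little))    ; @result{} #vu8(104 0 105 0)

;; Decode with the byte order that was used for encoding.
(utf16->string (string->utf16 "hi" (endianness little))
               (endianness little))         ; @result{} "hi"

;; UTF-32 uses four bytes per character.
(bytevector-length (string->utf32 "hi"))    ; @result{} 8
@end lisp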
4983 | ||
118ff892 AW |
4984 | @node Bytevectors as Arrays |
4985 | @subsubsection Accessing Bytevectors with the Array API | |
438974d0 LC |
4986 | |
4987 | As an extension to the R6RS, Guile allows bytevectors to be manipulated | |
118ff892 AW |
4988 | with the @dfn{array} procedures (@pxref{Arrays}). When using these |
4989 | APIs, bytes are accessed one at a time as 8-bit unsigned integers: | |
438974d0 LC |
4990 | |
4991 | @example | |
4992 | (define bv #vu8(0 1 2 3)) | |
4993 | ||
118ff892 | 4994 | (array? bv) |
438974d0 LC |
4995 | @result{} #t |
4996 | ||
118ff892 AW |
4997 | (array-rank bv) |
4998 | @result{} 1 | |
4999 | ||
5000 | (array-ref bv 2) | |
438974d0 LC |
5001 | @result{} 2 |
5002 | ||
118ff892 AW |
5003 | ;; Note the different argument order on array-set!. |
5004 | (array-set! bv 77 2) | |
438974d0 LC |
5005 | (array-ref bv 2) |
5006 | @result{} 77 | |
5007 | ||
5008 | (array-type bv) | |
5009 | @result{} vu8 | |
5010 | @end example | |
5011 | ||
b242715b | 5012 | |
27219b32 AW |
5013 | @node Bytevectors as Uniform Vectors |
5014 | @subsubsection Accessing Bytevectors with the SRFI-4 API | |
5015 | ||
5016 | Bytevectors may also be accessed with the SRFI-4 API. @xref{SRFI-4 and | |
5017 | Bytevectors}, for more information. | |
5018 | ||
5019 | ||
07d83abe MV |
5020 | @node Symbols |
5021 | @subsection Symbols | |
5022 | @tpindex Symbols | |
5023 | ||
5024 | Symbols in Scheme are widely used in three ways: as items of discrete | |
5025 | data, as lookup keys for alists and hash tables, and to denote variable | |
5026 | references. | |
5027 | ||
5028 | A @dfn{symbol} is similar to a string in that it is defined by a | |
5029 | sequence of characters. The sequence of characters is known as the | |
5030 | symbol's @dfn{name}. In the usual case --- that is, where the symbol's | |
5031 | name doesn't include any characters that could be confused with other | |
5032 | elements of Scheme syntax --- a symbol is written in a Scheme program by | |
5033 | writing the sequence of characters that make up the name, @emph{without} | |
5034 | any quotation marks or other special syntax. For example, the symbol | |
5035 | whose name is ``multiply-by-2'' is written, simply: | |
5036 | ||
5037 | @lisp | |
5038 | multiply-by-2 | |
5039 | @end lisp | |
5040 | ||
5041 | Notice how this differs from a @emph{string} with contents | |
5042 | ``multiply-by-2'', which is written with double quotation marks, like | |
5043 | this: | |
5044 | ||
5045 | @lisp | |
5046 | "multiply-by-2" | |
5047 | @end lisp | |
5048 | ||
5049 | Looking beyond how they are written, symbols are different from strings | |
5050 | in two important respects. | |
5051 | ||
5052 | The first important difference is uniqueness. If the same-looking | |
5053 | string is read twice from two different places in a program, the result | |
5054 | is two @emph{different} string objects whose contents just happen to be | |
5055 | the same. If, on the other hand, the same-looking symbol is read twice | |
5056 | from two different places in a program, the result is the @emph{same} | |
5057 | symbol object both times. | |
5058 | ||
5059 | Given two read symbols, you can use @code{eq?} to test whether they are | |
5060 | the same (that is, have the same name). @code{eq?} is the most | |
5061 | efficient comparison operator in Scheme, and comparing two symbols like | |
5062 | this is as fast as comparing, for example, two numbers. Given two | |
5063 | strings, on the other hand, you must use @code{equal?} or | |
5064 | @code{string=?}, which are much slower comparison operators, to | |
5065 | determine whether the strings have the same contents. | |
5066 | ||
5067 | @lisp | |
5068 | (define sym1 (quote hello)) | |
5069 | (define sym2 (quote hello)) | |
5070 | (eq? sym1 sym2) @result{} #t | |
5071 | ||
5072 | (define str1 "hello") | |
5073 | (define str2 "hello") | |
5074 | (eq? str1 str2) @result{} #f | |
5075 | (equal? str1 str2) @result{} #t | |
5076 | @end lisp | |
5077 | ||
5078 | The second important difference is that symbols, unlike strings, are not | |
5079 | self-evaluating. This is why we need the @code{(quote @dots{})}s in the | |
5080 | example above: @code{(quote hello)} evaluates to the symbol named | |
5081 | ``hello'' itself, whereas an unquoted @code{hello} is @emph{read} as the | |
5082 | symbol named ``hello'' and evaluated as a variable reference @dots{} about | |
5083 | which more below (@pxref{Symbol Variables}). | |
5084 | ||
5085 | @menu | |
5086 | * Symbol Data:: Symbols as discrete data. | |
5087 | * Symbol Keys:: Symbols as lookup keys. | |
5088 | * Symbol Variables:: Symbols as denoting variables. | |
5089 | * Symbol Primitives:: Operations related to symbols. | |
5090 | * Symbol Props:: Function slots and property lists. | |
5091 | * Symbol Read Syntax:: Extended read syntax for symbols. | |
5092 | * Symbol Uninterned:: Uninterned symbols. | |
5093 | @end menu | |
5094 | ||
5095 | ||
5096 | @node Symbol Data | |
5097 | @subsubsection Symbols as Discrete Data | |
5098 | ||
5099 | Numbers and symbols are similar to the extent that they both lend | |
5100 | themselves to @code{eq?} comparison. But symbols are more descriptive | |
5101 | than numbers, because a symbol's name can be used directly to describe | |
5102 | the concept for which that symbol stands. | |
5103 | ||
5104 | For example, imagine that you need to represent some colours in a | |
5105 | computer program. Using numbers, you would have to choose arbitrarily | |
5106 | some mapping between numbers and colours, and then take care to use that | |
5107 | mapping consistently: | |
5108 | ||
5109 | @lisp | |
5110 | ;; 1=red, 2=green, 3=purple | |
5111 | ||
5112 | (if (eq? (colour-of car) 1) | |
5113 | ...) | |
5114 | @end lisp | |
5115 | ||
5116 | @noindent | |
5117 | You can make the mapping more explicit and the code more readable by | |
5118 | defining constants: | |
5119 | ||
5120 | @lisp | |
5121 | (define red 1) | |
5122 | (define green 2) | |
5123 | (define purple 3) | |
5124 | ||
5125 | (if (eq? (colour-of car) red) | |
5126 | ...) | |
5127 | @end lisp | |
5128 | ||
5129 | @noindent | |
5130 | But the simplest and clearest approach is not to use numbers at all, but | |
5131 | symbols whose names specify the colours that they refer to: | |
5132 | ||
5133 | @lisp | |
5134 | (if (eq? (colour-of car) 'red) | |
5135 | ...) | |
5136 | @end lisp | |
5137 | ||
5138 | The descriptive advantages of symbols over numbers increase as the set | |
5139 | of concepts that you want to describe grows. Suppose that a car object | |
5140 | can have other properties as well, such as whether it has or uses: | |
5141 | ||
5142 | @itemize @bullet | |
5143 | @item | |
5144 | automatic or manual transmission | |
5145 | @item | |
5146 | leaded or unleaded fuel | |
5147 | @item | |
5148 | power steering (or not). | |
5149 | @end itemize | |
5150 | ||
5151 | @noindent | |
5152 | Then a car's combined property set could be naturally represented and | |
5153 | manipulated as a list of symbols: | |
5154 | ||
5155 | @lisp | |
5156 | (properties-of car1) | |
5157 | @result{} | |
5158 | (red manual unleaded power-steering) | |
5159 | ||
5160 | (if (memq 'power-steering (properties-of car1)) | |
5161 | (display "Unfit people can drive this car.\n") | |
5162 | (display "You'll need strong arms to drive this car!\n")) | |
5163 | @print{} | |
5164 | Unfit people can drive this car. | |
5165 | @end lisp | |
5166 | ||
5167 | Remember, the fundamental property of symbols that we are relying on | |
5168 | here is that an occurrence of @code{'red} in one part of a program is an | |
5169 | @emph{indistinguishable} symbol from an occurrence of @code{'red} in | |
5170 | another part of a program; this means that symbols can usefully be | |
5171 | compared using @code{eq?}. At the same time, symbols have naturally | |
5172 | descriptive names. This combination of efficiency and descriptive power | |
5173 | makes them ideal for use as discrete data. | |
5174 | ||
5175 | ||
5176 | @node Symbol Keys | |
5177 | @subsubsection Symbols as Lookup Keys | |
5178 | ||
5179 | Given their efficiency and descriptive power, it is natural to use | |
5180 | symbols as the keys in an association list or hash table. | |
5181 | ||
5182 | To illustrate this, consider a more structured representation of the car | |
5183 | properties example from the preceding subsection. Rather than | |
5184 | mixing all the properties up together in a flat list, we could use an | |
5185 | association list like this: | |
5186 | ||
5187 | @lisp | |
5188 | (define car1-properties '((colour . red) | |
5189 | (transmission . manual) | |
5190 | (fuel . unleaded) | |
5191 | (steering . power-assisted))) | |
5192 | @end lisp | |
5193 | ||
5194 | Notice how this structure is more explicit and extensible than the flat | |
5195 | list. For example it makes clear that @code{manual} refers to the | |
5196 | transmission rather than, say, the windows or the locking of the car. | |
5197 | It also allows further properties to use the same symbols among their | |
5198 | possible values without becoming ambiguous: | |
5199 | ||
5200 | @lisp | |
5201 | (define car1-properties '((colour . red) | |
5202 | (transmission . manual) | |
5203 | (fuel . unleaded) | |
5204 | (steering . power-assisted) | |
5205 | (seat-colour . red) | |
5206 | (locking . manual))) | |
5207 | @end lisp | |
5208 | ||
5209 | With a representation like this, it is easy to use the efficient | |
5210 | @code{assq-XXX} family of procedures (@pxref{Association Lists}) to | |
5211 | extract or change individual pieces of information: | |
5212 | ||
5213 | @lisp | |
5214 | (assq-ref car1-properties 'fuel) @result{} unleaded | |
5215 | (assq-ref car1-properties 'transmission) @result{} manual | |
5216 | ||
5217 | (assq-set! car1-properties 'seat-colour 'black) | |
5218 | @result{} | |
5219 | ((colour . red) | |
5220 | (transmission . manual) | |
5221 | (fuel . unleaded) | |
5222 | (steering . power-assisted) | |
5223 | (seat-colour . black) | |
5224 | (locking . manual)) | |
5225 | @end lisp | |
5226 | ||
5227 | Hash tables also have keys, and exactly the same arguments apply to the | |
5228 | use of symbols in hash tables as in association lists. The hash value | |
5229 | that Guile uses to decide where to add a symbol-keyed entry to a hash | |
5230 | table can be obtained by calling the @code{symbol-hash} procedure: | |
5231 | ||
5232 | @deffn {Scheme Procedure} symbol-hash symbol | |
5233 | @deffnx {C Function} scm_symbol_hash (symbol) | |
5234 | Return a hash value for @var{symbol}. | |
5235 | @end deffn | |
5236 | ||
5237 | @xref{Hash Tables}, for information about hash tables in general, and | |
5238 | for why you might choose to use a hash table rather than an association | |
5239 | list. | |
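
As a rough sketch of the same car properties kept in a hash table, the
following uses the @code{hashq-} procedures, which compare keys with
@code{eq?} and therefore suit symbol keys.  The table name @code{car1}
and the default value @code{unknown} are made up for the example.

@lisp
(define car1 (make-hash-table))

(hashq-set! car1 'colour 'red)
(hashq-set! car1 'fuel 'unleaded)

(hashq-ref car1 'fuel)                    ; @result{} unleaded

;; An optional third argument is returned for a missing key.
(hashq-ref car1 'seat-colour 'unknown)    ; @result{} unknown

;; The hash value itself is an integer whose exact value is
;; implementation-dependent.
(integer? (symbol-hash 'colour))          ; @result{} #t
@end lisp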
5240 | ||
5241 | ||
5242 | @node Symbol Variables | |
5243 | @subsubsection Symbols as Denoting Variables | |
5244 | ||
5245 | When an unquoted symbol in a Scheme program is evaluated, it is | |
5246 | interpreted as a variable reference, and the result of the evaluation is | |
5247 | the appropriate variable's value. | |
5248 | ||
5249 | For example, when the expression @code{(string-length "abcd")} is read | |
5250 | and evaluated, the sequence of characters @code{string-length} is read | |
5251 | as the symbol whose name is ``string-length''. This symbol is associated | |
5252 | with a variable whose value is the procedure that implements string | |
5253 | length calculation. Therefore evaluation of the @code{string-length} | |
5254 | symbol results in that procedure. | |
5255 | ||
5256 | The details of the connection between an unquoted symbol and the | |
5257 | variable to which it refers are explained elsewhere. See @ref{Binding | |
5258 | Constructs}, for how associations between symbols and variables are | |
5259 | created, and @ref{Modules}, for how those associations are affected by | |
5260 | Guile's module system. | |
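
The difference between a symbol used as data and a symbol used as a
variable reference can be sketched as follows; the variable
@code{answer} is hypothetical and exists only for this illustration.

@lisp
(define answer 42)

;; Unquoted: evaluated as a variable reference.
answer                          ; @result{} 42

;; Quoted: evaluates to the symbol itself.
'answer                         ; @result{} answer
(symbol? 'answer)               ; @result{} #t

;; string-length is an ordinary variable whose value is a procedure.
(procedure? string-length)      ; @result{} #t
(string-length "abcd")          ; @result{} 4
@end lisp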
5261 | ||
5262 | ||
5263 | @node Symbol Primitives | |
5264 | @subsubsection Operations Related to Symbols | |
5265 | ||
5266 | Given any Scheme value, you can determine whether it is a symbol using | |
5267 | the @code{symbol?} primitive: | |
5268 | ||
5269 | @rnindex symbol? | |
5270 | @deffn {Scheme Procedure} symbol? obj | |
5271 | @deffnx {C Function} scm_symbol_p (obj) | |
5272 | Return @code{#t} if @var{obj} is a symbol, otherwise return | |
5273 | @code{#f}. | |
5274 | @end deffn | |
5275 | ||
c9dc8c6c MV |
5276 | @deftypefn {C Function} int scm_is_symbol (SCM val) |
5277 | Equivalent to @code{scm_is_true (scm_symbol_p (val))}. | |
5278 | @end deftypefn | |
5279 | ||
07d83abe MV |
5280 | Once you know that you have a symbol, you can obtain its name as a |
5281 | string by calling @code{symbol->string}. Note that Guile differs by | |
5282 | default from R5RS on the details of @code{symbol->string} as regards | |
5283 | case-sensitivity: | |
5284 | ||
5285 | @rnindex symbol->string | |
5286 | @deffn {Scheme Procedure} symbol->string s | |
5287 | @deffnx {C Function} scm_symbol_to_string (s) | |
5288 | Return the name of symbol @var{s} as a string. By default, Guile reads | |
5289 | symbols case-sensitively, so the string returned will have the same case | |
5290 | variation as the sequence of characters that caused @var{s} to be | |
5291 | created. | |
5292 | ||
5293 | If Guile is set to read symbols case-insensitively (as specified by | |
5294 | R5RS), and @var{s} comes into being as part of a literal expression | |
5295 | (@pxref{Literal expressions,,,r5rs, The Revised^5 Report on Scheme}) or | |
5296 | by a call to the @code{read} or @code{string-ci->symbol} procedures, | |
5297 | Guile converts any alphabetic characters in the symbol's name to | |
5298 | lower case before creating the symbol object, so the string returned | |
5299 | here will be in lower case. | |
5300 | ||
5301 | If @var{s} was created by @code{string->symbol}, the case of characters | |
5302 | in the string returned will be the same as that in the string that was | |
5303 | passed to @code{string->symbol}, regardless of Guile's case-sensitivity | |
5304 | setting at the time @var{s} was created. | |
5305 | ||
5306 | It is an error to apply mutation procedures like @code{string-set!} to | |
5307 | strings returned by this procedure. | |
5308 | @end deffn | |
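
If you need a string that you can modify, one option is to copy the
result first. The following is a minimal sketch; the variable
@code{name} is only illustrative:

@lisp
;; Copy the symbol's name so that the copy can be mutated safely.
(define name (string-copy (symbol->string 'abc)))
(string-set! name 0 #\x)
name                   @result{} "xbc"
(symbol->string 'abc)  @result{} "abc"   ; the symbol is unaffected
@end lisp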
5309 | ||
5310 | Most symbols are created by writing them literally in code. However, it | |
5311 | is also possible to create symbols programmatically using the following | |
c5fc8f8c JG |
5312 | procedures: |
5313 | ||
5314 | @deffn {Scheme Procedure} symbol char@dots{} | |
5315 | @rnindex symbol | |
5316 | Return a newly allocated symbol made from the given character arguments. | |
5317 | ||
5318 | @example | |
5319 | (symbol #\x #\y #\z) @result{} xyz | |
5320 | @end example | |
5321 | @end deffn | |
5322 | ||
5323 | @deffn {Scheme Procedure} list->symbol lst | |
5324 | @rnindex list->symbol | |
5325 | Return a newly allocated symbol made from a list of characters. | |
5326 | ||
5327 | @example | |
5328 | (list->symbol '(#\a #\b #\c)) @result{} abc | |
5329 | @end example | |
5330 | @end deffn | |
5331 | ||
5332 | @rnindex symbol-append | |
df0a1002 | 5333 | @deffn {Scheme Procedure} symbol-append arg @dots{} |
c5fc8f8c | 5334 | Return a newly allocated symbol whose characters form the |
df0a1002 | 5335 | concatenation of the given symbols, @var{arg} @enddots{}. |
c5fc8f8c JG |
5336 | |
5337 | @example | |
5338 | (let ((h 'hello)) | |
5339 | (symbol-append h 'world)) | |
5340 | @result{} helloworld | |
5341 | @end example | |
5342 | @end deffn | |
07d83abe MV |
5343 | |
5344 | @rnindex string->symbol | |
5345 | @deffn {Scheme Procedure} string->symbol string | |
5346 | @deffnx {C Function} scm_string_to_symbol (string) | |
5347 | Return the symbol whose name is @var{string}. This procedure can create | |
5348 | symbols with names containing special characters or letters in the | |
5349 | non-standard case, but it is usually a bad idea to create such symbols | |
5350 | because in some implementations of Scheme they cannot be read as | |
5351 | themselves. | |
5352 | @end deffn | |
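
As a brief illustration, the sketch below creates a symbol whose name
contains a space. Recent Guile versions are assumed here; they write
such a symbol using the extended read syntax (@pxref{Symbol Read Syntax}):

@lisp
(string->symbol "hello world")     @result{} #@{hello world@}#
(symbol->string
 (string->symbol "hello world"))   @result{} "hello world"
(eq? (string->symbol "abc") 'abc)  @result{} #t
@end lisp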
5353 | ||
5354 | @deffn {Scheme Procedure} string-ci->symbol str | |
5355 | @deffnx {C Function} scm_string_ci_to_symbol (str) | |
5356 | Return the symbol whose name is @var{str}. If Guile is currently | |
5357 | reading symbols case-insensitively, @var{str} is converted to lowercase | |
5358 | before the returned symbol is looked up or created. | |
5359 | @end deffn | |
5360 | ||
5361 | The following examples illustrate Guile's detailed behaviour as regards | |
5362 | the case-sensitivity of symbols: | |
5363 | ||
5364 | @lisp | |
5365 | (read-enable 'case-insensitive) ; R5RS compliant behaviour | |
5366 | ||
5367 | (symbol->string 'flying-fish) @result{} "flying-fish" | |
5368 | (symbol->string 'Martin) @result{} "martin" | |
5369 | (symbol->string | |
5370 | (string->symbol "Malvina")) @result{} "Malvina" | |
5371 | ||
5372 | (eq? 'mISSISSIppi 'mississippi) @result{} #t | |
5373 | (string->symbol "mISSISSIppi") @result{} mISSISSIppi | |
5374 | (eq? 'bitBlt (string->symbol "bitBlt")) @result{} #f | |
5375 | (eq? 'LolliPop | |
5376 | (string->symbol (symbol->string 'LolliPop))) @result{} #t | |
5377 | (string=? "K. Harper, M.D." | |
5378 | (symbol->string | |
5379 | (string->symbol "K. Harper, M.D."))) @result{} #t | |
5380 | ||
5381 | (read-disable 'case-insensitive) ; Guile default behaviour | |
5382 | ||
5383 | (symbol->string 'flying-fish) @result{} "flying-fish" | |
5384 | (symbol->string 'Martin) @result{} "Martin" | |
5385 | (symbol->string | |
5386 | (string->symbol "Malvina")) @result{} "Malvina" | |
5387 | ||
5388 | (eq? 'mISSISSIppi 'mississippi) @result{} #f | |
5389 | (string->symbol "mISSISSIppi") @result{} mISSISSIppi | |
5390 | (eq? 'bitBlt (string->symbol "bitBlt")) @result{} #t | |
5391 | (eq? 'LolliPop | |
5392 | (string->symbol (symbol->string 'LolliPop))) @result{} #t | |
5393 | (string=? "K. Harper, M.D." | |
5394 | (symbol->string | |
5395 | (string->symbol "K. Harper, M.D."))) @result{} #t | |
5396 | @end lisp | |
5397 | ||
5398 | From C, there are lower level functions that construct a Scheme symbol | |
c48c62d0 MV |
5399 | from a C string in the current locale encoding. |
5400 | ||
5401 | When you want to do more from C, you should convert between symbols | |
5402 | and strings using @code{scm_symbol_to_string} and | |
5403 | @code{scm_string_to_symbol} and work with the strings. | |
07d83abe | 5404 | |
a71e79c3 MW |
5405 | @deftypefn {C Function} SCM scm_from_latin1_symbol (const char *name) |
5406 | @deftypefnx {C Function} SCM scm_from_utf8_symbol (const char *name) | |
ce3ce21c MW |
5407 | Construct and return a Scheme symbol whose name is specified by the |
5408 | null-terminated C string @var{name}. These are appropriate when | |
5409 | the C string is hard-coded in the source code. | |
5f6ffd66 | 5410 | @end deftypefn |
ce3ce21c | 5411 | |
a71e79c3 MW |
5412 | @deftypefn {C Function} SCM scm_from_locale_symbol (const char *name) |
5413 | @deftypefnx {C Function} SCM scm_from_locale_symboln (const char *name, size_t len) | |
07d83abe | 5414 | Construct and return a Scheme symbol whose name is specified by |
c48c62d0 MV |
5415 | @var{name}. For @code{scm_from_locale_symbol}, @var{name} must be null |
5416 | terminated; for @code{scm_from_locale_symboln} the length of @var{name} is | |
07d83abe | 5417 | specified explicitly by @var{len}. |
ce3ce21c MW |
5418 | |
5419 | Note that these functions should @emph{not} be used when @var{name} is a | |
5420 | C string constant, because there is no guarantee that the current locale | |
a71e79c3 MW |
5421 | will match that of the execution character set, used for string and |
5422 | character constants. Most modern C compilers use UTF-8 by default, so | |
5423 | in such cases we recommend @code{scm_from_utf8_symbol}. | |
5f6ffd66 | 5424 | @end deftypefn |
07d83abe | 5425 | |
fd0a5bbc HWN |
5426 | @deftypefn {C Function} SCM scm_take_locale_symbol (char *str) |
5427 | @deftypefnx {C Function} SCM scm_take_locale_symboln (char *str, size_t len) | |
5428 | Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln}, | |
5429 | respectively, but also eventually frees @var{str} with @code{free}. | |
5430 | Thus, you can use these functions when you would free @var{str} anyway | |
5431 | immediately after creating the Scheme symbol. In certain cases, Guile | |
5432 | can then use @var{str} directly as its internal representation. | |
5433 | @end deftypefn | |
5434 | ||
071bb6a8 LC |
5435 | The size of a symbol can also be obtained from C: |
5436 | ||
5437 | @deftypefn {C Function} size_t scm_c_symbol_length (SCM sym) | |
5438 | Return the number of characters in @var{sym}. | |
5439 | @end deftypefn | |
fd0a5bbc | 5440 | |
07d83abe MV |
5441 | Finally, some applications, especially those that generate new Scheme |
5442 | code dynamically, need to generate symbols for use in the generated | |
5443 | code. The @code{gensym} primitive meets this need: | |
5444 | ||
5445 | @deffn {Scheme Procedure} gensym [prefix] | |
5446 | @deffnx {C Function} scm_gensym (prefix) | |
5447 | Create a new symbol with a name constructed from a prefix and a counter | |
5448 | value. The string @var{prefix} can be specified as an optional | |
5449 | argument. The default prefix is @samp{@w{ g}}. The counter is increased by 1 | |
5450 | at each call. There is no provision for resetting the counter. | |
5451 | @end deffn | |
5452 | ||
5453 | The symbols generated by @code{gensym} are @emph{likely} to be unique, | |
5454 | since their names begin with a space and it is only otherwise possible | |
5455 | to generate such symbols if a programmer goes out of their way to do | |
5456 | so. Uniqueness can be guaranteed by instead using uninterned symbols | |
5457 | (@pxref{Symbol Uninterned}), though they can't be usefully written out | |
5458 | and read back in. | |
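
A short sketch of @code{gensym} in use follows. The exact names
produced depend on the current counter value, so the example tests
properties of the results rather than showing the names themselves; the
variables @code{a} and @code{b} are only illustrative:

@lisp
(define a (gensym))        ; default prefix
(define b (gensym "tmp"))  ; caller-supplied prefix

(symbol? a)                                @result{} #t
(eq? a b)                                  @result{} #f
(string-prefix? "tmp" (symbol->string b))  @result{} #t
(symbol-interned? a)                       @result{} #t
@end lisp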
5459 | ||
5460 | ||
5461 | @node Symbol Props | |
5462 | @subsubsection Function Slots and Property Lists | |
5463 | ||
5464 | In traditional Lisp dialects, symbols are often understood as having | |
5465 | three kinds of value at once: | |
5466 | ||
5467 | @itemize @bullet | |
5468 | @item | |
5469 | a @dfn{variable} value, which is used when the symbol appears in | |
5470 | code in a variable reference context | |
5471 | ||
5472 | @item | |
5473 | a @dfn{function} value, which is used when the symbol appears in | |
679cceed | 5474 | code in a function name position (i.e.@: as the first element in an |
07d83abe MV |
5475 | unquoted list) |
5476 | ||
5477 | @item | |
5478 | a @dfn{property list} value, which is used when the symbol is given as | |
5479 | the first argument to Lisp's @code{put} or @code{get} functions. | |
5480 | @end itemize | |
5481 | ||
5482 | Although Scheme (as one of its simplifications with respect to Lisp) | |
5483 | does away with the distinction between variable and function namespaces, | |
5484 | Guile currently retains some elements of the traditional structure in | |
5485 | case they turn out to be useful when implementing translators for other | |
5486 | languages, in particular Emacs Lisp. | |
5487 | ||
ecb87335 RW |
5488 | Specifically, Guile symbols have two extra slots, one for a symbol's |
5489 | property list, and one for its ``function value.'' The following procedures | |
07d83abe MV |
5490 | are provided to access these slots. |
5491 | ||
5492 | @deffn {Scheme Procedure} symbol-fref symbol | |
5493 | @deffnx {C Function} scm_symbol_fref (symbol) | |
5494 | Return the contents of @var{symbol}'s @dfn{function slot}. | |
5495 | @end deffn | |
5496 | ||
5497 | @deffn {Scheme Procedure} symbol-fset! symbol value | |
5498 | @deffnx {C Function} scm_symbol_fset_x (symbol, value) | |
5499 | Set the contents of @var{symbol}'s function slot to @var{value}. | |
5500 | @end deffn | |
5501 | ||
5502 | @deffn {Scheme Procedure} symbol-pref symbol | |
5503 | @deffnx {C Function} scm_symbol_pref (symbol) | |
5504 | Return the @dfn{property list} currently associated with @var{symbol}. | |
5505 | @end deffn | |
5506 | ||
5507 | @deffn {Scheme Procedure} symbol-pset! symbol value | |
5508 | @deffnx {C Function} scm_symbol_pset_x (symbol, value) | |
5509 | Set @var{symbol}'s property list to @var{value}. | |
5510 | @end deffn | |
5511 | ||
5512 | @deffn {Scheme Procedure} symbol-property sym prop | |
5513 | From @var{sym}'s property list, return the value for property | |
5514 | @var{prop}. The assumption is that @var{sym}'s property list is an | |
5515 | association list whose keys are distinguished from each other using | |
5516 | @code{equal?}; @var{prop} should be one of the keys in that list. If | |
5517 | the property list has no entry for @var{prop}, @code{symbol-property} | |
5518 | returns @code{#f}. | |
5519 | @end deffn | |
5520 | ||
5521 | @deffn {Scheme Procedure} set-symbol-property! sym prop val | |
5522 | In @var{sym}'s property list, set the value for property @var{prop} to | |
5523 | @var{val}, or add a new entry for @var{prop}, with value @var{val}, if | |
5524 | none already exists. For the structure of the property list, see | |
5525 | @code{symbol-property}. | |
5526 | @end deffn | |
5527 | ||
5528 | @deffn {Scheme Procedure} symbol-property-remove! sym prop | |
5529 | From @var{sym}'s property list, remove the entry for property | |
5530 | @var{prop}, if there is one. For the structure of the property list, | |
5531 | see @code{symbol-property}. | |
5532 | @end deffn | |
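
The three property procedures can be sketched together as follows. The
symbol @code{apple} and the property names @code{color} and
@code{weight} are made up for illustration, and the example assumes a
Guile version that still provides these procedures:

@lisp
(set-symbol-property! 'apple 'color "red")
(symbol-property 'apple 'color)   @result{} "red"
(symbol-property 'apple 'weight)  @result{} #f    ; no such entry

(symbol-property-remove! 'apple 'color)
(symbol-property 'apple 'color)   @result{} #f    ; entry was removed
@end lisp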
5533 | ||
5534 | Support for these extra slots may be removed in a future release, and it | |
4695789c NJ |
5535 | is probably better to avoid using them. For a more modern and Schemely |
5536 | approach to properties, see @ref{Object Properties}. | |
07d83abe MV |
5537 | |
5538 | ||
5539 | @node Symbol Read Syntax | |
5540 | @subsubsection Extended Read Syntax for Symbols | |
5541 | ||
5542 | The read syntax for a symbol is a sequence of letters, digits, and | |
5543 | @dfn{extended alphabetic characters}, beginning with a character that | |
5544 | cannot begin a number. In addition, the special cases of @code{+}, | |
5545 | @code{-}, and @code{...} are read as symbols even though numbers can | |
5546 | begin with @code{+}, @code{-} or @code{.}. | |
5547 | ||
5548 | Extended alphabetic characters may be used within identifiers as if | |
5549 | they were letters. The set of extended alphabetic characters is: | |
5550 | ||
5551 | @example | |
5552 | ! $ % & * + - . / : < = > ? @@ ^ _ ~ | |
5553 | @end example | |
5554 | ||
5555 | In addition to the standard read syntax defined above (which is taken | |
5556 | from R5RS (@pxref{Formal syntax,,,r5rs,The Revised^5 Report on | |
5557 | Scheme})), Guile provides an extended symbol read syntax that allows the | |
5558 | inclusion of unusual characters such as space characters, newlines and | |
5559 | parentheses. If (for whatever reason) you need to write a symbol | |
5560 | containing characters not mentioned above, you can do so as follows. | |
5561 | ||
5562 | @itemize @bullet | |
5563 | @item | |
5564 | Begin the symbol with the characters @code{#@{}, | |
5565 | ||
5566 | @item | |
5567 | write the characters of the symbol and | |
5568 | ||
5569 | @item | |
5570 | finish the symbol with the characters @code{@}#}. | |
5571 | @end itemize | |
5572 | ||
5573 | Here are a few examples of this form of read syntax. The first symbol | |
5574 | needs to use extended syntax because it contains a space character, the | |
5575 | second because it contains a line break, and the last because it looks | |
5576 | like a number. | |
5577 | ||
5578 | @lisp | |
5579 | #@{foo bar@}# | |
5580 | ||
5581 | #@{what | |
5582 | ever@}# | |
5583 | ||
5584 | #@{4242@}# | |
5585 | @end lisp | |
5586 | ||
5587 | Although Guile provides this extended read syntax for symbols, | |
5588 | widespread usage of it is discouraged because it is not portable and not | |
5589 | very readable. | |
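
Symbols written this way are ordinary symbols. The following sketch
shows that they are the same objects that @code{string->symbol} returns
for the same name:

@lisp
(symbol->string '#@{foo bar@}#)                   @result{} "foo bar"
(eq? '#@{foo bar@}# (string->symbol "foo bar"))   @result{} #t
(symbol? '#@{4242@}#)                             @result{} #t
(number? '#@{4242@}#)                             @result{} #f
@end lisp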
5590 | ||
dc59631d MW |
5591 | Alternatively, if you enable the @code{r7rs-symbols} read option | |
5592 | (@pxref{Scheme Read}), you can write arbitrary symbols using the same | |
5593 | notation used for strings, except delimited by vertical bars instead of | |
5594 | double quotes. | |
5595 | ||
5596 | @example | |
5597 | |foo bar| | |
5598 | |\x3BB; is a greek lambda| | |
5599 | |\| is a vertical bar| | |
5600 | @end example | |
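
As a sketch, assuming a Guile version that supports the
@code{r7rs-symbols} option and that the forms are entered one at a time
at the REPL (so that the option is in effect before the later forms are
read):

@lisp
(read-enable 'r7rs-symbols)

(symbol->string '|foo bar|)      @result{} "foo bar"
(eq? '|foo bar| '#@{foo bar@}#)    @result{} #t
@end lisp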
07d83abe MV |
5601 | |
5602 | @node Symbol Uninterned | |
5603 | @subsubsection Uninterned Symbols | |
5604 | ||
5605 | What makes symbols useful is that they are automatically kept unique. | |
5606 | There are no two symbols that are distinct objects but have the same | |
5607 | name. But of course, there is no rule without exception. In addition | |
5608 | to the normal symbols that have been discussed up to now, you can also | |
5609 | create special @dfn{uninterned} symbols that behave slightly | |
5610 | differently. | |
5611 | ||
5612 | To understand what is different about them and why they might be useful, | |
5613 | we look at how normal symbols are actually kept unique. | |
5614 | ||
5615 | Whenever Guile wants to find the symbol with a specific name, for | |
5616 | example during @code{read} or when executing @code{string->symbol}, it | |
5617 | first looks into a table of all existing symbols to find out whether a | |
5618 | symbol with the given name already exists. When this is the case, Guile | |
5619 | just returns that symbol. When not, a new symbol with the name is | |
5620 | created and entered into the table so that it can be found later. | |
5621 | ||
5622 | Sometimes you might want to create a symbol that is guaranteed ``fresh'', | |
679cceed | 5623 | i.e.@: a symbol that did not exist previously. You might also want to |
07d83abe MV |
5624 | somehow guarantee that no one else will ever unintentionally stumble |
5625 | across your symbol in the future. These properties of a symbol are | |
5626 | often needed when generating code during macro expansion. When | |
5627 | introducing new temporary variables, you want to guarantee that they | |
5628 | don't conflict with variables in other people's code. | |
5629 | ||
5630 | The simplest way to arrange for this is to create a new symbol but | |
5631 | not enter it into the global table of all symbols. That way, no one | |
5632 | will ever get access to your symbol by chance. Symbols that are not in | |
5633 | the table are called @dfn{uninterned}. Of course, symbols that | |
5634 | @emph{are} in the table are called @dfn{interned}. | |
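
As a sketch of this use case, the hypothetical macro @code{my-swap!}
below uses @code{make-symbol} to name its temporary variable. The macro
and the variables @code{tmp} and @code{other} are illustrative only.
Because the temporary is uninterned, it cannot collide with a caller's
variable, even one that is itself called @code{tmp}:

@lisp
(define-macro (my-swap! a b)
  ;; The temporary's name is an uninterned symbol, so it is
  ;; distinct from any symbol the caller could have written.
  (let ((tmp (make-symbol "tmp")))
    `(let ((,tmp ,a))
       (set! ,a ,b)
       (set! ,b ,tmp))))

(define tmp 1)
(define other 2)
(my-swap! tmp other)
(list tmp other)
@result{} (2 1)
@end lisp

Had the macro used the interned symbol @code{tmp} instead, the
expansion would have shadowed the caller's @code{tmp} and the swap
would have had no effect.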
5635 | ||
5636 | You create new uninterned symbols with the function @code{make-symbol}. | |
5637 | You can test whether a symbol is interned or not with | |
5638 | @code{symbol-interned?}. | |
5639 | ||
5640 | Uninterned symbols break the rule that the name of a symbol uniquely | |
5641 | identifies the symbol object. Because of this, they cannot be written | |
5642 | out and read back in like interned symbols. Currently, Guile has no | |
5643 | support for reading uninterned symbols. Note that the function | |
5644 | @code{gensym} does not return uninterned symbols for this reason. | |
5645 | ||
5646 | @deffn {Scheme Procedure} make-symbol name | |
5647 | @deffnx {C Function} scm_make_symbol (name) | |
5648 | Return a new uninterned symbol with the name @var{name}. The returned | |
5649 | symbol is guaranteed to be unique and future calls to | |
5650 | @code{string->symbol} will not return it. | |
5651 | @end deffn | |
5652 | ||
5653 | @deffn {Scheme Procedure} symbol-interned? symbol | |
5654 | @deffnx {C Function} scm_symbol_interned_p (symbol) | |
5655 | Return @code{#t} if @var{symbol} is interned, otherwise return | |
5656 | @code{#f}. | |
5657 | @end deffn | |
5658 | ||
5659 | For example: | |
5660 | ||
5661 | @lisp | |
5662 | (define foo-1 (string->symbol "foo")) | |
5663 | (define foo-2 (string->symbol "foo")) | |
5664 | (define foo-3 (make-symbol "foo")) | |
5665 | (define foo-4 (make-symbol "foo")) | |
5666 | ||
5667 | (eq? foo-1 foo-2) | |
5668 | @result{} #t | |
5669 | ; Two interned symbols with the same name are the same object, | |
5670 | ||
5671 | (eq? foo-1 foo-3) | |
5672 | @result{} #f | |
5673 | ; but a call to make-symbol with the same name returns a | |
5674 | ; distinct object. | |
5675 | ||
5676 | (eq? foo-3 foo-4) | |
5677 | @result{} #f | |
5678 | ; A call to make-symbol always returns a new object, even for | |
5679 | ; the same name. | |
5680 | ||
5681 | foo-3 | |
5682 | @result{} #<uninterned-symbol foo 8085290> | |
5683 | ; Uninterned symbols print differently from interned symbols, | |
5684 | ||
5685 | (symbol? foo-3) | |
5686 | @result{} #t | |
5687 | ; but they are still symbols, | |
5688 | ||
5689 | (symbol-interned? foo-3) | |
5690 | @result{} #f | |
5691 | ; just not interned. | |
5692 | @end lisp | |
5693 | ||
5694 | ||
5695 | @node Keywords | |
5696 | @subsection Keywords | |
5697 | @tpindex Keywords | |
5698 | ||
5699 | Keywords are self-evaluating objects with a convenient read syntax that | |
5700 | makes them easy to type. | |
5701 | ||
5702 | Guile's keyword support conforms to R5RS, and adds a (switchable) read | |
5703 | syntax extension to permit keywords to begin with @code{:} as well as | |
ef4cbc08 | 5704 | @code{#:}, or to end with @code{:}. |
07d83abe MV |
5705 | |
5706 | @menu | |
5707 | * Why Use Keywords?:: Motivation for keyword usage. | |
5708 | * Coding With Keywords:: How to use keywords. | |
5709 | * Keyword Read Syntax:: Read syntax for keywords. | |
5710 | * Keyword Procedures:: Procedures for dealing with keywords. | |
07d83abe MV |
5711 | @end menu |
5712 | ||
5713 | @node Why Use Keywords? | |
5714 | @subsubsection Why Use Keywords? | |
5715 | ||
5716 | Keywords are useful in contexts where a program or procedure wants to be | |
5717 | able to accept a large number of optional arguments without making its | |
5718 | interface unmanageable. | |
5719 | ||
5720 | To illustrate this, consider a hypothetical @code{make-window} | |
5721 | procedure, which creates a new window on the screen for drawing into | |
5722 | using some graphical toolkit. There are many parameters that the caller | |
5723 | might like to specify, but which could also be sensibly defaulted, for | |
5724 | example: | |
5725 | ||
5726 | @itemize @bullet | |
5727 | @item | |
5728 | color depth -- Default: the color depth for the screen | |
5729 | ||
5730 | @item | |
5731 | background color -- Default: white | |
5732 | ||
5733 | @item | |
5734 | width -- Default: 600 | |
5735 | ||
5736 | @item | |
5737 | height -- Default: 400 | |
5738 | @end itemize | |
5739 | ||
5740 | If @code{make-window} did not use keywords, the caller would have to | |
5741 | pass in a value for each possible argument, remembering the correct | |
5742 | argument order and using a special value to indicate the default value | |
5743 | for that argument: | |
5744 | ||
5745 | @lisp | |
5746 | (make-window 'default ;; Color depth | |
5747 | 'default ;; Background color | |
5748 | 800 ;; Width | |
5749 | 100 ;; Height | |
5750 | @dots{}) ;; More make-window arguments | |
5751 | @end lisp | |
5752 | ||
5753 | With keywords, on the other hand, defaulted arguments are omitted, and | |
5754 | non-default arguments are clearly tagged by the appropriate keyword. As | |
5755 | a result, the invocation becomes much clearer: | |
5756 | ||
5757 | @lisp | |
5758 | (make-window #:width 800 #:height 100) | |
5759 | @end lisp | |
5760 | ||
5761 | On the other hand, for a simpler procedure with few arguments, the use | |
5762 | of keywords would be a hindrance rather than a help. The primitive | |
5763 | procedure @code{cons}, for example, would not be improved if it had to | |
5764 | be invoked as | |
5765 | ||
5766 | @lisp | |
5767 | (cons #:car x #:cdr y) | |
5768 | @end lisp | |
5769 | ||
5770 | So the decision whether to use keywords or not is purely pragmatic: use | |
5771 | them if they will clarify the procedure invocation at the point of call. | |
5772 | ||
5773 | @node Coding With Keywords | |
5774 | @subsubsection Coding With Keywords | |
5775 | ||
5776 | If a procedure wants to support keywords, it should take a rest argument | |
5777 | and then use whatever means is convenient to extract keywords and their | |
5778 | corresponding arguments from the contents of that rest argument. | |
5779 | ||
5780 | The following example illustrates the principle: the code for | |
5781 | @code{make-window} uses a helper procedure called | |
5782 | @code{get-keyword-value} to extract individual keyword arguments from | |
5783 | the rest argument. | |
5784 | ||
5785 | @lisp | |
5786 | (define (get-keyword-value args keyword default) | |
5787 | (let ((kv (memq keyword args))) | |
5788 | (if (and kv (>= (length kv) 2)) | |
5789 | (cadr kv) | |
5790 | default))) | |
5791 | ||
5792 | (define (make-window . args) | |
5793 | (let ((depth (get-keyword-value args #:depth screen-depth)) | |
5794 | (bg (get-keyword-value args #:bg "white")) | |
5795 | (width (get-keyword-value args #:width 600)) | |
5796 | (height (get-keyword-value args #:height 400)) | |
5797 | @dots{}) | |
5798 | @dots{})) | |
5799 | @end lisp | |
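
To see how the helper behaves on its own, the sketch below repeats the
definition of @code{get-keyword-value} and applies it to some literal
argument lists chosen for illustration:

@lisp
(define (get-keyword-value args keyword default)
  (let ((kv (memq keyword args)))
    (if (and kv (>= (length kv) 2))
        (cadr kv)
        default)))

;; Keyword present: the value that follows it is returned.
(get-keyword-value '(#:width 800 #:height 100) #:width 600)
@result{} 800

;; Keyword absent: the default is returned.
(get-keyword-value '(#:width 800 #:height 100) #:bg "white")
@result{} "white"

;; Keyword present but with no value after it: the default is returned.
(get-keyword-value '(#:width) #:width 600)
@result{} 600
@end lisp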
5800 | ||
5801 | But you don't need to write @code{get-keyword-value}. The @code{(ice-9 | |
5802 | optargs)} module provides a set of powerful macros that you can use to | |
5803 | implement keyword-supporting procedures like this: | |
5804 | ||
5805 | @lisp | |
5806 | (use-modules (ice-9 optargs)) | |
5807 | ||
5808 | (define (make-window . args) | |
5809 | (let-keywords args #f ((depth screen-depth) | |
5810 | (bg "white") | |
5811 | (width 600) | |
5812 | (height 400)) | |
5813 | ...)) | |
5814 | @end lisp | |
5815 | ||
5816 | @noindent | |
5817 | Or, even more economically, like this: | |
5818 | ||
5819 | @lisp | |
5820 | (use-modules (ice-9 optargs)) | |
5821 | ||
5822 | (define* (make-window #:key (depth screen-depth) | |
5823 | (bg "white") | |
5824 | (width 600) | |
5825 | (height 400)) | |
5826 | ...) | |
5827 | @end lisp | |
5828 | ||
5829 | For further details on @code{let-keywords}, @code{define*} and other | |
5830 | facilities provided by the @code{(ice-9 optargs)} module, see | |
5831 | @ref{Optional Arguments}. | |
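
As a small runnable sketch of the @code{define*} form, the hypothetical
procedure @code{window-geometry} below simply returns its keyword
arguments as a list, so that the effect of omitting or reordering them
is visible:

@lisp
(use-modules (ice-9 optargs))

(define* (window-geometry #:key (width 600) (height 400))
  (list width height))

(window-geometry)                           @result{} (600 400)
(window-geometry #:height 100)              @result{} (600 100)
(window-geometry #:height 100 #:width 800)  @result{} (800 100)
@end lisp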
5832 | ||
a16d4e82 MW |
5833 | To handle keyword arguments from procedures implemented in C, |
5834 | use @code{scm_c_bind_keyword_arguments} (@pxref{Keyword Procedures}). | |
07d83abe MV |
5835 | |
5836 | @node Keyword Read Syntax | |
5837 | @subsubsection Keyword Read Syntax | |
5838 | ||
7719ef22 MV |
5839 | Guile, by default, only recognizes a keyword syntax that is compatible |
5840 | with R5RS. A token of the form @code{#:NAME}, where @code{NAME} has the | |
5841 | same syntax as a Scheme symbol (@pxref{Symbol Read Syntax}), is the | |
5842 | external representation of the keyword named @code{NAME}. Keyword | |
5843 | objects print using this syntax as well, so values containing keyword | |
5844 | objects can be read back into Guile. When used in an expression, | |
5845 | keywords are self-quoting objects. | |
07d83abe MV |
5846 | |
5847 | If the @code{keywords} read option is set to @code{'prefix}, Guile also | |
5848 | recognizes the alternative read syntax @code{:NAME}. Otherwise, tokens | |
5849 | of the form @code{:NAME} are read as symbols, as required by R5RS. | |
5850 | ||
ef4cbc08 LC |
5851 | @cindex SRFI-88 keyword syntax |
5852 | ||
5853 | If the @code{keywords} read option is set to @code{'postfix}, Guile | |
189681f5 LC |
5854 | recognizes the SRFI-88 read syntax @code{NAME:} (@pxref{SRFI-88}). |
5855 | Otherwise, tokens of this form are read as symbols. | |
ef4cbc08 | 5856 | |
07d83abe | 5857 | To enable and disable the alternative non-R5RS keyword syntax, you use |
1518f649 AW |
5858 | the @code{read-set!} procedure documented in @ref{Scheme Read}. Note that | |
5859 | the @code{prefix} and @code{postfix} syntaxes are mutually exclusive. | |
07d83abe | 5860 | |
aba0dff5 | 5861 | @lisp |
07d83abe MV |
5862 | (read-set! keywords 'prefix) |
5863 | ||
5864 | #:type | |
5865 | @result{} | |
5866 | #:type | |
5867 | ||
5868 | :type | |
5869 | @result{} | |
5870 | #:type | |
5871 | ||
ef4cbc08 LC |
5872 | (read-set! keywords 'postfix) |
5873 | ||
5874 | type: | |
5875 | @result{} | |
5876 | #:type | |
5877 | ||
5878 | :type | |
5879 | @result{} | |
5880 | :type | |
5881 | ||
07d83abe MV |
5882 | (read-set! keywords #f) |
5883 | ||
5884 | #:type | |
5885 | @result{} | |
5886 | #:type | |
5887 | ||
5888 | :type | |
5889 | @print{} | |
5890 | ERROR: In expression :type: | |
5891 | ERROR: Unbound variable: :type | |
5892 | ABORT: (unbound-variable) | |
aba0dff5 | 5893 | @end lisp |
07d83abe MV |
5894 | |
5895 | @node Keyword Procedures | |
5896 | @subsubsection Keyword Procedures | |
5897 | ||
07d83abe MV |
5898 | @deffn {Scheme Procedure} keyword? obj |
5899 | @deffnx {C Function} scm_keyword_p (obj) | |
5900 | Return @code{#t} if the argument @var{obj} is a keyword, else | |
5901 | @code{#f}. | |
5902 | @end deffn | |
5903 | ||
7719ef22 MV |
5904 | @deffn {Scheme Procedure} keyword->symbol keyword |
5905 | @deffnx {C Function} scm_keyword_to_symbol (keyword) | |
5906 | Return the symbol with the same name as @var{keyword}. | |
07d83abe MV |
5907 | @end deffn |
5908 | ||
7719ef22 MV |
5909 | @deffn {Scheme Procedure} symbol->keyword symbol |
5910 | @deffnx {C Function} scm_symbol_to_keyword (symbol) | |
5911 | Return the keyword with the same name as @var{symbol}. | |
5912 | @end deffn | |
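
The Scheme-level procedures above can be sketched together as follows,
assuming the default @code{#:} keyword read syntax:

@lisp
(keyword? #:foo)                     @result{} #t
(keyword? 'foo)                      @result{} #f
(keyword->symbol #:foo)              @result{} foo
(symbol->keyword 'foo)               @result{} #:foo
(eq? #:foo (symbol->keyword 'foo))   @result{} #t
@end lisp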
07d83abe | 5913 | |
7719ef22 MV |
5914 | @deftypefn {C Function} int scm_is_keyword (SCM obj) |
5915 | Equivalent to @code{scm_is_true (scm_keyword_p (@var{obj}))}. | |
07d83abe MV |
5916 | @end deftypefn |
5917 | ||
c428e586 MW |
5918 | @deftypefn {C Function} SCM scm_from_locale_keyword (const char *name) |
5919 | @deftypefnx {C Function} SCM scm_from_locale_keywordn (const char *name, size_t len) | |
7719ef22 | 5920 | Equivalent to @code{scm_symbol_to_keyword (scm_from_locale_symbol |
c428e586 MW |
5921 | (@var{name}))} and @code{scm_symbol_to_keyword (scm_from_locale_symboln |
5922 | (@var{name}, @var{len}))}, respectively. | |
5923 | ||
5924 | Note that these functions should @emph{not} be used when @var{name} is a | |
5925 | C string constant, because there is no guarantee that the current locale | |
a71e79c3 MW |
5926 | will match that of the execution character set, used for string and |
5927 | character constants. Most modern C compilers use UTF-8 by default, so | |
5928 | in such cases we recommend @code{scm_from_utf8_keyword}. | |
c428e586 MW |
5929 | @end deftypefn |
5930 | ||
5931 | @deftypefn {C Function} SCM scm_from_latin1_keyword (const char *name) | |
5932 | @deftypefnx {C Function} SCM scm_from_utf8_keyword (const char *name) | |
5933 | Equivalent to @code{scm_symbol_to_keyword (scm_from_latin1_symbol | |
5934 | (@var{name}))} and @code{scm_symbol_to_keyword (scm_from_utf8_symbol | |
5935 | (@var{name}))}, respectively. | |
7719ef22 | 5936 | @end deftypefn |
07d83abe | 5937 | |
a16d4e82 MW |
5938 | @deftypefn {C Function} void scm_c_bind_keyword_arguments (const char *subr, @ |
5939 | SCM rest, scm_t_keyword_arguments_flags flags, @ | |
5940 | SCM keyword1, SCM *argp1, @ | |
5941 | @dots{}, @ | |
5942 | SCM keywordN, SCM *argpN, @ | |
5943 | @nicode{SCM_UNDEFINED}) | |
5944 | ||
5945 | Extract the specified keyword arguments from @var{rest}, which is not | |
5946 | modified. If the keyword argument @var{keyword1} is present in | |
5947 | @var{rest} with an associated value, that value is stored in the | |
5948 | variable pointed to by @var{argp1}, otherwise the variable is left | |
5949 | unchanged. Similarly for the other keywords and argument pointers up to | |
5950 | @var{keywordN} and @var{argpN}. The argument list to | |
5951 | @code{scm_c_bind_keyword_arguments} must be terminated by | |
5952 | @code{SCM_UNDEFINED}. | |
5953 | ||
5954 | Note that since the variables pointed to by @var{argp1} through | |
5955 | @var{argpN} are left unchanged if the associated keyword argument is not | |
5956 | present, they should be initialized to their default values before | |
5957 | calling @code{scm_c_bind_keyword_arguments}. Alternatively, you can | |
5958 | initialize them to @code{SCM_UNDEFINED} before the call, and then use | |
5959 | @code{SCM_UNBNDP} after the call to see which ones were provided. | |
5960 | ||
5961 | If an unrecognized keyword argument is present in @var{rest} and | |
5962 | @var{flags} does not contain @code{SCM_ALLOW_OTHER_KEYS}, or if | |
5963 | non-keyword arguments are present and @var{flags} does not contain | |
5964 | @code{SCM_ALLOW_NON_KEYWORD_ARGUMENTS}, an exception is raised. | |
5965 | @var{subr} should be the name of the procedure receiving the keyword | |
5966 | arguments, for purposes of error reporting. | |
5967 | ||
5968 | For example: | |
5969 | ||
5970 | @example | |
5971 | SCM k_delimiter; | |
5972 | SCM k_grammar; | |
5973 | SCM sym_infix; | |
5974 | ||
5975 | SCM my_string_join (SCM strings, SCM rest) | |
5976 | @{ | |
5977 | SCM delimiter = SCM_UNDEFINED; | |
5978 | SCM grammar = sym_infix; | |
5979 | ||
5980 | scm_c_bind_keyword_arguments ("my-string-join", rest, 0, | |
5981 | k_delimiter, &delimiter, | |
5982 | k_grammar, &grammar, | |
5983 | SCM_UNDEFINED); | |
5984 | ||
5985 | if (SCM_UNBNDP (delimiter)) | |
5986 | delimiter = scm_from_utf8_string (" "); | |
5987 | ||
5988 | return scm_string_join (strings, delimiter, grammar); | |
5989 | @} | |
5990 | ||
5991 | void my_init () | |
5992 | @{ | |
5993 | k_delimiter = scm_from_utf8_keyword ("delimiter"); | |
5994 | k_grammar = scm_from_utf8_keyword ("grammar"); | |
5995 | sym_infix = scm_from_utf8_symbol ("infix"); | |
5996 | scm_c_define_gsubr ("my-string-join", 1, 0, 1, my_string_join); | |
5997 | @} | |
5998 | @end example | |
5999 | @end deftypefn | |
6000 | ||
6001 | ||
07d83abe MV |
6002 | @node Other Types |
6003 | @subsection ``Functionality-Centric'' Data Types | |
6004 | ||
a136ada6 | 6005 | Procedures and macros are documented in their own sections: see |
e4955559 | 6006 | @ref{Procedures} and @ref{Macros}. |
07d83abe MV |
6007 | |
6008 | Variable objects are documented as part of the description of Guile's | |
6009 | module system: see @ref{Variables}. | |
6010 | ||
a136ada6 | 6011 | Asyncs, dynamic roots and fluids are described in the section on |
07d83abe MV |
6012 | scheduling: see @ref{Scheduling}. |
6013 | ||
a136ada6 | 6014 | Hooks are documented in the section on general utility functions: see |
07d83abe MV |
6015 | @ref{Hooks}. |
6016 | ||
a136ada6 | 6017 | Ports are described in the section on I/O: see @ref{Input and Output}. |
07d83abe | 6018 | |
a136ada6 NJ |
6019 | Regular expressions are described in their own section: see @ref{Regular |
6020 | Expressions}. | |
07d83abe MV |
6021 | |
6022 | @c Local Variables: | |
6023 | @c TeX-master: "guile.texi" | |
6024 | @c End: |