@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2002, 2003,
@c 2004, 2005 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
@comment  node-name,  next,  previous,  up
@chapter Strings and Characters
@cindex strings
@cindex character arrays
@cindex characters
@cindex bytes

 A string in Emacs Lisp is an array that contains an ordered sequence
of characters. Strings are used as names of symbols, buffers, and
files; to send messages to users; to hold text being copied between
buffers; and for many other purposes. Because strings are so important,
Emacs Lisp has many functions expressly for manipulating them. Emacs
Lisp programs use strings more often than individual characters.

 @xref{Strings of Events}, for special considerations for strings of
keyboard character events.

@menu
* Basics: String Basics.      Basic properties of strings and characters.
* Predicates for Strings::    Testing whether an object is a string or char.
* Creating Strings::          Functions to allocate new strings.
* Modifying Strings::         Altering the contents of an existing string.
* Text Comparison::           Comparing characters or strings.
* String Conversion::         Converting to and from characters and strings.
* Formatting Strings::        @code{format}: Emacs's analogue of @code{printf}.
* Case Conversion::           Case conversion functions.
* Case Tables::               Customizing case conversion.
@end menu

@node String Basics
@section String and Character Basics

 Characters are represented in Emacs Lisp as integers;
whether an integer is a character or not is determined only by how it is
used. Thus, strings really contain integers.

 The length of a string (like any array) is fixed, and cannot be
altered once the string exists. Strings in Lisp are @emph{not}
terminated by a distinguished character code. (By contrast, strings in
C are terminated by a character with @acronym{ASCII} code 0.)

 Since strings are arrays, and therefore sequences as well, you can
operate on them with the general array and sequence functions.
(@xref{Sequences Arrays Vectors}.) For example, you can access or
change individual characters in a string using the functions @code{aref}
and @code{aset} (@pxref{Array Functions}).
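
Because the elements of a string are integers (character codes), the
general array and sequence functions apply to a string directly. A few
illustrative calls:

@example
;; Each element of a string is an integer character code.
(aref "abc" 1)
     ;; @result{} 98
(integerp (aref "abc" 1))
     ;; @result{} t
(length "abc")
     ;; @result{} 3
(elt "abc" 0)
     ;; @result{} 97
@end example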

 There are two text representations for non-@acronym{ASCII} characters in
Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
Representations}). An @acronym{ASCII} character always occupies one byte in a
string; in fact, when a string is all @acronym{ASCII}, there is no real
difference between the unibyte and multibyte representations.
For most Lisp programming, you don't need to be concerned with these two
representations.

 Sometimes key sequences are represented as strings. When a string is
a key sequence, string elements in the range 128 to 255 represent meta
characters (which are large integers) rather than character
codes in the range 128 to 255.

 Strings cannot hold characters that have the hyper, super or alt
modifiers; they can hold @acronym{ASCII} control characters, but no other
control characters. They do not distinguish case in @acronym{ASCII} control
characters. If you want to store such characters in a sequence, such as
a key sequence, you must use a vector instead of a string.
@xref{Character Type}, for more information about the representation of meta
and other modifiers for keyboard input characters.

 Strings are useful for holding regular expressions. You can also
match regular expressions against strings with @code{string-match}
(@pxref{Regexp Search}). The functions @code{match-string}
(@pxref{Simple Match Data}) and @code{replace-match} (@pxref{Replacing
Match}) are useful for decomposing and modifying strings after
matching regular expressions against them.

 Like a buffer, a string can contain text properties for the characters
in it, as well as the characters themselves. @xref{Text Properties}.
All the Lisp primitives that copy text from strings to buffers or other
strings also copy the properties of the characters being copied.

 @xref{Text}, for information about functions that display strings or
copy them into buffers. @xref{Character Type}, and @ref{String Type},
for information about the syntax of characters and strings.
@xref{Non-ASCII Characters}, for functions to convert between text
representations and to encode and decode character codes.

@node Predicates for Strings
@section The Predicates for Strings

For more information about general sequence and array predicates,
see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.

@defun stringp object
This function returns @code{t} if @var{object} is a string, @code{nil}
otherwise.
@end defun

@defun char-or-string-p object
This function returns @code{t} if @var{object} is a string or a
character (i.e., an integer), @code{nil} otherwise.
@end defun

@node Creating Strings
@section Creating Strings

 The following functions create strings, either from scratch, or by
putting strings together, or by taking them apart.

@defun make-string count character
This function returns a string made up of @var{count} repetitions of
@var{character}. If @var{count} is negative, an error is signaled.

@example
(make-string 5 ?x)
     @result{} "xxxxx"
(make-string 0 ?x)
     @result{} ""
@end example

 Other functions to compare with this one include @code{char-to-string}
(@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
@code{make-list} (@pxref{Building Lists}).
@end defun
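
As a further illustration, here is a minimal sketch that combines
@code{make-string} with @code{concat} to pad a string on the right to a
given width. The name @code{my-pad-right} is hypothetical; it is not
part of Emacs.

@example
;; my-pad-right is a hypothetical helper, not an Emacs primitive.
(defun my-pad-right (string width)
  "Return STRING padded with spaces on the right to WIDTH characters.
If STRING is already at least WIDTH characters long, no padding is added."
  (concat string
          (make-string (max 0 (- width (length string))) ?\s)))

(my-pad-right "ab" 5)
     ;; @result{} "ab   "
@end example

This pads by character count (@code{length}), not by display width, so
it is only a sketch for simple cases.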

@defun string &rest characters
This returns a string containing the characters @var{characters}.

@example
(string ?a ?b ?c)
     @result{} "abc"
@end example
@end defun
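
@code{string} pairs naturally with @code{apply} when the characters are
already in a list. The following sketch reverses a string by converting
it to a list of characters with @code{append} (@pxref{Building Lists})
and rebuilding it with @code{string}. @code{my-string-reverse} is a
hypothetical name used only for illustration.

@example
;; my-string-reverse is a hypothetical helper for illustration.
(defun my-string-reverse (str)
  "Return a new string with the characters of STR in reverse order."
  ;; (append STR nil) yields the list of character codes in STR.
  (apply #'string (reverse (append str nil))))

(my-string-reverse "abc")
     ;; @result{} "cba"
@end example

Text properties of @var{str} are not carried over, because the
intermediate list holds only character codes.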

@defun substring string start &optional end
This function returns a new string which consists of those characters
from @var{string} in the range from (and including) the character at the
index @var{start} up to (but excluding) the character at the index
@var{end}. The first character is at index zero.

@example
@group
(substring "abcdefg" 0 3)
     @result{} "abc"
@end group
@end example

@noindent
Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
index for @samp{c} is 2. Thus, three letters, @samp{abc}, are copied
from the string @code{"abcdefg"}. The index 3 marks the character
position up to which the substring is copied. The character whose index
is 3 is actually the fourth character in the string.

A negative number counts from the end of the string, so that @minus{}1
signifies the index of the last character of the string. For example:

@example
@group
(substring "abcdefg" -3 -1)
     @result{} "ef"
@end group
@end example

@noindent
In this example, the index for @samp{e} is @minus{}3, the index for
@samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.

When @code{nil} is used for @var{end}, it stands for the length of the
string. Thus,

@example
@group
(substring "abcdefg" -3 nil)
     @result{} "efg"
@end group
@end example

Omitting the argument @var{end} is equivalent to specifying @code{nil}.
It follows that @code{(substring @var{string} 0)} returns a copy of all
of @var{string}.

@example
@group
(substring "abcdefg" 0)
     @result{} "abcdefg"
@end group
@end example

@noindent
But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence
Functions}).

If the characters copied from @var{string} have text properties, the
properties are copied into the new string also. @xref{Text Properties}.

@code{substring} also accepts a vector for the first argument.
For example:

@example
(substring [a b (c) "d"] 1 3)
     @result{} [b (c)]
@end example

A @code{wrong-type-argument} error is signaled if @var{start} is not
an integer or if @var{end} is neither an integer nor @code{nil}. An
@code{args-out-of-range} error is signaled if @var{start} indicates a
character following @var{end}, or if either integer is out of range
for @var{string}.

Contrast this function with @code{buffer-substring} (@pxref{Buffer
Contents}), which returns a string containing a portion of the text in
the current buffer. The beginning of a string is at index 0, but the
beginning of a buffer is at index 1.
@end defun
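
Because a negative @var{start} counts back from the end, @code{substring}
can extract a suffix without computing the start position by hand. The
sketch below uses a hypothetical helper named @code{my-last-n-chars}. It
clamps the count so that it never asks for more characters than the
string contains, and it special-cases a count of zero, since a
@var{start} of 0 would select the whole string. It also shows how to
catch the @code{args-out-of-range} error described above.

@example
;; my-last-n-chars is a hypothetical helper for illustration.
(defun my-last-n-chars (string n)
  "Return the last N characters of STRING (N >= 0).
If STRING is shorter than N characters, return a copy of all of it."
  (let ((k (min n (length string))))
    ;; A START of -K counts K characters back from the end.
    (if (zerop k)
        ""
      (substring string (- k)))))

(my-last-n-chars "abcdefg" 3)
     ;; @result{} "efg"

;; START after END signals an error, which can be caught:
(condition-case nil
    (substring "abc" 2 1)
  (args-out-of-range 'out-of-range))
     ;; @result{} out-of-range
@end example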

@defun substring-no-properties string &optional start end
This works like @code{substring} but discards all text properties from
the value. Also, @var{start} may be omitted or @code{nil}, which is
equivalent to 0. Thus, @w{@code{(substring-no-properties
@var{string})}} returns a copy of @var{string}, with all text
properties removed.
@end defun
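
The difference between @code{substring} and
@code{substring-no-properties} is easiest to see with a string that
carries a text property. This sketch uses @code{propertize}
(@pxref{Text Properties}) to attach a @code{face} property, then checks
that property on each copy with @code{get-text-property}.
@code{my-compare-copies} is a hypothetical name.

@example
;; my-compare-copies is a hypothetical helper for illustration.
(defun my-compare-copies (s)
  "Return the face at index 0 of two copies of the first two chars of S."
  (list (get-text-property 0 'face (substring s 0 2))
        (get-text-property 0 'face (substring-no-properties s 0 2))))

(my-compare-copies (propertize "hello" 'face 'bold))
     ;; @result{} (bold nil)
@end example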

@defun concat &rest sequences
@cindex copying strings
@cindex concatenating strings
This function returns a new string consisting of the characters in the
arguments passed to it (along with their text properties, if any). The
arguments may be strings, lists of numbers, or vectors of numbers; they
are not themselves changed. If @code{concat} receives no arguments, it
returns an empty string.

@example
(concat "abc" "-def")
     @result{} "abc-def"
(concat "abc" (list 120 121) [122])
     @result{} "abcxyz"
;; @r{@code{nil} is an empty sequence.}
(concat "abc" nil "-def")
     @result{} "abc-def"
(concat "The " "quick brown " "fox.")
     @result{} "The quick brown fox."
(concat)
     @result{} ""
@end example

@noindent
The @code{concat} function always constructs a new string that is
not @code{eq} to any existing string.

In Emacs versions before 21, when an argument was an integer (not a
sequence of integers), it was converted to a string of digits making up
the decimal printed representation of the integer. This obsolete usage
no longer works. The proper way to convert an integer to its decimal
printed form is with @code{format} (@pxref{Formatting Strings}) or
@code{number-to-string} (@pxref{String Conversion}).

For information about other concatenation functions, see the
description of @code{mapconcat} in @ref{Mapping Functions},
@code{vconcat} in @ref{Vector Functions}, and @code{append} in @ref{Building
Lists}.
@end defun
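
Since @code{concat} no longer accepts integers, a number must be
converted explicitly before it is concatenated. The sketch below does
this with @code{number-to-string}, and uses @code{mapconcat}
(@pxref{Mapping Functions}) to join a list of strings with a separator.
The helper names @code{my-join} and @code{my-label} are hypothetical.

@example
;; my-join and my-label are hypothetical helpers for illustration.
(defun my-join (strings separator)
  "Concatenate STRINGS, inserting SEPARATOR between adjacent elements."
  (mapconcat #'identity strings separator))

(defun my-label (name count)
  "Return a label such as \"files: 3\" built from NAME and integer COUNT."
  ;; (concat name ": " count) would signal an error, because COUNT is
  ;; an integer and not a sequence, so convert it first.
  (concat name ": " (number-to-string count)))

(my-join '("a" "b" "c") ", ")
     ;; @result{} "a, b, c"
(my-label "files" 3)
     ;; @result{} "files: 3"
@end example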

@defun split-string string &optional separators omit-nulls
This function splits @var{string} into substrings at matches for the
regular expression @var{separators}. Each match for @var{separators}
defines a splitting point; the substrings between the splitting points
are made into a list, which is the value returned by
@code{split-string}.

If @var{omit-nulls} is @code{nil}, the result contains null strings
whenever there are two consecutive matches for @var{separators}, or a
match is adjacent to the beginning or end of @var{string}. If
@var{omit-nulls} is @code{t}, these null strings are omitted from the
result list.

If @var{separators} is @code{nil} (or omitted),
the default is the value of @code{split-string-default-separators}.

As a special case, when @var{separators} is @code{nil} (or omitted),
null strings are always omitted from the result. Thus:

@example
(split-string " two words ")
     @result{} ("two" "words")
@end example

The result is not @samp{("" "two" "words" "")}, which would rarely be
useful. If you need such a result, use an explicit value for
@var{separators}:

@example
(split-string " two words "
              split-string-default-separators)
     @result{} ("" "two" "words" "")
@end example

More examples:

@example
(split-string "Soup is good food" "o")
     @result{} ("S" "up is g" "" "d f" "" "d")
(split-string "Soup is good food" "o" t)
     @result{} ("S" "up is g" "d f" "d")
(split-string "Soup is good food" "o+")
     @result{} ("S" "up is g" "d f" "d")
@end example

Empty matches do count, except that @code{split-string} will not look
for a final empty match when it has already reached the end of the
string using a non-empty match or when @var{string} is empty:

@example
(split-string "aooob" "o*")
     @result{} ("" "a" "" "b" "")
(split-string "ooaboo" "o*")
     @result{} ("" "" "a" "b" "")
(split-string "" "")
     @result{} ("")
@end example

However, when @var{separators} can match the empty string,
@var{omit-nulls} is usually @code{t}, so that the subtleties in the
three previous examples are rarely relevant:

@example
(split-string "Soup is good food" "o*" t)
     @result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
(split-string "Nice doggy!" "" t)
     @result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
(split-string "" "" t)
     @result{} nil
@end example

Somewhat odd, but predictable, behavior can occur for certain
``non-greedy'' values of @var{separators} that can prefer empty
matches over non-empty matches. Again, such values rarely occur in
practice:

@example
(split-string "ooo" "o*" t)
     @result{} nil
(split-string "ooo" "\\|o+" t)
     @result{} ("o" "o" "o")
@end example
@end defun

@defvar split-string-default-separators
The default value of @var{separators} for @code{split-string}. Its
usual value is @w{@samp{"[ \f\t\n\r\v]+"}}.
@end defvar
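
A common use of @code{split-string} is breaking a line of delimited
fields into a list. The following sketch splits at commas and discards
any spaces around them. @code{my-split-fields} is a hypothetical name.
Because @var{omit-nulls} is left as @code{nil}, an empty field between
two adjacent commas is kept as a null string.

@example
;; my-split-fields is a hypothetical helper for illustration.
(defun my-split-fields (line)
  "Split LINE at commas, discarding spaces around each comma."
  (split-string line " *, *"))

(my-split-fields "a, b ,c")
     ;; @result{} ("a" "b" "c")
(my-split-fields "a,,c")
     ;; @result{} ("a" "" "c")
@end example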

@node Modifying Strings
@section Modifying Strings

 The most basic way to alter the contents of an existing string is with
@code{aset} (@pxref{Array Functions}). @code{(aset @var{string}
@var{idx} @var{char})} stores @var{char} into @var{string} at index
@var{idx}. Each character occupies one or more bytes, and if @var{char}
needs a different number of bytes from the character already present at
that index, @code{aset} signals an error.

 A more powerful function is @code{store-substring}:

@defun store-substring string idx obj
This function alters part of the contents of the string @var{string}, by
storing @var{obj} starting at index @var{idx}. The argument @var{obj}
may be either a character or a (smaller) string.

Since it is impossible to change the length of an existing string, it is
an error if @var{obj} doesn't fit within @var{string}'s actual length,
or if any new character requires a different number of bytes from the
character currently present at that point in @var{string}.
@end defun
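
It is unwise to modify a string constant that appears in a program, so
the following sketch first copies its argument with
@code{copy-sequence} and then alters the copy in place. It uses only
@acronym{ASCII} characters, so each replacement occupies the same number
of bytes as the character it replaces. The helper name
@code{my-capitalize-first} is hypothetical.

@example
;; my-capitalize-first is a hypothetical helper for illustration.
(defun my-capitalize-first (string)
  "Return a copy of STRING whose first character is upcased."
  (let ((copy (copy-sequence string)))
    (unless (equal copy "")
      ;; aset replaces a single character in place.
      (aset copy 0 (upcase (aref copy 0))))
    copy))

(my-capitalize-first "hello")
     ;; @result{} "Hello"

;; store-substring overwrites several characters at once and
;; returns the modified string.
(store-substring (copy-sequence "abcdef") 2 "XY")
     ;; @result{} "abXYef"
@end example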

 To clear out a string that contained a password, use
@code{clear-string}:

@defun clear-string string
This clears the contents of @var{string} to zeros.
It may also change @var{string}'s length and convert it to
a unibyte string.
@end defun

@need 2000
@node Text Comparison
@section Comparison of Characters and Strings
@cindex string equality

@defun char-equal character1 character2
This function returns @code{t} if the arguments represent the same
character, @code{nil} otherwise. This function ignores differences
in case if @code{case-fold-search} is non-@code{nil}.

@example
(char-equal ?x ?x)
     @result{} t
(let ((case-fold-search nil))
  (char-equal ?x ?X))
     @result{} nil
@end example
@end defun
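
Because @code{char-equal} consults @code{case-fold-search}, a program
that needs a particular behavior can bind that variable explicitly
instead of relying on the user's setting. A minimal sketch;
@code{my-char-equal} is a hypothetical name.

@example
;; my-char-equal is a hypothetical helper for illustration.
(defun my-char-equal (c1 c2 ignore-case)
  "Compare characters C1 and C2; ignore case if IGNORE-CASE is non-nil."
  ;; case-fold-search is a special variable, so this let binding
  ;; is what char-equal sees during the call.
  (let ((case-fold-search ignore-case))
    (char-equal c1 c2)))

(my-char-equal ?a ?A t)
     ;; @result{} t
(my-char-equal ?a ?A nil)
     ;; @result{} nil
@end example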

@defun string= string1 string2
This function returns @code{t} if the characters of the two strings
match exactly. Symbols are also allowed as arguments, in which case
their print names are used.
Case is always significant, regardless of @code{case-fold-search}.

@example
(string= "abc" "abc")
     @result{} t
(string= "abc" "ABC")
     @result{} nil
(string= "ab" "ABC")
     @result{} nil
@end example

The function @code{string=} ignores the text properties of the two
strings. When @code{equal} (@pxref{Equality Predicates}) compares two
strings, it uses @code{string=}.

For technical reasons, a unibyte and a multibyte string are
@code{equal} if and only if they contain the same sequence of
character codes and all these codes are either in the range 0 through
127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
However, when a unibyte string gets converted to a multibyte string,
all characters with codes in the range 160 through 255 get converted
to characters with higher codes, whereas @acronym{ASCII} characters
remain unchanged. Thus, a unibyte string and its conversion to
multibyte are only @code{equal} if the string is all @acronym{ASCII}.
Character codes 160 through 255 are not entirely proper in multibyte
text, even though they can occur. As a consequence, the situation
where a unibyte and a multibyte string are @code{equal} without both
being all @acronym{ASCII} is a technical oddity that very few Emacs
Lisp programmers ever get confronted with. @xref{Text
Representations}.
@end defun
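
The following calls illustrate two points made above: @code{string=}
accepts symbols and compares their print names, and it ignores text
properties. As a result, a propertized string is still @code{equal} to
a plain string with the same characters.

@example
(string= 'abc "abc")
     ;; @result{} t
(string= (propertize "abc" 'face 'bold) "abc")
     ;; @result{} t
(equal (propertize "abc" 'face 'bold) "abc")
     ;; @result{} t
@end example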

@defun string-equal string1 string2
@code{string-equal} is another name for @code{string=}.
@end defun

@cindex lexical comparison
@defun string< string1 string2
@c (findex string< causes problems for permuted index!!)
This function compares two strings a character at a time. It
scans both the strings at the same time to find the first pair of corresponding
characters that do not match. If the lesser character of these two is
the character from @var{string1}, then @var{string1} is less, and this
function returns @code{t}. If the lesser character is the one from
@var{string2}, then @var{string1} is greater, and this function returns
@code{nil}. If the two strings match entirely, the value is @code{nil}.

Pairs of characters are compared according to their character codes.
Keep in mind that lower case letters have higher numeric values in the
@acronym{ASCII} character set than their upper case counterparts; digits and
many punctuation characters have a lower numeric value than upper case
letters. An @acronym{ASCII} character is less than any non-@acronym{ASCII}
character; a unibyte non-@acronym{ASCII} character is always less than any
multibyte non-@acronym{ASCII} character (@pxref{Text Representations}).

@example
@group
(string< "abc" "abd")
     @result{} t
(string< "abd" "abc")
     @result{} nil
(string< "123" "abc")
     @result{} t
@end group
@end example

When the strings have different lengths, and they match up to the
length of @var{string1}, then the result is @code{t}. If they match up
to the length of @var{string2}, the result is @code{nil}. An empty
string is less than any nonempty string.

@example
@group
(string< "" "abc")
     @result{} t
(string< "ab" "abc")
     @result{} t
(string< "abc" "")
     @result{} nil
(string< "abc" "ab")
     @result{} nil
(string< "" "")
     @result{} nil
@end group
@end example

Symbols are also allowed as arguments, in which case their print names
are used.
@end defun
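
@code{string<} is often used as the predicate for @code{sort}. Since it
compares character codes, upper case letters sort before lower case
ones. A case-insensitive ordering can be obtained by comparing downcased
copies, as in this sketch. @code{my-sort-strings-ci} is a hypothetical
name.

@example
(sort (list "banana" "apple" "Cherry") #'string<)
     ;; @result{} ("Cherry" "apple" "banana")

;; my-sort-strings-ci is a hypothetical helper for illustration.
(defun my-sort-strings-ci (strings)
  "Return a sorted copy of STRINGS, ignoring case differences."
  ;; sort may destroy its argument, so sort a fresh copy of the list.
  (sort (copy-sequence strings)
        (lambda (a b) (string< (downcase a) (downcase b)))))

(my-sort-strings-ci '("banana" "apple" "Cherry"))
     ;; @result{} ("apple" "banana" "Cherry")
@end example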

@defun string-lessp string1 string2
@code{string-lessp} is another name for @code{string<}.
@end defun

@defun compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case
This function compares the specified part of @var{string1} with the
specified part of @var{string2}. The specified part of @var{string1}
runs from index @var{start1} up to index @var{end1} (@code{nil} means
the end of the string). The specified part of @var{string2} runs from
index @var{start2} up to index @var{end2} (@code{nil} means the end of
the string).

The strings are both converted to multibyte for the comparison
(@pxref{Text Representations}) so that a unibyte string and its
conversion to multibyte are always regarded as equal. If
@var{ignore-case} is non-@code{nil}, then case is ignored, so that
upper case letters can be equal to lower case letters.

If the specified portions of the two strings match, the value is
@code{t}. Otherwise, the value is an integer which indicates how many
leading characters agree, and which string is less. Its absolute value
is one plus the number of characters that agree at the beginning of the
two strings. The sign is negative if @var{string1} (or its specified
portion) is less.
@end defun
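
Because the absolute value of a non-@code{t} result is one plus the
number of agreeing leading characters, subtracting one recovers the
length of the common prefix. The sketch below uses this, along with the
@var{ignore-case} argument. The helper names are hypothetical.

@example
;; my-string-equal-ci and my-common-prefix-length are hypothetical.
(defun my-string-equal-ci (s1 s2)
  "Return t if S1 and S2 are equal, ignoring case."
  (eq t (compare-strings s1 0 nil s2 0 nil t)))

(defun my-common-prefix-length (s1 s2)
  "Return the number of leading characters that S1 and S2 share."
  (let ((result (compare-strings s1 0 nil s2 0 nil)))
    (if (eq result t)
        (length s1)
      ;; The absolute value is one plus the number of agreeing characters.
      (1- (abs result)))))

(my-string-equal-ci "Hello" "hELLO")
     ;; @result{} t
(compare-strings "abc" 0 nil "abd" 0 nil)
     ;; @result{} -3
(my-common-prefix-length "abc" "abd")
     ;; @result{} 2
@end example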

@defun assoc-string key alist &optional case-fold
This function works like @code{assoc}, except that @var{key} must be a
string, and comparison is done using @code{compare-strings}. If
@var{case-fold} is non-@code{nil}, it ignores case differences.
Unlike @code{assoc}, this function can also match elements of the alist
that are strings rather than conses. In particular, @var{alist} can
be a list of strings rather than an actual alist.
@xref{Association Lists}.
@end defun
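
Here are two ways to use @code{assoc-string}: looking up a key in an
ordinary alist while ignoring case, and searching a plain list of
strings. The data are made up for illustration.

@example
(assoc-string "FOO" '(("foo" . 1) ("bar" . 2)) t)
     ;; @result{} ("foo" . 1)
(assoc-string "bar" '("foo" "bar"))
     ;; @result{} "bar"
(assoc-string "baz" '("foo" "bar"))
     ;; @result{} nil
@end example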

 See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
a way to compare text in buffers. The function @code{string-match},
which matches a regular expression against a string, can be used
for a kind of string comparison; see @ref{Regexp Search}.

@node String Conversion
@comment  node-name,  next,  previous,  up
@section Conversion of Characters and Strings
@cindex conversion of strings

 This section describes functions for conversions between characters,
strings and integers. @code{format} (@pxref{Formatting Strings})
and @code{prin1-to-string}
(@pxref{Output Functions}) can also convert Lisp objects into strings.
@code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
string representation of a Lisp object into an object. The functions
@code{string-make-multibyte} and @code{string-make-unibyte} convert the
text representation of a string (@pxref{Converting Representations}).

 @xref{Documentation}, for functions that produce textual descriptions
of text characters and general input events
(@code{single-key-description} and @code{text-char-description}). These
functions are used primarily for making help messages.

@defun char-to-string character
@cindex character to string
This function returns a new string containing one character,
@var{character}. This function is semi-obsolete because the function
@code{string} is more general. @xref{Creating Strings}.
@end defun

@defun string-to-char string
@cindex string to character
 This function returns the first character in @var{string}. If the
string is empty, the function returns 0. The value is also 0 when the
first character of @var{string} is the null character, @acronym{ASCII} code
0.

@example
(string-to-char "ABC")
     @result{} 65
(string-to-char "xyz")
     @result{} 120
(string-to-char "")
     @result{} 0
@group
(string-to-char "\000")
     @result{} 0
@end group
@end example

This function may be eliminated in the future if it does not seem useful
enough to retain.
@end defun

@defun number-to-string number
@cindex integer to string
@cindex integer to decimal
This function returns a string consisting of the printed base-ten
representation of @var{number}, which may be an integer or a floating
point number. The returned value starts with a minus sign if the argument is
negative.

@example
(number-to-string 256)
     @result{} "256"
@group
(number-to-string -23)
     @result{} "-23"
@end group
(number-to-string -23.5)
     @result{} "-23.5"
@end example

@cindex int-to-string
@code{int-to-string} is a semi-obsolete alias for this function.

See also the function @code{format} in @ref{Formatting Strings}.
@end defun

@defun string-to-number string &optional base
@cindex string to number
This function returns the numeric value of the characters in
@var{string}. If @var{base} is non-@code{nil}, it must be an integer
between 2 and 16 (inclusive), and integers are converted in that base.
If @var{base} is @code{nil}, then base ten is used. Floating point
conversion only works in base ten; we have not implemented other
radices for floating point numbers, because that would be much more
work and does not seem useful. If @var{string} looks like an integer
but its value is too large to fit into a Lisp integer,
@code{string-to-number} returns a floating point result.

The parsing skips spaces and tabs at the beginning of @var{string},
then reads as much of @var{string} as it can interpret as a number in
the given base. (On some systems it ignores other whitespace at the
beginning, not just spaces and tabs.) If the first character after
the ignored whitespace is neither a digit in the given base, nor a
plus or minus sign, nor the leading dot of a floating point number,
this function returns 0.

@example
(string-to-number "256")
     @result{} 256
(string-to-number "25 is a perfect square.")
     @result{} 25
(string-to-number "X256")
     @result{} 0
(string-to-number "-4.5")
     @result{} -4.5
(string-to-number "1e5")
     @result{} 100000.0
@end example

@findex string-to-int
@code{string-to-int} is an obsolete alias for this function.
@end defun
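
Since @code{string-to-number} returns 0 both for @code{"0"} and for text
that is not a number at all, a program that must reject invalid input
has to check the string first. A minimal sketch, assuming a
hypothetical helper named @code{my-parse-integer} that accepts only an
optional sign followed by decimal digits:

@example
;; my-parse-integer is a hypothetical helper for illustration.
(defun my-parse-integer (string)
  "Return the integer written in STRING, or nil if STRING is not one.
Only an optional sign followed by decimal digits is accepted."
  (when (string-match "\\`[-+]?[0-9]+\\'" string)
    (string-to-number string)))

(my-parse-integer "42")
     ;; @result{} 42
(my-parse-integer "X256")
     ;; @result{} nil
(number-to-string (my-parse-integer "-7"))
     ;; @result{} "-7"
@end example

Note that @code{string-match} sets the match data as a side effect
(@pxref{Regexp Search}).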

 Here are some other functions that can convert to or from a string:

@table @code
@item concat
@code{concat} can convert a vector or a list into a string.
@xref{Creating Strings}.

@item vconcat
@code{vconcat} can convert a string into a vector. @xref{Vector
Functions}.

@item append
@code{append} can convert a string into a list. @xref{Building Lists}.
@end table

@node Formatting Strings
@comment  node-name,  next,  previous,  up
@section Formatting Strings
@cindex formatting strings
@cindex strings, formatting them

 @dfn{Formatting} means constructing a string by substitution of
computed values at various places in a constant string. This constant string
controls how the other values are printed, as well as where they appear;
it is called a @dfn{format string}.

 Formatting is often useful for computing messages to be displayed. In
fact, the functions @code{message} and @code{error} provide the same
formatting feature described here; they differ from @code{format} only
in how they use the result of formatting.

@defun format string &rest objects
This function returns a new string that is made by copying
@var{string} and then replacing any format specification
in the copy with encodings of the corresponding @var{objects}. The
arguments @var{objects} are the computed values to be formatted.

The characters in @var{string}, other than the format specifications,
are copied directly into the output; if they have text properties,
these are copied into the output also.
@end defun

@cindex @samp{%} in format
@cindex format specification
 A format specification is a sequence of characters beginning with a
@samp{%}. Thus, if there is a @samp{%d} in @var{string}, the
@code{format} function replaces it with the printed representation of
one of the values to be formatted (one of the arguments @var{objects}).
For example:

@example
@group
(format "The value of fill-column is %d." fill-column)
     @result{} "The value of fill-column is 72."
@end group
@end example

 If @var{string} contains more than one format specification, the
format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
second such value, and so on. Any extra format specifications (those
for which there are no corresponding values) cause an error. Any
extra values to be formatted are ignored.

 Certain format specifications require values of particular types. If
you supply a value that doesn't fit the requirements, an error is
signaled.
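
The cases just described can be observed directly by catching the error
with @code{condition-case}. This is only a demonstration. The exact
error message may differ between Emacs versions, so the sketch catches
the generic @code{error} condition.

@example
;; A missing value for the second %d signals an error.
(condition-case nil
    (format "%d and %d" 1)
  (error 'too-few-values))
     ;; @result{} too-few-values

;; An extra value is silently ignored.
(format "%d" 1 2)
     ;; @result{} "1"

;; %d requires a number, so a string argument signals an error.
(condition-case nil
    (format "%d" "one")
  (error 'wrong-type))
     ;; @result{} wrong-type
@end example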
869f4785
RS
727
728 Here is a table of valid format specifications:
729
730@table @samp
731@item %s
732Replace the specification with the printed representation of the object,
f9f59935 733made without quoting (that is, using @code{princ}, not
969fe9b5 734@code{prin1}---@pxref{Output Functions}). Thus, strings are represented
f9f59935
RS
735by their contents alone, with no @samp{"} characters, and symbols appear
736without @samp{\} characters.
869f4785 737
a546cd47 738If the object is a string, its text properties are
8241495d
RS
739copied into the output. The text properties of the @samp{%s} itself
740are also copied, but those of the object take priority.
741
869f4785
RS
742@item %S
743Replace the specification with the printed representation of the object,
f9f59935
RS
744made with quoting (that is, using @code{prin1}---@pxref{Output
745Functions}). Thus, strings are enclosed in @samp{"} characters, and
746@samp{\} characters appear where necessary before special characters.
869f4785 747
869f4785
RS
748@item %o
749@cindex integer to octal
750Replace the specification with the base-eight representation of an
751integer.
752
@item %d
Replace the specification with the base-ten representation of an
integer.

@item %x
@itemx %X
@cindex integer to hexadecimal
Replace the specification with the base-sixteen representation of an
integer. @samp{%x} uses lower case and @samp{%X} uses upper case.

@item %c
Replace the specification with the character which is the value given.

@item %e
Replace the specification with the exponential notation for a floating
point number.

@item %f
Replace the specification with the decimal-point notation for a floating
point number.

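@item %g
Replace the specification with notation for a floating point number,
using either exponential notation or decimal-point notation.
Exponential notation is used if the exponent would be less than
@minus{}4 or greater than or equal to the precision (6 by default);
otherwise decimal-point notation is used.
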
@item %%
Replace the specification with a single @samp{%}. This format
specification is unusual in that it does not use a value. For example,
@code{(format "%% %d" 30)} returns @code{"% 30"}.
@end table

  Any other format character results in an @samp{Invalid format
operation} error.

  Here are several examples:

@example
@group
(format "The name of this buffer is %s." (buffer-name))
     @result{} "The name of this buffer is strings.texi."

(format "The buffer object prints as %s." (current-buffer))
     @result{} "The buffer object prints as strings.texi."

(format "The octal value of %d is %o,
         and the hex value is %x." 18 18 18)
     @result{} "The octal value of 18 is 22,
         and the hex value is 12."
@end group
@end example
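
  The following additional examples are meant only as an illustration
of the specifications that the examples above do not use. Note how
@samp{%S} quotes a string while @samp{%s} does not, and how @samp{%g}
switches between the two floating point notations:

@example
@group
(format "%s and %S" "foo" "foo")
     @result{} "foo and \"foo\""

(format "%c%c%c" ?f ?o ?o)
     @result{} "foo"
@end group

@group
(format "%e, %f, %g" 1234.5 1234.5 1234.5)
     @result{} "1.234500e+03, 1234.500000, 1234.5"

(format "%g and %g" 0.00001 1000000.0)
     @result{} "1e-05 and 1e+06"
@end group
@end example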

@cindex field width
@cindex padding
  All the specification characters allow an optional ``width'', which
is a digit-string between the @samp{%} and the character. If the
printed representation of the object contains fewer characters than
this width, then it is padded. The padding is on the left if the
width is positive (or starts with zero) and on the right if the
width is negative, that is, preceded by @samp{-}. The padding
character is normally a space, but if
the width starts with a zero, zeros are used for padding. Some of
these conventions are ignored for specification characters for which
they do not make sense. That is, @samp{%s}, @samp{%S} and @samp{%c}
accept a width starting with 0, but still pad with @emph{spaces} on
the left. Also, @samp{%%} accepts a width, but ignores it. Here are
some examples of padding:

@example
(format "%06d is padded on the left with zeros" 123)
     @result{} "000123 is padded on the left with zeros"

(format "%-6d is padded on the right" 123)
     @result{} "123    is padded on the right"
@end example

If the width is too small, @code{format} does not truncate the
object's printed representation. Thus, you can use a width to specify
a minimum spacing between columns with no risk of losing information.

  In the following three examples, @samp{%7s} specifies a minimum width
of 7. In the first case, the string inserted in place of @samp{%7s} has
only 3 letters, so 4 blank spaces are inserted for padding. In the
second case, the string @code{"specification"} is 13 letters wide but is
not truncated. In the third case, the padding is on the right.

@smallexample
@group
(format "The word `%7s' actually has %d letters in it."
        "foo" (length "foo"))
     @result{} "The word `    foo' actually has 3 letters in it."
@end group

@group
(format "The word `%7s' actually has %d letters in it."
        "specification" (length "specification"))
     @result{} "The word `specification' actually has 13 letters in it."
@end group

@group
(format "The word `%-7s' actually has %d letters in it."
        "foo" (length "foo"))
     @result{} "The word `foo    ' actually has 3 letters in it."
@end group
@end smallexample
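
  A width that starts with 0 still pads @samp{%s} and @samp{%c} with
spaces, not zeros. This small illustration (the values are arbitrary)
contrasts that with @samp{%d}:

@example
(format "[%05d] [%05s] [%5c]" 42 "ab" ?x)
     @result{} "[00042] [   ab] [    x]"
@end example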

@cindex precision in format specifications
  All the specification characters allow an optional ``precision''
before the character (after the width, if present). The precision is
a decimal-point @samp{.} followed by a digit-string. For the
floating-point specifications @samp{%e} and @samp{%f}, the
precision specifies how many digits to show after the decimal point;
if zero, the decimal-point itself is also omitted. For @samp{%g}, the
precision specifies how many significant digits to show. For
@samp{%s} and @samp{%S},
the precision truncates the string to the given width, so
@samp{%.3s} shows only the first three characters of the
representation for @var{object}. Precision is ignored for other
specification characters.
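
  Here, as an illustration, is the effect of a precision on a few
specification characters:

@example
(format "%.2f and %.0f" 3.14159 3.7)
     @result{} "3.14 and 4"

(format "%.2e" 1234.5)
     @result{} "1.23e+03"

(format "%.3s" "abcdef")
     @result{} "abc"
@end example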

@cindex flags in format specifications
Immediately after the @samp{%} and before the optional width and
precision, you can put certain ``flag'' characters.

A space character inserts a space for positive numbers (otherwise
nothing is inserted for positive numbers). This flag is ignored
except for @samp{%d}, @samp{%e}, @samp{%f}, @samp{%g}.

The flag @samp{#} indicates ``alternate form''. For @samp{%o} it
ensures that the result begins with a 0. For @samp{%x} and @samp{%X}
the result is prefixed with @samp{0x} or @samp{0X}. For @samp{%e},
@samp{%f}, and @samp{%g} a decimal point is always shown even if the
precision is zero.
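
  For instance (a brief illustration, not an exhaustive list of
combinations):

@example
(format "[% d] [% d]" 42 -42)
     @result{} "[ 42] [-42]"

(format "%#o %#x %#X" 8 255 255)
     @result{} "010 0xff 0XFF"

(format "%#.0f" 3.0)
     @result{} "3."
@end example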

@node Case Conversion
@comment  node-name,  next,  previous,  up
@section Case Conversion in Lisp
@cindex upper case
@cindex lower case
@cindex character case
@cindex case conversion in Lisp

  The character case functions change the case of single characters or
of the contents of strings. The functions normally convert only
alphabetic characters (the letters @samp{A} through @samp{Z} and
@samp{a} through @samp{z}, as well as non-@acronym{ASCII} letters); other
characters are not altered. You can specify a different case
conversion mapping by specifying a case table (@pxref{Case Tables}).

  These functions do not modify the strings that are passed to them as
arguments.
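
  One way to see this is to keep a reference to the argument and
compare it with the result afterwards (the variable names here are
arbitrary):

@example
(let* ((original "Hello")
       (result (upcase original)))
  (list original result))
     @result{} ("Hello" "HELLO")
@end example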

  The examples below use the characters @samp{X} and @samp{x}, which have
@acronym{ASCII} codes 88 and 120 respectively.

@defun downcase string-or-char
This function converts a character or a string to lower case.

When the argument to @code{downcase} is a string, the function creates
and returns a new string in which each letter in the argument that is
upper case is converted to lower case. When the argument to
@code{downcase} is a character, @code{downcase} returns the
corresponding lower case character. This value is an integer. If the
original character is lower case, or is not a letter, then the value
equals the original character.

@example
(downcase "The cat in the hat")
     @result{} "the cat in the hat"

(downcase ?X)
     @result{} 120
@end example
@end defun

@defun upcase string-or-char
This function converts a character or a string to upper case.

When the argument to @code{upcase} is a string, the function creates
and returns a new string in which each letter in the argument that is
lower case is converted to upper case.

When the argument to @code{upcase} is a character, @code{upcase}
returns the corresponding upper case character. This value is an integer.
If the original character is upper case, or is not a letter, then the
value returned equals the original character.

@example
(upcase "The cat in the hat")
     @result{} "THE CAT IN THE HAT"

(upcase ?x)
     @result{} 88
@end example
@end defun

@defun capitalize string-or-char
@cindex capitalization
This function capitalizes strings or characters. If
@var{string-or-char} is a string, the function creates and returns a new
string, whose contents are a copy of @var{string-or-char} in which each
word has been capitalized. This means that the first character of each
word is converted to upper case, and the rest are converted to lower
case.

The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
table (@pxref{Syntax Class Table}).

When the argument to @code{capitalize} is a character, @code{capitalize}
has the same result as @code{upcase}.

@example
@group
(capitalize "The cat in the hat")
     @result{} "The Cat In The Hat"
@end group

@group
(capitalize "THE 77TH-HATTED CAT")
     @result{} "The 77th-Hatted Cat"
@end group

@group
(capitalize ?x)
     @result{} 88
@end group
@end example
@end defun

@defun upcase-initials string-or-char
If @var{string-or-char} is a string, this function capitalizes the
initials of the words in @var{string-or-char}, without altering any
letters other than the initials. It returns a new string whose
contents are a copy of @var{string-or-char}, in which each word has
had its initial letter converted to upper case.

The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
table (@pxref{Syntax Class Table}).

When the argument to @code{upcase-initials} is a character,
@code{upcase-initials} has the same result as @code{upcase}.

@example
@group
(upcase-initials "The CAT in the hAt")
     @result{} "The CAT In The HAt"
@end group
@end example
@end defun
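
  The following example passes the same (arbitrary) string to both
@code{capitalize} and @code{upcase-initials}. Only @code{capitalize}
converts the letters after each initial to lower case:

@example
@group
(capitalize "hELLO wORLD")
     @result{} "Hello World"

(upcase-initials "hELLO wORLD")
     @result{} "HELLO WORLD"
@end group
@end example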

  @xref{Text Comparison}, for functions that compare strings; some of
them ignore case differences, or can optionally ignore case differences.

@node Case Tables
@section The Case Table

  You can customize case conversion by installing a special @dfn{case
table}. A case table specifies the mapping between upper case and lower
case letters. It affects both the case conversion functions for Lisp
objects (see the previous section) and those that apply to text in the
buffer (@pxref{Case Changes}). Each buffer has a case table; there is
also a standard case table which is used to initialize the case table
of new buffers.

  A case table is a char-table (@pxref{Char-Tables}) whose subtype is
@code{case-table}. This char-table maps each character into the
corresponding lower case character. It has three extra slots, which
hold related tables:

@table @var
@item upcase
The upcase table maps each character into the corresponding upper
case character.
@item canonicalize
The canonicalize table maps all of a set of case-related characters
into a particular member of that set.
@item equivalences
The equivalences table maps each one of a set of case-related characters
into the next character in that set.
@end table
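
  The ordinary char-table functions can inspect the subtype and the
lower case mapping of the standard case table. This example assumes
that the standard case table has not been customized in your session:

@example
@group
(char-table-subtype (standard-case-table))
     @result{} case-table

(aref (standard-case-table) ?X)
     @result{} 120
@end group
@end example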

  In simple cases, all you need to specify is the mapping to lower-case;
the three related tables will be calculated automatically from that one.

  For some languages, upper and lower case letters are not in one-to-one
correspondence. There may be two different lower case letters with the
same upper case equivalent. In these cases, you need to specify the
maps for both lower case and upper case.

  The extra table @var{canonicalize} maps each character to a canonical
equivalent; any two characters that are related by case-conversion have
the same canonical equivalent character. For example, since @samp{a}
and @samp{A} are related by case-conversion, they should have the same
canonical equivalent character (which should be either @samp{a} for both
of them, or @samp{A} for both of them).

  The extra table @var{equivalences} is a map that cyclically permutes
each equivalence class (of characters with the same canonical
equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into
@samp{A} and @samp{A} into @samp{a}, and likewise for each set of
equivalent characters.)

  When you construct a case table, you can provide @code{nil} for
@var{canonicalize}; then Emacs fills in this slot from the lower case
and upper case mappings. You can also provide @code{nil} for
@var{equivalences}; then Emacs fills in this slot from
@var{canonicalize}. In a case table that is actually in use, those
components are non-@code{nil}. Do not try to specify @var{equivalences}
without also specifying @var{canonicalize}.

  Here are the functions for working with case tables:

@defun case-table-p object
This predicate returns non-@code{nil} if @var{object} is a valid case
table.
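
For example, a freshly made char-table of subtype @code{case-table},
whose extra slots are all still @code{nil}, already counts as a valid
case table, while an ordinary vector does not:

@example
@group
(case-table-p (standard-case-table))
     @result{} t
(case-table-p (make-char-table 'case-table))
     @result{} t
(case-table-p [1 2 3])
     @result{} nil
@end group
@end example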
@end defun

@defun set-standard-case-table table
This function makes @var{table} the standard case table, so that it will
be used in any buffers created subsequently.
@end defun

@defun standard-case-table
This returns the standard case table.
@end defun

@defun current-case-table
This function returns the current buffer's case table.
@end defun

@defun set-case-table table
This sets the current buffer's case table to @var{table}.
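
As a small illustration, the following sets the case table of a
temporary buffer and reads it back. It uses @code{with-temp-buffer} so
that no real buffer is affected. The buffer's case table is the very
object that was installed:

@example
@group
(with-temp-buffer
  (set-case-table (standard-case-table))
  (eq (current-case-table) (standard-case-table)))
     @result{} t
@end group
@end example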
@end defun

  The following three functions are convenient subroutines for packages
that define non-@acronym{ASCII} character sets. They modify the specified
case table @var{case-table}; they also modify the standard syntax table.
@xref{Syntax Tables}. Normally you would use these functions to change
the standard case table.

@defun set-case-syntax-pair uc lc case-table
This function specifies that @var{uc} and @var{lc} are a pair of
corresponding letters, @var{uc} being upper case and @var{lc} lower case.
@end defun

@defun set-case-syntax-delims l r case-table
This function makes characters @var{l} and @var{r} a matching pair of
case-invariant delimiters.
@end defun

@defun set-case-syntax char syntax case-table
This function makes @var{char} case-invariant, with syntax
@var{syntax}.
@end defun

@deffn Command describe-buffer-case-table
This command displays a description of the contents of the current
buffer's case table.
@end deffn

@ignore
   arch-tag: 700b8e95-7aa5-4b52-9eb3-8f2e1ea152b4
@end ignore