@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2003
@c   Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
@comment node-name, next, previous, up
@chapter Strings and Characters
@cindex strings
@cindex character arrays
@cindex characters
@cindex bytes

  A string in Emacs Lisp is an array that contains an ordered sequence
of characters. Strings are used as names of symbols, buffers, and
files; to send messages to users; to hold text being copied between
buffers; and for many other purposes. Because strings are so important,
Emacs Lisp has many functions expressly for manipulating them. Emacs
Lisp programs use strings more often than individual characters.

  @xref{Strings of Events}, for special considerations for strings of
keyboard character events.

@menu
* Basics: String Basics.      Basic properties of strings and characters.
* Predicates for Strings::    Testing whether an object is a string or char.
* Creating Strings::          Functions to allocate new strings.
* Modifying Strings::         Altering the contents of an existing string.
* Text Comparison::           Comparing characters or strings.
* String Conversion::         Converting to and from characters and strings.
* Formatting Strings::        @code{format}: Emacs's analogue of @code{printf}.
* Case Conversion::           Case conversion functions.
* Case Tables::               Customizing case conversion.
@end menu

@node String Basics
@section String and Character Basics

  Characters are represented in Emacs Lisp as integers;
whether an integer is a character or not is determined only by how it is
used. Thus, strings really contain integers.

  The length of a string (like any array) is fixed, and cannot be
altered once the string exists. Strings in Lisp are @emph{not}
terminated by a distinguished character code. (By contrast, strings in
C are terminated by a character with @acronym{ASCII} code 0.)

  Since strings are arrays, and therefore sequences as well, you can
operate on them with the general array and sequence functions.
(@xref{Sequences Arrays Vectors}.) For example, you can access or
change individual characters in a string using the functions @code{aref}
and @code{aset} (@pxref{Array Functions}).

  There are two text representations for non-@acronym{ASCII} characters in
Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
Representations}). An @acronym{ASCII} character always occupies one byte in a
string; in fact, when a string is all @acronym{ASCII}, there is no real
difference between the unibyte and multibyte representations.
For most Lisp programming, you don't need to be concerned with these two
representations.

  Sometimes key sequences are represented as strings. When a string is
a key sequence, string elements in the range 128 to 255 represent meta
characters (which are large integers) rather than character
codes in the range 128 to 255.

  Strings cannot hold characters that have the hyper, super or alt
modifiers; they can hold @acronym{ASCII} control characters, but no other
control characters. They do not distinguish case in @acronym{ASCII} control
characters. If you want to store such characters in a sequence, such as
a key sequence, you must use a vector instead of a string.
@xref{Character Type}, for more information about the representation of meta
and other modifiers for keyboard input characters.

  Strings are useful for holding regular expressions. You can also
match regular expressions against strings (@pxref{Regexp Search}). The
functions @code{match-string} (@pxref{Simple Match Data}) and
@code{replace-match} (@pxref{Replacing Match}) are useful for
decomposing and modifying strings based on regular expression matching.

  Like a buffer, a string can contain text properties for the characters
in it, as well as the characters themselves. @xref{Text Properties}.
All the Lisp primitives that copy text from strings to buffers or other
strings also copy the properties of the characters being copied.

  @xref{Text}, for information about functions that display strings or
copy them into buffers. @xref{Character Type}, and @ref{String Type},
for information about the syntax of characters and strings.
@xref{Non-ASCII Characters}, for functions to convert between text
representations and to encode and decode character codes.
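
  As a brief illustration of the points in this section, the following
sketch shows that a null character is an ordinary string element rather
than a terminator, that the length of a string stays fixed, and that
@code{aref} and @code{aset} read and replace individual characters. The
variable name @code{s} is only for illustration, and
@code{copy-sequence} is used so that no string constant is modified.

@example
@group
;; The null character counts as one element of the string.
(length "ab\000cd")
;; @result{} 5
@end group

@group
;; Replace the first character in place; the length is unchanged.
(let ((s (copy-sequence "abc")))
  (aset s 0 ?x)
  (list (length s) (aref s 1) s))
;; @result{} (3 98 "xbc")
@end group
@end example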

@node Predicates for Strings
@section The Predicates for Strings

For more information about general sequence and array predicates,
see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.

@defun stringp object
This function returns @code{t} if @var{object} is a string, @code{nil}
otherwise.
@end defun

@defun char-or-string-p object
This function returns @code{t} if @var{object} is a string or a
character (i.e., an integer), @code{nil} otherwise.
@end defun
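
  As a small illustration of the two predicates, the following sketch
applies each of them to a string, a character, and a symbol. The
argument values are chosen only for this example.

@example
@group
(list (stringp "abc") (stringp ?a) (stringp 'abc))
;; @result{} (t nil nil)
(list (char-or-string-p "abc") (char-or-string-p ?a)
      (char-or-string-p 'abc))
;; @result{} (t t nil)
@end group
@end example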

@node Creating Strings
@section Creating Strings

  The following functions create strings, either from scratch, or by
putting strings together, or by taking them apart.

@defun make-string count character
This function returns a string made up of @var{count} repetitions of
@var{character}. If @var{count} is negative, an error is signaled.

@example
(make-string 5 ?x)
     @result{} "xxxxx"
(make-string 0 ?x)
     @result{} ""
@end example

  Other functions to compare with this one include @code{char-to-string}
(@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
@code{make-list} (@pxref{Building Lists}).
@end defun

@defun string &rest characters
This returns a string containing the characters @var{characters}.

@example
(string ?a ?b ?c)
     @result{} "abc"
@end example
@end defun

@defun substring string start &optional end
This function returns a new string which consists of those characters
from @var{string} in the range from (and including) the character at the
index @var{start} up to (but excluding) the character at the index
@var{end}. The first character is at index zero.

@example
@group
(substring "abcdefg" 0 3)
     @result{} "abc"
@end group
@end example

@noindent
Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
index for @samp{c} is 2. Thus, three letters, @samp{abc}, are copied
from the string @code{"abcdefg"}. The index 3 marks the character
position up to which the substring is copied. The character whose index
is 3 is actually the fourth character in the string.

A negative number counts from the end of the string, so that @minus{}1
signifies the index of the last character of the string. For example:

@example
@group
(substring "abcdefg" -3 -1)
     @result{} "ef"
@end group
@end example

@noindent
In this example, the index for @samp{e} is @minus{}3, the index for
@samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.

When @code{nil} is used for @var{end}, it stands for the length of the
string. Thus,

@example
@group
(substring "abcdefg" -3 nil)
     @result{} "efg"
@end group
@end example

Omitting the argument @var{end} is equivalent to specifying @code{nil}.
It follows that @code{(substring @var{string} 0)} returns a copy of all
of @var{string}.

@example
@group
(substring "abcdefg" 0)
     @result{} "abcdefg"
@end group
@end example

@noindent
But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence
Functions}).

If the characters copied from @var{string} have text properties, the
properties are copied into the new string also. @xref{Text Properties}.

@code{substring} also accepts a vector for the first argument.
For example:

@example
(substring [a b (c) "d"] 1 3)
     @result{} [b (c)]
@end example

A @code{wrong-type-argument} error is signaled if @var{start} is not
an integer or if @var{end} is neither an integer nor @code{nil}. An
@code{args-out-of-range} error is signaled if @var{start} indicates a
character following @var{end}, or if either integer is out of range
for @var{string}.

Contrast this function with @code{buffer-substring} (@pxref{Buffer
Contents}), which returns a string containing a portion of the text in
the current buffer. The beginning of a string is at index 0, but the
beginning of a buffer is at index 1.
@end defun
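
  The following sketch combines a positive @var{start} with a negative
@var{end}, and then uses @code{condition-case} to show which error
symbol is signaled for each kind of bad argument described above. The
handler bodies, which simply return the error symbol, are only for
illustration.

@example
@group
(substring "abcdefg" 2 -2)
;; @result{} "cde"
@end group

@group
;; START indicates a character following END.
(condition-case err
    (substring "abcdefg" 5 2)
  (args-out-of-range (car err)))
;; @result{} args-out-of-range
@end group

@group
;; START is not an integer.
(condition-case err
    (substring "abcdefg" "1")
  (wrong-type-argument (car err)))
;; @result{} wrong-type-argument
@end group
@end example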

@defun substring-no-properties string &optional start end
This works like @code{substring} but discards all text properties from
the value. Also, @var{start} may be omitted or @code{nil}, which is
equivalent to 0. Thus, @w{@code{(substring-no-properties
@var{string})}} returns a copy of @var{string}, with all text
properties removed.
@end defun
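
  As a sketch of the difference between the two functions, the
following example gives a string a @code{face} property with
@code{propertize} and then inspects the first character of each result
with @code{get-text-property}. The property and its value are
arbitrary choices made for this illustration.

@example
@group
(let ((s (propertize "hello" 'face 'bold)))
  (list (get-text-property 0 'face (substring s 0 2))
        (get-text-property 0 'face
                           (substring-no-properties s 0 2))))
;; @result{} (bold nil)
@end group
@end example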

@defun concat &rest sequences
@cindex copying strings
@cindex concatenating strings
This function returns a new string consisting of the characters in the
arguments passed to it (along with their text properties, if any). The
arguments may be strings, lists of numbers, or vectors of numbers; they
are not themselves changed. If @code{concat} receives no arguments, it
returns an empty string.

@example
(concat "abc" "-def")
     @result{} "abc-def"
(concat "abc" (list 120 121) [122])
     @result{} "abcxyz"
;; @r{@code{nil} is an empty sequence.}
(concat "abc" nil "-def")
     @result{} "abc-def"
(concat "The " "quick brown " "fox.")
     @result{} "The quick brown fox."
(concat)
     @result{} ""
@end example

@noindent
The @code{concat} function always constructs a new string that is
not @code{eq} to any existing string.

In Emacs versions before 21, when an argument was an integer (not a
sequence of integers), it was converted to a string of digits making up
the decimal printed representation of the integer. This obsolete usage
no longer works. The proper way to convert an integer to its decimal
printed form is with @code{format} (@pxref{Formatting Strings}) or
@code{number-to-string} (@pxref{String Conversion}).

For information about other concatenation functions, see the
description of @code{mapconcat} in @ref{Mapping Functions},
@code{vconcat} in @ref{Vector Functions}, and @code{append} in @ref{Building
Lists}.
@end defun
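
  A short sketch of the conversions recommended above: an integer is
first turned into a string with @code{number-to-string} or
@code{format} before it is passed to @code{concat}. The last form
assumes only that passing a bare integer signals some error, as the
text above implies; the symbol @code{rejected} is made up for this
example.

@example
@group
(concat "n = " (number-to-string 42))
;; @result{} "n = 42"
(concat "n = " (format "%d" 42))
;; @result{} "n = 42"
@end group

@group
;; A bare integer argument is no longer accepted.
(condition-case nil
    (concat "n = " 42)
  (error 'rejected))
;; @result{} rejected
@end group
@end example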

@defun split-string string &optional separators omit-nulls
This function splits @var{string} into substrings at matches for the
regular expression @var{separators}. Each match for @var{separators}
defines a splitting point; the substrings between the splitting points
are made into a list, which is the value returned by
@code{split-string}.

If @var{omit-nulls} is @code{nil}, the result contains null strings
whenever there are two consecutive matches for @var{separators}, or a
match is adjacent to the beginning or end of @var{string}. If
@var{omit-nulls} is @code{t}, these null strings are omitted from the
result list.

If @var{separators} is @code{nil} (or omitted),
the default is the value of @code{split-string-default-separators}.

As a special case, when @var{separators} is @code{nil} (or omitted),
null strings are always omitted from the result. Thus:

@example
(split-string " two words ")
     @result{} ("two" "words")
@end example

The result is not @samp{("" "two" "words" "")}, which would rarely be
useful. If you need such a result, use an explicit value for
@var{separators}:

@example
(split-string " two words " split-string-default-separators)
     @result{} ("" "two" "words" "")
@end example

More examples:

@example
(split-string "Soup is good food" "o")
     @result{} ("S" "up is g" "" "d f" "" "d")
(split-string "Soup is good food" "o" t)
     @result{} ("S" "up is g" "d f" "d")
(split-string "Soup is good food" "o+")
     @result{} ("S" "up is g" "d f" "d")
@end example

Empty matches do count, except that @code{split-string} will not look
for a final empty match when it already reached the end of the string
using a non-empty match or when @var{string} is empty:

@example
(split-string "aooob" "o*")
     @result{} ("" "a" "" "b" "")
(split-string "ooaboo" "o*")
     @result{} ("" "" "a" "b" "")
(split-string "" "")
     @result{} ("")
@end example

However, when @var{separators} can match the empty string,
@var{omit-nulls} is usually @code{t}, so that the subtleties in the
three previous examples are rarely relevant:

@example
(split-string "Soup is good food" "o*" t)
     @result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d")
(split-string "Nice doggy!" "" t)
     @result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!")
(split-string "" "" t)
     @result{} nil
@end example

Somewhat odd, but predictable, behavior can occur for certain
``non-greedy'' values of @var{separators} that can prefer empty
matches over non-empty matches. Again, such values rarely occur in
practice:

@example
(split-string "ooo" "o*" t)
     @result{} nil
(split-string "ooo" "\\|o+" t)
     @result{} ("o" "o" "o")
@end example
@end defun

@defvar split-string-default-separators
The default value of @var{separators} for @code{split-string}, initially
@w{@samp{"[ \f\t\n\r\v]+"}}.
@end defvar
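
  As a further sketch of @code{split-string}, here is how an explicit
separator and the @var{omit-nulls} argument interact when splitting a
slash-delimited name, followed by a comma-separated list whose
separator regexp also absorbs blanks after each comma. The input
strings are made up for this example.

@example
@group
(split-string "/usr/local/bin/" "/")
;; @result{} ("" "usr" "local" "bin" "")
(split-string "/usr/local/bin/" "/" t)
;; @result{} ("usr" "local" "bin")
@end group

@group
(split-string "a, b,c" ",[ \t]*")
;; @result{} ("a" "b" "c")
@end group
@end example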

@node Modifying Strings
@section Modifying Strings

  The most basic way to alter the contents of an existing string is with
@code{aset} (@pxref{Array Functions}). @code{(aset @var{string}
@var{idx} @var{char})} stores @var{char} into @var{string} at index
@var{idx}. Each character occupies one or more bytes, and if @var{char}
needs a different number of bytes from the character already present at
that index, @code{aset} signals an error.

  A more powerful function is @code{store-substring}:

@defun store-substring string idx obj
This function alters part of the contents of the string @var{string}, by
storing @var{obj} starting at index @var{idx}. The argument @var{obj}
may be either a character or a (smaller) string.

Since it is impossible to change the length of an existing string, it is
an error if @var{obj} doesn't fit within @var{string}'s actual length,
or if any new character requires a different number of bytes from the
character currently present at that point in @var{string}.
@end defun
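
  The following sketch overwrites part of a string in place with
@code{store-substring}, and then shows that a replacement which does
not fit signals an error. It works on a copy made with
@code{copy-sequence} so that no string constant is modified. The
handler returns the made-up symbol @code{does-not-fit} and does not
depend on which particular error symbol is signaled.

@example
@group
(let ((s (copy-sequence "hello world")))
  (store-substring s 6 "WORLD")
  s)
;; @result{} "hello WORLD"
@end group

@group
;; Three characters cannot be stored starting at index 2 of "abc".
(condition-case nil
    (store-substring (copy-sequence "abc") 2 "xyz")
  (error 'does-not-fit))
;; @result{} does-not-fit
@end group
@end example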

  To clear out a string that contained a password, use
@code{clear-string}:

@defun clear-string string
This clears the contents of @var{string} to zeros
and may change its length.
@end defun

@need 2000
@node Text Comparison
@section Comparison of Characters and Strings
@cindex string equality

@defun char-equal character1 character2
This function returns @code{t} if the arguments represent the same
character, @code{nil} otherwise. This function ignores differences
in case if @code{case-fold-search} is non-@code{nil}.

@example
(char-equal ?x ?x)
     @result{} t
(let ((case-fold-search nil))
  (char-equal ?x ?X))
     @result{} nil
@end example
@end defun

@defun string= string1 string2
This function returns @code{t} if the characters of the two strings
match exactly. Symbols are also allowed as arguments, in which case
their print names are used.
Case is always significant, regardless of @code{case-fold-search}.

@example
(string= "abc" "abc")
     @result{} t
(string= "abc" "ABC")
     @result{} nil
(string= "ab" "ABC")
     @result{} nil
@end example

The function @code{string=} ignores the text properties of the two
strings. When @code{equal} (@pxref{Equality Predicates}) compares two
strings, it uses @code{string=}.

For technical reasons, a unibyte and a multibyte string are
@code{equal} if and only if they contain the same sequence of
character codes and all these codes are either in the range 0 through
127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
However, when a unibyte string gets converted to a multibyte string,
all characters with codes in the range 160 through 255 get converted
to characters with higher codes, whereas @acronym{ASCII} characters
remain unchanged. Thus, a unibyte string and its conversion to
multibyte are only @code{equal} if the string is all @acronym{ASCII}.
Character codes 160 through 255 are not entirely proper in multibyte
text, even though they can occur. As a consequence, the situation
where a unibyte and a multibyte string are @code{equal} without both
being all @acronym{ASCII} is a technical oddity that very few Emacs
Lisp programmers ever get confronted with. @xref{Text
Representations}.
@end defun

@defun string-equal string1 string2
@code{string-equal} is another name for @code{string=}.
@end defun
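
  The following sketch illustrates three of the points above: a symbol
argument is compared by its print name, text properties are ignored,
and @code{case-fold-search} affects @code{char-equal} but not
@code{string=}. The @code{face} property is an arbitrary choice for
the illustration.

@example
@group
(list (string= 'abc "abc")
      (string= "abc" (propertize "abc" 'face 'bold))
      (equal "abc" (propertize "abc" 'face 'bold)))
;; @result{} (t t t)
@end group

@group
(let ((case-fold-search t))
  (list (char-equal ?a ?A) (string= "a" "A")))
;; @result{} (t nil)
@end group
@end example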

@cindex lexical comparison
@defun string< string1 string2
@c (findex string< causes problems for permuted index!!)
This function compares two strings a character at a time. It
scans both the strings at the same time to find the first pair of corresponding
characters that do not match. If the lesser character of these two is
the character from @var{string1}, then @var{string1} is less, and this
function returns @code{t}. If the lesser character is the one from
@var{string2}, then @var{string1} is greater, and this function returns
@code{nil}. If the two strings match entirely, the value is @code{nil}.

Pairs of characters are compared according to their character codes.
Keep in mind that lower case letters have higher numeric values in the
@acronym{ASCII} character set than their upper case counterparts; digits and
many punctuation characters have a lower numeric value than upper case
letters. An @acronym{ASCII} character is less than any non-@acronym{ASCII}
character; a unibyte non-@acronym{ASCII} character is always less than any
multibyte non-@acronym{ASCII} character (@pxref{Text Representations}).

@example
@group
(string< "abc" "abd")
     @result{} t
(string< "abd" "abc")
     @result{} nil
(string< "123" "abc")
     @result{} t
@end group
@end example

When the strings have different lengths, and they match up to the
length of @var{string1}, then the result is @code{t}. If they match up
to the length of @var{string2}, the result is @code{nil}. A string of
no characters is less than any other string.

@example
@group
(string< "" "abc")
     @result{} t
(string< "ab" "abc")
     @result{} t
(string< "abc" "")
     @result{} nil
(string< "abc" "ab")
     @result{} nil
(string< "" "")
     @result{} nil
@end group
@end example

Symbols are also allowed as arguments, in which case their print names
are used.
@end defun

@defun string-lessp string1 string2
@code{string-lessp} is another name for @code{string<}.
@end defun
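
  One common use of @code{string<} is as the predicate for @code{sort}.
The sketch below sorts a freshly made list, so that the destructive
@code{sort} does not touch a constant; the resulting order follows the
character-code rules described above (empty string first, then digits,
upper case, and lower case). The list contents are made up for this
example.

@example
@group
(sort (list "pear" "Apple" "apple" "42" "") 'string<)
;; @result{} ("" "42" "Apple" "apple" "pear")
@end group
@end example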

@defun compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case
This function compares the specified part of @var{string1} with the
specified part of @var{string2}. The specified part of @var{string1}
runs from index @var{start1} up to index @var{end1} (@code{nil} means
the end of the string). The specified part of @var{string2} runs from
index @var{start2} up to index @var{end2} (@code{nil} means the end of
the string).

The strings are both converted to multibyte for the comparison
(@pxref{Text Representations}) so that a unibyte string and its
conversion to multibyte are always regarded as equal. If
@var{ignore-case} is non-@code{nil}, then case is ignored, so that
upper case letters can be equal to lower case letters.

If the specified portions of the two strings match, the value is
@code{t}. Otherwise, the value is an integer which indicates how many
leading characters agree, and which string is less. Its absolute value
is one plus the number of characters that agree at the beginning of the
two strings. The sign is negative if @var{string1} (or its specified
portion) is less.
@end defun
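
  The return value convention is easiest to see on a small case. In the
sketch below, the two strings agree on their first three characters and
differ at the fourth, so a mismatch is reported as plus or minus 4 (one
plus the three agreeing characters). The strings are made up for this
example.

@example
@group
;; Only the first three characters are compared.
(compare-strings "abcdef" 0 3 "abcxyz" 0 3)
;; @result{} t
@end group

@group
;; Whole strings: "d" is less than "x" at the fourth character.
(compare-strings "abcdef" 0 nil "abcxyz" 0 nil)
;; @result{} -4
(compare-strings "abcxyz" 0 nil "abcdef" 0 nil)
;; @result{} 4
@end group

@group
(compare-strings "HELLO" 0 nil "hello" 0 nil t)
;; @result{} t
@end group
@end example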

@defun assoc-string key alist &optional case-fold
This function works like @code{assoc}, except that @var{key} must be a
string, and comparison is done using @code{compare-strings}. If
@var{case-fold} is non-@code{nil}, it ignores case differences.
Unlike @code{assoc}, this function can also match elements of the alist
that are strings rather than conses. In particular, @var{alist} can
be a list of strings rather than an actual alist.
@xref{Association Lists}.
@end defun
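
  A brief sketch of @code{assoc-string}, showing a case-insensitive
lookup, the same lookup without case folding, and a lookup in a plain
list of strings. The alist contents are made up for this example.

@example
@group
(assoc-string "FOO" '(("foo" . 1) ("bar" . 2)) t)
;; @result{} ("foo" . 1)
(assoc-string "FOO" '(("foo" . 1) ("bar" . 2)))
;; @result{} nil
@end group

@group
;; The elements may be strings rather than conses.
(assoc-string "bar" '("foo" "bar"))
;; @result{} "bar"
@end group
@end example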

  See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
a way to compare text in buffers. The function @code{string-match},
which matches a regular expression against a string, can be used
for a kind of string comparison; see @ref{Regexp Search}.

@node String Conversion
@comment node-name, next, previous, up
@section Conversion of Characters and Strings
@cindex conversion of strings

  This section describes functions for conversions between characters,
strings and integers. @code{format} and @code{prin1-to-string}
(@pxref{Output Functions}) can also convert Lisp objects into strings.
@code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
string representation of a Lisp object into an object. The functions
@code{string-make-multibyte} and @code{string-make-unibyte} convert the
text representation of a string (@pxref{Converting Representations}).

  @xref{Documentation}, for functions that produce textual descriptions
of text characters and general input events
(@code{single-key-description} and @code{text-char-description}). These
functions are used primarily for making help messages.

@defun char-to-string character
@cindex character to string
This function returns a new string containing one character,
@var{character}. This function is semi-obsolete because the function
@code{string} is more general. @xref{Creating Strings}.
@end defun

@defun string-to-char string
@cindex string to character
  This function returns the first character in @var{string}. If the
string is empty, the function returns 0. The value is also 0 when the
first character of @var{string} is the null character, @acronym{ASCII} code
0.

@example
(string-to-char "ABC")
     @result{} 65
(string-to-char "xyz")
     @result{} 120
(string-to-char "")
     @result{} 0
@group
(string-to-char "\000")
     @result{} 0
@end group
@end example

This function may be eliminated in the future if it does not seem useful
enough to retain.
@end defun
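
  Because @code{string-to-char} returns 0 both for an empty string and
for a string that starts with the null character, a caller that needs
to tell the two cases apart has to look at something else, such as the
length. The sketch below shows this, along with the equivalence of
@code{char-to-string} and a one-argument call to @code{string}.

@example
@group
(list (char-to-string ?x) (string ?x))
;; @result{} ("x" "x")
@end group

@group
;; The two return values are the same, but the lengths differ.
(= (string-to-char "") (string-to-char "\000"))
;; @result{} t
(list (length "") (length "\000"))
;; @result{} (0 1)
@end group
@end example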

@defun number-to-string number
@cindex integer to string
@cindex integer to decimal
This function returns a string consisting of the printed base-ten
representation of @var{number}, which may be an integer or a floating
point number. The returned value starts with a minus sign if the argument is
negative.

@example
(number-to-string 256)
     @result{} "256"
@group
(number-to-string -23)
     @result{} "-23"
@end group
(number-to-string -23.5)
     @result{} "-23.5"
@end example

@cindex int-to-string
@code{int-to-string} is a semi-obsolete alias for this function.

See also the function @code{format} in @ref{Formatting Strings}.
@end defun
a9f0a989 618@defun string-to-number string &optional base
869f4785
RS
619@cindex string to number
620This function returns the numeric value of the characters in
f67b6c12
LT
621@var{string}. If @var{base} is non-@code{nil}, it must be an integer
622between 2 and 16 (inclusive), and integers are converted in that base.
623If @var{base} is @code{nil}, then base ten is used. Floating point
624conversion only works in base ten; we have not implemented other
625radices for floating point numbers, because that would be much more
626work and does not seem useful. If @var{string} looks like an integer
627but its value is too large to fit into a Lisp integer,
3afd8c25 628@code{string-to-number} returns a floating point result.
f9f59935 629
f67b6c12
LT
630The parsing skips spaces and tabs at the beginning of @var{string},
631then reads as much of @var{string} as it can interpret as a number in
632the given base. (On some systems it ignores other whitespace at the
633beginning, not just spaces and tabs.) If the first character after
634the ignored whitespace is neither a digit in the given base, nor a
635plus or minus sign, nor the leading dot of a floating point number,
636this function returns 0.
869f4785
RS
637
638@example
639(string-to-number "256")
640 @result{} 256
641(string-to-number "25 is a perfect square.")
642 @result{} 25
643(string-to-number "X256")
644 @result{} 0
645(string-to-number "-4.5")
646 @result{} -4.5
ea626e87
RS
647(string-to-number "1e5")
648 @result{} 100000.0
869f4785
RS
649@end example
650
651@findex string-to-int
652@code{string-to-int} is an obsolete alias for this function.
653@end defun
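
  The examples above all use base ten. As a sketch of the @var{base}
argument and of the leading-whitespace rule, the following forms parse
the same kind of digit strings in bases 16, 2 and 8, skip leading
spaces, and round-trip a value through @code{number-to-string}. The
input strings are made up for this example.

@example
@group
(string-to-number "ff" 16)
;; @result{} 255
(string-to-number "101" 2)
;; @result{} 5
(string-to-number "777" 8)
;; @result{} 511
@end group

@group
;; Leading spaces are skipped; parsing stops at the first non-digit.
(string-to-number "  42abc")
;; @result{} 42
(string-to-number (number-to-string -23.5))
;; @result{} -23.5
@end group
@end example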

  Here are some other functions that can convert to or from a string:

@table @code
@item concat
@code{concat} can convert a vector or a list into a string.
@xref{Creating Strings}.

@item vconcat
@code{vconcat} can convert a string into a vector. @xref{Vector
Functions}.

@item append
@code{append} can convert a string into a list. @xref{Building Lists}.
@end table
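
  The three conversions in the table above can be sketched as follows.
The elements of the resulting vector and list are character codes,
which is why they print as integers.

@example
@group
(concat [?a ?b] (list ?c))
;; @result{} "abc"
(vconcat "abc")
;; @result{} [97 98 99]
(append "abc" nil)
;; @result{} (97 98 99)
@end group
@end example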

@node Formatting Strings
@comment node-name, next, previous, up
@section Formatting Strings
@cindex formatting strings
@cindex strings, formatting them

  @dfn{Formatting} means constructing a string by substitution of
computed values at various places in a constant string. This constant string
controls how the other values are printed, as well as where they appear;
it is called a @dfn{format string}.

  Formatting is often useful for computing messages to be displayed. In
fact, the functions @code{message} and @code{error} provide the same
formatting feature described here; they differ from @code{format} only
in how they use the result of formatting.

@defun format string &rest objects
This function returns a new string that is made by copying
@var{string} and then replacing any format specification
in the copy with encodings of the corresponding @var{objects}. The
arguments @var{objects} are the computed values to be formatted.

The characters in @var{string}, other than the format specifications,
are copied directly into the output; starting in Emacs 21, if they have
text properties, these are copied into the output also.
@end defun

@cindex @samp{%} in format
@cindex format specification
  A format specification is a sequence of characters beginning with a
@samp{%}. Thus, if there is a @samp{%d} in @var{string}, the
@code{format} function replaces it with the printed representation of
one of the values to be formatted (one of the arguments @var{objects}).
For example:

@example
@group
(format "The value of fill-column is %d." fill-column)
     @result{} "The value of fill-column is 72."
@end group
@end example

  If @var{string} contains more than one format specification, the
format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
second such value, and so on. Any extra format specifications (those
for which there are no corresponding values) cause unpredictable
behavior. Any extra values to be formatted are ignored.

  Certain format specifications require values of particular types. If
you supply a value that doesn't fit the requirements, an error is
signaled.

  Here is a table of valid format specifications:

@table @samp
@item %s
Replace the specification with the printed representation of the object,
made without quoting (that is, using @code{princ}, not
@code{prin1}---@pxref{Output Functions}). Thus, strings are represented
by their contents alone, with no @samp{"} characters, and symbols appear
without @samp{\} characters.

Starting in Emacs 21, if the object is a string, its text properties are
copied into the output. The text properties of the @samp{%s} itself
are also copied, but those of the object take priority.

@item %S
Replace the specification with the printed representation of the object,
made with quoting (that is, using @code{prin1}---@pxref{Output
Functions}). Thus, strings are enclosed in @samp{"} characters, and
@samp{\} characters appear where necessary before special characters.

@item %o
@cindex integer to octal
Replace the specification with the base-eight representation of an
integer.

@item %d
Replace the specification with the base-ten representation of an
integer.

@item %x
@itemx %X
@cindex integer to hexadecimal
Replace the specification with the base-sixteen representation of an
integer. @samp{%x} uses lower case and @samp{%X} uses upper case.

@item %c
Replace the specification with the character which is the value given.

@item %e
Replace the specification with the exponential notation for a floating
point number.

@item %f
Replace the specification with the decimal-point notation for a floating
point number.

@item %g
Replace the specification with notation for a floating point number,
using either exponential notation or decimal-point notation, whichever
is shorter.

@item %%
Replace the specification with a single @samp{%}. This format
specification is unusual in that it does not use a value. For example,
@code{(format "%% %d" 30)} returns @code{"% 30"}.
@end table
780
  Any other format character results in an @samp{Invalid format
operation} error.

  Here are several examples:

@example
@group
(format "The name of this buffer is %s." (buffer-name))
     @result{} "The name of this buffer is strings.texi."

(format "The buffer object prints as %s." (current-buffer))
     @result{} "The buffer object prints as strings.texi."

(format "The octal value of %d is %o,
         and the hex value is %x." 18 18 18)
     @result{} "The octal value of 18 is 22,
         and the hex value is 12."
@end group
@end example
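
  As a further sketch of the specifications in the table above, the
following forms print the same kinds of values under different
specifications.  The floating-point results assume the usual C-style
formatting found on typical systems.

@example
@group
;; %s omits the quotes that %S adds.
(format "%s and %S" "hi" "hi")
     @result{} "hi and \"hi\""

;; One integer in decimal, octal, and both hexadecimal cases.
(format "%d, %o, %x, %X" 255 255 255 255)
     @result{} "255, 377, ff, FF"
@end group

@group
;; %c takes a character code.
(format "%c" ?A)
     @result{} "A"

;; The three floating-point notations for one number.
(format "%e, %f, %g" 1234.5 1234.5 1234.5)
     @result{} "1.234500e+03, 1234.500000, 1234.5"
@end group
@end example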

@cindex field width
@cindex padding
  All the specification characters allow an optional ``width'', which
is a digit-string between the @samp{%} and the character.  If the
printed representation of the object contains fewer characters than
this width, then it is padded.  The padding is on the left if the
width is positive (or starts with zero) and on the right if the
width is negative.  The padding character is normally a space, but if
the width starts with a zero, zeros are used for padding.  Some of
these conventions are ignored for specification characters for which
they do not make sense.  That is, @samp{%s}, @samp{%S} and @samp{%c}
accept a width starting with 0, but still pad with @emph{spaces} on
the left.  Also, @samp{%%} accepts a width, but ignores it.  Here are
some examples of padding:

@example
(format "%06d is padded on the left with zeros" 123)
     @result{} "000123 is padded on the left with zeros"

(format "%-6d is padded on the right" 123)
     @result{} "123    is padded on the right"
@end example

If the width is too small, @code{format} does not truncate the
object's printed representation.  Thus, you can use a width to specify
a minimum spacing between columns with no risk of losing information.

  In the following three examples, the width specifies a minimum of 7
characters.  In the first case, the string inserted in place of
@samp{%7s} has only 3 letters, so 4 blank spaces are inserted for
padding.  In the second case, the string @code{"specification"} is 13
letters wide but is not truncated.  In the third case, @samp{%-7s} puts
the padding on the right.

@smallexample
@group
(format "The word `%7s' actually has %d letters in it."
        "foo" (length "foo"))
     @result{} "The word `    foo' actually has 3 letters in it."
@end group

@group
(format "The word `%7s' actually has %d letters in it."
        "specification" (length "specification"))
     @result{} "The word `specification' actually has 13 letters in it."
@end group

@group
(format "The word `%-7s' actually has %d letters in it."
        "foo" (length "foo"))
     @result{} "The word `foo    ' actually has 3 letters in it."
@end group
@end smallexample

@cindex precision in format specifications
  All the specification characters allow an optional ``precision''
before the character (after the width, if present).  The precision is
a decimal-point @samp{.} followed by a digit-string.  For the
floating-point specifications (@samp{%e}, @samp{%f}, @samp{%g}), the
precision specifies how many decimal places to show; if zero, the
decimal-point itself is also omitted.  For @samp{%s} and @samp{%S},
the precision truncates the string to the given width, so
@samp{%.3s} shows only the first three characters of the
representation for @var{object}.  Precision is ignored for other
specification characters.

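
  The effect of a precision can be sketched with a few forms.  These
are illustrations only; the floating-point results assume the usual
C-style rounding on typical systems.

@example
@group
;; Precision truncates the printed representation for %s.
(format "%.3s" "abcdef")
     @result{} "abc"

;; Precision sets the number of decimal places for %f and %e.
(format "%.2f" 3.14159)
     @result{} "3.14"
(format "%.2e" 1234.5)
     @result{} "1.23e+03"
@end group

@group
;; A zero precision also drops the decimal point.
(format "%.0f" 3.0)
     @result{} "3"

;; Width and precision combined: 8 columns, 3 decimal places.
(format "%8.3f" 3.14159)
     @result{} "   3.142"
@end group
@end example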

@cindex flags in format specifications
Immediately after the @samp{%} and before the optional width and
precision, you can put certain ``flag'' characters.

The space flag inserts a space before positive numbers, which otherwise
get no prefix.  This flag is ignored except for @samp{%d}, @samp{%e},
@samp{%f}, and @samp{%g}.

The flag @samp{#} indicates ``alternate form''.  For @samp{%o} it
ensures that the result begins with a 0.  For @samp{%x} and @samp{%X}
the result is prefixed with @samp{0x} or @samp{0X}.  For @samp{%e},
@samp{%f}, and @samp{%g} a decimal point is always shown even if the
precision is zero.

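
  The two flags described above can be sketched as follows.  The
values are arbitrary and chosen only for illustration.

@example
@group
;; The space flag affects only the positive number.
(format "% d and % d" 42 -42)
     @result{} " 42 and -42"

;; Alternate form for octal and hexadecimal.
(format "%#o, %#x, %#X" 8 255 255)
     @result{} "010, 0xff, 0XFF"

;; Alternate form keeps the decimal point at precision zero.
(format "%.0f versus %#.0f" 3.0 3.0)
     @result{} "3 versus 3."
@end group
@end example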

@node Case Conversion
@comment node-name, next, previous, up
@section Case Conversion in Lisp
@cindex upper case
@cindex lower case
@cindex character case
@cindex case conversion in Lisp

  The character case functions change the case of single characters or
of the contents of strings.  The functions normally convert only
alphabetic characters (the letters @samp{A} through @samp{Z} and
@samp{a} through @samp{z}, as well as non-@acronym{ASCII} letters); other
characters are not altered.  You can specify a different case
conversion mapping by specifying a case table (@pxref{Case Tables}).

  These functions do not modify the strings that are passed to them as
arguments.

  The examples below use the characters @samp{X} and @samp{x}, which have
@acronym{ASCII} codes 88 and 120 respectively.

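
  As a small sketch of the point that the argument is left unmodified,
the following form converts a string both ways and then returns the
original alongside the results.  The variable name @code{original} is
only illustrative.

@example
@group
(let ((original "Emacs Lisp"))
  ;; Each conversion returns a new string; `original' is untouched.
  (list (upcase original) (downcase original) original))
     @result{} ("EMACS LISP" "emacs lisp" "Emacs Lisp")
@end group
@end example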

@defun downcase string-or-char
This function converts a character or a string to lower case.

When the argument to @code{downcase} is a string, the function creates
and returns a new string in which each letter in the argument that is
upper case is converted to lower case.  When the argument to
@code{downcase} is a character, @code{downcase} returns the
corresponding lower case character.  This value is an integer.  If the
original character is lower case, or is not a letter, then the value
equals the original character.

@example
(downcase "The cat in the hat")
     @result{} "the cat in the hat"

(downcase ?X)
     @result{} 120
@end example
@end defun

@defun upcase string-or-char
This function converts a character or a string to upper case.

When the argument to @code{upcase} is a string, the function creates
and returns a new string in which each letter in the argument that is
lower case is converted to upper case.

When the argument to @code{upcase} is a character, @code{upcase}
returns the corresponding upper case character.  This value is an integer.
If the original character is upper case, or is not a letter, then the
value returned equals the original character.

@example
(upcase "The cat in the hat")
     @result{} "THE CAT IN THE HAT"

(upcase ?x)
     @result{} 88
@end example
@end defun

@defun capitalize string-or-char
@cindex capitalization
This function capitalizes strings or characters.  If
@var{string-or-char} is a string, the function creates and returns a new
string, whose contents are a copy of @var{string-or-char} in which each
word has been capitalized.  This means that the first character of each
word is converted to upper case, and the rest are converted to lower
case.

The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
table (@pxref{Syntax Class Table}).

When the argument to @code{capitalize} is a character, @code{capitalize}
has the same result as @code{upcase}.

@example
@group
(capitalize "The cat in the hat")
     @result{} "The Cat In The Hat"
@end group

@group
(capitalize "THE 77TH-HATTED CAT")
     @result{} "The 77th-Hatted Cat"
@end group

@group
(capitalize ?x)
     @result{} 88
@end group
@end example
@end defun

@defun upcase-initials string-or-char
If @var{string-or-char} is a string, this function capitalizes the
initials of the words in @var{string-or-char}, without altering any
letters other than the initials.  It returns a new string whose
contents are a copy of @var{string-or-char}, in which each word has
had its initial letter converted to upper case.

The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
table (@pxref{Syntax Class Table}).

When the argument to @code{upcase-initials} is a character,
@code{upcase-initials} has the same result as @code{upcase}.

@example
@group
(upcase-initials "The CAT in the hAt")
     @result{} "The CAT In The HAt"
@end group
@end example
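
To make the difference from @code{capitalize} concrete, here is a
sketch that applies both functions to the same (arbitrary) string.
Only @code{capitalize} lowers the letters after each initial.

@example
@group
(capitalize "the CAT in the hAt")
     @result{} "The Cat In The Hat"

(upcase-initials "the CAT in the hAt")
     @result{} "The CAT In The HAt"
@end group
@end example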
@end defun

  @xref{Text Comparison}, for functions that compare strings; some of
them ignore case differences, or can optionally ignore case differences.

@node Case Tables
@section The Case Table

  You can customize case conversion by installing a special @dfn{case
table}.  A case table specifies the mapping between upper case and lower
case letters.  It affects both the case conversion functions for Lisp
objects (see the previous section) and those that apply to text in the
buffer (@pxref{Case Changes}).  Each buffer has a case table; there is
also a standard case table which is used to initialize the case table
of new buffers.

  A case table is a char-table (@pxref{Char-Tables}) whose subtype is
@code{case-table}.  This char-table maps each character into the
corresponding lower case character.  It has three extra slots, which
hold related tables:

@table @var
@item upcase
The upcase table maps each character into the corresponding upper
case character.
@item canonicalize
The canonicalize table maps all of a set of case-related characters
into a particular member of that set.
@item equivalences
The equivalences table maps each one of a set of case-related characters
into the next character in that set.
@end table
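
  Since a case table is a char-table, its lower-case mapping can be
inspected with ordinary char-table functions.  The following is a
sketch that assumes the standard case table has not been altered for
@acronym{ASCII} letters.

@example
@group
;; The subtype identifies the char-table as a case table.
(char-table-subtype (standard-case-table))
     @result{} case-table

;; Indexing the table gives the lower case character: ?a is 97.
(aref (standard-case-table) ?A)
     @result{} 97
(aref (standard-case-table) ?a)
     @result{} 97
@end group
@end example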

  In simple cases, all you need to specify is the mapping to lower case;
the three related tables will be calculated automatically from that one.

  For some languages, upper and lower case letters are not in one-to-one
correspondence.  There may be two different lower case letters with the
same upper case equivalent.  In these cases, you need to specify the
maps for both lower case and upper case.

  The extra table @var{canonicalize} maps each character to a canonical
equivalent; any two characters that are related by case-conversion have
the same canonical equivalent character.  For example, since @samp{a}
and @samp{A} are related by case-conversion, they should have the same
canonical equivalent character (which should be either @samp{a} for both
of them, or @samp{A} for both of them).

  The extra table @var{equivalences} is a map that cyclically permutes
each equivalence class (of characters with the same canonical
equivalent).  (For ordinary @acronym{ASCII}, this would map @samp{a} into
@samp{A} and @samp{A} into @samp{a}, and likewise for each set of
equivalent characters.)

  When you construct a case table, you can provide @code{nil} for
@var{canonicalize}; then Emacs fills in this slot from the lower case
and upper case mappings.  You can also provide @code{nil} for
@var{equivalences}; then Emacs fills in this slot from
@var{canonicalize}.  In a case table that is actually in use, those
components are non-@code{nil}.  Do not try to specify @var{equivalences}
without also specifying @var{canonicalize}.

  Here are the functions for working with case tables:

@defun case-table-p object
This predicate returns non-@code{nil} if @var{object} is a valid case
table.
@end defun

@defun set-standard-case-table table
This function makes @var{table} the standard case table, so that it will
be used in any buffers created subsequently.
@end defun

@defun standard-case-table
This function returns the standard case table.
@end defun

@defun current-case-table
This function returns the current buffer's case table.
@end defun

@defun set-case-table table
This function sets the current buffer's case table to @var{table}.
@end defun
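
  The functions above can be combined as in the following sketch.  It
works in a temporary buffer so that no existing buffer is affected.
The function @code{copy-case-table} is not described in this section;
its use here to obtain a fresh table is an assumption about the
surrounding library.

@example
@group
(case-table-p (standard-case-table))
     @result{} t
(case-table-p "not a table")
     @result{} nil
@end group

@group
(with-temp-buffer
  ;; `copy-case-table' is assumed to return a copy of its argument.
  (let ((table (copy-case-table (standard-case-table))))
    (set-case-table table)
    ;; The buffer now uses the copy as its case table.
    (eq (current-case-table) table)))
     @result{} t
@end group
@end example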

  The following three functions are convenient subroutines for packages
that define non-@acronym{ASCII} character sets.  They modify the specified
case table @var{case-table}; they also modify the standard syntax table.
@xref{Syntax Tables}.  Normally you would use these functions to change
the standard case table.

@defun set-case-syntax-pair uc lc case-table
This function specifies a pair of corresponding letters, one upper case
and one lower case.
@end defun

@defun set-case-syntax-delims l r case-table
This function makes characters @var{l} and @var{r} a matching pair of
case-invariant delimiters.
@end defun

@defun set-case-syntax char syntax case-table
This function makes @var{char} case-invariant, with syntax
@var{syntax}.
@end defun

@deffn Command describe-buffer-case-table
This command displays a description of the contents of the current
buffer's case table.
@end deffn

@ignore
   arch-tag: 700b8e95-7aa5-4b52-9eb3-8f2e1ea152b4
@end ignore