@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998 Free Software Foundation, Inc.
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
+@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
A string in Emacs Lisp is an array that contains an ordered sequence
of characters. Strings are used as names of symbols, buffers, and
-files, to send messages to users, to hold text being copied between
-buffers, and for many other purposes. Because strings are so important,
+files; to send messages to users; to hold text being copied between
+buffers; and for many other purposes. Because strings are so important,
Emacs Lisp has many functions expressly for manipulating them. Emacs
Lisp programs use strings more often than individual characters.
* Creating Strings:: Functions to allocate new strings.
* Modifying Strings:: Altering the contents of an existing string.
* Text Comparison:: Comparing characters or strings.
-* String Conversion:: Converting characters or strings and vice versa.
+* String Conversion:: Converting to and from characters and strings.
* Formatting Strings:: @code{format}: Emacs's analogue of @code{printf}.
* Case Conversion:: Case conversion functions.
* Case Tables:: Customizing case conversion.
@node String Basics
@section String and Character Basics
- Strings in Emacs Lisp are arrays that contain an ordered sequence of
-characters. Characters are represented in Emacs Lisp as integers;
+ Characters are represented in Emacs Lisp as integers;
whether an integer is a character or not is determined only by how it is
used. Thus, strings really contain integers.
The length of a string (like any array) is fixed, and cannot be
altered once the string exists. Strings in Lisp are @emph{not}
terminated by a distinguished character code. (By contrast, strings in
-C are terminated by a character with @sc{ASCII} code 0.)
+C are terminated by a character with @sc{ascii} code 0.)
Since strings are arrays, and therefore sequences as well, you can
operate on them with the general array and sequence functions.
change individual characters in a string using the functions @code{aref}
and @code{aset} (@pxref{Array Functions}).
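+
+Here is a small illustration. The variable @code{s} exists only for this
+example, and @code{copy-sequence} is used so that the string constant
+itself is not altered. Note that the element fetched by @code{aref} is
+simply an integer:
+
+@example
+@group
+(let ((s (copy-sequence "cat")))
+  (aset s 0 ?b)
+  (list (aref s 1) s))
+     @result{} (97 "bat")
+@end group
+@end example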
- There are two text representations for non-@sc{ASCII} characters in
+ There are two text representations for non-@sc{ascii} characters in
Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
-Representations}). @sc{ASCII} characters always occupy one byte in a
-string; in fact, there is no real difference between the two
-representation for a string which is all @sc{ASCII}. For most Lisp
-programming, you don't need to be concerned with these two
+Representations}). An @sc{ascii} character always occupies one byte in a
+string; in fact, when a string is all @sc{ascii}, there is no real
+difference between the unibyte and multibyte representations.
+For most Lisp programming, you don't need to be concerned with these two
representations.
Sometimes key sequences are represented as strings. When a string is
a key sequence, string elements in the range 128 to 255 represent meta
-characters (which are extremely large integers) rather than character
+characters (which are large integers) rather than character
codes in the range 128 to 255.
Strings cannot hold characters that have the hyper, super or alt
-modifiers; they can hold @sc{ASCII} control characters, but no other
-control characters. They do not distinguish case in @sc{ASCII} control
+modifiers; they can hold @sc{ascii} control characters, but no other
+control characters. They do not distinguish case in @sc{ascii} control
characters. If you want to store such characters in a sequence, such as
a key sequence, you must use a vector instead of a string.
-@xref{Character Type}, for more information about representation of meta
+@xref{Character Type}, for more information about the representation of meta
and other modifiers for keyboard input characters.
Strings are useful for holding regular expressions. You can also
copy them into buffers. @xref{Character Type}, and @ref{String Type},
for information about the syntax of characters and strings.
@xref{Non-ASCII Characters}, for functions to convert between text
-representations and encode and decode character codes.
+representations and to encode and decode character codes.
@node Predicates for Strings
@section The Predicates for Strings
@end defun
@defun string &rest characters
-@tindex string
This returns a string containing the characters @var{characters}.
@example
If the characters copied from @var{string} have text properties, the
properties are copied into the new string also. @xref{Text Properties}.
-@code{substring} also allows vectors for the first argument.
+@code{substring} also accepts a vector for the first argument.
For example:
@example
The @code{concat} function always constructs a new string that is
not @code{eq} to any existing string.
-When an argument is an integer (not a sequence of integers), it is
-converted to a string of digits making up the decimal printed
-representation of the integer. @strong{Don't use this feature; we plan
-to eliminate it. If you already use this feature, change your programs
-now!} The proper way to convert an integer to a decimal number in this
-way is with @code{format} (@pxref{Formatting Strings}) or
+In Emacs versions before 21, when an argument was an integer (not a
+sequence of integers), it was converted to a string of digits making up
+the decimal printed representation of the integer. This obsolete usage
+no longer works. The proper way to convert an integer to its decimal
+printed form is with @code{format} (@pxref{Formatting Strings}) or
@code{number-to-string} (@pxref{String Conversion}).
-@example
-@group
-(concat 137)
- @result{} "137"
-(concat 54 321)
- @result{} "54321"
-@end group
-@end example
-
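+For example, here is how to get the decimal strings that the obsolete
+usage of @code{concat} produced:
+
+@example
+@group
+(number-to-string 137)
+     @result{} "137"
+(format "%d%d" 54 321)
+     @result{} "54321"
+@end group
+@end example
+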
For information about other concatenation functions, see the
description of @code{mapconcat} in @ref{Mapping Functions},
@code{vconcat} in @ref{Vectors}, and @code{append} in @ref{Building
@end defun
@defun split-string string separators
-@tindex split-string
-Split @var{string} into substrings in between matches for the regular
+This function splits @var{string} into substrings at matches for the regular
expression @var{separators}. Each match for @var{separators} defines a
splitting point; the substrings between the splitting points are made
-into a list, which is the value. If @var{separators} is @code{nil} (or
-omitted), the default is @code{"[ \f\t\n\r\v]+"}.
+into a list, which is the value returned by @code{split-string}.
+If @var{separators} is @code{nil} (or omitted),
+the default is @code{"[ \f\t\n\r\v]+"}.
For example,
A more powerful function is @code{store-substring}:
@defun store-substring string idx obj
-@tindex store-substring
This function alters part of the contents of the string @var{string}, by
storing @var{obj} starting at index @var{idx}. The argument @var{obj}
may be either a character or a (smaller) string.
Since it is impossible to change the length of an existing string, it is
an error if @var{obj} doesn't fit within @var{string}'s actual length,
-of if any new character requires a different number of bytes from the
+or if any new character requires a different number of bytes from the
character currently present at that point in @var{string}.
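+
+For example, the following stores both a string and a character. The
+variable @code{s} exists only for illustration, and @code{copy-sequence}
+leaves the string constant itself unchanged:
+
+@example
+@group
+(let ((s (copy-sequence "hello world")))
+  (store-substring s 0 "HELLO")
+  (store-substring s 5 ?_)
+  s)
+     @result{} "HELLO_world"
+@end group
+@end example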
@end defun
@defun string= string1 string2
This function returns @code{t} if the characters of the two strings
-match exactly; case is significant.
+match exactly.
+Case is always significant, regardless of @code{case-fold-search}.
@example
(string= "abc" "abc")
strings. When @code{equal} (@pxref{Equality Predicates}) compares two
strings, it uses @code{string=}.
-If the strings contain non-@sc{ASCII} characters, and one is unibyte
+If the strings contain non-@sc{ascii} characters, and one is unibyte
while the other is multibyte, then they cannot be equal. @xref{Text
Representations}.
@end defun
@cindex lexical comparison
@defun string< string1 string2
@c (findex string< causes problems for permuted index!!)
-This function compares two strings a character at a time. First it
-scans both the strings at once to find the first pair of corresponding
-characters that do not match. If the lesser character of those two is
+This function compares two strings a character at a time. It
+scans both strings at the same time to find the first pair of corresponding
+characters that do not match. If the lesser character of these two is
the character from @var{string1}, then @var{string1} is less, and this
function returns @code{t}. If the lesser character is the one from
@var{string2}, then @var{string1} is greater, and this function returns
Pairs of characters are compared according to their character codes.
Keep in mind that lower case letters have higher numeric values in the
-@sc{ASCII} character set than their upper case counterparts; digits and
+@sc{ascii} character set than their upper case counterparts; digits and
many punctuation characters have a lower numeric value than upper case
-letters. An @sc{ASCII} character is less than any non-@sc{ASCII}
-character; a unibyte non-@sc{ASCII} character is always less than any
-multibyte non-@sc{ASCII} character (@pxref{Text Representations}).
+letters. An @sc{ascii} character is less than any non-@sc{ascii}
+character; a unibyte non-@sc{ascii} character is always less than any
+multibyte non-@sc{ascii} character (@pxref{Text Representations}).
@example
@group
@end defun
@defun compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case
-@tindex compare-strings
-This function compares a specified part of @var{string1} with a
+This function compares the specified part of @var{string1} with the
specified part of @var{string2}. The specified part of @var{string1}
-runs from index @var{start1} up to index @var{end1} (default, the end of
-the string). The specified part of @var{string2} runs from index
-@var{start2} up to index @var{end2} (default, the end of the string).
+runs from index @var{start1} up to index @var{end1} (@code{nil} means
+the end of the string). The specified part of @var{string2} runs from
+index @var{start2} up to index @var{end2} (@code{nil} means the end of
+the string).
The strings are both converted to multibyte for the comparison
-(@pxref{Text Representations}) so that a unibyte string can be usefully
-compared with a multibyte string. If @var{ignore-case} is
-non-@code{nil}, then case is ignored as well.
+(@pxref{Text Representations}) so that a unibyte string can be equal to
+a multibyte string. If @var{ignore-case} is non-@code{nil}, then case
+is ignored, so that upper case letters can be equal to lower case letters.
If the specified portions of the two strings match, the value is
@code{t}. Otherwise, the value is an integer which indicates how many
@end defun
@defun assoc-ignore-case key alist
-@tindex assoc-ignore-case
This function works like @code{assoc}, except that @var{key} must be a
-string, and comparison is done using @code{compare-strings}.
-Case differences are ignored in this comparison.
+string, and comparison is done using @code{compare-strings}, ignoring
+case differences. @xref{Association Lists}.
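+
+For example (the alist here is only for illustration):
+
+@example
+@group
+(assoc-ignore-case "FOO" '(("foo" . 1) ("bar" . 2)))
+     @result{} ("foo" . 1)
+@end group
+@end example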
@end defun
@defun assoc-ignore-representation key alist
-@tindex assoc-ignore-representation
This function works like @code{assoc}, except that @var{key} must be a
string, and comparison is done using @code{compare-strings}.
Case differences are significant.
@cindex string to character
This function returns the first character in @var{string}. If the
string is empty, the function returns 0. The value is also 0 when the
-first character of @var{string} is the null character, @sc{ASCII} code
+first character of @var{string} is the null character, @sc{ascii} code
0.
@example
@result{} 120
(string-to-char "")
@result{} 0
+@group
(string-to-char "\000")
@result{} 0
+@end group
@end example
This function may be eliminated in the future if it does not seem useful
@defun number-to-string number
@cindex integer to string
@cindex integer to decimal
-This function returns a string consisting of the printed
+This function returns a string consisting of the printed base-ten
representation of @var{number}, which may be an integer or a floating
-point number. The value starts with a sign if the argument is
+point number. The returned value starts with a minus sign if the argument is
negative.
@example
in that base. If @var{base} is @code{nil}, then base ten is used.
Floating point conversion always uses base ten; we have not implemented
other radices for floating point numbers, because that would be much
-more work and does not seem useful.
+more work and does not seem useful. If @var{string} looks like an
+integer but its value is too large to fit into a Lisp integer,
+@code{string-to-number} returns a floating point result.
The parsing skips spaces and tabs at the beginning of @var{string}, then
reads as much of @var{string} as it can interpret as a number. (On some
systems it ignores other whitespace at the beginning, not just spaces
-and tabs.) If the first character after the ignored whitespace is not a
-digit or a plus or minus sign, this function returns 0.
+and tabs.) If the first character after the ignored whitespace is
+neither a digit, nor a plus or minus sign, nor the leading dot of a
+floating point number, this function returns 0.
@example
(string-to-number "256")
@result{} 0
(string-to-number "-4.5")
@result{} -4.5
+(string-to-number "1e5")
+ @result{} 100000.0
@end example
@findex string-to-int
@cindex strings, formatting them
@dfn{Formatting} means constructing a string by substitution of
-computed values at various places in a constant string. This string
-controls how the other values are printed as well as where they appear;
+computed values at various places in a constant string. This constant string
+controls how the other values are printed, as well as where they appear;
it is called a @dfn{format string}.
Formatting is often useful for computing messages to be displayed. In
@var{string} and then replacing any format specification
in the copy with encodings of the corresponding @var{objects}. The
arguments @var{objects} are the computed values to be formatted.
+
+The characters in @var{string}, other than the format specifications,
+are copied directly into the output; starting in Emacs 21, if they have
+text properties, these are copied into the output also.
@end defun
@cindex @samp{%} in format
@end example
If @var{string} contains more than one format specification, the
-format specifications correspond with successive values from
+format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
second such value, and so on. Any extra format specifications (those
by their contents alone, with no @samp{"} characters, and symbols appear
without @samp{\} characters.
+Starting in Emacs 21, if the object is a string, its text properties are
+copied into the output. The text properties of the @samp{%s} itself
+are also copied, but those of the object take priority.
+
If there is no corresponding object, the empty string is used.
@item %S
integer.
@item %x
+@itemx %X
@cindex integer to hexadecimal
Replace the specification with the base-sixteen representation of an
-integer.
+integer. @samp{%x} uses lower case and @samp{%X} uses upper case.
@item %c
Replace the specification with the character which is the value given.
is shorter.
@item %%
-A single @samp{%} is placed in the string. This format specification is
-unusual in that it does not use a value. For example, @code{(format "%%
-%d" 30)} returns @code{"% 30"}.
+Replace the specification with a single @samp{%}. This format
+specification is unusual in that it does not use a value. For example,
+@code{(format "%% %d" 30)} returns @code{"% 30"}.
@end table
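+
+For example, @samp{%x} and @samp{%X} differ only in the case of the
+hexadecimal digits they produce:
+
+@example
+@group
+(format "%x %X" 255 255)
+     @result{} "ff FF"
+@end group
+@end example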
Any other format character results in an @samp{Invalid format
The character case functions change the case of single characters or
of the contents of strings. The functions normally convert only
alphabetic characters (the letters @samp{A} through @samp{Z} and
-@samp{a} through @samp{z}, as well as non-ASCII letters); other
-characters are not altered. (You can specify a different case
-conversion mapping by specifying a case table---@pxref{Case Tables}.)
+@samp{a} through @samp{z}, as well as non-@sc{ascii} letters); other
+characters are not altered. You can specify a different case
+conversion mapping by specifying a case table (@pxref{Case Tables}).
These functions do not modify the strings that are passed to them as
arguments.
The examples below use the characters @samp{X} and @samp{x} which have
-@sc{ASCII} codes 88 and 120 respectively.
+@sc{ascii} codes 88 and 120 respectively.
@defun downcase string-or-char
This function converts a character or a string to lower case.
When the argument to @code{upcase} is a character, @code{upcase}
returns the corresponding upper case character. This value is an integer.
If the original character is upper case, or is not a letter, then the
-value equals the original character.
+value returned equals the original character.
@example
(upcase "The cat in the hat")
The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
-table (@xref{Syntax Class Table}).
+table (@pxref{Syntax Class Table}).
When the argument to @code{capitalize} is a character, @code{capitalize}
has the same result as @code{upcase}.
@end defun
@defun upcase-initials string
-This function capitalizes the initials of the words in @var{string}.
+This function capitalizes the initials of the words in @var{string},
without altering any letters other than the initials. It returns a new
string whose contents are a copy of @var{string}, in which each word has
-been converted to upper case.
+had its initial letter converted to upper case.
The definition of a word is any sequence of consecutive characters that
are assigned to the word constituent syntax class in the current syntax
-table (@xref{Syntax Class Table}).
+table (@pxref{Syntax Class Table}).
@example
@group
-The extra table @var{equivalences} is a map that cyclicly permutes
+The extra table @var{equivalences} is a map that cyclically permutes
each equivalence class (of characters with the same canonical
-equivalent). (For ordinary @sc{ASCII}, this would map @samp{a} into
+equivalent). (For ordinary @sc{ascii}, this would map @samp{a} into
@samp{A} and @samp{A} into @samp{a}, and likewise for each set of
equivalent characters.)
@end defun
The following three functions are convenient subroutines for packages
-that define non-@sc{ASCII} character sets. They modify the specified
+that define non-@sc{ascii} character sets. They modify the specified
case table @var{case-table}; they also modify the standard syntax table.
@xref{Syntax Tables}. Normally you would use these functions to change
the standard case table.