@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2003
+@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/objects
@node Lisp Data Types, Numbers, Introduction, Top
variable, and the type is known by the compiler but not represented in
the data. Such type declarations do not exist in Emacs Lisp. A Lisp
variable can have any type of value, and it remembers whatever value
-you store in it, type and all.
+you store in it, type and all. (Actually, a small number of Emacs
+Lisp variables can only take on values of a certain type.
+@xref{Variables with Restricted Values}.)
This chapter describes the purpose, printed representation, and read
syntax of each of the standard types in GNU Emacs Lisp. Details on how
@node Integer Type
@subsection Integer Type
- The range of values for integers in Emacs Lisp is @minus{}134217728 to
-134217727 (28 bits; i.e.,
+ The range of values for integers in Emacs Lisp is @minus{}268435456 to
+268435455 (29 bits; i.e.,
@ifnottex
--2**27
+-2**28
@end ifnottex
@tex
-@math{-2^{27}}
+@math{-2^{28}}
@end tex
to
@ifnottex
-2**27 - 1)
+2**28 - 1)
@end ifnottex
@tex
@math{2^{28}-1})
@end tex
on most machines. (Some machines may provide a wider range.) It is
important to note that the Emacs Lisp arithmetic functions do not check
-for overflow. Thus @code{(1+ 134217727)} is @minus{}134217728 on most
+for overflow. Thus @code{(1+ 268435455)} is @minus{}268435456 on most
machines.
The read syntax for integers is a sequence of (base ten) digits with an
1 ; @r{The integer 1.}
1. ; @r{Also the integer 1.}
+1 ; @r{Also the integer 1.}
-268435457 ; @r{Also the integer 1 on a 28-bit implementation.}
+536870913 ; @r{Also the integer 1 on a 29-bit implementation.}
@end group
@end example
@node Floating Point Type
@subsection Floating Point Type
- Emacs supports floating point numbers (though there is a compilation
-option to disable them). The precise range of floating point numbers is
-machine-specific.
+ Floating point numbers are the computer equivalent of scientific
+notation. The precise number of significant figures and the range of
+possible exponents are machine-specific; Emacs always uses the C data
+type @code{double} to store the value.
The printed representation for floating point numbers requires either
a decimal point (with at least one digit following), an exponent, or
@node Character Type
@subsection Character Type
-@cindex @sc{ascii} character codes
+@cindex @acronym{ASCII} character codes
A @dfn{character} in Emacs Lisp is nothing more than an integer. In
other words, characters are represented by their character codes. For
common to work with @emph{strings}, which are sequences composed of
characters. @xref{String Type}.
- Characters in strings, buffers, and files are currently limited to the
-range of 0 to 524287---nineteen bits. But not all values in that range
-are valid character codes. Codes 0 through 127 are @sc{ascii} codes; the
-rest are non-@sc{ascii} (@pxref{Non-ASCII Characters}). Characters that represent
-keyboard input have a much wider range, to encode modifier keys such as
+ Characters in strings, buffers, and files are currently limited to
+the range of 0 to 524287---nineteen bits. But not all values in that
+range are valid character codes. Codes 0 through 127 are
+@acronym{ASCII} codes; the rest are non-@acronym{ASCII}
+(@pxref{Non-ASCII Characters}). Characters that represent keyboard
+input have a much wider range, to encode modifier keys such as
Control, Meta and Shift.
@cindex read syntax for characters
The usual read syntax for alphanumeric characters is a question mark
followed by the character; thus, @samp{?A} for the character
@kbd{A}, @samp{?B} for the character @kbd{B}, and @samp{?a} for the
-character @kbd{a}.
+character @kbd{a}.
For example:
You can use the same syntax for punctuation characters, but it is
often a good idea to add a @samp{\} so that the Emacs commands for
-editing Lisp code don't get confused. For example, @samp{?\ } is the
-way to write the space character. If the character is @samp{\}, you
-@emph{must} use a second @samp{\} to quote it: @samp{?\\}.
+editing Lisp code don't get confused. For example, @samp{?\(} is the
+way to write the open-paren character. If the character is @samp{\},
+you @emph{must} use a second @samp{\} to quote it: @samp{?\\}.
@cindex whitespace
@cindex bell character
@cindex @samp{\r}
@cindex escape
@cindex @samp{\e}
- You can express the characters Control-g, backspace, tab, newline,
-vertical tab, formfeed, return, and escape as @samp{?\a}, @samp{?\b},
-@samp{?\t}, @samp{?\n}, @samp{?\v}, @samp{?\f}, @samp{?\r}, @samp{?\e},
-respectively. Thus,
+@cindex space
+@cindex @samp{\s}
+ You can express the characters control-g, backspace, tab, newline,
+vertical tab, formfeed, space, return, del, and escape as @samp{?\a},
+@samp{?\b}, @samp{?\t}, @samp{?\n}, @samp{?\v}, @samp{?\f},
+@samp{?\s}, @samp{?\r}, @samp{?\d}, and @samp{?\e}, respectively.
+Thus,
@example
-?\a @result{} 7 ; @r{@kbd{C-g}}
+?\a @result{} 7 ; @r{control-g, @kbd{C-g}}
?\b @result{} 8 ; @r{backspace, @key{BS}, @kbd{C-h}}
?\t @result{} 9 ; @r{tab, @key{TAB}, @kbd{C-i}}
?\n @result{} 10 ; @r{newline, @kbd{C-j}}
?\v @result{} 11 ; @r{vertical tab, @kbd{C-k}}
?\f @result{} 12 ; @r{formfeed character, @kbd{C-l}}
?\r @result{} 13 ; @r{carriage return, @key{RET}, @kbd{C-m}}
?\e @result{} 27 ; @r{escape character, @key{ESC}, @kbd{C-[}}
+?\s @result{} 32 ; @r{space character, @key{SPC}}
?\\ @result{} 92 ; @r{backslash character, @kbd{\}}
?\d @result{} 127 ; @r{delete character, @key{DEL}}
@end example
@cindex escape sequence
These sequences which start with backslash are also known as
-@dfn{escape sequences}, because backslash plays the role of an escape
-character; this usage has nothing to do with the character @key{ESC}.
+@dfn{escape sequences}, because backslash plays the role of an
+``escape character''; this terminology has nothing to do with the
+character @key{ESC}. @samp{\s} is meant for use only in character
+constants; in string constants, just write the space.
@cindex control characters
Control characters may be represented using yet another read syntax.
@end example
In strings and buffers, the only control characters allowed are those
-that exist in @sc{ascii}; but for keyboard input purposes, you can turn
+that exist in @acronym{ASCII}; but for keyboard input purposes, you can turn
any character into a control character with @samp{C-}. The character
-codes for these non-@sc{ascii} control characters include the
+codes for these non-@acronym{ASCII} control characters include the
@tex
@math{2^{26}}
@end tex
2**26
@end ifnottex
bit as well as the code for the corresponding non-control
-character. Ordinary terminals have no way of generating non-@sc{ascii}
+character. Ordinary terminals have no way of generating non-@acronym{ASCII}
control characters, but you can generate them straightforwardly using X
and other window systems.
@ifnottex
2**27
@end ifnottex
-bit set (which on most machines makes it a negative number). We
-use high bits for this and other modifiers to make possible a wide range
-of basic character codes.
+bit set. We use high bits for this and other modifiers to make
+possible a wide range of basic character codes.
In a string, the
@tex
@ifnottex
2**7
@end ifnottex
-bit attached to an @sc{ascii} character indicates a meta character; thus, the
-meta characters that can fit in a string have codes in the range from
-128 to 255, and are the meta versions of the ordinary @sc{ascii}
-characters. (In Emacs versions 18 and older, this convention was used
-for characters outside of strings as well.)
+bit attached to an @acronym{ASCII} character indicates a meta
+character; thus, the meta characters that can fit in a string have
+codes in the range from 128 to 255, and are the meta versions of the
+ordinary @acronym{ASCII} characters. (In Emacs versions 18 and older,
+this convention was used for characters outside of strings as well.)
The read syntax for meta characters uses @samp{\M-}. For example,
@samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
@samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
The case of a graphic character is indicated by its character code;
-for example, @sc{ascii} distinguishes between the characters @samp{a}
-and @samp{A}. But @sc{ascii} has no way to represent whether a control
+for example, @acronym{ASCII} distinguishes between the characters @samp{a}
+and @samp{A}. But @acronym{ASCII} has no way to represent whether a control
character is upper case or lower case. Emacs uses the
@tex
@math{2^{25}}
character. This distinction is possible only when you use X terminals
or other special terminals; ordinary terminals do not report the
distinction to the computer in any way. The Lisp syntax for
-the shift bit is @samp{\S-}; thus, @samp{?\C-\S-o} or @samp{?\C-\S-O}
+the shift bit is @samp{\S-}; thus, @samp{?\C-\S-o} or @samp{?\C-\S-O}
represents the shifted-control-o character.
@cindex hyper characters
@cindex super characters
@cindex alt characters
- The X Window System defines three other modifier bits that can be set
+ The X Window System defines three other @anchor{modifier bits}
+modifier bits that can be set
in a character: @dfn{hyper}, @dfn{super} and @dfn{alt}. The syntaxes
for these bits are @samp{\H-}, @samp{\s-} and @samp{\A-}. (Case is
significant in these prefixes.) Thus, @samp{?\H-\M-\A-x} represents
-@kbd{Alt-Hyper-Meta-x}.
+@kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-}
+represents the space character.)
@tex
-Numerically, the
-bit values are @math{2^{22}} for alt, @math{2^{23}} for super and @math{2^{24}} for hyper.
+Numerically, the bit values are @math{2^{22}} for alt, @math{2^{23}}
+for super and @math{2^{24}} for hyper.
@end tex
@ifnottex
Numerically, the
mark followed by a backslash and the octal character code (up to three
octal digits); thus, @samp{?\101} for the character @kbd{A},
@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}. Although this syntax can represent any @sc{ascii}
+character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII}
character, it is preferred only when the precise octal value is more
-important than the @sc{ascii} representation.
+important than the @acronym{ASCII} representation.
@example
@group
There is no reason to add a backslash before most characters. However,
you should add a backslash before any of the characters
@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
-Lisp code. Also add a backslash before whitespace characters such as
+Lisp code. You can also add a backslash before whitespace characters such as
space, tab, newline and formfeed. However, it is cleaner to use one of
-the easily readable escape sequences, such as @samp{\t}, instead of an
-actual whitespace character such as a tab.
+the easily readable escape sequences, such as @samp{\t} or @samp{\s},
+instead of an actual whitespace character such as a tab or a space.
+(If you do write backslash followed by a space, you should write
+an extra space after the character constant to separate it from the
+following text.)
@node Symbol Type
@subsection Symbol Type
Here are several examples of symbol names. Note that the @samp{+} in
the fifth example is escaped to prevent it from being read as a number.
-This is not necessary in the sixth example because the rest of the name
+This is not necessary in the seventh example because the rest of the name
makes it invalid as a number.
@example
@end group
@end example
+@ifinfo
+@c This uses ``colon'' instead of a literal `:' because Info cannot
+@c cope with a `:' in a menu.
+@cindex @samp{#@var{colon}} read syntax
+@end ifinfo
+@ifnotinfo
@cindex @samp{#:} read syntax
+@end ifnotinfo
Normally the Lisp reader interns all symbols (@pxref{Creating
Symbols}). To prevent interning, you can write @samp{#:} before the
name of the symbol.
@code{(@var{a} .@: @var{b})} stands for a cons cell whose @sc{car} is
the object @var{a}, and whose @sc{cdr} is the object @var{b}. Dotted
pair notation is therefore more general than list syntax. In the dotted
-pair notation, the list @samp{(1 2 3)} is written as @samp{(1 . (2 . (3
+pair notation, the list @samp{(1 2 3)} is written as @samp{(1 . (2 . (3
. nil)))}. For @code{nil}-terminated lists, you can use either
notation, but list notation is usually clearer and more convenient.
When printing a list, the dotted pair notation is only used if the
@example
(setq alist-of-colors
- '((rose . red) (lily . white) (buttercup . yellow)))
+ '((rose . red) (lily . white) (buttercup . yellow)))
@end example
@noindent
in documentation strings,
but the newline is \
ignored if escaped."
- @result{} "It is useful to include newlines
-in documentation strings,
+ @result{} "It is useful to include newlines
+in documentation strings,
but the newline is ignored if escaped."
@end example
@node Non-ASCII in Strings
-@subsubsection Non-@sc{ascii} Characters in Strings
+@subsubsection Non-@acronym{ASCII} Characters in Strings
- You can include a non-@sc{ascii} international character in a string
+ You can include a non-@acronym{ASCII} international character in a string
constant by writing it literally. There are two text representations
-for non-@sc{ascii} characters in Emacs strings (and in buffers): unibyte
+for non-@acronym{ASCII} characters in Emacs strings (and in buffers): unibyte
and multibyte. If the string constant is read from a multibyte source,
such as a multibyte buffer or string, or a file that would be visited as
multibyte, then the character is read as a multibyte character, and that
unibyte source, then the character is read as unibyte and that makes the
string unibyte.
- You can also represent a multibyte non-@sc{ascii} character with its
+ You can also represent a multibyte non-@acronym{ASCII} character with its
character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many
-digits as necessary. (Multibyte non-@sc{ascii} character codes are all
+digits as necessary. (Multibyte non-@acronym{ASCII} character codes are all
greater than 256.) Any character which is not a valid hex digit
terminates this construct. If the next character in the string could be
interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to
constant is just like backslash-newline; it does not contribute any
character to the string, but it does terminate the preceding hex escape.
- Using a multibyte hex escape forces the string to multibyte. You can
-represent a unibyte non-@sc{ascii} character with its character code,
-which must be in the range from 128 (0200 octal) to 255 (0377 octal).
-This forces a unibyte string.
-
+ You can represent a unibyte non-@acronym{ASCII} character with its
+character code, which must be in the range from 128 (0200 octal) to
+255 (0377 octal). If you write all such character codes in octal and
+the string contains no other characters forcing it to be multibyte,
+this produces a unibyte string. However, using any hex escape in a
+string (even for an @acronym{ASCII} character) forces the string to be
+multibyte.
+
@xref{Text Representations}, for more information about the two
text representations.
However, not all of the characters you can write with backslash
escape-sequences are valid in strings. The only control characters that
-a string can hold are the @sc{ascii} control characters. Strings do not
-distinguish case in @sc{ascii} control characters.
+a string can hold are the @acronym{ASCII} control characters. Strings do not
+distinguish case in @acronym{ASCII} control characters.
Properly speaking, strings cannot hold meta characters; but when a
string is to be used as a key sequence, there is a special convention
-that provides a way to represent meta versions of @sc{ascii} characters in a
-string. If you use the @samp{\M-} syntax to indicate a meta character
-in a string constant, this sets the
+that provides a way to represent meta versions of @acronym{ASCII}
+characters in a string. If you use the @samp{\M-} syntax to indicate
+a meta character in a string constant, this sets the
@tex
@math{2^{7}}
@end tex
Character category tables (@pxref{Categories}).
@item
-Display Tables (@pxref{Display Tables}).
+Display tables (@pxref{Display Tables}).
@item
Syntax tables (@pxref{Syntax Tables}).
constant that follows actually specifies the contents of the bool-vector
as a bitmap---each ``character'' in the string contains 8 bits, which
specify the next 8 elements of the bool-vector (1 stands for @code{t},
-and 0 for @code{nil}). The least significant bits of the character
-correspond to the lowest indices in the bool-vector. If the length is not a
-multiple of 8, the printed representation shows extra elements, but
-these extras really make no difference.
+and 0 for @code{nil}). The least significant bits of the character
+correspond to the lowest indices in the bool-vector.
@example
(make-bool-vector 3 t)
- @result{} #&3"\007"
+ @result{} #&3"^G"
(make-bool-vector 3 nil)
- @result{} #&3"\0"
-;; @r{These are equal since only the first 3 bits are used.}
+ @result{} #&3"^@@"
+@end example
+
+@noindent
+These results make sense, because the binary code for @samp{C-g} is
+111 and @samp{C-@@} is the character with code 0.
+
+ If the length is not a multiple of 8, the printed representation
+shows extra elements, but these extras really make no difference. For
+instance, in the next example, the two bool-vectors are equal, because
+only the first 3 bits are used:
+
+@example
(equal #&3"\377" #&3"\007")
@result{} t
@end example
@cindex @samp{#@var{n}=} read syntax
@cindex @samp{#@var{n}#} read syntax
- In Emacs 21, to represent shared or circular structure within a
+ In Emacs 21, to represent shared or circular structures within a
complex of Lisp objects, you can use the reader constructs
@samp{#@var{n}=} and @samp{#@var{n}#}.
@end example
Comparison of strings is case-sensitive, but does not take account of
-text properties---it compares only the characters in the strings.
-A unibyte string never equals a multibyte string unless the
-contents are entirely @sc{ascii} (@pxref{Text Representations}).
+text properties---it compares only the characters in the strings. For
+technical reasons, a unibyte string and a multibyte string are
+@code{equal} if and only if they contain the same sequence of
+character codes and all these codes are either in the range 0 through
+127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
+@xref{Text Representations}.
@example
@group
Because of this recursive method, circular lists may therefore cause
infinite recursion (leading to an error).
+
+@ignore
+ arch-tag: 9711a66e-4749-4265-9e8c-972d55b67096
+@end ignore