@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998, 1999, 2002, 2003, 2004,
-@c 2005, 2006 Free Software Foundation, Inc.
+@c Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004,
+@c 2005, 2006, 2007 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-@acronym{ASCII} Characters
@cindex multibyte characters
+@cindex characters, multi-byte
@cindex non-@acronym{ASCII} characters
This chapter covers the special issues relating to non-@acronym{ASCII}
@end defvar
@defun position-bytes position
-@tindex position-bytes
Return the byte-position corresponding to buffer position
@var{position} in the current buffer. This is 1 at the start of the
buffer, and counts upward in bytes. If @var{position} is out of
@end defun
@defun byte-to-position byte-position
-@tindex byte-to-position
Return the buffer position corresponding to byte-position
@var{byte-position} in the current buffer. If @var{byte-position} is
out of range, the value is @code{nil}.
Return @code{t} if @var{string} is a multibyte string.
@end defun
+@defun string-bytes string
+@cindex string, number of bytes
+This function returns the number of bytes in @var{string}.
+If @var{string} is a multibyte string, this can be greater than
+@code{(length @var{string})}.
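+
+As a minimal illustration, assuming a string that contains only
+@acronym{ASCII} characters (each of which occupies one byte), the
+byte count equals the length:
+
+@example
+(string-bytes "abc")
+     @result{} 3
+(length "abc")
+     @result{} 3
+@end example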
+@end defun
+
@node Converting Representations
@section Converting Text Representations
@end defun
@defun charset-plist charset
-@tindex charset-plist
This function returns the charset property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the same
as the property list of that symbol. Charset properties are used for
@section Characters and Bytes
@cindex bytes and characters
-@cindex introduction sequence
+@cindex introduction sequence (of character)
@cindex dimension (of character set)
In multibyte representation, each character occupies one or more
bytes. Each character set has an @dfn{introduction sequence}, which is
@end defun
@defun charset-bytes charset
-@tindex charset-bytes
This function returns the number of bytes used to represent a character
in character set @var{charset}.
@end defun
@node Splitting Characters
@section Splitting Characters
+@cindex character as bytes
The functions in this section convert between characters and the byte
values used to represent them. For most purposes, there is no need to
@end example
@end defun
+@cindex generate characters in charsets
@defun make-char charset &optional code1 code2
This function returns the character in character set @var{charset} whose
position codes are @var{code1} and @var{code2}. This is roughly the
@code{iso-latin-2} and decode the result with the same coding system,
you'll get Latin-2 characters.
-@cindex end of line conversion
+@cindex EOL conversion
+@cindex end-of-line conversion
+@cindex line end conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
convention is to use the linefeed character (also called newline). The
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
-@cindex EOL conversion
-@cindex end-of-line conversion
-@cindex line end conversion
@defun coding-system-eol-type coding-system
This function returns the type of end-of-line (a.k.a.@: @dfn{eol})
conversion used by @var{coding-system}. If @var{coding-system}
return value is just one coding system, the one that is highest in
priority.
-If the region contains only @acronym{ASCII} characters, the value
-is @code{undecided} or @code{(undecided)}, or a variant specifying
+If the region contains only @acronym{ASCII} characters, except for
+ISO-2022 control characters such as @code{ESC}, the value is
+@code{undecided} or @code{(undecided)}, or a variant specifying
end-of-line conversion, if that can be deduced from the text.
@end defun
@var{encoding-system} is the coding system for encoding (in case
@var{operation} does encoding).
-The argument @var{operation} should be a symbol, any one of
-@code{insert-file-contents}, @code{write-region},
+The argument @var{operation} is a symbol, one of @code{write-region},
@code{start-process}, @code{call-process}, @code{call-process-region},
-or @code{open-network-stream}. These are the names of the Emacs I/O
-primitives that can do coding system conversion.
+@code{insert-file-contents}, or @code{open-network-stream}. These are
+the names of the Emacs I/O primitives that can do character code and
+eol conversion.
The remaining arguments should be the same arguments that might be given
-to that I/O primitive. Depending on the primitive, one of those
-arguments is selected as the @dfn{target}. For example, if
+to the corresponding I/O primitive. Depending on the primitive, one
+of those arguments is selected as the @dfn{target}. For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target. For subprocess primitives, the process name is the
target. For @code{open-network-stream}, the target is the service name
Depending on @var{operation}, this function looks up the target in
@code{file-coding-system-alist}, @code{process-coding-system-alist},
-or @code{network-coding-system-alist}.
+or @code{network-coding-system-alist}. If the target is found in the
+alist, @code{find-operation-coding-system} returns its association in
+the alist; otherwise it returns @code{nil}.
+
+If @var{operation} is @code{insert-file-contents}, the argument
+corresponding to the target may be a cons cell of the form
+@code{(@var{filename} . @var{buffer})}. In that case, @var{filename}
+is a file name to look up in @code{file-coding-system-alist}, and
+@var{buffer} is a buffer that contains the file's contents (not yet
+decoded). If @code{file-coding-system-alist} specifies a function to
+call for this file, and that function needs to examine the file's
+contents (as it usually does), it should examine the contents of
+@var{buffer} instead of reading the file.
@end defun
@node Specifying Coding Systems
(insert-file-contents filename))
@end example
-When its value is non-@code{nil}, @code{coding-system-for-read} takes
-precedence over all other methods of specifying a coding system to use for
-input, including @code{file-coding-system-alist},
+When its value is non-@code{nil}, this variable takes precedence over
+all other methods of specifying a coding system to use for input,
+including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar
@node Explicit Encoding
@subsection Explicit Encoding and Decoding
-@cindex encoding text
-@cindex decoding text
+@cindex encoding in coding systems
+@cindex decoding in coding systems
All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
@code{no-conversion}.
Here are the functions to perform explicit encoding or decoding. The
-decoding functions produce sequences of bytes; the encoding functions
+encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
discard text properties.
how Emacs interacts with these features.
@defvar locale-coding-system
-@tindex locale-coding-system
@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
error messages and---on X Window system only---keyboard input, for
@end defvar
@defvar system-messages-locale
-@tindex system-messages-locale
This variable specifies the locale to use for generating system error
messages. Changing the locale can cause messages to come out in a
different language or in a different orthography. If the variable is
@end defvar
@defvar system-time-locale
-@tindex system-time-locale
This variable specifies the locale to use for formatting time values.
Changing the locale can cause messages to appear according to the
conventions of a different language. If the variable is @code{nil}, the