@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
+@c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
* Translation of Characters:: Translation tables are used for conversion.
* Coding Systems:: Coding systems are conversions for saving files.
* Input Methods:: Input methods allow users to enter various
- non-ASCII characters without speciak keyboards.
+ non-ASCII characters without special keyboards.
* Locales:: Interacting with the POSIX locale.
@end menu
@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
-@var{string} is a unibyte string, it is returned unchanged.
-Multibyte character codes are converted to unibyte
-by using just the low 8 bits.
+@var{string} is a unibyte string, it is returned unchanged. Multibyte
+character codes are converted to unibyte according to
+@code{nonascii-translation-table} or, if that is @code{nil}, using
+@code{nonascii-insert-offset}. If the lookup in the translation table
+fails, this function takes just the low 8 bits of each character.
@end defun
@defun string-make-multibyte string
each unibyte character to a multibyte character.
@end defun
+@defun string-to-multibyte string
+This function returns a multibyte string containing the same sequence
+of character codes as @var{string}. If @var{string} is a multibyte
+string, the value is equal to @var{string}.
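+
+For example, the results below assume a string containing only ASCII
+characters. A string literal of that kind is unibyte, while the value
+of @code{string-to-multibyte} is multibyte but compares equal to it:
+
+@example
+;; An ASCII-only string literal is unibyte.
+(multibyte-string-p "abc")
+     @result{} nil
+(multibyte-string-p (string-to-multibyte "abc"))
+     @result{} t
+;; The character codes are unchanged.
+(string= "abc" (string-to-multibyte "abc"))
+     @result{} t
+@end example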
+@end defun
+
@node Selecting a Representation
@section Selecting a Representation
If @var{string} is already a unibyte string, then the value is
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is multibyte, any characters it
-contains of charset @var{eight-bit-control} or @var{eight-bit-graphic}
+contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
are converted to the corresponding single byte.
@end defun
@var{string} itself. Otherwise it is a newly created string, with no
text properties. If @var{string} is unibyte and contains any individual
8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
-the corresponding multibyte character of charset @var{eight-bit-control}
-or @var{eight-bit-graphic}.
+the corresponding multibyte character of charset @code{eight-bit-control}
+or @code{eight-bit-graphic}.
@end defun
@node Character Codes
@result{} t
@end example
-If the optional argument @var{genericp} is non-nil, this function
-returns @code{t} if @var{charcode} is a generic character
+If the optional argument @var{genericp} is non-@code{nil}, this
+function returns @code{t} if @var{charcode} is a generic character
(@pxref{Splitting Characters}).
@end defun
This function returns the charset property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the same
as the property list of that symbol. Charset properties are used for
-special purposes within Emacs; for example,
-@code{preferred-coding-system} helps determine which coding system to
-use to encode characters in a charset.
+special purposes within Emacs.
@end defun
@node Chars and Bytes
(make-char 'latin-iso8859-1 72)
@result{} 2248
@end example
+
+In fact, the eighth bit of both @var{code1} and @var{code2} is zeroed
+before they are used to index @var{charset}. Thus you may use, for
+instance, an ISO 8859 character code directly, instead of subtracting
+128 from it as would otherwise be needed to index the corresponding
+Emacs charset.
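+
+For instance, if the eighth bit is masked as described above, the
+following two calls should produce the same character. The code 200
+is simply 72 with the eighth bit set:
+
+@example
+;; 200 = 72 + 128, so only the low seven bits differ from 72.
+(eq (make-char 'latin-iso8859-1 200)
+    (make-char 'latin-iso8859-1 72))
+     @result{} t
+@end example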
@end defun
@cindex generic characters
coding systems that don't specify any other translation table.
@end defvar
+@defvar translation-table-for-input
+Self-inserting characters are translated through this translation
+table before they are inserted.
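+
+As an illustration, the following sketch makes typing @kbd{a} insert
+@samp{b} in the current buffer. It builds the table with
+@code{make-translation-table} (@pxref{Translation of Characters}) and
+binds the variable buffer-locally, so other buffers are unaffected:
+
+@example
+;; Translate a self-inserted `a' into `b', in this buffer only.
+(set (make-local-variable 'translation-table-for-input)
+     (make-translation-table '((?a . ?b))))
+@end example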
+@end defvar
+
@node Coding Systems
@section Coding Systems
@defvar buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer. It is used for saving the buffer, and for writing part
-of the buffer with @code{write-region}. When those operations ask the
-user to specify a different coding system,
-@code{buffer-file-coding-system} is updated to the coding system
-specified.
-
-However, @code{buffer-file-coding-system} does not affect sending text
+of the buffer with @code{write-region}. If the text to be written
+cannot be safely encoded using the coding system specified by this
+variable, these operations select an alternative encoding by calling
+the function @code{select-safe-coding-system} (@pxref{User-Chosen
+Coding Systems}). If selecting a different encoding requires asking
+the user to specify a coding system, @code{buffer-file-coding-system}
+is updated to the newly selected coding system.
+
+@code{buffer-file-coding-system} does @emph{not} affect sending text
to a subprocess.
@end defvar
@code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
and that coding system cannot handle
the actual text in the buffer, the command asks the user to choose
-another coding system. After that happens, the command also updates
-@code{buffer-file-coding-system} to represent the coding system that the
-user specified.
+another coding system (by calling @code{select-safe-coding-system}).
+After that happens, the command also updates
+@code{buffer-file-coding-system} to represent the coding system that
+the user specified.
@end defvar
@defvar last-coding-system-used
@node User-Chosen Coding Systems
@subsection User-Chosen Coding Systems
-@defun select-safe-coding-system from to &optional preferred-coding-system
-This function selects a coding system for encoding the text between
-@var{from} and @var{to}, asking the user to choose if necessary.
-
-The optional argument @var{preferred-coding-system} specifies a coding
-system to try first. If that one can handle the text in the specified
-region, then it is used. If this argument is omitted, the current
-buffer's value of @code{buffer-file-coding-system} is tried first.
-
-If the region contains some multibyte characters that the preferred
-coding system cannot encode, this function asks the user to choose from
-a list of coding systems which can encode the text, and returns the
-user's choice.
-
-One other kludgy feature: if @var{from} is a string, the string is the
-target text, and @var{to} is ignored.
+@cindex select safe coding system
+@defun select-safe-coding-system from to &optional default-coding-system accept-default-p
+This function selects a coding system for encoding specified text,
+asking the user to choose if necessary. Normally the specified text
+is the text in the current buffer between @var{from} and @var{to},
+defaulting to the whole buffer if they are @code{nil}. If @var{from}
+is a string, the string specifies the text to encode, and @var{to} is
+ignored.
+
+If @var{default-coding-system} is non-@code{nil}, that is the first
+coding system to try; if that can handle the text,
+@code{select-safe-coding-system} returns that coding system. It can
+also be a list of coding systems; then the function tries each of them
+one by one. After trying all of them, it next tries the user's most
+preferred coding system (@pxref{Recognize Coding,
+prefer-coding-system, the description of @code{prefer-coding-system},
+emacs, GNU Emacs Manual}), and after that the current buffer's value
+of @code{buffer-file-coding-system} (if it is not @code{undecided}).
+
+If one of those coding systems can safely encode all the specified
+text, @code{select-safe-coding-system} chooses it and returns it.
+Otherwise, it asks the user to choose from a list of coding systems
+which can encode all the text, and returns the user's choice.
+
+The optional argument @var{accept-default-p}, if non-@code{nil},
+should be a function to determine whether the coding system selected
+without user interaction is acceptable. If this function returns
+@code{nil}, the silently selected coding system is rejected, and the
+user is asked to select a coding system from a list of possible
+candidates.
+
+@vindex select-safe-coding-system-accept-default-p
+If the variable @code{select-safe-coding-system-accept-default-p} is
+non-@code{nil}, its value overrides the value of
+@var{accept-default-p}.
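+
+For example, a command that writes the whole buffer to a file might
+choose the encoding as shown below. This is only a sketch. The file
+name @file{/tmp/out.txt} and the list of candidate coding systems are
+illustrative choices:
+
+@example
+;; Passing nil for FROM and TO means the whole buffer.  The user
+;; is asked only if no candidate can encode all of the text.
+(let ((coding-system-for-write
+       (select-safe-coding-system nil nil '(iso-latin-1 utf-8))))
+  (write-region nil nil "/tmp/out.txt"))
+@end example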
@end defun
Here are two functions you can use to let the user specify a coding
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}).
+@defvar auto-coding-regexp-alist
+This variable is an alist of text patterns and corresponding coding
+systems. Each element has the form @code{(@var{regexp}
+. @var{coding-system})}; a file whose first few kilobytes match
+@var{regexp} is decoded with @var{coding-system} when its contents are
+read into a buffer. The settings in this alist take priority over
+@code{coding:} tags in the files and the contents of
+@code{file-coding-system-alist} (see below). The default value is set
+so that Emacs automatically recognizes mail files in Babyl format and
+reads them with no code conversions.
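+
+For instance, to read files that begin with a particular marker as
+Latin-1, you could add an element like the one below. The marker
+string @samp{MY-LATIN-1-DATA} is purely hypothetical:
+
+@example
+;; Match the marker only at the very beginning of the file.
+(add-to-list 'auto-coding-regexp-alist
+             '("\\`MY-LATIN-1-DATA" . iso-latin-1))
+@end example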
+@end defvar
+
@defvar file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files. Each element has the form
the subprocess, and @var{output-coding} applies to output to it.
@end defvar
+@defvar auto-coding-functions
+This variable holds a list of functions that try to determine a
+coding system for a file based on its undecoded contents.
+
+Each function in this list should be written to look at text in the
+current buffer, but should not modify it in any way. The buffer will
+contain undecoded text of parts of the file. Each function should
+take one argument, @var{size}, which tells it how many characters to
+look at, starting from point. If the function succeeds in determining
+a coding system for the file, it should return that coding system.
+Otherwise, it should return @code{nil}.
+
+If a file has a @samp{coding:} tag, that takes precedence, so these
+functions won't be called.
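+
+Here is a sketch of such a function. It recognizes an XML declaration
+that names UTF-8 as the encoding. The function name
+@code{my-xml-utf-8-coding-function} is hypothetical, and real XML
+detection would need to handle more cases:
+
+@example
+(defun my-xml-utf-8-coding-function (size)
+  "Return `utf-8' if the text at point is a UTF-8 XML declaration.
+Look at no more than SIZE characters, and do not modify the buffer."
+  (save-excursion
+    (let ((case-fold-search t))
+      ;; \\= anchors the match at point; the bound limits the scan.
+      (when (re-search-forward
+             "\\=<\\?xml[^>]*encoding=[\"']utf-8[\"']"
+             (min (point-max) (+ (point) size)) t)
+        'utf-8))))
+
+(add-to-list 'auto-coding-functions 'my-xml-utf-8-coding-function)
+@end example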
+@end defvar
+
@defun find-operation-coding-system operation &rest arguments
This function returns the coding system to use (by default) for
performing @var{operation} with @var{arguments}. The value has this
string is acceptable.
@end defun
+@defun decode-coding-inserted-region from to filename &optional visit beg end replace
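+This function decodes the text from @var{from} to @var{to} as if it
+were being read from file @var{filename} using
+@code{insert-file-contents}. The optional arguments @var{visit},
+@var{beg}, @var{end} and @var{replace} have the same meanings as the
+corresponding arguments of @code{insert-file-contents}.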
+
+The normal way to use this function is after reading text from a file
+without decoding, if you decide you would rather have decoded it.
+Instead of deleting the text and reading it again, this time with
+decoding, you can call this function.
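+
+For example, the following sketch reads a file without any decoding
+and then decodes it in place. The file name @file{/tmp/data.txt} is
+only an illustration:
+
+@example
+(with-temp-buffer
+  ;; Insert the raw bytes of the file, with no code conversion.
+  (let ((coding-system-for-read 'no-conversion))
+    (insert-file-contents "/tmp/data.txt"))
+  ;; Decode them as insert-file-contents normally would.
+  (decode-coding-inserted-region (point-min) (point-max)
+                                 "/tmp/data.txt")
+  (buffer-string))
+@end example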
+@end defun
+
@node Terminal I/O Encoding
@subsection Terminal I/O Encoding
@defvar locale-coding-system
@tindex locale-coding-system
+@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
-error messages, for encoding the format argument to
-@code{format-time-string}, and for decoding the return value of
-@code{format-time-string}.
+error messages and, on the X Window System only, keyboard input; for
+encoding the format argument to @code{format-time-string}; and for
+decoding the return value of @code{format-time-string}.
@end defvar
@defvar system-messages-locale
locale is specified by environment variables in the usual POSIX fashion.
@end defvar
+@defun locale-info item
+This function returns locale data @var{item} for the current POSIX
+locale, if available. @var{item} should be one of these symbols:
+
+@table @code
+@item codeset
+Return the character set as a string (locale item @code{CODESET}).
+
+@item days
+Return a 7-element vector of day names (locale items
+@code{DAY_1} through @code{DAY_7}).
+
+@item months
+Return a 12-element vector of month names (locale items @code{MON_1}
+through @code{MON_12}).
+
+@item paper
+Return a list @code{(@var{width} @var{height})} for the default paper
+size measured in millimeters (locale items @code{PAPER_WIDTH} and
+@code{PAPER_HEIGHT}).
+@end table
+
+If the system can't provide the requested information, or if
+@var{item} is not one of those symbols, the value is @code{nil}. All
+strings in the return value are decoded using
+@code{locale-coding-system}. @xref{Locales,,, libc, GNU Libc Manual},
+for more information about locales and locale items.
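+
+For example, on a system whose current locale uses UTF-8 and English
+names, you might see results like the ones below. The actual values
+depend entirely on the locale settings, and either call can yield
+@code{nil} where the information is unavailable:
+
+@example
+(locale-info 'codeset)
+     @result{} "UTF-8"
+;; MON_1 is the first month of the year.
+(aref (locale-info 'months) 0)
+     @result{} "January"
+@end example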
+@end defun
+
+@ignore
+ arch-tag: be705bf8-941b-4c35-84fc-ad7d20ddb7cb
+@end ignore