@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-@sc{ascii} Characters
@cindex multibyte characters
@cindex non-@sc{ascii} characters

  This chapter covers the special issues relating to non-@sc{ascii}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::
* Converting Representations::
* Selecting a Representation::
* Character Codes::
* Character Sets::
* Chars and Bytes::
* Splitting Characters::
* Scanning Charsets::
* Translation of Characters::
* Coding Systems::
* Input Methods::
* Locales::     Interacting with the POSIX locale.
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

  Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer. These are called @dfn{unibyte} and
@dfn{multibyte}. Each string, and each buffer, uses one of these two
representations. For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate. Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
  In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255. Codes 0
through 127 are @sc{ascii} characters; the codes from 128 through 255
are used for one non-@sc{ascii} character set (you can choose which
character set by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
@cindex trailing codes
  In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
@dfn{leading codes}. The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are @dfn{trailing codes}.

  Some sequences of bytes do not form meaningful multibyte characters:
for example, a single isolated byte in the range 128 through 255 is
never meaningful. Such byte sequences are not entirely valid, and never
appear in proper multibyte text (since that consists of a sequence of
@emph{characters}); but they can appear as part of ``raw bytes''
(@pxref{Explicit Encoding}).

  In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined and recorded in the string
when the string is constructed.

@defvar enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

You cannot set this variable directly; instead, use the function
@code{set-buffer-multibyte} to change a buffer's representation.
@end defvar

@defvar default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value. Setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is not allowed,
but changing the default value is supported, and it is a reasonable
thing to do, because it has no effect on existing buffers.

The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar

@defun position-bytes position
@tindex position-bytes
Return the byte-position corresponding to buffer position @var{position}
in the current buffer.
@end defun

@defun byte-to-position byte-position
@tindex byte-to-position
Return the buffer position corresponding to byte-position
@var{byte-position} in the current buffer.
@end defun

@defun multibyte-string-p string
Return @code{t} if @var{string} is a multibyte string.
@end defun
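
  As a minimal sketch of how these pieces fit together, the following
function reports the representation of the current buffer and of a
given string. The name @code{my-describe-representation} is
hypothetical and is used only for illustration.

@example
;; Return a list of two symbols: the representation of the current
;; buffer, then the representation of STRING.
(defun my-describe-representation (string)
  (list (if enable-multibyte-characters 'multibyte 'unibyte)
        (if (multibyte-string-p string) 'multibyte 'unibyte)))
@end example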

@node Converting Representations
@section Converting Text Representations

  Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information. In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string. You can
also explicitly convert a string's contents to either representation.

  Emacs chooses the representation for a string based on the text that
it is constructed from. The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.

  When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer. In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text. The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot be overridden automatically.

  Converting unibyte text to multibyte text leaves @sc{ascii} characters
unchanged, and likewise 128 through 159. It converts the non-@sc{ascii}
codes 160 through 255 by adding the value @code{nonascii-insert-offset}
to each character code. By setting this variable, you specify which
character set the unibyte characters correspond to (@pxref{Character
Sets}). For example, if @code{nonascii-insert-offset} is 2048, which is
@code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte
non-@sc{ascii} characters correspond to Latin 1. If it is 2688, which
is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to
Greek letters.

  Converting multibyte text to unibyte is simpler: it discards all but
the low 8 bits of each character code. If @code{nonascii-insert-offset}
has a reasonable value, corresponding to the beginning of some character
set, this conversion is the inverse of the other: converting unibyte
text to multibyte and back to unibyte reproduces the original unibyte
text.
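
  The arithmetic can be checked by hand. The following sketch assumes
that @code{nonascii-insert-offset} is 2048, the Latin 1 value given
above, and takes the unibyte code 200 as an arbitrary illustration:

@example
;; Unibyte to multibyte: add the offset.
(+ 200 2048)
     @result{} 2248
;; Multibyte to unibyte: keep only the low 8 bits.
(logand 2248 255)
     @result{} 200
@end example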

@defvar nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ascii} character
when converting unibyte text to multibyte. It also applies when
@code{self-insert-command} inserts a character in the unibyte
non-@sc{ascii} range, 128 through 255. However, the function
@code{insert-char} does not perform this conversion.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs}) 128)}. If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar

@defvar nonascii-translation-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}. You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character. The value should be a vector, or @code{nil}.
If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
@end defvar

@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
@var{string} is a unibyte string, it is returned unchanged.
@end defun

@defun string-make-multibyte string
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result. If
@var{string} is a multibyte string, it is returned unchanged.
@end defun
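
  One way to test whether the current conversion settings round-trip a
particular string is sketched below. The function name
@code{my-unibyte-round-trip-p} is hypothetical; the sketch only
combines the two conversion functions described above.

@example
;; Return non-nil if converting STRING to unibyte, then to multibyte
;; and back to unibyte, reproduces the same unibyte text.
(defun my-unibyte-round-trip-p (string)
  (let ((unibyte (string-make-unibyte string)))
    (string= unibyte
             (string-make-unibyte
              (string-make-multibyte unibyte)))))
@end example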

@node Selecting a Representation
@section Selecting a Representation

  Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@defun set-buffer-multibyte multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes which is treated as one character
in multibyte representation will count as two characters in unibyte
representation.

This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including overlays, text properties and markers) so that they cover the
same text as they did before.

You cannot use @code{set-buffer-multibyte} on an indirect buffer,
because indirect buffers always inherit the representation of the
base buffer.
@end defun

@defun string-as-unibyte string
This function returns a string with the same bytes as @var{string} but
treating each byte as a character. This means that the value may have
more characters than @var{string} has.

If @var{string} is already a unibyte string, then the value is
@var{string} itself.
@end defun

@defun string-as-multibyte string
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character. This means that the
value may have fewer characters than @var{string} has.

If @var{string} is already a multibyte string, then the value is
@var{string} itself.
@end defun
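
  Since @code{string-as-unibyte} treats each byte as a character, the
length of its value is the number of bytes in the original string. The
following sketch uses that to compare the two counts; the name
@code{my-char-and-byte-counts} is hypothetical.

@example
;; Return a list (CHARS BYTES) for STRING.  For a multibyte string
;; containing non-ASCII characters, BYTES is larger than CHARS.
(defun my-char-and-byte-counts (string)
  (list (length string)
        (length (string-as-unibyte string))))
@end example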

@node Character Codes
@section Character Codes
@cindex character codes

  The unibyte and multibyte text representations use different character
codes. The valid character codes for unibyte representation range from
0 to 255---the values that can fit in one byte. The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid. In particular, the values 128 through
255 are not legitimate in multibyte text (though they can occur in ``raw
bytes''; @pxref{Explicit Encoding}). Only the @sc{ascii} codes 0
through 127 are fully legitimate in both representations.

@defun char-valid-p charcode
This returns @code{t} if @var{charcode} is valid for either one of the two
text representations.

@example
(char-valid-p 65)
     @result{} t
(char-valid-p 256)
     @result{} nil
(char-valid-p 2248)
     @result{} t
@end example
@end defun

@node Character Sets
@section Character Sets
@cindex character sets

  Emacs classifies characters into various @dfn{character sets}, each of
which has a name which is a symbol. Each character belongs to one and
only one character set.

  In general, there is one character set for each distinct script. For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another. An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, characters that would logically be grouped together are split
into several character sets. For example, one set of Chinese
characters, generally known as Big 5, is divided into two Emacs
character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

@defun charsetp object
Returns @code{t} if @var{object} is a symbol that names a character set,
@code{nil} otherwise.
@end defun

@defun charset-list
This function returns a list of all defined character set names.
@end defun

@defun char-charset character
This function returns the name of the character set that @var{character}
belongs to.
@end defun

@defun charset-plist charset
@tindex charset-plist
This function returns the charset property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the same
as the property list of that symbol. Charset properties are used for
special purposes within Emacs; for example, @code{x-charset-registry}
helps determine which fonts to use (@pxref{Font Selection}).
@end defun
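
  A brief illustration of these functions follows. It reuses the
character codes 65 and 2248 that appear in other examples in this
chapter, and it assumes the character set names shown there.

@example
(char-charset 65)
     @result{} ascii
(char-charset 2248)
     @result{} latin-iso8859-1
;; The name returned by char-charset names a character set.
(charsetp (char-charset 2248))
     @result{} t
@end example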

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

@cindex introduction sequence
@cindex dimension (of character set)
  In multibyte representation, each character occupies one or more
bytes. Each character set has an @dfn{introduction sequence}, which is
normally one or two bytes long. (Exception: the @sc{ascii} character
set has a zero-length introduction sequence.) The introduction sequence
is the beginning of the byte sequence for any character in the character
set. The rest of the character's bytes distinguish it from the other
characters in the same character set. Depending on the character set,
there are either one or two distinguishing bytes; the number of such
bytes is called the @dfn{dimension} of the character set.

@defun charset-dimension charset
This function returns the dimension of @var{charset}; at present, the
dimension is always 1 or 2.
@end defun

@defun charset-bytes charset
@tindex charset-bytes
This function returns the number of bytes used to represent a character
in character set @var{charset}.
@end defun

  This is the simplest way to determine the byte length of a character
set's introduction sequence:

@example
(- (charset-bytes @var{charset})
   (charset-dimension @var{charset}))
@end example

@node Splitting Characters
@section Splitting Characters

  The functions in this section convert between characters and the byte
values used to represent them. For most purposes, there is no need to
be concerned with the sequence of bytes used to represent a character,
because Emacs translates automatically when necessary.

@defun split-char character
Return a list containing the name of the character set of
@var{character}, followed by one or two byte values (integers) which
identify @var{character} within that character set. The number of byte
values is the character set's dimension.

@example
(split-char 2248)
     @result{} (latin-iso8859-1 72)
(split-char 65)
     @result{} (ascii 65)
@end example

Unibyte non-@sc{ascii} characters are considered as part of
the @code{ascii} character set:

@example
(split-char 192)
     @result{} (ascii 192)
@end example
@end defun

@defun make-char charset &rest byte-values
This function returns the character in character set @var{charset}
identified by @var{byte-values}. This is roughly the inverse of
@code{split-char}. Normally, you should specify either one or two
@var{byte-values}, according to the dimension of @var{charset}. For
example,

@example
(make-char 'latin-iso8859-1 72)
     @result{} 2248
@end example
@end defun

@cindex generic characters
  If you call @code{make-char} with no @var{byte-values}, the result is
a @dfn{generic character} which stands for @var{charset}. A generic
character is an integer, but it is @emph{not} valid for insertion in the
buffer as a character. It can be used in @code{char-table-range} to
refer to the whole character set (@pxref{Char-Tables}).
@code{char-valid-p} returns @code{nil} for generic characters.
For example:

@example
(make-char 'latin-iso8859-1)
     @result{} 2176
(char-valid-p 2176)
     @result{} nil
(split-char 2176)
     @result{} (latin-iso8859-1 0)
@end example
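
  Because @code{make-char} is roughly the inverse of
@code{split-char}, applying one to the result of the other gives back
the original character for an ordinary character such as 2248. A short
sketch, reusing the values from the examples above:

@example
;; split-char gives (latin-iso8859-1 72); apply passes those
;; elements to make-char as its arguments.
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example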

@node Scanning Charsets
@section Scanning for Character Sets

  Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string. One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@defun find-charset-region beg end &optional translation
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.

The optional argument @var{translation} specifies a translation table to
be used in scanning the text (@pxref{Translation of Characters}). If it
is non-@code{nil}, then each character in the region is translated
through this table, and the value returned describes the translated
characters instead of the characters actually in the buffer.

In two peculiar cases, the value includes the symbol @code{unknown}:

@itemize @bullet
@item
When a unibyte buffer contains non-@sc{ascii} characters.

@item
When a multibyte buffer contains invalid byte-sequences (raw bytes).
@xref{Explicit Encoding}.
@end itemize
@end defun

@defun find-charset-string string &optional translation
This function returns a list of the character sets that appear in the
string @var{string}. It is just like @code{find-charset-region}, except
that it applies to the contents of @var{string} instead of part of the
current buffer.
@end defun
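
  For instance, a program might want to know which character sets other
than @code{ascii} occur in a region. The following is only a sketch;
the name @code{my-region-non-ascii-charsets} is hypothetical, and the
result may include the symbol @code{unknown} in the cases listed above.

@example
;; Return the character sets found between BEG and END, leaving out
;; `ascii'.  A nil value means nothing but ASCII was found.
(defun my-region-non-ascii-charsets (beg end)
  (delq 'ascii (find-charset-region beg end)))
@end example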

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

  A @dfn{translation table} specifies a mapping of characters
into characters. These tables are used in encoding and decoding, and
for other purposes. Some coding systems specify their own particular
translation tables; there are also default translation tables which
apply to all other coding systems.

@defun make-translation-table &rest translations
This function returns a translation table based on the argument
@var{translations}. Each element of @var{translations} should be a
list of elements of the form @code{(@var{from} . @var{to})}; each such
element says to translate the character @var{from} into @var{to}.

You can also map one whole character set into another character set with
the same dimension. To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, @var{to} should also be a generic character, for another
character set of the same dimension. Then the translation table
translates each character of @var{from}'s character set into the
corresponding character of @var{to}'s character set.
@end defun
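
  As an illustration, the following sketch builds a table that maps two
lower-case letters to upper case and then passes it as the
@var{translation} argument of @code{find-charset-string}
(@pxref{Scanning Charsets}). The variable name
@code{my-upcase-ab-table} is hypothetical, and the sketch assumes that
each argument is a list of @code{(@var{from} . @var{to})} pairs.

@example
;; One argument: a list of (FROM . TO) pairs.
(defvar my-upcase-ab-table
  (make-translation-table '((?a . ?A) (?b . ?B))))

;; Scan "ab" as if it had been translated through the table.
(find-charset-string "ab" my-upcase-ab-table)
@end example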

  In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding. If a coding system has
property @code{character-translation-table-for-decode}, that specifies
the translation table to use. Otherwise, if
@code{standard-translation-table-for-decode} is non-@code{nil}, decoding
uses that table.

  In encoding, the translation table's translations are applied to the
characters in the buffer, and the result of translation is actually
encoded. If a coding system has property
@code{character-translation-table-for-encode}, that specifies the
translation table to use. Otherwise the variable
@code{standard-translation-table-for-encode} specifies the translation
table.

@defvar standard-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar

@node Coding Systems
@section Coding Systems

@cindex coding system
  When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

  How to define a coding system is an arcane matter, and is not
documented here.

@menu
* Coding System Basics::
* Encoding and I/O::
* Lisp and Coding Systems::
* User-Chosen Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
* Terminal I/O Encoding::
* MS-DOS File Types::
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
  @dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in that it can convert to and from them. For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.

  Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified---to be chosen
heuristically for each file, based on the data.

@cindex end of line conversion
  @dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
convention is to use the linefeed character (also called newline). The
DOS convention is to use a carriage-return and a linefeed at the end of
a line. The Mac convention is to use just carriage-return.

@cindex base coding system
@cindex variant coding system
  @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.

  The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer. It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.

  The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.

@defun coding-system-get coding-system property
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
     @result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     @result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     @result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
as an alias for the coding system.
@end defun

@node Encoding and I/O
@subsection Encoding and I/O

  The principal purpose of coding systems is for use in reading and
writing files. The function @code{insert-file-contents} uses
a coding system for decoding the file data, and @code{write-region}
uses one to encode the buffer contents.

  You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
mechanism (@pxref{Default Coding Systems}). But these methods may not
completely specify what to do. For example, they may choose a coding
system such as @code{undecided} which leaves the character code
conversion to be determined from the data. In these cases, the I/O
operation finishes the job of choosing a coding system. Very often
you will want to find out afterwards which coding system was chosen.

@defvar buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer. It is used for saving the buffer, and for writing part
of the buffer with @code{write-region}. When those operations ask the
user to specify a different coding system,
@code{buffer-file-coding-system} is updated to the coding system
specified.

However, @code{buffer-file-coding-system} does not affect sending text
to a subprocess.
@end defvar

@defvar save-buffer-coding-system
This variable specifies the coding system for saving the buffer---but it
is not used for @code{write-region}.

When a command to save the buffer starts out to use
@code{save-buffer-coding-system}, and that coding system cannot handle
the actual text in the buffer, the command asks the user to choose
another coding system. After that happens, the command also updates
@code{save-buffer-coding-system} to represent the coding system that the
user specified.
@end defvar

@defvar last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used. The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value shortly after the function call that stores the value you are
interested in.
@end defvar
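
  The warning above suggests a pattern like the following sketch, which
copies @code{last-coding-system-used} immediately after the call that
sets it. The function name @code{my-insert-file-noting-coding} is
hypothetical.

@example
;; Insert FILE into the current buffer and return the coding system
;; that was used to decode it.
(defun my-insert-file-noting-coding (file)
  (insert-file-contents file)
  ;; Copy the value at once, before Emacs has a chance to wait.
  (let ((coding last-coding-system-used))
    (message "Decoded %s using %s" file coding)
    coding))
@end example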

  The variable @code{selection-coding-system} specifies how to encode
selections for the window system. @xref{Window System Selections}.

@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

  Here are the Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes variant coding systems as well.
@end defun

@defun coding-system-p object
This function returns @code{t} if @var{object} is a coding system
name.
@end defun

@defun check-coding-system coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun

@defun coding-system-change-eol-conversion coding-system eol-type
This function returns a coding system which is like @var{coding-system}
except for its eol conversion, which is specified by @var{eol-type}.
@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
@code{nil}. If it is @code{nil}, the returned coding system determines
the end-of-line conversion from the data.
@end defun

@defun coding-system-change-text-conversion eol-coding text-coding
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun

@defun find-coding-systems-region from to
This function returns a list of coding systems that could be used to
encode a text between @var{from} and @var{to}. All coding systems in
the list can safely encode any multibyte characters in that portion of
the text.

If the text contains no multibyte characters, the function returns the
list @code{(undecided)}.
@end defun
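
  The list can be used to check one particular coding system against a
region, as in the sketch below. The name
@code{my-coding-safe-for-region-p} is hypothetical. The sketch
compares names directly, so it assumes that @var{coding-system} is
given under the same name that appears in the list; aliases and
end-of-line variants are not recognized.

@example
;; Return t if CODING-SYSTEM is listed as able to encode the text
;; between FROM and TO, or if that text has no multibyte characters.
(defun my-coding-safe-for-region-p (coding-system from to)
  (let ((safe (find-coding-systems-region from to)))
    (or (equal safe '(undecided))
        (and (memq coding-system safe) t))))
@end example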

@defun find-coding-systems-string string
This function returns a list of coding systems that could be used to
encode the text of @var{string}. All coding systems in the list can
safely encode any multibyte characters in @var{string}. If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun

@defun find-coding-systems-for-charsets charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun
685
686@defun detect-coding-region start end &optional highest
cc6d0d2c
RS
687This function chooses a plausible coding system for decoding the text
688from @var{start} to @var{end}. This text should be ``raw bytes''
969fe9b5 689(@pxref{Explicit Encoding}).
cc6d0d2c 690
a9f0a989 691Normally this function returns a list of coding systems that could
cc6d0d2c 692handle decoding the text that was scanned. They are listed in order of
a9f0a989
RS
693decreasing priority. But if @var{highest} is non-@code{nil}, then the
694return value is just one coding system, the one that is highest in
695priority.
696
8241495d 697If the region contains only @sc{ascii} characters, the value is
a9f0a989 698@code{(undecided)}, or @code{undecided} if @var{highest} is non-@code{nil}.
cc6d0d2c
RS
699@end defun
700
a9f0a989 701@defun detect-coding-string string &optional highest
cc6d0d2c
RS
702This function is like @code{detect-coding-region} except that it
703operates on the contents of @var{string} instead of bytes in the buffer.
1911e6e5
RS
704@end defun
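
  For instance, with a string that contains only @sc{ascii} characters,
the two calling conventions differ as follows.  This is a minimal
sketch; for text containing other bytes, the result depends on the
current coding system priorities.

@example
;; @r{All plausible coding systems, in decreasing priority.}
(detect-coding-string "plain text")
     @result{} (undecided)

;; @r{Only the highest-priority candidate.}
(detect-coding-string "plain text" t)
     @result{} undecided
@end example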
705
706 @xref{Process Information}, for how to examine or set the coding
707systems used for I/O to a subprocess.
708
709@node User-Chosen Coding Systems
710@subsection User-Chosen Coding Systems
711
1911e6e5 712@defun select-safe-coding-system from to &optional preferred-coding-system
ebc6903b 713This function selects a coding system for encoding the text between
1911e6e5
RS
714@var{from} and @var{to}, asking the user to choose if necessary.
715
716The optional argument @var{preferred-coding-system} specifies a coding
ebc6903b
RS
717system to try first. If that one can handle the text in the specified
718region, then it is used. If this argument is omitted, the current
719buffer's value of @code{buffer-file-coding-system} is tried first.
1911e6e5
RS
720
721If the region contains some multibyte characters that the preferred
722coding system cannot encode, this function asks the user to choose from
723a list of coding systems which can encode the text, and returns the
724user's choice.
725
726One other kludgy feature: if @var{from} is a string, the string is the
727target text, and @var{to} is ignored.
969fe9b5
RS
728@end defun
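
  One possible way to use this function is to choose the coding system
for writing the accessible portion of the buffer.  In this sketch, the
preferred coding system @code{latin-1} and the file name
@file{/tmp/example.txt} are assumptions made for illustration only.

@example
(let ((coding-system-for-write
       (select-safe-coding-system (point-min) (point-max)
                                  'latin-1)))
  ;; @r{The user is asked to choose only if @code{latin-1} cannot}
  ;; @r{encode the text.}
  (write-region (point-min) (point-max) "/tmp/example.txt"))
@end example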
729
730 Here are two functions you can use to let the user specify a coding
731system, with completion. @xref{Completion}.
732
a9f0a989 733@defun read-coding-system prompt &optional default
969fe9b5
RS
734This function reads a coding system using the minibuffer, prompting with
735string @var{prompt}, and returns the coding system name as a symbol. If
736the user enters null input, @var{default} specifies which coding system
737to return. It should be a symbol or a string.
738@end defun
739
969fe9b5
RS
740@defun read-non-nil-coding-system prompt
741This function reads a coding system using the minibuffer, prompting with
a9f0a989 742string @var{prompt}, and returns the coding system name as a symbol. If
969fe9b5
RS
743the user tries to enter null input, it asks the user to try again.
744@xref{Coding Systems}.
cc6d0d2c
RS
745@end defun
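
  The following sketch reads a coding system from the user and uses it
to decode one file.  The variable @code{filename} is assumed to hold
the name of an existing file, and the default @code{latin-1} is an
arbitrary choice.

@example
(let ((coding-system-for-read
       (read-coding-system "Decode file using: " 'latin-1)))
  (insert-file-contents filename))
@end example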
746
747@node Default Coding Systems
a9f0a989 748@subsection Default Coding Systems
cc6d0d2c 749
a9f0a989
RS
750 This section describes variables that specify the default coding
751system for certain files or when running certain subprograms, and the
1911e6e5 752function that I/O operations use to access them.
a9f0a989
RS
753
754 The idea of these variables is that you set them once and for all to the
755defaults you want, and then do not change them again. To specify a
756particular coding system for a particular operation in a Lisp program,
757don't change these variables; instead, override them using
758@code{coding-system-for-read} and @code{coding-system-for-write}
759(@pxref{Specifying Coding Systems}).
cc6d0d2c 760
cc6d0d2c
RS
761@defvar file-coding-system-alist
762This variable is an alist that specifies the coding systems to use for
763reading and writing particular files. Each element has the form
764@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
765expression that matches certain file names. The element applies to file
766names that match @var{pattern}.
767
1911e6e5 768The @sc{cdr} of the element, @var{coding}, should be either a coding
8241495d
RS
769system, a cons cell containing two coding systems, or a function name (a
770symbol with a function definition). If @var{coding} is a coding system,
771that coding system is used for both reading the file and writing it. If
772@var{coding} is a cons cell containing two coding systems, its @sc{car}
773specifies the coding system for decoding, and its @sc{cdr} specifies the
774coding system for encoding.
775
776If @var{coding} is a function name, the function must return a coding
cc6d0d2c
RS
777system or a cons cell containing two coding systems. This value is used
778as described above.
779@end defvar
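
  As an illustration, an init file might add elements of both forms.
The file name patterns and coding systems below are hypothetical
choices, not defaults that Emacs provides.

@example
;; @r{Use Latin-1 for both reading and writing @file{.ltx} files.}
(setq file-coding-system-alist
      (cons '("\\.ltx\\'" . latin-1)
            file-coding-system-alist))

;; @r{Decode @file{.log} files as Latin-1, but always write them}
;; @r{with Unix end-of-line.}
(setq file-coding-system-alist
      (cons '("\\.log\\'" . (latin-1 . latin-1-unix))
            file-coding-system-alist))
@end example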
780
cc6d0d2c
RS
781@defvar process-coding-system-alist
782This variable is an alist specifying which coding systems to use for a
783subprocess, depending on which program is running in the subprocess. It
784works like @code{file-coding-system-alist}, except that @var{pattern} is
785matched against the program name used to start the subprocess. The coding
786system or systems specified in this alist are used to initialize the
787coding systems used for I/O to the subprocess, but you can specify
788other coding systems later using @code{set-process-coding-system}.
789@end defvar
790
8241495d
RS
791 @strong{Warning:} Coding systems such as @code{undecided}, which
792determine the coding system from the data, do not work entirely reliably
1911e6e5 793with asynchronous subprocess output. This is because Emacs handles
a9f0a989
RS
794asynchronous subprocess output in batches, as it arrives. If the coding
795system leaves the character code conversion unspecified, or leaves the
796end-of-line conversion unspecified, Emacs must try to detect the proper
797conversion from one batch at a time, and this does not always work.
798
799 Therefore, with an asynchronous subprocess, if at all possible, use a
800coding system which determines both the character code conversion and
801the end of line conversion---that is, one like @code{latin-1-unix},
802rather than @code{undecided} or @code{latin-1}.
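
  Following that advice, an element of
@code{process-coding-system-alist} might look like this.  The program
name @samp{grep} and the choice of Latin-1 are assumptions made only
for the sake of the example.

@example
;; @r{Specify both the text conversion and the end-of-line}
;; @r{conversion, for input from and output to the subprocess.}
(setq process-coding-system-alist
      (cons '("grep" . (latin-1-unix . latin-1-unix))
            process-coding-system-alist))
@end example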
803
cc6d0d2c
RS
804@defvar network-coding-system-alist
805This variable is an alist that specifies the coding system to use for
806network streams. It works much like @code{file-coding-system-alist},
969fe9b5 807with the difference that the @var{pattern} in an element may be either a
cc6d0d2c
RS
808port number or a regular expression. If it is a regular expression, it
809is matched against the network service name used to open the network
810stream.
811@end defvar
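
  For example, an element keyed by port number could be added as
follows.  Port 119 (conventionally used for NNTP) and the coding system
are illustrative assumptions.

@example
(setq network-coding-system-alist
      (cons '(119 . latin-1-unix)
            network-coding-system-alist))
@end example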
812
cc6d0d2c
RS
813@defvar default-process-coding-system
814This variable specifies the coding systems to use for subprocess (and
815network stream) input and output, when nothing else specifies what to
816do.
817
a9f0a989
RS
818The value should be a cons cell of the form @code{(@var{input-coding}
819. @var{output-coding})}. Here @var{input-coding} applies to input from
820the subprocess, and @var{output-coding} applies to output to it.
cc6d0d2c
RS
821@end defvar
822
a9f0a989 823@defun find-operation-coding-system operation &rest arguments
a9f0a989
RS
824This function returns the coding system to use (by default) for
825performing @var{operation} with @var{arguments}. The value has this
826form:
827
828@example
829(@var{decoding-system} . @var{encoding-system})
830@end example
831
832The @sc{car}, @var{decoding-system}, is the coding system to use
833for decoding (in case @var{operation} does decoding), and
834@var{encoding-system} is the coding system for encoding (in case
835@var{operation} does encoding).
836
8241495d 837The argument @var{operation} should be a symbol, one of
a9f0a989
RS
838@code{insert-file-contents}, @code{write-region}, @code{call-process},
839@code{call-process-region}, @code{start-process}, or
8241495d
RS
840@code{open-network-stream}. These are the names of the Emacs I/O primitives
841that can do coding system conversion.
a9f0a989
RS
842
843The remaining arguments should be the same arguments that might be given
8241495d 844to that I/O primitive. Depending on the primitive, one of those
a9f0a989
RS
845arguments is selected as the @dfn{target}. For example, if
846@var{operation} does file I/O, whichever argument specifies the file
847name is the target.  For subprocess primitives, the program name is the
848target. For @code{open-network-stream}, the target is the service name
849or port number.
850
851This function looks up the target in @code{file-coding-system-alist},
852@code{process-coding-system-alist}, or
853@code{network-coding-system-alist}, depending on @var{operation}.
854@xref{Default Coding Systems}.
855@end defun
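
  Here is a sketch of two calls; the file name, host name and service
are hypothetical.  The arguments after @var{operation} are given in the
same order as for the I/O primitive itself, and the value depends on
the current contents of the alists (it may be @code{nil} if nothing
matches).

@example
;; @r{The target is the file name.}
(find-operation-coding-system 'insert-file-contents
                              "/tmp/notes.txt")

;; @r{The target is the service, here port 119.}
(find-operation-coding-system 'open-network-stream
                              "news" nil "news.example.org" 119)
@end example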
856
cc6d0d2c 857@node Specifying Coding Systems
a9f0a989 858@subsection Specifying a Coding System for One Operation
cc6d0d2c
RS
859
860 You can specify the coding system for a specific operation by binding
861the variables @code{coding-system-for-read} and/or
862@code{coding-system-for-write}.
863
cc6d0d2c
RS
864@defvar coding-system-for-read
865If this variable is non-@code{nil}, it specifies the coding system to
866use for reading a file, or for input from a synchronous subprocess.
867
868It also applies to any asynchronous subprocess or network stream, but in
869a different way: the value of @code{coding-system-for-read} when you
870start the subprocess or open the network stream specifies the input
871decoding method for that subprocess or network stream. It remains in
872use for that subprocess or network stream unless and until overridden.
873
874The right way to use this variable is to bind it with @code{let} for a
875specific I/O operation. Its global value is normally @code{nil}, and
876you should not globally set it to any other value. Here is an example
877of the right way to use the variable:
878
879@example
880;; @r{Read the file with no character code conversion.}
969fe9b5 881;; @r{Assume @sc{crlf} represents end-of-line.}
cc6d0d2c
RS
882(let ((coding-system-for-read 'emacs-mule-dos))
883 (insert-file-contents filename))
884@end example
885
886When its value is non-@code{nil}, @code{coding-system-for-read} takes
a9f0a989 887precedence over all other methods of specifying a coding system to use for
cc6d0d2c
RS
888input, including @code{file-coding-system-alist},
889@code{process-coding-system-alist} and
890@code{network-coding-system-alist}.
891@end defvar
892
cc6d0d2c
RS
893@defvar coding-system-for-write
894This works much like @code{coding-system-for-read}, except that it
895applies to output rather than input. It affects writing to files,
b6954afd 896as well as sending output to subprocesses and network connections.
cc6d0d2c
RS
897
898When a single operation does both input and output, as do
899@code{call-process-region} and @code{start-process}, both
900@code{coding-system-for-read} and @code{coding-system-for-write}
901affect it.
902@end defvar
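
  By analogy with the example for @code{coding-system-for-read}, this
variable would typically be bound with @code{let} around a single
output operation.  In this sketch, @code{filename} is assumed to hold
the name of the file to write, and @code{latin-1-unix} is an arbitrary
choice.

@example
;; @r{Write the accessible portion of the buffer as Latin-1,}
;; @r{with Unix end-of-line.}
(let ((coding-system-for-write 'latin-1-unix))
  (write-region (point-min) (point-max) filename))
@end example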
903
cc6d0d2c
RS
904@defvar inhibit-eol-conversion
905When this variable is non-@code{nil}, no end-of-line conversion is done,
906no matter which coding system is specified. This applies to all the
907Emacs I/O and subprocess primitives, and to the explicit encoding and
908decoding functions (@pxref{Explicit Encoding}).
909@end defvar
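
  Like the two variables above, this one is probably best bound with
@code{let} for a single operation.  A minimal sketch, assuming
@code{filename} holds the name of an existing file:

@example
;; @r{Decode the text as usual, but leave any carriage returns}
;; @r{in the buffer.}
(let ((inhibit-eol-conversion t))
  (insert-file-contents filename))
@end example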
910
cc6d0d2c 911@node Explicit Encoding
a9f0a989 912@subsection Explicit Encoding and Decoding
cc6d0d2c
RS
913@cindex encoding text
914@cindex decoding text
915
916 All the operations that transfer text in and out of Emacs have the
917ability to use a coding system to encode or decode the text.
918You can also explicitly encode and decode text using the functions
919in this section.
920
921@cindex raw bytes
922 The result of encoding, and the input to decoding, are not ordinary
923text. They are ``raw bytes''---bytes that represent text in the same
924way that an external file would. When a buffer contains raw bytes, it
925is most natural to mark that buffer as using unibyte representation,
926using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
969fe9b5
RS
927but this is not required. If the buffer's contents are only temporarily
928raw, leave the buffer multibyte, which will be correct after you decode
929them.
cc6d0d2c
RS
930
931 The usual way to get raw bytes in a buffer, for explicit decoding, is
969fe9b5 932to read them from a file with @code{insert-file-contents-literally}
cc6d0d2c 933(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
969fe9b5 934argument when visiting a file with @code{find-file-noselect}.
cc6d0d2c
RS
935
936 The usual way to use the raw bytes that result from explicitly
937encoding text is to copy them to a file or process---for example, to
969fe9b5 938write them with @code{write-region} (@pxref{Writing to Files}), and
cc6d0d2c
RS
939suppress encoding for that @code{write-region} call by binding
940@code{coding-system-for-write} to @code{no-conversion}.
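
  The steps just described might be sketched as follows.  The text, the
coding system and the file name @file{/tmp/example.out} are assumptions
for illustration.

@example
(let ((bytes (encode-coding-string "some text" 'latin-1-unix))
      ;; @r{The bytes are already encoded; do not encode again.}
      (coding-system-for-write 'no-conversion))
  (with-temp-buffer
    ;; @r{Mark the buffer as unibyte, since it holds raw bytes.}
    (set-buffer-multibyte nil)
    (insert bytes)
    (write-region (point-min) (point-max) "/tmp/example.out")))
@end example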
941
b6954afd
RS
942 Raw bytes typically contain stray individual bytes with values in the
943range 128 through 255, that are legitimate only as part of multibyte
944sequences. Even if the buffer is multibyte, Emacs treats each such
945individual byte as a character and uses the byte value as its character
946code. In this way, character codes 128 through 255 can be found in a
947multibyte buffer, even though they are not legitimate multibyte
948character codes.
949
1911e6e5 950 Raw bytes sometimes contain overlong byte-sequences that look like a
b6954afd
RS
951proper multibyte character plus extra superfluous trailing codes. For
952most purposes, Emacs treats such a sequence in a buffer or string as a
953single character, and if you look at its character code, you get the
954value that corresponds to the multibyte character
955sequence---disregarding the extra trailing codes. This is not quite
956clean, but raw bytes are used only in limited ways, so as a practical
957matter it is not worth the trouble to treat this case differently.
958
959 When a multibyte buffer contains illegitimate byte sequences,
08f0f5e9 960sometimes insertion or deletion can cause them to coalesce into a
b6954afd
RS
961legitimate multibyte character. For example, suppose the buffer
962contains the sequence 129 68 192, 68 being the character @samp{D}. If
963you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus
964become one multibyte character (Latin-1 A with grave accent). Point
965moves to one side or the other of the character, since it cannot be
966within a character. Don't be alarmed by this.
967
968 Some really peculiar situations prevent proper coalescence. For
969example, if you narrow the buffer so that the accessible portion begins
970just before the @samp{D}, then delete the @samp{D}, the two surrounding
971bytes cannot coalesce because one of them is outside the accessible
972portion of the buffer. In this case, the deletion cannot be done, so
973@code{delete-region} signals an error.
974
975 Here are the functions to perform explicit encoding or decoding. The
976encoding functions produce ``raw bytes''; the decoding functions are
977meant to operate on ``raw bytes''. All of these functions discard text
978properties.
1911e6e5 979
cc6d0d2c
RS
980@defun encode-coding-region start end coding-system
981This function encodes the text from @var{start} to @var{end} according
969fe9b5
RS
982to coding system @var{coding-system}. The encoded text replaces the
983original text in the buffer. The result of encoding is ``raw bytes,''
984but the buffer remains multibyte if it was multibyte before.
cc6d0d2c
RS
985@end defun
986
cc6d0d2c
RS
987@defun encode-coding-string string coding-system
988This function encodes the text in @var{string} according to coding
989system @var{coding-system}. It returns a new string containing the
969fe9b5 990encoded text. The result of encoding is a unibyte string of ``raw bytes.''
cc6d0d2c
RS
991@end defun
992
cc6d0d2c
RS
993@defun decode-coding-region start end coding-system
994This function decodes the text from @var{start} to @var{end} according
995to coding system @var{coding-system}. The decoded text replaces the
996original text in the buffer. To make explicit decoding useful, the text
997before decoding ought to be ``raw bytes.''
998@end defun
999
cc6d0d2c
RS
1000@defun decode-coding-string string coding-system
1001This function decodes the text in @var{string} according to coding
1002system @var{coding-system}. It returns a new string containing the
1003decoded text. To make explicit decoding useful, the contents of
1004@var{string} ought to be ``raw bytes.''
1005@end defun
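
  As a simple illustration, encoding a string and then decoding the
result with the same coding system gives back the original text.  The
string and the coding system here are arbitrary choices.

@example
(let* ((text "hello")
       ;; @r{@code{bytes} is a string of raw bytes.}
       (bytes (encode-coding-string text 'latin-1)))
  (string= (decode-coding-string bytes 'latin-1) text))
     @result{} t
@end example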
969fe9b5 1006
a9f0a989
RS
1007@node Terminal I/O Encoding
1008@subsection Terminal I/O Encoding
1009
1010 Emacs can decode keyboard input using a coding system, and encode
2eb4136f
RS
1011terminal output. This is useful for terminals that transmit or display
1012text using a particular encoding such as Latin-1. Emacs does not set
1013@code{last-coding-system-used} for encoding or decoding for the
1014terminal.
a9f0a989
RS
1015
1016@defun keyboard-coding-system
a9f0a989
RS
1017This function returns the coding system that is in use for decoding
1018keyboard input---or @code{nil} if no coding system is to be used.
1019@end defun
1020
1021@defun set-keyboard-coding-system coding-system
a9f0a989
RS
1022This function specifies @var{coding-system} as the coding system to
1023use for decoding keyboard input. If @var{coding-system} is @code{nil},
1024that means do not decode keyboard input.
1025@end defun
1026
1027@defun terminal-coding-system
a9f0a989
RS
1028This function returns the coding system that is in use for encoding
1029terminal output---or @code{nil} for no encoding.
1030@end defun
1031
1032@defun set-terminal-coding-system coding-system
a9f0a989
RS
1033This function specifies @var{coding-system} as the coding system to use
1034for encoding terminal output. If @var{coding-system} is @code{nil},
1035that means do not encode terminal output.
1036@end defun
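
  For example, on a terminal that is assumed to both send and display
Latin-1 (an assumption made only for this sketch), the two coding
systems might be set like this:

@example
(set-keyboard-coding-system 'latin-1)
(set-terminal-coding-system 'latin-1)

;; @r{Check which coding system is now used for terminal output.}
(terminal-coding-system)
@end example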
1037
969fe9b5 1038@node MS-DOS File Types
a9f0a989 1039@subsection MS-DOS File Types
969fe9b5
RS
1040@cindex DOS file types
1041@cindex MS-DOS file types
1042@cindex Windows file types
1043@cindex file types on MS-DOS and Windows
1044@cindex text files and binary files
1045@cindex binary files and text files
1046
8241495d
RS
1047 On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
1048end-of-line conversion for a file by looking at the file's name. This
1049feature classifies files as @dfn{text files} and @dfn{binary files}.  By
1050``binary file'' we mean a file of literal byte values that are not
1051necessarily meant to be characters; Emacs does no end-of-line conversion
1052and no character code conversion for them. On the other hand, the bytes
1053in a text file are intended to represent characters; when you create a
1054new file whose name implies that it is a text file, Emacs uses DOS
1055end-of-line conversion.
969fe9b5
RS
1056
1057@defvar buffer-file-type
1058This variable, automatically buffer-local in each buffer, records the
a9f0a989
RS
1059file type of the buffer's visited file. When a buffer does not specify
1060a coding system with @code{buffer-file-coding-system}, this variable is
1061used to determine which coding system to use when writing the contents
1062of the buffer. It should be @code{nil} for text, @code{t} for binary.
1063If it is @code{t}, the coding system is @code{no-conversion}.
1064Otherwise, @code{undecided-dos} is used.
1065
1066Normally this variable is set by visiting a file; it is set to
1067@code{nil} if the file was visited without any actual conversion.
969fe9b5
RS
1068@end defvar
1069
1070@defopt file-name-buffer-file-type-alist
1071This variable holds an alist for recognizing text and binary files.
1072Each element has the form @code{(@var{regexp} . @var{type})}, where
1073@var{regexp} is matched against the file name, and @var{type} may be
1074@code{nil} for text, @code{t} for binary, or a function to call to
1075compute which. If it is a function, then it is called with a single
1076argument (the file name) and should return @code{t} or @code{nil}.
1077
8241495d 1078When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
969fe9b5
RS
1079which coding system to use when reading a file. For a text file,
1080@code{undecided-dos} is used. For a binary file, @code{no-conversion}
1081is used.
1082
1083If no element in this alist matches a given file name, then
1084@code{default-buffer-file-type} says how to treat the file.
1085@end defopt
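
  As an illustration, the following adds an element that treats files
with a hypothetical @file{.dat} extension as binary.  The extension is
an assumption chosen for the example.

@example
(setq file-name-buffer-file-type-alist
      (cons '("\\.dat\\'" . t)
            file-name-buffer-file-type-alist))
@end example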
1086
1087@defopt default-buffer-file-type
1088This variable says how to handle files for which
1089@code{file-name-buffer-file-type-alist} says nothing about the type.
1090
1091If this variable is non-@code{nil}, then these files are treated as
a9f0a989
RS
1092binary: the coding system @code{no-conversion} is used. Otherwise,
1093nothing special is done for them---the coding system is deduced solely
1094from the file contents, in the usual Emacs fashion.
969fe9b5
RS
1095@end defopt
1096
a9f0a989
RS
1097@node Input Methods
1098@section Input Methods
1099@cindex input methods
1100
8241495d 1101 @dfn{Input methods} provide convenient ways of entering non-@sc{ascii}
a9f0a989 1102characters from the keyboard. Unlike coding systems, which translate
8241495d 1103non-@sc{ascii} characters to and from encodings meant to be read by
a9f0a989
RS
1104programs, input methods provide human-friendly commands. (@xref{Input
1105Methods,,, emacs, The GNU Emacs Manual}, for information on how users
1106use input methods to enter text.) How to define input methods is not
1107yet documented in this manual, but here we describe how to use them.
1108
1109 Each input method has a name, which is currently a string;
1110in the future, symbols may also be usable as input method names.
1111
a9f0a989
RS
1112@defvar current-input-method
1113This variable holds the name of the input method now active in the
1114current buffer. (It automatically becomes local in each buffer when set
1115in any fashion.) It is @code{nil} if no input method is active in the
1116buffer now.
969fe9b5
RS
1117@end defvar
1118
a9f0a989
RS
1119@defvar default-input-method
1120This variable holds the default input method for commands that choose an
1121input method. Unlike @code{current-input-method}, this variable is
1122normally global.
969fe9b5 1123@end defvar
a9f0a989 1124
a9f0a989
RS
1125@defun set-input-method input-method
1126This function activates input method @var{input-method} for the current
1127buffer. It also sets @code{default-input-method} to @var{input-method}.
1128If @var{input-method} is @code{nil}, this function deactivates any input
1129method for the current buffer.
1130@end defun
1131
a9f0a989
RS
1132@defun read-input-method-name prompt &optional default inhibit-null
1133This function reads an input method name with the minibuffer, prompting
1134with @var{prompt}. If @var{default} is non-@code{nil}, that is returned
1135if the user enters empty input.  However, if
1136@var{inhibit-null} is non-@code{nil}, empty input signals an error.
1137
1138The returned value is a string.
1139@end defun
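
  These two functions can be combined, for instance, to let the user
pick an input method for the current buffer.  This is only a sketch;
the prompt string is arbitrary.

@example
(let ((method (read-input-method-name "Input method: "
                                      default-input-method)))
  ;; @r{If @code{method} is @code{nil}, this turns off any input}
  ;; @r{method in the current buffer.}
  (set-input-method method))
@end example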
1140
a9f0a989
RS
1141@defvar input-method-alist
1142This variable defines all the supported input methods.
1143Each element defines one input method, and should have the form:
1144
1145@example
1911e6e5
RS
1146(@var{input-method} @var{language-env} @var{activate-func}
1147 @var{title} @var{description} @var{args}...)
a9f0a989
RS
1148@end example
1149
1911e6e5
RS
1150Here @var{input-method} is the input method name, a string;
1151@var{language-env} is another string, the name of the language
1152environment this input method is recommended for. (That serves only for
1153documentation purposes.)
a9f0a989
RS
1154
1155@var{title} is a string to display in the mode line while this method is
1156active. @var{description} is a string describing this method and what
1157it is good for.
1158
1159@var{activate-func} is a function to call to activate this method. The
1160@var{args}, if any, are passed as arguments to @var{activate-func}. All
1161told, the arguments to @var{activate-func} are @var{input-method} and
1162the @var{args}.
1911e6e5 1163@end defvar
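
  Since the input method name is the first item of each element, an
element can be looked up with @code{assoc}.  The sketch below assumes
that an input method named @code{"latin-1-prefix"} is defined in your
Emacs; if it is not, the value is @code{nil}.

@example
;; @r{Look up the definition of one input method.}
(assoc "latin-1-prefix" input-method-alist)

;; @r{Extract the name of its language environment.}
(nth 1 (assoc "latin-1-prefix" input-method-alist))
@end example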
a9f0a989 1164
2eb4136f
RS
1165 The fundamental interface to input methods is through the
1166variable @code{input-method-function}. @xref{Reading One Event}.
2468d0c0
DL
1167
1168@node Locales
1169@section Locales
1170@cindex locale
1171
1172 POSIX defines a concept of ``locales'' which control which language
1173to use in language-related features. These Emacs variables control
1174how Emacs interacts with these features.
1175
1176@defvar locale-coding-system
1177@tindex locale-coding-system
1178This variable specifies the coding system to use for decoding system
1179error messages, for encoding the format argument to
1180@code{format-time-string}, and for decoding the return value of
1181@code{format-time-string}.
1182@end defvar
1183
1184@defvar system-messages-locale
1185@tindex system-messages-locale
1186This variable specifies the locale to use for generating system error
1187messages. Changing the locale can cause messages to come out in a
9c17f494 1188different language or in a different orthography. If the variable is
2468d0c0
DL
1189@code{nil}, the locale is specified by environment variables in the
1190usual POSIX fashion.
1191@end defvar
1192
1193@defvar system-time-locale
1194@tindex system-time-locale
1195This variable specifies the locale to use for formatting time values.
1196Changing the locale can cause times to be formatted according to the
1197conventions of a different language. If the variable is @code{nil}, the
1198locale is specified by environment variables in the usual POSIX fashion.
1199@end defvar
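
  For example, a program that needs time strings in a fixed form,
whatever the user's environment, might bind
@code{system-time-locale} around the call.  This sketch assumes that
the @samp{C} locale is available, as it is on POSIX systems.

@example
(let ((system-time-locale "C"))
  (format-time-string "%A, %d %B %Y"))
@end example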