lispref/nonascii.texi

   1 @c -*-texinfo-*-
   2 @c This is part of the GNU Emacs Lisp Reference Manual.
   3 @c Copyright (C) 1998 Free Software Foundation, Inc.
   4 @c See the file elisp.texi for copying conditions.
   5 @setfilename ../info/characters
   6 @node Non-ASCII Characters, Searching and Matching, Text, Top
   7 @chapter Non-ASCII Characters
   8 @cindex multibyte characters
   9 @cindex non-ASCII characters
  10
  11   This chapter covers the special issues relating to non-@sc{ascii}
  12 characters and how they are stored in strings and buffers.
  13
  14 @menu
  15 * Text Representations::
  16 * Converting Representations::
  17 * Selecting a Representation::
  18 * Character Codes::
  19 * Character Sets::
  20 * Chars and Bytes::
  21 * Splitting Characters::
  22 * Scanning Charsets::
  23 * Translation of Characters::
  24 * Coding Systems::
  25 * Input Methods::
  26 @end menu
  27
  28 @node Text Representations
  29 @section Text Representations
  30 @cindex text representations
  31
  32   Emacs has two @dfn{text representations}---two ways to represent text
  33 in a string or buffer.  These are called @dfn{unibyte} and
  34 @dfn{multibyte}.  Each string, and each buffer, uses one of these two
  35 representations.  For most purposes, you can ignore the issue of
  36 representations, because Emacs converts text between them as
  37 appropriate.  Occasionally in Lisp programming you will need to pay
  38 attention to the difference.
  39
  40 @cindex unibyte text
  41   In unibyte representation, each character occupies one byte and
  42 therefore the possible character codes range from 0 to 255.  Codes 0
  43 through 127 are @sc{ascii} characters; the codes from 128 through 255
  44 are used for one non-@sc{ascii} character set (you can choose which
  45 character set by setting the variable @code{nonascii-insert-offset}).
  46
  47 @cindex leading code
  48 @cindex multibyte text
  49 @cindex trailing codes
  50   In multibyte representation, a character may occupy more than one
  51 byte, and as a result, the full range of Emacs character codes can be
  52 stored.  The first byte of a multibyte character is always in the range
  53 128 through 159 (octal 0200 through 0237).  These values are called
  54 @dfn{leading codes}.  The second and subsequent bytes of a multibyte
  55 character are always in the range 160 through 255 (octal 0240 through
  56 0377); these values are @dfn{trailing codes}.
  57
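The byte ranges above can be sketched as a small classifier.  This is only an illustration: @code{my-classify-byte} is a hypothetical helper, not an Emacs function, and it assumes nothing beyond the three ranges stated in the text.

```elisp
;; Hypothetical helper, not part of Emacs: classify one byte of
;; multibyte text using the ranges given in the text.
(defun my-classify-byte (byte)
  "Return a symbol naming the role BYTE plays in multibyte text."
  (cond ((and (>= byte 0) (<= byte 127)) 'ascii)
        ((and (>= byte 128) (<= byte 159)) 'leading-code)
        ((and (>= byte 160) (<= byte 255)) 'trailing-code)
        (t (error "Not a byte value: %S" byte))))

(my-classify-byte 65)    ; in the ASCII range
(my-classify-byte 128)   ; octal 0200, the lowest leading code
(my-classify-byte 160)   ; octal 0240, the lowest trailing code
```
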
  58   Some sequences of bytes do not form meaningful multibyte characters:
  59 for example, a single isolated byte in the range 128 through 255 is
  60 never meaningful.  Such byte sequences are not entirely valid, and never
  61 appear in proper multibyte text (since that consists of a sequence of
  62 @emph{characters}); but they can appear as part of ``raw bytes''
  63 (@pxref{Explicit Encoding}).
  64
  65   In a buffer, the buffer-local value of the variable
  66 @code{enable-multibyte-characters} specifies the representation used.
  67 The representation for a string is determined and recorded in the string
  68 when the string is constructed.
  69
  70 @defvar enable-multibyte-characters
  71 @tindex enable-multibyte-characters
  72 This variable specifies the current buffer's text representation.
  73 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
  74 it contains unibyte text.
  75
  76 You cannot set this variable directly; instead, use the function
  77 @code{set-buffer-multibyte} to change a buffer's representation.
  78 @end defvar
  79
  80 @defvar default-enable-multibyte-characters
  81 @tindex default-enable-multibyte-characters
  82 This variable's value is entirely equivalent to @code{(default-value
  83 'enable-multibyte-characters)}, and setting this variable changes that
  84 default value.  Setting the local binding of
  85 @code{enable-multibyte-characters} in a specific buffer is not allowed,
  86 but changing the default value is supported, and it is a reasonable
  87 thing to do, because it has no effect on existing buffers.
  88
  89 The @samp{--unibyte} command line option does its job by setting the
  90 default value to @code{nil} early in startup.
  91 @end defvar
  92
  93 @defun position-bytes position
  94 @tindex position-bytes
  95 Return the byte-position corresponding to buffer position @var{position}
  96 in the current buffer.
  97 @end defun
  98
  99 @defun byte-to-position byte-position
 100 @tindex byte-to-position
 101 Return the buffer position corresponding to byte-position
 102 @var{byte-position} in the current buffer.
 103 @end defun
 104
 105 @defun multibyte-string-p string
 106 @tindex multibyte-string-p
 107 Return @code{t} if @var{string} is a multibyte string.
 108 @end defun
 109
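As a rough sketch of how @code{position-bytes} and @code{byte-to-position} relate, the following checks that a buffer position survives a round trip through its byte position.  The name @code{my-position-round-trip-p} is hypothetical, and the example assumes the position lies within the current buffer.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-position-round-trip-p (position)
  "Return non-nil if POSITION survives a trip through its byte position."
  (= position (byte-to-position (position-bytes position))))

;; Try it at the start of a temporary buffer holding plain text.
(with-temp-buffer
  (insert "plain text")
  (my-position-round-trip-p (point-min)))
```
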
 110 @node Converting Representations
 111 @section Converting Text Representations
 112
 113   Emacs can convert unibyte text to multibyte; it can also convert
 114 multibyte text to unibyte, though this conversion loses information.  In
 115 general these conversions happen when inserting text into a buffer, or
 116 when putting text from several strings together in one string.  You can
 117 also explicitly convert a string's contents to either representation.
 118
 119   Emacs chooses the representation for a string based on the text that
 120 it is constructed from.  The general rule is to convert unibyte text to
 121 multibyte text when combining it with other multibyte text, because the
 122 multibyte representation is more general and can hold whatever
 123 characters the unibyte text has.
 124
 125   When inserting text into a buffer, Emacs converts the text to the
 126 buffer's representation, as specified by
 127 @code{enable-multibyte-characters} in that buffer.  In particular, when
 128 you insert multibyte text into a unibyte buffer, Emacs converts the text
 129 to unibyte, even though this conversion cannot in general preserve all
 130 the characters that might be in the multibyte text.  The other natural
 131 alternative, to convert the buffer contents to multibyte, is not
 132 acceptable because the buffer's representation is a choice made by the
 133 user that cannot be overridden automatically.
 134
 135   Converting unibyte text to multibyte text leaves @sc{ascii} characters
  136 unchanged, and likewise the codes 128 through 159.  It converts the non-@sc{ascii}
 137 codes 160 through 255 by adding the value @code{nonascii-insert-offset}
 138 to each character code.  By setting this variable, you specify which
 139 character set the unibyte characters correspond to (@pxref{Character
 140 Sets}).  For example, if @code{nonascii-insert-offset} is 2048, which is
 141 @code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte
 142 non-@sc{ascii} characters correspond to Latin 1.  If it is 2688, which
 143 is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to
 144 Greek letters.
 145
 146   Converting multibyte text to unibyte is simpler: it discards all but
 147 the low 8 bits of each character code.  If @code{nonascii-insert-offset}
 148 has a reasonable value, corresponding to the beginning of some character
 149 set, this conversion is the inverse of the other: converting unibyte
 150 text to multibyte and back to unibyte reproduces the original unibyte
 151 text.
 152
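The arithmetic of the two conversions can be sketched on plain integers.  The two functions below are hypothetical helpers that only mirror the description above; they are not the routines Emacs itself uses, and they ignore @code{nonascii-translation-table}.

```elisp
;; Hypothetical helpers that mirror the arithmetic described in the
;; text; they are not the functions Emacs itself uses.
(defun my-unibyte-code-to-multibyte (code offset)
  "Return the multibyte code for unibyte CODE, given OFFSET."
  (if (and (>= code 160) (<= code 255))
      (+ code offset)      ; non-ASCII codes 160..255 are shifted
    code))                 ; codes 0..159 are left unchanged

(defun my-multibyte-code-to-unibyte (code)
  "Return the unibyte code for CODE by keeping its low 8 bits."
  (logand code 255))

;; With the Latin 1 offset 2048, unibyte code 200 becomes 2248,
;; and the low 8 bits of 2248 give back 200.
(my-multibyte-code-to-unibyte (my-unibyte-code-to-multibyte 200 2048))
```

The round trip works here because 2048 is a multiple of 256, so adding it does not disturb the low 8 bits.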
 153 @defvar nonascii-insert-offset
 154 @tindex nonascii-insert-offset
 155 This variable specifies the amount to add to a non-@sc{ascii} character
 156 when converting unibyte text to multibyte.  It also applies when
 157 @code{self-insert-command} inserts a character in the unibyte
 158 non-@sc{ascii} range, 128 through 255.  However, the function
 159 @code{insert-char} does not perform this conversion.
 160
 161 The right value to use to select character set @var{cs} is @code{(-
 162 (make-char @var{cs}) 128)}.  If the value of
 163 @code{nonascii-insert-offset} is zero, then conversion actually uses the
 164 value for the Latin 1 character set, rather than zero.
 165 @end defvar
 166
 167 @defvar nonascii-translation-table
 168 @tindex nonascii-translation-table
 169 This variable provides a more general alternative to
 170 @code{nonascii-insert-offset}.  You can use it to specify independently
 171 how to translate each code in the range of 128 through 255 into a
 172 multibyte character.  The value should be a vector, or @code{nil}.
 173 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
 174 @end defvar
 175
 176 @defun string-make-unibyte string
 177 @tindex string-make-unibyte
 178 This function converts the text of @var{string} to unibyte
 179 representation, if it isn't already, and returns the result.  If
 180 @var{string} is a unibyte string, it is returned unchanged.
 181 @end defun
 182
 183 @defun string-make-multibyte string
 184 @tindex string-make-multibyte
 185 This function converts the text of @var{string} to multibyte
 186 representation, if it isn't already, and returns the result.  If
 187 @var{string} is a multibyte string, it is returned unchanged.
 188 @end defun
 189
 190 @node Selecting a Representation
 191 @section Selecting a Representation
 192
 193   Sometimes it is useful to examine an existing buffer or string as
 194 multibyte when it was unibyte, or vice versa.
 195
 196 @defun set-buffer-multibyte multibyte
 197 @tindex set-buffer-multibyte
 198 Set the representation type of the current buffer.  If @var{multibyte}
 199 is non-@code{nil}, the buffer becomes multibyte.  If @var{multibyte}
 200 is @code{nil}, the buffer becomes unibyte.
 201
 202 This function leaves the buffer contents unchanged when viewed as a
 203 sequence of bytes.  As a consequence, it can change the contents viewed
 204 as characters; a sequence of two bytes which is treated as one character
 205 in multibyte representation will count as two characters in unibyte
 206 representation.
 207
 208 This function sets @code{enable-multibyte-characters} to record which
 209 representation is in use.  It also adjusts various data in the buffer
 210 (including overlays, text properties and markers) so that they cover the
 211 same text as they did before.
 212
 213 You cannot use @code{set-buffer-multibyte} on an indirect buffer,
 214 because indirect buffers always inherit the representation of the
 215 base buffer.
 216 @end defun
 217
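A minimal sketch of @code{set-buffer-multibyte}, assuming a scratch buffer made with @code{with-temp-buffer}: the hypothetical helper @code{my-temp-buffer-representation} switches the buffer's representation and then reads back @code{enable-multibyte-characters}.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-temp-buffer-representation (multibyte)
  "Switch a temporary buffer to MULTIBYTE and return the recorded flag."
  (with-temp-buffer
    ;; Change the representation through the function, never by
    ;; setting the variable directly.
    (set-buffer-multibyte multibyte)
    enable-multibyte-characters))

(my-temp-buffer-representation nil)   ; the buffer was made unibyte
(my-temp-buffer-representation t)     ; the buffer was made multibyte
```
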
 218 @defun string-as-unibyte string
 219 @tindex string-as-unibyte
 220 This function returns a string with the same bytes as @var{string} but
 221 treating each byte as a character.  This means that the value may have
 222 more characters than @var{string} has.
 223
 224 If @var{string} is already a unibyte string, then the value is
 225 @var{string} itself.
 226 @end defun
 227
 228 @defun string-as-multibyte string
 229 @tindex string-as-multibyte
 230 This function returns a string with the same bytes as @var{string} but
 231 treating each multibyte sequence as one character.  This means that the
 232 value may have fewer characters than @var{string} has.
 233
 234 If @var{string} is already a multibyte string, then the value is
 235 @var{string} itself.
 236 @end defun
 237
 238 @node Character Codes
 239 @section Character Codes
 240 @cindex character codes
 241
 242   The unibyte and multibyte text representations use different character
 243 codes.  The valid character codes for unibyte representation range from
 244 0 to 255---the values that can fit in one byte.  The valid character
 245 codes for multibyte representation range from 0 to 524287, but not all
 246 values in that range are valid.  In particular, the values 128 through
 247 255 are not legitimate in multibyte text (though they can occur in ``raw
 248 bytes''; @pxref{Explicit Encoding}).  Only the @sc{ascii} codes 0
 249 through 127 are fully legitimate in both representations.
 250
 251 @defun char-valid-p charcode
 252 This returns @code{t} if @var{charcode} is valid for either one of the two
 253 text representations.
 254
 255 @example
 256 (char-valid-p 65)
 257      @result{} t
 258 (char-valid-p 256)
 259      @result{} nil
 260 (char-valid-p 2248)
 261      @result{} t
 262 @end example
 263 @end defun
 264
 265 @node Character Sets
 266 @section Character Sets
 267 @cindex character sets
 268
 269   Emacs classifies characters into various @dfn{character sets}, each of
  270 which has a name that is a symbol.  Each character belongs to one and
 271 only one character set.
 272
 273   In general, there is one character set for each distinct script.  For
 274 example, @code{latin-iso8859-1} is one character set,
 275 @code{greek-iso8859-7} is another, and @code{ascii} is another.  An
 276 Emacs character set can hold at most 9025 characters; therefore, in some
 277 cases, characters that would logically be grouped together are split
 278 into several character sets.  For example, one set of Chinese
 279 characters, generally known as Big 5, is divided into two Emacs
 280 character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
 281
 282 @defun charsetp object
 283 @tindex charsetp
 284 Returns @code{t} if @var{object} is a symbol that names a character set,
 285 @code{nil} otherwise.
 286 @end defun
 287
 288 @defun charset-list
 289 @tindex charset-list
 290 This function returns a list of all defined character set names.
 291 @end defun
 292
 293 @defun char-charset character
 294 @tindex char-charset
 295 This function returns the name of the character set that @var{character}
 296 belongs to.
 297 @end defun
 298
 299 @defun charset-plist charset
 300 @tindex charset-plist
 301 This function returns the charset property list of the character set
 302 @var{charset}.  Although @var{charset} is a symbol, this is not the same
 303 as the property list of that symbol.  Charset properties are used for
 304 special purposes within Emacs; for example, @code{x-charset-registry}
 305 helps determine which fonts to use (@pxref{Font Selection}).
 306 @end defun
 307
 308 @node Chars and Bytes
 309 @section Characters and Bytes
 310 @cindex bytes and characters
 311
 312 @cindex introduction sequence
 313 @cindex dimension (of character set)
 314   In multibyte representation, each character occupies one or more
 315 bytes.  Each character set has an @dfn{introduction sequence}, which is
 316 normally one or two bytes long.  (Exception: the @sc{ascii} character
 317 set has a zero-length introduction sequence.)  The introduction sequence
 318 is the beginning of the byte sequence for any character in the character
 319 set.  The rest of the character's bytes distinguish it from the other
 320 characters in the same character set.  Depending on the character set,
 321 there are either one or two distinguishing bytes; the number of such
 322 bytes is called the @dfn{dimension} of the character set.
 323
 324 @defun charset-dimension charset
 325 @tindex charset-dimension
 326 This function returns the dimension of @var{charset}; at present, the
 327 dimension is always 1 or 2.
 328 @end defun
 329
 330 @defun charset-bytes charset
 331 @tindex charset-bytes
 332 This function returns the number of bytes used to represent a character
 333 in character set @var{charset}.
 334 @end defun
 335
 336   This is the simplest way to determine the byte length of a character
 337 set's introduction sequence:
 338
 339 @example
 340 (- (charset-bytes @var{charset})
 341    (charset-dimension @var{charset}))
 342 @end example
 343
 344 @node Splitting Characters
 345 @section Splitting Characters
 346
 347   The functions in this section convert between characters and the byte
 348 values used to represent them.  For most purposes, there is no need to
 349 be concerned with the sequence of bytes used to represent a character,
 350 because Emacs translates automatically when necessary.
 351
 352 @defun split-char character
 353 @tindex split-char
 354 Return a list containing the name of the character set of
 355 @var{character}, followed by one or two byte values (integers) which
 356 identify @var{character} within that character set.  The number of byte
 357 values is the character set's dimension.
 358
 359 @example
 360 (split-char 2248)
 361      @result{} (latin-iso8859-1 72)
 362 (split-char 65)
 363      @result{} (ascii 65)
 364 @end example
 365
 366 Unibyte non-@sc{ascii} characters are considered as part of
 367 the @code{ascii} character set:
 368
 369 @example
 370 (split-char 192)
 371      @result{} (ascii 192)
 372 @end example
 373 @end defun
 374
 375 @defun make-char charset &rest byte-values
 376 @tindex make-char
 377 This function returns the character in character set @var{charset}
 378 identified by @var{byte-values}.  This is roughly the inverse of
 379 @code{split-char}.  Normally, you should specify either one or two
 380 @var{byte-values}, according to the dimension of @var{charset}.  For
 381 example,
 382
 383 @example
 384 (make-char 'latin-iso8859-1 72)
 385      @result{} 2248
 386 @end example
 387 @end defun
 388
 389 @cindex generic characters
 390   If you call @code{make-char} with no @var{byte-values}, the result is
 391 a @dfn{generic character} which stands for @var{charset}.  A generic
 392 character is an integer, but it is @emph{not} valid for insertion in the
 393 buffer as a character.  It can be used in @code{char-table-range} to
 394 refer to the whole character set (@pxref{Char-Tables}).
 395 @code{char-valid-p} returns @code{nil} for generic characters.
 396 For example:
 397
 398 @example
 399 (make-char 'latin-iso8859-1)
 400      @result{} 2176
 401 (char-valid-p 2176)
 402      @result{} nil
 403 (split-char 2176)
 404      @result{} (latin-iso8859-1 0)
 405 @end example
 406
 407 @node Scanning Charsets
 408 @section Scanning for Character Sets
 409
 410   Sometimes it is useful to find out which character sets appear in a
 411 part of a buffer or a string.  One use for this is in determining which
 412 coding systems (@pxref{Coding Systems}) are capable of representing all
 413 of the text in question.
 414
 415 @defun find-charset-region beg end &optional translation
 416 @tindex find-charset-region
 417 This function returns a list of the character sets that appear in the
 418 current buffer between positions @var{beg} and @var{end}.
 419
 420 The optional argument @var{translation} specifies a translation table to
 421 be used in scanning the text (@pxref{Translation of Characters}).  If it
 422 is non-@code{nil}, then each character in the region is translated
 423 through this table, and the value returned describes the translated
 424 characters instead of the characters actually in the buffer.
 425
 426 In two peculiar cases, the value includes the symbol @code{unknown}:
 427
 428 @itemize @bullet
 429 @item
 430 When a unibyte buffer contains non-@sc{ascii} characters.
 431
 432 @item
 433 When a multibyte buffer contains invalid byte-sequences (raw bytes).
 434 @xref{Explicit Encoding}.
 435 @end itemize
 436 @end defun
 437
 438 @defun find-charset-string string &optional translation
 439 @tindex find-charset-string
 440 This function returns a list of the character sets that appear in the
 441 string @var{string}.  It is just like @code{find-charset-region}, except
 442 that it applies to the contents of @var{string} instead of part of the
 443 current buffer.
 444 @end defun
 445
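One possible use of @code{find-charset-string} is to ask which character sets other than @code{ascii} a string needs.  The helper below is a hypothetical sketch, not an Emacs function.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-string-charsets-besides-ascii (string)
  "Return the character sets found in STRING, leaving out `ascii'."
  ;; Copy the list first so that `delq' cannot alter a shared value.
  (delq 'ascii (copy-sequence (find-charset-string string))))

;; A string of plain ASCII text needs no other character set.
(my-string-charsets-besides-ascii "abc")
```
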
 446 @node Translation of Characters
 447 @section Translation of Characters
 448 @cindex character translation tables
 449 @cindex translation tables
 450
 451   A @dfn{translation table} specifies a mapping of characters
 452 into characters.  These tables are used in encoding and decoding, and
 453 for other purposes.  Some coding systems specify their own particular
 454 translation tables; there are also default translation tables which
 455 apply to all other coding systems.
 456
 457 @defun make-translation-table &rest translations
 458 This function returns a translation table based on the argument
 459 @var{translations}.  Each element of
 460 @var{translations} should be a list of the form @code{(@var{from}
 461 . @var{to})}; this says to translate the character @var{from} into
 462 @var{to}.
 463
 464 You can also map one whole character set into another character set with
 465 the same dimension.  To do this, you specify a generic character (which
 466 designates a character set) for @var{from} (@pxref{Splitting Characters}).
 467 In this case, @var{to} should also be a generic character, for another
 468 character set of the same dimension.  Then the translation table
 469 translates each character of @var{from}'s character set into the
 470 corresponding character of @var{to}'s character set.
 471 @end defun
 472
 473   In decoding, the translation table's translations are applied to the
 474 characters that result from ordinary decoding.  If a coding system has
 475 property @code{character-translation-table-for-decode}, that specifies
 476 the translation table to use.  Otherwise, if
 477 @code{standard-translation-table-for-decode} is non-@code{nil}, decoding
 478 uses that table.
 479
 480   In encoding, the translation table's translations are applied to the
 481 characters in the buffer, and the result of translation is actually
 482 encoded.  If a coding system has property
 483 @code{character-translation-table-for-encode}, that specifies the
 484 translation table to use.  Otherwise the variable
 485 @code{standard-translation-table-for-encode} specifies the translation
 486 table.
 487
 488 @defvar standard-translation-table-for-decode
 489 This is the default translation table for decoding, for
 490 coding systems that don't specify any other translation table.
 491 @end defvar
 492
 493 @defvar standard-translation-table-for-encode
 494 This is the default translation table for encoding, for
 495 coding systems that don't specify any other translation table.
 496 @end defvar
 497
 498 @node Coding Systems
 499 @section Coding Systems
 500
 501 @cindex coding system
 502   When Emacs reads or writes a file, and when Emacs sends text to a
 503 subprocess or receives text from a subprocess, it normally performs
 504 character code conversion and end-of-line conversion as specified
 505 by a particular @dfn{coding system}.
 506
 507   How to define a coding system is an arcane matter, and is not
 508 documented here.
 509
 510 @menu
 511 * Coding System Basics::
 512 * Encoding and I/O::
 513 * Lisp and Coding Systems::
 514 * User-Chosen Coding Systems::
 515 * Default Coding Systems::
 516 * Specifying Coding Systems::
 517 * Explicit Encoding::
 518 * Terminal I/O Encoding::
 519 * MS-DOS File Types::
 520 @end menu
 521
 522 @node Coding System Basics
 523 @subsection Basic Concepts of Coding Systems
 524
 525 @cindex character code conversion
 526   @dfn{Character code conversion} involves conversion between the encoding
 527 used inside Emacs and some other encoding.  Emacs supports many
 528 different encodings, in that it can convert to and from them.  For
 529 example, it can convert text to or from encodings such as Latin 1, Latin
 530 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022.  In some
 531 cases, Emacs supports several alternative encodings for the same
 532 characters; for example, there are three coding systems for the Cyrillic
 533 (Russian) alphabet: ISO, Alternativnyj, and KOI8.
 534
 535   Most coding systems specify a particular character code for
 536 conversion, but some of them leave the choice unspecified---to be chosen
 537 heuristically for each file, based on the data.
 538
 539 @cindex end of line conversion
 540   @dfn{End of line conversion} handles three different conventions used
 541 on various systems for representing end of line in files.  The Unix
 542 convention is to use the linefeed character (also called newline).  The
 543 DOS convention is to use a carriage-return and a linefeed at the end of
 544 a line.  The Mac convention is to use just carriage-return.
 545
 546 @cindex base coding system
 547 @cindex variant coding system
 548   @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
 549 conversion unspecified, to be chosen based on the data.  @dfn{Variant
 550 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
 551 @code{latin-1-mac} specify the end-of-line conversion explicitly as
 552 well.  Most base coding systems have three corresponding variants whose
 553 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
 554
 555   The coding system @code{raw-text} is special in that it prevents
 556 character code conversion, and causes the buffer visited with that
 557 coding system to be a unibyte buffer.  It does not specify the
 558 end-of-line conversion, allowing that to be determined as usual by the
 559 data, and has the usual three variants which specify the end-of-line
 560 conversion.  @code{no-conversion} is equivalent to @code{raw-text-unix}:
 561 it specifies no conversion of either character codes or end-of-line.
 562
 563   The coding system @code{emacs-mule} specifies that the data is
 564 represented in the internal Emacs encoding.  This is like
 565 @code{raw-text} in that no code conversion happens, but different in
 566 that the result is multibyte data.
 567
 568 @defun coding-system-get coding-system property
 569 @tindex coding-system-get
 570 This function returns the specified property of the coding system
 571 @var{coding-system}.  Most coding system properties exist for internal
 572 purposes, but one that you might find useful is @code{mime-charset}.
 573 That property's value is the name used in MIME for the character coding
 574 which this coding system can read and write.  Examples:
 575
 576 @example
 577 (coding-system-get 'iso-latin-1 'mime-charset)
 578      @result{} iso-8859-1
 579 (coding-system-get 'iso-2022-cn 'mime-charset)
 580      @result{} iso-2022-cn
 581 (coding-system-get 'cyrillic-koi8 'mime-charset)
 582      @result{} koi8-r
 583 @end example
 584
 585 The value of the @code{mime-charset} property is also defined
 586 as an alias for the coding system.
 587 @end defun
 588
 589 @node Encoding and I/O
 590 @subsection Encoding and I/O
 591
 592   The principal purpose of coding systems is for use in reading and
 593 writing files.  The function @code{insert-file-contents} uses
 594 a coding system for decoding the file data, and @code{write-region}
 595 uses one to encode the buffer contents.
 596
 597   You can specify the coding system to use either explicitly
 598 (@pxref{Specifying Coding Systems}), or implicitly using the defaulting
 599 mechanism (@pxref{Default Coding Systems}).  But these methods may not
 600 completely specify what to do.  For example, they may choose a coding
  601 system such as @code{undecided} which leaves the character code
 602 conversion to be determined from the data.  In these cases, the I/O
 603 operation finishes the job of choosing a coding system.  Very often
 604 you will want to find out afterwards which coding system was chosen.
 605
 606 @defvar buffer-file-coding-system
 607 @tindex buffer-file-coding-system
 608 This variable records the coding system that was used for visiting the
 609 current buffer.  It is used for saving the buffer, and for writing part
 610 of the buffer with @code{write-region}.  When those operations ask the
 611 user to specify a different coding system,
 612 @code{buffer-file-coding-system} is updated to the coding system
 613 specified.
 614
 615 However, @code{buffer-file-coding-system} does not affect sending text
 616 to a subprocess.
 617 @end defvar
 618
 619 @defvar save-buffer-coding-system
 620 @tindex save-buffer-coding-system
 621 This variable specifies the coding system for saving the buffer---but it
 622 is not used for @code{write-region}.
 623
  624 When a command to save the buffer tries to use
 625 @code{save-buffer-coding-system}, and that coding system cannot handle
 626 the actual text in the buffer, the command asks the user to choose
 627 another coding system.  After that happens, the command also updates
 628 @code{save-buffer-coding-system} to represent the coding system that the
 629 user specified.
 630 @end defvar
 631
 632 @defvar last-coding-system-used
 633 @tindex last-coding-system-used
 634 I/O operations for files and subprocesses set this variable to the
 635 coding system name that was used.  The explicit encoding and decoding
 636 functions (@pxref{Explicit Encoding}) set it too.
 637
 638 @strong{Warning:} Since receiving subprocess output sets this variable,
 639 it can change whenever Emacs waits; therefore, you should copy the
 640 value shortly after the function call that stores the value you are
 641 interested in.
 642 @end defvar
 643
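The warning above suggests a pattern like the following sketch, which copies @code{last-coding-system-used} immediately after the call that sets it.  The function name @code{my-insert-file-noting-coding} is hypothetical, and the sketch assumes @var{file} names a readable file.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-insert-file-noting-coding (file)
  "Insert FILE into the current buffer; return the coding system used."
  (insert-file-contents file)
  ;; Copy the value at once, before anything can make Emacs wait
  ;; and let subprocess output overwrite the variable.
  (let ((used last-coding-system-used))
    used))
```
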
 644   The variable @code{selection-coding-system} specifies how to encode
 645 selections for the window system.  @xref{Window System Selections}.
 646
 647 @node Lisp and Coding Systems
 648 @subsection Coding Systems in Lisp
 649
 650   Here are the Lisp facilities for working with coding systems:
 651
 652 @defun coding-system-list &optional base-only
 653 @tindex coding-system-list
 654 This function returns a list of all coding system names (symbols).  If
 655 @var{base-only} is non-@code{nil}, the value includes only the
 656 base coding systems.  Otherwise, it includes variant coding systems as well.
 657 @end defun
 658
 659 @defun coding-system-p object
 660 @tindex coding-system-p
 661 This function returns @code{t} if @var{object} is a coding system
 662 name.
 663 @end defun
 664
 665 @defun check-coding-system coding-system
 666 @tindex check-coding-system
 667 This function checks the validity of @var{coding-system}.
 668 If that is valid, it returns @var{coding-system}.
 669 Otherwise it signals an error with condition @code{coding-system-error}.
 670 @end defun
 671
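Since @code{check-coding-system} signals @code{coding-system-error} for an invalid argument, a caller that prefers a @code{nil} result can trap that condition.  This is a sketch only; @code{my-checked-coding-system} is a hypothetical name.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-checked-coding-system (coding-system)
  "Return CODING-SYSTEM if it is valid, otherwise nil."
  (condition-case nil
      (check-coding-system coding-system)
    ;; Trap only the condition that the text documents.
    (coding-system-error nil)))

(my-checked-coding-system 'undecided)
(my-checked-coding-system 'my-no-such-coding-system)
```
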
 672 @defun coding-system-change-eol-conversion coding-system eol-type
 673 @tindex coding-system-change-eol-conversion
 674 This function returns a coding system which is like @var{coding-system}
  675 except for its eol conversion, which is specified by @var{eol-type}.
 676 @var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
 677 @code{nil}.  If it is @code{nil}, the returned coding system determines
 678 the end-of-line conversion from the data.
 679 @end defun
 680
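As an illustration, the following sketch collects the three explicit end-of-line variants of a coding system.  The helper @code{my-eol-variants} is hypothetical, and the exact names returned depend on how the coding system is defined in the running Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-eol-variants (coding-system)
  "Return CODING-SYSTEM's variants for the unix, dos and mac conventions."
  (mapcar (lambda (eol-type)
            (coding-system-change-eol-conversion coding-system eol-type))
          '(unix dos mac)))

(my-eol-variants 'latin-1)
```
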
 681 @defun coding-system-change-text-conversion eol-coding text-coding
 682 @tindex coding-system-change-text-conversion
 683 This function returns a coding system which uses the end-of-line
 684 conversion of @var{eol-coding}, and the text conversion of
 685 @var{text-coding}.  If @var{text-coding} is @code{nil}, it returns
 686 @code{undecided}, or one of its variants according to @var{eol-coding}.
 687 @end defun
 688
 689 @defun find-coding-systems-region from to
 690 @tindex find-coding-systems-region
 691 This function returns a list of coding systems that could be used to
  692 encode the text between @var{from} and @var{to}.  All coding systems in
 693 the list can safely encode any multibyte characters in that portion of
 694 the text.
 695
 696 If the text contains no multibyte characters, the function returns the
 697 list @code{(undecided)}.
 698 @end defun
 699
 700 @defun find-coding-systems-string string
 701 @tindex find-coding-systems-string
 702 This function returns a list of coding systems that could be used to
 703 encode the text of @var{string}.  All coding systems in the list can
 704 safely encode any multibyte characters in @var{string}.  If the text
 705 contains no multibyte characters, this returns the list
 706 @code{(undecided)}.
 707 @end defun
 708
 709 @defun find-coding-systems-for-charsets charsets
 710 @tindex find-coding-systems-for-charsets
 711 This function returns a list of coding systems that could be used to
 712 encode all the character sets in the list @var{charsets}.
 713 @end defun
 714
 715 @defun detect-coding-region start end &optional highest
 716 @tindex detect-coding-region
 717 This function chooses a plausible coding system for decoding the text
 718 from @var{start} to @var{end}.  This text should be ``raw bytes''
 719 (@pxref{Explicit Encoding}).
 720
 721 Normally this function returns a list of coding systems that could
 722 handle decoding the text that was scanned.  They are listed in order of
 723 decreasing priority.  But if @var{highest} is non-@code{nil}, then the
 724 return value is just one coding system, the one that is highest in
 725 priority.
 726
 727 If the region contains only @sc{ascii} characters, the value
 728 is @code{undecided} or @code{(undecided)}.
 729 @end defun
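
  The two calling styles might look like this.  This is only a sketch:
it assumes the accessible portion of the current buffer holds raw bytes,
and the values returned depend on the data and on the coding system
priorities in effect.

@example
;; @r{Return a list of every plausible coding system, best first.}
(detect-coding-region (point-min) (point-max))

;; @r{Return only the highest-priority candidate.}
(detect-coding-region (point-min) (point-max) t)
@end example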
 730
@defun detect-coding-string string &optional highest
 732 @tindex detect-coding-string
 733 This function is like @code{detect-coding-region} except that it
 734 operates on the contents of @var{string} instead of bytes in the buffer.
 735 @end defun
 736
 737   @xref{Process Information}, for how to examine or set the coding
 738 systems used for I/O to a subprocess.
 739
 740 @node User-Chosen Coding Systems
 741 @subsection User-Chosen Coding Systems
 742
 743 @tindex select-safe-coding-system
 744 @defun select-safe-coding-system from to &optional preferred-coding-system
 745 This function selects a coding system for encoding the text between
 746 @var{from} and @var{to}, asking the user to choose if necessary.
 747
 748 The optional argument @var{preferred-coding-system} specifies a coding
 749 system to try first.  If that one can handle the text in the specified
 750 region, then it is used.  If this argument is omitted, the current
 751 buffer's value of @code{buffer-file-coding-system} is tried first.
 752
 753 If the region contains some multibyte characters that the preferred
 754 coding system cannot encode, this function asks the user to choose from
 755 a list of coding systems which can encode the text, and returns the
 756 user's choice.
 757
 758 One other kludgy feature: if @var{from} is a string, the string is the
 759 target text, and @var{to} is ignored.
 760 @end defun
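
  One plausible use is to choose the coding system just before writing
the buffer text.  In this sketch the preferred coding system
@code{latin-1} and the file name @file{out.txt} are hypothetical.

@example
;; @r{Try latin-1 first; ask the user only if it cannot encode the text.}
(let ((coding-system-for-write
       (select-safe-coding-system (point-min) (point-max) 'latin-1)))
  (write-region (point-min) (point-max) "out.txt"))
@end example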
 761
 762   Here are two functions you can use to let the user specify a coding
 763 system, with completion.  @xref{Completion}.
 764
 765 @defun read-coding-system prompt &optional default
 766 @tindex read-coding-system
 767 This function reads a coding system using the minibuffer, prompting with
 768 string @var{prompt}, and returns the coding system name as a symbol.  If
 769 the user enters null input, @var{default} specifies which coding system
 770 to return.  It should be a symbol or a string.
 771 @end defun
 772
 773 @defun read-non-nil-coding-system prompt
 774 @tindex read-non-nil-coding-system
 775 This function reads a coding system using the minibuffer, prompting with
 776 string @var{prompt}, and returns the coding system name as a symbol.  If
 777 the user tries to enter null input, it asks the user to try again.
 778 @xref{Coding Systems}.
 779 @end defun
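
  A command could combine one of these functions with explicit decoding
(@pxref{Explicit Encoding}).  The prompt text and the default
@code{latin-1} below are assumptions made for the example.

@example
;; @r{Ask for a coding system, then decode the region with it.}
(let ((coding (read-coding-system "Decode region as: " 'latin-1)))
  (decode-coding-region (region-beginning) (region-end) coding))
@end example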
 780
 781 @node Default Coding Systems
 782 @subsection Default Coding Systems
 783
 784   This section describes variables that specify the default coding
 785 system for certain files or when running certain subprograms, and the
 786 function that I/O operations use to access them.
 787
 788   The idea of these variables is that you set them once and for all to the
 789 defaults you want, and then do not change them again.  To specify a
 790 particular coding system for a particular operation in a Lisp program,
 791 don't change these variables; instead, override them using
 792 @code{coding-system-for-read} and @code{coding-system-for-write}
 793 (@pxref{Specifying Coding Systems}).
 794
 795 @defvar file-coding-system-alist
 796 @tindex file-coding-system-alist
 797 This variable is an alist that specifies the coding systems to use for
 798 reading and writing particular files.  Each element has the form
 799 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
 800 expression that matches certain file names.  The element applies to file
 801 names that match @var{pattern}.
 802
 803 The @sc{cdr} of the element, @var{coding}, should be either a coding
 804 system, a cons cell containing two coding systems, or a function name (a
 805 symbol with a function definition).  If @var{coding} is a coding system,
 806 that coding system is used for both reading the file and writing it.  If
 807 @var{coding} is a cons cell containing two coding systems, its @sc{car}
 808 specifies the coding system for decoding, and its @sc{cdr} specifies the
 809 coding system for encoding.
 810
 811 If @var{coding} is a function name, the function must return a coding
 812 system or a cons cell containing two coding systems.  This value is used
 813 as described above.
 814 @end defvar
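
  The two simplest element forms can be sketched as follows.  The file
name patterns and the coding systems chosen are hypothetical; they only
show the shape of the elements.

@example
;; @r{Use latin-1 for both reading and writing files named *.ltx.}
(setq file-coding-system-alist
      (cons '("\\.ltx\\'" . latin-1)
            file-coding-system-alist))

;; @r{For *.log files, detect the coding system when reading,}
;; @r{but always write with latin-1-unix.}
(setq file-coding-system-alist
      (cons '("\\.log\\'" . (undecided . latin-1-unix))
            file-coding-system-alist))
@end example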
 815
 816 @defvar process-coding-system-alist
 817 @tindex process-coding-system-alist
 818 This variable is an alist specifying which coding systems to use for a
 819 subprocess, depending on which program is running in the subprocess.  It
 820 works like @code{file-coding-system-alist}, except that @var{pattern} is
 821 matched against the program name used to start the subprocess.  The coding
 822 system or systems specified in this alist are used to initialize the
 823 coding systems used for I/O to the subprocess, but you can specify
 824 other coding systems later using @code{set-process-coding-system}.
 825 @end defvar
 826
 827   @strong{Warning:} Coding systems such as @code{undecided}, which
 828 determine the coding system from the data, do not work entirely reliably
 829 with asynchronous subprocess output.  This is because Emacs handles
 830 asynchronous subprocess output in batches, as it arrives.  If the coding
 831 system leaves the character code conversion unspecified, or leaves the
 832 end-of-line conversion unspecified, Emacs must try to detect the proper
 833 conversion from one batch at a time, and this does not always work.
 834
 835   Therefore, with an asynchronous subprocess, if at all possible, use a
 836 coding system which determines both the character code conversion and
the end-of-line conversion---that is, one like @code{latin-1-unix},
 838 rather than @code{undecided} or @code{latin-1}.
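
  One way to follow this advice is to bind
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}) around the call that starts the
subprocess.  In this sketch the process name, buffer name, and program
are hypothetical.

@example
;; @r{Fix both the text conversion and the end-of-line conversion}
;; @r{before the asynchronous subprocess starts.}
(let ((coding-system-for-read 'latin-1-unix)
      (coding-system-for-write 'latin-1-unix))
  (start-process "lister" "*lister*" "ls" "-l"))
@end example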
 839
 840 @defvar network-coding-system-alist
 841 @tindex network-coding-system-alist
 842 This variable is an alist that specifies the coding system to use for
 843 network streams.  It works much like @code{file-coding-system-alist},
 844 with the difference that the @var{pattern} in an element may be either a
 845 port number or a regular expression.  If it is a regular expression, it
 846 is matched against the network service name used to open the network
 847 stream.
 848 @end defvar
 849
 850 @defvar default-process-coding-system
 851 @tindex default-process-coding-system
 852 This variable specifies the coding systems to use for subprocess (and
 853 network stream) input and output, when nothing else specifies what to
 854 do.
 855
 856 The value should be a cons cell of the form @code{(@var{input-coding}
 857 . @var{output-coding})}.  Here @var{input-coding} applies to input from
 858 the subprocess, and @var{output-coding} applies to output to it.
 859 @end defvar
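
  A setting of this variable might look like the following.  The choice
of @code{latin-1-unix} for both directions is only an assumption made
for the example.

@example
;; @r{The car applies to input from the subprocess,}
;; @r{the cdr to output sent to it.}
(setq default-process-coding-system '(latin-1-unix . latin-1-unix))
@end example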
 860
 861 @defun find-operation-coding-system operation &rest arguments
 862 @tindex find-operation-coding-system
 863 This function returns the coding system to use (by default) for
 864 performing @var{operation} with @var{arguments}.  The value has this
 865 form:
 866
@example
(@var{decoding-system} . @var{encoding-system})
@end example

The @sc{car}, @var{decoding-system}, is the coding system to use
for decoding (in case @var{operation} does decoding), and the @sc{cdr},
@var{encoding-system}, is the coding system for encoding (in case
@var{operation} does encoding).
 875
 876 The argument @var{operation} should be a symbol, one of
 877 @code{insert-file-contents}, @code{write-region}, @code{call-process},
 878 @code{call-process-region}, @code{start-process}, or
 879 @code{open-network-stream}.  These are the names of the Emacs I/O primitives
 880 that can do coding system conversion.
 881
 882 The remaining arguments should be the same arguments that might be given
 883 to that I/O primitive.  Depending on the primitive, one of those
 884 arguments is selected as the @dfn{target}.  For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target.  For subprocess primitives, the program name is the
target.  For @code{open-network-stream}, the target is the service name
or port number.
 889
 890 This function looks up the target in @code{file-coding-system-alist},
 891 @code{process-coding-system-alist}, or
 892 @code{network-coding-system-alist}, depending on @var{operation}.
 893 @xref{Default Coding Systems}.
 894 @end defun
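
  As a sketch, the calls below ask which coding systems would apply by
default to a file operation.  The file name @file{notes.txt} and the
buffer positions are hypothetical, and the result depends on the
current contents of @code{file-coding-system-alist}; it may be
@code{nil} if nothing matches.

@example
;; @r{The file name is the first argument of insert-file-contents.}
(find-operation-coding-system 'insert-file-contents "notes.txt")

;; @r{For write-region the file name is the third argument.}
(find-operation-coding-system 'write-region 1 10 "notes.txt")
@end example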
 895
 896 @node Specifying Coding Systems
 897 @subsection Specifying a Coding System for One Operation
 898
 899   You can specify the coding system for a specific operation by binding
 900 the variables @code{coding-system-for-read} and/or
 901 @code{coding-system-for-write}.
 902
 903 @defvar coding-system-for-read
 904 @tindex coding-system-for-read
 905 If this variable is non-@code{nil}, it specifies the coding system to
 906 use for reading a file, or for input from a synchronous subprocess.
 907
 908 It also applies to any asynchronous subprocess or network stream, but in
 909 a different way: the value of @code{coding-system-for-read} when you
 910 start the subprocess or open the network stream specifies the input
 911 decoding method for that subprocess or network stream.  It remains in
 912 use for that subprocess or network stream unless and until overridden.
 913
 914 The right way to use this variable is to bind it with @code{let} for a
 915 specific I/O operation.  Its global value is normally @code{nil}, and
 916 you should not globally set it to any other value.  Here is an example
 917 of the right way to use the variable:
 918
 919 @example
 920 ;; @r{Read the file with no character code conversion.}
 921 ;; @r{Assume @sc{crlf} represents end-of-line.}
 922 (let ((coding-system-for-write 'emacs-mule-dos))
 923   (insert-file-contents filename))
 924 @end example
 925
 926 When its value is non-@code{nil}, @code{coding-system-for-read} takes
 927 precedence over all other methods of specifying a coding system to use for
 928 input, including @code{file-coding-system-alist},
 929 @code{process-coding-system-alist} and
 930 @code{network-coding-system-alist}.
 931 @end defvar
 932
 933 @defvar coding-system-for-write
 934 @tindex coding-system-for-write
 935 This works much like @code{coding-system-for-read}, except that it
 936 applies to output rather than input.  It affects writing to files,
 937 as well as sending output to subprocesses and net connections.
 938
 939 When a single operation does both input and output, as do
 940 @code{call-process-region} and @code{start-process}, both
 941 @code{coding-system-for-read} and @code{coding-system-for-write}
 942 affect it.
 943 @end defvar
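
  The output counterpart of the earlier example could be written like
this.  The coding system @code{latin-1-unix} and the file name
@file{out.txt} are assumptions made for the sketch.

@example
;; @r{Write the accessible portion of the buffer encoded as latin-1,}
;; @r{with newline as end-of-line.}
(let ((coding-system-for-write 'latin-1-unix))
  (write-region (point-min) (point-max) "out.txt"))
@end example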
 944
 945 @defvar inhibit-eol-conversion
 946 @tindex inhibit-eol-conversion
 947 When this variable is non-@code{nil}, no end-of-line conversion is done,
 948 no matter which coding system is specified.  This applies to all the
 949 Emacs I/O and subprocess primitives, and to the explicit encoding and
 950 decoding functions (@pxref{Explicit Encoding}).
 951 @end defvar
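
  Like the variables above, this one is probably best bound with
@code{let} around a single operation.  The sketch assumes that
@code{filename} is already bound to a file name.

@example
;; @r{Read the file with the usual character code conversion,}
;; @r{but leave any carriage returns in the text.}
(let ((inhibit-eol-conversion t))
  (insert-file-contents filename))
@end example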
 952
 953 @node Explicit Encoding
 954 @subsection Explicit Encoding and Decoding
 955 @cindex encoding text
 956 @cindex decoding text
 957
 958   All the operations that transfer text in and out of Emacs have the
 959 ability to use a coding system to encode or decode the text.
 960 You can also explicitly encode and decode text using the functions
 961 in this section.
 962
 963 @cindex raw bytes
 964   The result of encoding, and the input to decoding, are not ordinary
 965 text.  They are ``raw bytes''---bytes that represent text in the same
 966 way that an external file would.  When a buffer contains raw bytes, it
 967 is most natural to mark that buffer as using unibyte representation,
 968 using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
 969 but this is not required.  If the buffer's contents are only temporarily
 970 raw, leave the buffer multibyte, which will be correct after you decode
 971 them.
 972
 973   The usual way to get raw bytes in a buffer, for explicit decoding, is
 974 to read them from a file with @code{insert-file-contents-literally}
 975 (@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
 976 argument when visiting a file with @code{find-file-noselect}.
 977
 978   The usual way to use the raw bytes that result from explicitly
 979 encoding text is to copy them to a file or process---for example, to
 980 write them with @code{write-region} (@pxref{Writing to Files}), and
 981 suppress encoding for that @code{write-region} call by binding
 982 @code{coding-system-for-write} to @code{no-conversion}.
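
  Put together, that sequence might look like the following sketch.
The coding system @code{latin-1} and the file name @file{encoded.out}
are hypothetical.

@example
;; @r{Encode the buffer text in place, producing raw bytes.}
(encode-coding-region (point-min) (point-max) 'latin-1)

;; @r{Write the raw bytes without encoding them a second time.}
(let ((coding-system-for-write 'no-conversion))
  (write-region (point-min) (point-max) "encoded.out"))
@end example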
 983
 984   Raw bytes typically contain stray individual bytes with values in the
 985 range 128 through 255, that are legitimate only as part of multibyte
 986 sequences.  Even if the buffer is multibyte, Emacs treats each such
 987 individual byte as a character and uses the byte value as its character
 988 code.  In this way, character codes 128 through 255 can be found in a
 989 multibyte buffer, even though they are not legitimate multibyte
 990 character codes.
 991
 992   Raw bytes sometimes contain overlong byte-sequences that look like a
 993 proper multibyte character plus extra superfluous trailing codes.  For
 994 most purposes, Emacs treats such a sequence in a buffer or string as a
 995 single character, and if you look at its character code, you get the
 996 value that corresponds to the multibyte character
 997 sequence---disregarding the extra trailing codes.  This is not quite
 998 clean, but raw bytes are used only in limited ways, so as a practical
 999 matter it is not worth the trouble to treat this case differently.
1000
1001   When a multibyte buffer contains illegitimate byte sequences,
1002 sometimes insertion or deletion can cause them to coalesce into a
1003 legitimate multibyte character.  For example, suppose the buffer
1004 contains the sequence 129 68 192, 68 being the character @samp{D}.  If
1005 you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus
1006 become one multibyte character (Latin-1 A with grave accent).  Point
1007 moves to one side or the other of the character, since it cannot be
1008 within a character.  Don't be alarmed by this.
1009
1010   Some really peculiar situations prevent proper coalescence.  For
1011 example, if you narrow the buffer so that the accessible portion begins
1012 just before the @samp{D}, then delete the @samp{D}, the two surrounding
1013 bytes cannot coalesce because one of them is outside the accessible
1014 portion of the buffer.  In this case, the deletion cannot be done, so
1015 @code{delete-region} signals an error.
1016
  Here are the functions to perform explicit encoding or decoding.  The
encoding functions produce ``raw bytes''; the decoding functions are
meant to operate on ``raw bytes''.  All of these functions discard text
properties.
1021
1022 @defun encode-coding-region start end coding-system
1023 @tindex encode-coding-region
1024 This function encodes the text from @var{start} to @var{end} according
1025 to coding system @var{coding-system}.  The encoded text replaces the
1026 original text in the buffer.  The result of encoding is ``raw bytes,''
1027 but the buffer remains multibyte if it was multibyte before.
1028 @end defun
1029
1030 @defun encode-coding-string string coding-system
1031 @tindex encode-coding-string
1032 This function encodes the text in @var{string} according to coding
1033 system @var{coding-system}.  It returns a new string containing the
1034 encoded text.  The result of encoding is a unibyte string of ``raw bytes.''
1035 @end defun
1036
1037 @defun decode-coding-region start end coding-system
1038 @tindex decode-coding-region
1039 This function decodes the text from @var{start} to @var{end} according
1040 to coding system @var{coding-system}.  The decoded text replaces the
1041 original text in the buffer.  To make explicit decoding useful, the text
1042 before decoding ought to be ``raw bytes.''
1043 @end defun
1044
1045 @defun decode-coding-string string coding-system
1046 @tindex decode-coding-string
1047 This function decodes the text in @var{string} according to coding
1048 system @var{coding-system}.  It returns a new string containing the
1049 decoded text.  To make explicit decoding useful, the contents of
1050 @var{string} ought to be ``raw bytes.''
1051 @end defun
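
  A simple round trip shows how the string functions fit together.
This sketch assumes a coding system named @code{latin-1}; the sample
text is pure @sc{ascii}, so encoding and then decoding it gives back
the same characters.

@example
;; @r{Encode to raw bytes, then decode those bytes again.}
(decode-coding-string
 (encode-coding-string "hello" 'latin-1)
 'latin-1)
     @result{} "hello"
@end example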
1052
1053 @node Terminal I/O Encoding
1054 @subsection Terminal I/O Encoding
1055
1056   Emacs can decode keyboard input using a coding system, and encode
1057 terminal output.  This is useful for terminals that transmit or display
1058 text using a particular encoding such as Latin-1.  Emacs does not set
1059 @code{last-coding-system-used} for encoding or decoding for the
1060 terminal.
1061
1062 @defun keyboard-coding-system
1063 @tindex keyboard-coding-system
1064 This function returns the coding system that is in use for decoding
1065 keyboard input---or @code{nil} if no coding system is to be used.
1066 @end defun
1067
1068 @defun set-keyboard-coding-system coding-system
1069 @tindex set-keyboard-coding-system
1070 This function specifies @var{coding-system} as the coding system to
1071 use for decoding keyboard input.  If @var{coding-system} is @code{nil},
1072 that means do not decode keyboard input.
1073 @end defun
1074
1075 @defun terminal-coding-system
1076 @tindex terminal-coding-system
1077 This function returns the coding system that is in use for encoding
1078 terminal output---or @code{nil} for no encoding.
1079 @end defun
1080
1081 @defun set-terminal-coding-system coding-system
1082 @tindex set-terminal-coding-system
1083 This function specifies @var{coding-system} as the coding system to use
1084 for encoding terminal output.  If @var{coding-system} is @code{nil},
1085 that means do not encode terminal output.
1086 @end defun
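
  For a terminal that sends and displays Latin-1, the setup might be as
simple as the following sketch.  The coding system name @code{latin-1}
is an assumption; use whichever one matches your terminal.

@example
;; @r{Decode keyboard input and encode terminal output as Latin-1.}
(set-keyboard-coding-system 'latin-1)
(set-terminal-coding-system 'latin-1)

;; @r{Check what is now in use for terminal output.}
(terminal-coding-system)
@end example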
1087
1088 @node MS-DOS File Types
1089 @subsection MS-DOS File Types
1090 @cindex DOS file types
1091 @cindex MS-DOS file types
1092 @cindex Windows file types
1093 @cindex file types on MS-DOS and Windows
1094 @cindex text files and binary files
1095 @cindex binary files and text files
1096
1097   On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
1098 end-of-line conversion for a file by looking at the file's name.  This
feature classifies files as @dfn{text files} and @dfn{binary files}.  By
1100 ``binary file'' we mean a file of literal byte values that are not
1101 necessarily meant to be characters; Emacs does no end-of-line conversion
1102 and no character code conversion for them.  On the other hand, the bytes
1103 in a text file are intended to represent characters; when you create a
1104 new file whose name implies that it is a text file, Emacs uses DOS
1105 end-of-line conversion.
1106
1107 @defvar buffer-file-type
1108 This variable, automatically buffer-local in each buffer, records the
1109 file type of the buffer's visited file.  When a buffer does not specify
1110 a coding system with @code{buffer-file-coding-system}, this variable is
1111 used to determine which coding system to use when writing the contents
1112 of the buffer.  It should be @code{nil} for text, @code{t} for binary.
1113 If it is @code{t}, the coding system is @code{no-conversion}.
1114 Otherwise, @code{undecided-dos} is used.
1115
1116 Normally this variable is set by visiting a file; it is set to
1117 @code{nil} if the file was visited without any actual conversion.
1118 @end defvar
1119
1120 @defopt file-name-buffer-file-type-alist
1121 This variable holds an alist for recognizing text and binary files.
Each element has the form @code{(@var{regexp} . @var{type})}, where
1123 @var{regexp} is matched against the file name, and @var{type} may be
1124 @code{nil} for text, @code{t} for binary, or a function to call to
1125 compute which.  If it is a function, then it is called with a single
1126 argument (the file name) and should return @code{t} or @code{nil}.
1127
1128 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
1129 which coding system to use when reading a file.  For a text file,
1130 @code{undecided-dos} is used.  For a binary file, @code{no-conversion}
1131 is used.
1132
1133 If no element in this alist matches a given file name, then
1134 @code{default-buffer-file-type} says how to treat the file.
1135 @end defopt
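
  An element that marks a class of files as binary could be added as in
this sketch.  The pattern for @file{.dat} files is hypothetical, and
the setting matters only on the systems where Emacs consults this
alist.

@example
;; @r{Treat files named *.dat as binary: no conversion when reading.}
(setq file-name-buffer-file-type-alist
      (cons '("\\.dat\\'" . t)
            file-name-buffer-file-type-alist))
@end example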
1136
1137 @defopt default-buffer-file-type
1138 This variable says how to handle files for which
1139 @code{file-name-buffer-file-type-alist} says nothing about the type.
1140
1141 If this variable is non-@code{nil}, then these files are treated as
1142 binary: the coding system @code{no-conversion} is used.  Otherwise,
1143 nothing special is done for them---the coding system is deduced solely
1144 from the file contents, in the usual Emacs fashion.
1145 @end defopt
1146
1147 @node Input Methods
1148 @section Input Methods
1149 @cindex input methods
1150
1151   @dfn{Input methods} provide convenient ways of entering non-@sc{ascii}
1152 characters from the keyboard.  Unlike coding systems, which translate
1153 non-@sc{ascii} characters to and from encodings meant to be read by
1154 programs, input methods provide human-friendly commands.  (@xref{Input
1155 Methods,,, emacs, The GNU Emacs Manual}, for information on how users
1156 use input methods to enter text.)  How to define input methods is not
1157 yet documented in this manual, but here we describe how to use them.
1158
1159   Each input method has a name, which is currently a string;
1160 in the future, symbols may also be usable as input method names.
1161
1162 @tindex current-input-method
1163 @defvar current-input-method
1164 This variable holds the name of the input method now active in the
1165 current buffer.  (It automatically becomes local in each buffer when set
1166 in any fashion.)  It is @code{nil} if no input method is active in the
1167 buffer now.
1168 @end defvar
1169
1170 @tindex default-input-method
1171 @defvar default-input-method
1172 This variable holds the default input method for commands that choose an
1173 input method.  Unlike @code{current-input-method}, this variable is
1174 normally global.
1175 @end defvar
1176
1177 @tindex set-input-method
1178 @defun set-input-method input-method
1179 This function activates input method @var{input-method} for the current
1180 buffer.  It also sets @code{default-input-method} to @var{input-method}.
1181 If @var{input-method} is @code{nil}, this function deactivates any input
1182 method for the current buffer.
1183 @end defun
1184
1185 @tindex read-input-method-name
1186 @defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}.  If @var{default} is non-@code{nil}, it is returned
when the user enters empty input.  However, if @var{inhibit-null} is
non-@code{nil}, empty input signals an error.
1191
1192 The returned value is a string.
1193 @end defun
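
  The functions above can be combined to let the user pick an input
method for the current buffer.  The prompt text in this sketch is an
assumption.

@example
;; @r{Read a name, offering the default input method,}
;; @r{then activate that method in the current buffer.}
(set-input-method
 (read-input-method-name "Input method: " default-input-method))
@end example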
1194
1195 @tindex input-method-alist
1196 @defvar input-method-alist
1197 This variable defines all the supported input methods.
1198 Each element defines one input method, and should have the form:
1199
1200 @example
1201 (@var{input-method} @var{language-env} @var{activate-func}
1202  @var{title} @var{description} @var{args}...)
1203 @end example
1204
1205 Here @var{input-method} is the input method name, a string;
1206 @var{language-env} is another string, the name of the language
1207 environment this input method is recommended for.  (That serves only for
1208 documentation purposes.)
1209
1210 @var{title} is a string to display in the mode line while this method is
1211 active.  @var{description} is a string describing this method and what
1212 it is good for.
1213
1214 @var{activate-func} is a function to call to activate this method.  The
1215 @var{args}, if any, are passed as arguments to @var{activate-func}.  All
1216 told, the arguments to @var{activate-func} are @var{input-method} and
1217 the @var{args}.
1218 @end defvar
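
  Since the first item of each element is the input method name, a
list of all the supported names can be obtained with a sketch like
this one.

@example
;; @r{Collect the name string from each element of the alist.}
(mapcar 'car input-method-alist)
@end example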
1219
1220   The fundamental interface to input methods is through the
1221 variable @code{input-method-function}.  @xref{Reading One Event}.