2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1998 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/characters
6 @node Non-ASCII Characters, Searching and Matching, Text, Top
7 @chapter Non-ASCII Characters
8 @cindex multibyte characters
9 @cindex non-ASCII characters
11 This chapter covers the special issues relating to non-@sc{ASCII}
12 characters and how they are stored in strings and buffers.
15 * Text Representations::
16 * Converting Representations::
17 * Selecting a Representation::
23 * Lisp and Coding Systems::
24 * Default Coding Systems::
25 * Specifying Coding Systems::
28 * MS-DOS Subprocesses::
31 @node Text Representations
32 @section Text Representations
33 @cindex text representations
35 Emacs has two @dfn{text representations}---two ways to represent text
36 in a string or buffer. These are called @dfn{unibyte} and
37 @dfn{multibyte}. Each string, and each buffer, uses one of these two
38 representations. For most purposes, you can ignore the issue of
39 representations, because Emacs converts text between them as
40 appropriate. Occasionally in Lisp programming you will need to pay
41 attention to the difference.
44 In unibyte representation, each character occupies one byte and
45 therefore the possible character codes range from 0 to 255. Codes 0
46 through 127 are @sc{ASCII} characters; the codes from 128 through 255
47 are used for one non-@sc{ASCII} character set (you can choose which
48 character set by setting the variable @code{nonascii-insert-offset}).
51 @cindex multibyte text
52 In multibyte representation, a character may occupy more than one
53 byte, and as a result, the full range of Emacs character codes can be
54 stored. The first byte of a multibyte character is always in the range
55 128 through 159 (octal 0200 through 0237). These values are called
56 @dfn{leading codes}. The first byte determines which character set the
57 character belongs to (@pxref{Character Sets}); in particular, it
58 determines how many bytes long the sequence is. The second and
59 subsequent bytes of a multibyte character are always in the range 160
60 through 255 (octal 0240 through 0377).
62 In a buffer, the buffer-local value of the variable
63 @code{enable-multibyte-characters} specifies the representation used.
64 The representation for a string is determined based on the string
65 contents when the string is constructed.
67 @tindex enable-multibyte-characters
68 @defvar enable-multibyte-characters
69 This variable specifies the current buffer's text representation.
70 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
71 it contains unibyte text.
73 You cannot set this variable directly; instead, use the function
74 @code{set-buffer-multibyte} to change a buffer's representation.
77 @tindex default-enable-multibyte-characters
78 @defvar default-enable-multibyte-characters
79 This variable's value is entirely equivalent to @code{(default-value
80 'enable-multibyte-characters)}, and setting this variable changes that
81 default value. Although setting the local binding of
82 @code{enable-multibyte-characters} in a specific buffer is dangerous,
83 changing the default value is safe, and it is a reasonable thing to do.
85 The @samp{--unibyte} command line option does its job by setting the
86 default value to @code{nil} early in startup.
89 @tindex multibyte-string-p
90 @defun multibyte-string-p string
91 Return @code{t} if @var{string} contains multibyte characters.
94 @node Converting Representations
95 @section Converting Text Representations
97 Emacs can convert unibyte text to multibyte; it can also convert
98 multibyte text to unibyte, though this conversion loses information. In
99 general these conversions happen when inserting text into a buffer, or
100 when putting text from several strings together in one string. You can
101 also explicitly convert a string's contents to either representation.
103 Emacs chooses the representation for a string based on the text that
104 it is constructed from. The general rule is to convert unibyte text to
105 multibyte text when combining it with other multibyte text, because the
106 multibyte representation is more general and can hold whatever
107 characters the unibyte text has.
109 When inserting text into a buffer, Emacs converts the text to the
110 buffer's representation, as specified by
111 @code{enable-multibyte-characters} in that buffer. In particular, when
112 you insert multibyte text into a unibyte buffer, Emacs converts the text
113 to unibyte, even though this conversion cannot in general preserve all
114 the characters that might be in the multibyte text. The other natural
115 alternative, to convert the buffer contents to multibyte, is not
116 acceptable because the buffer's representation is a choice made by the
117 user that cannot be overridden automatically.
119 Converting unibyte text to multibyte text leaves @sc{ASCII} characters
120 unchanged, and likewise 128 through 159. It converts the non-@sc{ASCII}
121 codes 160 through 255 by adding the value @code{nonascii-insert-offset}
122 to each character code. By setting this variable, you specify which
123 character set the unibyte characters correspond to. For example, if
124 @code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
125 'latin-iso8859-1 0) 128)}, then the unibyte non-@sc{ASCII} characters
126 correspond to Latin 1. If it is 2688, which is @code{(- (make-char
127 'greek-iso8859-7 0) 128)}, then they correspond to Greek letters.
129 Converting multibyte text to unibyte is simpler: it performs
130 logical-and of each character code with 255. If
131 @code{nonascii-insert-offset} has a reasonable value, corresponding to
132 the beginning of some character set, this conversion is the inverse of
133 the other: converting unibyte text to multibyte and back to unibyte
134 reproduces the original unibyte text.
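As a worked example of the arithmetic described above, the following
sketch applies both conversions to a single character code by hand.
The function name @code{my-unibyte-round-trip} and its @var{offset}
argument are hypothetical, chosen only for illustration; the code does
not consult the real @code{nonascii-insert-offset}.

```elisp
(defun my-unibyte-round-trip (code offset)
  "Return (MULTIBYTE-CODE UNIBYTE-CODE) for unibyte CODE under OFFSET."
  (let* (;; Unibyte to multibyte: add the offset.
         (multibyte-code (+ code offset))
         ;; Multibyte to unibyte: logical-and with 255.
         (back (logand multibyte-code 255)))
    (list multibyte-code back)))

;; 2048 is the Latin 1 offset quoted in the text.
(my-unibyte-round-trip 200 2048)
;; => (2248 200)
```

The second element equals the original code because 2048 is a multiple
of 256, so the logical-and discards exactly what the addition added.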
136 @tindex nonascii-insert-offset
137 @defvar nonascii-insert-offset
138 This variable specifies the amount to add to a non-@sc{ASCII} character
139 when converting unibyte text to multibyte. It also applies when
140 @code{insert-char} or @code{self-insert-command} inserts a character in
141 the unibyte non-@sc{ASCII} range, 128 through 255.
143 The right value to use to select character set @var{cs} is @code{(-
144 (make-char @var{cs} 0) 128)}. If the value of
145 @code{nonascii-insert-offset} is zero, then conversion actually uses the
146 value for the Latin 1 character set, rather than zero.
149 @tindex nonascii-translate-table
150 @defvar nonascii-translate-table
151 This variable provides a more general alternative to
152 @code{nonascii-insert-offset}. You can use it to specify independently
153 how to translate each code in the range of 128 through 255 into a
154 multibyte character. The value should be a vector, or @code{nil}.
155 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
158 @tindex string-make-unibyte
159 @defun string-make-unibyte string
160 This function converts the text of @var{string} to unibyte
161 representation, if it isn't already, and returns the result. If
162 @var{string} is a unibyte string, it is returned unchanged.
165 @tindex string-make-multibyte
166 @defun string-make-multibyte string
167 This function converts the text of @var{string} to multibyte
168 representation, if it isn't already, and returns the result. If
169 @var{string} is a multibyte string, it is returned unchanged.
172 @node Selecting a Representation
173 @section Selecting a Representation
175 Sometimes it is useful to examine an existing buffer or string as
176 multibyte when it was unibyte, or vice versa.
178 @tindex set-buffer-multibyte
179 @defun set-buffer-multibyte multibyte
180 Set the representation type of the current buffer. If @var{multibyte}
181 is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
182 is @code{nil}, the buffer becomes unibyte.
184 This function leaves the buffer contents unchanged when viewed as a
185 sequence of bytes. As a consequence, it can change the contents viewed
186 as characters; a sequence of two bytes which is treated as one character
187 in multibyte representation will count as two characters in unibyte representation.
190 This function sets @code{enable-multibyte-characters} to record which
191 representation is in use. It also adjusts various data in the buffer
192 (including overlays, text properties and markers) so that they cover the
193 same text as they did before.
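A minimal sketch of how a Lisp program might check the effect of
@code{set-buffer-multibyte} follows.  The helper
@code{my-buffer-representation} is a hypothetical name, and the sketch
assumes that @code{with-temp-buffer} is available for creating a
scratch buffer.

```elisp
(defun my-buffer-representation ()
  "Return the symbol `multibyte' or `unibyte' for the current buffer."
  (if enable-multibyte-characters 'multibyte 'unibyte))

(with-temp-buffer
  ;; Switch the scratch buffer to unibyte, then ask which
  ;; representation is recorded for it.
  (set-buffer-multibyte nil)
  (my-buffer-representation))
;; => unibyte
```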
196 @tindex string-as-unibyte
197 @defun string-as-unibyte string
198 This function returns a string with the same bytes as @var{string} but
199 treating each byte as a character. This means that the value may have
200 more characters than @var{string} has.
202 If @var{string} is unibyte already, then the value is @var{string} itself.
206 @tindex string-as-multibyte
207 @defun string-as-multibyte string
208 This function returns a string with the same bytes as @var{string} but
209 treating each multibyte sequence as one character. This means that the
210 value may have fewer characters than @var{string} has.
212 If @var{string} is multibyte already, then the value is @var{string} itself.
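Because @code{string-as-unibyte} treats each byte as a character, the
length of its result is one way to count the bytes in a string's
representation.  The following is only a sketch; the name
@code{my-string-byte-count} is hypothetical.

```elisp
(defun my-string-byte-count (string)
  "Return the number of bytes in the representation of STRING."
  (length (string-as-unibyte string)))

;; ASCII characters occupy one byte each.
(my-string-byte-count "abc")
;; => 3
```

For a multibyte string containing non-@sc{ASCII} characters, the value
can be larger than @code{(length @var{string})}.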
216 @node Character Codes
217 @section Character Codes
218 @cindex character codes
220 The unibyte and multibyte text representations use different character
221 codes. The valid character codes for unibyte representation range from
222 0 to 255---the values that can fit in one byte. The valid character
223 codes for multibyte representation range from 0 to 524287, but not all
224 values in that range are valid. In particular, the values 128 through
225 255 are not legitimate in multibyte text (though they can occur in ``raw
226 bytes''; @pxref{Explicit Encoding}). Only the @sc{ASCII} codes 0
227 through 127 are fully legitimate in both representations.
229 @defun char-valid-p charcode
230 This returns @code{t} if @var{charcode} is valid for either one of the two
231 text representations.
244 @section Character Sets
245 @cindex character sets
247 Emacs classifies characters into various @dfn{character sets}, each of
248 which has a name which is a symbol. Each character belongs to one and
249 only one character set.
251 In general, there is one character set for each distinct script. For
252 example, @code{latin-iso8859-1} is one character set,
253 @code{greek-iso8859-7} is another, and @code{ascii} is another. An
254 Emacs character set can hold at most 9025 characters; therefore, in some
255 cases, characters that would logically be grouped together are split
256 into several character sets. For example, one set of Chinese characters
257 is divided into seven Emacs character sets, @code{chinese-cns11643-1}
258 through @code{chinese-cns11643-7}.
261 @defun charsetp object
262 Return @code{t} if @var{object} is a character set name symbol,
263 @code{nil} otherwise.
268 This function returns a list of all defined character set names.
272 @defun char-charset character
273 This function returns the name of the character
274 set that @var{character} belongs to.
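The functions @code{charsetp} and @code{char-charset} can be combined
to test whether a character belongs to a given character set.  This is
a small sketch rather than an established idiom; the name
@code{my-char-in-charset-p} is hypothetical.

```elisp
(defun my-char-in-charset-p (character charset)
  "Return t if CHARACTER belongs to the character set named CHARSET."
  (and (charsetp charset)
       (eq (char-charset character) charset)))

(my-char-in-charset-p ?a 'ascii)
;; => t
```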
277 @node Scanning Charsets
278 @section Scanning for Character Sets
280 Sometimes it is useful to find out which character sets appear in a
281 part of a buffer or a string. One use for this is in determining which
282 coding systems (@pxref{Coding Systems}) are capable of representing all
283 of the text in question.
285 @tindex find-charset-region
286 @defun find-charset-region beg end &optional unification
287 This function returns a list of the character sets
288 that appear in the current buffer between positions @var{beg} and @var{end}.
292 @tindex find-charset-string
293 @defun find-charset-string string &optional unification
294 This function returns a list of the character sets
295 that appear in the string @var{string}.
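One possible use of @code{find-charset-string} is to test whether a
string contains anything other than @sc{ASCII} characters.  The
predicate below is a sketch; its name, @code{my-ascii-only-p}, is
hypothetical.

```elisp
(defun my-ascii-only-p (string)
  "Return t if STRING contains no characters outside the ascii set."
  ;; Copy the list first, since delq modifies its argument.
  (null (delq 'ascii (copy-sequence (find-charset-string string)))))

(my-ascii-only-p "plain text")
;; => t
```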
298 @node Chars and Bytes
299 @section Characters and Bytes
300 @cindex bytes and characters
302 In multibyte representation, each character occupies one or more
303 bytes. The functions in this section convert between characters and the
304 byte values used to represent them. For most purposes, there is no need
305 to be concerned with the number of bytes used to represent a character
306 because Emacs translates automatically when necessary.
309 @defun char-bytes character
310 This function returns the number of bytes used to represent the
311 character @var{character}. In most cases, this is the same as
312 @code{(length (split-char @var{character}))}; the only exception is for
313 @sc{ASCII} characters and the codes used in unibyte text, which use just one byte.
323 This function's values are correct for both multibyte and unibyte
324 representations, because the non-@sc{ASCII} character codes used in
325 those two representations do not overlap.
334 @defun split-char character
335 Return a list containing the name of the character set of
336 @var{character}, followed by one or two byte-values which identify
337 @var{character} within that character set.
341 @result{} (latin-iso8859-1 72)
346 Unibyte non-@sc{ASCII} characters are considered as part of
347 the @code{ascii} character set:
351 @result{} (ascii 192)
356 @defun make-char charset &rest byte-values
357 This function returns the character in character set @var{charset}
358 identified by @var{byte-values}. This is roughly the opposite of @code{split-char}.
362 (make-char 'latin-iso8859-1 72)
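Since the list returned by @code{split-char} has the same shape as the
argument list of @code{make-char}, the two can be chained to rebuild a
character from its parts.  The helper name @code{my-char-round-trip}
is hypothetical, and this is only a sketch of the relationship.

```elisp
(defun my-char-round-trip (character)
  "Split CHARACTER with `split-char' and rebuild it with `make-char'."
  (apply 'make-char (split-char character)))

;; ?a is character code 97 in the ascii character set.
(my-char-round-trip ?a)
;; => 97
```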
368 @section Coding Systems
370 @cindex coding system
371 When Emacs reads or writes a file, and when Emacs sends text to a
372 subprocess or receives text from a subprocess, it normally performs
373 character code conversion and end-of-line conversion as specified
374 by a particular @dfn{coding system}.
376 @cindex character code conversion
377 @dfn{Character code conversion} involves conversion between the encoding
378 used inside Emacs and some other encoding. Emacs supports many
379 different encodings, in that it can convert to and from them. For
380 example, it can convert text to or from encodings such as Latin 1, Latin
381 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
382 cases, Emacs supports several alternative encodings for the same
383 characters; for example, there are three coding systems for the Cyrillic
384 (Russian) alphabet: ISO, Alternativnyj, and KOI8.
386 Most coding systems specify a particular character code for
387 conversion, but some of them leave this unspecified---to be chosen
388 heuristically based on the data.
390 @cindex end of line conversion
391 @dfn{End of line conversion} handles three different conventions used
392 on various systems for representing end of line in files. The Unix
393 convention is to use the linefeed character (also called newline). The
394 DOS convention is to use the two character sequence, carriage-return
395 linefeed, at the end of a line. The Mac convention is to use just carriage-return.
398 @cindex base coding system
399 @cindex variant coding system
400 @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
401 conversion unspecified, to be chosen based on the data. @dfn{Variant
402 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
403 @code{latin-1-mac} specify the end-of-line conversion explicitly as
404 well. Each base coding system has three corresponding variants whose
405 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
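The naming rule for variants can be sketched in Lisp as follows.  The
function @code{my-eol-variant-names} is hypothetical; it only builds
the three names from a base name and does not check that the resulting
coding systems are actually defined.

```elisp
(defun my-eol-variant-names (base)
  "Return the names of the three end-of-line variants of BASE, a symbol."
  (mapcar (lambda (suffix)
            (intern (format "%s-%s" base suffix)))
          '(unix dos mac)))

(my-eol-variant-names 'latin-1)
;; => (latin-1-unix latin-1-dos latin-1-mac)
```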
407 @node Lisp and Coding Systems
408 @subsection Coding Systems in Lisp
410 Here are Lisp facilities for working with coding systems:
412 @tindex coding-system-list
413 @defun coding-system-list &optional base-only
414 This function returns a list of all coding system names (symbols). If
415 @var{base-only} is non-@code{nil}, the value includes only the
416 base coding systems. Otherwise, it includes variant coding systems as well.
419 @tindex coding-system-p
420 @defun coding-system-p object
421 This function returns @code{t} if @var{object} is a coding system
425 @tindex check-coding-system
426 @defun check-coding-system coding-system
427 This function checks the validity of @var{coding-system}.
428 If that is valid, it returns @var{coding-system}.
429 Otherwise it signals an error with condition @code{coding-system-error}.
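A program that prefers a boolean result to an error can wrap
@code{check-coding-system} in @code{condition-case}.  The following is
a sketch; both the function name @code{my-valid-coding-system-p} and
the symbol @code{my-no-such-coding-system} are hypothetical.

```elisp
(defun my-valid-coding-system-p (name)
  "Return t if NAME passes `check-coding-system', nil otherwise."
  (condition-case nil
      (progn (check-coding-system name) t)
    ;; An invalid name signals this condition.
    (coding-system-error nil)))

(my-valid-coding-system-p 'my-no-such-coding-system)
;; => nil
```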
432 @tindex find-safe-coding-system
433 @defun find-safe-coding-system from to
434 Return a list of coding systems suitable for encoding the text between
435 @var{from} and @var{to}. Every coding system in the list can safely
436 encode all the multibyte characters in that text.
438 If the text contains no multibyte characters, return a list containing
439 the single element @code{undecided}.
442 @tindex detect-coding-region
443 @defun detect-coding-region start end highest
444 This function chooses a plausible coding system for decoding the text
445 from @var{start} to @var{end}. This text should be ``raw bytes''
446 (@pxref{Explicit Encoding}).
448 Normally this function returns a list of coding systems that could
449 handle decoding the text that was scanned. They are listed in order of
450 decreasing priority, based on the priority specified by the user with
451 @code{prefer-coding-system}. But if @var{highest} is non-@code{nil},
452 then the return value is just one coding system, the one that is highest in priority.
456 @tindex detect-coding-string
457 @defun detect-coding-string string highest
458 This function is like @code{detect-coding-region} except that it
459 operates on the contents of @var{string} instead of bytes in the buffer.
462 @defun find-operation-coding-system operation &rest arguments
463 This function returns the coding system to use (by default) for
464 performing @var{operation} with @var{arguments}. The value has this form:
468 (@var{decoding-system} . @var{encoding-system})
471 The first element, @var{decoding-system}, is the coding system to use
472 for decoding (in case @var{operation} does decoding), and
473 @var{encoding-system} is the coding system for encoding (in case
474 @var{operation} does encoding).
476 The argument @var{operation} should be an Emacs I/O primitive:
477 @code{insert-file-contents}, @code{write-region}, @code{call-process},
478 @code{call-process-region}, @code{start-process}, or
479 @code{open-network-stream}.
481 The remaining arguments should be the same arguments that might be given
482 to that I/O primitive. Depending on which primitive, one of those
483 arguments is selected as the @dfn{target}. For example, if
484 @var{operation} does file I/O, whichever argument specifies the file
485 name is the target. For subprocess primitives, the process name is the
486 target. For @code{open-network-stream}, the target is the service name
489 This function looks up the target in @code{file-coding-system-alist},
490 @code{process-coding-system-alist}, or
491 @code{network-coding-system-alist}, depending on @var{operation}.
492 @xref{Default Coding Systems}.
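As an illustration, a program could ask which coding system would be
used by default to decode a given file.  This is a sketch under the
assumption that only the decoding half of the result is of interest;
the name @code{my-file-decoding-system} is hypothetical, and the value
depends entirely on the current contents of
@code{file-coding-system-alist}.

```elisp
(defun my-file-decoding-system (filename)
  "Return the default coding system for decoding FILENAME, or nil."
  ;; car-safe copes with a nil result when nothing matches.
  (car-safe (find-operation-coding-system
             'insert-file-contents filename)))
```

A call such as @code{(my-file-decoding-system "notes.txt")} returns a
coding system name, or @code{nil} if no entry applies to that file
name.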
495 Here are two functions you can use to let the user specify a coding
496 system, with completion. @xref{Completion}.
498 @tindex read-coding-system
499 @defun read-coding-system prompt default
500 This function reads a coding system using the minibuffer, prompting with
501 string @var{prompt}, and returns the coding system name as a symbol. If
502 the user enters null input, @var{default} specifies which coding system
503 to return. It should be a symbol or a string.
506 @tindex read-non-nil-coding-system
507 @defun read-non-nil-coding-system prompt
508 This function reads a coding system using the minibuffer, prompting with
509 string @var{prompt}, and returns the coding system name as a symbol. If
510 the user tries to enter null input, it asks the user to try again.
511 @xref{Coding Systems}.
514 @node Default Coding Systems
515 @section Default Coding Systems
517 These variables specify which coding system to use by default for
518 certain files or when running certain subprograms. The idea of these
519 variables is that you set them once and for all to the defaults you
520 want, and then do not change them again. To specify a particular coding
521 system for a particular operation in a Lisp program, don't change these
522 variables; instead, override them using @code{coding-system-for-read}
523 and @code{coding-system-for-write} (@pxref{Specifying Coding Systems}).
525 @tindex file-coding-system-alist
526 @defvar file-coding-system-alist
527 This variable is an alist that specifies the coding systems to use for
528 reading and writing particular files. Each element has the form
529 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
530 expression that matches certain file names. The element applies to file
531 names that match @var{pattern}.
533 The @sc{cdr} of the element, @var{coding}, should be either a coding
534 system, a cons cell containing two coding systems, or a function symbol.
535 If @var{coding} is a coding system, that coding system is used for both
536 reading the file and writing it. If @var{coding} is a cons cell containing
537 two coding systems, its @sc{car} specifies the coding system for
538 decoding, and its @sc{cdr} specifies the coding system for encoding.
540 If @var{coding} is a function symbol, the function must return a coding
541 system or a cons cell containing two coding systems. This value is used
545 @tindex process-coding-system-alist
546 @defvar process-coding-system-alist
547 This variable is an alist specifying which coding systems to use for a
548 subprocess, depending on which program is running in the subprocess. It
549 works like @code{file-coding-system-alist}, except that @var{pattern} is
550 matched against the program name used to start the subprocess. The coding
551 system or systems specified in this alist are used to initialize the
552 coding systems used for I/O to the subprocess, but you can specify
553 other coding systems later using @code{set-process-coding-system}.
556 @tindex network-coding-system-alist
557 @defvar network-coding-system-alist
558 This variable is an alist that specifies the coding system to use for
559 network streams. It works much like @code{file-coding-system-alist},
560 with the difference that the @var{pattern} in an element may be either a
561 port number or a regular expression. If it is a regular expression, it
562 is matched against the network service name used to open the network stream.
566 @tindex default-process-coding-system
567 @defvar default-process-coding-system
568 This variable specifies the coding systems to use for subprocess (and
569 network stream) input and output, when nothing else specifies what to do.
572 The value should be a cons cell of the form @code{(@var{output-coding}
573 . @var{input-coding})}. Here @var{output-coding} applies to output to
574 the subprocess, and @var{input-coding} applies to input from it.
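The alists described in this section might be set up in an
initialization file along the following lines.  This is a sketch only:
the file name pattern @samp{\.l1\'} and the program name
@samp{my-filter} are hypothetical, and the coding systems are simply
the examples used earlier in this chapter.

```elisp
;; Illustrative patterns only; adjust them to your own files and programs.
;; A single coding system is used for both reading and writing.
(setq file-coding-system-alist
      (cons '("\\.l1\\'" . latin-1)
            file-coding-system-alist))

;; A cons of two coding systems: the car is for decoding what the
;; program sends, the cdr is for encoding what is sent to it.
(setq process-coding-system-alist
      (cons '("my-filter" . (latin-1-dos . latin-1-unix))
            process-coding-system-alist))
```

Adding entries at the front of each alist lets them take effect ahead
of any existing entries whose patterns also match.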
577 @node Specifying Coding Systems
578 @section Specifying a Coding System for One Operation
580 You can specify the coding system for a specific operation by binding
581 the variables @code{coding-system-for-read} and/or
582 @code{coding-system-for-write}.
584 @tindex coding-system-for-read
585 @defvar coding-system-for-read
586 If this variable is non-@code{nil}, it specifies the coding system to
587 use for reading a file, or for input from a synchronous subprocess.
589 It also applies to any asynchronous subprocess or network stream, but in
590 a different way: the value of @code{coding-system-for-read} when you
591 start the subprocess or open the network stream specifies the input
592 decoding method for that subprocess or network stream. It remains in
593 use for that subprocess or network stream unless and until overridden.
595 The right way to use this variable is to bind it with @code{let} for a
596 specific I/O operation. Its global value is normally @code{nil}, and
597 you should not globally set it to any other value. Here is an example
598 of the right way to use the variable:
601 ;; @r{Read the file with no character code conversion.}
602 ;; @r{Assume @sc{crlf} represents end-of-line.}
603 (let ((coding-system-for-read 'emacs-mule-dos))
604   (insert-file-contents filename))
607 When its value is non-@code{nil}, @code{coding-system-for-read} takes
608 precedence over all other methods of specifying a coding system to use for
609 input, including @code{file-coding-system-alist},
610 @code{process-coding-system-alist} and
611 @code{network-coding-system-alist}.
614 @tindex coding-system-for-write
615 @defvar coding-system-for-write
616 This works much like @code{coding-system-for-read}, except that it
617 applies to output rather than input. It affects writing to files,
618 subprocesses, and net connections.
620 When a single operation does both input and output, as do
621 @code{call-process-region} and @code{start-process}, both
622 @code{coding-system-for-read} and @code{coding-system-for-write}
626 @tindex last-coding-system-used
627 @defvar last-coding-system-used
628 All I/O operations that use a coding system set this variable
629 to the coding system name that was used.
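For example, a program can read a file and then inspect
@code{last-coding-system-used} to learn which coding system was chosen
for it.  The function name @code{my-insert-file-and-report} is
hypothetical, and this is a minimal sketch without error handling.

```elisp
(defun my-insert-file-and-report (filename)
  "Insert FILENAME at point and return the coding system used to decode it."
  (insert-file-contents filename)
  ;; Set as a side effect of the I/O operation just performed.
  last-coding-system-used)
```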
632 @tindex inhibit-eol-conversion
633 @defvar inhibit-eol-conversion
634 When this variable is non-@code{nil}, no end-of-line conversion is done,
635 no matter which coding system is specified. This applies to all the
636 Emacs I/O and subprocess primitives, and to the explicit encoding and
637 decoding functions (@pxref{Explicit Encoding}).
640 @tindex keyboard-coding-system
641 @defun keyboard-coding-system
642 This function returns the coding system that is in use for decoding
643 keyboard input---or @code{nil} if no coding system is to be used.
646 @tindex set-keyboard-coding-system
647 @defun set-keyboard-coding-system coding-system
648 This function specifies @var{coding-system} as the coding system to
649 use for decoding keyboard input. If @var{coding-system} is @code{nil},
650 that means do not decode keyboard input.
653 @tindex terminal-coding-system
654 @defun terminal-coding-system
655 This function returns the coding system that is in use for encoding
656 terminal output---or @code{nil} for no encoding.
659 @tindex set-terminal-coding-system
660 @defun set-terminal-coding-system coding-system
661 This function specifies @var{coding-system} as the coding system to use
662 for encoding terminal output. If @var{coding-system} is @code{nil},
663 that means do not encode terminal output.
666 See also the functions @code{process-coding-system} and
667 @code{set-process-coding-system}. @xref{Process Information}.
669 See also @code{read-coding-system} in @ref{High-Level Completion}.
671 @node Explicit Encoding
672 @section Explicit Encoding and Decoding
673 @cindex encoding text
674 @cindex decoding text
676 All the operations that transfer text in and out of Emacs have the
677 ability to use a coding system to encode or decode the text.
678 You can also explicitly encode and decode text using the functions
682 The result of encoding, and the input to decoding, are not ordinary
683 text. They are ``raw bytes''---bytes that represent text in the same
684 way that an external file would. When a buffer contains raw bytes, it
685 is most natural to mark that buffer as using unibyte representation,
686 using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
687 but this is not required. If the buffer's contents are only temporarily
688 raw, leave the buffer multibyte, which will be correct after you decode
691 The usual way to get raw bytes in a buffer, for explicit decoding, is
692 to read them from a file with @code{insert-file-contents-literally}
693 (@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
694 argument when visiting a file with @code{find-file-noselect}.
696 The usual way to use the raw bytes that result from explicitly
697 encoding text is to copy them to a file or process---for example, to
698 write them with @code{write-region} (@pxref{Writing to Files}), and
699 suppress encoding for that @code{write-region} call by binding
700 @code{coding-system-for-write} to @code{no-conversion}.
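The approach just described might look like this in practice.  It is
a sketch under stated assumptions: the function name
@code{my-write-encoded-text} is hypothetical, and a temporary unibyte
buffer is used to hold the raw bytes, which is one reasonable choice
rather than the only one.

```elisp
(defun my-write-encoded-text (text coding-system filename)
  "Encode TEXT with CODING-SYSTEM and write the raw bytes to FILENAME."
  (let ((bytes (encode-coding-string text coding-system))
        ;; Suppress a second encoding pass inside write-region.
        (coding-system-for-write 'no-conversion))
    (with-temp-buffer
      ;; Hold the raw bytes in a unibyte buffer.
      (set-buffer-multibyte nil)
      (insert bytes)
      (write-region (point-min) (point-max) filename))))
```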
702 @tindex encode-coding-region
703 @defun encode-coding-region start end coding-system
704 This function encodes the text from @var{start} to @var{end} according
705 to coding system @var{coding-system}. The encoded text replaces the
706 original text in the buffer. The result of encoding is ``raw bytes,''
707 but the buffer remains multibyte if it was multibyte before.
710 @tindex encode-coding-string
711 @defun encode-coding-string string coding-system
712 This function encodes the text in @var{string} according to coding
713 system @var{coding-system}. It returns a new string containing the
714 encoded text. The result of encoding is a unibyte string of ``raw bytes.''
717 @tindex decode-coding-region
718 @defun decode-coding-region start end coding-system
719 This function decodes the text from @var{start} to @var{end} according
720 to coding system @var{coding-system}. The decoded text replaces the
721 original text in the buffer. To make explicit decoding useful, the text
722 before decoding ought to be ``raw bytes.''
725 @tindex decode-coding-string
726 @defun decode-coding-string string coding-system
727 This function decodes the text in @var{string} according to coding
728 system @var{coding-system}. It returns a new string containing the
729 decoded text. To make explicit decoding useful, the contents of
730 @var{string} ought to be ``raw bytes.''
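The string functions above can be combined into a simple round trip,
which is a convenient way to see that encoding and decoding with the
same coding system are inverse operations for text that the coding
system can represent.  The name @code{my-coding-round-trip} is
hypothetical.

```elisp
(defun my-coding-round-trip (string coding-system)
  "Encode STRING with CODING-SYSTEM, then decode the result again."
  (decode-coding-string
   (encode-coding-string string coding-system)
   coding-system))

(my-coding-round-trip "abc" 'latin-1)
;; => "abc"
```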
733 @node MS-DOS File Types
734 @section MS-DOS File Types
735 @cindex DOS file types
736 @cindex MS-DOS file types
737 @cindex Windows file types
738 @cindex file types on MS-DOS and Windows
739 @cindex text files and binary files
740 @cindex binary files and text files
742 Emacs on MS-DOS and on MS-Windows recognizes certain file names as
743 text files or binary files. For a text file, Emacs always uses DOS
744 end-of-line conversion. For a binary file, Emacs does no end-of-line
745 conversion and no character code conversion.
747 @defvar buffer-file-type
748 This variable, automatically buffer-local in each buffer, records the
749 file type of the buffer's visited file. The value is @code{nil} for
750 text, @code{t} for binary. When a buffer does not specify a coding
751 system with @code{buffer-file-coding-system}, this variable is used by
752 the function @code{find-buffer-file-type-coding-system} to determine
753 which coding system to use when writing the contents of the buffer.
756 @defopt file-name-buffer-file-type-alist
757 This variable holds an alist for recognizing text and binary files.
758 Each element has the form @code{(@var{regexp} . @var{type})}, where
759 @var{regexp} is matched against the file name, and @var{type} may be
760 @code{nil} for text, @code{t} for binary, or a function to call to
761 compute which. If it is a function, then it is called with a single
762 argument (the file name) and should return @code{t} or @code{nil}.
764 When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
765 which coding system to use when reading a file. For a text file,
766 @code{undecided-dos} is used. For a binary file, @code{no-conversion} is used.
769 If no element in this alist matches a given file name, then
770 @code{default-buffer-file-type} says how to treat the file.
773 @defopt default-buffer-file-type
774 This variable says how to handle files for which
775 @code{file-name-buffer-file-type-alist} says nothing about the type.
777 If this variable is non-@code{nil}, then these files are treated as
778 binary. Otherwise, nothing special is done for them---the coding system
779 is deduced solely from the file contents, in the usual Emacs fashion.
782 @node MS-DOS Subprocesses
783 @section MS-DOS Subprocesses
785 On Microsoft operating systems, these variables provide an alternative
786 way to specify the kind of end-of-line conversion to use for input and
787 output. The variable @code{binary-process-input} applies to input sent
788 to the subprocess, and @code{binary-process-output} applies to output
789 received from it. A non-@code{nil} value means the data is ``binary,''
790 and @code{nil} means the data is text.
792 @defvar binary-process-input
793 If this variable is @code{nil}, convert newlines to @sc{crlf} sequences in
794 the input to a synchronous subprocess.
797 @defvar binary-process-output
798 If this variable is @code{nil}, convert @sc{crlf} sequences to newlines in
799 the output from a synchronous subprocess.