@c This is part of the Emacs manual.
@c Copyright (C) 1997, 1999, 2000 Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node International, Major Modes, Frames, Top
@chapter International Character Set Support
@cindex MULE
@cindex international scripts
@cindex multibyte characters
@cindex encoding of characters

@cindex Celtic
@cindex Chinese
@cindex Cyrillic
@cindex Czech
@cindex Devanagari
@cindex Hindi
@cindex Marathi
@cindex Ethiopic
@cindex German
@cindex Greek
@cindex Hebrew
@cindex IPA
@cindex Japanese
@cindex Korean
@cindex Lao
@cindex Latin
@cindex Polish
@cindex Romanian
@cindex Slovak
@cindex Slovenian
@cindex Thai
@cindex Tibetan
@cindex Turkish
@cindex Vietnamese
@cindex Dutch
@cindex Spanish
  Emacs supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These features
have been merged from the modified version of Emacs known as MULE (for
``MULti-lingual Enhancement to GNU Emacs'').

  Emacs also supports various encodings of these characters used by
other internationalized software, such as word processors and mailers.

@menu
* International Intro::     Basic concepts of multibyte characters.
* Enabling Multibyte::      Controlling whether to use multibyte characters.
* Language Environments::   Setting things up for the language you use.
* Input Methods::           Entering text characters not on your keyboard.
* Select Input Method::     Specifying your choice of input methods.
* Multibyte Conversion::    How single-byte characters convert to multibyte.
* Coding Systems::          Character set conversion when you read and
                              write files, and so on.
* Recognize Coding::        How Emacs figures out which conversion to use.
* Specify Coding::          Various ways to choose which conversion to use.
* Fontsets::                Fontsets are collections of fonts
                              that cover the whole spectrum of characters.
* Defining Fontsets::       Defining a new fontset.
* Undisplayable Characters:: When characters don't display.
* Single-Byte Character Support::
                            You can pick one European character set
                            to use without multibyte characters.
@end menu

@node International Intro
@section Introduction to International Character Sets

  The users of international character sets and scripts have established
many more-or-less standard coding systems for storing files. Emacs
internally uses a single multibyte character encoding, so that it can
intermix characters from all these scripts in a single buffer or string.
This encoding represents each non-ASCII character as a sequence of bytes
in the range 0200 through 0377. Emacs translates between the multibyte
character encoding and various other coding systems when reading and
writing files, when exchanging data with subprocesses, and (in some
cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}).

@kindex C-h h
@findex view-hello-file
@cindex undisplayable characters
@cindex @samp{?} in display
  The command @kbd{C-h h} (@code{view-hello-file}) displays the file
@file{etc/HELLO}, which shows how to say ``hello'' in many languages.
This illustrates various scripts. If some characters can't be
displayed on your terminal, they appear as @samp{?} or as hollow boxes
(@pxref{Undisplayable Characters}).

  Keyboards, even in the countries where these character sets are used,
generally don't have keys for all the characters in them. So Emacs
supports various @dfn{input methods}, typically one for each script or
language, to make it convenient to type them.

@kindex C-x RET
  The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
to multibyte characters, coding systems, and input methods.

@ignore
@c This is commented out because it doesn't fit here, or anywhere.
@c This manual does not discuss "character sets" as they
@c are used in Mule, and it makes no sense to mention these commands
@c except as part of a larger discussion of the topic.
@c But it is not clear that topic is worth mentioning here,
@c since that is more of an implementation concept
@c than a user-level concept. And when we switch to Unicode,
@c character sets in the current sense may not even exist.

@findex list-charset-chars
@cindex characters in a certain charset
  The command @kbd{M-x list-charset-chars} prompts for a name of a
character set, and displays all the characters in that character set.

@findex describe-character-set
@cindex character set, description
  The command @kbd{M-x describe-character-set} prompts for a character
set name and displays information about that character set, including
its internal representation within Emacs.
@end ignore

@node Enabling Multibyte
@section Enabling Multibyte Characters

  You can enable or disable multibyte character support, either for
Emacs as a whole, or for a single buffer. When multibyte characters are
disabled in a buffer, then each byte in that buffer represents a
character, even codes 0200 through 0377. The old features for
supporting the European character sets, ISO Latin-1 and ISO Latin-2,
work as they did in Emacs 19 and also work for the other ISO 8859
character sets.

  However, there is no need to turn off multibyte character support to
use ISO Latin; the Emacs multibyte character set includes all the
characters in these character sets, and Emacs can translate
automatically to and from the ISO codes.

  To edit a particular file in unibyte representation, visit it using
@code{find-file-literally}. @xref{Visiting}. To convert a buffer in
multibyte representation into a single-byte representation of the same
characters, the easiest way is to save the contents in a file, kill the
buffer, and find the file again with @code{find-file-literally}. You
can also use @kbd{C-x @key{RET} c}
(@code{universal-coding-system-argument}) and specify @samp{raw-text} as
the coding system with which to find or save a file. @xref{Specify
Coding}. Finding a file as @samp{raw-text} doesn't disable format
conversion, uncompression and auto mode selection as
@code{find-file-literally} does.
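
  As a rough Lisp counterpart to typing @kbd{C-x @key{RET} c raw-text
@key{RET}} before visiting a file, the following sketch binds the
variable @code{coding-system-for-read} around a call to
@code{find-file}. The file name is hypothetical, chosen only for
illustration.

@lisp
;; Visit one file as raw-text.  "~/notes.dat" is a made-up name.
(let ((coding-system-for-read 'raw-text))
  (find-file "~/notes.dat"))
@end lisp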

@vindex enable-multibyte-characters
@vindex default-enable-multibyte-characters
  To turn off multibyte character support by default, start Emacs with
the @samp{--unibyte} option (@pxref{Initial Options}), or set the
environment variable @env{EMACS_UNIBYTE}. You can also customize
@code{enable-multibyte-characters} or, equivalently, directly set the
variable @code{default-enable-multibyte-characters} to @code{nil} in
your init file to have basically the same effect as @samp{--unibyte}.
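
  For example, an init file might contain the following line. This is
a minimal sketch, assuming you want unibyte operation as the default
for new buffers:

@lisp
;; Make new buffers unibyte by default, much like --unibyte.
(setq default-enable-multibyte-characters nil)
@end lisp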

@cindex Lisp files, and multibyte operation
@cindex multibyte operation, and Lisp files
@cindex unibyte operation, and Lisp files
@cindex init file, and non-ASCII characters
@cindex environment variables, and non-ASCII characters
  With @samp{--unibyte}, multibyte strings are not created during
initialization from the values of environment variables,
@file{/etc/passwd} entries etc.@: that contain non-ASCII 8-bit
characters.

  Emacs normally loads Lisp files as multibyte, regardless of whether
you used @samp{--unibyte}. This includes the Emacs initialization
file, @file{.emacs}, and the initialization files of Emacs packages
such as Gnus. However, you can specify unibyte loading for a
particular Lisp file, by putting @samp{-*-unibyte: t;-*-} in a comment
on the first line. Then that file is always loaded as unibyte text,
even if you did not start Emacs with @samp{--unibyte}. The motivation
for these conventions is that it is more reliable to always load any
particular Lisp file in the same way. However, you can load a Lisp
file as unibyte, on any one occasion, by typing @kbd{C-x @key{RET} c
raw-text @key{RET}} immediately before loading it.
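
  For illustration, a Lisp file that should always be loaded as unibyte
could begin with a line like the one below. The file name and
description are hypothetical; only the @samp{unibyte: t} setting
matters.

@lisp
;;; my-latin1-keys.el --- hypothetical example  -*-unibyte: t;-*-
@end lisp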

  The mode line indicates whether multibyte character support is enabled
in the current buffer. If it is, there are two or more characters (most
often two dashes) before the colon near the beginning of the mode line.
When multibyte characters are not enabled, just one dash precedes the
colon.

@node Language Environments
@section Language Environments
@cindex language environments

  All supported character sets are available in Emacs buffers whenever
multibyte characters are enabled; there is no need to select a
particular language in order to display its characters in an Emacs
buffer. However, it is important to select a @dfn{language environment}
in order to set various defaults. The language environment really
represents a choice of preferred script (more or less) rather than a
choice of language.

  The language environment controls which coding systems to recognize
when reading text (@pxref{Recognize Coding}). This applies to files,
incoming mail, netnews, and any other text you read into Emacs. It may
also specify the default coding system to use when you create a file.
Each language environment also specifies a default input method.

@findex set-language-environment
@vindex current-language-environment
  To select a language environment, customize the option
@code{current-language-environment} or use the command @kbd{M-x
set-language-environment}. It makes no difference which buffer is
current when you use this command, because the effects apply globally to
the Emacs session. The supported language environments include:

@cindex Euro sign
@quotation
Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO,
Cyrillic-KOI8, Czech, Devanagari, English, Ethiopic, German, Greek,
Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4,
Latin-5, Latin-8 (Celtic), Latin-9 (updated Latin-1, with the Euro
sign), Polish, Romanian, Slovak, Slovenian, Thai, Tibetan, Turkish,
Dutch, Spanish, and Vietnamese.
@end quotation
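
  The same choice can be made from Lisp, for instance in your init
file. This sketch assumes that Latin-1 is the environment you want:

@lisp
;; Select the Latin-1 language environment for the whole session.
(set-language-environment "Latin-1")
@end lisp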

@cindex fonts for various scripts
@cindex Intlfonts package, installation
  To display the script(s) used by your language environment on a
graphical display, you need to have a suitable font. If some of the
characters appear as empty boxes, you should install the GNU Intlfonts
package, which includes fonts for all supported scripts.@footnote{If
you run Emacs on X, you need to inform the X server about the location
of the newly installed fonts with the following commands:

@example
 xset fp+ /usr/local/share/emacs/fonts
 xset fp rehash
@end example
}
@xref{Fontsets}, for more details about setting up your fonts.

@findex set-locale-environment
@vindex locale-language-names
@vindex locale-charset-language-names
@cindex locales
  Some operating systems let you specify the language you are using by
setting the locale environment variables @env{LC_ALL}, @env{LC_CTYPE},
or @env{LANG}.@footnote{If more than one of these is set, the first
one that is nonempty specifies your locale for this purpose.} Emacs
handles this during startup by matching your locale against entries in
the values of the variables @code{locale-charset-language-names} and
@code{locale-language-names}, and selecting the corresponding language
environment if a match is found. (The former variable overrides the
latter.) It also adjusts the display table and terminal coding
system, the locale coding system, and the preferred coding system as
needed for the locale.

  If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG}
environment variables while running Emacs, you may want to invoke the
@code{set-locale-environment} function afterwards to readjust the
language environment from the new locale.
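
  A sketch of that sequence, evaluated from Lisp, might look like
this. The locale name is an assumption made for illustration; use one
that your system actually provides, and note that a nonempty
@env{LC_ALL} or @env{LC_CTYPE} would still take precedence over
@env{LANG}.

@lisp
;; Change the locale seen by Emacs, then re-derive the language
;; environment from it.  "de_DE" is only an example value.
(setenv "LANG" "de_DE")
(set-locale-environment)
@end lisp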

@vindex locale-preferred-coding-systems
  The @code{set-locale-environment} function normally uses the preferred
coding system established by the language environment to decode system
messages. But if your locale matches an entry in the variable
@code{locale-preferred-coding-systems}, Emacs uses the corresponding
coding system instead. For example, if the locale @samp{ja_JP.PCK}
matches @code{japanese-shift-jis} in
@code{locale-preferred-coding-systems}, Emacs uses that encoding even
though it might normally use @code{japanese-iso-8bit}.

  You can override the language environment chosen at startup with
explicit use of the command @code{set-language-environment}, or with
customization of @code{current-language-environment} in your init
file.

@kindex C-h L
@findex describe-language-environment
  To display information about the effects of a certain language
environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env}
@key{RET}} (@code{describe-language-environment}). This tells you which
languages this language environment is useful for, and lists the
character sets, coding systems, and input methods that go with it. It
also shows some sample text to illustrate scripts used in this language
environment. By default, this command describes the chosen language
environment.

@vindex set-language-environment-hook
  You can customize any language environment with the normal hook
@code{set-language-environment-hook}. The command
@code{set-language-environment} runs that hook after setting up the new
language environment. The hook functions can test for a specific
language environment by checking the variable
@code{current-language-environment}. This hook is where you should
put non-default settings for specific language environments, such as
coding systems for keyboard input and terminal output, the default
input method, etc.

@vindex exit-language-environment-hook
  Before it starts to set up the new language environment,
@code{set-language-environment} first runs the hook
@code{exit-language-environment-hook}. This hook is useful for undoing
customizations that were made with @code{set-language-environment-hook}.
For instance, if you set up a special key binding in a specific language
environment using @code{set-language-environment-hook}, you should set
up @code{exit-language-environment-hook} to restore the normal binding
for that key.
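
  The following sketch pairs the two hooks. The function names and the
choice of key are hypothetical. The cleanup function assumes that
@code{current-language-environment} still names the environment being
left when @code{exit-language-environment-hook} runs, and that the key
had no global binding beforehand.

@lisp
(defun my-thai-setup ()
  "Bind F9 to toggle the input method while using Thai."
  (if (equal current-language-environment "Thai")
      (global-set-key [f9] 'toggle-input-method)))

(defun my-thai-cleanup ()
  "Remove the F9 binding when leaving the Thai environment."
  (if (equal current-language-environment "Thai")
      (global-unset-key [f9])))

(add-hook 'set-language-environment-hook 'my-thai-setup)
(add-hook 'exit-language-environment-hook 'my-thai-cleanup)
@end lisp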

@node Input Methods
@section Input Methods

@cindex input methods
  An @dfn{input method} is a kind of character conversion designed
specifically for interactive input. In Emacs, typically each language
has its own input method; sometimes several languages which use the same
characters can share one input method. A few languages support several
input methods.

  The simplest kind of input method works by mapping ASCII letters
into another alphabet; this allows you to type characters which your
keyboard doesn't support directly. This is how the Greek and Russian
input methods work.

  A more powerful technique is composition: converting sequences of
characters into one letter. Many European input methods use composition
to produce a single non-ASCII letter from a sequence that consists of a
letter followed by accent characters (or vice versa). For example, some
methods convert the sequence @kbd{a'} into a single accented letter.
These input methods have no special commands of their own; all they do
is compose sequences of printing characters.

  The input methods for syllabic scripts typically use mapping followed
by composition. The input methods for Thai and Korean work this way.
First, letters are mapped into symbols for particular sounds or tone
marks; then, sequences of these which make up a whole syllable are
mapped into one syllable sign.

  Chinese and Japanese require more complex methods. In Chinese input
methods, first you enter the phonetic spelling of a Chinese word (in
input method @code{chinese-py}, among others), or a sequence of portions
of the character (input methods @code{chinese-4corner} and
@code{chinese-sw}, and others). Since one phonetic spelling typically
corresponds to many different Chinese characters, you must select one of
the alternatives using special Emacs commands. Keys such as @kbd{C-f},
@kbd{C-b}, @kbd{C-n}, @kbd{C-p}, and digits have special definitions in
this situation, used for selecting among the alternatives. @key{TAB}
displays a buffer showing all the possibilities.

  In Japanese input methods, first you input a whole word using
phonetic spelling; then, after the word is in the buffer, Emacs converts
it into one or more characters using a large dictionary. One phonetic
spelling corresponds to many differently written Japanese words, so you
must select one of them; use @kbd{C-n} and @kbd{C-p} to cycle through
the alternatives.

  Sometimes it is useful to cut off input method processing so that the
characters you have just entered will not combine with subsequent
characters. For example, in input method @code{latin-1-postfix}, the
sequence @kbd{e '} combines to form an @samp{e} with an accent. What if
you want to enter them as separate characters?

  One way is to type the accent twice; that is a special feature for
entering the separate letter and accent. For example, @kbd{e ' '} gives
you the two characters @samp{e'}. Another way is to type another letter
after the @kbd{e}---something that won't combine with that---and
immediately delete it. For example, you could type @kbd{e e @key{DEL}
'} to get separate @samp{e} and @samp{'}.

  Another method, more general but not quite as easy to type, is to use
@kbd{C-\ C-\} between two characters to stop them from combining. This
is the command @kbd{C-\} (@code{toggle-input-method}) used twice.
@ifinfo
@xref{Select Input Method}.
@end ifinfo

@cindex incremental search, input method interference
  @kbd{C-\ C-\} is especially useful inside an incremental search,
because it stops waiting for more characters to combine, and starts
searching for what you have already entered.

@vindex input-method-verbose-flag
@vindex input-method-highlight-flag
  The variables @code{input-method-highlight-flag} and
@code{input-method-verbose-flag} control how input methods explain what
is happening. If @code{input-method-highlight-flag} is non-@code{nil},
the partial sequence is highlighted in the buffer. If
@code{input-method-verbose-flag} is non-@code{nil}, the list of possible
characters to type next is displayed in the echo area (but not when you
are in the minibuffer).
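
  For example, to turn off both kinds of feedback you might put the
following in your init file. This is only a sketch; set either
variable to @code{t} instead to turn that feature back on.

@lisp
;; Don't highlight the partial input sequence in the buffer.
(setq input-method-highlight-flag nil)
;; Don't list the possible next characters in the echo area.
(setq input-method-verbose-flag nil)
@end lisp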

@cindex Leim package
  Input methods are implemented in the separate Leim package: they are
available only if the system administrator used Leim when building
Emacs. If Emacs was built without Leim, you will find that no input
methods are defined.

@node Select Input Method
@section Selecting an Input Method

@table @kbd
@item C-\
Enable or disable use of the selected input method.

@item C-x @key{RET} C-\ @var{method} @key{RET}
Select a new input method for the current buffer.

@item C-h I @var{method} @key{RET}
@itemx C-h C-\ @var{method} @key{RET}
@findex describe-input-method
@kindex C-h I
@kindex C-h C-\
Describe the input method @var{method} (@code{describe-input-method}).
By default, it describes the current input method (if any). This
description should give you the full details of how to use any
particular input method.

@item M-x list-input-methods
Display a list of all the supported input methods.
@end table

@findex set-input-method
@vindex current-input-method
@kindex C-x RET C-\
  To choose an input method for the current buffer, use @kbd{C-x
@key{RET} C-\} (@code{set-input-method}). This command reads the
input method name with the minibuffer; the name normally starts with the
language environment that it is meant to be used with. The variable
@code{current-input-method} records which input method is selected.
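
  From Lisp, the same command can be called with the name of the input
method as its argument. This sketch turns on @code{latin-1-postfix}
in every Text mode buffer; the choice of hook is an assumption made
for illustration, and the input method must be available in your
installation.

@lisp
;; Activate the latin-1-postfix input method in Text mode buffers.
(add-hook 'text-mode-hook
          (lambda () (set-input-method "latin-1-postfix")))
@end lisp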

@findex toggle-input-method
@kindex C-\
  Input methods use various sequences of ASCII characters to stand for
non-ASCII characters. Sometimes it is useful to turn off the input
method temporarily. To do this, type @kbd{C-\}
(@code{toggle-input-method}). To reenable the input method, type
@kbd{C-\} again.

  If you type @kbd{C-\} and you have not yet selected an input method,
it prompts you to specify one. This has the same effect as using
@kbd{C-x @key{RET} C-\} to specify an input method.

  When invoked with a numeric argument, as in @kbd{C-u C-\},
@code{toggle-input-method} always prompts you for an input method,
suggesting the most recently selected one as the default.

@vindex default-input-method
  Selecting a language environment specifies a default input method for
use in various buffers. When you have a default input method, you can
select it in the current buffer by typing @kbd{C-\}. The variable
@code{default-input-method} specifies the default input method
(@code{nil} means there is none).

  In some language environments, which support several different input
methods, you might want to use an input method different from the
default chosen by @code{set-language-environment}. You can instruct
Emacs to select a different default input method for a certain
language environment by using
@code{set-language-environment-hook} (@pxref{Language Environments,
set-language-environment-hook}). For example:

@lisp
(defun my-chinese-setup ()
  "Set up my private Chinese environment."
  (if (equal current-language-environment "Chinese-GB")
      (setq default-input-method "chinese-tonepy")))
(add-hook 'set-language-environment-hook 'my-chinese-setup)
@end lisp

@noindent
This sets the default input method to be @code{chinese-tonepy}
whenever you choose a Chinese-GB language environment.

@findex quail-set-keyboard-layout
  Some input methods for alphabetic scripts work by (in effect)
remapping the keyboard to emulate various keyboard layouts commonly used
for those scripts. How to do this remapping properly depends on your
actual keyboard layout. To specify which layout your keyboard has, use
the command @kbd{M-x quail-set-keyboard-layout}.

@findex list-input-methods
  To display a list of all the supported input methods, type @kbd{M-x
list-input-methods}. The list gives information about each input
method, including the string that stands for it in the mode line.

@node Multibyte Conversion
@section Unibyte and Multibyte Non-ASCII characters

  When multibyte characters are enabled, character codes 0240 (octal)
through 0377 (octal) are not really legitimate in the buffer. The valid
non-ASCII printing characters have codes that start from 0400.

  If you type a self-inserting character in the range 0240 through
0377, or if you use @kbd{C-q} to insert one, Emacs assumes you
intended to use one of the ISO Latin-@var{n} character sets, and
converts it to the Emacs code representing that Latin-@var{n}
character. You select @emph{which} ISO Latin character set to use
through your choice of language environment
@iftex
(see above).
@end iftex
@ifinfo
(@pxref{Language Environments}).
@end ifinfo
If you do not specify a choice, the default is Latin-1.

  If you insert a character in the range 0200 through 0237, which
forms the @code{eight-bit-control} character set, it is inserted
literally. You should normally avoid doing this since buffers
containing such characters have to be written out in either the
@code{emacs-mule} or @code{raw-text} coding system, which is usually
not what you want.

@node Coding Systems
@section Coding Systems
@cindex coding systems

  Users of various languages have established many more-or-less standard
coding systems for representing them. Emacs does not use these coding
systems internally; instead, it converts from various coding systems to
its own system when reading data, and converts the internal coding
system to other coding systems when writing data. Conversion is
possible in reading or writing files, in sending or receiving from the
terminal, and in exchanging data with subprocesses.

  Emacs assigns a name to each coding system. Most coding systems are
used for one language, and the name of the coding system starts with the
language name. Some coding systems are used for several languages;
their names usually start with @samp{iso}. There are also special
coding systems @code{no-conversion}, @code{raw-text} and
@code{emacs-mule} which do not convert printing characters at all.

  A special class of coding systems, collectively known as
@dfn{codepages}, is designed to support text encoded by MS-Windows and
MS-DOS software. To use any of these systems, you need to create it
with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}.

  In addition to converting various representations of non-ASCII
characters, a coding system can perform end-of-line conversion. Emacs
handles three different conventions for how to separate lines in a file:
newline, carriage-return linefeed, and just carriage-return.

@table @kbd
@item C-h C @var{coding} @key{RET}
Describe coding system @var{coding}.

@item C-h C @key{RET}
Describe the coding systems currently in use.

@item M-x list-coding-systems
Display a list of all the supported coding systems.
@end table

@kindex C-h C
@findex describe-coding-system
  The command @kbd{C-h C} (@code{describe-coding-system}) displays
information about particular coding systems. You can specify a coding
system name as argument; alternatively, with an empty argument, it
describes the coding systems currently selected for various purposes,
both in the current buffer and as the defaults, and the priority list
for recognizing coding systems (@pxref{Recognize Coding}).

@findex list-coding-systems
  To display a list of all the supported coding systems, type @kbd{M-x
list-coding-systems}. The list gives information about each coding
system, including the letter that stands for it in the mode line
(@pxref{Mode Line}).

@cindex end-of-line conversion
@cindex MS-DOS end-of-line conversion
@cindex Macintosh end-of-line conversion
  Each of the coding systems that appear in this list---except for
@code{no-conversion}, which means no conversion of any kind---specifies
how and whether to convert printing characters, but leaves the choice of
end-of-line conversion to be decided based on the contents of each file.
For example, if the file appears to use the sequence carriage-return
linefeed to separate lines, DOS end-of-line conversion will be used.

  Each of the listed coding systems has three variants which specify
exactly what to do for end-of-line conversion:

@table @code
@item @dots{}-unix
Don't do any end-of-line conversion; assume the file uses
newline to separate lines. (This is the convention normally used
on Unix and GNU systems.)

@item @dots{}-dos
Assume the file uses carriage-return linefeed to separate lines, and do
the appropriate conversion. (This is the convention normally used on
Microsoft systems.@footnote{It is also specified for MIME @samp{text/*}
bodies and in other network transport contexts. It is different
from the SGML reference syntax record-start/record-end format which
Emacs doesn't support directly.})

@item @dots{}-mac
Assume the file uses carriage-return to separate lines, and do the
appropriate conversion. (This is the convention normally used on the
Macintosh system.)
@end table

  These variant coding systems are omitted from the
@code{list-coding-systems} display for brevity, since they are entirely
predictable. For example, the coding system @code{iso-latin-1} has
variants @code{iso-latin-1-unix}, @code{iso-latin-1-dos} and
@code{iso-latin-1-mac}.
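
  You can inspect these variants from Lisp with the function
@code{coding-system-eol-type}, which this section does not otherwise
describe. As a sketch: for a coding system that leaves the
end-of-line convention undecided, evaluating the form below should
return a vector naming the three variants rather than a single number.

@lisp
;; Ask which end-of-line variants iso-latin-1 has.
(coding-system-eol-type 'iso-latin-1)
@end lisp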

  The coding system @code{raw-text} is good for a file which is mainly
ASCII text, but may contain byte values above 127 which are not meant to
encode non-ASCII characters. With @code{raw-text}, Emacs copies those
byte values unchanged, and sets @code{enable-multibyte-characters} to
@code{nil} in the current buffer so that they will be interpreted
properly. @code{raw-text} handles end-of-line conversion in the usual
way, based on the data encountered, and has the usual three variants to
specify the kind of end-of-line conversion to use.

  In contrast, the coding system @code{no-conversion} specifies no
character code conversion at all---none for non-ASCII byte values and
none for end of line. This is useful for reading or writing binary
files, tar files, and other files that must be examined verbatim. It,
too, sets @code{enable-multibyte-characters} to @code{nil}.

  The easiest way to edit a file with no conversion of any kind is with
the @kbd{M-x find-file-literally} command. This uses
@code{no-conversion}, and also suppresses other Emacs features that
might convert the file contents before you see them. @xref{Visiting}.

  The coding system @code{emacs-mule} means that the file contains
non-ASCII characters stored with the internal Emacs encoding. It
handles end-of-line conversion based on the data encountered, and has
the usual three variants to specify the kind of end-of-line conversion.

629@node Recognize Coding
@section Recognizing Coding Systems

 Most of the time, Emacs can recognize which coding system to use for
any given file---once you have specified your preferences.

 Some coding systems can be recognized or distinguished by which byte
sequences appear in the data. However, there are coding systems that
cannot be distinguished, not even potentially. For example, there is no
way to distinguish between Latin-1 and Latin-2; they use the same byte
values with different meanings.

 Emacs handles this situation by means of a priority list of coding
systems. Whenever Emacs reads a file, if you do not specify the coding
system to use, Emacs checks the data against each coding system,
starting with the first in priority and working down the list, until it
finds a coding system that fits the data. Then it converts the file
contents assuming that they are represented in this coding system.

 The priority list of coding systems depends on the selected language
environment (@pxref{Language Environments}). For example, if you use
French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use
Czech, you probably want Latin-2 to be preferred. This is one of the
reasons to specify a language environment.

@findex prefer-coding-system
 However, you can alter the priority list in detail with the command
@kbd{M-x prefer-coding-system}. This command reads the name of a coding
system from the minibuffer, and adds it to the front of the priority
list, so that it is preferred to all others. If you use this command
several times, each use adds one element to the front of the priority
list.
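
 The same command can be called from Lisp, for instance in your init
file. The following is only a sketch; the choice of @code{iso-8859-2}
is an illustration suited to the Czech case mentioned above:

@smallexample
;; Put Latin-2 at the front of the priority list.
(prefer-coding-system 'iso-8859-2)
@end smallexample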

 If you use a coding system that specifies the end-of-line conversion
type, such as @code{iso-8859-1-dos}, what that means is that Emacs
should attempt to recognize @code{iso-8859-1} with priority, and should
use DOS end-of-line conversion in case it recognizes @code{iso-8859-1}.

@vindex file-coding-system-alist
 Sometimes a file name indicates which coding system to use for the
file. The variable @code{file-coding-system-alist} specifies this
correspondence. There is a special function
@code{modify-coding-system-alist} for adding elements to this list. For
example, to read and write all @samp{.txt} files using the coding system
@code{chinese-iso-8bit}, you can execute this Lisp expression:

@smallexample
(modify-coding-system-alist 'file "\\.txt\\'" 'chinese-iso-8bit)
@end smallexample

@noindent
The first argument should be @code{file}, the second argument should be
a regular expression that determines which files this applies to, and
the third argument says which coding system to use for these files.

@vindex inhibit-eol-conversion
@cindex DOS-style end-of-line display
 Emacs recognizes which kind of end-of-line conversion to use based on
the contents of the file: if it sees only carriage-returns, or only
carriage-return linefeed sequences, then it chooses the end-of-line
conversion accordingly. You can inhibit the automatic use of
end-of-line conversion by setting the variable @code{inhibit-eol-conversion}
to non-@code{nil}. If you do that, DOS-style files will be displayed
with the @samp{^M} characters visible in the buffer; some people
prefer this to the more subtle @samp{(DOS)} end-of-line type
indication near the left edge of the mode line (@pxref{Mode Line,
eol-mnemonic}).
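
 As a minimal sketch, the following line in your init file turns off
automatic end-of-line conversion for the files you visit afterwards:

@smallexample
;; Show the ^M characters of DOS-style files instead of converting them.
(setq inhibit-eol-conversion t)
@end smallexample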

@vindex inhibit-iso-escape-detection
@cindex escape sequences in files
 By default, the automatic detection of coding system is sensitive to
escape sequences. If Emacs sees a sequence of characters that begins
with an escape character, and the sequence is valid as an ISO-2022
code, that tells Emacs to use one of the ISO-2022 encodings to decode
the file.

 However, there may be cases in which you want to read escape sequences
in a file as is. In such a case, you can set the variable
@code{inhibit-iso-escape-detection} to non-@code{nil}. Then the code
detection ignores any escape sequences, and never uses an ISO-2022
encoding. The result is that all escape sequences become visible in
the buffer.

 The default value of @code{inhibit-iso-escape-detection} is
@code{nil}. We recommend that you not change it permanently, only for
one specific operation. That's because many Emacs Lisp source files
that contain non-ASCII characters are encoded in the coding system
@code{iso-2022-7bit} in the Emacs distribution, and they won't be
decoded correctly when you visit those files if you suppress the
escape sequence detection.
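
 One way to follow this advice is to bind the variable with @code{let}
around a single command, so that the change does not outlive it. This
is only a sketch, and the file name is a placeholder:

@smallexample
;; Ignore ISO-2022 escape sequences for this one visit only.
(let ((inhibit-iso-escape-detection t))
  (find-file "/path/to/file-with-escapes"))
@end smallexample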

@vindex coding
 You can specify the coding system for a particular file using the
@samp{-*-@dots{}-*-} construct at the beginning of a file, or a local
variables list at the end (@pxref{File Variables}). You do this by
defining a value for the ``variable'' named @code{coding}. Emacs does
not really have a variable @code{coding}; instead of setting a variable,
it uses the specified coding system for the file. For example,
@samp{-*-mode: C; coding: latin-1;-*-} specifies use of the Latin-1
coding system, as well as C mode. If you specify the coding explicitly
in the file, that overrides @code{file-coding-system-alist}.
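
 The same setting can be written as a local variables list at the end
of the file. This sketch assumes a Lisp source file, so each line
starts with the comment prefix @samp{;;}; use whatever comment syntax
suits your file:

@example
;; Local Variables:
;; coding: latin-1
;; End:
@end example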

@vindex auto-coding-alist
 The variable @code{auto-coding-alist} is the strongest way to specify
the coding system for certain patterns of file names; this variable even
overrides @samp{-*-coding:-*-} tags in the file itself. Emacs uses this
feature for tar and archive files, to prevent Emacs from being confused
by a @samp{-*-coding:-*-} tag in a member of the archive and thinking it
applies to the archive file as a whole.
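
 As an illustration, the following adds an entry of the form
@code{(@var{regexp} . @var{coding-system})} so that files with a
hypothetical @samp{.dat} extension are always read without conversion:

@smallexample
;; The ".dat" pattern is an assumption made up for this example.
(add-to-list 'auto-coding-alist '("\\.dat\\'" . no-conversion))
@end smallexample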

@vindex buffer-file-coding-system
 Once Emacs has chosen a coding system for a buffer, it stores that
coding system in @code{buffer-file-coding-system} and uses that coding
system, by default, for operations that write from this buffer into a
file. This includes the commands @code{save-buffer} and
@code{write-region}. If you want to write files from this buffer using
a different coding system, you can specify a different coding system for
the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify
Coding}).

 You can insert any possible character into any Emacs buffer, but
most coding systems can only handle some of the possible characters.
This means that you can insert characters that cannot be encoded with
the coding system that will be used to save the buffer. For example,
you could start with an ASCII file and insert a few Latin-1 characters
into it, or you could edit a text file in Polish encoded in
@code{iso-8859-2} and add to it translations of several Polish words
into Russian. When you save the buffer, Emacs cannot use the current
value of @code{buffer-file-coding-system}, because the characters you
added cannot be encoded by that coding system.

 When that happens, Emacs tries the most-preferred coding system (set
by @kbd{M-x prefer-coding-system} or @kbd{M-x
set-language-environment}), and if that coding system can safely
encode all of the characters in the buffer, Emacs uses it, and stores
its value in @code{buffer-file-coding-system}. Otherwise, Emacs
displays a list of coding systems suitable for encoding the buffer's
contents, and asks you to choose one of those coding systems.

 If you insert the unsuitable characters in a mail message, Emacs
behaves a bit differently. It additionally checks whether the
most-preferred coding system is recommended for use in MIME messages;
if it isn't, Emacs tells you that the most-preferred coding system is
not recommended and prompts you for another coding system. This is so
you won't inadvertently send a message encoded in a way that your
recipient's mail software will have difficulty decoding. (If you do
want to use the most-preferred coding system, you can still type its
name at the Emacs prompt.)

@vindex sendmail-coding-system
 When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has
four different ways to determine the coding system to use for encoding
the message text. It tries the buffer's own value of
@code{buffer-file-coding-system}, if that is non-@code{nil}. Otherwise,
it uses the value of @code{sendmail-coding-system}, if that is
non-@code{nil}. The third way is to use the default coding system for
new files, which is controlled by your choice of language environment,
if that is non-@code{nil}. If all of these three values are @code{nil},
Emacs encodes outgoing mail using the Latin-1 coding system.
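
 For instance, to make outgoing mail use Latin-1 whenever the buffer
itself does not specify a coding system, you could put something like
this in your init file (the particular coding system is just an
example):

@smallexample
(setq sendmail-coding-system 'iso-8859-1)
@end smallexample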

@vindex rmail-decode-mime-charset
 When you get new mail in Rmail, each message is translated
automatically from the coding system it is written in---as if it were a
separate file. This uses the priority list of coding systems that you
have specified. If a MIME message specifies a character set, Rmail
obeys that specification, unless @code{rmail-decode-mime-charset} is
@code{nil}.

@vindex rmail-file-coding-system
 For reading and saving Rmail files themselves, Emacs uses the coding
system specified by the variable @code{rmail-file-coding-system}. The
default value is @code{nil}, which means that Rmail files are not
translated (they are read and written in the Emacs internal character
code).

@node Specify Coding
@section Specifying a Coding System

 In cases where Emacs does not automatically choose the right coding
system, you can use these commands to specify one:

@table @kbd
@item C-x @key{RET} f @var{coding} @key{RET}
Use coding system @var{coding} for the visited file
in the current buffer.

@item C-x @key{RET} c @var{coding} @key{RET}
Specify coding system @var{coding} for the immediately following
command.

@item C-x @key{RET} k @var{coding} @key{RET}
Use coding system @var{coding} for keyboard input.

@item C-x @key{RET} t @var{coding} @key{RET}
Use coding system @var{coding} for terminal output.

@item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET}
Use coding systems @var{input-coding} and @var{output-coding} for
subprocess input and output in the current buffer.

@item C-x @key{RET} x @var{coding} @key{RET}
Use coding system @var{coding} for transferring selections to and from
other programs through the window system.

@item C-x @key{RET} X @var{coding} @key{RET}
Use coding system @var{coding} for transferring @emph{one}
selection---the next one---to or from the window system.
@end table

@kindex C-x RET f
@findex set-buffer-file-coding-system
 The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system})
specifies the file coding system for the current buffer---in other
words, which coding system to use when saving or rereading the visited
file. You specify which coding system using the minibuffer. Since this
command applies to a file you have already visited, it affects only the
way the file is saved.

@kindex C-x RET c
@findex universal-coding-system-argument
 Another way to specify the coding system for a file is when you visit
the file. First use the command @kbd{C-x @key{RET} c}
(@code{universal-coding-system-argument}); this command uses the
minibuffer to read a coding system name. After you exit the minibuffer,
the specified coding system is used for @emph{the immediately following
command}.

 So if the immediately following command is @kbd{C-x C-f}, for example,
it reads the file using that coding system (and records the coding
system for when the file is saved). Or if the immediately following
command is @kbd{C-x C-w}, it writes the file using that coding system.
Other file commands affected by a specified coding system include
@kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants of
@kbd{C-x C-f}.

 @kbd{C-x @key{RET} c} also affects commands that start subprocesses,
including @kbd{M-x shell} (@pxref{Shell}).

 However, if the immediately following command does not use the coding
system, then @kbd{C-x @key{RET} c} ultimately has no effect.
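
 In Lisp code, a comparable effect for reading can be obtained by
binding the variable @code{coding-system-for-read} around the call that
visits the file. This is a sketch of the idea, not a description of how
@kbd{C-x @key{RET} c} is implemented, and the file name is a
placeholder:

@smallexample
;; Decode this one file as Latin-2, whatever Emacs would have guessed.
(let ((coding-system-for-read 'iso-8859-2))
  (find-file "/path/to/file.txt"))
@end smallexample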

 An easy way to visit a file with no conversion is with the @kbd{M-x
find-file-literally} command. @xref{Visiting}.

@vindex default-buffer-file-coding-system
 The variable @code{default-buffer-file-coding-system} specifies the
choice of coding system to use when you create a new file. It applies
when you find a new file, and when you create a buffer and then save it
in a file. Selecting a language environment typically sets this
variable to a good choice of default coding system for that language
environment.
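
 If the default chosen by your language environment is not what you
want, you can set the variable yourself. In this sketch the coding
system is only an example; since selecting a language environment
typically sets the variable again, place the line after any such
selection in your init file:

@smallexample
;; New files will be saved in Latin-1 unless you say otherwise.
(setq default-buffer-file-coding-system 'iso-8859-1)
@end smallexample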

@kindex C-x RET t
@findex set-terminal-coding-system
 The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
specifies the coding system for terminal output. If you specify a
character code for terminal output, all characters output to the
terminal are translated into that coding system.

 This feature is useful for certain character-only terminals built to
support specific languages or character sets---for example, European
terminals that support one of the ISO Latin character sets. You need to
specify the terminal coding system when using multibyte text, so that
Emacs knows which characters the terminal can actually handle.

 By default, output to the terminal is not translated at all, unless
Emacs can deduce the proper coding system from your terminal type or
your locale specification (@pxref{Language Environments}).

@kindex C-x RET k
@findex set-keyboard-coding-system
@vindex keyboard-coding-system
 The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system})
or the Custom option @code{keyboard-coding-system}
specifies the coding system for keyboard input. Character-code
translation of keyboard input is useful for terminals with keys that
send non-ASCII graphic characters---for example, some terminals designed
for ISO Latin-1 or subsets of it.

 By default, keyboard input is not translated at all.
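
 Both settings can also be made from your init file. The following
sketch assumes a text-only terminal that sends and displays Latin-1:

@smallexample
(set-terminal-coding-system 'iso-8859-1)
(set-keyboard-coding-system 'iso-8859-1)
@end smallexample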

 There is a similarity between using a coding system translation for
keyboard input, and using an input method: both define sequences of
keyboard input that translate into single characters. However, input
methods are designed to be convenient for interactive use by humans, and
the sequences that are translated are typically sequences of ASCII
printing characters. Coding systems typically translate sequences of
non-graphic characters.

@kindex C-x RET x
@kindex C-x RET X
@findex set-selection-coding-system
@findex set-next-selection-coding-system
 The command @kbd{C-x @key{RET} x} (@code{set-selection-coding-system})
specifies the coding system for sending selected text to the window
system, and for receiving the text of selections made in other
applications. This command applies to all subsequent selections, until
you override it by using the command again. The command @kbd{C-x
@key{RET} X} (@code{set-next-selection-coding-system}) specifies the
coding system for the next selection made in Emacs or read by Emacs.

@kindex C-x RET p
@findex set-buffer-process-coding-system
 The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system})
specifies the coding system for input and output to a subprocess. This
command applies to the current buffer; normally, each subprocess has its
own buffer, and thus you can use this command to specify translation to
and from a particular subprocess by giving the command in the
corresponding buffer.

 The default for translation of process input and output depends on the
current language environment.
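
 When called from Lisp, @code{set-buffer-process-coding-system} takes
the coding system for decoding what the subprocess sends first, and the
one for encoding what Emacs sends to it second. A sketch, to be
evaluated in a buffer that already has a subprocess; the coding systems
are examples only:

@smallexample
(set-buffer-process-coding-system 'iso-8859-1 'iso-8859-1)
@end smallexample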

@vindex file-name-coding-system
@cindex file names with non-ASCII characters
 The variable @code{file-name-coding-system} specifies a coding system
to use for encoding file names. If you set the variable to a coding
system name (as a Lisp symbol or a string), Emacs encodes file names
using that coding system for all file operations. This makes it
possible to use non-ASCII characters in file names---or, at least, those
non-ASCII characters which the specified coding system can encode.
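
 For example, assuming your file system stores names in Latin-1 (an
illustrative choice), your init file could contain:

@smallexample
(setq file-name-coding-system 'iso-8859-1)
@end smallexample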

 If @code{file-name-coding-system} is @code{nil}, Emacs uses a default
coding system determined by the selected language environment. In the
default language environment, any non-ASCII characters in file names are
not encoded specially; they appear in the file system using the internal
Emacs representation.

 @strong{Warning:} if you change @code{file-name-coding-system} (or the
language environment) in the middle of an Emacs session, problems can
result if you have already visited files whose names were encoded using
the earlier coding system and cannot be encoded (or are encoded
differently) under the new coding system. If you try to save one of
these buffers under the visited file name, saving may use the wrong file
name, or it may get an error. If such a problem happens, use @kbd{C-x
C-w} to specify a new file name for that buffer.

@vindex locale-coding-system
 The variable @code{locale-coding-system} specifies a coding system
to use when encoding and decoding system strings such as system error
messages and @code{format-time-string} formats and time stamps. You
should choose a coding system that is compatible with the underlying
system's text representation, which is normally specified by one of
the environment variables @env{LC_ALL}, @env{LC_CTYPE}, and
@env{LANG}. (The first one whose value is nonempty is the one that
determines the text representation.)

@node Fontsets
@section Fontsets
@cindex fontsets

 A font for X typically defines shapes for one alphabet or script.
Therefore, displaying the entire range of scripts that Emacs supports
requires a collection of many fonts. In Emacs, such a collection is
called a @dfn{fontset}. A fontset is defined by a list of fonts, each
assigned to handle a range of character codes.

 Each fontset has a name, like a font. The available X fonts are
defined by the X server; fontsets, however, are defined within Emacs
itself. Once you have defined a fontset, you can use it within Emacs by
specifying its name, anywhere that you could use a single font. Of
course, Emacs fontsets can use only the fonts that the X server
supports; if certain characters appear on the screen as hollow boxes,
this means that the fontset in use for them has no font for those
characters.@footnote{The Emacs installation instructions have information on
additional font support.}

 Emacs creates two fontsets automatically: the @dfn{standard fontset}
and the @dfn{startup fontset}. The standard fontset is most likely to
have fonts for a wide variety of non-ASCII characters; however, this is
not the default for Emacs to use. (By default, Emacs tries to find a
font which has bold and italic variants.) You can specify use of the
standard fontset with the @samp{-fn} option, or with the @samp{Font} X
resource (@pxref{Font X}). For example,

@example
emacs -fn fontset-standard
@end example

 A fontset does not necessarily specify a font for every character
code. If a fontset specifies no font for a certain character, or if it
specifies a font that does not exist on your system, then it cannot
display that character properly. It will display that character as an
empty box instead.

@vindex highlight-wrong-size-font
 The fontset height and width are determined by the ASCII characters
(that is, by the font used for ASCII characters in that fontset). If
another font in the fontset has a different height, or a different
width, then characters assigned to that font are clipped to the
fontset's size. If @code{highlight-wrong-size-font} is non-@code{nil},
a box is displayed around these wrong-size characters as well.

@node Defining Fontsets
@section Defining Fontsets

@vindex standard-fontset-spec
@cindex standard fontset
 Emacs creates a standard fontset automatically according to the value
of @code{standard-fontset-spec}. This fontset's name is

@example
-*-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard
@end example

@noindent
or just @samp{fontset-standard} for short.

 Bold, italic, and bold-italic variants of the standard fontset are
created automatically. Their names have @samp{bold} instead of
@samp{medium}, or @samp{i} instead of @samp{r}, or both.

@cindex startup fontset
 If you specify a default ASCII font with the @samp{Font} resource or
the @samp{-fn} argument, Emacs generates a fontset from it
automatically. This is the @dfn{startup fontset} and its name is
@code{fontset-startup}. Emacs builds it by replacing the @var{foundry},
@var{family}, @var{add_style}, and @var{average_width} fields of the
font name with @samp{*}, replacing the @var{charset_registry} field with
@samp{fontset}, and replacing the @var{charset_encoding} field with
@samp{startup}, then using the resulting string to specify a fontset.

 For instance, if you start Emacs this way,

@example
emacs -fn "*courier-medium-r-normal--14-140-*-iso8859-1"
@end example

@noindent
Emacs generates the following fontset and uses it for the initial X
window frame:

@example
-*-*-medium-r-normal-*-14-140-*-*-*-*-fontset-startup
@end example

 With the X resource @samp{Emacs.Font}, you can specify a fontset name
just like an actual font name. But be careful not to specify a fontset
name in a wildcard resource like @samp{Emacs*Font}---that wildcard
specification applies to various other purposes, such as menus, and
menus cannot handle fontsets.
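
 For example, a line like the following in your X resources file
selects the standard fontset. Which file holds your X resources depends
on your setup; @file{~/.Xdefaults} is a common choice and is assumed
here only for illustration:

@example
Emacs.Font: fontset-standard
@end example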

 You can specify additional fontsets using X resources named
@samp{Fontset-@var{n}}, where @var{n} is an integer starting from 0.
The resource value should have this form:

@smallexample
@var{fontpattern}, @r{[}@var{charsetname}:@var{fontname}@r{]@dots{}}
@end smallexample

@noindent
@var{fontpattern} should have the form of a standard X font name, except
for the last two fields. They should have the form
@samp{fontset-@var{alias}}.

 The fontset has two names, one long and one short. The long name is
@var{fontpattern}. The short name is @samp{fontset-@var{alias}}. You
can refer to the fontset by either name.

 The construct @samp{@var{charset}:@var{font}} specifies which font to
use (in this fontset) for one particular character set. Here,
@var{charset} is the name of a character set, and @var{font} is the
font to use for that character set. You can use this construct any
number of times in defining one fontset.

 For the other character sets, Emacs chooses a font based on
@var{fontpattern}. It replaces @samp{fontset-@var{alias}} with values
that describe the character set. For the ASCII character font,
@samp{fontset-@var{alias}} is replaced with @samp{ISO8859-1}.

 In addition, when several consecutive fields are wildcards, Emacs
collapses them into a single wildcard. This is to prevent use of
auto-scaled fonts. Fonts made by scaling larger fonts are not usable
for editing, and scaling a smaller font is not useful because it is
better to use the smaller font in its own size, which Emacs does.

 Thus if @var{fontpattern} is this,

@example
-*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24
@end example

@noindent
the font specification for ASCII characters would be this:

@example
-*-fixed-medium-r-normal-*-24-*-ISO8859-1
@end example

@noindent
and the font specification for Chinese GB2312 characters would be this:

@example
-*-fixed-medium-r-normal-*-24-*-gb2312*-*
@end example

 You may not have any Chinese font matching the above font
specification. Most X distributions include only Chinese fonts that
have @samp{song ti} or @samp{fangsong ti} in the @var{family} field. In
such a case, @samp{Fontset-@var{n}} can be specified as below:

@smallexample
Emacs.Fontset-0: -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24,\
 chinese-gb2312:-*-*-medium-r-normal-*-24-*-gb2312*-*
@end smallexample

@noindent
Then, the font specifications for all but Chinese GB2312 characters have
@samp{fixed} in the @var{family} field, and the font specification for
Chinese GB2312 characters has a wild card @samp{*} in the @var{family}
field.

@findex create-fontset-from-fontset-spec
 The function that processes the fontset resource value to create the
fontset is called @code{create-fontset-from-fontset-spec}. You can also
call this function explicitly to create a fontset.
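
 As a sketch, the fontset from the @samp{Emacs.Fontset-0} resource
example could be created from Lisp by passing the same specification as
a single string. The use of @code{concat} here is only to keep the
lines short:

@smallexample
(create-fontset-from-fontset-spec
 (concat "-*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24,"
         "chinese-gb2312:-*-*-medium-r-normal-*-24-*-gb2312*-*"))
@end smallexample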

 @xref{Font X}, for more information about font naming in X.

@node Undisplayable Characters
@section Undisplayable Characters

 Your terminal may be unable to display some non-@sc{ascii}
characters. Most non-windowing terminals can only use a single
character set (use the variable @code{default-terminal-coding-system}
(@pxref{Specify Coding}) to tell Emacs which one); characters which
can't be encoded in that coding system are displayed as @samp{?} by
default.

 Windowing terminals can display a broader range of characters, but
you may not have fonts installed for all of them; characters that have
no font appear as a hollow box.

 If you use Latin-1 characters but your terminal can't display
Latin-1, you can arrange to display mnemonic @sc{ascii} sequences
instead, e.g.@: @samp{"o} for o-umlaut. Load the library
@file{iso-ascii} to do this.
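
 One way to load the library each time Emacs starts is a line like
this in your init file (a minimal sketch):

@smallexample
(load-library "iso-ascii")
@end smallexample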

@vindex latin1-display
 If your terminal can display Latin-1, you can display characters
from other European character sets using a mixture of equivalent
Latin-1 characters and @sc{ascii} mnemonics. Use the Custom option
@code{latin1-display} to enable this. The mnemonic @sc{ascii}
sequences mostly correspond to those of the prefix input methods.

@node Single-Byte Character Support
@section Single-byte Character Set Support

@cindex European character sets
@cindex accented characters
@cindex ISO Latin character sets
@cindex Unibyte operation
 The ISO 8859 Latin-@var{n} character sets define character codes in
the range 160 to 255 to handle the accented letters and punctuation
needed by various European languages (and some non-European ones).
If you disable multibyte
characters, Emacs can still handle @emph{one} of these character codes
at a time. To specify @emph{which} of these codes to use, invoke
@kbd{M-x set-language-environment} and specify a suitable language
environment such as @samp{Latin-@var{n}}.

 For more information about unibyte operation, see @ref{Enabling
Multibyte}. Note particularly that you probably want to ensure that
your initialization files are read as unibyte if they contain non-ASCII
characters.

@vindex unibyte-display-via-language-environment
 Emacs can also display those characters, provided the terminal or font
in use supports them. This works automatically. Alternatively, if you
are using a window system, Emacs can also display single-byte characters
through fontsets, in effect by displaying the equivalent multibyte
characters according to the current language environment. To request
this, set the variable @code{unibyte-display-via-language-environment}
to a non-@code{nil} value.
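
 For example, in your init file:

@smallexample
;; Display unibyte characters through the fontset for the
;; current language environment.
(setq unibyte-display-via-language-environment t)
@end smallexample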

@cindex @code{iso-ascii} library
 If your terminal does not support display of the Latin-1 character
set, Emacs can display these characters as ASCII sequences which at
least give you a clear idea of what the characters are. To do this,
load the library @code{iso-ascii}. Similar libraries for other
Latin-@var{n} character sets could be implemented, but we don't have
them yet.

@findex standard-display-8bit
@cindex 8-bit display
 Normally non-ISO-8859 characters (between characters 128 and 159
inclusive) are displayed as octal escapes. You can change this for
non-standard ``extended'' versions of ISO-8859 character sets by using the
function @code{standard-display-8bit} in the @code{disp-table} library.
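
 As a sketch, the following displays the codes 128 through 159 as
themselves. The range is chosen to match the description above; whether
the result is readable depends on your terminal or font:

@smallexample
(require 'disp-table)
;; Show codes 128 through 159 literally, not as octal escapes.
(standard-display-8bit 128 159)
@end smallexample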

 There are several ways you can input single-byte non-ASCII
characters:

@itemize @bullet
@cindex 8-bit input
@item
If your keyboard can generate character codes 128 and up, representing
non-ASCII characters, you can type those character codes directly.

On a windowing terminal, you should not need to do anything special to
use these keys; they should simply work. On a text-only terminal, you
should use the command @kbd{M-x set-keyboard-coding-system} or the
Custom option @code{keyboard-coding-system} to specify which coding
system your keyboard uses (@pxref{Specify Coding}). Enabling this
feature will probably require you to use @kbd{ESC} to type Meta
characters; however, on a Linux console or in @code{xterm}, you can
arrange for Meta to be converted to @kbd{ESC} and still be able to type
8-bit characters present directly on the keyboard or using
@kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}.

@item
You can use an input method for the selected language environment.
@xref{Input Methods}. When you use an input method in a unibyte buffer,
the non-ASCII character you specify with it is converted to unibyte.

@kindex C-x 8
@cindex @code{iso-transl} library
@cindex compose character
@cindex dead character
@item
For Latin-1 only, you can use the
key @kbd{C-x 8} as a ``compose character'' prefix for entry of
non-ASCII Latin-1 printing characters. @kbd{C-x 8} is good for
insertion (in the minibuffer as well as other buffers), for searching,
and in any other context where a key sequence is allowed.

@kbd{C-x 8} works by loading the @code{iso-transl} library. Once that
library is loaded, the @key{ALT} modifier key, if you have one, serves
the same purpose as @kbd{C-x 8}; use @key{ALT} together with an accent
character to modify the following letter. In addition, if you have keys
for the Latin-1 ``dead accent characters,'' they too are defined to
compose with the following character, once @code{iso-transl} is loaded.
Use @kbd{C-x 8 C-h} to list the available translations as mnemonic
command names.

@item
@cindex @code{iso-acc} library
@cindex ISO Accents mode
@findex iso-accents-mode
@cindex Latin-1, Latin-2 and Latin-3 input mode
For Latin-1, Latin-2 and Latin-3, @kbd{M-x iso-accents-mode} installs
a minor mode which works much like the @code{latin-1-prefix} input
method, but does not depend on having the input methods installed. This
mode is buffer-local. It can be customized for various languages with
@kbd{M-x iso-accents-customize}.
@end itemize