HCoop Git - bpt/emacs.git/blame_incremental

Commit	Line	Data
	1	@c -*-texinfo-*-
	2	@c This is part of the GNU Emacs Lisp Reference Manual.
	3	@c Copyright (C) 1998-1999, 2001-2014 Free Software Foundation, Inc.
	4	@c See the file elisp.texi for copying conditions.
	5	@node Non-ASCII Characters
	6	@chapter Non-@acronym{ASCII} Characters
	7	@cindex multibyte characters
	8	@cindex characters, multi-byte
	9	@cindex non-@acronym{ASCII} characters
	10
	11	This chapter covers the special issues relating to characters and
	12	how they are stored in strings and buffers.
	13
	14	@menu
	15	* Text Representations:: How Emacs represents text.
	16	* Disabling Multibyte:: Controlling whether to use multibyte characters.
	17	* Converting Representations:: Converting unibyte to multibyte and vice versa.
	18	* Selecting a Representation:: Treating a byte sequence as unibyte or multi.
	19	* Character Codes:: How unibyte and multibyte relate to
	20	codes of individual characters.
	21	* Character Properties:: Character attributes that define their
	22	behavior and handling.
	23	* Character Sets:: The space of possible character codes
	24	is divided into various character sets.
	25	* Scanning Charsets:: Which character sets are used in a buffer?
	26	* Translation of Characters:: Translation tables are used for conversion.
	27	* Coding Systems:: Coding systems are conversions for saving files.
	28	* Input Methods:: Input methods allow users to enter various
	29	non-ASCII characters without special keyboards.
	30	* Locales:: Interacting with the POSIX locale.
	31	@end menu
	32
	33	@node Text Representations
	34	@section Text Representations
	35	@cindex text representation
	36
	37	Emacs buffers and strings support a large repertoire of characters
	38	from many different scripts, allowing users to type and display text
	39	in almost any known written language.
	40
	41	@cindex character codepoint
	42	@cindex codespace
	43	@cindex Unicode
	44	To support this multitude of characters and scripts, Emacs closely
	45	follows the @dfn{Unicode Standard}. The Unicode Standard assigns a
	46	unique number, called a @dfn{codepoint}, to each and every character.
	47	The range of codepoints defined by Unicode, or the Unicode
	48	@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
	49	inclusive. Emacs extends this range with codepoints in the range
	50	@code{#x110000..#x3FFFFF}, which it uses for representing characters
	51	that are not unified with Unicode and @dfn{raw 8-bit bytes} that
	52	cannot be interpreted as characters. Thus, a character codepoint in
	53	Emacs is a 22-bit integer.
	54
	55	@cindex internal representation of characters
	56	@cindex characters, representation in buffers and strings
	57	@cindex multibyte text
	58	To conserve memory, Emacs does not hold fixed-length 22-bit numbers
	59	that are codepoints of text characters within buffers and strings.
	60	Rather, Emacs uses a variable-length internal representation of
	61	characters that stores each character as a sequence of 1 to 5 8-bit
	62	bytes, depending on the magnitude of its codepoint@footnote{
	63	This internal representation is based on one of the encodings defined
	64	by the Unicode Standard, called @dfn{UTF-8}, for representing any
	65	Unicode codepoint, but Emacs extends UTF-8 to represent the additional
	66	codepoints it uses for raw 8-bit bytes and characters not unified with
	67	Unicode.}. For example, any @acronym{ASCII} character takes up only 1
	68	byte, a Latin-1 character takes up 2 bytes, etc. We call this
	69	representation of text @dfn{multibyte}.
	70
	71	Outside Emacs, characters can be represented in many different
	72	encodings, such as ISO-8859-1, GB-2312, Big-5, etc. Emacs converts
	73	between these external encodings and its internal representation, as
	74	appropriate, when it reads text into a buffer or a string, or when it
	75	writes text to a disk file or passes it to some other process.
	76
	77	Occasionally, Emacs needs to hold and manipulate encoded text or
	78	binary non-text data in its buffers or strings. For example, when
	79	Emacs visits a file, it first reads the file's text verbatim into a
	80	buffer, and only then converts it to the internal representation.
	81	Before the conversion, the buffer holds encoded text.
	82
	83	@cindex unibyte text
	84	Encoded text is not really text, as far as Emacs is concerned, but
	85	rather a sequence of raw 8-bit bytes. We call buffers and strings
	86	that hold encoded text @dfn{unibyte} buffers and strings, because
	87	Emacs treats them as a sequence of individual bytes. Usually, Emacs
	88	displays unibyte buffers and strings as octal codes such as
	89	@code{\237}. We recommend that you never use unibyte buffers and
	90	strings except for manipulating encoded text or binary non-text data.
	91
	92	In a buffer, the buffer-local value of the variable
	93	@code{enable-multibyte-characters} specifies the representation used.
	94	The representation for a string is determined and recorded in the string
	95	when the string is constructed.
	96
	97	@defvar enable-multibyte-characters
	98	This variable specifies the current buffer's text representation.
	99	If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
	100	it contains unibyte encoded text or binary non-text data.
	101
	102	You cannot set this variable directly; instead, use the function
	103	@code{set-buffer-multibyte} to change a buffer's representation.
	104	@end defvar
	105
	106	@defun position-bytes position
	107	Buffer positions are measured in character units. This function
	108	returns the byte-position corresponding to buffer position
	109	@var{position} in the current buffer. This is 1 at the start of the
	110	buffer, and counts upward in bytes. If @var{position} is out of
	111	range, the value is @code{nil}.
	112	@end defun
	113
	114	@defun byte-to-position byte-position
	115	Return the buffer position, in character units, corresponding to given
	116	@var{byte-position} in the current buffer. If @var{byte-position} is
	117	out of range, the value is @code{nil}. In a multibyte buffer, an
	118	arbitrary value of @var{byte-position} may fall not on a character
	119	boundary, but inside a multibyte sequence representing a single
	120	character; in this case, this function returns the buffer position of
	121	the character whose multibyte sequence includes @var{byte-position}.
	122	In other words, the value does not change for all byte positions that
	123	belong to the same character.
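
As a rough illustration of how @code{position-bytes} and
@code{byte-to-position} relate (a sketch that assumes a fresh
multibyte temporary buffer; the inserted text is made up for the
example), the character @code{U+00E9} occupies two bytes and @samp{a}
occupies one:

@example
@group
(with-temp-buffer
  (insert "\u00e9a")
  (list (position-bytes 2)      ; byte position of the `a'
        (byte-to-position 2)    ; byte 2 lies inside U+00E9
        (byte-to-position 3)))  ; byte 3 is the `a' itself
@result{} (3 1 2)
@end group
@end example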
	124	@end defun
	125
	126	@defun multibyte-string-p string
	127	Return @code{t} if @var{string} is a multibyte string, @code{nil}
	128	otherwise. This function also returns @code{nil} if @var{string} is
	129	some object other than a string.
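
For instance, assuming the usual reader behavior in which a string
literal containing only @acronym{ASCII} characters is read as unibyte,
the following sketch shows the three possible outcomes:

@example
@group
(multibyte-string-p "abc")
@result{} nil
@end group
@group
(multibyte-string-p "\u00e9")
@result{} t
@end group
@group
(multibyte-string-p 'abc)
@result{} nil
@end group
@end example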
	130	@end defun
	131
	132	@defun string-bytes string
	133	@cindex string, number of bytes
	134	This function returns the number of bytes in @var{string}.
	135	If @var{string} is a multibyte string, this can be greater than
	136	@code{(length @var{string})}.
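
As a small illustration (the sample string is arbitrary), a four-character
string whose last character needs two bytes in the internal
representation occupies five bytes:

@example
@group
(length "caf\u00e9")
@result{} 4
@end group
@group
(string-bytes "caf\u00e9")
@result{} 5
@end group
@end example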
	137	@end defun
	138
	139	@defun unibyte-string &rest bytes
	140	This function concatenates all its argument @var{bytes} and makes the
	141	result a unibyte string.
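
A brief sketch, with byte values chosen arbitrarily for the example:

@example
@group
(unibyte-string 72 105)
@result{} "Hi"
@end group
@group
(multibyte-string-p (unibyte-string 200 201))
@result{} nil
@end group
@group
(string-bytes (unibyte-string 200 201))
@result{} 2
@end group
@end example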
	142	@end defun
	143
	144	@node Disabling Multibyte
	145	@section Disabling Multibyte Characters
	146	@cindex disabling multibyte
	147
	148	By default, Emacs starts in multibyte mode: it stores the contents
	149	of buffers and strings using an internal encoding that represents
	150	non-@acronym{ASCII} characters using multi-byte sequences. Multibyte
	151	mode allows you to use all the supported languages and scripts without
	152	limitations.
	153
	154	@cindex turn multibyte support on or off
	155	Under very special circumstances, you may want to disable multibyte
	156	character support for a specific buffer.
	157	When multibyte characters are disabled in a buffer, we call
	158	that @dfn{unibyte mode}. In unibyte mode, each character in the
	159	buffer has a character code ranging from 0 through 255 (0377 octal); 0
	160	through 127 (0177 octal) represent @acronym{ASCII} characters, and 128
	161	(0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII}
	162	characters.
	163
	164	To edit a particular file in unibyte representation, visit it using
	165	@code{find-file-literally}. @xref{Visiting Functions}. You can
	166	convert a multibyte buffer to unibyte by saving it to a file, killing
	167	the buffer, and visiting the file again with
	168	@code{find-file-literally}. Alternatively, you can use @kbd{C-x
	169	@key{RET} c} (@code{universal-coding-system-argument}) and specify
	170	@samp{raw-text} as the coding system with which to visit or save a
	171	file. @xref{Text Coding, , Specifying a Coding System for File Text,
	172	emacs, GNU Emacs Manual}. Unlike @code{find-file-literally}, finding
	173	a file as @samp{raw-text} doesn't disable format conversion,
	174	uncompression, or auto mode selection.
	175
	176	@c See http://debbugs.gnu.org/11226 for lack of unibyte tooltip.
	177	@vindex enable-multibyte-characters
	178	The buffer-local variable @code{enable-multibyte-characters} is
	179	non-@code{nil} in multibyte buffers, and @code{nil} in unibyte ones.
	180	The mode line also indicates whether a buffer is multibyte or not.
	181	With a graphical display, in a multibyte buffer, the portion of the
	182	mode line that indicates the character set has a tooltip that (amongst
	183	other things) says that the buffer is multibyte. In a unibyte buffer,
	184	the character set indicator is absent. Thus, in a unibyte buffer
	185	(when using a graphical display) there is normally nothing before the
	186	indication of the visited file's end-of-line convention (colon,
	187	backslash, etc.), unless you are using an input method.
	188
	189	@findex toggle-enable-multibyte-characters
	190	You can turn off multibyte support in a specific buffer by invoking the
	191	command @code{toggle-enable-multibyte-characters} in that buffer.
	192
	193	@node Converting Representations
	194	@section Converting Text Representations
	195
	196	Emacs can convert unibyte text to multibyte; it can also convert
	197	multibyte text to unibyte, provided that the multibyte text contains
	198	only @acronym{ASCII} and 8-bit raw bytes. In general, these
	199	conversions happen when inserting text into a buffer, or when putting
	200	text from several strings together in one string. You can also
	201	explicitly convert a string's contents to either representation.
	202
	203	Emacs chooses the representation for a string based on the text from
	204	which it is constructed. The general rule is to convert unibyte text
	205	to multibyte text when combining it with other multibyte text, because
	206	the multibyte representation is more general and can hold whatever
	207	characters the unibyte text has.
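
The rule above can be observed with @code{concat}.  This is only a
sketch; it assumes that @acronym{ASCII}-only string literals are read
as unibyte strings:

@example
@group
(multibyte-string-p (concat "abc" "def"))
@result{} nil
@end group
@group
;; One multibyte argument makes the whole result multibyte.
(multibyte-string-p (concat "abc" "\u00e9"))
@result{} t
@end group
@end example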
	208
	209	When inserting text into a buffer, Emacs converts the text to the
	210	buffer's representation, as specified by
	211	@code{enable-multibyte-characters} in that buffer. In particular, when
	212	you insert multibyte text into a unibyte buffer, Emacs converts the text
	213	to unibyte, even though this conversion cannot in general preserve all
	214	the characters that might be in the multibyte text. The other natural
	215	alternative, to convert the buffer contents to multibyte, is not
	216	acceptable because the buffer's representation is a choice made by the
	217	user that cannot be overridden automatically.
	218
	219	Converting unibyte text to multibyte text leaves @acronym{ASCII}
	220	characters unchanged, and converts bytes with codes 128 through 255 to
	221	the multibyte representation of raw eight-bit bytes.
	222
	223	Converting multibyte text to unibyte converts all @acronym{ASCII}
	224	and eight-bit characters to their single-byte form, but loses
	225	information for non-@acronym{ASCII} characters by discarding all but
	226	the low 8 bits of each character's codepoint. Converting unibyte text
	227	to multibyte and back to unibyte reproduces the original unibyte text.
	228
	229	The next two functions either return the argument @var{string}, or a
	230	newly created string with no text properties.
	231
	232	@defun string-to-multibyte string
	233	This function returns a multibyte string containing the same sequence
	234	of characters as @var{string}. If @var{string} is a multibyte string,
	235	it is returned unchanged. The function assumes that @var{string}
	236	includes only @acronym{ASCII} characters and raw 8-bit bytes; the
	237	latter are converted to their multibyte representation corresponding
	238	to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
	239	(@pxref{Text Representations, codepoints}).
	240	@end defun
	241
	242	@defun string-to-unibyte string
	243	This function returns a unibyte string containing the same sequence of
	244	characters as @var{string}. It signals an error if @var{string}
	245	contains a non-@acronym{ASCII} character. If @var{string} is a
	246	unibyte string, it is returned unchanged. Use this function for
	247	@var{string} arguments that contain only @acronym{ASCII} and eight-bit
	248	characters.
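
The following sketch illustrates @code{string-to-multibyte} and
@code{string-to-unibyte} together.  The byte value @code{#xC0} is an
arbitrary choice, and @code{not-convertible} is merely a symbol
returned by the example's own error handler:

@example
@group
;; The raw byte #xC0 becomes the codepoint #x3FFFC0.
(aref (string-to-multibyte "\300") 0)
@result{} 4194240
@end group
@group
;; Converting to multibyte and back reproduces the original.
(equal (string-to-unibyte (string-to-multibyte "\300")) "\300")
@result{} t
@end group
@group
(condition-case nil
    (string-to-unibyte "\u00e9")
  (error 'not-convertible))
@result{} not-convertible
@end group
@end example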
	249	@end defun
	250
	252	@defun byte-to-string byte
	253	@cindex byte to string
	254	This function returns a unibyte string containing a single byte of
	255	character data, @var{byte}. It signals an error if
	256	@var{byte} is not an integer between 0 and 255.
	257	@end defun
	258
	259	@defun multibyte-char-to-unibyte char
	260	This converts the multibyte character @var{char} to a unibyte
	261	character, and returns that character. If @var{char} is neither
	262	@acronym{ASCII} nor eight-bit, the function returns @minus{}1.
	263	@end defun
	264
	265	@defun unibyte-char-to-multibyte char
	266	This converts the unibyte character @var{char} to a multibyte
	267	character, assuming @var{char} is either @acronym{ASCII} or a raw 8-bit
	268	byte.
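
As an illustration of @code{unibyte-char-to-multibyte} and
@code{multibyte-char-to-unibyte} (a sketch with arbitrarily chosen
values), the raw byte 200 maps to the codepoint @code{#x3FFFC8} and
back, whereas the euro sign @code{U+20AC} has no single-byte form:

@example
@group
(unibyte-char-to-multibyte 200)
@result{} 4194248
@end group
@group
(multibyte-char-to-unibyte 4194248)
@result{} 200
@end group
@group
(multibyte-char-to-unibyte #x20AC)
@result{} -1
@end group
@end example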
	269	@end defun
	270
	271	@node Selecting a Representation
	272	@section Selecting a Representation
	273
	274	Sometimes it is useful to examine an existing buffer or string as
	275	multibyte when it was unibyte, or vice versa.
	276
	277	@defun set-buffer-multibyte multibyte
	278	Set the representation type of the current buffer. If @var{multibyte}
	279	is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
	280	is @code{nil}, the buffer becomes unibyte.
	281
	282	This function leaves the buffer contents unchanged when viewed as a
	283	sequence of bytes. As a consequence, it can change the contents
	284	viewed as characters; for instance, a sequence of three bytes which is
	285	treated as one character in multibyte representation will count as
	286	three characters in unibyte representation. Eight-bit characters
	287	representing raw bytes are an exception. They are represented by one
	288	byte in a unibyte buffer, but when the buffer is set to multibyte,
	289	they are converted to two-byte sequences, and vice versa.
	290
	291	This function sets @code{enable-multibyte-characters} to record which
	292	representation is in use. It also adjusts various data in the buffer
	293	(including overlays, text properties and markers) so that they cover the
	294	same text as they did before.
	295
	296	This function signals an error if the buffer is narrowed, since the
	297	narrowing might have occurred in the middle of multibyte character
	298	sequences.
	299
	300	This function also signals an error if the buffer is an indirect
	301	buffer. An indirect buffer always inherits the representation of its
	302	base buffer.
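
The byte-preserving behavior described above can be sketched as
follows, assuming a fresh multibyte temporary buffer.  The single
character @code{U+00E9} is stored as two bytes, so it counts as two
characters once the buffer becomes unibyte:

@example
@group
(with-temp-buffer
  (insert "\u00e9")
  (let ((chars-before (buffer-size)))
    (set-buffer-multibyte nil)
    (list chars-before
          (buffer-size)
          enable-multibyte-characters)))
@result{} (1 2 nil)
@end group
@end example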
	303	@end defun
	304
	305	@defun string-as-unibyte string
	306	If @var{string} is already a unibyte string, this function returns
	307	@var{string} itself. Otherwise, it returns a new string with the same
	308	bytes as @var{string}, but treating each byte as a separate character
	309	(so that the value may have more characters than @var{string}); as an
	310	exception, each eight-bit character representing a raw byte is
	311	converted into a single byte. The newly-created string contains no
	312	text properties.
	313	@end defun
	314
	315	@defun string-as-multibyte string
	316	If @var{string} is a multibyte string, this function returns
	317	@var{string} itself. Otherwise, it returns a new string with the same
	318	bytes as @var{string}, but treating each multibyte sequence as one
	319	character. This means that the value may have fewer characters than
	320	@var{string} has. If a byte sequence in @var{string} is invalid as a
	321	multibyte representation of a single character, each byte in the
	322	sequence is treated as a raw 8-bit byte. The newly-created string
	323	contains no text properties.
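
A minimal sketch of @code{string-as-unibyte} and
@code{string-as-multibyte}, assuming both functions are available in
your Emacs.  The two bytes written in octal are the internal
representation of @code{U+00E9}:

@example
@group
(length (string-as-unibyte "\u00e9"))
@result{} 2
@end group
@group
(length (string-as-multibyte "\303\251"))
@result{} 1
@end group
@end example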
	324	@end defun
	325
	326	@node Character Codes
	327	@section Character Codes
	328	@cindex character codes
	329
	330	The unibyte and multibyte text representations use different
	331	character codes. The valid character codes for unibyte representation
	332	range from 0 to @code{#xFF} (255)---the values that can fit in one
	333	byte. The valid character codes for multibyte representation range
	334	from 0 to @code{#x3FFFFF}. In this code space, values 0 through
	335	@code{#x7F} (127) are for @acronym{ASCII} characters, and values
	336	@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
	337	non-@acronym{ASCII} characters.
	338
	339	Emacs character codes are a superset of the Unicode standard.
	340	Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
	341	characters of the same codepoint; values @code{#x110000} (1114112)
	342	through @code{#x3FFF7F} (4194175) represent characters that are not
	343	unified with Unicode; and values @code{#x3FFF80} (4194176) through
	344	@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
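
The ranges above can be summarized in code.  The helper
@code{my-codepoint-class} below is hypothetical and is not part of
Emacs; it is only a sketch that restates the boundaries given in this
section:

@example
@group
(defun my-codepoint-class (code)
  "Classify CODE by the ranges described in this section."
  (cond ((not (characterp code)) 'invalid)
        ((<= code #x7F) 'ascii)
        ((<= code #x10FFFF) 'unicode)
        ((<= code #x3FFF7F) 'not-unified)
        (t 'eight-bit)))
@end group
@group
(mapcar #'my-codepoint-class
        (list ?A #x20AC #x110000 #x3FFF80 #x400000))
@result{} (ascii unicode not-unified eight-bit invalid)
@end group
@end example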
	345
	346	@defun characterp charcode
	347	This returns @code{t} if @var{charcode} is a valid character, and
	348	@code{nil} otherwise.
	349
	350	@example
	351	@group
	352	(characterp 65)
	353	@result{} t
	354	@end group
	355	@group
	356	(characterp 4194303)
	357	@result{} t
	358	@end group
	359	@group
	360	(characterp 4194304)
	361	@result{} nil
	362	@end group
	363	@end example
	364	@end defun
	365
	366	@cindex maximum value of character codepoint
	367	@cindex codepoint, largest value
	368	@defun max-char
	369	This function returns the largest value that a valid character
	370	codepoint can have.
	371
	372	@example
	373	@group
	374	(characterp (max-char))
	375	@result{} t
	376	@end group
	377	@group
	378	(characterp (1+ (max-char)))
	379	@result{} nil
	380	@end group
	381	@end example
	382	@end defun
	383
	384	@defun get-byte &optional pos string
	385	This function returns the byte at character position @var{pos} in the
	386	current buffer. If the current buffer is unibyte, this is literally
	387	the byte at that position. If the buffer is multibyte, byte values of
	388	@acronym{ASCII} characters are the same as character codepoints,
	389	whereas eight-bit raw bytes are converted to their 8-bit codes. The
	390	function signals an error if the character at @var{pos} is
	391	non-@acronym{ASCII}.
	392
	393	The optional argument @var{string} means to get a byte value from that
	394	string instead of the current buffer.
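
For example, using the @var{string} argument (a sketch; the symbol
@code{not-a-byte} is just the value returned by the example's own
error handler):

@example
@group
(get-byte 0 "a")
@result{} 97
@end group
@group
;; An eight-bit raw byte in a multibyte string.
(get-byte 0 (string-to-multibyte "\300"))
@result{} 192
@end group
@group
(condition-case nil
    (get-byte 0 "\u00e9")
  (error 'not-a-byte))
@result{} not-a-byte
@end group
@end example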
	395	@end defun
	396
	397	@node Character Properties
	398	@section Character Properties
	399	@cindex character properties
	400	A @dfn{character property} is a named attribute of a character that
	401	specifies how the character behaves and how it should be handled
	402	during text processing and display. Thus, character properties are an
	403	important part of specifying the character's semantics.
	404
	405	@c FIXME: Use the latest URI of this chapter?
	406	@c http://www.unicode.org/versions/latest/ch04.pdf
	407	On the whole, Emacs follows the Unicode Standard in its implementation
	408	of character properties. In particular, Emacs supports the
	409	@uref{http://www.unicode.org/reports/tr23/, Unicode Character Property
	410	Model}, and the Emacs character property database is derived from the
	411	Unicode Character Database (@acronym{UCD}). See the
	412	@uref{http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf, Character
	413	Properties chapter of the Unicode Standard}, for a detailed
	414	description of Unicode character properties and their meaning. This
	415	section assumes you are already familiar with that chapter of the
	416	Unicode Standard, and want to apply that knowledge to Emacs Lisp
	417	programs.
	418
	419	In Emacs, each property has a name, which is a symbol, and a set of
	420	possible values, whose types depend on the property; if a character
	421	does not have a certain property, the value is @code{nil}. As a
	422	general rule, the names of character properties in Emacs are produced
	423	from the corresponding Unicode properties by downcasing them and
	424	replacing each @samp{_} character with a dash @samp{-}. For example,
	425	@code{Canonical_Combining_Class} becomes
	426	@code{canonical-combining-class}. However, sometimes we shorten the
	427	names to make their use easier.
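
The general naming rule can be sketched in Lisp as follows.  This is
an illustration only; it does not account for the names that Emacs
shortens:

@example
@group
(downcase
 (replace-regexp-in-string "_" "-" "Canonical_Combining_Class"))
@result{} "canonical-combining-class"
@end group
@end example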
	428
	429	@cindex unassigned character codepoints
	430	Some codepoints are left @dfn{unassigned} by the
	431	@acronym{UCD}---they don't correspond to any character. The Unicode
	432	Standard defines default values of properties for such codepoints;
	433	they are mentioned below for each property.
	434
	435	Here is the full list of value types for all the character
	436	properties that Emacs knows about:
	437
	438	@table @code
	439	@item name
	440	Corresponds to the @code{Name} Unicode property. The value is a
	441	string consisting of upper-case Latin letters A to Z, digits, spaces,
	442	and hyphen @samp{-} characters. For unassigned codepoints, the value
	443	is @code{nil}.
	444
	445	@cindex unicode general category
	446	@item general-category
	447	Corresponds to the @code{General_Category} Unicode property. The
	448	value is a symbol whose name is a 2-letter abbreviation of the
	449	character's classification. For unassigned codepoints, the value
	450	is @code{Cn}.
	451
	452	@item canonical-combining-class
	453	Corresponds to the @code{Canonical_Combining_Class} Unicode property.
	454	The value is an integer. For unassigned codepoints, the value
	455	is zero.
	456
	457	@cindex bidirectional class of characters
	458	@item bidi-class
	459	Corresponds to the Unicode @code{Bidi_Class} property. The value is a
	460	symbol whose name is the Unicode @dfn{directional type} of the
	461	character. Emacs uses this property when it reorders bidirectional
	462	text for display (@pxref{Bidirectional Display}). For unassigned
	463	codepoints, the value depends on the code blocks to which the
	464	codepoint belongs: most unassigned codepoints get the value of
	465	@code{L} (strong L), but some get values of @code{AL} (Arabic letter)
	466	or @code{R} (strong R).
	467
	468	@item decomposition
	469	Corresponds to the Unicode properties @code{Decomposition_Type} and
	470	@code{Decomposition_Value}. The value is a list, whose first element
	471	may be a symbol representing a compatibility formatting tag, such as
	472	@code{small}@footnote{The Unicode specification writes these tag names
	473	inside @samp{<..>} brackets, but the tag names in Emacs do not include
	474	the brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
	475	@samp{small}. }; the other elements are characters that give the
	476	compatibility decomposition sequence of this character. For
	477	unassigned codepoints, the value is the character itself.
	478
	479	@item decimal-digit-value
	480	Corresponds to the Unicode @code{Numeric_Value} property for
	481	characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
	482	an integer. For unassigned codepoints, the value is
	483	@code{nil}, which means @acronym{NaN}, or ``not-a-number''.
	484
	485	@item digit-value
	486	Corresponds to the Unicode @code{Numeric_Value} property for
	487	characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
	488	integer. Examples of such characters include compatibility
	489	subscript and superscript digits, for which the value is the
	490	corresponding number. For unassigned codepoints, the value is
	491	@code{nil}, which means @acronym{NaN}.
	492
	493	@item numeric-value
	494	Corresponds to the Unicode @code{Numeric_Value} property for
	495	characters whose @code{Numeric_Type} is @samp{Numeric}. The value of
	496	this property is a number. Examples of
	497	characters that have this property include fractions, subscripts,
	498	superscripts, Roman numerals, currency numerators, and encircled
	499	numbers. For example, the value of this property for the character
	500	@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For
	501	unassigned codepoints, the value is @code{nil}, which means
	502	@acronym{NaN}.
	503
	504	@cindex mirroring of characters
	505	@item mirrored
	506	Corresponds to the Unicode @code{Bidi_Mirrored} property. The value
	507	of this property is a symbol, either @code{Y} or @code{N}. For
	508	unassigned codepoints, the value is @code{N}.
	509
	510	@item mirroring
	511	Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The
	512	value of this property is a character whose glyph represents the
	513	mirror image of the character's glyph, or @code{nil} if there's no
	514	defined mirroring glyph. All the characters whose @code{mirrored}
	515	property is @code{N} have @code{nil} as their @code{mirroring}
	516	property; however, some characters whose @code{mirrored} property is
	517	@code{Y} also have @code{nil} for @code{mirroring}, because no
	518	appropriate characters exist with mirrored glyphs. Emacs uses this
	519	property to display mirror images of characters when appropriate
	520	(@pxref{Bidirectional Display}). For unassigned codepoints, the value
	521	is @code{nil}.
	522
	523	@item old-name
	524	Corresponds to the Unicode @code{Unicode_1_Name} property. The value
	525	is a string. For unassigned codepoints, and for characters that have
	526	no value for this property, the value is @code{nil}.
	527
	528	@item iso-10646-comment
	529	Corresponds to the Unicode @code{ISO_Comment} property. The value is
	530	a string. For unassigned codepoints, the value is an empty string.
	531
	532	@item uppercase
	533	Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
	534	The value of this property is a single character. For unassigned
	535	codepoints, the value is @code{nil}, which means the character itself.
	536
	537	@item lowercase
	538	Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
	539	The value of this property is a single character. For unassigned
	540	codepoints, the value is @code{nil}, which means the character itself.
	541
	542	@item titlecase
	543	Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
	544	@dfn{Title case} is a special form of a character used when the first
	545	character of a word needs to be capitalized. The value of this
	546	property is a single character. For unassigned codepoints, the value
	547	is @code{nil}, which means the character itself.
	548	@end table
	549
	550	@defun get-char-code-property char propname
	551	This function returns the value of @var{char}'s @var{propname} property.
	552
	553	@example
	554	@group
	555	(get-char-code-property ?\s 'general-category)
	556	@result{} Zs
	557	@end group
	558	@group
	559	(get-char-code-property ?1 'general-category)
	560	@result{} Nd
	561	@end group
	562	@group
	563	;; subscript 4
	564	(get-char-code-property ?\u2084 'digit-value)
	565	@result{} 4
	566	@end group
	567	@group
	568	;; one fifth
	569	(get-char-code-property ?\u2155 'numeric-value)
	570	@result{} 0.2
	571	@end group
	572	@group
	573	;; Roman IV
	574	(get-char-code-property ?\u2163 'numeric-value)
	575	@result{} 4
	576	@end group
	577	@end example
	578	@end defun
	579
	580	@defun char-code-property-description prop value
	581	This function returns the description string of property @var{prop}'s
	582	@var{value}, or @code{nil} if @var{value} has no description.
	583
	584	@example
	585	@group
	586	(char-code-property-description 'general-category 'Zs)
	587	@result{} "Separator, Space"
	588	@end group
	589	@group
	590	(char-code-property-description 'general-category 'Nd)
	591	@result{} "Number, Decimal Digit"
	592	@end group
	593	@group
	594	(char-code-property-description 'numeric-value '1/5)
	595	@result{} nil
	596	@end group
	597	@end example
	598	@end defun
	599
	600	@defun put-char-code-property char propname value
	601	This function stores @var{value} as the value of the property
	602	@var{propname} for the character @var{char}.
	603	@end defun
	604
	605	@defvar unicode-category-table
	606	The value of this variable is a char-table (@pxref{Char-Tables}) that
	607	specifies, for each character, its Unicode @code{General_Category}
	608	property as a symbol.
	609	@end defvar
	610
	611	@defvar char-script-table
	612	@cindex script symbols
	613	The value of this variable is a char-table that specifies, for each
	614	character, a symbol whose name is the script to which the character
	615	belongs, according to the Unicode Standard classification of the
	616	Unicode code space into script-specific blocks. This char-table has a
	617	single extra slot whose value is the list of all script symbols.
	618	@end defvar
	619
	620	@defvar char-width-table
	621	The value of this variable is a char-table that specifies the width of
	622	each character in columns that it will occupy on the screen.
	623	@end defvar
	624
	625	@defvar printable-chars
	626	The value of this variable is a char-table that specifies, for each
	627	character, whether it is printable or not. That is, if evaluating
	628	@code{(aref printable-chars char)} results in @code{t}, the character
	629	is printable, and if it results in @code{nil}, it is not.
	630	@end defvar
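
The char-tables described above can be queried with @code{aref}, as
in the following sketch.  The sample characters are arbitrary, and the
results shown assume the tables as shipped with Emacs, without user
modifications:

@example
@group
(aref unicode-category-table ?A)
@result{} Lu
@end group
@group
(aref char-script-table ?a)
@result{} latin
@end group
@group
(aref printable-chars ?a)
@result{} t
@end group
@end example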
	631
	632	@node Character Sets
	633	@section Character Sets
	634	@cindex character sets
	635
	636	@cindex charset
	637	@cindex coded character set
	638	An Emacs @dfn{character set}, or @dfn{charset}, is a set of characters
	639	in which each character is assigned a numeric code point. (The
	640	Unicode Standard calls this a @dfn{coded character set}.) Each Emacs
	641	charset has a name which is a symbol. A single character can belong
	642	to any number of different character sets, but it will generally have
	643	a different code point in each charset. Examples of character sets
	644	include @code{ascii}, @code{iso-8859-1}, @code{greek-iso8859-7}, and
	645	@code{windows-1255}. The code point assigned to a character in a
	646	charset is usually different from its code point used in Emacs buffers
	647	and strings.
	648
	649	@cindex @code{emacs}, a charset
	650	@cindex @code{unicode}, a charset
	651	@cindex @code{eight-bit}, a charset
	652	Emacs defines several special character sets. The character set
	653	@code{unicode} includes all the characters whose Emacs code points are
	654	in the range @code{0..#x10FFFF}. The character set @code{emacs}
	655	includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
	656	Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
	657	Emacs uses it to represent raw bytes encountered in text.
	658
	659	@defun charsetp object
	660	Returns @code{t} if @var{object} is a symbol that names a character set,
	661	@code{nil} otherwise.
	662	@end defun
	663
	664	@defvar charset-list
	665	The value is a list of all defined character set names.
	666	@end defvar
	667
	668	@defun charset-priority-list &optional highestp
	669	This function returns a list of all defined character sets ordered by
	670	their priority. If @var{highestp} is non-@code{nil}, the function
	671	returns a single character set of the highest priority.
	672	@end defun
	673
	674	@defun set-charset-priority &rest charsets
	675	This function makes @var{charsets} the highest priority character sets.
	676	@end defun
	677
	678	@defun char-charset character &optional restriction
	679	This function returns the name of the character set of highest
	680	priority that @var{character} belongs to. @acronym{ASCII} characters
	681	are an exception: for them, this function always returns @code{ascii}.
	682
	683	If @var{restriction} is non-@code{nil}, it should be a list of
	684	charsets to search. Alternatively, it can be a coding system, in
	685	which case the returned charset must be supported by that coding
	686	system (@pxref{Coding Systems}).
	687	@end defun
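
As a brief illustration of the functions above, the following forms
can be evaluated in a running Emacs.  The name @code{no-such-charset}
is a made-up symbol used only for this sketch, and the value of
@code{charset-priority-list} depends on the language environment, so
no result is shown for it.

@lisp
(charsetp 'ascii)
     @result{} t
(charsetp 'no-such-charset)
     @result{} nil
(char-charset ?a)
     @result{} ascii
;; The highest-priority charset depends on the current
;; language environment.
(charset-priority-list t)
@end lisp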
	688
	689	@c TODO: Explain the properties here and add indexes such as 'charset property'.
	690	@defun charset-plist charset
	691	This function returns the property list of the character set
	692	@var{charset}. Although @var{charset} is a symbol, this is not the
	693	same as the property list of that symbol. Charset properties include
	694	important information about the charset, such as its documentation
	695	string, short name, etc.
	696	@end defun
	697
	698	@defun put-charset-property charset propname value
	699	This function sets the @var{propname} property of @var{charset} to the
	700	given @var{value}.
	701	@end defun
	702
	703	@defun get-charset-property charset propname
	704	This function returns the value of @var{charset}'s property
	705	@var{propname}.
	706	@end defun
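
The following sketch shows that charset properties are kept apart
from the property list of the symbol that names the charset.  The
property name @code{my-note} is hypothetical and is assumed not to be
used by anything else in the session.

@lisp
(put-charset-property 'ascii 'my-note "plain 7-bit text")
(get-charset-property 'ascii 'my-note)
     @result{} "plain 7-bit text"
;; The symbol's own property list is not affected.
(get 'ascii 'my-note)
     @result{} nil
@end lisp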
	707
	708	@deffn Command list-charset-chars charset
	709	This command displays a list of characters in the character set
	710	@var{charset}.
	711	@end deffn
	712
	713	Emacs can convert between its internal representation of a character
	714	and the character's codepoint in a specific charset. The following
	715	two functions support these conversions.
	716
	717	@c FIXME: decode-char and encode-char accept and ignore an additional
	718	@c argument @var{restriction}. When that argument actually makes a
	719	@c difference, it should be documented here.
	720	@defun decode-char charset code-point
	721	This function decodes a character that is assigned a @var{code-point}
	722	in @var{charset}, to the corresponding Emacs character, and returns
	723	it. If @var{charset} doesn't contain a character of that code point,
	724	the value is @code{nil}. If @var{code-point} doesn't fit in a Lisp
	725	integer (@pxref{Integer Basics, most-positive-fixnum}), it can be
	726	specified as a cons cell @code{(@var{high} . @var{low})}, where
	727	@var{low} are the lower 16 bits of the value and @var{high} are the
	728	high 16 bits.
	729	@end defun
	730
	731	@defun encode-char char charset
	732	This function returns the code point assigned to the character
	733	@var{char} in @var{charset}. If the result does not fit in a Lisp
	734	integer, it is returned as a cons cell @code{(@var{high} . @var{low})}
	735	that fits the second argument of @code{decode-char} above. If
	736	@var{charset} doesn't have a codepoint for @var{char}, the value is
	737	@code{nil}.
	738	@end defun
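
Here is a small sketch of both conversions.  It assumes the
charsets @code{iso-8859-1} and @code{iso-8859-7} are defined, as they
are in a standard Emacs build; the characters are written as numbers
to keep the example @acronym{ASCII}-only.

@lisp
;; Code #xE9 in ISO-8859-1 is LATIN SMALL LETTER E WITH ACUTE,
;; whose Emacs code point is also #xE9 (233).
(decode-char 'iso-8859-1 #xE9)
     @result{} 233
;; Code #xE1 in ISO-8859-7 is GREEK SMALL LETTER ALPHA (#x3B1).
(decode-char 'iso-8859-7 #xE1)
     @result{} 945
(encode-char #x3B1 'iso-8859-7)
     @result{} 225
;; ISO-8859-1 has no code point for that character.
(encode-char #x3B1 'iso-8859-1)
     @result{} nil
@end lisp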
	739
	740	The following function comes in handy for applying a certain
	741	function to all or part of the characters in a charset:
	742	indicate a range of characters contained in @var{charset}. The second
	743	@defun map-charset-chars function charset &optional arg from-code to-code
	744	Call @var{function} for characters in @var{charset}. @var{function}
	745	is called with two arguments. The first one is a cons cell
	746	@code{(@var{from} . @var{to})}, where @var{from} and @var{to}
	747	indicate a range of characters contained in charset. The second
	748	argument passed to @var{function} is @var{arg}.
	749
	750	By default, the range of codepoints passed to @var{function} includes
	751	all the characters in @var{charset}, but optional arguments
	752	@var{from-code} and @var{to-code} limit that to the range of
	753	characters between these two codepoints of @var{charset}. If either
	754	of them is @code{nil}, it defaults to the first or last codepoint of
	755	@var{charset}, respectively.
	756	@end defun
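
As a minimal sketch, the following form collects the ranges that
@code{map-charset-chars} reports for the @code{ascii} charset.  The
variable name @code{ranges} is chosen only for illustration, and no
result is shown because the way the charset is split into ranges is
up to Emacs.

@lisp
(let (ranges)
  (map-charset-chars (lambda (range _arg)
                       ;; RANGE is a cons cell (FROM . TO).
                       (push range ranges))
                     'ascii)
  (nreverse ranges))
@end lisp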
	757
	758	@node Scanning Charsets
	759	@section Scanning for Character Sets
	760
	761	Sometimes it is useful to find out which character set a particular
	762	character belongs to. One use for this is in determining which coding
	763	systems (@pxref{Coding Systems}) are capable of representing all of
	764	the text in question; another is to determine the font(s) for
	765	displaying that text.
	766
	767	@defun charset-after &optional pos
	768	This function returns the charset of highest priority containing the
	769	character at position @var{pos} in the current buffer. If @var{pos}
	770	is omitted or @code{nil}, it defaults to the current value of point.
	771	If @var{pos} is out of range, the value is @code{nil}.
	772	@end defun
	773
	774	@defun find-charset-region beg end &optional translation
	775	This function returns a list of the character sets of highest priority
	776	that contain characters in the current buffer between positions
	777	@var{beg} and @var{end}.
	778
	779	The optional argument @var{translation} specifies a translation table
	780	to use for scanning the text (@pxref{Translation of Characters}). If
	781	it is non-@code{nil}, then each character in the region is translated
	782	through this table, and the value returned describes the translated
	783	characters instead of the characters actually in the buffer.
	784	@end defun
	785
	786	@defun find-charset-string string &optional translation
	787	This function returns a list of character sets of highest priority
	788	that contain characters in @var{string}. It is just like
	789	@code{find-charset-region}, except that it applies to the contents of
	790	@var{string} instead of part of the current buffer.
	791	@end defun
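
The scanning functions can be tried on pure @acronym{ASCII} text,
where the result does not depend on charset priorities.  This is
only a sketch; for other text the charsets returned vary with the
language environment.

@lisp
(find-charset-string "abc")
     @result{} (ascii)
(with-temp-buffer
  (insert "abc")
  (list (charset-after 1)
        ;; Position 100 is out of range in this buffer.
        (charset-after 100)
        (find-charset-region (point-min) (point-max))))
     @result{} (ascii nil (ascii))
@end lisp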
	792
	793	@node Translation of Characters
	794	@section Translation of Characters
	795	@cindex character translation tables
	796	@cindex translation tables
	797
	798	A @dfn{translation table} is a char-table (@pxref{Char-Tables}) that
	799	specifies a mapping of characters into characters. These tables are
	800	used in encoding and decoding, and for other purposes. Some coding
	801	systems specify their own particular translation tables; there are
	802	also default translation tables which apply to all other coding
	803	systems.
	804
	805	A translation table has two extra slots. The first is either
	806	@code{nil} or a translation table that performs the reverse
	807	translation; the second is the maximum number of characters to look up
	808	for translating sequences of characters (see the description of
	809	@code{make-translation-table-from-alist} below).
	810
	811	@defun make-translation-table &rest translations
	812	This function returns a translation table based on the argument
	813	@var{translations}. Each element of @var{translations} should be a
	814	list of elements of the form @code{(@var{from} . @var{to})}; this says
	815	to translate the character @var{from} into @var{to}.
	816
	817	The arguments and the forms in each argument are processed in order,
	818	and if a previous form already translates @var{to} to some other
	819	character, say @var{to-alt}, @var{from} is also translated to
	820	@var{to-alt}.
	821	@end defun
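
Since a translation table is a char-table, its entries can be
inspected with @code{aref}.  The following sketch builds two small
tables; the particular characters are arbitrary.

@lisp
(let ((table (make-translation-table '((?a . ?b)))))
  (list (aref table ?a)     ; translated to ?b
        (aref table ?z)))   ; no entry
     @result{} (98 nil)

;; An earlier form already translates ?b to ?c, so ?a is
;; translated to ?c (99) as well.
(aref (make-translation-table '((?b . ?c)) '((?a . ?b))) ?a)
     @result{} 99
@end lisp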
	822
	823	During decoding, the translation table's translations are applied to
	824	the characters that result from ordinary decoding. If a coding system
	825	has the property @code{:decode-translation-table}, that specifies the
	826	translation table to use, or a list of translation tables to apply in
	827	sequence. (This is a property of the coding system, as returned by
	828	@code{coding-system-get}, not a property of the symbol that is the
	829	coding system's name. @xref{Coding System Basics,, Basic Concepts of
	830	Coding Systems}.) Finally, if
	831	@code{standard-translation-table-for-decode} is non-@code{nil}, the
	832	resulting characters are translated by that table.
	833
	834	During encoding, the translation table's translations are applied to
	835	the characters in the buffer, and the result of translation is
	836	actually encoded. If a coding system has property
	837	@code{:encode-translation-table}, that specifies the translation table
	838	to use, or a list of translation tables to apply in sequence. In
	839	addition, if the variable @code{standard-translation-table-for-encode}
	840	is non-@code{nil}, it specifies the translation table to use for
	841	translating the result.
	842
	843	@defvar standard-translation-table-for-decode
	844	This is the default translation table for decoding. If a coding
	845	system specifies its own translation tables, the table that is the
	846	value of this variable, if non-@code{nil}, is applied after them.
	847	@end defvar
	848
	849	@defvar standard-translation-table-for-encode
	850	This is the default translation table for encoding. If a coding
	851	system specifies its own translation tables, the table that is the
	852	value of this variable, if non-@code{nil}, is applied after them.
	853	@end defvar
	854
	855	@c FIXME: This variable is obsolete since 23.1. We should mention
	856	@c that here or simply remove this defvar. --xfq
	857	@defvar translation-table-for-input
	858	Self-inserting characters are translated through this translation
	859	table before they are inserted. Search commands also translate their
	860	input through this table, so they can compare more reliably with
	861	what's in the buffer.
	862
	863	This variable automatically becomes buffer-local when set.
	864	@end defvar
	865
	866	@defun make-translation-table-from-vector vec
	867	This function returns a translation table made from @var{vec} that is
	868	an array of 256 elements to map bytes (values 0 through #xFF) to
	869	characters. Elements may be @code{nil} for untranslated bytes. The
	870	returned table has a translation table for reverse mapping in the
	871	first extra slot, and the value @code{1} in the second extra slot.
	872
	873	This function provides an easy way to make a private coding system
	874	that maps each byte to a specific character. You can specify the
	875	returned table and the reverse translation table using the properties
	876	@code{:decode-translation-table} and @code{:encode-translation-table}
	877	respectively in the @var{props} argument to
	878	@code{define-coding-system}.
	879	@end defun
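
A minimal sketch of building such a table follows.  The mapping of
byte @code{#x80} to the character U+20AC is a hypothetical choice
made for the example, not a description of any existing coding
system.

@lisp
(let ((vec (make-vector 256 nil)))
  ;; Hypothetical mapping: byte #x80 becomes U+20AC (8364).
  (aset vec #x80 #x20AC)
  (let ((table (make-translation-table-from-vector vec)))
    (list (aref table #x80)
          (aref table #x81)                 ; untranslated byte
          (char-table-extra-slot table 1))))
     @result{} (8364 nil 1)
@end lisp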
	880
	881	@defun make-translation-table-from-alist alist
	882	This function is similar to @code{make-translation-table} but returns
	883	a complex translation table rather than a simple one-to-one mapping.
	884	Each element of @var{alist} is of the form @code{(@var{from}
	885	. @var{to})}, where @var{from} and @var{to} are either characters or
	886	vectors specifying a sequence of characters. If @var{from} is a
	887	character, that character is translated to @var{to} (i.e., to a
	888	character or a character sequence). If @var{from} is a vector of
	889	characters, that sequence is translated to @var{to}. The returned
	890	table has a translation table for reverse mapping in the first extra
	891	slot, and the maximum length of all the @var{from} character sequences
	892	in the second extra slot.
	893	@end defun
	894
	895	@node Coding Systems
	896	@section Coding Systems
	897
	898	@cindex coding system
	899	When Emacs reads or writes a file, and when Emacs sends text to a
	900	subprocess or receives text from a subprocess, it normally performs
	901	character code conversion and end-of-line conversion as specified
	902	by a particular @dfn{coding system}.
	903
	904	How to define a coding system is an arcane matter, and is not
	905	documented here.
	906
	907	@menu
	908	* Coding System Basics:: Basic concepts.
	909	* Encoding and I/O:: How file I/O functions handle coding systems.
	910	* Lisp and Coding Systems:: Functions to operate on coding system names.
	911	* User-Chosen Coding Systems:: Asking the user to choose a coding system.
	912	* Default Coding Systems:: Controlling the default choices.
	913	* Specifying Coding Systems:: Requesting a particular coding system
	914	for a single file operation.
	915	* Explicit Encoding:: Encoding or decoding text without doing I/O.
	916	* Terminal I/O Encoding:: Use of encoding for terminal I/O.
	917	@end menu
	918
	919	@node Coding System Basics
	920	@subsection Basic Concepts of Coding Systems
	921
	922	@cindex character code conversion
	923	@dfn{Character code conversion} involves conversion between the
	924	internal representation of characters used inside Emacs and some other
	925	encoding. Emacs supports many different encodings, in that it can
	926	convert to and from them. For example, it can convert text to or from
	927	encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and
	928	several variants of ISO 2022. In some cases, Emacs supports several
	929	alternative encodings for the same characters; for example, there are
	930	three coding systems for the Cyrillic (Russian) alphabet: ISO,
	931	Alternativnyj, and KOI8.
	932
	933	Every coding system specifies a particular set of character code
	934	conversions, but the coding system @code{undecided} is special: it
	935	leaves the choice unspecified, to be chosen heuristically for each
	936	file, based on the file's data.
	937
	938	In general, a coding system doesn't guarantee roundtrip identity:
	939	decoding a byte sequence using a coding system, then encoding the
	940	resulting text in the same coding system, can produce a different byte
	941	sequence. But some coding systems do guarantee that the byte sequence
	942	will be the same as what you originally decoded. Here are a few
	943	examples:
	944
	945	@quotation
	946	iso-8859-1, utf-8, big5, shift_jis, euc-jp
	947	@end quotation
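
For instance, a round trip through @code{utf-8} can be checked as
in the following sketch, which uses the explicit conversion functions
(@pxref{Explicit Encoding}).  The two bytes are the UTF-8 encoding of
U+00E9; the variable names are chosen only for illustration.

@lisp
(let* ((bytes (unibyte-string #xC3 #xA9))
       (text (decode-coding-string bytes 'utf-8)))
  ;; Encoding the decoded text gives back the original bytes.
  (equal (encode-coding-string text 'utf-8) bytes))
     @result{} t
@end lisp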
	948
	949	Encoding buffer text and then decoding the result can also fail to
	950	reproduce the original text. For instance, if you encode a character
	951	with a coding system which does not support that character, the result
	952	is unpredictable, and thus decoding it using the same coding system
	953	may produce a different text. Currently, Emacs can't report errors
	954	that result from encoding unsupported characters.
	955
	956	@cindex EOL conversion
	957	@cindex end-of-line conversion
	958	@cindex line end conversion
	959	@dfn{End of line conversion} handles three different conventions
	960	used on various systems for representing end of line in files. The
	961	Unix convention, used on GNU and Unix systems, is to use the linefeed
	962	character (also called newline). The DOS convention, used on
	963	MS-Windows and MS-DOS systems, is to use a carriage-return and a
	964	linefeed at the end of a line. The Mac convention is to use just
	965	carriage-return. (This was the convention used on the Macintosh
	966	system prior to OS X.)
	967
	968	@cindex base coding system
	969	@cindex variant coding system
	970	@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
	971	conversion unspecified, to be chosen based on the data. @dfn{Variant
	972	coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
	973	@code{latin-1-mac} specify the end-of-line conversion explicitly as
	974	well. Most base coding systems have three corresponding variants whose
	975	names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
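
The effect of the three variants can be seen by encoding a short
string explicitly (@pxref{Explicit Encoding}).  This is a sketch
using the @code{utf-8} variants; the results are written with escape
sequences rather than as Emacs would print them.

@lisp
(encode-coding-string "a\nb" 'utf-8-unix)
     @result{} "a\nb"
(encode-coding-string "a\nb" 'utf-8-dos)
     @result{} "a\r\nb"
(encode-coding-string "a\nb" 'utf-8-mac)
     @result{} "a\rb"
(decode-coding-string "a\r\nb" 'utf-8-dos)
     @result{} "a\nb"
@end lisp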
	976
	977	@vindex raw-text@r{ coding system}
	978	The coding system @code{raw-text} is special in that it prevents
	979	character code conversion, and causes the buffer visited with this
	980	coding system to be a unibyte buffer. For historical reasons, you can
	981	save both unibyte and multibyte text with this coding system. When
	982	you use @code{raw-text} to encode multibyte text, it does perform one
	983	character code conversion: it converts eight-bit characters to their
	984	single-byte external representation. @code{raw-text} does not specify
	985	the end-of-line conversion, allowing that to be determined as usual by
	986	the data, and has the usual three variants which specify the
	987	end-of-line conversion.
	988
	989	@vindex no-conversion@r{ coding system}
	990	@vindex binary@r{ coding system}
	991	@code{no-conversion} (and its alias @code{binary}) is equivalent to
	992	@code{raw-text-unix}: it specifies no conversion of either character
	993	codes or end-of-line.
	994
	995	@vindex emacs-internal@r{ coding system}
	996	@vindex utf-8-emacs@r{ coding system}
	997	The coding system @code{utf-8-emacs} specifies that the data is
	998	represented in the internal Emacs encoding (@pxref{Text
	999	Representations}). This is like @code{raw-text} in that no code
	1000	conversion happens, but different in that the result is multibyte
	1001	data. The name @code{emacs-internal} is an alias for
	1002	@code{utf-8-emacs}.
	1003
	1004	@defun coding-system-get coding-system property
	1005	This function returns the specified property of the coding system
	1006	@var{coding-system}. Most coding system properties exist for internal
	1007	purposes, but one that you might find useful is @code{:mime-charset}.
	1008	That property's value is the name used in MIME for the character coding
	1009	which this coding system can read and write. Examples:
	1010
	1011	@example
	1012	(coding-system-get 'iso-latin-1 :mime-charset)
	1013	@result{} iso-8859-1
	1014	(coding-system-get 'iso-2022-cn :mime-charset)
	1015	@result{} iso-2022-cn
	1016	(coding-system-get 'cyrillic-koi8 :mime-charset)
	1017	@result{} koi8-r
	1018	@end example
	1019
	1020	The value of the @code{:mime-charset} property is also defined
	1021	as an alias for the coding system.
	1022	@end defun
	1023
	1024	@cindex alias, for coding systems
	1025	@defun coding-system-aliases coding-system
	1026	This function returns the list of aliases of @var{coding-system}.
	1027	@end defun
	1028
	1029	@node Encoding and I/O
	1030	@subsection Encoding and I/O
	1031
	1032	The principal purpose of coding systems is for use in reading and
	1033	writing files. The function @code{insert-file-contents} uses a coding
	1034	system to decode the file data, and @code{write-region} uses one to
	1035	encode the buffer contents.
	1036
	1037	You can specify the coding system to use either explicitly
	1038	(@pxref{Specifying Coding Systems}), or implicitly using a default
	1039	mechanism (@pxref{Default Coding Systems}). But these methods may not
	1040	completely specify what to do. For example, they may choose a coding
	1041	system such as @code{undecided} which leaves the character code
	1042	conversion to be determined from the data. In these cases, the I/O
	1043	operation finishes the job of choosing a coding system. Very often
	1044	you will want to find out afterwards which coding system was chosen.
	1045
	1046	@defvar buffer-file-coding-system
	1047	This buffer-local variable records the coding system used for saving the
	1048	buffer and for writing part of the buffer with @code{write-region}. If
	1049	the text to be written cannot be safely encoded using the coding system
	1050	specified by this variable, these operations select an alternative
	1051	encoding by calling the function @code{select-safe-coding-system}
	1052	(@pxref{User-Chosen Coding Systems}). If selecting a different encoding
	1053	requires asking the user to specify a coding system,
	1054	@code{buffer-file-coding-system} is updated to the newly selected coding
	1055	system.
	1056
	1057	@code{buffer-file-coding-system} does @emph{not} affect sending text
	1058	to a subprocess.
	1059	@end defvar
	1060
	1061	@defvar save-buffer-coding-system
	1062	This variable specifies the coding system for saving the buffer (by
	1063	overriding @code{buffer-file-coding-system}). Note that it is not used
	1064	for @code{write-region}.
	1065
	1066	When a command to save the buffer starts out to use
	1067	@code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
	1068	and that coding system cannot handle
	1069	the actual text in the buffer, the command asks the user to choose
	1070	another coding system (by calling @code{select-safe-coding-system}).
	1071	After that happens, the command also updates
	1072	@code{buffer-file-coding-system} to represent the coding system that
	1073	the user specified.
	1074	@end defvar
	1075
	1076	@defvar last-coding-system-used
	1077	I/O operations for files and subprocesses set this variable to the
	1078	coding system name that was used. The explicit encoding and decoding
	1079	functions (@pxref{Explicit Encoding}) set it too.
	1080
	1081	@strong{Warning:} Since receiving subprocess output sets this variable,
	1082	it can change whenever Emacs waits; therefore, you should copy the
	1083	value shortly after the function call that stores the value you are
	1084	interested in.
	1085	@end defvar
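
One way to follow this advice is to copy the value in the very next
form, as in this sketch.  The variable names @code{bytes} and
@code{used} are illustrative, and no result is shown because the
exact coding system name recorded may be a variant of the one
requested.

@lisp
(let* ((bytes (encode-coding-string "abc" 'utf-8-unix))
       ;; Copy the value right away, before Emacs can wait
       ;; and process subprocess output.
       (used last-coding-system-used))
  (list bytes used))
@end lisp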
	1086
	1087	The variable @code{selection-coding-system} specifies how to encode
	1088	selections for the window system. @xref{Window System Selections}.
	1089
	1090	@defvar file-name-coding-system
	1091	The variable @code{file-name-coding-system} specifies the coding
	1092	system to use for encoding file names. Emacs encodes file names using
	1093	that coding system for all file operations. If
	1094	@code{file-name-coding-system} is @code{nil}, Emacs uses a default
	1095	coding system determined by the selected language environment. In the
	1096	default language environment, any non-@acronym{ASCII} characters in
	1097	file names are not encoded specially; they appear in the file system
	1098	using the internal Emacs representation.
	1099	@end defvar
	1100
	1101	@strong{Warning:} if you change @code{file-name-coding-system} (or
	1102	the language environment) in the middle of an Emacs session, problems
	1103	can result if you have already visited files whose names were encoded
	1104	using the earlier coding system and are handled differently under the
	1105	new coding system. If you try to save one of these buffers under the
	1106	visited file name, saving may use the wrong file name, or it may get
	1107	an error. If such a problem happens, use @kbd{C-x C-w} to specify a
	1108	new file name for that buffer.
	1109
	1110	@cindex file-name encoding, MS-Windows
	1111	On Windows 2000 and later, Emacs by default uses Unicode APIs to
	1112	pass file names to the OS, so the value of
	1113	@code{file-name-coding-system} is largely ignored. Lisp applications
	1114	that need to encode or decode file names on the Lisp level should use
	1115	@code{utf-8} coding-system when @code{system-type} is
	1116	@code{windows-nt}; the conversion of UTF-8 encoded file names to the
	1117	encoding appropriate for communicating with the OS is performed
	1118	internally by Emacs.
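
A Lisp program that has to encode file names itself might follow
that advice along these lines.  This is only a sketch:
@code{my-encode-file-name} is a hypothetical helper, not part of
Emacs, and the fallback to @code{default-file-name-coding-system} on
other systems is an assumption of this example.

@lisp
;; Hypothetical helper, not part of Emacs.
(defun my-encode-file-name (name)
  "Return NAME encoded for use outside Emacs."
  (encode-coding-string
   name
   (if (eq system-type 'windows-nt)
       'utf-8
     (or file-name-coding-system
         default-file-name-coding-system))))
@end lisp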
	1119
	1120	@node Lisp and Coding Systems
	1121	@subsection Coding Systems in Lisp
	1122
	1123	Here are the Lisp facilities for working with coding systems:
	1124
	1125	@cindex list all coding systems
	1126	@defun coding-system-list &optional base-only
	1127	This function returns a list of all coding system names (symbols). If
	1128	@var{base-only} is non-@code{nil}, the value includes only the
	1129	base coding systems. Otherwise, it includes alias and variant coding
	1130	systems as well.
	1131	@end defun
	1132
	1133	@defun coding-system-p object
	1134	This function returns @code{t} if @var{object} is a coding system
	1135	name or @code{nil}.
	1136	@end defun
	1137
	1138	@cindex validity of coding system
	1139	@cindex coding system, validity check
	1140	@defun check-coding-system coding-system
	1141	This function checks the validity of @var{coding-system}. If that is
	1142	valid, it returns @var{coding-system}. If @var{coding-system} is
	1143	@code{nil}, the function returns @code{nil}. For any other values, it
	1144	signals an error whose @code{error-symbol} is @code{coding-system-error}
	1145	(@pxref{Signaling Errors, signal}).
	1146	@end defun
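
The following sketch contrasts the two checks.  The name
@code{no-such-coding-system} is a made-up symbol assumed not to name
a coding system.

@lisp
(coding-system-p 'utf-8)
     @result{} t
(coding-system-p nil)
     @result{} t
(coding-system-p 'no-such-coding-system)
     @result{} nil
(check-coding-system 'utf-8)
     @result{} utf-8
(condition-case nil
    (check-coding-system 'no-such-coding-system)
  (coding-system-error 'invalid))
     @result{} invalid
@end lisp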
	1147
	1148	@cindex eol type of coding system
	1149	@defun coding-system-eol-type coding-system
	1150	This function returns the type of end-of-line (a.k.a.@: @dfn{eol})
	1151	conversion used by @var{coding-system}. If @var{coding-system}
	1152	specifies a certain eol conversion, the return value is an integer 0,
	1153	1, or 2, standing for @code{unix}, @code{dos}, and @code{mac},
	1154	respectively. If @var{coding-system} doesn't specify eol conversion
	1155	explicitly, the return value is a vector of coding systems, each one
	1156	with one of the possible eol conversion types, like this:
	1157
	1158	@lisp
	1159	(coding-system-eol-type 'latin-1)
	1160	@result{} [latin-1-unix latin-1-dos latin-1-mac]
	1161	@end lisp
	1162
	1163	@noindent
	1164	If this function returns a vector, Emacs will decide, as part of the
	1165	text encoding or decoding process, what eol conversion to use. For
	1166	decoding, the end-of-line format of the text is auto-detected, and the
	1167	eol conversion is set to match it (e.g., DOS-style CRLF format will
	1168	imply @code{dos} eol conversion). For encoding, the eol conversion is
	1169	taken from the appropriate default coding system (e.g., the
	1170	default value of @code{buffer-file-coding-system} for
	1171	@code{buffer-file-coding-system}), or from the default eol conversion
	1172	appropriate for the underlying platform.
	1173	@end defun
	1174
	1175	@cindex eol conversion of coding system
	1176	@defun coding-system-change-eol-conversion coding-system eol-type
	1177	This function returns a coding system which is like @var{coding-system}
	1178	except for its eol conversion, which is specified by @var{eol-type}.
	1179	@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
	1180	@code{nil}. If it is @code{nil}, the returned coding system determines
	1181	the end-of-line conversion from the data.
	1182
	1183	@var{eol-type} may also be 0, 1 or 2, standing for @code{unix},
	1184	@code{dos} and @code{mac}, respectively.
	1185	@end defun
	1186
	1187	@cindex text conversion of coding system
	1188	@defun coding-system-change-text-conversion eol-coding text-coding
	1189	This function returns a coding system which uses the end-of-line
	1190	conversion of @var{eol-coding}, and the text conversion of
	1191	@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
	1192	@code{undecided}, or one of its variants according to @var{eol-coding}.
	1193	@end defun
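
As a short sketch of how these two functions combine with
@code{coding-system-eol-type}, using coding systems that are defined
in a standard Emacs build:

@lisp
(coding-system-change-eol-conversion 'utf-8 'dos)
     @result{} utf-8-dos
(coding-system-eol-type 'utf-8-dos)
     @result{} 1
;; Take the eol conversion of latin-1-dos and the text
;; conversion of utf-8.
(coding-system-change-text-conversion 'latin-1-dos 'utf-8)
     @result{} utf-8-dos
@end lisp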
	1194
	1195	@cindex safely encode region
	1196	@cindex coding systems for encoding region
	1197	@defun find-coding-systems-region from to
	1198	This function returns a list of coding systems that could be used to
	1199	encode a text between @var{from} and @var{to}. All coding systems in
	1200	the list can safely encode any multibyte characters in that portion of
	1201	the text.
	1202
	1203	If the text contains no multibyte characters, the function returns the
	1204	list @code{(undecided)}.
	1205	@end defun
	1206
	1207	@cindex safely encode a string
	1208	@cindex coding systems for encoding a string
	1209	@defun find-coding-systems-string string
	1210	This function returns a list of coding systems that could be used to
	1211	encode the text of @var{string}. All coding systems in the list can
	1212	safely encode any multibyte characters in @var{string}. If the text
	1213	contains no multibyte characters, this returns the list
	1214	@code{(undecided)}.
	1215	@end defun
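
For example (a sketch only; for text with non-@acronym{ASCII}
characters the list depends on the coding systems defined in the
session, so no result is shown for the second form):

@lisp
(find-coding-systems-string "abc")
     @result{} (undecided)
;; A string holding GREEK SMALL LETTER ALPHA.
(find-coding-systems-string (string #x3B1))
@end lisp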
	1216
	1217	@cindex charset, coding systems to encode
	1218	@cindex safely encode characters in a charset
	1219	@defun find-coding-systems-for-charsets charsets
	1220	This function returns a list of coding systems that could be used to
	1221	encode all the character sets in the list @var{charsets}.
	1222	@end defun
	1223
	1224	@defun check-coding-systems-region start end coding-system-list
	1225	This function checks whether coding systems in the list
	1226	@var{coding-system-list} can encode all the characters in the region
	1227	between @var{start} and @var{end}. If all of the coding systems in
	1228	the list can encode the specified text, the function returns
	1229	@code{nil}. If some coding systems cannot encode some of the
	1230	characters, the value is an alist, each element of which has the form
	1231	@code{(@var{coding-system1} @var{pos1} @var{pos2} @dots{})}, meaning
	1232	that @var{coding-system1} cannot encode characters at buffer positions
	1233	@var{pos1}, @var{pos2}, @enddots{}.
	1234
	1235	@var{start} may be a string, in which case @var{end} is ignored and
	1236	the returned value references string indices instead of buffer
	1237	positions.
	1238	@end defun
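
The string form is convenient for a quick check, as in this sketch.
The second call is expected to return a non-@code{nil} alist, because
ISO-8859-1 has no code for GREEK SMALL LETTER ALPHA; its exact value
is not shown here.

@lisp
(check-coding-systems-region "abc" nil '(utf-8 iso-8859-1))
     @result{} nil
(check-coding-systems-region (string #x3B1) nil
                             '(utf-8 iso-8859-1))
@end lisp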
	1239
	1240	@defun detect-coding-region start end &optional highest
	1241	This function chooses a plausible coding system for decoding the text
	1242	from @var{start} to @var{end}. This text should be a byte sequence,
	1243	i.e., unibyte text or multibyte text with only @acronym{ASCII} and
	1244	eight-bit characters (@pxref{Explicit Encoding}).
	1245
	1246	Normally this function returns a list of coding systems that could
	1247	handle decoding the text that was scanned. They are listed in order of
	1248	decreasing priority. But if @var{highest} is non-@code{nil}, then the
	1249	return value is just one coding system, the one that is highest in
	1250	priority.
	1251
	1252	If the region contains only @acronym{ASCII} characters except for
	1253	ISO-2022 control characters such as @code{ESC}, the value is
	1254	@code{undecided} or @code{(undecided)}, or a variant specifying
	1255	end-of-line conversion, if that can be deduced from the text.
	1256
	1257	If the region contains null bytes, the value is @code{no-conversion},
	1258	even if the region contains text encoded in some coding system.
	1259	@end defun
	1260
	1261	@defun detect-coding-string string &optional highest
	1262	This function is like @code{detect-coding-region} except that it
	1263	operates on the contents of @var{string} instead of bytes in the buffer.
	1264	@end defun
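
The two special cases described above can be observed with short
strings.  This sketch assumes that
@code{inhibit-null-byte-detection} has its default value of
@code{nil}.

@lisp
(detect-coding-string "abc")
     @result{} (undecided)
(detect-coding-string "abc" t)
     @result{} undecided
;; A null byte makes the text look like binary data.
(detect-coding-string (unibyte-string 0 1 2) t)
     @result{} no-conversion
@end lisp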
	1265
	1266	@cindex null bytes, and decoding text
	1267	@defvar inhibit-null-byte-detection
	1268	If this variable has a non-@code{nil} value, null bytes are ignored
	1269	when detecting the encoding of a region or a string. This allows the
	1270	encoding of text that contains null bytes, such as Info files with
	1271	Index nodes, to be detected correctly.
	1272	@end defvar
	1273
	1274	@defvar inhibit-iso-escape-detection
	1275	If this variable has a non-@code{nil} value, ISO-2022 escape sequences
	1276	are ignored when detecting the encoding of a region or a string. The
	1277	result is that no text is ever detected as encoded in some ISO-2022
	1278	encoding, and all escape sequences become visible in a buffer.
	1279	@strong{Warning:} @emph{Use this variable with extreme caution,
	1280	because many files in the Emacs distribution use ISO-2022 encoding.}
	1281	@end defvar
	1282
	1283	@cindex charsets supported by a coding system
	1284	@defun coding-system-charset-list coding-system
	1285	This function returns the list of character sets (@pxref{Character
	1286	Sets}) supported by @var{coding-system}. Some coding systems that
	1287	support too many character sets to list them all yield special values:
	1288	@itemize @bullet
	1289	@item
	1290	If @var{coding-system} supports all Emacs characters, the value is
	1291	@code{(emacs)}.
	1292	@item
	1293	If @var{coding-system} supports all Unicode characters, the value is
	1294	@code{(unicode)}.
	1295	@item
	1296	If @var{coding-system} supports all ISO-2022 charsets, the value is
	1297	@code{iso-2022}.
	1298	@item
	1299	If @var{coding-system} supports all the characters in the internal
	1300	coding system used by Emacs version 21 (prior to the implementation of
	1301	internal Unicode support), the value is @code{emacs-mule}.
	1302	@end itemize
	1303	@end defun
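
For example, with coding systems defined in a standard Emacs build
(a brief sketch):

@lisp
(coding-system-charset-list 'iso-8859-1)
     @result{} (iso-8859-1)
(coding-system-charset-list 'utf-8)
     @result{} (unicode)
(coding-system-charset-list 'utf-8-emacs)
     @result{} (emacs)
@end lisp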
	1304
	1305	@xref{Coding systems for a subprocess,, Process Information}, in
	1306	particular the description of the functions
	1307	@code{process-coding-system} and @code{set-process-coding-system}, for
	1308	how to examine or set the coding systems used for I/O to a subprocess.
	1309
	1310	@node User-Chosen Coding Systems
	1311	@subsection User-Chosen Coding Systems
	1312
	1313	@cindex select safe coding system
	1314	@defun select-safe-coding-system from to &optional default-coding-system accept-default-p file
	1315	This function selects a coding system for encoding specified text,
	1316	asking the user to choose if necessary. Normally the specified text
	1317	is the text in the current buffer between @var{from} and @var{to}. If
	1318	@var{from} is a string, the string specifies the text to encode, and
	1319	@var{to} is ignored.
	1320
	1321	If the specified text includes raw bytes (@pxref{Text
	1322	Representations}), @code{select-safe-coding-system} suggests
	1323	@code{raw-text} for its encoding.
	1324
	1325	If @var{default-coding-system} is non-@code{nil}, that is the first
	1326	coding system to try; if that can handle the text,
	1327	@code{select-safe-coding-system} returns that coding system. It can
	1328	also be a list of coding systems; then the function tries each of them
	1329	one by one. After trying all of them, it next tries the current
	1330	buffer's value of @code{buffer-file-coding-system} (if it is not
	1331	@code{undecided}), then the default value of
	1332	@code{buffer-file-coding-system} and finally the user's most
	1333	preferred coding system, which the user can set using the command
	1334	@code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing
	1335	Coding Systems, emacs, The GNU Emacs Manual}).
	1336
	1337	If one of those coding systems can safely encode all the specified
	1338	text, @code{select-safe-coding-system} chooses it and returns it.
	1339	Otherwise, it asks the user to choose from a list of coding systems
	1340	which can encode all the text, and returns the user's choice.
	1341
	1342	@var{default-coding-system} can also be a list whose first element is
	1343	@code{t} and whose other elements are coding systems. Then, if no coding
	1344	system in the list can handle the text, @code{select-safe-coding-system}
	1345	queries the user immediately, without trying any of the three
	1346	alternatives described above.
	1347
	1348	The optional argument @var{accept-default-p}, if non-@code{nil},
	1349	should be a function to determine whether a coding system selected
	1350	without user interaction is acceptable. @code{select-safe-coding-system}
	1351	calls this function with one argument, the base coding system of the
	1352	selected coding system. If @var{accept-default-p} returns @code{nil},
	1353	@code{select-safe-coding-system} rejects the silently selected coding
	1354	system, and asks the user to select a coding system from a list of
	1355	possible candidates.
	1356
	1357	@vindex select-safe-coding-system-accept-default-p
	1358	If the variable @code{select-safe-coding-system-accept-default-p} is
	1359	non-@code{nil}, it should be a function taking a single argument.
	1360	It is used in place of @var{accept-default-p}, overriding any
	1361	value supplied for this argument.
	1362
	1363	As a final step, before returning the chosen coding system,
	1364	@code{select-safe-coding-system} checks whether that coding system is
	1365	consistent with what would be selected if the contents of the region
	1366	were read from a file. (If not, this could lead to data corruption in
	1367	a file subsequently re-visited and edited.) Normally,
	1368	@code{select-safe-coding-system} uses @code{buffer-file-name} as the
	1369	file for this purpose, but if @var{file} is non-@code{nil}, it uses
	1370	that file instead (this can be relevant for @code{write-region} and
	1371	similar functions). If it detects an apparent inconsistency,
	1372	@code{select-safe-coding-system} queries the user before selecting the
	1373	coding system.
	1374	@end defun
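
		A minimal sketch of a call that passes the text as a string follows.
		The helper name @code{my-coding-for-text} is hypothetical, and the
		choice of @code{utf-8} as the first candidate is only an illustrative
		preference. If none of the coding systems tried can encode the text,
		the user may still be prompted.

		@example
		@group
		;; `my-coding-for-text' is a hypothetical helper, not part of Emacs.
		(defun my-coding-for-text (text)
		  "Return a coding system that can safely encode the string TEXT.
		`utf-8' is tried first; TO is ignored because FROM is a string."
		  (select-safe-coding-system text nil 'utf-8))

		(my-coding-for-text "plain ASCII text")
		@end group
		@end example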
	1375
	1376	Here are two functions you can use to let the user specify a coding
	1377	system, with completion. @xref{Completion}.
	1378
	1379	@defun read-coding-system prompt &optional default
	1380	This function reads a coding system using the minibuffer, prompting with
	1381	string @var{prompt}, and returns the coding system name as a symbol. If
	1382	the user enters null input, @var{default} specifies which coding system
	1383	to return. It should be a symbol or a string.
	1384	@end defun
	1385
	1386	@defun read-non-nil-coding-system prompt
	1387	This function reads a coding system using the minibuffer, prompting with
	1388	string @var{prompt}, and returns the coding system name as a symbol. If
	1389	the user tries to enter null input, it asks the user to try again.
	1390	@xref{Coding Systems}.
	1391	@end defun
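
		The following sketch shows one way a command might use
		@code{read-coding-system}. The command name
		@code{my-show-coding-choice} and the default of @code{utf-8} are
		assumptions made for the illustration.

		@example
		@group
		;; `my-show-coding-choice' is a hypothetical command.
		(defun my-show-coding-choice ()
		  "Read a coding system with completion and display its name."
		  (interactive)
		  (let ((cs (read-coding-system
		             "Coding system (default utf-8): " 'utf-8)))
		    (message "You chose %s" cs)))
		@end group
		@end example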
	1392
	1393	@node Default Coding Systems
	1394	@subsection Default Coding Systems
	1395	@cindex default coding system
	1396	@cindex coding system, automatically determined
	1397
	1398	This section describes variables that specify the default coding
	1399	system for certain files or when running certain subprograms, and the
	1400	function that I/O operations use to access them.
	1401
	1402	The idea of these variables is that you set them once and for all to the
	1403	defaults you want, and then do not change them again. To specify a
	1404	particular coding system for a particular operation in a Lisp program,
	1405	don't change these variables; instead, override them using
	1406	@code{coding-system-for-read} and @code{coding-system-for-write}
	1407	(@pxref{Specifying Coding Systems}).
	1408
	1409	@cindex file contents, and default coding system
	1410	@defopt auto-coding-regexp-alist
	1411	This variable is an alist of text patterns and corresponding coding
	1412	systems. Each element has the form @code{(@var{regexp}
	1413	. @var{coding-system})}; a file whose first few kilobytes match
	1414	@var{regexp} is decoded with @var{coding-system} when its contents are
	1415	read into a buffer. The settings in this alist take priority over
	1416	@code{coding:} tags in the files and the contents of
	1417	@code{file-coding-system-alist} (see below). The default value is set
	1418	so that Emacs automatically recognizes mail files in Babyl format and
	1419	reads them with no code conversions.
	1420	@end defopt
	1421
	1422	@cindex file name, and default coding system
	1423	@defopt file-coding-system-alist
	1424	This variable is an alist that specifies the coding systems to use for
	1425	reading and writing particular files. Each element has the form
	1426	@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
	1427	expression that matches certain file names. The element applies to file
	1428	names that match @var{pattern}.
	1429
	1430	The @sc{cdr} of the element, @var{coding}, should be either a coding
	1431	system, a cons cell containing two coding systems, or a function name (a
	1432	symbol with a function definition). If @var{coding} is a coding system,
	1433	that coding system is used for both reading the file and writing it. If
	1434	@var{coding} is a cons cell containing two coding systems, its @sc{car}
	1435	specifies the coding system for decoding, and its @sc{cdr} specifies the
	1436	coding system for encoding.
	1437
	1438	If @var{coding} is a function name, the function should take one
	1439	argument, a list of all arguments passed to
	1440	@code{find-operation-coding-system}. It must return a coding system
	1441	or a cons cell containing two coding systems. This value has the same
	1442	meaning as described above.
	1443
	1444	If @var{coding} (or the value returned by the above function) is
	1445	@code{undecided}, the normal code-detection is performed.
	1446	@end defopt
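
		The function form of @var{coding} can be sketched as follows. The
		function @code{my-pick-coding} and the file-name pattern are
		hypothetical, and the alist is bound with @code{let} only so that the
		illustration does not change your settings.

		@example
		@group
		;; `my-pick-coding' is a hypothetical function for this sketch.
		(defun my-pick-coding (args)
		  "Return coding systems for the operation described by ARGS.
		ARGS is the list of all arguments that were passed to
		`find-operation-coding-system', so its car is the operation."
		  (if (eq (car args) 'write-region)
		      '(undecided . utf-8-unix)
		    'undecided))

		(let ((file-coding-system-alist
		       (cons '("\\.myext\\'" . my-pick-coding)
		             file-coding-system-alist)))
		  (find-operation-coding-system
		   'write-region 1 10 "/tmp/sample.myext"))
		@result{} (undecided . utf-8-unix)
		@end group
		@end example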
	1447
	1448	@defopt auto-coding-alist
	1449	This variable is an alist that specifies the coding systems to use for
	1450	reading and writing particular files. Its form is like that of
	1451	@code{file-coding-system-alist}, but, unlike the latter, this variable
	1452	takes priority over any @code{coding:} tags in the file.
	1453	@end defopt
	1454
	1455	@cindex program name, and default coding system
	1456	@defvar process-coding-system-alist
	1457	This variable is an alist specifying which coding systems to use for a
	1458	subprocess, depending on which program is running in the subprocess. It
	1459	works like @code{file-coding-system-alist}, except that @var{pattern} is
	1460	matched against the program name used to start the subprocess. The coding
	1461	system or systems specified in this alist are used to initialize the
	1462	coding systems used for I/O to the subprocess, but you can specify
	1463	other coding systems later using @code{set-process-coding-system}.
	1464	@end defvar
	1465
	1466	@strong{Warning:} Coding systems such as @code{undecided}, which
	1467	determine the coding system from the data, do not work entirely reliably
	1468	with asynchronous subprocess output. This is because Emacs handles
	1469	asynchronous subprocess output in batches, as it arrives. If the coding
	1470	system leaves the character code conversion unspecified, or leaves the
	1471	end-of-line conversion unspecified, Emacs must try to detect the proper
	1472	conversion from one batch at a time, and this does not always work.
	1473
	1474	Therefore, with an asynchronous subprocess, if at all possible, use a
	1475	coding system which determines both the character code conversion and
	1476	the end of line conversion---that is, one like @code{latin-1-unix},
	1477	rather than @code{undecided} or @code{latin-1}.
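
		As an illustration of this advice, the sketch below starts an
		asynchronous subprocess and then fixes both conversions explicitly.
		It assumes that a program named @code{cat} is available; the process
		name @code{my-cat} is arbitrary.

		@example
		@group
		;; Assumes a `cat' program exists; "my-cat" is an arbitrary name.
		(let ((proc (start-process "my-cat" nil "cat")))
		  ;; Specify character code and end-of-line conversion together.
		  (set-process-coding-system proc 'utf-8-unix 'utf-8-unix)
		  ;; Return the pair now in effect, then clean up.
		  (prog1 (process-coding-system proc)
		    (delete-process proc)))
		@end group
		@end example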
	1478
	1479	@cindex port number, and default coding system
	1480	@cindex network service name, and default coding system
	1481	@defvar network-coding-system-alist
	1482	This variable is an alist that specifies the coding system to use for
	1483	network streams. It works much like @code{file-coding-system-alist},
	1484	with the difference that the @var{pattern} in an element may be either a
	1485	port number or a regular expression. If it is a regular expression, it
	1486	is matched against the network service name used to open the network
	1487	stream.
	1488	@end defvar
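
		The two kinds of @var{pattern} can be sketched like this. The port
		numbers and the service name @code{my-service} are made up for the
		illustration; the lookup uses @code{find-operation-coding-system},
		so no connection is actually opened.

		@example
		@group
		;; Port 8080 and "my-service" are illustrative values only.
		(let ((network-coding-system-alist
		       '((8080 . utf-8)
		         ("\\`my-service\\'" . (latin-1 . latin-1)))))
		  (list (find-operation-coding-system
		         'open-network-stream "conn" nil "localhost" 8080)
		        (find-operation-coding-system
		         'open-network-stream "conn" nil "localhost" "my-service")
		        (find-operation-coding-system
		         'open-network-stream "conn" nil "localhost" 9999)))
		@result{} ((utf-8 . utf-8) (latin-1 . latin-1) nil)
		@end group
		@end example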
	1489
	1490	@defvar default-process-coding-system
	1491	This variable specifies the coding systems to use for subprocess (and
	1492	network stream) input and output, when nothing else specifies what to
	1493	do.
	1494
	1495	The value should be a cons cell of the form @code{(@var{input-coding}
	1496	. @var{output-coding})}. Here @var{input-coding} applies to input from
	1497	the subprocess, and @var{output-coding} applies to output to it.
	1498	@end defvar
	1499
	1500	@cindex default coding system, functions to determine
	1501	@defopt auto-coding-functions
	1502	This variable holds a list of functions that try to determine a
	1503	coding system for a file based on its undecoded contents.
	1504
	1505	Each function in this list should be written to look at text in the
	1506	current buffer, but should not modify it in any way. The buffer will
	1507	contain undecoded text of parts of the file. Each function should
	1508	take one argument, @var{size}, which tells it how many characters to
	1509	look at, starting from point. If the function succeeds in determining
	1510	a coding system for the file, it should return that coding system.
	1511	Otherwise, it should return @code{nil}.
	1512
	1513	If a file has a @samp{coding:} tag, that takes precedence, so these
	1514	functions won't be called.
	1515	@end defopt
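
		A function suitable for this list might look like the sketch below.
		Both the function name @code{my-detect-utf-8-marker} and the marker
		line it searches for are hypothetical; a real function would look for
		whatever signature its file format defines.

		@example
		@group
		;; Hypothetical detector: the marker line is invented for this sketch.
		(defun my-detect-utf-8-marker (size)
		  "Return `utf-8' if a marker line occurs within SIZE chars of point.
		Return nil otherwise.  The buffer is searched but never modified."
		  (save-excursion
		    (let ((limit (min (point-max) (+ (point) size))))
		      (when (re-search-forward "^#!my-format utf-8$" limit t)
		        'utf-8))))

		(add-to-list 'auto-coding-functions #'my-detect-utf-8-marker)
		@end group
		@end example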
	1516
	1517	@defun find-auto-coding filename size
	1518	This function tries to determine a suitable coding system for
	1519	@var{filename}. It examines the buffer visiting the named file, using
	1520	the variables documented above in sequence, until it finds a match for
	1521	one of the rules specified by these variables. It then returns a cons
	1522	cell of the form @code{(@var{coding} . @var{source})}, where
	1523	@var{coding} is the coding system to use and @var{source} is a symbol,
	1524	one of @code{auto-coding-alist}, @code{auto-coding-regexp-alist},
	1525	@code{:coding}, or @code{auto-coding-functions}, indicating which one
	1526	supplied the matching rule. The value @code{:coding} means the coding
	1527	system was specified by the @code{coding:} tag in the file
	1528	(@pxref{Specify Coding,, coding tag, emacs, The GNU Emacs Manual}).
	1529	The order of looking for a matching rule is @code{auto-coding-alist}
	1530	first, then @code{auto-coding-regexp-alist}, then the @code{coding:}
	1531	tag, and lastly @code{auto-coding-functions}. If no matching rule was
	1532	found, the function returns @code{nil}.
	1533
	1534	The second argument @var{size} is the size of text, in characters,
	1535	following point. The function examines text only within @var{size}
	1536	characters after point. Normally, the buffer should be positioned at
	1537	the beginning when this function is called, because one of the places
	1538	for the @code{coding:} tag is the first one or two lines of the file;
	1539	in that case, @var{size} should be the size of the buffer.
	1540	@end defun
	1541
	1542	@defun set-auto-coding filename size
	1543	This function returns a suitable coding system for file
	1544	@var{filename}. It uses @code{find-auto-coding} to find the coding
	1545	system. If no coding system could be determined, the function returns
	1546	@code{nil}. The meaning of the argument @var{size} is the same as in
	1547	@code{find-auto-coding}.
	1548	@end defun
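
		The sketch below shows how these two functions might be called on a
		buffer that holds a @code{coding:} tag in its first line. The file
		name @file{example.txt} is made up and need not exist; the result
		depends on the values of the variables described above.

		@example
		@group
		;; "example.txt" is a made-up name; only the buffer text is examined.
		(with-temp-buffer
		  (insert ";; -*- coding: latin-1 -*-\n")
		  ;; Point must be at the beginning so the first line is seen.
		  (goto-char (point-min))
		  (list (find-auto-coding "example.txt" (buffer-size))
		        (set-auto-coding "example.txt" (buffer-size))))
		@end group
		@end example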
	1549
	1550	@defun find-operation-coding-system operation &rest arguments
	1551	This function returns the coding system to use (by default) for
	1552	performing @var{operation} with @var{arguments}. The value has this
	1553	form:
	1554
	1555	@example
	1556	(@var{decoding-system} . @var{encoding-system})
	1557	@end example
	1558
	1559	The first element, @var{decoding-system}, is the coding system to use
	1560	for decoding (in case @var{operation} does decoding), and
	1561	@var{encoding-system} is the coding system for encoding (in case
	1562	@var{operation} does encoding).
	1563
	1564	The argument @var{operation} is a symbol; it should be one of
	1565	@code{write-region}, @code{start-process}, @code{call-process},
	1566	@code{call-process-region}, @code{insert-file-contents}, or
	1567	@code{open-network-stream}. These are the names of the Emacs I/O
	1568	primitives that can do character code and eol conversion.
	1569
	1570	The remaining arguments should be the same arguments that might be given
	1571	to the corresponding I/O primitive. Depending on the primitive, one
	1572	of those arguments is selected as the @dfn{target}. For example, if
	1573	@var{operation} does file I/O, whichever argument specifies the file
	1574	name is the target. For subprocess primitives, the process name is the
	1575	target. For @code{open-network-stream}, the target is the service name
	1576	or port number.
	1577
	1578	Depending on @var{operation}, this function looks up the target in
	1579	@code{file-coding-system-alist}, @code{process-coding-system-alist},
	1580	or @code{network-coding-system-alist}. If the target is found in the
	1581	alist, @code{find-operation-coding-system} returns its association in
	1582	the alist; otherwise it returns @code{nil}.
	1583
	1584	If @var{operation} is @code{insert-file-contents}, the argument
	1585	corresponding to the target may be a cons cell of the form
	1586	@code{(@var{filename} . @var{buffer})}. In that case, @var{filename}
	1587	is a file name to look up in @code{file-coding-system-alist}, and
	1588	@var{buffer} is a buffer that contains the file's contents (not yet
	1589	decoded). If @code{file-coding-system-alist} specifies a function to
	1590	call for this file, and that function needs to examine the file's
	1591	contents (as it usually does), it should examine the contents of
	1592	@var{buffer} instead of reading the file.
	1593	@end defun
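
		As a small illustration of the lookup, the sketch below binds
		@code{file-coding-system-alist} to two made-up entries and asks which
		coding systems @code{insert-file-contents} would use. The file names
		are hypothetical and need not exist.

		@example
		@group
		;; The patterns and file names are invented for this sketch.
		(let ((file-coding-system-alist
		       '(("\\.l1\\'" . latin-1)
		         ("\\.mix\\'" . (latin-1 . utf-8-unix)))))
		  (list (find-operation-coding-system
		         'insert-file-contents "/tmp/a.l1")
		        (find-operation-coding-system
		         'insert-file-contents "/tmp/a.mix")
		        (find-operation-coding-system
		         'insert-file-contents "/tmp/a.txt")))
		@result{} ((latin-1 . latin-1) (latin-1 . utf-8-unix) nil)
		@end group
		@end example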
	1594
	1595	@node Specifying Coding Systems
	1596	@subsection Specifying a Coding System for One Operation
	1597
	1598	You can specify the coding system for a specific operation by binding
	1599	the variables @code{coding-system-for-read} and/or
	1600	@code{coding-system-for-write}.
	1601
	1602	@defvar coding-system-for-read
	1603	If this variable is non-@code{nil}, it specifies the coding system to
	1604	use for reading a file, or for input from a synchronous subprocess.
	1605
	1606	It also applies to any asynchronous subprocess or network stream, but in
	1607	a different way: the value of @code{coding-system-for-read} when you
	1608	start the subprocess or open the network stream specifies the input
	1609	decoding method for that subprocess or network stream. It remains in
	1610	use for that subprocess or network stream unless and until overridden.
	1611
	1612	The right way to use this variable is to bind it with @code{let} for a
	1613	specific I/O operation. Its global value is normally @code{nil}, and
	1614	you should not globally set it to any other value. Here is an example
	1615	of the right way to use the variable:
	1616
	1617	@example
	1618	;; @r{Read the file with no character code conversion.}
	1619	(let ((coding-system-for-read 'no-conversion))
	1620	(insert-file-contents filename))
	1621	@end example
	1622
	1623	When its value is non-@code{nil}, this variable takes precedence over
	1624	all other methods of specifying a coding system to use for input,
	1625	including @code{file-coding-system-alist},
	1626	@code{process-coding-system-alist} and
	1627	@code{network-coding-system-alist}.
	1628	@end defvar
	1629
	1630	@defvar coding-system-for-write
	1631	This works much like @code{coding-system-for-read}, except that it
	1632	applies to output rather than input. It affects writing to files,
	1633	as well as sending output to subprocesses and net connections.
	1634
	1635	When a single operation does both input and output, as do
	1636	@code{call-process-region} and @code{start-process}, both
	1637	@code{coding-system-for-read} and @code{coding-system-for-write}
	1638	affect it.
	1639	@end defvar
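
		By analogy with the example for @code{coding-system-for-read}, here
		is a sketch of binding @code{coding-system-for-write} around a single
		output operation. The helper name @code{my-write-utf-8} is
		hypothetical.

		@example
		@group
		;; `my-write-utf-8' is a hypothetical helper for this sketch.
		(defun my-write-utf-8 (text file)
		  "Write the string TEXT to FILE as UTF-8 with Unix line endings."
		  (let ((coding-system-for-write 'utf-8-unix))
		    ;; When START is a string, `write-region' writes that string.
		    (write-region text nil file)))
		@end group
		@end example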
	1640
	1641	@defopt inhibit-eol-conversion
	1642	When this variable is non-@code{nil}, no end-of-line conversion is done,
	1643	no matter which coding system is specified. This applies to all the
	1644	Emacs I/O and subprocess primitives, and to the explicit encoding and
	1645	decoding functions (@pxref{Explicit Encoding}).
	1646	@end defopt
	1647
	1648	@cindex priority order of coding systems
	1649	@cindex coding systems, priority
	1650	Sometimes, you need to prefer several coding systems for some
	1651	operation, rather than fix a single one. Emacs lets you specify a
	1652	priority order for using coding systems. This ordering affects the
	1653	sorting of lists of coding systems returned by functions such as
	1654	@code{find-coding-systems-region} (@pxref{Lisp and Coding Systems}).
	1655
	1656	@defun coding-system-priority-list &optional highestp
	1657	This function returns the list of coding systems in the order of their
	1658	current priorities. Optional argument @var{highestp}, if
	1659	non-@code{nil}, means return only the highest priority coding system.
	1660	@end defun
	1661
	1662	@defun set-coding-system-priority &rest coding-systems
	1663	This function puts @var{coding-systems} at the beginning of the
	1664	priority list for coding systems, thus making their priority higher
	1665	than all the rest.
	1666	@end defun
	1667
	1668	@defmac with-coding-priority coding-systems &rest body@dots{}
	1669	This macro executes @var{body}, like @code{progn} does
	1670	(@pxref{Sequencing, progn}), with @var{coding-systems} at the front of
	1671	the priority list for coding systems. @var{coding-systems} should be
	1672	a list of coding systems to prefer during execution of @var{body}.
	1673	@end defmac
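
		A minimal sketch of temporarily changing the priority order follows.
		The particular coding systems are only an example; note that the list
		is quoted, since the macro is given a list of coding systems.

		@example
		@group
		;; Prefer UTF-8, then Latin-1, only while the body runs.
		(with-coding-priority '(utf-8 iso-latin-1)
		  ;; Ask for the highest-priority coding system inside the body.
		  (coding-system-priority-list t))
		@end group
		@end example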
	1674
	1675	@node Explicit Encoding
	1676	@subsection Explicit Encoding and Decoding
	1677	@cindex encoding in coding systems
	1678	@cindex decoding in coding systems
	1679
	1680	All the operations that transfer text in and out of Emacs have the
	1681	ability to use a coding system to encode or decode the text.
	1682	You can also explicitly encode and decode text using the functions
	1683	in this section.
	1684
	1685	The result of encoding, and the input to decoding, are not ordinary
	1686	text. They logically consist of a series of byte values; that is, a
	1687	series of @acronym{ASCII} and eight-bit characters. In unibyte
	1688	buffers and strings, these characters have codes in the range 0
	1689	through #xFF (255). In a multibyte buffer or string, eight-bit
	1690	characters have character codes higher than #xFF (@pxref{Text
	1691	Representations}), but Emacs transparently converts them to their
	1692	single-byte values when you encode or decode such text.
	1693
	1694	The usual way to read a file into a buffer as a sequence of bytes, so
	1695	you can decode the contents explicitly, is with
	1696	@code{insert-file-contents-literally} (@pxref{Reading from Files});
	1697	alternatively, specify a non-@code{nil} @var{rawfile} argument when
	1698	visiting a file with @code{find-file-noselect}. These methods result in
	1699	a unibyte buffer.
	1700
	1701	The usual way to use the byte sequence that results from explicitly
	1702	encoding text is to copy it to a file or process---for example, to write
	1703	it with @code{write-region} (@pxref{Writing to Files}), and suppress
	1704	encoding by binding @code{coding-system-for-write} to
	1705	@code{no-conversion}.
	1706
	1707	Here are the functions to perform explicit encoding or decoding. The
	1708	encoding functions produce sequences of bytes; the decoding functions
	1709	are meant to operate on sequences of bytes. All of these functions
	1710	discard text properties. They also set @code{last-coding-system-used}
	1711	to the precise coding system they used.
	1712
	1713	@deffn Command encode-coding-region start end coding-system &optional destination
	1714	This command encodes the text from @var{start} to @var{end} according
	1715	to coding system @var{coding-system}. Normally, the encoded text
	1716	replaces the original text in the buffer, but the optional argument
	1717	@var{destination} can change that. If @var{destination} is a buffer,
	1718	the encoded text is inserted in that buffer after point (point does
	1719	not move); if it is @code{t}, the command returns the encoded text as
	1720	a unibyte string without inserting it.
	1721
	1722	If encoded text is inserted in some buffer, this command returns the
	1723	length of the encoded text.
	1724
	1725	The result of encoding is logically a sequence of bytes, but the
	1726	buffer remains multibyte if it was multibyte before, and any 8-bit
	1727	bytes are converted to their multibyte representation (@pxref{Text
	1728	Representations}).
	1729
	1730	@cindex @code{undecided} coding-system, when encoding
	1731	Do @emph{not} use @code{undecided} for @var{coding-system} when
	1732	encoding text, since that may lead to unexpected results. Instead,
	1733	use @code{select-safe-coding-system} (@pxref{User-Chosen Coding
	1734	Systems, select-safe-coding-system}) to suggest a suitable encoding,
	1735	if there's no obvious pertinent value for @var{coding-system}.
	1736	@end deffn
	1737
	1738	@defun encode-coding-string string coding-system &optional nocopy buffer
	1739	This function encodes the text in @var{string} according to coding
	1740	system @var{coding-system}. It returns a new string containing the
	1741	encoded text, except when @var{nocopy} is non-@code{nil}, in which
	1742	case the function may return @var{string} itself if the encoding
	1743	operation is trivial. The result of encoding is a unibyte string.
	1744	@end defun
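
		For instance, encoding a string that contains the character U+00E9
		(written here with the @samp{\u} escape so that the example stays in
		@acronym{ASCII}) gives different byte sequences for different coding
		systems:

		@example
		@group
		(encode-coding-string "caf\u00e9" 'utf-8)
		@result{} "caf\303\251"
		(encode-coding-string "caf\u00e9" 'latin-1)
		@result{} "caf\351"
		;; The result is a unibyte string.
		(multibyte-string-p (encode-coding-string "caf\u00e9" 'utf-8))
		@result{} nil
		@end group
		@end example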
	1745
	1746	@deffn Command decode-coding-region start end coding-system &optional destination
	1747	This command decodes the text from @var{start} to @var{end} according
	1748	to coding system @var{coding-system}. To make explicit decoding
	1749	useful, the text before decoding ought to be a sequence of byte
	1750	values, but both multibyte and unibyte buffers are acceptable (in the
	1751	multibyte case, the raw byte values should be represented as eight-bit
	1752	characters). Normally, the decoded text replaces the original text in
	1753	the buffer, but the optional argument @var{destination} can change
	1754	that. If @var{destination} is a buffer, the decoded text is inserted
	1755	in that buffer after point (point does not move); if it is @code{t},
	1756	the command returns the decoded text as a multibyte string without
	1757	inserting it.
	1758
	1759	If decoded text is inserted in some buffer, this command returns the
	1760	length of the decoded text.
	1761
	1762	This command puts a @code{charset} text property on the decoded text.
	1763	The value of the property states the character set used to decode the
	1764	original text.
	1765	@end deffn
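
		The following sketch combines @code{insert-file-contents-literally}
		with @code{decode-coding-region} to read a file as bytes and decode it
		explicitly. The helper name @code{my-read-file-as-utf-8} is
		hypothetical, and the choice of @code{utf-8} is an assumption about
		the file's encoding.

		@example
		@group
		;; `my-read-file-as-utf-8' is a hypothetical helper for this sketch.
		(defun my-read-file-as-utf-8 (file)
		  "Return the contents of FILE decoded as UTF-8, as a string."
		  (with-temp-buffer
		    (set-buffer-multibyte nil)
		    ;; Read the raw bytes, with no decoding at all.
		    (insert-file-contents-literally file)
		    ;; A DESTINATION of t returns the decoded text as a string.
		    (decode-coding-region (point-min) (point-max) 'utf-8 t)))
		@end group
		@end example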
	1766
	1767	@defun decode-coding-string string coding-system &optional nocopy buffer
	1768	This function decodes the text in @var{string} according to
	1769	@var{coding-system}. It returns a new string containing the decoded
	1770	text, except when @var{nocopy} is non-@code{nil}, in which case the
	1771	function may return @var{string} itself if the decoding operation is
	1772	trivial. To make explicit decoding useful, the contents of
	1773	@var{string} ought to be a unibyte string with a sequence of byte
	1774	values, but a multibyte string is also acceptable (assuming it
	1775	contains 8-bit bytes in their multibyte form).
	1776
	1777	If optional argument @var{buffer} specifies a buffer, the decoded text
	1778	is inserted in that buffer after point (point does not move). In this
	1779	case, the return value is the length of the decoded text.
	1780
	1781	@cindex @code{charset}, text property
	1782	This function puts a @code{charset} text property on the decoded text.
	1783	The value of the property states the character set used to decode the
	1784	original text:
	1785
	1786	@example
	1787	@group
	1788	(decode-coding-string "Gr\374ss Gott" 'latin-1)
	1789	@result{} #("Gr@"uss Gott" 0 9 (charset iso-8859-1))
	1790	@end group
	1791	@end example
	1792	@end defun
	1793
	1794	@defun decode-coding-inserted-region from to filename &optional visit beg end replace
	1795	This function decodes the text from @var{from} to @var{to} as if
	1796	it were being read from file @var{filename} by @code{insert-file-contents},
	1797	called with the rest of the arguments provided.
	1798
	1799	The normal way to use this function is after reading text from a file
	1800	without decoding, if you decide you would rather have decoded it.
	1801	Instead of deleting the text and reading it again, this time with
	1802	decoding, you can call this function.
	1803	@end defun
	1804
	1805	@node Terminal I/O Encoding
	1806	@subsection Terminal I/O Encoding
	1807
	1808	Emacs can use coding systems to decode keyboard input and encode
	1809	terminal output. This is useful for terminals that transmit or
	1810	display text using a particular encoding, such as Latin-1. Emacs does
	1811	not set @code{last-coding-system-used} when encoding or decoding
	1812	terminal I/O.
	1813
	1814	@defun keyboard-coding-system &optional terminal
	1815	This function returns the coding system used for decoding keyboard
	1816	input from @var{terminal}. A value of @code{no-conversion} means no
	1817	decoding is done. If @var{terminal} is omitted or @code{nil}, it
	1818	means the selected frame's terminal. @xref{Multiple Terminals}.
	1819	@end defun
	1820
	1821	@deffn Command set-keyboard-coding-system coding-system &optional terminal
	1822	This command specifies @var{coding-system} as the coding system to use
	1823	for decoding keyboard input from @var{terminal}. If
	1824	@var{coding-system} is @code{nil}, that means not to decode keyboard
	1825	input. If @var{terminal} is a frame, it means that frame's terminal;
	1826	if it is @code{nil}, that means the currently selected frame's
	1827	terminal. @xref{Multiple Terminals}.
	1828	@end deffn
	1829
	1830	@defun terminal-coding-system &optional terminal
	1831	This function returns the coding system that is in use for encoding
	1832	terminal output from @var{terminal}. A value of @code{no-conversion}
	1833	means no encoding is done. If @var{terminal} is a frame, it means
	1834	that frame's terminal; if it is @code{nil}, that means the currently
	1835	selected frame's terminal.
	1836	@end defun
	1837
	1838	@deffn Command set-terminal-coding-system coding-system &optional terminal
	1839	This command specifies @var{coding-system} as the coding system to use
	1840	for encoding terminal output from @var{terminal}. If
	1841	@var{coding-system} is @code{nil}, that means not to encode terminal
	1842	output. If @var{terminal} is a frame, it means that frame's terminal;
	1843	if it is @code{nil}, that means the currently selected frame's
	1844	terminal.
	1845	@end deffn
	1846
	1847	@node Input Methods
	1848	@section Input Methods
	1849	@cindex input methods
	1850
	1851	@dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII}
	1852	characters from the keyboard. Unlike coding systems, which translate
	1853	non-@acronym{ASCII} characters to and from encodings meant to be read by
	1854	programs, input methods provide human-friendly commands. (@xref{Input
	1855	Methods,,, emacs, The GNU Emacs Manual}, for information on how users
	1856	use input methods to enter text.) How to define input methods is not
	1857	yet documented in this manual, but here we describe how to use them.
	1858
	1859	Each input method has a name, which is currently a string;
	1860	in the future, symbols may also be usable as input method names.
	1861
	1862	@defvar current-input-method
	1863	This variable holds the name of the input method now active in the
	1864	current buffer. (It automatically becomes local in each buffer when set
	1865	in any fashion.) It is @code{nil} if no input method is active in the
	1866	buffer now.
	1867	@end defvar
	1868
	1869	@defopt default-input-method
	1870	This variable holds the default input method for commands that choose an
	1871	input method. Unlike @code{current-input-method}, this variable is
	1872	normally global.
	1873	@end defopt
	1874
	1875	@deffn Command set-input-method input-method
	1876	This command activates input method @var{input-method} for the current
	1877	buffer. It also sets @code{default-input-method} to @var{input-method}.
	1878	If @var{input-method} is @code{nil}, this command deactivates any input
	1879	method for the current buffer.
	1880	@end deffn
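
		A sketch of activating an input method from Lisp follows. It assumes
		that the input method @code{"latin-1-prefix"} is available, as it is
		in a stock Emacs; @code{default-input-method} is rebound with
		@code{let} so that the illustration leaves the user's default alone.

		@example
		@group
		;; Assumes the "latin-1-prefix" input method is installed.
		(with-temp-buffer
		  (let ((default-input-method default-input-method))
		    (set-input-method "latin-1-prefix")
		    ;; Now the name of the method active in this buffer.
		    current-input-method))
		@end group
		@end example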
	1881
	1882	@defun read-input-method-name prompt &optional default inhibit-null
	1883	This function reads an input method name with the minibuffer, prompting
	1884	with @var{prompt}. If @var{default} is non-@code{nil}, that is the value
	1885	returned if the user enters empty input. However, if
	1886	@var{inhibit-null} is non-@code{nil}, empty input signals an error.
	1887
	1888	The returned value is a string.
	1889	@end defun
	1890
	1891	@defvar input-method-alist
	1892	This variable defines all the supported input methods.
	1893	Each element defines one input method, and should have the form:
	1894
	1895	@example
	1896	(@var{input-method} @var{language-env} @var{activate-func}
	1897	@var{title} @var{description} @var{args}...)
	1898	@end example
	1899
	1900	Here @var{input-method} is the input method name, a string;
	1901	@var{language-env} is another string, the name of the language
	1902	environment this input method is recommended for. (That serves only for
	1903	documentation purposes.)
	1904
	1905	@var{activate-func} is a function to call to activate this method. The
	1906	@var{args}, if any, are passed as arguments to @var{activate-func}. All
	1907	told, the arguments to @var{activate-func} are @var{input-method} and
	1908	the @var{args}.
	1909
	1910	@var{title} is a string to display in the mode line while this method is
	1911	active. @var{description} is a string describing this method and what
	1912	it is good for.
	1913	@end defvar
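
		Since each element begins with the input method name followed by the
		language environment, the alist can be filtered as in this sketch.
		The helper name @code{my-input-methods-for} is hypothetical, and the
		language environment @code{"Latin-1"} is only an example.

		@example
		@group
		;; `my-input-methods-for' is a hypothetical helper for this sketch.
		(defun my-input-methods-for (language-env)
		  "Return the names of input methods recommended for LANGUAGE-ENV.
		LANGUAGE-ENV is a string; the names are returned in alist order."
		  (let (names)
		    (dolist (entry input-method-alist (nreverse names))
		      (when (equal (nth 1 entry) language-env)
		        (push (car entry) names)))))

		(my-input-methods-for "Latin-1")
		@end group
		@end example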
	1914
	1915	The fundamental interface to input methods is through the
	1916	variable @code{input-method-function}. @xref{Reading One Event},
	1917	and @ref{Invoking the Input Method}.
	1918
	1919	@node Locales
	1920	@section Locales
	1921	@cindex locale
	1922
	1923	POSIX defines a concept of ``locales'' which control which language
	1924	to use in language-related features. These Emacs variables control
	1925	how Emacs interacts with these features.
	1926
	1927	@defvar locale-coding-system
	1928	@cindex keyboard input decoding on X
	1929	This variable specifies the coding system to use for decoding system
	1930	error messages and---on X Window system only---keyboard input, for
	1931	encoding the format argument to @code{format-time-string}, and for
	1932	decoding the return value of @code{format-time-string}.
	1933	@end defvar
	1934
	1935	@defvar system-messages-locale
	1936	This variable specifies the locale to use for generating system error
	1937	messages. Changing the locale can cause messages to come out in a
	1938	different language or in a different orthography. If the variable is
	1939	@code{nil}, the locale is specified by environment variables in the
	1940	usual POSIX fashion.
	1941	@end defvar
	1942
	1943	@defvar system-time-locale
	1944	This variable specifies the locale to use for formatting time values.
	1945	Changing the locale can cause messages to appear according to the
	1946	conventions of a different language. If the variable is @code{nil}, the
	1947	locale is specified by environment variables in the usual POSIX fashion.
	1948	@end defvar
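
		For example, a program that needs English day and month names
		regardless of the user's settings might bind
		@code{system-time-locale} around the call, as in this sketch. The
		time value @code{(0 0)} denotes the start of 1970 in Universal Time
		and is used only to make the result predictable.

		@example
		@group
		;; Bind the locale only for this one formatting operation.
		(let ((system-time-locale "C"))
		  (format-time-string "%A %B" '(0 0) t))
		@result{} "Thursday January"
		@end group
		@end example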
	1949
	1950	@defun locale-info item
	1951	This function returns locale data @var{item} for the current POSIX
	1952	locale, if available. @var{item} should be one of these symbols:
	1953
	1954	@table @code
	1955	@item codeset
	1956	Return the character set as a string (locale item @code{CODESET}).
	1957
	1958	@item days
	1959	Return a 7-element vector of day names (locale items
	1960	@code{DAY_1} through @code{DAY_7}).
	1961
	1962	@item months
	1963	Return a 12-element vector of month names (locale items @code{MON_1}
	1964	through @code{MON_12}).
	1965
	1966	@item paper
	1967	Return a list @code{(@var{width} @var{height})} for the default paper
	1968	size measured in millimeters (locale items @code{PAPER_WIDTH} and
	1969	@code{PAPER_HEIGHT}).
	1970	@end table
	1971
	1972	If the system can't provide the requested information, or if
	1973	@var{item} is not one of those symbols, the value is @code{nil}. All
	1974	strings in the return value are decoded using
	1975	@code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual},
	1976	for more information about locales and locale items.
	1977	@end defun
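
		Because the value can be @code{nil} when the system cannot supply an
		item, callers usually need to check for that case, as in this sketch.
		The helper name @code{my-locale-day-names} is hypothetical.

		@example
		@group
		;; `my-locale-day-names' is a hypothetical helper for this sketch.
		(defun my-locale-day-names ()
		  "Return the locale's day names as a list, or nil if unavailable."
		  (let ((days (locale-info 'days)))
		    ;; `append' with a final nil converts the vector to a list.
		    (and days (append days nil))))

		(my-locale-day-names)
		@end group
		@end example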