	1	@c -*-texinfo-*-
	2	@c This is part of the GNU Emacs Lisp Reference Manual.
	3	@c Copyright (C) 1998-1999, 2001-2013 Free Software Foundation, Inc.
	4	@c See the file elisp.texi for copying conditions.
	5	@node Non-ASCII Characters
	6	@chapter Non-@acronym{ASCII} Characters
	7	@cindex multibyte characters
	8	@cindex characters, multi-byte
	9	@cindex non-@acronym{ASCII} characters
	10
	11	This chapter covers the special issues relating to characters and
	12	how they are stored in strings and buffers.
	13
	14	@menu
	15	* Text Representations:: How Emacs represents text.
	16	* Disabling Multibyte:: Controlling whether to use multibyte characters.
	17	* Converting Representations:: Converting unibyte to multibyte and vice versa.
	18	* Selecting a Representation:: Treating a byte sequence as unibyte or multi.
	19	* Character Codes:: How unibyte and multibyte relate to
	20	codes of individual characters.
	21	* Character Properties:: Character attributes that define their
	22	behavior and handling.
	23	* Character Sets:: The space of possible character codes
	24	is divided into various character sets.
	25	* Scanning Charsets:: Which character sets are used in a buffer?
	26	* Translation of Characters:: Translation tables are used for conversion.
	27	* Coding Systems:: Coding systems are conversions for saving files.
	28	* Input Methods:: Input methods allow users to enter various
	29	non-ASCII characters without special keyboards.
	30	* Locales:: Interacting with the POSIX locale.
	31	@end menu
	32
	33	@node Text Representations
	34	@section Text Representations
	35	@cindex text representation
	36
	37	Emacs buffers and strings support a large repertoire of characters
	38	from many different scripts, allowing users to type and display text
	39	in almost any known written language.
	40
	41	@cindex character codepoint
	42	@cindex codespace
	43	@cindex Unicode
	44	To support this multitude of characters and scripts, Emacs closely
	45	follows the @dfn{Unicode Standard}. The Unicode Standard assigns a
	46	unique number, called a @dfn{codepoint}, to each and every character.
	47	The range of codepoints defined by Unicode, or the Unicode
	48	@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
	49	inclusive. Emacs extends this range with codepoints in the range
	50	@code{#x110000..#x3FFFFF}, which it uses for representing characters
	51	that are not unified with Unicode and @dfn{raw 8-bit bytes} that
	52	cannot be interpreted as characters. Thus, a character codepoint in
	53	Emacs is a 22-bit integer number.
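
As a quick check of these ranges, the following sketch (which relies on
the function @code{max-char}, described in @ref{Character Codes})
confirms that the largest codepoint is @code{#x3FFFFF}, which is also
the largest value that fits in 22 bits:

@example
@group
(= (max-char) #x3FFFFF)
@result{} t
@end group
@group
(= (max-char) (1- (expt 2 22)))
@result{} t
@end group
@end example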
	54
	55	@cindex internal representation of characters
	56	@cindex characters, representation in buffers and strings
	57	@cindex multibyte text
	58	To conserve memory, Emacs does not store the codepoints of text
	59	characters in buffers and strings as fixed-length 22-bit numbers.
	60	Rather, Emacs uses a variable-length internal representation of
	61	characters that stores each character as a sequence of 1 to 5 8-bit
	62	bytes, depending on the magnitude of its codepoint@footnote{
	63	This internal representation is based on one of the encodings defined
	64	by the Unicode Standard, called @dfn{UTF-8}, for representing any
	65	Unicode codepoint, but Emacs extends UTF-8 to represent the additional
	66	codepoints it uses for raw 8-bit bytes and characters not unified with
	67	Unicode.}. For example, any @acronym{ASCII} character takes up only 1
	68	byte, a Latin-1 character takes up 2 bytes, etc. We call this
	69	representation of text @dfn{multibyte}.
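
The variable length of this representation can be observed with
@code{string-bytes}, described below. The following is only an
illustrative sketch: the sample characters are arbitrary, and the
@samp{\u} escapes are used to make each string literal multibyte.

@example
@group
;; An ASCII character occupies 1 byte.
(string-bytes "a")
@result{} 1
@end group
@group
;; U+00E9, a Latin-1 character, occupies 2 bytes.
(string-bytes "\u00e9")
@result{} 2
@end group
@group
;; U+20AC (EURO SIGN) occupies 3 bytes.
(string-bytes "\u20ac")
@result{} 3
@end group
@group
;; U+1F600, beyond the Basic Multilingual Plane, occupies 4 bytes.
(string-bytes (string #x1F600))
@result{} 4
@end group
@end example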
	70
	71	Outside Emacs, characters can be represented in many different
	72	encodings, such as ISO-8859-1, GB-2312, Big-5, etc. Emacs converts
	73	between these external encodings and its internal representation, as
	74	appropriate, when it reads text into a buffer or a string, or when it
	75	writes text to a disk file or passes it to some other process.
	76
	77	Occasionally, Emacs needs to hold and manipulate encoded text or
	78	binary non-text data in its buffers or strings. For example, when
	79	Emacs visits a file, it first reads the file's text verbatim into a
	80	buffer, and only then converts it to the internal representation.
	81	Before the conversion, the buffer holds encoded text.
	82
	83	@cindex unibyte text
	84	Encoded text is not really text, as far as Emacs is concerned, but
	85	rather a sequence of raw 8-bit bytes. We call buffers and strings
	86	that hold encoded text @dfn{unibyte} buffers and strings, because
	87	Emacs treats them as a sequence of individual bytes. Usually, Emacs
	88	displays the non-@acronym{ASCII} bytes of unibyte buffers and strings as
	89	octal codes such as @code{\237}. We recommend that you never use unibyte
	90	buffers and strings except for manipulating encoded text or binary non-text data.
	91
	92	In a buffer, the buffer-local value of the variable
	93	@code{enable-multibyte-characters} specifies the representation used.
	94	The representation for a string is determined and recorded in the string
	95	when the string is constructed.
	96
	97	@defvar enable-multibyte-characters
	98	This variable specifies the current buffer's text representation.
	99	If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
	100	it contains unibyte encoded text or binary non-text data.
	101
	102	You cannot set this variable directly; instead, use the function
	103	@code{set-buffer-multibyte} to change a buffer's representation.
	104	@end defvar
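
Here is a minimal sketch of inspecting this variable. It assumes an
Emacs session running in the default multibyte mode, so that a newly
created temporary buffer starts out multibyte:

@example
@group
(with-temp-buffer
  (let ((before enable-multibyte-characters))
    ;; Switch the temporary buffer to unibyte representation.
    (set-buffer-multibyte nil)
    (list before enable-multibyte-characters)))
@result{} (t nil)
@end group
@end example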
	105
	106	@defun position-bytes position
	107	Buffer positions are measured in character units. This function
	108	returns the byte-position corresponding to buffer position
	109	@var{position} in the current buffer. This is 1 at the start of the
	110	buffer, and counts upward in bytes. If @var{position} is out of
	111	range, the value is @code{nil}.
	112	@end defun
	113
	114	@defun byte-to-position byte-position
	115	Return the buffer position, in character units, corresponding to the
	116	given @var{byte-position} in the current buffer. If @var{byte-position}
	117	is out of range, the value is @code{nil}. In a multibyte buffer, an
	118	arbitrary value of @var{byte-position} may fall not on a character
	119	boundary, but inside a multibyte sequence representing a single
	120	character; in this case, this function returns the buffer position of
	121	the character whose multibyte sequence includes @var{byte-position}.
	122	In other words, the value does not change for all byte positions that
	123	belong to the same character.
	124	@end defun
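
The relationship between character positions and byte positions can be
sketched as follows. The buffer text is an arbitrary sample in which
the middle character, U+00E9, occupies 2 bytes; the result for byte
position 3, which lies inside that character, assumes the behavior
described above for positions that are not at a character boundary.

@example
@group
(with-temp-buffer
  (insert "a\u00e9b")
  (list (mapcar #'position-bytes '(1 2 3 4))
        (mapcar #'byte-to-position '(2 3 4))))
@result{} ((1 2 4 5) (2 2 3))
@end group
@end example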
	125
	126	@defun multibyte-string-p string
	127	Return @code{t} if @var{string} is a multibyte string, @code{nil}
	128	otherwise. This function also returns @code{nil} if @var{string} is
	129	some object other than a string.
	130	@end defun
	131
	132	@defun string-bytes string
	133	@cindex string, number of bytes
	134	This function returns the number of bytes in @var{string}.
	135	If @var{string} is a multibyte string, this can be greater than
	136	@code{(length @var{string})}.
	137	@end defun
	138
	139	@defun unibyte-string &rest bytes
	140	This function concatenates all its arguments, @var{bytes}, and makes the
	141	result a unibyte string.
	142	@end defun
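
The following sketch exercises the three functions above. The sample
strings are arbitrary; note that a string literal containing only
@acronym{ASCII} characters is read as a unibyte string.

@example
@group
(multibyte-string-p "abc")
@result{} nil
@end group
@group
(multibyte-string-p "\u00e9")
@result{} t
@end group
@group
;; A symbol is not a string at all.
(multibyte-string-p 'abc)
@result{} nil
@end group
@group
;; Three characters, two of which occupy 2 bytes each.
(let ((s "\u00e9t\u00e9"))
  (list (length s) (string-bytes s)))
@result{} (3 5)
@end group
@group
(unibyte-string 72 105)
@result{} "Hi"
@end group
@group
(multibyte-string-p (unibyte-string 200 201))
@result{} nil
@end group
@end example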
	143
	144	@node Disabling Multibyte
	145	@section Disabling Multibyte Characters
	146	@cindex disabling multibyte
	147
	148	By default, Emacs starts in multibyte mode: it stores the contents
	149	of buffers and strings using an internal encoding that represents
	150	non-@acronym{ASCII} characters using multi-byte sequences. Multibyte
	151	mode allows you to use all the supported languages and scripts without
	152	limitations.
	153
	154	@cindex turn multibyte support on or off
	155	Under very special circumstances, you may want to disable multibyte
	156	character support for a specific buffer.
	157	When multibyte characters are disabled in a buffer, we call
	158	that @dfn{unibyte mode}. In unibyte mode, each character in the
	159	buffer has a character code ranging from 0 through 255 (0377 octal); 0
	160	through 127 (0177 octal) represent @acronym{ASCII} characters, and 128
	161	(0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII}
	162	characters.
	163
	164	To edit a particular file in unibyte representation, visit it using
	165	@code{find-file-literally}. @xref{Visiting Functions}. You can
	166	convert a multibyte buffer to unibyte by saving it to a file, killing
	167	the buffer, and visiting the file again with
	168	@code{find-file-literally}. Alternatively, you can use @kbd{C-x
	169	@key{RET} c} (@code{universal-coding-system-argument}) and specify
	170	@samp{raw-text} as the coding system with which to visit or save a
	171	file. @xref{Text Coding, , Specifying a Coding System for File Text,
	172	emacs, GNU Emacs Manual}. Unlike @code{find-file-literally}, finding
	173	a file as @samp{raw-text} doesn't disable format conversion,
	174	uncompression, or auto mode selection.
	175
	176	@c See http://debbugs.gnu.org/11226 for lack of unibyte tooltip.
	177	@vindex enable-multibyte-characters
	178	The buffer-local variable @code{enable-multibyte-characters} is
	179	non-@code{nil} in multibyte buffers, and @code{nil} in unibyte ones.
	180	The mode line also indicates whether a buffer is multibyte or not.
	181	With a graphical display, in a multibyte buffer, the portion of the
	182	mode line that indicates the character set has a tooltip that (amongst
	183	other things) says that the buffer is multibyte. In a unibyte buffer,
	184	the character set indicator is absent. Thus, in a unibyte buffer
	185	(when using a graphical display) there is normally nothing before the
	186	indication of the visited file's end-of-line convention (colon,
	187	backslash, etc.), unless you are using an input method.
	188
	189	@findex toggle-enable-multibyte-characters
	190	You can turn off multibyte support in a specific buffer by invoking the
	191	command @code{toggle-enable-multibyte-characters} in that buffer.
	192
	193	@node Converting Representations
	194	@section Converting Text Representations
	195
	196	Emacs can convert unibyte text to multibyte; it can also convert
	197	multibyte text to unibyte, provided that the multibyte text contains
	198	only @acronym{ASCII} and 8-bit raw bytes. In general, these
	199	conversions happen when inserting text into a buffer, or when putting
	200	text from several strings together in one string. You can also
	201	explicitly convert a string's contents to either representation.
	202
	203	Emacs chooses the representation for a string based on the text from
	204	which it is constructed. The general rule is to convert unibyte text
	205	to multibyte text when combining it with other multibyte text, because
	206	the multibyte representation is more general and can hold whatever
	207	characters the unibyte text has.
	208
	209	When inserting text into a buffer, Emacs converts the text to the
	210	buffer's representation, as specified by
	211	@code{enable-multibyte-characters} in that buffer. In particular, when
	212	you insert multibyte text into a unibyte buffer, Emacs converts the text
	213	to unibyte, even though this conversion cannot in general preserve all
	214	the characters that might be in the multibyte text. The other natural
	215	alternative, to convert the buffer contents to multibyte, is not
	216	acceptable because the buffer's representation is a choice made by the
	217	user that cannot be overridden automatically.
	218
	219	Converting unibyte text to multibyte text leaves @acronym{ASCII}
	220	characters unchanged, and converts bytes with codes 128 through 255 to
	221	the multibyte representation of raw eight-bit bytes.
	222
	223	Converting multibyte text to unibyte converts all @acronym{ASCII}
	224	and eight-bit characters to their single-byte form, but loses
	225	information for non-@acronym{ASCII} characters by discarding all but
	226	the low 8 bits of each character's codepoint. Converting unibyte text
	227	to multibyte and back to unibyte reproduces the original unibyte text.
	228
	229	The next two functions either return the argument @var{string}, or a
	230	newly created string with no text properties.
	231
	232	@defun string-to-multibyte string
	233	This function returns a multibyte string containing the same sequence
	234	of characters as @var{string}. If @var{string} is a multibyte string,
	235	it is returned unchanged. The function assumes that @var{string}
	236	includes only @acronym{ASCII} characters and raw 8-bit bytes; the
	237	latter are converted to their multibyte representation corresponding
	238	to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
	239	(@pxref{Text Representations, codepoints}).
	240	@end defun
	241
	242	@defun string-to-unibyte string
	243	This function returns a unibyte string containing the same sequence of
	244	characters as @var{string}. It signals an error if @var{string}
	245	contains a non-@acronym{ASCII} character. If @var{string} is a
	246	unibyte string, it is returned unchanged. Use this function for
	247	@var{string} arguments that contain only @acronym{ASCII} and eight-bit
	248	characters.
	249	@end defun
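
As an illustration of these two functions, the sketch below converts a
sample unibyte string to multibyte and back. Byte 200 is an arbitrary
choice; it maps to the codepoint @code{#x3FFFC8} (4194248) in the raw
8-bit byte range.

@example
@group
(let* ((u (unibyte-string ?a 200))
       (m (string-to-multibyte u)))
  (list (multibyte-string-p m)
        (aref m 1)
        (equal (string-to-unibyte m) u)))
@result{} (t 4194248 t)
@end group
@group
;; U+20AC is neither ASCII nor a raw byte, so this signals an error.
(condition-case nil
    (string-to-unibyte "\u20ac")
  (error 'cannot-convert))
@result{} cannot-convert
@end group
@end example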
	250
	252	@defun byte-to-string byte
	253	@cindex byte to string
	254	This function returns a unibyte string containing a single byte of
	255	character data, @var{byte}. It signals an error if
	256	@var{byte} is not an integer between 0 and 255.
	257	@end defun
	258
	259	@defun multibyte-char-to-unibyte char
	260	This converts the multibyte character @var{char} to a unibyte
	261	character, and returns that character. If @var{char} is neither
	262	@acronym{ASCII} nor eight-bit, the function returns -1.
	263	@end defun
	264
	265	@defun unibyte-char-to-multibyte char
	266	This converts the unibyte character @var{char} to a multibyte
	267	character, assuming @var{char} is either an @acronym{ASCII} character
	268	or a raw 8-bit byte.
	269	@end defun
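
A brief sketch of the single-character conversions follows. The byte
value 200 and the character U+20AC are arbitrary samples; 4194248 is
@code{#x3FFFC8}, the codepoint that represents raw byte 200.

@example
@group
(byte-to-string 65)
@result{} "A"
@end group
@group
(unibyte-char-to-multibyte 200)
@result{} 4194248
@end group
@group
(multibyte-char-to-unibyte 4194248)
@result{} 200
@end group
@group
;; U+20AC is neither ASCII nor eight-bit.
(multibyte-char-to-unibyte ?\u20ac)
@result{} -1
@end group
@end example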
	270
	271	@node Selecting a Representation
	272	@section Selecting a Representation
	273
	274	Sometimes it is useful to examine an existing buffer or string as
	275	multibyte when it was unibyte, or vice versa.
	276
	277	@defun set-buffer-multibyte multibyte
	278	Set the representation type of the current buffer. If @var{multibyte}
	279	is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
	280	is @code{nil}, the buffer becomes unibyte.
	281
	282	This function leaves the buffer contents unchanged when viewed as a
	283	sequence of bytes. As a consequence, it can change the contents
	284	viewed as characters; for instance, a sequence of three bytes which is
	285	treated as one character in multibyte representation will count as
	286	three characters in unibyte representation. Eight-bit characters
	287	representing raw bytes are an exception. They are represented by one
	288	byte in a unibyte buffer, but when the buffer is set to multibyte,
	289	they are converted to two-byte sequences, and vice versa.
	290
	291	This function sets @code{enable-multibyte-characters} to record which
	292	representation is in use. It also adjusts various data in the buffer
	293	(including overlays, text properties and markers) so that they cover the
	294	same text as they did before.
	295
	296	This function signals an error if the buffer is narrowed, since the
	297	narrowing might have occurred in the middle of multibyte character
	298	sequences.
	299
	300	This function also signals an error if the buffer is an indirect
	301	buffer. An indirect buffer always inherits the representation of its
	302	base buffer.
	303	@end defun
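
The effect of @code{set-buffer-multibyte} on the character count can be
sketched as follows, assuming a temporary buffer that starts out
multibyte. The single character U+00E9 is stored as the two bytes 195
and 169, which become two separate characters once the buffer is
unibyte:

@example
@group
(with-temp-buffer
  (insert "\u00e9")
  (let ((as-multibyte (buffer-size)))
    (set-buffer-multibyte nil)
    (list as-multibyte (buffer-size)
          (char-after 1) (char-after 2))))
@result{} (1 2 195 169)
@end group
@end example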
	304
	305	@defun string-as-unibyte string
	306	If @var{string} is already a unibyte string, this function returns
	307	@var{string} itself. Otherwise, it returns a new string with the same
	308	bytes as @var{string}, but treating each byte as a separate character
	309	(so that the value may have more characters than @var{string}); as an
	310	exception, each eight-bit character representing a raw byte is
	311	converted into a single byte. The newly-created string contains no
	312	text properties.
	313	@end defun
	314
	315	@defun string-as-multibyte string
	316	If @var{string} is a multibyte string, this function returns
	317	@var{string} itself. Otherwise, it returns a new string with the same
	318	bytes as @var{string}, but treating each multibyte sequence as one
	319	character. This means that the value may have fewer characters than
	320	@var{string} has. If a byte sequence in @var{string} is invalid as a
	321	multibyte representation of a single character, each byte in the
	322	sequence is treated as a raw 8-bit byte. The newly-created string
	323	contains no text properties.
	324	@end defun
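
The two string functions above can be sketched in the same way. The
sample character U+00E9 (codepoint 233) is stored as the bytes 195 and
169, so reinterpreting the string changes its length:

@example
@group
;; One multibyte character becomes two single-byte characters.
(append (string-as-unibyte "\u00e9") nil)
@result{} (195 169)
@end group
@group
;; Two bytes forming a valid sequence become one character.
(let ((s (string-as-multibyte (unibyte-string 195 169))))
  (list (length s) (aref s 0)))
@result{} (1 233)
@end group
@end example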
	325
	326	@node Character Codes
	327	@section Character Codes
	328	@cindex character codes
	329
	330	The unibyte and multibyte text representations use different
	331	character codes. The valid character codes for unibyte representation
	332	range from 0 to @code{#xFF} (255)---the values that can fit in one
	333	byte. The valid character codes for multibyte representation range
	334	from 0 to @code{#x3FFFFF}. In this code space, values 0 through
	335	@code{#x7F} (127) are for @acronym{ASCII} characters, and values
	336	@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
	337	non-@acronym{ASCII} characters.
	338
	339	Emacs character codes are a superset of the Unicode standard.
	340	Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
	341	characters of the same codepoint; values @code{#x110000} (1114112)
	342	through @code{#x3FFF7F} (4194175) represent characters that are not
	343	unified with Unicode; and values @code{#x3FFF80} (4194176) through
	344	@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
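
The ranges above can be summarized in code. The following sketch
defines a hypothetical helper, @code{my-codepoint-class} (not part of
Emacs), that classifies an integer according to those ranges; the
symbols it returns are likewise chosen only for illustration.

@example
@group
(defun my-codepoint-class (code)
  "Classify CODE by the codepoint ranges described above."
  (cond ((not (characterp code)) nil)
        ((<= code #x7F) 'ascii)
        ((<= code #x10FFFF) 'unicode)
        ((<= code #x3FFF7F) 'not-unified)
        (t 'raw-byte)))
@end group
@group
(mapcar #'my-codepoint-class
        '(65 #x20AC #x110000 #x3FFF80 #x400000))
@result{} (ascii unicode not-unified raw-byte nil)
@end group
@end example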
	345
	346	@defun characterp charcode
	347	This returns @code{t} if @var{charcode} is a valid character, and
	348	@code{nil} otherwise.
	349
	350	@example
	351	@group
	352	(characterp 65)
	353	@result{} t
	354	@end group
	355	@group
	356	(characterp 4194303)
	357	@result{} t
	358	@end group
	359	@group
	360	(characterp 4194304)
	361	@result{} nil
	362	@end group
	363	@end example
	364	@end defun
	365
	366	@cindex maximum value of character codepoint
	367	@cindex codepoint, largest value
	368	@defun max-char
	369	This function returns the largest value that a valid character
	370	codepoint can have.
	371
	372	@example
	373	@group
	374	(characterp (max-char))
	375	@result{} t
	376	@end group
	377	@group
	378	(characterp (1+ (max-char)))
	379	@result{} nil
	380	@end group
	381	@end example
	382	@end defun
	383
	384	@defun get-byte &optional pos string
	385	This function returns the byte at character position @var{pos} in the
	386	current buffer. If the current buffer is unibyte, this is literally
	387	the byte at that position. If the buffer is multibyte, byte values of
	388	@acronym{ASCII} characters are the same as character codepoints,
	389	whereas eight-bit raw bytes are converted to their 8-bit codes. The
	390	function signals an error if the character at @var{pos} is
	391	non-@acronym{ASCII}.
	392
	393	The optional argument @var{string} means to get a byte value from that
	394	string instead of the current buffer.
	395	@end defun
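
A short sketch of @code{get-byte} follows. It assumes that, when
@var{string} is given, @var{pos} is a zero-based index into that
string; the sample values are arbitrary.

@example
@group
(get-byte 0 "A")
@result{} 65
@end group
@group
;; A raw byte in a multibyte string is returned as its 8-bit code.
(get-byte 0 (string-to-multibyte (unibyte-string 200)))
@result{} 200
@end group
@group
(with-temp-buffer
  (insert "A")
  (get-byte 1))
@result{} 65
@end group
@group
;; U+00E9 is a non-ASCII character, so this signals an error.
(condition-case nil
    (get-byte 0 "\u00e9")
  (error 'not-a-byte))
@result{} not-a-byte
@end group
@end example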
	396
	397	@node Character Properties
	398	@section Character Properties
	399	@cindex character properties
	400	A @dfn{character property} is a named attribute of a character that
	401	specifies how the character behaves and how it should be handled
	402	during text processing and display. Thus, character properties are an
	403	important part of specifying the character's semantics.
	404
	405	@c FIXME: Use the latest URI of this chapter?
	406	@c http://www.unicode.org/versions/latest/ch04.pdf
	407	On the whole, Emacs follows the Unicode Standard in its implementation
	408	of character properties. In particular, Emacs supports the
	409	@uref{http://www.unicode.org/reports/tr23/, Unicode Character Property
	410	Model}, and the Emacs character property database is derived from the
	411	Unicode Character Database (@acronym{UCD}). See the
	412	@uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character
	413	Properties chapter of the Unicode Standard}, for a detailed
	414	description of Unicode character properties and their meaning. This
	415	section assumes you are already familiar with that chapter of the
	416	Unicode Standard, and want to apply that knowledge to Emacs Lisp
	417	programs.
	418
	419	In Emacs, each property has a name, which is a symbol, and a set of
	420	possible values, whose types depend on the property; if a character
	421	does not have a certain property, the value is @code{nil}. As a
	422	general rule, the names of character properties in Emacs are produced
	423	from the corresponding Unicode properties by downcasing them and
	424	replacing each @samp{_} character with a dash @samp{-}. For example,
	425	@code{Canonical_Combining_Class} becomes
	426	@code{canonical-combining-class}. However, sometimes we shorten the
	427	names to make their use easier.
	428
	429	@cindex unassigned character codepoints
	430	Some codepoints are left @dfn{unassigned} by the
	431	@acronym{UCD}---they don't correspond to any character. The Unicode
	432	Standard defines default values of properties for such codepoints;
	433	they are mentioned below for each property.
	434
	435	Here is the full list of value types for all the character
	436	properties that Emacs knows about:
	437
	438	@table @code
	439	@item name
	440	Corresponds to the @code{Name} Unicode property. The value is a
	441	string consisting of upper-case Latin letters A to Z, digits, spaces,
	442	and hyphen @samp{-} characters. For unassigned codepoints, the value
	443	is an empty string.
	444
	445	@cindex unicode general category
	446	@item general-category
	447	Corresponds to the @code{General_Category} Unicode property. The
	448	value is a symbol whose name is a 2-letter abbreviation of the
	449	character's classification. For unassigned codepoints, the value
	450	is @code{Cn}.
	451
	452	@item canonical-combining-class
	453	Corresponds to the @code{Canonical_Combining_Class} Unicode property.
	454	The value is an integer number. For unassigned codepoints, the value
	455	is zero.
	456
	457	@cindex bidirectional class of characters
	458	@item bidi-class
	459	Corresponds to the Unicode @code{Bidi_Class} property. The value is a
	460	symbol whose name is the Unicode @dfn{directional type} of the
	461	character. Emacs uses this property when it reorders bidirectional
	462	text for display (@pxref{Bidirectional Display}). For unassigned
	463	codepoints, the value depends on the code blocks to which the
	464	codepoint belongs: most unassigned codepoints get the value of
	465	@code{L} (strong L), but some get values of @code{AL} (Arabic letter)
	466	or @code{R} (strong R).
	467
	468	@item decomposition
	469	Corresponds to the Unicode properties @code{Decomposition_Type} and
	470	@code{Decomposition_Value}. The value is a list, whose first element
	471	may be a symbol representing a compatibility formatting tag, such as
	472	@code{small}@footnote{The Unicode specification writes these tag names
	473	inside @samp{<..>} brackets, but the tag names in Emacs do not include
	474	the brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
	475	@samp{small}. }; the other elements are characters that give the
	476	compatibility decomposition sequence of this character. For
	477	unassigned codepoints, the value is the character itself.
	478
	479	@item decimal-digit-value
	480	Corresponds to the Unicode @code{Numeric_Value} property for
	481	characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
	482	an integer number. For unassigned codepoints, the value is
	483	@code{nil}, which means @acronym{NaN}, or ``not-a-number''.
	484
	485	@item digit-value
	486	Corresponds to the Unicode @code{Numeric_Value} property for
	487	characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
	488	integer number. Examples of such characters include compatibility
	489	subscript and superscript digits, for which the value is the
	490	corresponding number. For unassigned codepoints, the value is
	491	@code{nil}, which means @acronym{NaN}.
	492
	493	@item numeric-value
	494	Corresponds to the Unicode @code{Numeric_Value} property for
	495	characters whose @code{Numeric_Type} is @samp{Numeric}. The value of
	496	this property is an integer or a floating-point number. Examples of
	497	characters that have this property include fractions, subscripts,
	498	superscripts, Roman numerals, currency numerators, and encircled
	499	numbers. For example, the value of this property for the character
	500	@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For
	501	unassigned codepoints, the value is @code{nil}, which means
	502	@acronym{NaN}.
	503
	504	@cindex mirroring of characters
	505	@item mirrored
	506	Corresponds to the Unicode @code{Bidi_Mirrored} property. The value
	507	of this property is a symbol, either @code{Y} or @code{N}. For
	508	unassigned codepoints, the value is @code{N}.
	509
	510	@item mirroring
	511	Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The
	512	value of this property is a character whose glyph represents the
	513	mirror image of the character's glyph, or @code{nil} if there's no
	514	defined mirroring glyph. All the characters whose @code{mirrored}
	515	property is @code{N} have @code{nil} as their @code{mirroring}
	516	property; however, some characters whose @code{mirrored} property is
	517	@code{Y} also have @code{nil} for @code{mirroring}, because no
	518	appropriate characters exist with mirrored glyphs. Emacs uses this
	519	property to display mirror images of characters when appropriate
	520	(@pxref{Bidirectional Display}). For unassigned codepoints, the value
	521	is @code{nil}.
	522
	523	@item old-name
	524	Corresponds to the Unicode @code{Unicode_1_Name} property. The value
	525	is a string. For unassigned codepoints, the value is an empty string.
	526
	527	@item iso-10646-comment
	528	Corresponds to the Unicode @code{ISO_Comment} property. The value is
	529	a string. For unassigned codepoints, the value is an empty string.
	530
	531	@item uppercase
	532	Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
	533	The value of this property is a single character. For unassigned
	534	codepoints, the value is @code{nil}, which means the character itself.
	535
	536	@item lowercase
	537	Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
	538	The value of this property is a single character. For unassigned
	539	codepoints, the value is @code{nil}, which means the character itself.
	540
	541	@item titlecase
	542	Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
	543	@dfn{Title case} is a special form of a character used when the first
	544	character of a word needs to be capitalized. The value of this
	545	property is a single character. For unassigned codepoints, the value
	546	is @code{nil}, which means the character itself.
	547	@end table
	548
	549	@defun get-char-code-property char propname
	550	This function returns the value of @var{char}'s @var{propname} property.
	551
	554	@example
	555	@group
	556	(get-char-code-property ?\s 'general-category)
	557	@result{} Zs
	558	@end group
	559	@group
	560	(get-char-code-property ?1 'general-category)
	561	@result{} Nd
	562	@end group
	563	@group
	564	;; subscript 4
	565	(get-char-code-property ?\u2084 'digit-value)
	566	@result{} 4
	567	@end group
	568	@group
	569	;; one fifth
	570	(get-char-code-property ?\u2155 'numeric-value)
	571	@result{} 0.2
	572	@end group
	573	@group
	574	;; Roman IV
	575	(get-char-code-property ?\u2163 'numeric-value)
	576	@result{} 4
	577	@end group
	578	@end example
	579	@end defun
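
A few further lookups, given here only as a sketch, illustrate some of
the other properties from the table above. The values shown are those
recorded in the Unicode Character Database for these sample characters;
the @code{mirroring} result, 41, is the character code of the closing
parenthesis.

@example
@group
(get-char-code-property ?A 'name)
@result{} "LATIN CAPITAL LETTER A"
@end group
@group
(get-char-code-property ?A 'general-category)
@result{} Lu
@end group
@group
(get-char-code-property ?\( 'mirrored)
@result{} Y
@end group
@group
(get-char-code-property ?\( 'mirroring)
@result{} 41
@end group
@group
;; U+00E9 decomposes into "e" followed by U+0301.
(get-char-code-property ?\u00e9 'decomposition)
@result{} (101 769)
@end group
@end example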
	580
	581	@defun char-code-property-description prop value
	582	This function returns the description string of property @var{prop}'s
	583	@var{value}, or @code{nil} if @var{value} has no description.
	584
	585	@example
	586	@group
	587	(char-code-property-description 'general-category 'Zs)
	588	@result{} "Separator, Space"
	589	@end group
	590	@group
	591	(char-code-property-description 'general-category 'Nd)
	592	@result{} "Number, Decimal Digit"
	593	@end group
	594	@group
	595	(char-code-property-description 'numeric-value '1/5)
	596	@result{} nil
	597	@end group
	598	@end example
	599	@end defun
	600
	601	@defun put-char-code-property char propname value
	602	This function stores @var{value} as the value of the property
	603	@var{propname} for the character @var{char}.
	604	@end defun
	605
	606	@defvar unicode-category-table
	607	The value of this variable is a char-table (@pxref{Char-Tables}) that
	608	specifies, for each character, its Unicode @code{General_Category}
	609	property as a symbol.
	610	@end defvar
	611
	612	@defvar char-script-table
	613	The value of this variable is a char-table that specifies, for each
	614	character, a symbol whose name is the script to which the character
	615	belongs, according to the Unicode Standard classification of the
	616	Unicode code space into script-specific blocks. This char-table has a
	617	single extra slot whose value is the list of all script symbols.
	618	@end defvar
	619
	620	@defvar char-width-table
	621	The value of this variable is a char-table that specifies the width of
	622	each character in columns that it will occupy on the screen.
	623	@end defvar
	624
	625	@defvar printable-chars
	626	The value of this variable is a char-table that specifies, for each
	627	character, whether it is printable or not. That is, if evaluating
	628	@code{(aref printable-chars char)} results in @code{t}, the character
	629	is printable, and if it results in @code{nil}, it is not.
	630	@end defvar
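
Because these variables hold char-tables, their entries can be examined
with @code{aref}. The lookups below are a sketch using arbitrary
sample characters, and assume the tables as set up by a standard Emacs
startup:

@example
@group
(aref unicode-category-table ?A)
@result{} Lu
@end group
@group
(aref char-script-table ?a)
@result{} latin
@end group
@group
;; U+03B1 is GREEK SMALL LETTER ALPHA.
(aref char-script-table ?\u03b1)
@result{} greek
@end group
@group
;; U+4E2D is a CJK ideograph, which occupies two columns.
(list (aref char-width-table ?a)
      (aref char-width-table ?\u4e2d))
@result{} (1 2)
@end group
@group
;; Code 0 is a control character.
(list (aref printable-chars ?a)
      (aref printable-chars 0))
@result{} (t nil)
@end group
@end example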
	631
	632	@node Character Sets
	633	@section Character Sets
	634	@cindex character sets
	635
	636	@cindex charset
	637	@cindex coded character set
	638	An Emacs @dfn{character set}, or @dfn{charset}, is a set of characters
	639	in which each character is assigned a numeric code point. (The
	640	Unicode Standard calls this a @dfn{coded character set}.) Each Emacs
	641	charset has a name which is a symbol. A single character can belong
	642	to any number of different character sets, but it will generally have
	643	a different code point in each charset. Examples of character sets
	644	include @code{ascii}, @code{iso-8859-1}, @code{greek-iso8859-7}, and
	645	@code{windows-1255}. The code point assigned to a character in a
	646	charset is usually different from its code point used in Emacs buffers
	647	and strings.
	648
	649	@cindex @code{emacs}, a charset
	650	@cindex @code{unicode}, a charset
	651	@cindex @code{eight-bit}, a charset
	652	Emacs defines several special character sets. The character set
	653	@code{unicode} includes all the characters whose Emacs code points are
	654	in the range @code{0..#x10FFFF}. The character set @code{emacs}
	655	includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
	656	Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
	657	Emacs uses it to represent raw bytes encountered in text.
	658
	659	@defun charsetp object
	660	This function returns @code{t} if @var{object} is a symbol that names a
	661	character set, and @code{nil} otherwise.
	662	@end defun
	663
	664	@defvar charset-list
	665	The value is a list of all defined character set names.
	666	@end defvar
	667
	668	@defun charset-priority-list &optional highestp
	669	This function returns a list of all defined character sets ordered by
	670	their priority. If @var{highestp} is non-@code{nil}, the function
	671	returns a single character set of the highest priority.
	672	@end defun
	673
	674	@defun set-charset-priority &rest charsets
	675	This function makes @var{charsets} the highest priority character sets.
	676	@end defun
	677
	678	@defun char-charset character &optional restriction
	679	This function returns the name of the character set of highest
	680	priority that @var{character} belongs to. @acronym{ASCII} characters
	681	are an exception: for them, this function always returns @code{ascii}.
	682
	683	If @var{restriction} is non-@code{nil}, it should be a list of
	684	charsets to search. Alternatively, it can be a coding system, in
	685	which case the returned charset must be supported by that coding
	686	system (@pxref{Coding Systems}).
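		
		As a brief illustration, assuming the standard charsets @code{ascii}
		and @code{iso-8859-1} are defined (they are by default), calls might
		look like this.  Without @var{restriction}, the result for a
		non-@acronym{ASCII} character depends on the current charset
		priorities, so only restricted lookups are shown for those:
		
		@lisp
		(char-charset ?a)
		     @result{} ascii
		;; #xE9 is LATIN SMALL LETTER E WITH ACUTE.
		(char-charset #xE9 '(iso-8859-1))
		     @result{} iso-8859-1
		;; #x3B1 is GREEK SMALL LETTER ALPHA, which Latin-1 lacks.
		(char-charset #x3B1 '(iso-8859-1))
		     @result{} nil
		@end lisp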
	687	@end defun
	688
	689	@c TODO: Explain the properties here and add indexes such as ‘charset property’.
	690	@defun charset-plist charset
	691	This function returns the property list of the character set
	692	@var{charset}. Although @var{charset} is a symbol, this is not the
	693	same as the property list of that symbol. Charset properties include
	694	important information about the charset, such as its documentation
	695	string, short name, etc.
	696	@end defun
	697
	698	@defun put-charset-property charset propname value
	699	This function sets the @var{propname} property of @var{charset} to the
	700	given @var{value}.
	701	@end defun
	702
	703	@defun get-charset-property charset propname
	704	This function returns the value of @var{charset}'s property
	705	@var{propname}.
	706	@end defun
	707
	708	@deffn Command list-charset-chars charset
	709	This command displays a list of characters in the character set
	710	@var{charset}.
	711	@end deffn
	712
	713	Emacs can convert between its internal representation of a character
	714	and the character's codepoint in a specific charset. The following
	715	two functions support these conversions.
	716
	717	@c FIXME: decode-char and encode-char accept and ignore an additional
	718	@c argument @var{restriction}. When that argument actually makes a
	719	@c difference, it should be documented here.
	720	@defun decode-char charset code-point
	721	This function decodes a character that is assigned a @var{code-point}
	722	in @var{charset}, to the corresponding Emacs character, and returns
	723	it. If @var{charset} doesn't contain a character of that code point,
	724	the value is @code{nil}. If @var{code-point} doesn't fit in a Lisp
	725	integer (@pxref{Integer Basics, most-positive-fixnum}), it can be
	726	specified as a cons cell @code{(@var{high} . @var{low})}, where
	727	@var{low} are the lower 16 bits of the value and @var{high} are the
	728	high 16 bits.
	729	@end defun
	730
	731	@defun encode-char char charset
	732	This function returns the code point assigned to the character
	733	@var{char} in @var{charset}. If the result does not fit in a Lisp
	734	integer, it is returned as a cons cell @code{(@var{high} . @var{low})}
	735	that fits the second argument of @code{decode-char} above. If
	736	@var{charset} doesn't have a codepoint for @var{char}, the value is
	737	@code{nil}.
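		
		The following sketch shows both directions of the conversion.  It
		assumes the predefined charsets @code{iso-8859-7} and
		@code{iso-8859-1}; the character used is GREEK SMALL LETTER ALPHA,
		whose Emacs code point is #x3B1 (945) and whose ISO 8859-7 code point
		is #xE1 (225):
		
		@lisp
		(decode-char 'iso-8859-7 #xE1)
		     @result{} 945
		(encode-char #x3B1 'iso-8859-7)
		     @result{} 225
		;; Latin-1 has no code point for this character.
		(encode-char #x3B1 'iso-8859-1)
		     @result{} nil
		@end lisp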
	738	@end defun
	739
	740	The following function comes in handy for applying a certain
	741	function to all or part of the characters in a charset:
	742
	743	@defun map-charset-chars function charset &optional arg from-code to-code
	744	Call @var{function} for characters in @var{charset}. @var{function}
	745	is called with two arguments. The first one is a cons cell
	746	@code{(@var{from} . @var{to})}, where @var{from} and @var{to}
	747	indicate a range of characters contained in @var{charset}. The second
	748	argument passed to @var{function} is @var{arg}.
	749
	750	By default, the range of codepoints passed to @var{function} includes
	751	all the characters in @var{charset}, but optional arguments
	752	@var{from-code} and @var{to-code} limit that to the range of
	753	characters between these two codepoints of @var{charset}. If either
	754	of them is @code{nil}, it defaults to the first or last codepoint of
	755	@var{charset}, respectively.
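		
		A minimal sketch of one possible use: collecting the ranges that are
		reported for a charset into a list.  The variable name @code{ranges}
		is only illustrative, and the number and order of the ranges reported
		depend on how the charset is defined:
		
		@lisp
		(let (ranges)
		  ;; FUNCTION receives a cons (FROM . TO) and ARG (unused here).
		  (map-charset-chars (lambda (range _arg)
		                       (push range ranges))
		                     'iso-8859-1)
		  (nreverse ranges))
		@end lisp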
	756	@end defun
	757
	758	@node Scanning Charsets
	759	@section Scanning for Character Sets
	760
	761	Sometimes it is useful to find out which character set a particular
	762	character belongs to. One use for this is in determining which coding
	763	systems (@pxref{Coding Systems}) are capable of representing all of
	764	the text in question; another is to determine the font(s) for
	765	displaying that text.
	766
	767	@defun charset-after &optional pos
	768	This function returns the charset of highest priority containing the
	769	character at position @var{pos} in the current buffer. If @var{pos}
	770	is omitted or @code{nil}, it defaults to the current value of point.
	771	If @var{pos} is out of range, the value is @code{nil}.
	772	@end defun
	773
	774	@defun find-charset-region beg end &optional translation
	775	This function returns a list of the character sets of highest priority
	776	that contain characters in the current buffer between positions
	777	@var{beg} and @var{end}.
	778
	779	The optional argument @var{translation} specifies a translation table
	780	to use for scanning the text (@pxref{Translation of Characters}). If
	781	it is non-@code{nil}, then each character in the region is translated
	782	through this table, and the value returned describes the translated
	783	characters instead of the characters actually in the buffer.
	784	@end defun
	785
	786	@defun find-charset-string string &optional translation
	787	This function returns a list of character sets of highest priority
	788	that contain characters in @var{string}. It is just like
	789	@code{find-charset-region}, except that it applies to the contents of
	790	@var{string} instead of part of the current buffer.
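		
		For instance, with pure @acronym{ASCII} text the scanning functions
		behave as follows.  (For non-@acronym{ASCII} text the charsets
		reported depend on the current charset priorities, so no such case
		is shown here.)
		
		@lisp
		(find-charset-string "abc")
		     @result{} (ascii)
		(find-charset-string "")
		     @result{} nil
		(with-temp-buffer
		  (insert "abc")
		  (list (charset-after 1)           ; the character `a'
		        (find-charset-region 1 4)))
		     @result{} (ascii (ascii))
		@end lisp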
	791	@end defun
	792
	793	@node Translation of Characters
	794	@section Translation of Characters
	795	@cindex character translation tables
	796	@cindex translation tables
	797
	798	A @dfn{translation table} is a char-table (@pxref{Char-Tables}) that
	799	specifies a mapping of characters into characters. These tables are
	800	used in encoding and decoding, and for other purposes. Some coding
	801	systems specify their own particular translation tables; there are
	802	also default translation tables which apply to all other coding
	803	systems.
	804
	805	A translation table has two extra slots. The first is either
	806	@code{nil} or a translation table that performs the reverse
	807	translation; the second is the maximum number of characters to look up
	808	for translating sequences of characters (see the description of
	809	@code{make-translation-table-from-alist} below).
	810
	811	@defun make-translation-table &rest translations
	812	This function returns a translation table based on the argument
	813	@var{translations}. Each element of @var{translations} should be a
	814	list of elements of the form @code{(@var{from} . @var{to})}; this says
	815	to translate the character @var{from} into @var{to}.
	816
	817	The arguments and the forms in each argument are processed in order,
	818	and if a previous form already translates @var{to} to some other
	819	character, say @var{to-alt}, @var{from} is also translated to
	820	@var{to-alt}.
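		
		Since a translation table is a char-table, a translation can be
		looked up with @code{aref}.  The short sketch below also illustrates
		the chaining rule just described; the characters chosen are
		arbitrary:
		
		@lisp
		(let ((table (make-translation-table '((?a . ?b)))))
		  (aref table ?a))
		     @result{} 98                       ; i.e., ?b
		
		;; ?b is already translated to ?c when (?a . ?b) is processed,
		;; so ?a ends up translated to ?c as well.
		(let ((table (make-translation-table '((?b . ?c) (?a . ?b)))))
		  (aref table ?a))
		     @result{} 99                       ; i.e., ?c
		@end lisp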
	821	@end defun
	822
	823	During decoding, the translation table's translations are applied to
	824	the characters that result from ordinary decoding. If a coding system
	825	has the property @code{:decode-translation-table}, that specifies the
	826	translation table to use, or a list of translation tables to apply in
	827	sequence. (This is a property of the coding system, as returned by
	828	@code{coding-system-get}, not a property of the symbol that is the
	829	coding system's name. @xref{Coding System Basics,, Basic Concepts of
	830	Coding Systems}.) Finally, if
	831	@code{standard-translation-table-for-decode} is non-@code{nil}, the
	832	resulting characters are translated by that table.
	833
	834	During encoding, the translation table's translations are applied to
	835	the characters in the buffer, and the result of translation is
	836	actually encoded. If a coding system has property
	837	@code{:encode-translation-table}, that specifies the translation table
	838	to use, or a list of translation tables to apply in sequence. In
	839	addition, if the variable @code{standard-translation-table-for-encode}
	840	is non-@code{nil}, it specifies the translation table to use for
	841	translating the result.
	842
	843	@defvar standard-translation-table-for-decode
	844	This is the default translation table for decoding. If a coding
	845	system specifies its own translation tables, the table that is the
	846	value of this variable, if non-@code{nil}, is applied after them.
	847	@end defvar
	848
	849	@defvar standard-translation-table-for-encode
	850	This is the default translation table for encoding. If a coding
	851	system specifies its own translation tables, the table that is the
	852	value of this variable, if non-@code{nil}, is applied after them.
	853	@end defvar
	854
	855	@c FIXME: This variable is obsolete since 23.1. We should mention
	856	@c that here or simply remove this defvar. --xfq
	857	@defvar translation-table-for-input
	858	Self-inserting characters are translated through this translation
	859	table before they are inserted. Search commands also translate their
	860	input through this table, so they can compare more reliably with
	861	what's in the buffer.
	862
	863	This variable automatically becomes buffer-local when set.
	864	@end defvar
	865
	866	@defun make-translation-table-from-vector vec
	867	This function returns a translation table made from @var{vec}, which
	868	is an array of 256 elements that maps bytes (values 0 through #xFF) to
	869	characters. Elements may be @code{nil} for untranslated bytes. The
	870	returned table has a translation table for reverse mapping in the
	871	first extra slot, and the value @code{1} in the second extra slot.
	872
	873	This function provides an easy way to make a private coding system
	874	that maps each byte to a specific character. You can specify the
	875	returned table and the reverse translation table using the properties
	876	@code{:decode-translation-table} and @code{:encode-translation-table}
	877	respectively in the @var{props} argument to
	878	@code{define-coding-system}.
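		
		As a rough sketch, the following builds a table that maps byte #xA4
		to the euro sign (#x20AC) and every other byte to itself, and then
		looks up both the forward and the reverse mapping.  The choice of
		byte and character is purely illustrative, and the vector is filled
		with the identity mapping first so that every element is a character:
		
		@lisp
		(let ((vec (make-vector 256 0)))
		  (dotimes (i 256)
		    (aset vec i i))                   ; identity mapping
		  (aset vec #xA4 #x20AC)              ; byte #xA4 -> euro sign
		  (let ((table (make-translation-table-from-vector vec)))
		    (list (aref table #xA4)
		          ;; The first extra slot holds the reverse table.
		          (aref (char-table-extra-slot table 0) #x20AC))))
		     @result{} (8364 164)
		@end lisp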
	879	@end defun
	880
	881	@defun make-translation-table-from-alist alist
	882	This function is similar to @code{make-translation-table} but returns
	883	a complex translation table rather than a simple one-to-one mapping.
	884	Each element of @var{alist} is of the form @code{(@var{from}
	885	. @var{to})}, where @var{from} and @var{to} are either characters or
	886	vectors specifying a sequence of characters. If @var{from} is a
	887	character, that character is translated to @var{to} (i.e., to a
	888	character or a character sequence). If @var{from} is a vector of
	889	characters, that sequence is translated to @var{to}. The returned
	890	table has a translation table for reverse mapping in the first extra
	891	slot, and the maximum length of all the @var{from} character sequences
	892	in the second extra slot.
	893	@end defun
	894
	895	@node Coding Systems
	896	@section Coding Systems
	897
	898	@cindex coding system
	899	When Emacs reads or writes a file, and when Emacs sends text to a
	900	subprocess or receives text from a subprocess, it normally performs
	901	character code conversion and end-of-line conversion as specified
	902	by a particular @dfn{coding system}.
	903
	904	How to define a coding system is an arcane matter, and is not
	905	documented here.
	906
	907	@menu
	908	* Coding System Basics:: Basic concepts.
	909	* Encoding and I/O:: How file I/O functions handle coding systems.
	910	* Lisp and Coding Systems:: Functions to operate on coding system names.
	911	* User-Chosen Coding Systems:: Asking the user to choose a coding system.
	912	* Default Coding Systems:: Controlling the default choices.
	913	* Specifying Coding Systems:: Requesting a particular coding system
	914	for a single file operation.
	915	* Explicit Encoding:: Encoding or decoding text without doing I/O.
	916	* Terminal I/O Encoding:: Use of encoding for terminal I/O.
	917	@end menu
	918
	919	@node Coding System Basics
	920	@subsection Basic Concepts of Coding Systems
	921
	922	@cindex character code conversion
	923	@dfn{Character code conversion} involves conversion between the
	924	internal representation of characters used inside Emacs and some other
	925	encoding. Emacs supports many different encodings, in that it can
	926	convert to and from them. For example, it can convert text to or from
	927	encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and
	928	several variants of ISO 2022. In some cases, Emacs supports several
	929	alternative encodings for the same characters; for example, there are
	930	three coding systems for the Cyrillic (Russian) alphabet: ISO,
	931	Alternativnyj, and KOI8.
	932
	933	Every coding system specifies a particular set of character code
	934	conversions, but the coding system @code{undecided} is special: it
	935	leaves the choice unspecified, to be chosen heuristically for each
	936	file, based on the file's data.
	937
	938	In general, a coding system doesn't guarantee roundtrip identity:
	939	decoding a byte sequence using a coding system, then encoding the
	940	resulting text in the same coding system, can produce a different byte
	941	sequence. But some coding systems do guarantee that the byte sequence
	942	will be the same as what you originally decoded. Here are a few
	943	examples:
	944
	945	@quotation
	946	iso-8859-1, utf-8, big5, shift_jis, euc-jp
	947	@end quotation
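		
		One way to check the roundtrip property for a particular byte
		sequence is to decode it and encode the result again, using the
		functions described in @ref{Explicit Encoding}.  The sketch below
		does this for the UTF-8 encoding of the word ``caf@'e''; the byte
		sequence is given as a unibyte string with octal escapes:
		
		@lisp
		(let* ((bytes "caf\303\251")
		       (text (decode-coding-string bytes 'utf-8)))
		  (string= bytes (encode-coding-string text 'utf-8)))
		     @result{} t
		@end lisp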
	948
	949	Encoding buffer text and then decoding the result can also fail to
	950	reproduce the original text. For instance, if you encode a character
	951	with a coding system which does not support that character, the result
	952	is unpredictable, and thus decoding it using the same coding system
	953	may produce a different text. Currently, Emacs can't report errors
	954	that result from encoding unsupported characters.
	955
	956	@cindex EOL conversion
	957	@cindex end-of-line conversion
	958	@cindex line end conversion
	959	@dfn{End of line conversion} handles three different conventions
	960	used on various systems for representing end of line in files. The
	961	Unix convention, used on GNU and Unix systems, is to use the linefeed
	962	character (also called newline). The DOS convention, used on
	963	MS-Windows and MS-DOS systems, is to use a carriage-return and a
	964	linefeed at the end of a line. The Mac convention is to use just
	965	carriage-return. (This was the convention used on the Macintosh
	966	system prior to OS X.)
	967
	968	@cindex base coding system
	969	@cindex variant coding system
	970	@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
	971	conversion unspecified, to be chosen based on the data. @dfn{Variant
	972	coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
	973	@code{latin-1-mac} specify the end-of-line conversion explicitly as
	974	well. Most base coding systems have three corresponding variants whose
	975	names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
	976
	977	@vindex raw-text@r{ coding system}
	978	The coding system @code{raw-text} is special in that it prevents
	979	character code conversion, and causes the buffer visited with this
	980	coding system to be a unibyte buffer. For historical reasons, you can
	981	save both unibyte and multibyte text with this coding system. When
	982	you use @code{raw-text} to encode multibyte text, it does perform one
	983	character code conversion: it converts eight-bit characters to their
	984	single-byte external representation. @code{raw-text} does not specify
	985	the end-of-line conversion, allowing that to be determined as usual by
	986	the data, and has the usual three variants which specify the
	987	end-of-line conversion.
	988
	989	@vindex no-conversion@r{ coding system}
	990	@vindex binary@r{ coding system}
	991	@code{no-conversion} (and its alias @code{binary}) is equivalent to
	992	@code{raw-text-unix}: it specifies no conversion of either character
	993	codes or end-of-line.
	994
	995	@vindex emacs-internal@r{ coding system}
	996	@vindex utf-8-emacs@r{ coding system}
	997	The coding system @code{utf-8-emacs} specifies that the data is
	998	represented in the internal Emacs encoding (@pxref{Text
	999	Representations}). This is like @code{raw-text} in that no code
	1000	conversion happens, but different in that the result is multibyte
	1001	data. The name @code{emacs-internal} is an alias for
	1002	@code{utf-8-emacs}.
	1003
	1004	@defun coding-system-get coding-system property
	1005	This function returns the specified property of the coding system
	1006	@var{coding-system}. Most coding system properties exist for internal
	1007	purposes, but one that you might find useful is @code{:mime-charset}.
	1008	That property's value is the name used in MIME for the character coding
	1009	which this coding system can read and write. Examples:
	1010
	1011	@example
	1012	(coding-system-get 'iso-latin-1 :mime-charset)
	1013	@result{} iso-8859-1
	1014	(coding-system-get 'iso-2022-cn :mime-charset)
	1015	@result{} iso-2022-cn
	1016	(coding-system-get 'cyrillic-koi8 :mime-charset)
	1017	@result{} koi8-r
	1018	@end example
	1019
	1020	The value of the @code{:mime-charset} property is also defined
	1021	as an alias for the coding system.
	1022	@end defun
	1023
	1024	@cindex alias, for coding systems
	1025	@defun coding-system-aliases coding-system
	1026	This function returns the list of aliases of @var{coding-system}.
	1027	@end defun
	1028
	1029	@node Encoding and I/O
	1030	@subsection Encoding and I/O
	1031
	1032	The principal purpose of coding systems is for use in reading and
	1033	writing files. The function @code{insert-file-contents} uses a coding
	1034	system to decode the file data, and @code{write-region} uses one to
	1035	encode the buffer contents.
	1036
	1037	You can specify the coding system to use either explicitly
	1038	(@pxref{Specifying Coding Systems}), or implicitly using a default
	1039	mechanism (@pxref{Default Coding Systems}). But these methods may not
	1040	completely specify what to do. For example, they may choose a coding
	1041	system such as @code{undecided} which leaves the character code
	1042	conversion to be determined from the data. In these cases, the I/O
	1043	operation finishes the job of choosing a coding system. Very often
	1044	you will want to find out afterwards which coding system was chosen.
	1045
	1046	@defvar buffer-file-coding-system
	1047	This buffer-local variable records the coding system used for saving the
	1048	buffer and for writing part of the buffer with @code{write-region}. If
	1049	the text to be written cannot be safely encoded using the coding system
	1050	specified by this variable, these operations select an alternative
	1051	encoding by calling the function @code{select-safe-coding-system}
	1052	(@pxref{User-Chosen Coding Systems}). If selecting a different encoding
	1053	requires asking the user to specify a coding system,
	1054	@code{buffer-file-coding-system} is updated to the newly selected coding
	1055	system.
	1056
	1057	@code{buffer-file-coding-system} does @emph{not} affect sending text
	1058	to a subprocess.
	1059	@end defvar
	1060
	1061	@defvar save-buffer-coding-system
	1062	This variable specifies the coding system for saving the buffer (by
	1063	overriding @code{buffer-file-coding-system}). Note that it is not used
	1064	for @code{write-region}.
	1065
	1066	When a command to save the buffer starts out to use
	1067	@code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
	1068	and that coding system cannot handle
	1069	the actual text in the buffer, the command asks the user to choose
	1070	another coding system (by calling @code{select-safe-coding-system}).
	1071	After that happens, the command also updates
	1072	@code{buffer-file-coding-system} to represent the coding system that
	1073	the user specified.
	1074	@end defvar
	1075
	1076	@defvar last-coding-system-used
	1077	I/O operations for files and subprocesses set this variable to the
	1078	coding system name that was used. The explicit encoding and decoding
	1079	functions (@pxref{Explicit Encoding}) set it too.
	1080
	1081	@strong{Warning:} Since receiving subprocess output sets this variable,
	1082	it can change whenever Emacs waits; therefore, you should copy the
	1083	value shortly after the function call that stores the value you are
	1084	interested in.
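		
		A minimal sketch of this precaution follows.  The function name
		@code{my-file-coding-system} is hypothetical; the point is only that
		the variable is read immediately after the I/O call, before Emacs
		has a chance to wait:
		
		@lisp
		(defun my-file-coding-system (file)
		  "Return the coding system used to decode FILE."
		  (with-temp-buffer
		    (insert-file-contents file)
		    ;; Read the value right away, before anything else can set it.
		    last-coding-system-used))
		@end lisp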
	1085	@end defvar
	1086
	1087	The variable @code{selection-coding-system} specifies how to encode
	1088	selections for the window system. @xref{Window System Selections}.
	1089
	1090	@defvar file-name-coding-system
	1091	The variable @code{file-name-coding-system} specifies the coding
	1092	system to use for encoding file names. Emacs encodes file names using
	1093	that coding system for all file operations. If
	1094	@code{file-name-coding-system} is @code{nil}, Emacs uses a default
	1095	coding system determined by the selected language environment. In the
	1096	default language environment, any non-@acronym{ASCII} characters in
	1097	file names are not encoded specially; they appear in the file system
	1098	using the internal Emacs representation.
	1099	@end defvar
	1100
	1101	@strong{Warning:} if you change @code{file-name-coding-system} (or
	1102	the language environment) in the middle of an Emacs session, problems
	1103	can result if you have already visited files whose names were encoded
	1104	using the earlier coding system and are handled differently under the
	1105	new coding system. If you try to save one of these buffers under the
	1106	visited file name, saving may use the wrong file name, or it may get
	1107	an error. If such a problem happens, use @kbd{C-x C-w} to specify a
	1108	new file name for that buffer.
	1109
	1110	@node Lisp and Coding Systems
	1111	@subsection Coding Systems in Lisp
	1112
	1113	Here are the Lisp facilities for working with coding systems:
	1114
	1115	@cindex list all coding systems
	1116	@defun coding-system-list &optional base-only
	1117	This function returns a list of all coding system names (symbols). If
	1118	@var{base-only} is non-@code{nil}, the value includes only the
	1119	base coding systems. Otherwise, it includes alias and variant coding
	1120	systems as well.
	1121	@end defun
	1122
	1123	@defun coding-system-p object
	1124	This function returns @code{t} if @var{object} is either a coding
	1125	system name or @code{nil}; otherwise it returns @code{nil}.
	1126	@end defun
	1127
	1128	@cindex validity of coding system
	1129	@cindex coding system, validity check
	1130	@defun check-coding-system coding-system
	1131	This function checks the validity of @var{coding-system}. If that is
	1132	valid, it returns @var{coding-system}. If @var{coding-system} is
	1133	@code{nil}, the function returns @code{nil}. For any other value, it
	1134	signals an error whose @code{error-symbol} is @code{coding-system-error}
	1135	(@pxref{Signaling Errors, signal}).
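		
		The three cases can be sketched as follows.  The symbol
		@code{no-such-coding-system} is a made-up name that is assumed not to
		be defined as a coding system:
		
		@lisp
		(check-coding-system 'utf-8)
		     @result{} utf-8
		(check-coding-system nil)
		     @result{} nil
		(condition-case nil
		    (check-coding-system 'no-such-coding-system)
		  (coding-system-error 'invalid))
		     @result{} invalid
		@end lisp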
	1136	@end defun
	1137
	1138	@cindex eol type of coding system
	1139	@defun coding-system-eol-type coding-system
	1140	This function returns the type of end-of-line (a.k.a.@: @dfn{eol})
	1141	conversion used by @var{coding-system}. If @var{coding-system}
	1142	specifies a certain eol conversion, the return value is an integer 0,
	1143	1, or 2, standing for @code{unix}, @code{dos}, and @code{mac},
	1144	respectively. If @var{coding-system} doesn't specify eol conversion
	1145	explicitly, the return value is a vector of coding systems, each one
	1146	with one of the possible eol conversion types, like this:
	1147
	1148	@lisp
	1149	(coding-system-eol-type 'latin-1)
	1150	@result{} [latin-1-unix latin-1-dos latin-1-mac]
	1151	@end lisp
	1152
	1153	@noindent
	1154	If this function returns a vector, Emacs will decide, as part of the
	1155	text encoding or decoding process, what eol conversion to use. For
	1156	decoding, the end-of-line format of the text is auto-detected, and the
	1157	eol conversion is set to match it (e.g., DOS-style CRLF format will
	1158	imply @code{dos} eol conversion). For encoding, the eol conversion is
	1159	taken from the appropriate default coding system (e.g., the
	1160	default value of @code{buffer-file-coding-system} for
	1161	@code{buffer-file-coding-system}), or from the default eol conversion
	1162	appropriate for the underlying platform.
	1163	@end defun
	1164
	1165	@cindex eol conversion of coding system
	1166	@defun coding-system-change-eol-conversion coding-system eol-type
	1167	This function returns a coding system which is like @var{coding-system}
	1168	except for its eol conversion, which is specified by @var{eol-type}.
	1169	@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
	1170	@code{nil}. If it is @code{nil}, the returned coding system determines
	1171	the end-of-line conversion from the data.
	1172
	1173	@var{eol-type} may also be 0, 1 or 2, standing for @code{unix},
	1174	@code{dos} and @code{mac}, respectively.
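		
		For example, using the @code{utf-8} coding system and its variants:
		
		@lisp
		(coding-system-change-eol-conversion 'utf-8 'dos)
		     @result{} utf-8-dos
		(coding-system-change-eol-conversion 'utf-8-unix 1)
		     @result{} utf-8-dos
		(coding-system-eol-type 'utf-8-dos)
		     @result{} 1
		@end lisp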
	1175	@end defun
	1176
	1177	@cindex text conversion of coding system
	1178	@defun coding-system-change-text-conversion eol-coding text-coding
	1179	This function returns a coding system which uses the end-of-line
	1180	conversion of @var{eol-coding}, and the text conversion of
	1181	@var{text-coding}. If @var{text-coding} is @code{nil}, it returns
	1182	@code{undecided}, or one of its variants according to @var{eol-coding}.
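		
		For example, to combine the DOS end-of-line convention of one coding
		system with the text conversion of another:
		
		@lisp
		(coding-system-change-text-conversion 'iso-latin-1-dos 'utf-8)
		     @result{} utf-8-dos
		(coding-system-change-text-conversion 'iso-latin-1-dos nil)
		     @result{} undecided-dos
		@end lisp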
	1183	@end defun
	1184
	1185	@cindex safely encode region
	1186	@cindex coding systems for encoding region
	1187	@defun find-coding-systems-region from to
	1188	This function returns a list of coding systems that could be used to
	1189	encode the text between @var{from} and @var{to}. All coding systems in
	1190	the list can safely encode any multibyte characters in that portion of
	1191	the text.
	1192
	1193	If the text contains no multibyte characters, the function returns the
	1194	list @code{(undecided)}.
	1195	@end defun
	1196
	1197	@cindex safely encode a string
	1198	@cindex coding systems for encoding a string
	1199	@defun find-coding-systems-string string
	1200	This function returns a list of coding systems that could be used to
	1201	encode the text of @var{string}. All coding systems in the list can
	1202	safely encode any multibyte characters in @var{string}. If the text
	1203	contains no multibyte characters, this returns the list
	1204	@code{(undecided)}.
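		
		A short sketch: the first call uses pure @acronym{ASCII} text, the
		second a string containing the euro sign (written with the
		@code{\u20ac} escape).  The full list returned for the second string
		varies between Emacs versions, so the example only tests whether
		@code{utf-8} is a member:
		
		@lisp
		(find-coding-systems-string "hello")
		     @result{} (undecided)
		;; Non-nil if utf-8 is among the safe coding systems.
		(memq 'utf-8 (find-coding-systems-string "\u20ac"))
		@end lisp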
	1205	@end defun
	1206
	1207	@cindex charset, coding systems to encode
	1208	@cindex safely encode characters in a charset
	1209	@defun find-coding-systems-for-charsets charsets
	1210	This function returns a list of coding systems that could be used to
	1211	encode all the character sets in the list @var{charsets}.
	1212	@end defun
	1213
	1214	@defun check-coding-systems-region start end coding-system-list
	1215	This function checks whether coding systems in the list
	1216	@var{coding-system-list} can encode all the characters in the region
	1217	between @var{start} and @var{end}. If all of the coding systems in
	1218	the list can encode the specified text, the function returns
	1219	@code{nil}. If some coding systems cannot encode some of the
	1220	characters, the value is an alist, each element of which has the form
	1221	@code{(@var{coding-system1} @var{pos1} @var{pos2} @dots{})}, meaning
	1222	that @var{coding-system1} cannot encode characters at buffer positions
	1223	@var{pos1}, @var{pos2}, @enddots{}.
	1224
	1225	@var{start} may be a string, in which case @var{end} is ignored and
	1226	the returned value references string indices instead of buffer
	1227	positions.
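		
		As an illustration, the call below checks a two-character string, the
		letter @samp{a} followed by the euro sign, against two coding
		systems.  Since Latin-1 has no euro sign, one would expect the result
		to mention only @code{iso-latin-1}, together with string index 1;
		the exact return value is deliberately not shown here:
		
		@lisp
		;; START is a string, so END (here nil) is ignored.
		(check-coding-systems-region "a\u20ac" nil
		                             '(utf-8 iso-latin-1))
		@end lisp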
	1228	@end defun
	1229
	1230	@defun detect-coding-region start end &optional highest
	1231	This function chooses a plausible coding system for decoding the text
	1232	from @var{start} to @var{end}. This text should be a byte sequence,
	1233	i.e., unibyte text or multibyte text with only @acronym{ASCII} and
	1234	eight-bit characters (@pxref{Explicit Encoding}).
	1235
	1236	Normally this function returns a list of coding systems that could
	1237	handle decoding the text that was scanned. They are listed in order of
	1238	decreasing priority. But if @var{highest} is non-@code{nil}, then the
	1239	return value is just one coding system, the one that is highest in
	1240	priority.
	1241
	1242	If the region contains only @acronym{ASCII} characters except for such
	1243	ISO-2022 control characters as @code{ESC}, the value is
	1244	@code{undecided} or @code{(undecided)}, or a variant specifying
	1245	end-of-line conversion, if that can be deduced from the text.
	1246
	1247	If the region contains null bytes, the value is @code{no-conversion},
	1248	even if the region contains text encoded in some coding system.
	1249	@end defun
	1250
	1251	@defun detect-coding-string string &optional highest
	1252	This function is like @code{detect-coding-region} except that it
	1253	operates on the contents of @var{string} instead of bytes in the buffer.
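		
		The special cases described for @code{detect-coding-region} can be
		sketched with short strings.  These results assume that
		@code{inhibit-null-byte-detection} is @code{nil}:
		
		@lisp
		;; Pure ASCII, no line ending to deduce the eol conversion from.
		(detect-coding-string "hello" t)
		     @result{} undecided
		;; Pure ASCII with a Unix-style line ending.
		(detect-coding-string "hello\n" t)
		     @result{} undecided-unix
		;; A null byte makes the text look like binary data.
		(detect-coding-string "a\0b" t)
		     @result{} no-conversion
		@end lisp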
	1254	@end defun
	1255
	1256	@cindex null bytes, and decoding text
	1257	@defvar inhibit-null-byte-detection
	1258	If this variable has a non-@code{nil} value, null bytes are ignored
	1259	when detecting the encoding of a region or a string. This makes it
	1260	possible to correctly detect the encoding of text that contains null
	1261	bytes, such as Info files with Index nodes.
	1262	@end defvar
	1263
	1264	@defvar inhibit-iso-escape-detection
	1265	If this variable has a non-@code{nil} value, ISO-2022 escape sequences
	1266	are ignored when detecting the encoding of a region or a string. The
	1267	result is that no text is ever detected as encoded in some ISO-2022
	1268	encoding, and all escape sequences become visible in a buffer.
	1269	@strong{Warning:} @emph{Use this variable with extreme caution,
	1270	because many files in the Emacs distribution use ISO-2022 encoding.}
	1271	@end defvar
	1272
	1273	@cindex charsets supported by a coding system
	1274	@defun coding-system-charset-list coding-system
	1275	This function returns the list of character sets (@pxref{Character
	1276	Sets}) supported by @var{coding-system}. Some coding systems that
	1277	support too many character sets to list them all yield special values:
	1278	@itemize @bullet
	1279	@item
	1280	If @var{coding-system} supports all the ISO-2022 charsets, the value
	1281	is @code{iso-2022}.
	1282	@item
	1283	If @var{coding-system} supports all Emacs characters, the value is
	1284	@code{(emacs)}.
	1285	@item
	1286	If @var{coding-system} supports all emacs-mule characters, the value
	1287	is @code{emacs-mule}.
	1288	@item
	1289	If @var{coding-system} supports all Unicode characters, the value is
	1290	@code{(unicode)}.
	1291	@end itemize
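		
		For example, with two of the predefined coding systems:
		
		@lisp
		(coding-system-charset-list 'iso-latin-1)
		     @result{} (iso-8859-1)
		(coding-system-charset-list 'utf-8)
		     @result{} (unicode)
		@end lisp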
	1292	@end defun
	1293
	1294	@xref{Coding systems for a subprocess,, Process Information}, in
	1295	particular the description of the functions
	1296	@code{process-coding-system} and @code{set-process-coding-system}, for
	1297	how to examine or set the coding systems used for I/O to a subprocess.
	1298
	1299	@node User-Chosen Coding Systems
	1300	@subsection User-Chosen Coding Systems
	1301
	1302	@cindex select safe coding system
	1303	@defun select-safe-coding-system from to &optional default-coding-system accept-default-p file
	1304	This function selects a coding system for encoding specified text,
	1305	asking the user to choose if necessary. Normally the specified text
	1306	is the text in the current buffer between @var{from} and @var{to}. If
	1307	@var{from} is a string, the string specifies the text to encode, and
	1308	@var{to} is ignored.
	1309
	1310	If the specified text includes raw bytes (@pxref{Text
	1311	Representations}), @code{select-safe-coding-system} suggests
	1312	@code{raw-text} for its encoding.
	1313
	1314	If @var{default-coding-system} is non-@code{nil}, that is the first
	1315	coding system to try; if that can handle the text,
	1316	@code{select-safe-coding-system} returns that coding system. It can
	1317	also be a list of coding systems; then the function tries each of them
	1318	one by one. After trying all of them, it next tries the current
	1319	buffer's value of @code{buffer-file-coding-system} (if it is not
	1320	@code{undecided}), then the default value of
	1321	@code{buffer-file-coding-system} and finally the user's most
	1322	preferred coding system, which the user can set using the command
	1323	@code{prefer-coding-system} (@pxref{Recognize Coding,, Recognizing
	1324	Coding Systems, emacs, The GNU Emacs Manual}).
	1325
	1326	If one of those coding systems can safely encode all the specified
	1327	text, @code{select-safe-coding-system} chooses it and returns it.
	1328	Otherwise, it asks the user to choose from a list of coding systems
	1329	which can encode all the text, and returns the user's choice.
	1330
	1331	@var{default-coding-system} can also be a list whose first element is
	1332	@code{t} and whose other elements are coding systems. Then, if no coding
	1333	system in the list can handle the text, @code{select-safe-coding-system}
	1334	queries the user immediately, without trying any of the three
	1335	alternatives described above.
	1336
	1337	The optional argument @var{accept-default-p}, if non-@code{nil},
	1338	should be a function to determine whether a coding system selected
	1339	without user interaction is acceptable. @code{select-safe-coding-system}
	1340	calls this function with one argument, the base coding system of the
	1341	selected coding system. If @var{accept-default-p} returns @code{nil},
	1342	@code{select-safe-coding-system} rejects the silently selected coding
	1343	system, and asks the user to select a coding system from a list of
	1344	possible candidates.
	1345
	1346	@vindex select-safe-coding-system-accept-default-p
	1347	If the variable @code{select-safe-coding-system-accept-default-p} is
	1348	non-@code{nil}, it should be a function taking a single argument.
	1349	It is used in place of @var{accept-default-p}, overriding any
	1350	value supplied for this argument.
	1351
	1352	As a final step, before returning the chosen coding system,
	1353	@code{select-safe-coding-system} checks whether that coding system is
	1354	consistent with what would be selected if the contents of the region
	1355	were read from a file. (If not, this could lead to data corruption in
	1356	a file subsequently re-visited and edited.) Normally,
	1357	@code{select-safe-coding-system} uses @code{buffer-file-name} as the
	1358	file for this purpose, but if @var{file} is non-@code{nil}, it uses
	1359	that file instead (this can be relevant for @code{write-region} and
	1360	similar functions). If it detects an apparent inconsistency,
	1361	@code{select-safe-coding-system} queries the user before selecting the
	1362	coding system.
	1363	@end defun
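
		As a minimal sketch of the non-interactive case, the call below passes
		the text as a string (so @var{to} is ignored) and offers @code{utf-8}
		as the first candidate. The sample string and the choice of
		@code{utf-8} are assumptions made for illustration; if the candidate
		could not encode the text, the function would go on to the other
		defaults or ask the user.

		@example
		(select-safe-coding-system "caf\u00e9 au lait" nil 'utf-8)
		@end example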
	1364
	1365	Here are two functions you can use to let the user specify a coding
	1366	system, with completion. @xref{Completion}.
	1367
	1368	@defun read-coding-system prompt &optional default
	1369	This function reads a coding system using the minibuffer, prompting with
	1370	string @var{prompt}, and returns the coding system name as a symbol. If
	1371	the user enters null input, @var{default} specifies which coding system
	1372	to return. It should be a symbol or a string.
	1373	@end defun
	1374
	1375	@defun read-non-nil-coding-system prompt
	1376	This function reads a coding system using the minibuffer, prompting with
	1377	string @var{prompt}, and returns the coding system name as a symbol. If
	1378	the user tries to enter null input, it asks the user to try again.
	1379	@xref{Coding Systems}.
	1380	@end defun
	1381
	1382	@node Default Coding Systems
	1383	@subsection Default Coding Systems
	1384	@cindex default coding system
	1385	@cindex coding system, automatically determined
	1386
	1387	This section describes variables that specify the default coding
	1388	system for certain files or when running certain subprograms, and the
	1389	function that I/O operations use to access them.
	1390
	1391	The idea of these variables is that you set them once and for all to the
	1392	defaults you want, and then do not change them again. To specify a
	1393	particular coding system for a particular operation in a Lisp program,
	1394	don't change these variables; instead, override them using
	1395	@code{coding-system-for-read} and @code{coding-system-for-write}
	1396	(@pxref{Specifying Coding Systems}).
	1397
	1398	@cindex file contents, and default coding system
	1399	@defopt auto-coding-regexp-alist
	1400	This variable is an alist of text patterns and corresponding coding
	1401	systems. Each element has the form @code{(@var{regexp}
	1402	. @var{coding-system})}; a file whose first few kilobytes match
	1403	@var{regexp} is decoded with @var{coding-system} when its contents are
	1404	read into a buffer. The settings in this alist take priority over
	1405	@code{coding:} tags in the files and the contents of
	1406	@code{file-coding-system-alist} (see below). The default value is set
	1407	so that Emacs automatically recognizes mail files in Babyl format and
	1408	reads them with no code conversions.
	1409	@end defopt
	1410
	1411	@cindex file name, and default coding system
	1412	@defopt file-coding-system-alist
	1413	This variable is an alist that specifies the coding systems to use for
	1414	reading and writing particular files. Each element has the form
	1415	@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
	1416	expression that matches certain file names. The element applies to file
	1417	names that match @var{pattern}.
	1418
	1419	The @sc{cdr} of the element, @var{coding}, should be either a coding
	1420	system, a cons cell containing two coding systems, or a function name (a
	1421	symbol with a function definition). If @var{coding} is a coding system,
	1422	that coding system is used for both reading the file and writing it. If
	1423	@var{coding} is a cons cell containing two coding systems, its @sc{car}
	1424	specifies the coding system for decoding, and its @sc{cdr} specifies the
	1425	coding system for encoding.
	1426
	1427	If @var{coding} is a function name, the function should take one
	1428	argument, a list of all arguments passed to
	1429	@code{find-operation-coding-system}. It must return a coding system
	1430	or a cons cell containing two coding systems. This value has the same
	1431	meaning as described above.
	1432
	1433	If @var{coding} (or the value returned by the above function) is
	1434	@code{undecided}, the normal code-detection is performed.
	1435	@end defopt
	1436
	1437	@defopt auto-coding-alist
	1438	This variable is an alist that specifies the coding systems to use for
	1439	reading and writing particular files. Its form is like that of
	1440	@code{file-coding-system-alist}, but, unlike the latter, this variable
	1441	takes priority over any @code{coding:} tags in the file.
	1442	@end defopt
	1443
	1444	@cindex program name, and default coding system
	1445	@defvar process-coding-system-alist
	1446	This variable is an alist specifying which coding systems to use for a
	1447	subprocess, depending on which program is running in the subprocess. It
	1448	works like @code{file-coding-system-alist}, except that @var{pattern} is
	1449	matched against the program name used to start the subprocess. The coding
	1450	system or systems specified in this alist are used to initialize the
	1451	coding systems used for I/O to the subprocess, but you can specify
	1452	other coding systems later using @code{set-process-coding-system}.
	1453	@end defvar
	1454
	1455	@strong{Warning:} Coding systems such as @code{undecided}, which
	1456	determine the coding system from the data, do not work entirely reliably
	1457	with asynchronous subprocess output. This is because Emacs handles
	1458	asynchronous subprocess output in batches, as it arrives. If the coding
	1459	system leaves the character code conversion unspecified, or leaves the
	1460	end-of-line conversion unspecified, Emacs must try to detect the proper
	1461	conversion from one batch at a time, and this does not always work.
	1462
	1463	Therefore, with an asynchronous subprocess, if at all possible, use a
	1464	coding system which determines both the character code conversion and
	1465	the end of line conversion---that is, one like @code{latin-1-unix},
	1466	rather than @code{undecided} or @code{latin-1}.
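
		For instance, a Lisp program that starts an asynchronous subprocess
		might pin down both conversions by binding
		@code{coding-system-for-read} and @code{coding-system-for-write}
		(@pxref{Specifying Coding Systems}) around the call. This is only a
		sketch: the process name, buffer name, and program are placeholders,
		and @code{utf-8-unix} is an assumption about what the program reads
		and writes.

		@example
		(let ((coding-system-for-read 'utf-8-unix)
		      (coding-system-for-write 'utf-8-unix))
		  ;; Both the character code and the end-of-line conversion are
		  ;; fixed here, so nothing is left to per-batch detection.
		  (start-process "my-listing" "*my-listing*" "ls" "-l"))
		@end example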
	1467
	1468	@cindex port number, and default coding system
	1469	@cindex network service name, and default coding system
	1470	@defvar network-coding-system-alist
	1471	This variable is an alist that specifies the coding system to use for
	1472	network streams. It works much like @code{file-coding-system-alist},
	1473	with the difference that the @var{pattern} in an element may be either a
	1474	port number or a regular expression. If it is a regular expression, it
	1475	is matched against the network service name used to open the network
	1476	stream.
	1477	@end defvar
	1478
	1479	@defvar default-process-coding-system
	1480	This variable specifies the coding systems to use for subprocess (and
	1481	network stream) input and output, when nothing else specifies what to
	1482	do.
	1483
	1484	The value should be a cons cell of the form @code{(@var{input-coding}
	1485	. @var{output-coding})}. Here @var{input-coding} applies to input from
	1486	the subprocess, and @var{output-coding} applies to output to it.
	1487	@end defvar
	1488
	1489	@cindex default coding system, functions to determine
	1490	@defopt auto-coding-functions
	1491	This variable holds a list of functions that try to determine a
	1492	coding system for a file based on its undecoded contents.
	1493
	1494	Each function in this list should be written to look at text in the
	1495	current buffer, but should not modify it in any way. The buffer will
	1496	contain undecoded text of parts of the file. Each function should
	1497	take one argument, @var{size}, which tells it how many characters to
	1498	look at, starting from point. If the function succeeds in determining
	1499	a coding system for the file, it should return that coding system.
	1500	Otherwise, it should return @code{nil}.
	1501
	1502	If a file has a @samp{coding:} tag, that takes precedence, so these
	1503	functions won't be called.
	1504	@end defopt
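
		A function suitable for this list might look like the following
		sketch. The @samp{%%charset:} header it searches for is a
		hypothetical file convention invented for this example, and the name
		@code{my-charset-header-auto-coding} is likewise made up.

		@example
		(defun my-charset-header-auto-coding (size)
		  "Return the coding system named by a `%%charset:' line, or nil.
		Only the SIZE characters following point are examined."
		  ;; `save-excursion' keeps point where the caller left it.
		  (save-excursion
		    (let ((limit (min (point-max) (+ (point) size))))
		      (when (re-search-forward
		             "^%%charset: *\\([-a-z0-9]+\\)" limit t)
		        (let ((coding (intern (downcase (match-string 1)))))
		          ;; Ignore names that are not known coding systems.
		          (and (coding-system-p coding) coding))))))

		(add-to-list 'auto-coding-functions
		             'my-charset-header-auto-coding)
		@end example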
	1505
	1506	@defun find-auto-coding filename size
	1507	This function tries to determine a suitable coding system for
	1508	@var{filename}. It examines the buffer visiting the named file, using
	1509	the variables documented above in sequence, until it finds a match for
	1510	one of the rules specified by these variables. It then returns a cons
	1511	cell of the form @code{(@var{coding} . @var{source})}, where
	1512	@var{coding} is the coding system to use and @var{source} is a symbol,
	1513	one of @code{auto-coding-alist}, @code{auto-coding-regexp-alist},
	1514	@code{:coding}, or @code{auto-coding-functions}, indicating which one
	1515	supplied the matching rule. The value @code{:coding} means the coding
	1516	system was specified by the @code{coding:} tag in the file
	1517	(@pxref{Specify Coding,, coding tag, emacs, The GNU Emacs Manual}).
	1518	The order of looking for a matching rule is @code{auto-coding-alist}
	1519	first, then @code{auto-coding-regexp-alist}, then the @code{coding:}
	1520	tag, and lastly @code{auto-coding-functions}. If no matching rule was
	1521	found, the function returns @code{nil}.
	1522
	1523	The second argument @var{size} is the size of text, in characters,
	1524	following point. The function examines text only within @var{size}
	1525	characters after point. Normally, the buffer should be positioned at
	1526	the beginning when this function is called, because one of the places
	1527	for the @code{coding:} tag is the first one or two lines of the file;
	1528	in that case, @var{size} should be the size of the buffer.
	1529	@end defun
	1530
	1531	@defun set-auto-coding filename size
	1532	This function returns a suitable coding system for file
	1533	@var{filename}. It uses @code{find-auto-coding} to find the coding
	1534	system. If no coding system could be determined, the function returns
	1535	@code{nil}. The argument @var{size} has the same meaning as in
	1536	@code{find-auto-coding}.
	1537	@end defun
	1538
	1539	@defun find-operation-coding-system operation &rest arguments
	1540	This function returns the coding system to use (by default) for
	1541	performing @var{operation} with @var{arguments}. The value has this
	1542	form:
	1543
	1544	@example
	1545	(@var{decoding-system} . @var{encoding-system})
	1546	@end example
	1547
	1548	The first element, @var{decoding-system}, is the coding system to use
	1549	for decoding (in case @var{operation} does decoding), and
	1550	@var{encoding-system} is the coding system for encoding (in case
	1551	@var{operation} does encoding).
	1552
	1553	The argument @var{operation} is a symbol; it should be one of
	1554	@code{write-region}, @code{start-process}, @code{call-process},
	1555	@code{call-process-region}, @code{insert-file-contents}, or
	1556	@code{open-network-stream}. These are the names of the Emacs I/O
	1557	primitives that can do character code and eol conversion.
	1558
	1559	The remaining arguments should be the same arguments that might be given
	1560	to the corresponding I/O primitive. Depending on the primitive, one
	1561	of those arguments is selected as the @dfn{target}. For example, if
	1562	@var{operation} does file I/O, whichever argument specifies the file
	1563	name is the target. For subprocess primitives, the program name is the
	1564	target. For @code{open-network-stream}, the target is the service name
	1565	or port number.
	1566
	1567	Depending on @var{operation}, this function looks up the target in
	1568	@code{file-coding-system-alist}, @code{process-coding-system-alist},
	1569	or @code{network-coding-system-alist}. If the target is found in the
	1570	alist, @code{find-operation-coding-system} returns its association in
	1571	the alist; otherwise it returns @code{nil}.
	1572
	1573	If @var{operation} is @code{insert-file-contents}, the argument
	1574	corresponding to the target may be a cons cell of the form
	1575	@code{(@var{filename} . @var{buffer})}. In that case, @var{filename}
	1576	is a file name to look up in @code{file-coding-system-alist}, and
	1577	@var{buffer} is a buffer that contains the file's contents (not yet
	1578	decoded). If @code{file-coding-system-alist} specifies a function to
	1579	call for this file, and that function needs to examine the file's
	1580	contents (as it usually does), it should examine the contents of
	1581	@var{buffer} instead of reading the file.
	1582	@end defun
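
		The lookup can be tried out without touching the global settings by
		binding @code{file-coding-system-alist} temporarily, as in this small
		sketch. The @file{.my-notes} extension and the file name are
		hypothetical; the file does not need to exist, since only its name is
		matched.

		@example
		(let ((file-coding-system-alist
		       '(("\\.my-notes\\'" . (latin-1 . utf-8)))))
		  (find-operation-coding-system
		   'insert-file-contents "/tmp/draft.my-notes"))
		     @result{} (latin-1 . utf-8)
		@end example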
	1583
	1584	@node Specifying Coding Systems
	1585	@subsection Specifying a Coding System for One Operation
	1586
	1587	You can specify the coding system for a specific operation by binding
	1588	the variables @code{coding-system-for-read} and/or
	1589	@code{coding-system-for-write}.
	1590
	1591	@defvar coding-system-for-read
	1592	If this variable is non-@code{nil}, it specifies the coding system to
	1593	use for reading a file, or for input from a synchronous subprocess.
	1594
	1595	It also applies to any asynchronous subprocess or network stream, but in
	1596	a different way: the value of @code{coding-system-for-read} when you
	1597	start the subprocess or open the network stream specifies the input
	1598	decoding method for that subprocess or network stream. It remains in
	1599	use for that subprocess or network stream unless and until overridden.
	1600
	1601	The right way to use this variable is to bind it with @code{let} for a
	1602	specific I/O operation. Its global value is normally @code{nil}, and
	1603	you should not globally set it to any other value. Here is an example
	1604	of the right way to use the variable:
	1605
	1606	@example
	1607	;; @r{Read the file with no character code conversion.}
	1608	;; @r{Assume @acronym{crlf} represents end-of-line.}
	1609	(let ((coding-system-for-read 'emacs-mule-dos))
	1610	  (insert-file-contents filename))
	1611	@end example
	1612
	1613	When its value is non-@code{nil}, this variable takes precedence over
	1614	all other methods of specifying a coding system to use for input,
	1615	including @code{file-coding-system-alist},
	1616	@code{process-coding-system-alist} and
	1617	@code{network-coding-system-alist}.
	1618	@end defvar
	1619
	1620	@defvar coding-system-for-write
	1621	This works much like @code{coding-system-for-read}, except that it
	1622	applies to output rather than input. It affects writing to files,
	1623	as well as sending output to subprocesses and net connections.
	1624
	1625	When a single operation does both input and output, as do
	1626	@code{call-process-region} and @code{start-process}, both
	1627	@code{coding-system-for-read} and @code{coding-system-for-write}
	1628	affect it.
	1629	@end defvar
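
		The same @code{let} idiom applies to output. The following sketch
		writes a short string to a temporary file as UTF-8 with Unix line
		endings; the file name prefix and the sample text are arbitrary
		choices for illustration.

		@example
		(let ((file (make-temp-file "coding-demo"))
		      (coding-system-for-write 'utf-8-unix))
		  ;; When its first argument is a string, `write-region' writes
		  ;; that string rather than buffer text.
		  (write-region "caf\u00e9\n" nil file)
		  file)
		@end example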
	1630
	1631	@defopt inhibit-eol-conversion
	1632	When this variable is non-@code{nil}, no end-of-line conversion is done,
	1633	no matter which coding system is specified. This applies to all the
	1634	Emacs I/O and subprocess primitives, and to the explicit encoding and
	1635	decoding functions (@pxref{Explicit Encoding}).
	1636	@end defopt
	1637
	1638	@cindex priority order of coding systems
	1639	@cindex coding systems, priority
	1640	Sometimes, you need to prefer several coding systems for some
	1641	operation, rather than fix a single one. Emacs lets you specify a
	1642	priority order for using coding systems. This ordering affects the
	1643	sorting of lists of coding systems returned by functions such as
	1644	@code{find-coding-systems-region} (@pxref{Lisp and Coding Systems}).
	1645
	1646	@defun coding-system-priority-list &optional highestp
	1647	This function returns the list of coding systems in the order of their
	1648	current priorities. Optional argument @var{highestp}, if
	1649	non-@code{nil}, means return only the highest priority coding system.
	1650	@end defun
	1651
	1652	@defun set-coding-system-priority &rest coding-systems
	1653	This function puts @var{coding-systems} at the beginning of the
	1654	priority list for coding systems, thus making their priority higher
	1655	than all the rest.
	1656	@end defun
	1657
	1658	@defmac with-coding-priority coding-systems &rest body@dots{}
	1659	This macro executes @var{body}, like @code{progn} does
	1660	(@pxref{Sequencing, progn}), with @var{coding-systems} at the front of
	1661	the priority list for coding systems. @var{coding-systems} should be
	1662	a list of coding systems to prefer during execution of @var{body}.
	1663	@end defmac
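
		As a brief illustration, the form below prefers @code{utf-8} and then
		@code{iso-8859-1} while its body runs, and asks for the highest
		priority coding system inside the body. The two coding systems are
		chosen only as an example; the previous priority order is restored
		when the form exits.

		@example
		(with-coding-priority '(utf-8 iso-8859-1)
		  (coding-system-priority-list t))
		     @result{} utf-8
		@end example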
	1664
	1665	@node Explicit Encoding
	1666	@subsection Explicit Encoding and Decoding
	1667	@cindex encoding in coding systems
	1668	@cindex decoding in coding systems
	1669
	1670	All the operations that transfer text in and out of Emacs have the
	1671	ability to use a coding system to encode or decode the text.
	1672	You can also explicitly encode and decode text using the functions
	1673	in this section.
	1674
	1675	The result of encoding, and the input to decoding, are not ordinary
	1676	text. They logically consist of a series of byte values; that is, a
	1677	series of @acronym{ASCII} and eight-bit characters. In unibyte
	1678	buffers and strings, these characters have codes in the range 0
	1679	through #xFF (255). In a multibyte buffer or string, eight-bit
	1680	characters have character codes higher than #xFF (@pxref{Text
	1681	Representations}), but Emacs transparently converts them to their
	1682	single-byte values when you encode or decode such text.
	1683
	1684	The usual way to read a file into a buffer as a sequence of bytes, so
	1685	you can decode the contents explicitly, is with
	1686	@code{insert-file-contents-literally} (@pxref{Reading from Files});
	1687	alternatively, specify a non-@code{nil} @var{rawfile} argument when
	1688	visiting a file with @code{find-file-noselect}. These methods result in
	1689	a unibyte buffer.
	1690
	1691	The usual way to use the byte sequence that results from explicitly
	1692	encoding text is to copy it to a file or process---for example, to write
	1693	it with @code{write-region} (@pxref{Writing to Files}), and suppress
	1694	encoding by binding @code{coding-system-for-write} to
	1695	@code{no-conversion}.
	1696
	1697	Here are the functions to perform explicit encoding or decoding. The
	1698	encoding functions produce sequences of bytes; the decoding functions
	1699	are meant to operate on sequences of bytes. All of these functions
	1700	discard text properties. They also set @code{last-coding-system-used}
	1701	to the precise coding system they used.
	1702
	1703	@deffn Command encode-coding-region start end coding-system &optional destination
	1704	This command encodes the text from @var{start} to @var{end} according
	1705	to coding system @var{coding-system}. Normally, the encoded text
	1706	replaces the original text in the buffer, but the optional argument
	1707	@var{destination} can change that. If @var{destination} is a buffer,
	1708	the encoded text is inserted in that buffer after point (point does
	1709	not move); if it is @code{t}, the command returns the encoded text as
	1710	a unibyte string without inserting it.
	1711
	1712	If encoded text is inserted in some buffer, this command returns the
	1713	length of the encoded text.
	1714
	1715	The result of encoding is logically a sequence of bytes, but the
	1716	buffer remains multibyte if it was multibyte before, and any 8-bit
	1717	bytes are converted to their multibyte representation (@pxref{Text
	1718	Representations}).
	1719
	1720	@cindex @code{undecided} coding-system, when encoding
	1721	Do @emph{not} use @code{undecided} for @var{coding-system} when
	1722	encoding text, since that may lead to unexpected results. Instead,
	1723	use @code{select-safe-coding-system} (@pxref{User-Chosen Coding
	1724	Systems, select-safe-coding-system}) to suggest a suitable encoding,
	1725	if there's no obvious pertinent value for @var{coding-system}.
	1726	@end deffn
	1727
	1728	@defun encode-coding-string string coding-system &optional nocopy buffer
	1729	This function encodes the text in @var{string} according to coding
	1730	system @var{coding-system}. It returns a new string containing the
	1731	encoded text, except when @var{nocopy} is non-@code{nil}, in which
	1732	case the function may return @var{string} itself if the encoding
	1733	operation is trivial. The result of encoding is a unibyte string.
	1734	@end defun
	1735
	1736	@deffn Command decode-coding-region start end coding-system &optional destination
	1737	This command decodes the text from @var{start} to @var{end} according
	1738	to coding system @var{coding-system}. To make explicit decoding
	1739	useful, the text before decoding ought to be a sequence of byte
	1740	values, but both multibyte and unibyte buffers are acceptable (in the
	1741	multibyte case, the raw byte values should be represented as eight-bit
	1742	characters). Normally, the decoded text replaces the original text in
	1743	the buffer, but the optional argument @var{destination} can change
	1744	that. If @var{destination} is a buffer, the decoded text is inserted
	1745	in that buffer after point (point does not move); if it is @code{t},
	1746	the command returns the decoded text as a multibyte string without
	1747	inserting it.
	1748
	1749	If decoded text is inserted in some buffer, this command returns the
	1750	length of the decoded text.
	1751
	1752	This command puts a @code{charset} text property on the decoded text.
	1753	The value of the property states the character set used to decode the
	1754	original text.
	1755	@end deffn
	1756
	1757	@defun decode-coding-string string coding-system &optional nocopy buffer
	1758	This function decodes the text in @var{string} according to
	1759	@var{coding-system}. It returns a new string containing the decoded
	1760	text, except when @var{nocopy} is non-@code{nil}, in which case the
	1761	function may return @var{string} itself if the decoding operation is
	1762	trivial. To make explicit decoding useful, the contents of
	1763	@var{string} ought to be a unibyte string with a sequence of byte
	1764	values, but a multibyte string is also acceptable (assuming it
	1765	contains 8-bit bytes in their multibyte form).
	1766
	1767	If optional argument @var{buffer} specifies a buffer, the decoded text
	1768	is inserted in that buffer after point (point does not move). In this
	1769	case, the return value is the length of the decoded text.
	1770
	1771	@cindex @code{charset}, text property
	1772	This function puts a @code{charset} text property on the decoded text.
	1773	The value of the property states the character set used to decode the
	1774	original text:
	1775
	1776	@example
	1777	@group
	1778	(decode-coding-string "Gr\374ss Gott" 'latin-1)
	1779	@result{} #("Gr@"uss Gott" 0 9 (charset iso-8859-1))
	1780	@end group
	1781	@end example
	1782	@end defun
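
		The string functions can be combined into a simple round trip. In
		this sketch, which assumes @code{utf-8} purely for illustration, the
		encoded value is a unibyte string one byte longer than the original
		text, because the one non-@acronym{ASCII} character occupies two
		bytes, and decoding it gives back the original characters.

		@example
		@group
		(let* ((text "Gr\u00fcss Gott")
		       (bytes (encode-coding-string text 'utf-8)))
		  (list (multibyte-string-p bytes)
		        (length bytes)
		        (string= text (decode-coding-string bytes 'utf-8))))
		     @result{} (nil 11 t)
		@end group
		@end example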
	1783
	1784	@defun decode-coding-inserted-region from to filename &optional visit beg end replace
	1785	This function decodes the text from @var{from} to @var{to} as if
	1786	it were being read from file @var{filename} by @code{insert-file-contents}
	1787	called with the rest of the arguments provided.
	1788
	1789	The normal way to use this function is after reading text from a file
	1790	without decoding, if you decide you would rather have decoded it.
	1791	Instead of deleting the text and reading it again, this time with
	1792	decoding, you can call this function.
	1793	@end defun
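
		A sketch of that usage follows. The variable @code{my-file} is a
		placeholder assumed to hold the name of an existing file; which
		coding system ends up being used depends on the defaults that apply
		to that file.

		@example
		(with-temp-buffer
		  ;; Insert the raw bytes of the file, without decoding.
		  (insert-file-contents-literally my-file)
		  ;; Decode them in place, as if the file were being read
		  ;; normally.
		  (decode-coding-inserted-region (point-min) (point-max) my-file)
		  (buffer-string))
		@end example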
	1794
	1795	@node Terminal I/O Encoding
	1796	@subsection Terminal I/O Encoding
	1797
	1798	Emacs can decode keyboard input using a coding system, and encode
	1799	terminal output. This is useful for terminals that transmit or
	1800	display text using a particular encoding such as Latin-1. Emacs does
	1801	not set @code{last-coding-system-used} for encoding or decoding of
	1802	terminal I/O.
	1803
	1804	@defun keyboard-coding-system &optional terminal
	1805	This function returns the coding system that is in use for decoding
	1806	keyboard input from @var{terminal}---or @code{nil} if no coding system
	1807	is to be used for that terminal. If @var{terminal} is omitted or
	1808	@code{nil}, it means the selected frame's terminal. @xref{Multiple
	1809	Terminals}.
	1810	@end defun
	1811
	1812	@deffn Command set-keyboard-coding-system coding-system &optional terminal
	1813	This command specifies @var{coding-system} as the coding system to use
	1814	for decoding keyboard input from @var{terminal}. If
	1815	@var{coding-system} is @code{nil}, that means do not decode keyboard
	1816	input. If @var{terminal} is a frame, it means that frame's terminal;
	1817	if it is @code{nil}, that means the currently selected frame's
	1818	terminal. @xref{Multiple Terminals}.
	1819	@end deffn
	1820
	1821	@defun terminal-coding-system &optional terminal
	1822	This function returns the coding system that is in use for encoding
	1823	terminal output from @var{terminal}---or @code{nil} if the output is
	1824	not encoded. If @var{terminal} is a frame, it means that frame's
	1825	terminal; if it is @code{nil}, that means the currently selected
	1826	frame's terminal.
	1827	@end defun
	1828
	1829	@deffn Command set-terminal-coding-system coding-system &optional terminal
	1830	This command specifies @var{coding-system} as the coding system to use
	1831	for encoding terminal output from @var{terminal}. If
	1832	@var{coding-system} is @code{nil}, terminal output is not encoded. If
	1833	@var{terminal} is a frame, it means that frame's terminal; if it is
	1834	@code{nil}, that means the currently selected frame's terminal.
	1835	@end deffn
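
		For example, an init file might set both coding systems when Emacs
		runs on a text terminal. This is a sketch that assumes the terminal
		sends and displays UTF-8, which Emacs cannot verify for you.

		@example
		(unless (display-graphic-p)
		  ;; Decode keyboard input and encode terminal output as UTF-8
		  ;; for the selected frame's terminal.
		  (set-keyboard-coding-system 'utf-8)
		  (set-terminal-coding-system 'utf-8))
		@end example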
	1836
	1837	@node Input Methods
	1838	@section Input Methods
	1839	@cindex input methods
	1840
	1841	@dfn{Input methods} provide convenient ways of entering non-@acronym{ASCII}
	1842	characters from the keyboard. Unlike coding systems, which translate
	1843	non-@acronym{ASCII} characters to and from encodings meant to be read by
	1844	programs, input methods provide human-friendly commands. (@xref{Input
	1845	Methods,,, emacs, The GNU Emacs Manual}, for information on how users
	1846	use input methods to enter text.) How to define input methods is not
	1847	yet documented in this manual, but here we describe how to use them.
	1848
	1849	Each input method has a name, which is currently a string;
	1850	in the future, symbols may also be usable as input method names.
	1851
	1852	@defvar current-input-method
	1853	This variable holds the name of the input method now active in the
	1854	current buffer. (It automatically becomes local in each buffer when set
	1855	in any fashion.) It is @code{nil} if no input method is active in the
	1856	buffer now.
	1857	@end defvar
	1858
	1859	@defopt default-input-method
	1860	This variable holds the default input method for commands that choose an
	1861	input method. Unlike @code{current-input-method}, this variable is
	1862	normally global.
	1863	@end defopt
	1864
	1865	@deffn Command set-input-method input-method
	1866	This command activates input method @var{input-method} for the current
	1867	buffer. It also sets @code{default-input-method} to @var{input-method}.
	1868	If @var{input-method} is @code{nil}, this command deactivates any input
	1869	method for the current buffer.
	1870	@end deffn
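
		The following sketch activates an input method in a temporary buffer
		and returns its name. It assumes that the @code{latin-1-prefix} input
		method distributed with Emacs is available; the binding of
		@code{default-input-method} is there only so that the global default
		is restored afterwards.

		@example
		(with-temp-buffer
		  (let ((default-input-method default-input-method))
		    (set-input-method "latin-1-prefix")
		    ;; The name of the method now active in this buffer.
		    current-input-method))
		@end example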
	1871
	1872	@defun read-input-method-name prompt &optional default inhibit-null
	1873	This function reads an input method name with the minibuffer, prompting
	1874	with @var{prompt}. If @var{default} is non-@code{nil}, that is returned
	1875	by default, if the user enters empty input. However, if
	1876	@var{inhibit-null} is non-@code{nil}, empty input signals an error.
	1877
	1878	The returned value is a string.
	1879	@end defun
	1880
	1881	@defvar input-method-alist
	1882	This variable defines all the supported input methods.
	1883	Each element defines one input method, and should have the form:
	1884
	1885	@example
	1886	(@var{input-method} @var{language-env} @var{activate-func}
	1887	@var{title} @var{description} @var{args}...)
	1888	@end example
	1889
	1890	Here @var{input-method} is the input method name, a string;
	1891	@var{language-env} is another string, the name of the language
	1892	environment this input method is recommended for. (That serves only for
	1893	documentation purposes.)
	1894
	1895	@var{activate-func} is a function to call to activate this method. The
	1896	@var{args}, if any, are passed as arguments to @var{activate-func}. All
	1897	told, the arguments to @var{activate-func} are @var{input-method} and
	1898	the @var{args}.
	1899
	1900	@var{title} is a string to display in the mode line while this method is
	1901	active. @var{description} is a string describing this method and what
	1902	it is good for.
	1903	@end defvar
	1904
	1905	The fundamental interface to input methods is through the
	1906	variable @code{input-method-function}. @xref{Reading One Event},
	1907	and @ref{Invoking the Input Method}.
	1908
	1909	@node Locales
	1910	@section Locales
	1911	@cindex locale
	1912
	1913	POSIX defines a concept of ``locales'' which control which language
	1914	to use in language-related features. These Emacs variables control
	1915	how Emacs interacts with these features.
	1916
	1917	@defvar locale-coding-system
	1918	@cindex keyboard input decoding on X
	1919	This variable specifies the coding system to use for decoding system
	1920	error messages and---on X Window system only---keyboard input, for
	1921	encoding the format argument to @code{format-time-string}, and for
	1922	decoding the return value of @code{format-time-string}.
	1923	@end defvar
	1924
	1925	@defvar system-messages-locale
	1926	This variable specifies the locale to use for generating system error
	1927	messages. Changing the locale can cause messages to come out in a
	1928	different language or in a different orthography. If the variable is
	1929	@code{nil}, the locale is specified by environment variables in the
	1930	usual POSIX fashion.
	1931	@end defvar
	1932
	1933	@defvar system-time-locale
	1934	This variable specifies the locale to use for formatting time values.
	1935	Changing the locale can cause times to be formatted according to the
	1936	conventions of a different language. If the variable is @code{nil}, the
	1937	locale is specified by environment variables in the usual POSIX fashion.
	1938	@end defvar
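
		As an illustration, binding the variable to the @samp{C} locale
		should produce English day and month names regardless of the user's
		environment. The time value @code{(0 0)} stands for the epoch, and
		the non-@code{nil} third argument asks for Universal Time.

		@example
		(let ((system-time-locale "C"))
		  (format-time-string "%A, %d %B %Y" '(0 0) t))
		     @result{} "Thursday, 01 January 1970"
		@end example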
	1939
	1940	@defun locale-info item
	1941	This function returns locale data @var{item} for the current POSIX
	1942	locale, if available. @var{item} should be one of these symbols:
	1943
	1944	@table @code
	1945	@item codeset
	1946	Return the character set as a string (locale item @code{CODESET}).
	1947
	1948	@item days
	1949	Return a 7-element vector of day names (locale items
	1950	@code{DAY_1} through @code{DAY_7}).
	1951
	1952	@item months
	1953	Return a 12-element vector of month names (locale items @code{MON_1}
	1954	through @code{MON_12}).
	1955
	1956	@item paper
	1957	Return a list @code{(@var{width} @var{height})} for the default paper
	1958	size measured in millimeters (locale items @code{PAPER_WIDTH} and
	1959	@code{PAPER_HEIGHT}).
	1960	@end table
	1961
	1962	If the system can't provide the requested information, or if
	1963	@var{item} is not one of those symbols, the value is @code{nil}. All
	1964	strings in the return value are decoded using
	1965	@code{locale-coding-system}. @xref{Locales,,, libc, The GNU Libc Manual},
	1966	for more information about locales and locale items.
	1967	@end defun
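
		Because the value can be @code{nil}, callers usually need a fallback.
		The sketch below returns the first day name of the current locale, or
		a fixed English name if the system provides no data; the fallback
		string is an arbitrary choice for this example.

		@example
		(let ((days (locale-info 'days)))
		  ;; The first element corresponds to locale item DAY_1.
		  (if days
		      (aref days 0)
		    "Sunday"))
		@end example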