X-Git-Url: http://git.hcoop.net/bpt/emacs.git/blobdiff_plain/bc039a3b7dee37fa86932a54af083b2c7ac37fd3..acaf905b1130aae80fa59d2c861ffd4c8eb75486:/doc/lispref/nonascii.texi diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi index a3f25af471..19c7298117 100644 --- a/doc/lispref/nonascii.texi +++ b/doc/lispref/nonascii.texi @@ -1,6 +1,6 @@ @c -*-texinfo-*- @c This is part of the GNU Emacs Lisp Reference Manual. -@c Copyright (C) 1998-1999, 2001-2011 Free Software Foundation, Inc. +@c Copyright (C) 1998-1999, 2001-2012 Free Software Foundation, Inc. @c See the file elisp.texi for copying conditions. @setfilename ../../info/characters @node Non-ASCII Characters, Searching and Matching, Text, Top @@ -201,7 +201,7 @@ characters. @defun byte-to-string byte @cindex byte to string This function returns a unibyte string containing a single byte of -character data, @var{character}. It signals a error if +character data, @var{character}. It signals an error if @var{character} is not an integer between 0 and 255. @end defun @@ -369,29 +369,44 @@ replacing each @samp{_} character with a dash @samp{-}. For example, @code{canonical-combining-class}. However, sometimes we shorten the names to make their use easier. +@cindex unassigned character codepoints + Some codepoints are left @dfn{unassigned} by the +@acronym{UCD}---they don't correspond to any character. The Unicode +Standard defines default values of properties for such codepoints; +they are mentioned below for each property. + Here is the full list of value types for all the character properties that Emacs knows about: @table @code @item name -This property corresponds to the Unicode @code{Name} property. The -value is a string consisting of upper-case Latin letters A to Z, -digits, spaces, and hyphen @samp{-} characters. +Corresponds to the @code{Name} Unicode property. The value is a +string consisting of upper-case Latin letters A to Z, digits, spaces, +and hyphen @samp{-} characters. 
For unassigned codepoints, the value +is an empty string. @cindex unicode general category @item general-category -This property corresponds to the Unicode @code{General_Category} -property. The value is a symbol whose name is a 2-letter abbreviation -of the character's classification. +Corresponds to the @code{General_Category} Unicode property. The +value is a symbol whose name is a 2-letter abbreviation of the +character's classification. For unassigned codepoints, the value +is @code{Cn}. @item canonical-combining-class -Corresponds to the Unicode @code{Canonical_Combining_Class} property. -The value is an integer number. +Corresponds to the @code{Canonical_Combining_Class} Unicode property. +The value is an integer number. For unassigned codepoints, the value +is zero. +@cindex bidirectional class of characters @item bidi-class Corresponds to the Unicode @code{Bidi_Class} property. The value is a symbol whose name is the Unicode @dfn{directional type} of the -character. +character. Emacs uses this property when it reorders bidirectional +text for display (@pxref{Bidirectional Display}). For unassigned +codepoints, the value depends on the code blocks to which the +codepoint belongs: most unassigned codepoints get the value of +@code{L} (strong L), but some get values of @code{AL} (Arabic letter) +or @code{R} (strong R). @item decomposition Corresponds to the Unicode @code{Decomposition_Type} and @@ -403,19 +418,22 @@ Note that the Unicode spec writes these tag names inside brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses @samp{small}. }; the other elements are characters that give the compatibility -decomposition sequence of this character. +decomposition sequence of this character. For unassigned codepoints, +the value is the character itself. @item decimal-digit-value Corresponds to the Unicode @code{Numeric_Value} property for characters whose @code{Numeric_Type} is @samp{Digit}. The value is an -integer number. +integer number. 
For unassigned codepoints, the value is @code{nil}, +which means @acronym{NaN}, or ``not-a-number''. @item digit-value Corresponds to the Unicode @code{Numeric_Value} property for characters whose @code{Numeric_Type} is @samp{Decimal}. The value is an integer number. Examples of such characters include compatibility subscript and superscript digits, for which the value is the -corresponding number. +corresponding number. For unassigned codepoints, the value is +@code{nil}, which means @acronym{NaN}. @item numeric-value Corresponds to the Unicode @code{Numeric_Value} property for @@ -424,33 +442,53 @@ this property is an integer or a floating-point number. Examples of characters that have this property include fractions, subscripts, superscripts, Roman numerals, currency numerators, and encircled numbers. For example, the value of this property for the character -@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. +@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For +unassigned codepoints, the value is @code{nil}, which means +@acronym{NaN}. +@cindex mirroring of characters @item mirrored Corresponds to the Unicode @code{Bidi_Mirrored} property. The value -of this property is a symbol, either @code{Y} or @code{N}. +of this property is a symbol, either @code{Y} or @code{N}. For +unassigned codepoints, the value is @code{N}. + +@item mirroring +Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The +value of this property is a character whose glyph represents the +mirror image of the character's glyph, or @code{nil} if there's no +defined mirroring glyph. All the characters whose @code{mirrored} +property is @code{N} have @code{nil} as their @code{mirroring} +property; however, some characters whose @code{mirrored} property is +@code{Y} also have @code{nil} for @code{mirroring}, because no +appropriate characters exist with mirrored glyphs. 
Emacs uses this +property to display mirror images of characters when appropriate +(@pxref{Bidirectional Display}). For unassigned codepoints, the value +is @code{nil}. @item old-name Corresponds to the Unicode @code{Unicode_1_Name} property. The value -is a string. +is a string. For unassigned codepoints, the value is an empty string. @item iso-10646-comment Corresponds to the Unicode @code{ISO_Comment} property. The value is -a string. +a string. For unassigned codepoints, the value is an empty string. @item uppercase Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property. -The value of this property is a single character. +The value of this property is a single character. For unassigned +codepoints, the value is @code{nil}, which means the character itself. @item lowercase Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property. -The value of this property is a single character. +The value of this property is a single character. For unassigned +codepoints, the value is @code{nil}, which means the character itself. @item titlecase Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property. @dfn{Title case} is a special form of a character used when the first character of a word needs to be capitalized. The value of this -property is a single character. +property is a single character. For unassigned codepoints, the value +is @code{nil}, which means the character itself. @end table @defun get-char-code-property char propname @@ -466,15 +504,18 @@ This function returns the value of @var{char}'s @var{propname} property. 
@result{} Nd @end group @group -(get-char-code-property ?\u2084 'digit-value) ; subscript 4 +;; subscript 4 +(get-char-code-property ?\u2084 'digit-value) @result{} 4 @end group @group -(get-char-code-property ?\u2155 'numeric-value) ; one fifth +;; one fifth +(get-char-code-property ?\u2155 'numeric-value) @result{} 0.2 @end group @group -(get-char-code-property ?\u2163 'numeric-value) ; Roman IV +;; Roman IV +(get-char-code-property ?\u2163 'numeric-value) @result{} 4 @end group @end example @@ -1449,11 +1490,11 @@ for decoding (in case @var{operation} does decoding), and @var{encoding-system} is the coding system for encoding (in case @var{operation} does encoding). -The argument @var{operation} is a symbol, one of @code{write-region}, -@code{start-process}, @code{call-process}, @code{call-process-region}, -@code{insert-file-contents}, or @code{open-network-stream}. These are -the names of the Emacs I/O primitives that can do character code and -eol conversion. +The argument @var{operation} is a symbol; it should be one of +@code{write-region}, @code{start-process}, @code{call-process}, +@code{call-process-region}, @code{insert-file-contents}, or +@code{open-network-stream}. These are the names of the Emacs I/O +primitives that can do character code and eol conversion. The remaining arguments should be the same arguments that might be given to the corresponding I/O primitive. Depending on the primitive, one @@ -1539,7 +1580,7 @@ decoding functions (@pxref{Explicit Encoding}). Sometimes, you need to prefer several coding systems for some operation, rather than fix a single one. Emacs lets you specify a priority order for using coding systems. This ordering affects the -sorting of lists of coding sysems returned by functions such as +sorting of lists of coding systems returned by functions such as @code{find-coding-systems-region} (@pxref{Lisp and Coding Systems}). @defun coding-system-priority-list &optional highestp