X-Git-Url: http://git.hcoop.net/bpt/emacs.git/blobdiff_plain/bc039a3b7dee37fa86932a54af083b2c7ac37fd3..acaf905b1130aae80fa59d2c861ffd4c8eb75486:/doc/lispref/nonascii.texi

diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index a3f25af471..19c7298117 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998-1999, 2001-2011  Free Software Foundation, Inc.
+@c Copyright (C) 1998-1999, 2001-2012  Free Software Foundation, Inc.
 @c See the file elisp.texi for copying conditions.
 @setfilename ../../info/characters
 @node Non-ASCII Characters, Searching and Matching, Text, Top
@@ -201,7 +201,7 @@ characters.
 @defun byte-to-string byte
 @cindex byte to string
 This function returns a unibyte string containing a single byte of
-character data, @var{character}.  It signals a error if
+character data, @var{character}.  It signals an error if
 @var{character} is not an integer between 0 and 255.
 @end defun
 
@@ -369,29 +369,44 @@ replacing each @samp{_} character with a dash @samp{-}.  For example,
 @code{canonical-combining-class}.  However, sometimes we shorten the
 names to make their use easier.
 
+@cindex unassigned character codepoints
+  Some codepoints are left @dfn{unassigned} by the
+@acronym{UCD}---they don't correspond to any character.  The Unicode
+Standard defines default values of properties for such codepoints;
+they are mentioned below for each property.
+
   Here is the full list of value types for all the character
 properties that Emacs knows about:
 
 @table @code
 @item name
-This property corresponds to the Unicode @code{Name} property.  The
-value is a string consisting of upper-case Latin letters A to Z,
-digits, spaces, and hyphen @samp{-} characters.
+Corresponds to the @code{Name} Unicode property.  The value is a
+string consisting of upper-case Latin letters A to Z, digits, spaces,
+and hyphen @samp{-} characters.  For unassigned codepoints, the value
+is an empty string.
 
 @cindex unicode general category
 @item general-category
-This property corresponds to the Unicode @code{General_Category}
-property.  The value is a symbol whose name is a 2-letter abbreviation
-of the character's classification.
+Corresponds to the @code{General_Category} Unicode property.  The
+value is a symbol whose name is a 2-letter abbreviation of the
+character's classification.  For unassigned codepoints, the value
+is @code{Cn}.
 
 @item canonical-combining-class
-Corresponds to the Unicode @code{Canonical_Combining_Class} property.
-The value is an integer number.
+Corresponds to the @code{Canonical_Combining_Class} Unicode property.
+The value is an integer number.  For unassigned codepoints, the value
+is zero.
 
+@cindex bidirectional class of characters
 @item bidi-class
 Corresponds to the Unicode @code{Bidi_Class} property.  The value is a
 symbol whose name is the Unicode @dfn{directional type} of the
-character.
+character.  Emacs uses this property when it reorders bidirectional
+text for display (@pxref{Bidirectional Display}).  For unassigned
+codepoints, the value depends on the code blocks to which the
+codepoint belongs: most unassigned codepoints get the value of
+@code{L} (strong L), but some get values of @code{AL} (Arabic letter)
+or @code{R} (strong R).
 
 @item decomposition
 Corresponds to the Unicode @code{Decomposition_Type} and
@@ -403,19 +418,22 @@ Note that the Unicode spec writes these tag names inside
 brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
 @samp{small}.
 }; the other elements are characters that give the compatibility
-decomposition sequence of this character.
+decomposition sequence of this character.  For unassigned codepoints,
+the value is the character itself.
 
 @item decimal-digit-value
 Corresponds to the Unicode @code{Numeric_Value} property for
 characters whose @code{Numeric_Type} is @samp{Digit}.  The value is an
-integer number.
+integer number.  For unassigned codepoints, the value is @code{nil},
+which means @acronym{NaN}, or ``not-a-number''.
 
 @item digit-value
 Corresponds to the Unicode @code{Numeric_Value} property for
 characters whose @code{Numeric_Type} is @samp{Decimal}.  The value is
 an integer number.  Examples of such characters include compatibility
 subscript and superscript digits, for which the value is the
-corresponding number.
+corresponding number.  For unassigned codepoints, the value is
+@code{nil}, which means @acronym{NaN}.
 
 @item numeric-value
 Corresponds to the Unicode @code{Numeric_Value} property for
@@ -424,33 +442,53 @@ this property is an integer or a floating-point number.  Examples of
 characters that have this property include fractions, subscripts,
 superscripts, Roman numerals, currency numerators, and encircled
 numbers.  For example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}.
+@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}.  For
+unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}.
 
+@cindex mirroring of characters
 @item mirrored
 Corresponds to the Unicode @code{Bidi_Mirrored} property.  The value
-of this property is a symbol, either @code{Y} or @code{N}.
+of this property is a symbol, either @code{Y} or @code{N}.  For
+unassigned codepoints, the value is @code{N}.
+
+@item mirroring
+Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property.  The
+value of this property is a character whose glyph represents the
+mirror image of the character's glyph, or @code{nil} if there's no
+defined mirroring glyph.  All the characters whose @code{mirrored}
+property is @code{N} have @code{nil} as their @code{mirroring}
+property; however, some characters whose @code{mirrored} property is
+@code{Y} also have @code{nil} for @code{mirroring}, because no
+appropriate characters exist with mirrored glyphs.  Emacs uses this
+property to display mirror images of characters when appropriate
+(@pxref{Bidirectional Display}).  For unassigned codepoints, the value
+is @code{nil}.
 
 @item old-name
 Corresponds to the Unicode @code{Unicode_1_Name} property.  The value
-is a string.
+is a string.  For unassigned codepoints, the value is an empty string.
 
 @item iso-10646-comment
 Corresponds to the Unicode @code{ISO_Comment} property.  The value is
-a string.
+a string.  For unassigned codepoints, the value is an empty string.
 
 @item uppercase
 Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
-The value of this property is a single character.
+The value of this property is a single character.  For unassigned
+codepoints, the value is @code{nil}, which means the character itself.
 
 @item lowercase
 Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
-The value of this property is a single character.
+The value of this property is a single character.  For unassigned
+codepoints, the value is @code{nil}, which means the character itself.
 
 @item titlecase
 Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
 @dfn{Title case} is a special form of a character used when the first
 character of a word needs to be capitalized.  The value of this
-property is a single character.
+property is a single character.  For unassigned codepoints, the value
+is @code{nil}, which means the character itself.
 @end table
 
 @defun get-char-code-property char propname
@@ -466,15 +504,18 @@ This function returns the value of @var{char}'s @var{propname} property.
      @result{} Nd
 @end group
 @group
-(get-char-code-property ?\u2084 'digit-value) ; subscript 4
+;; subscript 4
+(get-char-code-property ?\u2084 'digit-value)
      @result{} 4
 @end group
 @group
-(get-char-code-property ?\u2155 'numeric-value) ; one fifth
+;; one fifth
+(get-char-code-property ?\u2155 'numeric-value)
      @result{} 0.2
 @end group
 @group
-(get-char-code-property ?\u2163 'numeric-value) ; Roman IV
+;; Roman IV
+(get-char-code-property ?\u2163 'numeric-value)
      @result{} 4
 @end group
 @end example
@@ -1449,11 +1490,11 @@ for decoding (in case @var{operation} does decoding), and
 @var{encoding-system} is the coding system for encoding (in case
 @var{operation} does encoding).
 
-The argument @var{operation} is a symbol, one of @code{write-region},
-@code{start-process}, @code{call-process}, @code{call-process-region},
-@code{insert-file-contents}, or @code{open-network-stream}.  These are
-the names of the Emacs I/O primitives that can do character code and
-eol conversion.
+The argument @var{operation} is a symbol; it should be one of
+@code{write-region}, @code{start-process}, @code{call-process},
+@code{call-process-region}, @code{insert-file-contents}, or
+@code{open-network-stream}.  These are the names of the Emacs I/O
+primitives that can do character code and eol conversion.
 
 The remaining arguments should be the same arguments that might be given
 to the corresponding I/O primitive.  Depending on the primitive, one
@@ -1539,7 +1580,7 @@ decoding functions (@pxref{Explicit Encoding}).
   Sometimes, you need to prefer several coding systems for some
 operation, rather than fix a single one.  Emacs lets you specify a
 priority order for using coding systems.  This ordering affects the
-sorting of lists of coding sysems returned by functions such as
+sorting of lists of coding systems returned by functions such as
 @code{find-coding-systems-region} (@pxref{Lisp and Coding Systems}).
 
 @defun coding-system-priority-list &optional highestp