| 1 | @c -*-texinfo-*- |
| 2 | @c This is part of the GNU Emacs Lisp Reference Manual. |
| 3 | @c Copyright (C) 1990-1995, 1998-1999, 2001-2014 Free Software |
| 4 | @c Foundation, Inc. |
| 5 | @c See the file elisp.texi for copying conditions. |
| 6 | @node Syntax Tables |
| 7 | @chapter Syntax Tables |
| 8 | @cindex parsing buffer text |
| 9 | @cindex syntax table |
| 10 | @cindex text parsing |
| 11 | |
| 12 | A @dfn{syntax table} specifies the syntactic role of each character |
| 13 | in a buffer. It can be used to determine where words, symbols, and |
| 14 | other syntactic constructs begin and end. This information is used by |
| 15 | many Emacs facilities, including Font Lock mode (@pxref{Font Lock |
| 16 | Mode}) and the various complex movement commands (@pxref{Motion}). |
| 17 | |
| 18 | @menu |
| 19 | * Basics: Syntax Basics. Basic concepts of syntax tables. |
| 20 | * Syntax Descriptors:: How characters are classified. |
| 21 | * Syntax Table Functions:: How to create, examine and alter syntax tables. |
| 22 | * Syntax Properties:: Overriding syntax with text properties. |
| 23 | * Motion and Syntax:: Moving over characters with certain syntaxes. |
| 24 | * Parsing Expressions:: Parsing balanced expressions |
| 25 | using the syntax table. |
| 26 | * Syntax Table Internals:: How syntax table information is stored. |
| 27 | * Categories:: Another way of classifying character syntax. |
| 28 | @end menu |
| 29 | |
| 30 | @node Syntax Basics |
| 31 | @section Syntax Table Concepts |
| 32 | |
| 33 | A syntax table is a data structure which can be used to look up the |
| 34 | @dfn{syntax class} and other syntactic properties of each character. |
| 35 | Syntax tables are used by Lisp programs for scanning and moving across |
| 36 | text. |
| 37 | |
| 38 | Internally, a syntax table is a char-table (@pxref{Char-Tables}). |
| 39 | The element at index @var{c} describes the character with code |
| 40 | @var{c}; its value is a cons cell which specifies the syntax of the |
| 41 | character in question. @xref{Syntax Table Internals}, for details. |
| 42 | However, instead of using @code{aset} and @code{aref} to modify and |
| 43 | inspect syntax table contents, you should usually use the higher-level |
| 44 | functions @code{char-syntax} and @code{modify-syntax-entry}, which are |
| 45 | described in @ref{Syntax Table Functions}. |
| 46 | |
| 47 | @defun syntax-table-p object |
| 48 | This function returns @code{t} if @var{object} is a syntax table. |
| 49 | @end defun |
| 50 | |
| 51 | Each buffer has its own major mode, and each major mode has its own |
| 52 | idea of the syntax class of various characters. For example, in Lisp |
| 53 | mode, the character @samp{;} begins a comment, but in C mode, it |
| 54 | terminates a statement. To support these variations, the syntax table |
| 55 | is local to each buffer. Typically, each major mode has its own |
| 56 | syntax table, which it installs in all buffers that use that mode. |
| 57 | For example, the variable @code{emacs-lisp-mode-syntax-table} holds |
| 58 | the syntax table used by Emacs Lisp mode, and |
| 59 | @code{c-mode-syntax-table} holds the syntax table used by C mode. |
| 60 | Changing a major mode's syntax table alters the syntax in all of that |
| 61 | mode's buffers, as well as in any buffers subsequently put in that |
| 62 | mode. Occasionally, several similar modes share one syntax table. |
| 63 | @xref{Example Major Modes}, for an example of how to set up a syntax |
| 64 | table. |
| 65 | |
| 66 | @cindex standard syntax table |
| 67 | @cindex inheritance, syntax table |
| 68 | A syntax table can @dfn{inherit} from another syntax table, which is |
| 69 | called its @dfn{parent syntax table}. A syntax table can leave the |
| 70 | syntax class of some characters unspecified, by giving them the |
| 71 | ``inherit'' syntax class; such a character then acquires the syntax |
| 72 | class specified by the parent syntax table (@pxref{Syntax Class |
| 73 | Table}). Emacs defines a @dfn{standard syntax table}, which is the |
| 74 | default parent syntax table, and is also the syntax table used by |
| 75 | Fundamental mode. |
| 76 | |
| 77 | @defun standard-syntax-table |
| 78 | This function returns the standard syntax table, which is the syntax |
| 79 | table used in Fundamental mode. |
| 80 | @end defun |
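
As a small illustration of inheritance (a sketch; the variable name
@code{table} is ours, and @code{make-syntax-table} is described in
@ref{Syntax Table Functions}), a freshly created table should be a
syntax table whose parent is the standard syntax table:

@example
@group
(let ((table (make-syntax-table)))
  (list (syntax-table-p table)
        (eq (char-table-parent table) (standard-syntax-table))))
     @result{} (t t)
@end group
@end example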
| 81 | |
| 82 | Syntax tables are not used by the Emacs Lisp reader; it has its own |
| 83 | built-in syntactic rules, which cannot be changed. (Some Lisp |
| 84 | systems provide ways to redefine the read syntax, but we decided to |
| 85 | leave this feature out of Emacs Lisp for simplicity.) |
| 86 | |
| 87 | @node Syntax Descriptors |
| 88 | @section Syntax Descriptors |
| 89 | @cindex syntax class |
| 90 | |
| 91 | The @dfn{syntax class} of a character describes its syntactic role. |
| 92 | Each syntax table specifies the syntax class of each character. There |
| 93 | is no necessary relationship between the class of a character in one |
| 94 | syntax table and its class in any other table. |
| 95 | |
| 96 | Each syntax class is designated by a mnemonic character, which |
| 97 | serves as the name of the class when you need to specify a class. |
| 98 | Usually, this designator character is one that is often assigned that |
| 99 | class; however, its meaning as a designator is unvarying and |
| 100 | independent of what syntax that character currently has. Thus, |
| 101 | @samp{\} as a designator character always means ``escape character'' |
| 102 | syntax, regardless of whether the @samp{\} character actually has that |
| 103 | syntax in the current syntax table. |
| 104 | @ifnottex |
| 105 | @xref{Syntax Class Table}, for a list of syntax classes and their |
| 106 | designator characters. |
| 107 | @end ifnottex |
| 108 | |
| 109 | @cindex syntax descriptor |
| 110 | A @dfn{syntax descriptor} is a Lisp string that describes the syntax |
| 111 | class and other syntactic properties of a character. To modify the |
| 112 | syntax of a character, call the function |
| 113 | @code{modify-syntax-entry} and pass a syntax descriptor as one of |
| 114 | its arguments (@pxref{Syntax Table Functions}). |
| 115 | |
| 116 | The first character in a syntax descriptor must be a syntax class |
| 117 | designator character. The second character, if present, specifies a |
| 118 | matching character (e.g., in Lisp, the matching character for |
| 119 | @samp{(} is @samp{)}); a space specifies that there is no matching |
| 120 | character. Then come characters specifying additional syntax |
| 121 | properties (@pxref{Syntax Flags}). |
| 122 | |
| 123 | If no matching character or flags are needed, only one character |
| 124 | (specifying the syntax class) is sufficient. |
| 125 | |
| 126 | For example, the syntax descriptor for the character @samp{*} in C |
| 127 | mode is @code{". 23"} (i.e., punctuation, matching character slot |
| 128 | unused, second character of a comment-starter, first character of a |
| 129 | comment-ender), and the entry for @samp{/} is @code{". 14"} (i.e., |
| 130 | punctuation, matching character slot unused, first character of a |
| 131 | comment-starter, second character of a comment-ender). |
| 132 | |
| 133 | Emacs also defines @dfn{raw syntax descriptors}, which are used to |
| 134 | describe syntax classes at a lower level. @xref{Syntax Table |
| 135 | Internals}. |
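
One way to see how a descriptor is decoded is to pass it to
@code{string-to-syntax}, which returns the corresponding raw syntax
descriptor. The following sketch assumes the numeric codes used for
raw descriptors: 1 for punctuation and 4 for open parenthesis, with the
class stored in the low 16 bits of the @sc{car}; 41 is the character
code of @samp{)}.

@example
@group
;; @r{Class plus matching character.}
(string-to-syntax "()")
     @result{} (4 . 41)
@end group

@group
;; @r{Mask off the flag bits to recover the class of @code{". 23"}.}
(logand (car (string-to-syntax ". 23")) 65535)
     @result{} 1
@end group
@end example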
| 136 | |
| 137 | @menu |
| 138 | * Syntax Class Table:: Table of syntax classes. |
| 139 | * Syntax Flags:: Additional flags each character can have. |
| 140 | @end menu |
| 141 | |
| 142 | @node Syntax Class Table |
| 143 | @subsection Table of Syntax Classes |
| 144 | @cindex syntax class table |
| 145 | |
| 146 | Here is a table of syntax classes, the characters that designate |
| 147 | them, their meanings, and examples of their use. |
| 148 | |
| 149 | @table @asis |
| 150 | @item Whitespace characters: @samp{@ } or @samp{-} |
| 151 | Characters that separate symbols and words from each other. |
| 152 | Typically, whitespace characters have no other syntactic significance, |
| 153 | and multiple whitespace characters are syntactically equivalent to a |
| 154 | single one. Space, tab, and formfeed are classified as whitespace in |
| 155 | almost all major modes. |
| 156 | |
| 157 | This syntax class can be designated by either @w{@samp{@ }} or |
| 158 | @samp{-}. Both designators are equivalent. |
| 159 | |
| 160 | @item Word constituents: @samp{w} |
| 161 | Parts of words in human languages. These are typically used in |
| 162 | variable and command names in programs. All upper- and lower-case |
| 163 | letters, and the digits, are typically word constituents. |
| 164 | |
| 165 | @item Symbol constituents: @samp{_} |
| 166 | Extra characters used in variable and command names along with word |
| 167 | constituents. Examples include the characters @samp{$&*+-_<>} in Lisp |
| 168 | mode, which may be part of a symbol name even though they are not part |
| 169 | of English words. In standard C, the only non-word-constituent |
| 170 | character that is valid in symbols is underscore (@samp{_}). |
| 171 | |
| 172 | @item Punctuation characters: @samp{.} |
| 173 | Characters used as punctuation in a human language, or used in a |
| 174 | programming language to separate symbols from one another. Some |
| 175 | programming language modes, such as Emacs Lisp mode, have no |
| 176 | characters in this class since the few characters that are not symbol |
| 177 | or word constituents all have other uses. Other programming language |
| 178 | modes, such as C mode, use punctuation syntax for operators. |
| 179 | |
| 180 | @item Open parenthesis characters: @samp{(} |
| 181 | @itemx Close parenthesis characters: @samp{)} |
| 182 | Characters used in dissimilar pairs to surround sentences or |
| 183 | expressions. Such a grouping is begun with an open parenthesis |
| 184 | character and terminated with a close. Each open parenthesis |
| 185 | character matches a particular close parenthesis character, and vice |
| 186 | versa. Normally, Emacs momentarily indicates the matching open |
| 187 | parenthesis when you insert a close parenthesis. @xref{Blinking}. |
| 188 | |
| 189 | In human languages, and in C code, the parenthesis pairs are |
| 190 | @samp{()}, @samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters |
| 191 | for lists and vectors (@samp{()} and @samp{[]}) are classified as |
| 192 | parenthesis characters. |
| 193 | |
| 194 | @item String quotes: @samp{"} |
| 195 | Characters used to delimit string constants. The same string quote |
| 196 | character appears at the beginning and the end of a string. Such |
| 197 | quoted strings do not nest. |
| 198 | |
| 199 | The parsing facilities of Emacs consider a string as a single token. |
| 200 | The usual syntactic meanings of the characters in the string are |
| 201 | suppressed. |
| 202 | |
| 203 | The Lisp modes have two string quote characters: double-quote (@samp{"}) |
| 204 | and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it |
| 205 | is used in Common Lisp. C also has two string quote characters: |
| 206 | double-quote for strings, and single-quote (@samp{'}) for character |
| 207 | constants. |
| 208 | |
| 209 | Human text has no string quote characters. We do not want quotation |
| 210 | marks to turn off the usual syntactic properties of other characters |
| 211 | in the quotation. |
| 212 | |
| 213 | @item Escape-syntax characters: @samp{\} |
| 214 | Characters that start an escape sequence, such as those used in string |
| 215 | and character constants. The character @samp{\} belongs to this class |
| 216 | in both C and Lisp. (In C, it is used thus only inside strings, but |
| 217 | it turns out to cause no trouble to treat it this way throughout C |
| 218 | code.) |
| 219 | |
| 220 | Characters in this class count as part of words if |
| 221 | @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}. |
| 222 | |
| 223 | @item Character quotes: @samp{/} |
| 224 | Characters used to quote the following character so that it loses its |
| 225 | normal syntactic meaning. This differs from an escape character in |
| 226 | that only the character immediately following is ever affected. |
| 227 | |
| 228 | Characters in this class count as part of words if |
| 229 | @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}. |
| 230 | |
| 231 | This class is used for backslash in @TeX{} mode. |
| 232 | |
| 233 | @item Paired delimiters: @samp{$} |
| 234 | Similar to string quote characters, except that the syntactic |
| 235 | properties of the characters between the delimiters are not |
| 236 | suppressed. Only @TeX{} mode uses a paired delimiter presently---the |
| 237 | @samp{$} that both enters and leaves math mode. |
| 238 | |
| 239 | @item Expression prefixes: @samp{'} |
| 240 | Characters used for syntactic operators that are considered as part of |
| 241 | an expression if they appear next to one. In Lisp modes, these |
| 242 | characters include the apostrophe, @samp{'} (used for quoting), the |
| 243 | comma, @samp{,} (used in macros), and @samp{#} (used in the read |
| 244 | syntax for certain data types). |
| 245 | |
| 246 | @item Comment starters: @samp{<} |
| 247 | @itemx Comment enders: @samp{>} |
| 248 | @cindex comment syntax |
| 249 | Characters used in various languages to delimit comments. Human text |
| 250 | has no comment characters. In Lisp, the semicolon (@samp{;}) starts a |
| 251 | comment and a newline or formfeed ends one. |
| 252 | |
| 253 | @item Inherit standard syntax: @samp{@@} |
| 254 | This syntax class does not specify a particular syntax. It says to |
| 255 | look in the parent syntax table (by default, the standard syntax |
| 256 | table) to find the syntax of this character. |
| 257 | |
| 258 | @item Generic comment delimiters: @samp{!} |
| 259 | Characters that start or end a special kind of comment. @emph{Any} |
| 260 | generic comment delimiter matches @emph{any} generic comment |
| 261 | delimiter, but they cannot match a comment starter or comment ender; |
| 262 | generic comment delimiters can only match each other. |
| 263 | |
| 264 | This syntax class is primarily meant for use with the |
| 265 | @code{syntax-table} text property (@pxref{Syntax Properties}). You |
| 266 | can mark any range of characters as forming a comment, by giving the |
| 267 | first and last characters of the range @code{syntax-table} properties |
| 268 | identifying them as generic comment delimiters. |
| 269 | |
| 270 | @item Generic string delimiters: @samp{|} |
| 271 | Characters that start or end a string. This class differs from the |
| 272 | string quote class in that @emph{any} generic string delimiter can |
| 273 | match any other generic string delimiter; but they do not match |
| 274 | ordinary string quote characters. |
| 275 | |
| 276 | This syntax class is primarily meant for use with the |
| 277 | @code{syntax-table} text property (@pxref{Syntax Properties}). You |
| 278 | can mark any range of characters as forming a string constant, by |
| 279 | giving the first and last characters of the range @code{syntax-table} |
| 280 | properties identifying them as generic string delimiters. |
| 281 | @end table |
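
For instance, the following sketch asks for the class of a few
characters in the standard syntax table, using @code{char-syntax} and
@code{with-syntax-table} (@pxref{Syntax Table Functions}). The choice
of sample characters is ours; each result is the designator character
of the class, shown as a string.

@example
@group
(with-syntax-table (standard-syntax-table)
  (mapcar (lambda (c) (string (char-syntax c)))
          '(?a ?\s ?_ ?\( ?\" ?\\)))
     @result{} ("w" " " "_" "(" "\"" "\\")
@end group
@end example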
| 282 | |
| 283 | @node Syntax Flags |
| 284 | @subsection Syntax Flags |
| 285 | @cindex syntax flags |
| 286 | |
| 287 | In addition to the classes, entries for characters in a syntax table |
| 288 | can specify flags. There are eight possible flags, represented by the |
| 289 | characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c}, |
| 290 | @samp{n}, and @samp{p}. |
| 291 | |
| 292 | All the flags except @samp{p} are used to describe comment |
| 293 | delimiters. The digit flags are used for comment delimiters made up |
| 294 | of two characters. They indicate that a character can @emph{also} be |
| 295 | part of a comment sequence, in addition to the syntactic properties |
| 296 | associated with its character class. The flags are independent of the |
| 297 | class and each other for the sake of characters such as @samp{*} in |
| 298 | C mode, which is a punctuation character, @emph{and} the second |
| 299 | character of a start-of-comment sequence (@samp{/*}), @emph{and} the |
| 300 | first character of an end-of-comment sequence (@samp{*/}). The flags |
| 301 | @samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding |
| 302 | comment delimiter. |
| 303 | |
| 304 | Here is a table of the possible flags for a character @var{c}, |
| 305 | and what they mean: |
| 306 | |
| 307 | @itemize @bullet |
| 308 | @item |
| 309 | @samp{1} means @var{c} is the start of a two-character comment-start |
| 310 | sequence. |
| 311 | |
| 312 | @item |
| 313 | @samp{2} means @var{c} is the second character of such a sequence. |
| 314 | |
| 315 | @item |
| 316 | @samp{3} means @var{c} is the start of a two-character comment-end |
| 317 | sequence. |
| 318 | |
| 319 | @item |
| 320 | @samp{4} means @var{c} is the second character of such a sequence. |
| 321 | |
| 322 | @item |
| 323 | @samp{b} means that @var{c} as a comment delimiter belongs to the |
| 324 | alternative ``b'' comment style. For a two-character comment starter, |
| 325 | this flag is only significant on the second character, and for a |
| 326 | two-character comment ender it is only significant on the first character. |
| 327 | |
| 328 | @item |
| 329 | @samp{c} means that @var{c} as a comment delimiter belongs to the |
| 330 | alternative ``c'' comment style. For a two-character comment |
| 331 | delimiter, @samp{c} on either character makes it of style ``c''. |
| 332 | |
| 333 | @item |
| 334 | @samp{n} on a comment delimiter character specifies |
| 335 | that this kind of comment can be nested. For a two-character |
| 336 | comment delimiter, @samp{n} on either character makes it |
| 337 | nestable. |
| 338 | |
| 339 | @cindex comment style |
| 340 | Emacs supports several comment styles simultaneously in any one syntax |
| 341 | table. A comment style is a set of flags @samp{b}, @samp{c}, and |
| 342 | @samp{n}, so there can be up to 8 different comment styles. |
| 343 | Each comment delimiter has a style and only matches comment delimiters |
| 344 | of the same style. Thus if a comment starts with the comment-start |
| 345 | sequence of style ``bn'', it will extend until the next matching |
| 346 | comment-end sequence of style ``bn''. |
| 347 | |
| 348 | The appropriate comment syntax settings for C++ can be as follows: |
| 349 | |
| 350 | @table @asis |
| 351 | @item @samp{/} |
| 352 | @samp{124} |
| 353 | @item @samp{*} |
| 354 | @samp{23b} |
| 355 | @item newline |
| 356 | @samp{>} |
| 357 | @end table |
| 358 | |
| 359 | This defines four comment-delimiting sequences: |
| 360 | |
| 361 | @table @asis |
| 362 | @item @samp{/*} |
| 363 | This is a comment-start sequence for ``b'' style because the |
| 364 | second character, @samp{*}, has the @samp{b} flag. |
| 365 | |
| 366 | @item @samp{//} |
| 367 | This is a comment-start sequence for ``a'' style because the second |
| 368 | character, @samp{/}, does not have the @samp{b} flag. |
| 369 | |
| 370 | @item @samp{*/} |
| 371 | This is a comment-end sequence for ``b'' style because the first |
| 372 | character, @samp{*}, has the @samp{b} flag. |
| 373 | |
| 374 | @item newline |
| 375 | This is a comment-end sequence for ``a'' style, because the newline |
| 376 | character does not have the @samp{b} flag. |
| 377 | @end table |
| 378 | |
| 379 | @item |
| 380 | @samp{p} identifies an additional ``prefix character'' for Lisp syntax. |
| 381 | These characters are treated as whitespace when they appear between |
| 382 | expressions. When they appear within an expression, they are handled |
| 383 | according to their usual syntax classes. |
| 384 | |
| 385 | The function @code{backward-prefix-chars} moves back over these |
| 386 | characters, as well as over characters whose primary syntax class is |
| 387 | prefix (@samp{'}). @xref{Motion and Syntax}. |
| 388 | @end itemize |
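
The C++ settings shown above can be tried out in a scratch syntax
table. This is only a sketch: it assumes punctuation (@samp{.}) as the
class of @samp{/} and @samp{*}, as in the C mode descriptors mentioned
earlier, and it uses @code{parse-partial-sexp} (@pxref{Low-Level
Parsing}), whose result has a non-@code{nil} element 4 when the scan
stops inside a comment. The sample text and variable names are ours.

@example
@group
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?/ ". 124" table)
  (modify-syntax-entry ?* ". 23b" table)
  (modify-syntax-entry ?\n ">" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "a // b\nc /* d */ e")
    ;; @r{For each letter, report whether it lies inside a comment.}
    (mapcar (lambda (word)
              (goto-char (point-min))
              (search-forward word)
              (and (nth 4 (parse-partial-sexp (point-min) (point)))
                   t))
            '("a" "b" "c" "d" "e"))))
     @result{} (nil t nil t nil)
@end group
@end example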
| 389 | |
| 390 | @node Syntax Table Functions |
| 391 | @section Syntax Table Functions |
| 392 | |
| 393 | In this section we describe functions for creating, accessing and |
| 394 | altering syntax tables. |
| 395 | |
| 396 | @defun make-syntax-table &optional table |
| 397 | This function creates a new syntax table. If @var{table} is |
| 398 | non-@code{nil}, the parent of the new syntax table is @var{table}; |
| 399 | otherwise, the parent is the standard syntax table. |
| 400 | |
| 401 | In the new syntax table, all characters are initially given the |
| 402 | ``inherit'' (@samp{@@}) syntax class, i.e., their syntax is inherited |
| 403 | from the parent table (@pxref{Syntax Class Table}). |
| 404 | @end defun |
| 405 | |
| 406 | @defun copy-syntax-table &optional table |
| 407 | This function constructs a copy of @var{table} and returns it. If |
| 408 | @var{table} is omitted or @code{nil}, it returns a copy of the |
| 409 | standard syntax table. Otherwise, an error is signaled if @var{table} |
| 410 | is not a syntax table. |
| 411 | @end defun |
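
The difference between inheriting from a table and copying it can be
sketched as follows. The variable names are ours. A child made with
@code{make-syntax-table} sees later changes to its parent, whereas a
copy keeps the entries it had when it was made.

@example
@group
(let* ((parent (make-syntax-table))
       (child (make-syntax-table parent))
       copy)
  (modify-syntax-entry ?$ "." parent)
  (setq copy (copy-syntax-table parent))
  ;; @r{Change the parent after the copy was taken.}
  (modify-syntax-entry ?$ "_" parent)
  (list (with-syntax-table child (string (char-syntax ?$)))
        (with-syntax-table copy (string (char-syntax ?$)))))
     @result{} ("_" ".")
@end group
@end example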
| 412 | |
| 413 | @deffn Command modify-syntax-entry char syntax-descriptor &optional table |
| 414 | @cindex syntax entry, setting |
| 415 | This function sets the syntax entry for @var{char} according to |
| 416 | @var{syntax-descriptor}. @var{char} must be a character, or a cons |
| 417 | cell of the form @code{(@var{min} . @var{max})}; in the latter case, |
| 418 | the function sets the syntax entries for all characters in the range |
| 419 | between @var{min} and @var{max}, inclusive. |
| 420 | |
| 421 | The syntax is changed only for @var{table}, which defaults to the |
| 422 | current buffer's syntax table, and not in any other syntax table. |
| 423 | |
| 424 | The argument @var{syntax-descriptor} is a syntax descriptor, i.e., a |
| 425 | string whose first character is a syntax class designator and whose |
| 426 | second and subsequent characters optionally specify a matching |
| 427 | character and syntax flags. @xref{Syntax Descriptors}. An error is |
| 428 | signaled if @var{syntax-descriptor} is not a valid syntax descriptor. |
| 429 | |
| 430 | This function always returns @code{nil}. The old syntax information in |
| 431 | the table for this character is discarded. |
| 432 | |
| 433 | @example |
| 434 | @group |
| 435 | @exdent @r{Examples:} |
| 436 | |
| 437 | ;; @r{Put the space character in class whitespace.} |
| 438 | (modify-syntax-entry ?\s " ") |
| 439 | @result{} nil |
| 440 | @end group |
| 441 | |
| 442 | @group |
| 443 | ;; @r{Make @samp{$} an open parenthesis character,} |
| 444 | ;; @r{with @samp{^} as its matching close.} |
| 445 | (modify-syntax-entry ?$ "(^") |
| 446 | @result{} nil |
| 447 | @end group |
| 448 | |
| 449 | @group |
| 450 | ;; @r{Make @samp{^} a close parenthesis character,} |
| 451 | ;; @r{with @samp{$} as its matching open.} |
| 452 | (modify-syntax-entry ?^ ")$") |
| 453 | @result{} nil |
| 454 | @end group |
| 455 | |
| 456 | @group |
| 457 | ;; @r{Make @samp{/} a punctuation character,} |
| 458 | ;; @r{the first character of a start-comment sequence,} |
| 459 | ;; @r{and the second character of an end-comment sequence.} |
| 460 | ;; @r{This is used in C mode.} |
| 461 | (modify-syntax-entry ?/ ". 14") |
| 462 | @result{} nil |
| 463 | @end group |
| 464 | @end example |
| 465 | @end deffn |
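
The range form of @var{char} can be illustrated with a scratch table
(a sketch; the table and the choice of digits are ours). Passing
@var{table} explicitly leaves the current buffer's syntax table
untouched.

@example
@group
(let ((table (make-syntax-table)))
  ;; @r{Make every digit a symbol constituent in @code{table} only.}
  (modify-syntax-entry '(?0 . ?9) "_" table)
  (with-syntax-table table
    (string (char-syntax ?0) (char-syntax ?5) (char-syntax ?9))))
     @result{} "___"
@end group
@end example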
| 466 | |
| 467 | @defun char-syntax character |
| 468 | This function returns the syntax class of @var{character}, represented |
| 469 | by its designator character (@pxref{Syntax Class Table}). This |
| 470 | returns @emph{only} the class, not its matching character or syntax |
| 471 | flags. |
| 472 | |
| 473 | The following examples apply to C mode. (We use @code{string} to make |
| 474 | it easier to see the character returned by @code{char-syntax}.) |
| 475 | |
| 476 | @example |
| 477 | @group |
| 478 | ;; Space characters have whitespace syntax class. |
| 479 | (string (char-syntax ?\s)) |
| 480 | @result{} " " |
| 481 | @end group |
| 482 | |
| 483 | @group |
| 484 | ;; Forward slash characters have punctuation syntax. |
| 485 | ;; Note that this @code{char-syntax} call does not reveal |
| 486 | ;; that it is also part of comment-start and -end sequences. |
| 487 | (string (char-syntax ?/)) |
| 488 | @result{} "." |
| 489 | @end group |
| 490 | |
| 491 | @group |
| 492 | ;; Open parenthesis characters have open parenthesis syntax. |
| 493 | ;; Note that this @code{char-syntax} call does not reveal that |
| 494 | ;; it has a matching character, @samp{)}. |
| 495 | (string (char-syntax ?\()) |
| 496 | @result{} "(" |
| 497 | @end group |
| 498 | @end example |
| 499 | |
| 500 | @end defun |
| 501 | |
| 502 | @defun set-syntax-table table |
| 503 | This function makes @var{table} the syntax table for the current buffer. |
| 504 | It returns @var{table}. |
| 505 | @end defun |
| 506 | |
| 507 | @defun syntax-table |
| 508 | This function returns the current syntax table, which is the table for |
| 509 | the current buffer. |
| 510 | @end defun |
| 511 | |
| 512 | @deffn Command describe-syntax &optional buffer |
| 513 | This command displays the contents of the syntax table of |
| 514 | @var{buffer} (by default, the current buffer) in a help buffer. |
| 515 | @end deffn |
| 516 | |
| 517 | @defmac with-syntax-table table body@dots{} |
| 518 | This macro executes @var{body} using @var{table} as the current syntax |
| 519 | table. It returns the value of the last form in @var{body}, after |
| 520 | restoring the old current syntax table. |
| 521 | |
| 522 | Since each buffer has its own current syntax table, we should make that |
| 523 | more precise: @code{with-syntax-table} temporarily alters the current |
| 524 | syntax table of whichever buffer is current at the time the macro |
| 525 | execution starts. Other buffers are not affected. |
| 526 | @end defmac |
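
As an illustration, the following sketch moves over a word twice in a
temporary buffer, once with the buffer's own table (assumed here to be
the standard syntax table, as in Fundamental mode) and once with a
scratch table in which @samp{-} is a word constituent. The last
element checks that the scratch table is no longer current afterwards.
The sample text and names are ours.

@example
@group
(with-temp-buffer
  (insert "foo-bar baz")
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- "w" table)
    (list (progn (goto-char (point-min))
                 (forward-word 1)
                 (point))
          (with-syntax-table table
            (goto-char (point-min))
            (forward-word 1)
            (point))
          (eq (syntax-table) table))))
     @result{} (4 8 nil)
@end group
@end example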
| 527 | |
| 528 | @node Syntax Properties |
| 529 | @section Syntax Properties |
| 530 | @kindex syntax-table @r{(text property)} |
| 531 | |
| 532 | When the syntax table is not flexible enough to specify the syntax of |
| 533 | a language, you can override the syntax table for specific character |
| 534 | occurrences in the buffer, by applying a @code{syntax-table} text |
| 535 | property. @xref{Text Properties}, for how to apply text properties. |
| 536 | |
| 537 | The valid values of the @code{syntax-table} text property are: |
| 538 | |
| 539 | @table @asis |
| 540 | @item @var{syntax-table} |
| 541 | If the property value is a syntax table, that table is used instead of |
| 542 | the current buffer's syntax table to determine the syntax for the |
| 543 | underlying text character. |
| 544 | |
| 545 | @item @code{(@var{syntax-code} . @var{matching-char})} |
| 546 | A cons cell of this format is a raw syntax descriptor (@pxref{Syntax |
| 547 | Table Internals}), which directly specifies a syntax class for the |
| 548 | underlying text character. |
| 549 | |
| 550 | @item @code{nil} |
| 551 | If the property is @code{nil}, the character's syntax is determined from |
| 552 | the current syntax table in the usual way. |
| 553 | @end table |
| 554 | |
| 555 | @defvar parse-sexp-lookup-properties |
| 556 | If this is non-@code{nil}, the syntax scanning functions, like |
| 557 | @code{forward-sexp}, pay attention to syntax text properties. |
| 558 | Otherwise they use only the current syntax table. |
| 559 | @end defvar |
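
The effect of this variable can be sketched with @code{scan-sexps}
(@pxref{Motion via Parsing}). In the example, the first @samp{)} is
given punctuation syntax through a raw syntax descriptor (we assume
code 1 for punctuation), so it closes the list only when the property
is ignored. The sample text is ours.

@example
@group
(with-temp-buffer
  (insert "(a ) b)")
  ;; @r{Position 4 holds the first close parenthesis.}
  (put-text-property 4 5 'syntax-table '(1))
  (list (let ((parse-sexp-lookup-properties t))
          (scan-sexps 1 1))
        (let ((parse-sexp-lookup-properties nil))
          (scan-sexps 1 1))))
     @result{} (8 5)
@end group
@end example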
| 560 | |
| 561 | @defvar syntax-propertize-function |
| 562 | This variable, if non-@code{nil}, should store a function for applying |
| 563 | @code{syntax-table} properties to a specified stretch of text. It is |
| 564 | intended to be used by major modes to install a function which applies |
| 565 | @code{syntax-table} properties in some mode-appropriate way. |
| 566 | |
| 567 | The function is called by @code{syntax-ppss} (@pxref{Position Parse}), |
| 568 | and by Font Lock mode during syntactic fontification (@pxref{Syntactic |
| 569 | Font Lock}). It is called with two arguments, @var{start} and |
| 570 | @var{end}, which are the starting and ending positions of the text on |
| 571 | which it should act. It is allowed to call @code{syntax-ppss} on any |
| 572 | position before @var{end}. However, it should not call |
| 573 | @code{syntax-ppss-flush-cache}; so, it is not allowed to call |
| 574 | @code{syntax-ppss} on some position and later modify the buffer at an |
| 575 | earlier position. |
| 576 | @end defvar |
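
A minimal sketch of such a function follows. The function
@code{my-demo-propertize-bars} is hypothetical, not part of Emacs; it
marks every @samp{|} in the given stretch as a generic string
delimiter. The sketch assumes that @code{syntax-propertize} is
available to trigger the function by hand, which a major mode would
not normally need to do.

@example
@group
(defun my-demo-propertize-bars (start end)
  "Mark each `|' between START and END as a generic string delimiter."
  (goto-char start)
  (while (search-forward "|" end t)
    (put-text-property (1- (point)) (point)
                       'syntax-table (string-to-syntax "|"))))
@end group

@group
(with-temp-buffer
  (setq-local syntax-propertize-function
              #'my-demo-propertize-bars)
  (insert "a |b c| d")
  ;; @r{Apply the properties up to the end of the buffer.}
  (syntax-propertize (point-max))
  (get-text-property 3 'syntax-table))
     @result{} (15)
@end group
@end example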
| 577 | |
| 578 | @defvar syntax-propertize-extend-region-functions |
| 579 | This abnormal hook is run by the syntax parsing code prior to calling |
| 580 | @code{syntax-propertize-function}. Its role is to help locate safe |
| 581 | starting and ending buffer positions for passing to |
| 582 | @code{syntax-propertize-function}. For example, a major mode can add |
| 583 | a function to this hook to identify multi-line syntactic constructs, |
| 584 | and ensure that the boundaries do not fall in the middle of one. |
| 585 | |
| 586 | Each function in this hook should accept two arguments, @var{start} |
| 587 | and @var{end}. It should return either a cons cell of two adjusted |
| 588 | buffer positions, @code{(@var{new-start} . @var{new-end})}, or |
| 589 | @code{nil} if no adjustment is necessary. The hook functions are run |
| 590 | in turn, repeatedly, until they all return @code{nil}. |
| 591 | @end defvar |
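
As a sketch of the calling convention, the hypothetical function
below (not part of Emacs) moves @var{start} back to the nearest
preceding blank line, or to the beginning of the buffer. It returns
@code{nil} once nothing is left to adjust, so that the repeated runs
of the hook terminate.

@example
@group
(defun my-demo-extend-to-blank-line (start end)
  "Return (NEW-START . END) reaching back to a blank line, or nil."
  (save-excursion
    (goto-char start)
    (forward-line 0)
    (while (and (not (bobp))
                (not (looking-at-p "[ \t]*$")))
      (forward-line -1))
    (when (< (point) start)
      (cons (point) end))))
@end group

@group
(with-temp-buffer
  (insert "a\n\nb\nc\n")
  (list (my-demo-extend-to-blank-line 6 7)
        (my-demo-extend-to-blank-line 3 7)))
     @result{} ((3 . 7) nil)
@end group
@end example

A major mode could install such a function buffer-locally with
@code{add-hook}, passing a non-@code{nil} @var{local} argument.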
| 592 | |
| 593 | @node Motion and Syntax |
| 594 | @section Motion and Syntax |
| 595 | |
| 596 | This section describes functions for moving across characters that |
| 597 | have certain syntax classes. |
| 598 | |
| 599 | @defun skip-syntax-forward syntaxes &optional limit |
| 600 | This function moves point forward across characters having syntax |
| 601 | classes mentioned in @var{syntaxes} (a string of syntax class |
| 602 | characters). It stops when it encounters the end of the buffer, or |
| 603 | position @var{limit} (if specified), or a character it is not supposed |
| 604 | to skip. |
| 605 | |
| 606 | If @var{syntaxes} starts with @samp{^}, then the function skips |
| 607 | characters whose syntax is @emph{not} in @var{syntaxes}. |
| 608 | |
| 609 | The return value is the distance traveled, which is a nonnegative |
| 610 | integer. |
| 611 | @end defun |
| 612 | |
| 613 | @defun skip-syntax-backward syntaxes &optional limit |
| 614 | This function moves point backward across characters whose syntax |
| 615 | classes are mentioned in @var{syntaxes}. It stops when it encounters |
| 616 | the beginning of the buffer, or position @var{limit} (if specified), or |
| 617 | a character it is not supposed to skip. |
| 618 | |
| 619 | If @var{syntaxes} starts with @samp{^}, then the function skips |
| 620 | characters whose syntax is @emph{not} in @var{syntaxes}. |
| 621 | |
| 622 | The return value indicates the distance traveled. It is an integer that |
| 623 | is zero or less. |
| 624 | @end defun |
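
The return values of both functions can be seen in the following
sketch, which assumes a temporary buffer using the standard syntax
table (where @samp{_} is a symbol constituent). The sample text is
ours.

@example
@group
(with-temp-buffer
  (insert "foo_bar  baz")
  (goto-char (point-min))
  (list (skip-syntax-forward "w_")     ; @r{over @samp{foo_bar}}
        (skip-syntax-forward " ")      ; @r{over the two spaces}
        (progn (goto-char (point-max))
               ;; @r{Backward over everything but whitespace.}
               (skip-syntax-backward "^ "))))
     @result{} (7 2 -3)
@end group
@end example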
| 625 | |
| 626 | @defun backward-prefix-chars |
| 627 | This function moves point backward over any number of characters with |
| 628 | expression prefix syntax. This includes both characters in the |
| 629 | expression prefix syntax class, and characters with the @samp{p} flag. |
| 630 | @end defun |
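
For example, with the Emacs Lisp mode syntax table, in which both
@samp{#} and @samp{'} have expression prefix syntax, the following
sketch starts just before @samp{car} and ends up at the beginning of
the buffer. The sample text is ours.

@example
@group
(with-temp-buffer
  (insert "#'car")
  (with-syntax-table emacs-lisp-mode-syntax-table
    (goto-char 3)
    (backward-prefix-chars)
    (point)))
     @result{} 1
@end group
@end example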
| 631 | |
| 632 | @node Parsing Expressions |
| 633 | @section Parsing Expressions |
| 634 | |
| 635 | This section describes functions for parsing and scanning balanced |
| 636 | expressions. We will refer to such expressions as @dfn{sexps}, |
| 637 | following the terminology of Lisp, even though these functions can act |
| 638 | on languages other than Lisp. Basically, a sexp is either a balanced |
| 639 | parenthetical grouping, a string, or a ``symbol'' (i.e., a sequence |
| 640 | of characters whose syntax is either word constituent or symbol |
| 641 | constituent). However, characters in the expression prefix syntax |
| 642 | class (@pxref{Syntax Class Table}) are treated as part of the sexp if |
| 643 | they appear next to it. |
| 644 | |
| 645 | The syntax table controls the interpretation of characters, so these |
| 646 | functions can be used for Lisp expressions when in Lisp mode and for C |
| 647 | expressions when in C mode. @xref{List Motion}, for convenient |
| 648 | higher-level functions for moving over balanced expressions. |
| 649 | |
| 650 | A character's syntax controls how it changes the state of the |
| 651 | parser, rather than describing the state itself. For example, a |
| 652 | string delimiter character toggles the parser state between |
| 653 | ``in-string'' and ``in-code'', but the syntax of characters does not |
| 654 | directly say whether they are inside a string. To illustrate (note that |
| 655 | 15 is the syntax code for generic string delimiters), |
| 656 | |
| 657 | @example |
| 658 | (put-text-property 1 9 'syntax-table '(15 . nil)) |
| 659 | @end example |
| 660 | |
| 661 | @noindent |
| 662 | does not tell Emacs that the first eight characters of the current buffer |
| 663 | are a string, but rather that they are all string delimiters. As a |
| 664 | result, Emacs treats them as four consecutive empty string constants. |
| 665 | |
| 666 | @menu |
| 667 | * Motion via Parsing:: Motion functions that work by parsing. |
| 668 | * Position Parse:: Determining the syntactic state of a position. |
| 669 | * Parser State:: How Emacs represents a syntactic state. |
| 670 | * Low-Level Parsing:: Parsing across a specified region. |
| 671 | * Control Parsing:: Parameters that affect parsing. |
| 672 | @end menu |
| 673 | |
| 674 | @node Motion via Parsing |
| 675 | @subsection Motion Commands Based on Parsing |
| 676 | |
| 677 | This section describes simple point-motion functions that operate |
| 678 | based on parsing expressions. |
| 679 | |
| 680 | @defun scan-lists from count depth |
| 681 | This function scans forward @var{count} balanced parenthetical |
| 682 | groupings from position @var{from}. It returns the position where the |
| 683 | scan stops. If @var{count} is negative, the scan moves backwards. |
| 684 | |
| 685 | If @var{depth} is nonzero, treat the starting position as being |
| 686 | @var{depth} parentheses deep. The scanner moves forward or backward |
| 687 | through the buffer until the depth changes to zero @var{count} times. |
| 688 | Hence, a positive value for @var{depth} has the effect of moving out |
| 689 | @var{depth} levels of parenthesis from the starting position, while a |
| 690 | negative @var{depth} has the effect of moving deeper by @var{-depth} |
| 691 | levels of parenthesis. |
| 692 | |
| 693 | Scanning ignores comments if @code{parse-sexp-ignore-comments} is |
| 694 | non-@code{nil}. |
| 695 | |
| 696 | If the scan reaches the beginning or end of the accessible part of the |
| 697 | buffer before it has scanned over @var{count} parenthetical groupings, |
| 698 | the return value is @code{nil} if the depth at that point is zero; if |
| 699 | the depth is nonzero, a @code{scan-error} error is signaled. |
| 700 | @end defun |
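
  The effect of @var{depth} can be seen in a small sketch. It assumes
a temporary buffer with @code{emacs-lisp-mode-syntax-table} and sample
text chosen for the example; the positions in the results apply only
to that text.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c) d) (e)")
  (list
   ;; From the start, scan forward over one whole grouping.
   (scan-lists 1 1 0)
   ;; From inside "(b c)", move out of one level of parentheses.
   (scan-lists 6 1 1)
   ;; From just after the first "(", move into the next grouping.
   (scan-lists 2 1 -1)
   ;; Only two top-level groupings exist, so asking for three
   ;; reaches the end of the buffer at depth zero.
   (scan-lists 1 3 0)))
     @result{} (12 9 5 nil)
@end example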
| 701 | |
| 702 | @defun scan-sexps from count |
| 703 | This function scans forward @var{count} sexps from position @var{from}. |
| 704 | It returns the position where the scan stops. If @var{count} is |
| 705 | negative, the scan moves backwards. |
| 706 | |
| 707 | Scanning ignores comments if @code{parse-sexp-ignore-comments} is |
| 708 | non-@code{nil}. |
| 709 | |
| 710 | If the scan reaches the beginning or end of (the accessible part of) the |
| 711 | buffer while in the middle of a parenthetical grouping, an error is |
| 712 | signaled. If it reaches the beginning or end between groupings but |
| 713 | before @var{count} is used up, @code{nil} is returned. |
| 714 | @end defun |
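
  For comparison, here is a sketch of @code{scan-sexps} on sample text
that mixes a symbol, a string, and a parenthetical grouping. The
buffer contents and the use of @code{emacs-lisp-mode-syntax-table} are
assumptions made for the example.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "foo \"bar baz\" (quux)")
  (list
   ;; One sexp forward: the symbol "foo".
   (scan-sexps 1 1)
   ;; Two sexps forward: the symbol and the string.
   (scan-sexps 1 2)
   ;; One sexp backward from the end: the grouping "(quux)".
   (scan-sexps (point-max) -1)
   ;; Only three sexps exist, so a count of four is not used up.
   (scan-sexps 1 4)))
     @result{} (4 14 15 nil)
@end example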
| 715 | |
| 716 | @defun forward-comment count |
| 717 | This function moves point forward across @var{count} complete comments |
| 718 | (that is, including the starting delimiter and the terminating |
| 719 | delimiter if any), plus any whitespace encountered on the way. It |
| 720 | moves backward if @var{count} is negative. If it encounters anything |
| 721 | other than a comment or whitespace, it stops, leaving point at the |
| 722 | place where it stopped. This includes (for instance) finding the end |
| 723 | of a comment when moving forward and expecting the beginning of one. |
| 724 | The function also stops immediately after moving over the specified |
| 725 | number of complete comments. If @var{count} comments are found as |
| 726 | expected, with nothing except whitespace between them, it returns |
| 727 | @code{t}; otherwise it returns @code{nil}. |
| 728 | |
| 729 | This function cannot tell whether the ``comments'' it traverses are |
| 730 | embedded within a string. If they look like comments, it treats them |
| 731 | as comments. |
| 732 | |
| 733 | To move forward over all comments and whitespace following point, use |
| 734 | @code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a |
| 735 | good argument to use, because the number of comments in the buffer |
| 736 | cannot exceed that many. |
| 737 | @end defun |
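
  The following sketch shows the @code{(buffer-size)} idiom on sample
text, assuming @code{emacs-lisp-mode-syntax-table} so that @samp{;}
starts a comment and a newline ends it. Because fewer than
@code{(buffer-size)} comments are found, the return value is
@code{nil}, and point is left at the first character that is neither
comment nor whitespace.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert ";; first\n  ;; second\n(code)")
  (goto-char (point-min))
  ;; Skip both comments and the whitespace between them.
  (list (forward-comment (buffer-size))
        (point)))
     @result{} (nil 22)
@end example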
| 738 | |
| 739 | @node Position Parse |
| 740 | @subsection Finding the Parse State for a Position |
| 741 | |
| 742 | For syntactic analysis, such as in indentation, often the useful |
| 743 | thing is to compute the syntactic state corresponding to a given buffer |
| 744 | position. The function @code{syntax-ppss} does that conveniently. |
| 745 | |
| 746 | @defun syntax-ppss &optional pos |
| 747 | This function returns the parser state that the parser would reach at |
| 748 | position @var{pos} starting from the beginning of the buffer. |
| 749 | @iftex |
| 750 | See the next section for |
| 751 | @end iftex |
| 752 | @ifnottex |
| 753 | @xref{Parser State}, |
| 754 | @end ifnottex |
| 755 | for a description of the parser state. |
| 756 | |
| 757 | The return value is the same as if you call the low-level parsing |
| 758 | function @code{parse-partial-sexp} to parse from the beginning of the |
| 759 | buffer to @var{pos} (@pxref{Low-Level Parsing}). However, |
| 760 | @code{syntax-ppss} uses a cache to speed up the computation. Due to |
| 761 | this optimization, element 2 (previous complete subexpression) and |
| 762 | element 6 (minimum parenthesis depth) of the returned parser state, |
| 763 | counting from 0, are not meaningful. |
| 764 | |
| 765 | This function has a side effect: it adds a buffer-local entry to |
| 766 | @code{before-change-functions} (@pxref{Change Hooks}) for |
| 767 | @code{syntax-ppss-flush-cache} (see below). This entry keeps the |
| 768 | cache consistent as the buffer is modified. However, the cache might |
| 769 | not be updated if @code{syntax-ppss} is called while |
| 770 | @code{before-change-functions} is temporarily let-bound, or if the |
| 771 | buffer is modified without running the hook, such as when using |
| 772 | @code{inhibit-modification-hooks}. In those cases, it is necessary to |
| 773 | call @code{syntax-ppss-flush-cache} explicitly. |
| 774 | @end defun |
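
  A common use of @code{syntax-ppss} is to test whether a position lies
inside a string or a comment. The sketch below is one possible way to
write such a test; the function name
@code{my-in-string-or-comment-p} is hypothetical, and the sample
buffer assumes @code{emacs-lisp-mode-syntax-table}.

@example
(defun my-in-string-or-comment-p (&optional pos)
  "Return non-nil if POS (default point) is in a string or comment."
  ;; `syntax-ppss' may move point to POS, hence `save-excursion'.
  (let ((state (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string, element 4 inside
    ;; a comment.
    (or (nth 3 state) (nth 4 state))))

(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(foo \"bar\") ; note")
  (list (my-in-string-or-comment-p 3)     ; in the symbol
        (my-in-string-or-comment-p 8)     ; in the string
        (my-in-string-or-comment-p 16)))  ; in the comment
     @result{} (nil 34 t)
@end example

@noindent
Here 34 is the character code of the double quote that will terminate
the string.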
| 775 | |
| 776 | @defun syntax-ppss-flush-cache beg &rest ignored-args |
| 777 | This function flushes the cache used by @code{syntax-ppss}, starting |
| 778 | at position @var{beg}. The remaining arguments, @var{ignored-args}, |
| 779 | are ignored; this function accepts them so that it can be directly |
| 780 | used on hooks such as @code{before-change-functions} (@pxref{Change |
| 781 | Hooks}). |
| 782 | @end defun |
| 783 | |
| 784 | Major modes can make @code{syntax-ppss} run faster by specifying |
| 785 | where it needs to start parsing. |
| 786 | |
| 787 | @defvar syntax-begin-function |
| 788 | If this is non-@code{nil}, it should be a function that moves to an |
| 789 | earlier buffer position where the parser state is equivalent to |
| 790 | @code{nil}---in other words, a position outside of any comment, |
| 791 | string, or parenthesis. @code{syntax-ppss} uses it to further |
| 792 | optimize its computations, when the cache gives no help. |
| 793 | @end defvar |
| 794 | |
| 795 | @node Parser State |
| 796 | @subsection Parser State |
| 797 | @cindex parser state |
| 798 | |
| 799 | A @dfn{parser state} is a list of ten elements describing the state |
| 800 | of the syntactic parser, after it parses the text between a specified |
| 801 | starting point and a specified end point in the buffer. Parsing |
| 802 | functions such as @code{syntax-ppss} |
| 803 | @ifnottex |
| 804 | (@pxref{Position Parse}) |
| 805 | @end ifnottex |
| 806 | return a parser state as the value. Some parsing functions accept a |
| 807 | parser state as an argument, for resuming parsing. |
| 808 | |
| 809 | Here are the meanings of the elements of the parser state: |
| 810 | |
| 811 | @enumerate 0 |
| 812 | @item |
| 813 | The depth in parentheses, counting from 0. @strong{Warning:} this can |
| 814 | be negative if there are more close parens than open parens between |
| 815 | the parser's starting point and end point. |
| 816 | |
| 817 | @item |
| 818 | @cindex innermost containing parentheses |
| 819 | The character position of the start of the innermost parenthetical |
| 820 | grouping containing the stopping point; @code{nil} if none. |
| 821 | |
| 822 | @item |
| 823 | @cindex previous complete subexpression |
| 824 | The character position of the start of the last complete subexpression |
| 825 | terminated; @code{nil} if none. |
| 826 | |
| 827 | @item |
| 828 | @cindex inside string |
| 829 | Non-@code{nil} if inside a string. More precisely, this is the |
| 830 | character that will terminate the string, or @code{t} if a generic |
| 831 | string delimiter character should terminate it. |
| 832 | |
| 833 | @item |
| 834 | @cindex inside comment |
| 835 | @code{t} if inside a non-nestable comment (of any comment style; |
| 836 | @pxref{Syntax Flags}); or the comment nesting level if inside a |
| 837 | comment that can be nested. |
| 838 | |
| 839 | @item |
| 840 | @cindex quote character |
| 841 | @code{t} if the end point is just after a quote character. |
| 842 | |
| 843 | @item |
| 844 | The minimum parenthesis depth encountered during this scan. |
| 845 | |
| 846 | @item |
| 847 | What kind of comment is active: @code{nil} if not in a comment or in a |
| 848 | comment of style @samp{a}; 1 for a comment of style @samp{b}; 2 for a |
| 849 | comment of style @samp{c}; and @code{syntax-table} for a comment that |
| 850 | should be ended by a generic comment delimiter character. |
| 851 | |
| 852 | @item |
| 853 | The string or comment start position. While inside a comment, this is |
| 854 | the position where the comment began; while inside a string, this is the |
| 855 | position where the string began. When outside of strings and comments, |
| 856 | this element is @code{nil}. |
| 857 | |
| 858 | @item |
| 859 | Internal data for continuing the parsing. The meaning of this |
| 860 | data is subject to change; it is used if you pass this list |
| 861 | as the @var{state} argument to another call. |
| 862 | @end enumerate |
| 863 | |
| 864 | Elements 1, 2, and 6 are ignored in a state which you pass as an |
| 865 | argument to continue parsing, and elements 8 and 9 are used only in |
| 866 | trivial cases. Those elements are mainly used internally by the |
| 867 | parser code. |
| 868 | |
| 869 | One additional piece of useful information is available from a |
| 870 | parser state using this function: |
| 871 | |
| 872 | @defun syntax-ppss-toplevel-pos state |
| 873 | This function extracts, from parser state @var{state}, the last |
| 874 | position scanned in the parse which was at top level in grammatical |
| 875 | structure. ``At top level'' means outside of any parentheses, |
| 876 | comments, or strings. |
| 877 | |
| 878 | The value is @code{nil} if @var{state} represents a parse which has |
| 879 | arrived at a top level position. |
| 880 | @end defun |
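
  To make the elements of a parser state more concrete, here is a
sketch that parses a deliberately unfinished piece of text and picks
out a few elements with @code{nth}. The text and the choice of
@code{emacs-lisp-mode-syntax-table} are assumptions for the example.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b \"str")
  (let ((state (parse-partial-sexp (point-min) (point-max))))
    (list (nth 0 state)    ; depth in parentheses
          (nth 1 state)    ; start of innermost grouping
          (nth 3 state)    ; character that will end the string
          (nth 8 state)    ; start of the string
          (syntax-ppss-toplevel-pos state))))
     @result{} (2 4 34 7 1)
@end example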
| 881 | |
| 882 | @node Low-Level Parsing |
| 883 | @subsection Low-Level Parsing |
| 884 | |
| 885 | The most basic way to use the expression parser is to tell it |
| 886 | to start at a given position with a certain state, and parse up to |
| 887 | a specified end position. |
| 888 | |
| 889 | @defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment |
| 890 | This function parses a sexp in the current buffer starting at |
| 891 | @var{start}, not scanning past @var{limit}. It stops at position |
| 892 | @var{limit} or when certain criteria described below are met, and sets |
| 893 | point to the location where parsing stops. It returns a parser state |
| 894 | @ifinfo |
| 895 | (@pxref{Parser State}) |
| 896 | @end ifinfo |
| 897 | describing the status of the parse at the point where it stops. |
| 898 | |
| 899 | @cindex parenthesis depth |
| 900 | If the third argument @var{target-depth} is non-@code{nil}, parsing |
| 901 | stops if the depth in parentheses becomes equal to @var{target-depth}. |
| 902 | The depth starts at 0, or at whatever is given in @var{state}. |
| 903 | |
| 904 | If the fourth argument @var{stop-before} is non-@code{nil}, parsing |
| 905 | stops when it comes to any character that starts a sexp. If |
| 906 | @var{stop-comment} is non-@code{nil}, parsing stops when it comes to the |
| 907 | start of a comment. If @var{stop-comment} is the symbol |
| 908 | @code{syntax-table}, parsing stops after the start of a comment or a |
| 909 | string, or the end of a comment or a string, whichever comes first. |
| 910 | |
| 911 | If @var{state} is @code{nil}, @var{start} is assumed to be at the top |
| 912 | level of parenthesis structure, such as the beginning of a function |
| 913 | definition. Alternatively, you might wish to resume parsing in the |
| 914 | middle of the structure. To do this, you must provide a @var{state} |
| 915 | argument that describes the initial status of parsing. The value |
| 916 | returned by a previous call to @code{parse-partial-sexp} will do |
| 917 | nicely. |
| 918 | @end defun |
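
  The following sketch illustrates resuming a parse by passing a
previously returned state as the @var{state} argument. It compares the
result with a single parse over the whole span. The buffer text, the
positions, and the variable names are made up for the example.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c) \"text\" d)")
  (let* (;; Parse the first part, stopping inside "(b c)".
         (state-a (parse-partial-sexp 1 6))
         ;; Resume from there, stopping inside the string.
         (state-b (parse-partial-sexp 6 12 nil nil state-a))
         ;; Parse the same span in a single call.
         (whole (parse-partial-sexp 1 12)))
    (list (nth 0 state-a)
          (nth 0 state-b)
          (nth 3 state-b)
          (equal (nth 0 state-b) (nth 0 whole))
          (equal (nth 3 state-b) (nth 3 whole)))))
     @result{} (2 1 34 t t)
@end example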
| 919 | |
| 920 | @node Control Parsing |
| 921 | @subsection Parameters to Control Parsing |
| 922 | |
| 923 | @defvar multibyte-syntax-as-symbol |
| 924 | If this variable is non-@code{nil}, @code{scan-sexps} treats all |
| 925 | non-@acronym{ASCII} characters as symbol constituents regardless |
| 926 | of what the syntax table says about them. (However, text properties |
| 927 | can still override the syntax.) |
| 928 | @end defvar |
| 929 | |
| 930 | @defopt parse-sexp-ignore-comments |
| 931 | @cindex skipping comments |
| 932 | If the value is non-@code{nil}, then comments are treated as |
| 933 | whitespace by the functions in this section and by @code{forward-sexp}, |
| 934 | @code{scan-lists} and @code{scan-sexps}. |
| 935 | @end defopt |
| 936 | |
| 937 | @vindex parse-sexp-lookup-properties |
| 938 | The behavior of @code{parse-partial-sexp} is also affected by |
| 939 | @code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}). |
| 940 | |
| 941 | You can use @code{forward-comment} to move forward or backward over |
| 942 | one comment or several comments. |
| 943 | |
| 944 | @node Syntax Table Internals |
| 945 | @section Syntax Table Internals |
| 946 | @cindex syntax table internals |
| 947 | |
| 948 | Syntax tables are implemented as char-tables (@pxref{Char-Tables}), |
| 949 | but most Lisp programs don't work directly with their elements. |
| 950 | Syntax tables do not store syntax data as syntax descriptors |
| 951 | (@pxref{Syntax Descriptors}); they use an internal format, which is |
| 952 | documented in this section. This internal format can also be assigned |
| 953 | as syntax properties (@pxref{Syntax Properties}). |
| 954 | |
| 955 | @cindex syntax code |
| 956 | @cindex raw syntax descriptor |
| 957 | Each entry in a syntax table is a @dfn{raw syntax descriptor}: a |
| 958 | cons cell of the form @code{(@var{syntax-code} |
| 959 | . @var{matching-char})}. @var{syntax-code} is an integer which |
| 960 | encodes the syntax class and syntax flags, according to the table |
| 961 | below. @var{matching-char}, if non-@code{nil}, specifies a matching |
| 962 | character (similar to the second character in a syntax descriptor). |
| 963 | |
| 964 | Here are the syntax codes corresponding to the various syntax |
| 965 | classes: |
| 966 | |
| 967 | @multitable @columnfractions .2 .3 .2 .3 |
| 968 | @item |
| 969 | @i{Code} @tab @i{Class} @tab @i{Code} @tab @i{Class} |
| 970 | @item |
| 971 | 0 @tab whitespace @tab 8 @tab paired delimiter |
| 972 | @item |
| 973 | 1 @tab punctuation @tab 9 @tab escape |
| 974 | @item |
| 975 | 2 @tab word @tab 10 @tab character quote |
| 976 | @item |
| 977 | 3 @tab symbol @tab 11 @tab comment-start |
| 978 | @item |
| 979 | 4 @tab open parenthesis @tab 12 @tab comment-end |
| 980 | @item |
| 981 | 5 @tab close parenthesis @tab 13 @tab inherit |
| 982 | @item |
| 983 | 6 @tab expression prefix @tab 14 @tab generic comment |
| 984 | @item |
| 985 | 7 @tab string quote @tab 15 @tab generic string |
| 986 | @end multitable |
| 987 | |
| 988 | @noindent |
| 989 | For example, in the standard syntax table, the entry for @samp{(} is |
| 990 | @code{(4 . 41)}. 41 is the character code for @samp{)}. |
| 991 | |
| 992 | Syntax flags are encoded in higher order bits, starting 16 bits from |
| 993 | the least significant bit. This table gives the power of two which |
| 994 | corresponds to each syntax flag. |
| 995 | |
| 996 | @multitable @columnfractions .15 .3 .15 .3 |
| 997 | @item |
| 998 | @i{Flag} @tab @i{Value} @tab @i{Flag} @tab @i{Value} |
| 999 | @item |
| 1000 | @samp{1} @tab @code{(lsh 1 16)} @tab @samp{p} @tab @code{(lsh 1 20)} |
| 1001 | @item |
| 1002 | @samp{2} @tab @code{(lsh 1 17)} @tab @samp{b} @tab @code{(lsh 1 21)} |
| 1003 | @item |
| 1004 | @samp{3} @tab @code{(lsh 1 18)} @tab @samp{n} @tab @code{(lsh 1 22)} |
| 1005 | @item |
| 1006 | @samp{4} @tab @code{(lsh 1 19)} @tab @samp{c} @tab @code{(lsh 1 23)} |
| 1007 | @end multitable |
| 1008 | |
| 1009 | @defun string-to-syntax desc |
| 1010 | Given a syntax descriptor @var{desc} (a string), this function returns |
| 1011 | the corresponding raw syntax descriptor. |
| 1012 | @end defun |
| 1013 | |
| 1014 | @defun syntax-after pos |
| 1015 | This function returns the raw syntax descriptor for the character in |
| 1016 | the buffer after position @var{pos}, taking account of syntax |
| 1017 | properties as well as the syntax table. If @var{pos} is outside the |
| 1018 | buffer's accessible portion (@pxref{Narrowing, accessible portion}), |
| 1019 | the return value is @code{nil}. |
| 1020 | @end defun |
| 1021 | |
| 1022 | @defun syntax-class syntax |
| 1023 | This function returns the syntax code for the raw syntax descriptor |
| 1024 | @var{syntax}. More precisely, it takes the raw syntax descriptor's |
| 1025 | @var{syntax-code} component, masks off the high 16 bits which record |
| 1026 | the syntax flags, and returns the resulting integer. |
| 1027 | |
| 1028 | If @var{syntax} is @code{nil}, the return value is @code{nil}. |
| 1029 | This is so that the expression |
| 1030 | |
| 1031 | @example |
| 1032 | (syntax-class (syntax-after pos)) |
| 1033 | @end example |
| 1034 | |
| 1035 | @noindent |
| 1036 | evaluates to @code{nil} if @code{pos} is outside the buffer's |
| 1037 | accessible portion, without throwing errors or returning an incorrect |
| 1038 | code. |
| 1039 | @end defun |
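
  As a small worked example of the encoding described in this section,
the sketch below converts two syntax descriptors with
@code{string-to-syntax} and then inspects the result. The descriptor
@code{". 23"} (punctuation with flags @samp{2} and @samp{3}) is chosen
only for illustration.

@example
(string-to-syntax "()")
     @result{} (4 . 41)

(let ((raw (string-to-syntax ". 23")))
  (list raw
        ;; The class code with the flag bits masked off.
        (syntax-class raw)
        ;; Flag `2' is set; flag `1' is not.
        (/= 0 (logand (car raw) (lsh 1 17)))
        (/= 0 (logand (car raw) (lsh 1 16)))))
     @result{} ((393217) 1 t nil)
@end example

@noindent
The value 393217 is the sum of the class code 1, @code{(lsh 1 17)}, and
@code{(lsh 1 18)}.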
| 1040 | |
| 1041 | @node Categories |
| 1042 | @section Categories |
| 1043 | @cindex categories of characters |
| 1044 | @cindex character categories |
| 1045 | |
| 1046 | @dfn{Categories} provide an alternate way of classifying characters |
| 1047 | syntactically. You can define several categories as needed, then |
| 1048 | independently assign each character to one or more categories. Unlike |
| 1049 | syntax classes, categories are not mutually exclusive; it is normal for |
| 1050 | one character to belong to several categories. |
| 1051 | |
| 1052 | @cindex category table |
| 1053 | Each buffer has a @dfn{category table} which records which categories |
| 1054 | are defined and also which characters belong to each category. Each |
| 1055 | category table defines its own categories, but normally these are |
| 1056 | initialized by copying from the standard category table, so that the |
| 1057 | standard categories are available in all modes. |
| 1058 | |
| 1059 | Each category has a name, which is an @acronym{ASCII} printing character in |
| 1060 | the range @w{@samp{ }} to @samp{~}. You specify the name of a category |
| 1061 | when you define it with @code{define-category}. |
| 1062 | |
| 1063 | @cindex category set |
| 1064 | The category table is actually a char-table (@pxref{Char-Tables}). |
| 1065 | The element of the category table at index @var{c} is a @dfn{category |
| 1066 | set}---a bool-vector---that indicates which categories character @var{c} |
| 1067 | belongs to. In this category set, if the element at index @var{cat} is |
| 1068 | @code{t}, that means category @var{cat} is a member of the set, and that |
| 1069 | character @var{c} belongs to category @var{cat}. |
| 1070 | |
| 1071 | For the next three functions, the optional argument @var{table} |
| 1072 | defaults to the current buffer's category table. |
| 1073 | |
| 1074 | @defun define-category char docstring &optional table |
| 1075 | This function defines a new category, with name @var{char} and |
| 1076 | documentation @var{docstring}, for the category table @var{table}. |
| 1077 | |
| 1078 | Here's an example of defining a new category for characters that have |
| 1079 | strong right-to-left directionality (@pxref{Bidirectional Display}) |
| 1080 | and using it in a special category table: |
| 1081 | |
| 1082 | @example |
| 1083 | (defvar special-category-table-for-bidi |
| 1084 |   (let ((category-table (make-category-table)) |
| 1085 |         (uniprop-table (unicode-property-table-internal 'bidi-class))) |
| 1086 |     (define-category ?R "Characters of bidi-class R, AL, or RLO" |
| 1087 |                      category-table) |
| 1088 |     (map-char-table |
| 1089 |      #'(lambda (key val) |
| 1090 |          (if (memq val '(R AL RLO)) |
| 1091 |              (modify-category-entry key ?R category-table))) |
| 1092 |      uniprop-table) |
| 1093 |     category-table)) |
| 1094 | @end example |
| 1095 | @end defun |
| 1096 | |
| 1097 | @defun category-docstring category &optional table |
| 1098 | This function returns the documentation string of category @var{category} |
| 1099 | in category table @var{table}. |
| 1100 | |
| 1101 | @example |
| 1102 | (category-docstring ?a) |
| 1103 |      @result{} "ASCII" |
| 1104 | (category-docstring ?l) |
| 1105 |      @result{} "Latin" |
| 1106 | @end example |
| 1107 | @end defun |
| 1108 | |
| 1109 | @defun get-unused-category &optional table |
| 1110 | This function returns a category name (a character) which is not |
| 1111 | currently defined in @var{table}. If all possible categories are in use |
| 1112 | in @var{table}, it returns @code{nil}. |
| 1113 | @end defun |
| 1114 | |
| 1115 | @defun category-table |
| 1116 | This function returns the current buffer's category table. |
| 1117 | @end defun |
| 1118 | |
| 1119 | @defun category-table-p object |
| 1120 | This function returns @code{t} if @var{object} is a category table, |
| 1121 | otherwise @code{nil}. |
| 1122 | @end defun |
| 1123 | |
| 1124 | @defun standard-category-table |
| 1125 | This function returns the standard category table. |
| 1126 | @end defun |
| 1127 | |
| 1128 | @defun copy-category-table &optional table |
| 1129 | This function constructs a copy of @var{table} and returns it. If |
| 1130 | @var{table} is not supplied (or is @code{nil}), it returns a copy of the |
| 1131 | standard category table. Otherwise, an error is signaled if @var{table} |
| 1132 | is not a category table. |
| 1133 | @end defun |
| 1134 | |
| 1135 | @defun set-category-table table |
| 1136 | This function makes @var{table} the category table for the current |
| 1137 | buffer. It returns @var{table}. |
| 1138 | @end defun |
| 1139 | |
| 1140 | @defun make-category-table |
| 1141 | This creates and returns an empty category table. In an empty category |
| 1142 | table, no categories have been allocated, and no characters belong to |
| 1143 | any categories. |
| 1144 | @end defun |
| 1145 | |
| 1146 | @defun make-category-set categories |
| 1147 | This function returns a new category set---a bool-vector---whose initial |
| 1148 | contents are the categories listed in the string @var{categories}. The |
| 1149 | elements of @var{categories} should be category names; the new category |
| 1150 | set has @code{t} for each of those categories, and @code{nil} for all |
| 1151 | other categories. |
| 1152 | |
| 1153 | @example |
| 1154 | (make-category-set "al") |
| 1155 |      @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0" |
| 1156 | @end example |
| 1157 | @end defun |
| 1158 | |
| 1159 | @defun char-category-set char |
| 1160 | This function returns the category set for character @var{char} in the |
| 1161 | current buffer's category table. This is the bool-vector which |
| 1162 | records which categories the character @var{char} belongs to. The |
| 1163 | function @code{char-category-set} does not allocate storage, because |
| 1164 | it returns the same bool-vector that exists in the category table. |
| 1165 | |
| 1166 | @example |
| 1167 | (char-category-set ?a) |
| 1168 |      @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0" |
| 1169 | @end example |
| 1170 | @end defun |
| 1171 | |
| 1172 | @defun category-set-mnemonics category-set |
| 1173 | This function converts the category set @var{category-set} into a string |
| 1174 | containing the characters that designate the categories that are members |
| 1175 | of the set. |
| 1176 | |
| 1177 | @example |
| 1178 | (category-set-mnemonics (char-category-set ?a)) |
| 1179 |      @result{} "al" |
| 1180 | @end example |
| 1181 | @end defun |
| 1182 | |
| 1183 | @defun modify-category-entry char category &optional table reset |
| 1184 | This function modifies the category set of @var{char} in category |
| 1185 | table @var{table} (which defaults to the current buffer's category |
| 1186 | table). @var{char} can be a character, or a cons cell of the form |
| 1187 | @code{(@var{min} . @var{max})}; in the latter case, the function |
| 1188 | modifies the category sets of all characters in the range between |
| 1189 | @var{min} and @var{max}, inclusive. |
| 1190 | |
| 1191 | Normally, it modifies a category set by adding @var{category} to it. |
| 1192 | But if @var{reset} is non-@code{nil}, then it deletes @var{category} |
| 1193 | instead. |
| 1194 | @end defun |
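
  Here is a sketch of adding a category to some characters and then
removing it again with a non-@code{nil} @var{reset}. The category
@samp{V} and its docstring are hypothetical, defined in a fresh table
only for this example.

@example
(let ((table (make-category-table)))
  (define-category ?V "Vowels (illustrative)" table)
  (dolist (c '(?a ?e ?i ?o ?u))
    (modify-category-entry c ?V table))
  (with-temp-buffer
    (set-category-table table)
    (list
     ;; `a' now belongs to category V; `b' does not.
     (category-set-mnemonics (char-category-set ?a))
     (category-set-mnemonics (char-category-set ?b))
     ;; A non-nil RESET removes the category again.
     (progn
       (modify-category-entry ?a ?V table t)
       (category-set-mnemonics (char-category-set ?a))))))
     @result{} ("V" "" "")
@end example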
| 1195 | |
| 1196 | @deffn Command describe-categories &optional buffer-or-name |
| 1197 | This command describes the category specifications in the current |
| 1198 | category table. It inserts the descriptions in a buffer, and then |
| 1199 | displays that buffer. If @var{buffer-or-name} is non-@code{nil}, it |
| 1200 | describes the category table of that buffer instead. |
| 1201 | @end deffn |