| 1 | @c -*-texinfo-*- |
| 2 | @c This is part of the GNU Emacs Lisp Reference Manual. |
| 3 | @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc. |
| 4 | @c See the file elisp.texi for copying conditions. |
| 5 | @setfilename ../info/syntax |
| 6 | @node Syntax Tables, Abbrevs, Searching and Matching, Top |
| 7 | @chapter Syntax Tables |
| 8 | @cindex parsing |
| 9 | @cindex syntax table |
| 10 | @cindex text parsing |
| 11 | |
| 12 | A @dfn{syntax table} specifies the syntactic textual function of each |
| 13 | character. This information is used by the parsing commands, the |
| 14 | complex movement commands, and others to determine where words, symbols, |
| 15 | and other syntactic constructs begin and end. The current syntax table |
| 16 | controls the meaning of the word motion functions (@pxref{Word Motion}) |
| 17 | and the list motion functions (@pxref{List Motion}) as well as the |
| 18 | functions in this chapter. |
| 19 | |
| 20 | @menu |
| 21 | * Basics: Syntax Basics. Basic concepts of syntax tables. |
| 22 | * Desc: Syntax Descriptors. How characters are classified. |
| 23 | * Syntax Table Functions:: How to create, examine and alter syntax tables. |
| 24 | * Motion and Syntax:: Moving over characters with certain syntaxes. |
| 25 | * Parsing Expressions:: Parsing balanced expressions |
| 26 | using the syntax table. |
| 27 | * Standard Syntax Tables:: Syntax tables used by various major modes. |
| 28 | * Syntax Table Internals:: How syntax table information is stored. |
| 29 | @end menu |
| 30 | |
| 31 | @node Syntax Basics |
| 32 | @section Syntax Table Concepts |
| 33 | |
| 34 | @ifinfo |
| 35 | A @dfn{syntax table} provides Emacs with the information that |
| 36 | determines the syntactic use of each character in a buffer. This |
| 37 | information is used by the parsing commands, the complex movement |
| 38 | commands, and others to determine where words, symbols, and other |
| 39 | syntactic constructs begin and end. The current syntax table controls |
| 40 | the meaning of the word motion functions (@pxref{Word Motion}) and the |
| 41 | list motion functions (@pxref{List Motion}) as well as the functions in |
| 42 | this chapter. |
| 43 | @end ifinfo |
| 44 | |
| 45 | A syntax table is a vector of 256 elements; it contains one entry for |
| 46 | each of the 256 possible characters in an 8-bit byte. Each element is |
| 47 | an integer that encodes the syntax of the character in question. |
| 48 | |
| 49 | Syntax tables are used only for moving across text, not for the Emacs |
| 50 | Lisp reader. Emacs Lisp uses built-in syntactic rules when reading Lisp |
| 51 | expressions, and these rules cannot be changed. |
| 52 | |
| 53 | Each buffer has its own major mode, and each major mode has its own |
| 54 | idea of the syntactic class of various characters. For example, in Lisp |
| 55 | mode, the character @samp{;} begins a comment, but in C mode, it |
| 56 | terminates a statement. To support these variations, Emacs makes the |
| 57 | choice of syntax table local to each buffer. Typically, each major |
| 58 | mode has its own syntax table and installs that table in each buffer |
| 59 | that uses that mode. Changing this table alters the syntax in all |
| 60 | those buffers as well as in any buffers subsequently put in that mode. |
| 61 | Occasionally several similar modes share one syntax table. |
| 62 | @xref{Example Major Modes}, for an example of how to set up a syntax |
| 63 | table. |
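
  As a minimal sketch of that arrangement, a mode might build one table
when it is loaded and install it in every buffer that enters the mode.
The names @code{my-mode-syntax-table} and @code{my-mode-use-syntax-table}
are hypothetical, and the particular entries are chosen only for
illustration.

@example
;; Hypothetical table for an imaginary `my-mode'.
(defvar my-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; Semicolon starts a comment; newline ends it.
    (modify-syntax-entry ?\; "<" table)
    (modify-syntax-entry ?\n ">" table)
    ;; Treat underscore as a word constituent.
    (modify-syntax-entry ?_ "w" table)
    table)
  "Syntax table shared by all buffers in the hypothetical `my-mode'.")

(defun my-mode-use-syntax-table ()
  "Install `my-mode-syntax-table' in the current buffer."
  (set-syntax-table my-mode-syntax-table))
@end example

  Because every such buffer refers to the same table object, a later
call to @code{modify-syntax-entry} on @code{my-mode-syntax-table} is
seen by all of them.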
| 64 | |
| 65 | A syntax table can inherit the data for some characters from the |
| 66 | standard syntax table, while specifying other characters itself. The |
| 67 | ``inherit'' syntax class means ``inherit this character's syntax from |
| 68 | the standard syntax table.'' Most major modes' syntax tables inherit |
| 69 | the syntax of character codes 0 through 31 and 128 through 255. This is |
| 70 | useful with character sets such as ISO Latin-1 that have additional |
| 71 | alphabetic characters in the range 128 to 255. Just changing the |
| 72 | standard syntax for these characters affects all major modes. |
| 73 | |
| 74 | @defun syntax-table-p object |
| 75 | This function returns @code{t} if @var{object} is a vector of 256 |
| 76 | elements. This means that the vector may be a syntax table. However, |
| 77 | according to this test, any vector of length 256 is considered to be a |
| 78 | syntax table, no matter what its contents. |
| 79 | @end defun |
| 80 | |
| 81 | @node Syntax Descriptors |
| 82 | @section Syntax Descriptors |
| 83 | @cindex syntax classes |
| 84 | |
| 85 | This section describes the syntax classes and flags that denote the |
| 86 | syntax of a character, and how they are represented as a @dfn{syntax |
| 87 | descriptor}, which is a Lisp string that you pass to |
| 88 | @code{modify-syntax-entry} to specify the desired syntax. |
| 89 | |
| 90 | Emacs defines a number of @dfn{syntax classes}. Each syntax table |
| 91 | puts each character into one class. There is no necessary relationship |
| 92 | between the class of a character in one syntax table and its class in |
| 93 | any other table. |
| 94 | |
| 95 | Each class is designated by a mnemonic character, which serves as the |
| 96 | name of the class when you need to specify a class. Usually the |
| 97 | designator character is one that is frequently in that class; however, |
| 98 | its meaning as a designator is unvarying and independent of what syntax |
| 99 | that character currently has. |
| 100 | |
| 101 | @cindex syntax descriptor |
| 102 | A syntax descriptor is a Lisp string that specifies a syntax class, a |
| 103 | matching character (used only for the parenthesis classes) and flags. |
| 104 | The first character is the designator for a syntax class. The second |
| 105 | character is the character to match; if it is unused, put a space there. |
| 106 | Then come the characters for any desired flags. If no matching |
| 107 | character or flags are needed, one character is sufficient. |
| 108 | |
| 109 | For example, the descriptor for the character @samp{*} in C mode is |
| 110 | @samp{@w{. 23}} (i.e., punctuation, matching character slot unused, |
| 111 | second character of a comment-starter, first character of a |
| 112 | comment-ender), and the entry for @samp{/} is @samp{@w{. 14}} (i.e., |
| 113 | punctuation, matching character slot unused, first character of a |
| 114 | comment-starter, second character of a comment-ender). |
| 115 | |
| 116 | @menu |
| 117 | * Syntax Class Table:: Table of syntax classes. |
| 118 | * Syntax Flags:: Additional flags each character can have. |
| 119 | @end menu |
| 120 | |
| 121 | @node Syntax Class Table |
| 122 | @subsection Table of Syntax Classes |
| 123 | |
| 124 | Here is a table of syntax classes, the characters that stand for them, |
| 125 | their meanings, and examples of their use. |
| 126 | |
| 127 | @deffn {Syntax class} @w{whitespace character} |
| 128 | @dfn{Whitespace characters} (designated with @w{@samp{@ }} or @samp{-}) |
| 129 | separate symbols and words from each other. Typically, whitespace |
| 130 | characters have no other syntactic significance, and multiple whitespace |
| 131 | characters are syntactically equivalent to a single one. Space, tab, |
| 132 | newline and formfeed are almost always classified as whitespace. |
| 133 | @end deffn |
| 134 | |
| 135 | @deffn {Syntax class} @w{word constituent} |
| 136 | @dfn{Word constituents} (designated with @samp{w}) are parts of normal |
| 137 | English words and are typically used in variable and command names in |
| 138 | programs. All upper- and lower-case letters, and the digits, are typically |
| 139 | word constituents. |
| 140 | @end deffn |
| 141 | |
| 142 | @deffn {Syntax class} @w{symbol constituent} |
| 143 | @dfn{Symbol constituents} (designated with @samp{_}) are the extra |
| 144 | characters that are used in variable and command names along with word |
| 145 | constituents. For example, the symbol constituents class is used in |
| 146 | Lisp mode to indicate that certain characters may be part of symbol |
| 147 | names even though they are not part of English words. These characters |
| 148 | are @samp{$&*+-_<>}. In standard C, the only non-word-constituent |
| 149 | character that is valid in symbols is underscore (@samp{_}). |
| 150 | @end deffn |
| 151 | |
| 152 | @deffn {Syntax class} @w{punctuation character} |
| 153 | @dfn{Punctuation characters} (designated with @samp{.}) are those characters that are |
| 154 | used as punctuation in English, or are used in some way in a programming |
| 155 | language to separate symbols from one another. Most programming |
| 156 | language modes, including Emacs Lisp mode, have no characters in this |
| 157 | class since the few characters that are not symbol or word constituents |
| 158 | all have other uses. |
| 159 | @end deffn |
| 160 | |
| 161 | @deffn {Syntax class} @w{open parenthesis character} |
| 162 | @deffnx {Syntax class} @w{close parenthesis character} |
| 163 | @cindex parenthesis syntax |
| 164 | Open and close @dfn{parenthesis characters} are characters used in |
| 165 | dissimilar pairs to surround sentences or expressions. Such a grouping |
| 166 | is begun with an open parenthesis character and terminated with a close. |
| 167 | Each open parenthesis character matches a particular close parenthesis |
| 168 | character, and vice versa. Normally, Emacs indicates momentarily the |
| 169 | matching open parenthesis when you insert a close parenthesis. |
| 170 | @xref{Blinking}. |
| 171 | |
| 172 | The class of open parentheses is designated with @samp{(}, and that of |
| 173 | close parentheses with @samp{)}. |
| 174 | |
| 175 | In English text, and in C code, the parenthesis pairs are @samp{()}, |
| 176 | @samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters for lists and |
| 177 | vectors (@samp{()} and @samp{[]}) are classified as parenthesis |
| 178 | characters. |
| 179 | @end deffn |
| 180 | |
| 181 | @deffn {Syntax class} @w{string quote} |
| 182 | @dfn{String quote characters} (designated with @samp{"}) are used in |
| 183 | many languages, including Lisp and C, to delimit string constants. The |
| 184 | same string quote character appears at the beginning and the end of a |
| 185 | string. Such quoted strings do not nest. |
| 186 | |
| 187 | The parsing facilities of Emacs consider a string as a single token. |
| 188 | The usual syntactic meanings of the characters in the string are |
| 189 | suppressed. |
| 190 | |
| 191 | The Lisp modes have two string quote characters: double-quote (@samp{"}) |
| 192 | and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it |
| 193 | is used in Common Lisp. C also has two string quote characters: |
| 194 | double-quote for strings, and single-quote (@samp{'}) for character |
| 195 | constants. |
| 196 | |
| 197 | English text has no string quote characters because English is not a |
| 198 | programming language. Although quotation marks are used in English, |
| 199 | we do not want them to turn off the usual syntactic properties of |
| 200 | other characters in the quotation. |
| 201 | @end deffn |
| 202 | |
| 203 | @deffn {Syntax class} @w{escape} |
| 204 | An @dfn{escape character} (designated with @samp{\}) starts an escape |
| 205 | sequence such as is used in C string and character constants. The |
| 206 | character @samp{\} belongs to this class in both C and Lisp. (In C, it |
| 207 | is used thus only inside strings, but it turns out to cause no trouble |
| 208 | to treat it this way throughout C code.) |
| 209 | |
| 210 | Characters in this class count as part of words if |
| 211 | @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}. |
| 212 | @end deffn |
| 213 | |
| 214 | @deffn {Syntax class} @w{character quote} |
| 215 | A @dfn{character quote character} (designated with @samp{/}) quotes the |
| 216 | following character so that it loses its normal syntactic meaning. This |
| 217 | differs from an escape character in that only the character immediately |
| 218 | following is ever affected. |
| 219 | |
| 220 | Characters in this class count as part of words if |
| 221 | @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}. |
| 222 | |
| 223 | This class is used for backslash in @TeX{} mode. |
| 224 | @end deffn |
| 225 | |
| 226 | @deffn {Syntax class} @w{paired delimiter} |
| 227 | @dfn{Paired delimiter characters} (designated with @samp{$}) are like |
| 228 | string quote characters except that the syntactic properties of the |
| 229 | characters between the delimiters are not suppressed. Only @TeX{} mode |
| 230 | uses a paired delimiter presently---the @samp{$} that both enters and |
| 231 | leaves math mode. |
| 232 | @end deffn |
| 233 | |
| 234 | @deffn {Syntax class} @w{expression prefix} |
| 235 | An @dfn{expression prefix operator} (designated with @samp{'}) is used |
| 236 | for syntactic operators that are part of an expression if they appear |
| 237 | next to one. These characters in Lisp include the apostrophe, @samp{'} |
| 238 | (used for quoting), the comma, @samp{,} (used in macros), and @samp{#} |
| 239 | (used in the read syntax for certain data types). |
| 240 | @end deffn |
| 241 | |
| 242 | @deffn {Syntax class} @w{comment starter} |
| 243 | @deffnx {Syntax class} @w{comment ender} |
| 244 | @cindex comment syntax |
| 245 | The @dfn{comment starter} and @dfn{comment ender} characters are used in |
| 246 | various languages to delimit comments. These classes are designated |
| 247 | with @samp{<} and @samp{>}, respectively. |
| 248 | |
| 249 | English text has no comment characters. In Lisp, the semicolon |
| 250 | (@samp{;}) starts a comment and a newline or formfeed ends one. |
| 251 | @end deffn |
| 252 | |
| 253 | @deffn {Syntax class} @w{inherit} |
| 254 | This syntax class does not specify a syntax. It says to look in the |
| 255 | standard syntax table to find the syntax of this character. The |
| 256 | designator for this syntax code is @samp{@@}. |
| 257 | @end deffn |
| 258 | |
| 259 | @node Syntax Flags |
| 260 | @subsection Syntax Flags |
| 261 | @cindex syntax flags |
| 262 | |
| 263 | In addition to the classes, entries for characters in a syntax table |
| 264 | can include flags. There are six possible flags, represented by the |
| 265 | characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b} and |
| 266 | @samp{p}. |
| 267 | |
| 268 | All the flags except @samp{p} are used to describe multi-character |
| 269 | comment delimiters. The digit flags indicate that a character can |
| 270 | @emph{also} be part of a comment sequence, in addition to the syntactic |
| 271 | properties associated with its character class. The flags are |
| 272 | independent of the class and each other for the sake of characters such |
| 273 | as @samp{*} in C mode, which is a punctuation character, @emph{and} the |
| 274 | second character of a start-of-comment sequence (@samp{/*}), @emph{and} |
| 275 | the first character of an end-of-comment sequence (@samp{*/}). |
| 276 | |
| 277 | The flags for a character @var{c} are: |
| 278 | |
| 279 | @itemize @bullet |
| 280 | @item |
| 281 | @samp{1} means @var{c} is the start of a two-character comment-start |
| 282 | sequence. |
| 283 | |
| 284 | @item |
| 285 | @samp{2} means @var{c} is the second character of such a sequence. |
| 286 | |
| 287 | @item |
| 288 | @samp{3} means @var{c} is the start of a two-character comment-end |
| 289 | sequence. |
| 290 | |
| 291 | @item |
| 292 | @samp{4} means @var{c} is the second character of such a sequence. |
| 293 | |
| 294 | @item |
| 295 | @c Emacs 19 feature |
| 296 | @samp{b} means that @var{c} as a comment delimiter belongs to the |
| 297 | alternative ``b'' comment style. |
| 298 | |
| 299 | Emacs supports two comment styles simultaneously in any one syntax |
| 300 | table. This is for the sake of C++. Each style of comment syntax has |
| 301 | its own comment-start sequence and its own comment-end sequence. Each |
| 302 | comment must stick to one style or the other; thus, if it starts with |
| 303 | the comment-start sequence of style ``b'', it must also end with the |
| 304 | comment-end sequence of style ``b''. |
| 305 | |
| 306 | The two comment-start sequences must begin with the same character; only |
| 307 | the second character may differ. Mark the second character of the |
| 308 | ``b''-style comment-start sequence with the @samp{b} flag. |
| 309 | |
| 310 | A comment-end sequence (one or two characters) applies to the ``b'' |
| 311 | style if its first character has the @samp{b} flag set; otherwise, it |
| 312 | applies to the ``a'' style. |
| 313 | |
| 314 | The appropriate comment syntax settings for C++ are as follows: |
| 315 | |
| 316 | @table @asis |
| 317 | @item @samp{/} |
| 318 | @samp{124b} |
| 319 | @item @samp{*} |
| 320 | @samp{23} |
| 321 | @item newline |
| 322 | @samp{>b} |
| 323 | @end table |
| 324 | |
| 325 | This defines four comment-delimiting sequences: |
| 326 | |
| 327 | @table @asis |
| 328 | @item @samp{/*} |
| 329 | This is a comment-start sequence for ``a'' style because the |
| 330 | second character, @samp{*}, does not have the @samp{b} flag. |
| 331 | |
| 332 | @item @samp{//} |
| 333 | This is a comment-start sequence for ``b'' style because the second |
| 334 | character, @samp{/}, does have the @samp{b} flag. |
| 335 | |
| 336 | @item @samp{*/} |
| 337 | This is a comment-end sequence for ``a'' style because the first |
| 338 | character, @samp{*}, does not have the @samp{b} flag. |
| 339 | |
| 340 | @item newline |
| 341 | This is a comment-end sequence for ``b'' style, because the newline |
| 342 | character has the @samp{b} flag. |
| 343 | @end table |
| 344 | |
| 345 | @item |
| 346 | @c Emacs 19 feature |
| 347 | @samp{p} identifies an additional ``prefix character'' for Lisp syntax. |
| 348 | These characters are treated as whitespace when they appear between |
| 349 | expressions. When they appear within an expression, they are handled |
| 350 | according to their usual syntax codes. |
| 351 | |
| 352 | The function @code{backward-prefix-chars} moves back over these |
| 353 | characters, as well as over characters whose primary syntax class is |
| 354 | prefix (@samp{'}). @xref{Motion and Syntax}. |
| 355 | @end itemize |
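
  The C++ settings listed above can be written as full syntax
descriptors by adding the class and the unused matching-character slot
in front of the flags. The sketch below does that and then asks
@code{parse-partial-sexp} which comment style, if any, contains a given
position. The names @code{my-c++-comment-table} and
@code{my-comment-style-at} are hypothetical, and treating @samp{/} and
@samp{*} as punctuation is an assumption carried over from the C mode
example.

@example
(defun my-c++-comment-table ()
  "Return a new syntax table with the two C++ comment styles."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ ". 124b" table)
    (modify-syntax-entry ?* ". 23" table)
    (modify-syntax-entry ?\n "> b" table)
    table))

(defun my-comment-style-at (pos)
  "Return `a' or `b' if POS is inside a comment of that style, else nil.
Parsing starts from the beginning of the accessible buffer."
  (save-excursion
    (let ((state (parse-partial-sexp (point-min) pos)))
      (cond ((not (nth 4 state)) nil)   ; not inside any comment
            ((nth 7 state) 'b)          ; inside a style ``b'' comment
            (t 'a)))))
@end example

  In a buffer that uses this table and contains the text
@samp{x /* one */ y // two}, the function returns @code{a} for a
position inside the first comment and @code{b} for a position inside
the second.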
| 356 | |
| 357 | @node Syntax Table Functions |
| 358 | @section Syntax Table Functions |
| 359 | |
| 360 | In this section we describe functions for creating, accessing and |
| 361 | altering syntax tables. |
| 362 | |
| 363 | @defun make-syntax-table |
| 364 | This function creates a new syntax table. Character codes 0 through |
| 365 | 31 and 128 through 255 are set up to inherit from the standard syntax |
| 366 | table. The other character codes are set up by copying what the |
| 367 | standard syntax table says about them. |
| 368 | |
| 369 | Most major mode syntax tables are created in this way. |
| 370 | @end defun |
| 371 | |
| 372 | @defun copy-syntax-table &optional table |
| 373 | This function constructs a copy of @var{table} and returns it. If |
| 374 | @var{table} is not supplied (or is @code{nil}), it returns a copy of the |
| 375 | current syntax table. Otherwise, an error is signaled if @var{table} is |
| 376 | not a syntax table. |
| 377 | @end defun |
| 378 | |
| 379 | @deffn Command modify-syntax-entry char syntax-descriptor &optional table |
| 380 | This function sets the syntax entry for @var{char} according to |
| 381 | @var{syntax-descriptor}. The syntax is changed only for @var{table}, |
| 382 | which defaults to the current buffer's syntax table, and not in any |
| 383 | other syntax table. The argument @var{syntax-descriptor} specifies the |
| 384 | desired syntax; this is a string beginning with a class designator |
| 385 | character, and optionally containing a matching character and flags as |
| 386 | well. @xref{Syntax Descriptors}. |
| 387 | |
| 388 | This function always returns @code{nil}. The old syntax information in |
| 389 | the table for this character is discarded. |
| 390 | |
| 391 | An error is signaled if the first character of the syntax descriptor is not |
| 392 | one of the syntax class designator characters (@pxref{Syntax Class Table}). |
| 393 | An error is also signaled if @var{char} is not a character. |
| 394 | |
| 395 | @example |
| 396 | @group |
| 397 | @exdent @r{Examples:} |
| 398 | |
| 399 | ;; @r{Put the space character in class whitespace.} |
| 400 | (modify-syntax-entry ?\ " ") |
| 401 | @result{} nil |
| 402 | @end group |
| 403 | |
| 404 | @group |
| 405 | ;; @r{Make @samp{$} an open parenthesis character,} |
| 406 | ;; @r{with @samp{^} as its matching close.} |
| 407 | (modify-syntax-entry ?$ "(^") |
| 408 | @result{} nil |
| 409 | @end group |
| 410 | |
| 411 | @group |
| 412 | ;; @r{Make @samp{^} a close parenthesis character,} |
| 413 | ;; @r{with @samp{$} as its matching open.} |
| 414 | (modify-syntax-entry ?^ ")$") |
| 415 | @result{} nil |
| 416 | @end group |
| 417 | |
| 418 | @group |
| 419 | ;; @r{Make @samp{/} a punctuation character,} |
| 420 | ;; @r{the first character of a start-comment sequence,} |
| 421 | ;; @r{and the second character of an end-comment sequence.} |
| 422 | ;; @r{This is used in C mode.} |
| 423 | (modify-syntax-entry ?/ ". 14") |
| 424 | @result{} nil |
| 425 | @end group |
| 426 | @end example |
| 427 | @end deffn |
| 428 | |
| 429 | @defun char-syntax character |
| 430 | This function returns the syntax class of @var{character}, represented |
| 431 | by its mnemonic designator character. This @emph{only} returns the |
| 432 | class, not any matching parenthesis or flags. |
| 433 | |
| 434 | An error is signaled if @var{character} is not a character. |
| 435 | |
| 436 | The following examples apply to C mode. The first example shows that |
| 437 | the syntax class of space is whitespace (represented by a space). The |
| 438 | second example shows that the syntax of @samp{/} is punctuation. This |
| 439 | does not show the fact that it is also part of comment-start and -end |
| 440 | sequences. The third example shows that open parenthesis is in the class |
| 441 | of open parentheses. This does not show the fact that it has a matching |
| 442 | character, @samp{)}. |
| 443 | |
| 444 | @example |
| 445 | @group |
| 446 | (char-to-string (char-syntax ?\ )) |
| 447 | @result{} " " |
| 448 | @end group |
| 449 | |
| 450 | @group |
| 451 | (char-to-string (char-syntax ?/)) |
| 452 | @result{} "." |
| 453 | @end group |
| 454 | |
| 455 | @group |
| 456 | (char-to-string (char-syntax ?\()) |
| 457 | @result{} "(" |
| 458 | @end group |
| 459 | @end example |
| 460 | @end defun |
| 461 | |
| 462 | @defun set-syntax-table table |
| 463 | This function makes @var{table} the syntax table for the current buffer. |
| 464 | It returns @var{table}. |
| 465 | @end defun |
| 466 | |
| 467 | @defun syntax-table |
| 468 | This function returns the current syntax table, which is the table for |
| 469 | the current buffer. |
| 470 | @end defun |
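
  A common use of these two functions together is to switch tables
temporarily and then restore the previous one. The following is a
minimal sketch of that pattern; the function name
@code{my-count-words-using-table} is hypothetical.

@example
(defun my-count-words-using-table (table)
  "Count the words in the current buffer as defined by TABLE.
The buffer's own syntax table is restored afterward, even on error."
  (let ((old (syntax-table))
        (count 0))
    (unwind-protect
        (save-excursion
          (set-syntax-table table)
          (goto-char (point-min))
          ;; `forward-word' returns nil once no further word is found.
          (while (forward-word 1)
            (setq count (1+ count))))
      (set-syntax-table old))
    count))
@end example

  With a table in which @samp{_} is a word constituent, the text
@samp{foo_bar baz} counts as two words; with a table in which @samp{_}
is a symbol constituent, it counts as three.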
| 471 | |
| 472 | @node Motion and Syntax |
| 473 | @section Motion and Syntax |
| 474 | |
| 475 | This section describes functions for moving across characters in |
| 476 | certain syntax classes. None of these functions exists in Emacs |
| 477 | version 18 or earlier. |
| 478 | |
| 479 | @defun skip-syntax-forward syntaxes &optional limit |
| 480 | This function moves point forward across characters having syntax classes |
| 481 | mentioned in @var{syntaxes}. It stops when it encounters the end of |
| 482 | the buffer, or position @var{limit} (if specified), or a character it is |
| 483 | not supposed to skip. |
| 484 | @ignore @c may want to change this. |
| 485 | The return value is the distance traveled, which is a nonnegative |
| 486 | integer. |
| 487 | @end ignore |
| 488 | @end defun |
| 489 | |
| 490 | @defun skip-syntax-backward syntaxes &optional limit |
| 491 | This function moves point backward across characters whose syntax |
| 492 | classes are mentioned in @var{syntaxes}. It stops when it encounters |
| 493 | the beginning of the buffer, or position @var{limit} (if specified), or a |
| 494 | character it is not supposed to skip. |
| 495 | @ignore @c may want to change this. |
| 496 | The return value indicates the distance traveled. It is an integer that |
| 497 | is zero or less. |
| 498 | @end ignore |
| 499 | @end defun |
| 500 | |
| 501 | @defun backward-prefix-chars |
| 502 | This function moves point backward over any number of characters with |
| 503 | expression prefix syntax. This includes both characters in the |
| 504 | expression prefix syntax class, and characters with the @samp{p} flag. |
| 505 | @end defun |
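
  As an illustration, the sketch below combines two calls to
@code{skip-syntax-forward} to pick out the next run of word and symbol
constituents after point. The function name
@code{my-symbol-after-whitespace} is hypothetical.

@example
(defun my-symbol-after-whitespace ()
  "Skip whitespace after point, then return the following symbol text.
Point is left at the end of the text that is returned."
  (skip-syntax-forward " ")             ; whitespace class
  (let ((start (point)))
    (skip-syntax-forward "w_")          ; word and symbol constituents
    (buffer-substring start (point))))
@end example

  With point at the start of a buffer containing
@samp{@w{   foo_bar baz}} and the standard syntax table in effect, this
returns @code{"foo_bar"}.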
| 506 | |
| 507 | @node Parsing Expressions |
| 508 | @section Parsing Balanced Expressions |
| 509 | |
| 510 | Here are several functions for parsing and scanning balanced |
| 511 | expressions, also known as @dfn{sexps}, in which parentheses match in |
| 512 | pairs. The syntax table controls the interpretation of characters, so |
| 513 | these functions can be used for Lisp expressions when in Lisp mode and |
| 514 | for C expressions when in C mode. @xref{List Motion}, for convenient |
| 515 | higher-level functions for moving over balanced expressions. |
| 516 | |
| 517 | @defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment |
| 518 | This function parses a sexp in the current buffer starting at |
| 519 | @var{start}, not scanning past @var{limit}. It stops at position |
| 520 | @var{limit} or when certain criteria described below are met, and sets |
| 521 | point to the location where parsing stops. It returns a value |
| 522 | describing the status of the parse at the point where it stops. |
| 523 | |
| 524 | If @var{state} is @code{nil}, @var{start} is assumed to be at the top |
| 525 | level of parenthesis structure, such as the beginning of a function |
| 526 | definition. Alternatively, you might wish to resume parsing in the |
| 527 | middle of the structure. To do this, you must provide a @var{state} |
| 528 | argument that describes the initial status of parsing. |
| 529 | |
| 530 | @cindex parenthesis depth |
| 531 | If the third argument @var{target-depth} is non-@code{nil}, parsing |
| 532 | stops if the depth in parentheses becomes equal to @var{target-depth}. |
| 533 | The depth starts at 0, or at whatever is given in @var{state}. |
| 534 | |
| 535 | If the fourth argument @var{stop-before} is non-@code{nil}, parsing |
| 536 | stops when it comes to any character that starts a sexp. If |
| 537 | @var{stop-comment} is non-@code{nil}, parsing stops when it comes to the |
| 538 | start of a comment. |
| 539 | |
| 540 | @cindex parse state |
| 541 | The fifth argument @var{state} is an eight-element list of the same |
| 542 | form as the value of this function, described below. The return value |
| 543 | of one call may be used to initialize the state of the parse on another |
| 544 | call to @code{parse-partial-sexp}. |
| 545 | |
| 546 | The result is a list of eight elements describing the final state of |
| 547 | the parse: |
| 548 | |
| 549 | @enumerate 0 |
| 550 | @item |
| 551 | The depth in parentheses, counting from 0. |
| 552 | |
| 553 | @item |
| 554 | @cindex innermost containing parentheses |
| 555 | The character position of the start of the innermost parenthetical |
| 556 | grouping containing the stopping point; @code{nil} if none. |
| 557 | |
| 558 | @item |
| 559 | @cindex previous complete subexpression |
| 560 | The character position of the start of the last complete subexpression |
| 561 | terminated; @code{nil} if none. |
| 562 | |
| 563 | @item |
| 564 | @cindex inside string |
| 565 | Non-@code{nil} if inside a string. More precisely, this is the |
| 566 | character that will terminate the string. |
| 567 | |
| 568 | @item |
| 569 | @cindex inside comment |
| 570 | @code{t} if inside a comment (of either style). |
| 571 | |
| 572 | @item |
| 573 | @cindex quote character |
| 574 | @code{t} if point is just after a quote character. |
| 575 | |
| 576 | @item |
| 577 | The minimum parenthesis depth encountered during this scan. |
| 578 | |
| 579 | @item |
| 580 | @code{t} if inside a comment of style ``b''. |
| 581 | @end enumerate |
| 582 | |
| 583 | Elements 0, 3, 4, 5 and 7 are significant in the argument @var{state}. |
| 584 | |
| 585 | @cindex indenting with parentheses |
| 586 | This function is most often used to compute indentation for languages |
| 587 | that have nested parentheses. |
| 588 | @end defun |
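
  As a minimal sketch of how the returned state might be used, the
following hypothetical function, @code{my-syntax-context}, classifies a
buffer position by parsing from the beginning of the accessible portion
of the buffer. It assumes that the beginning of the buffer is at top
level, which is why it passes no @var{state} argument.

@example
(defun my-syntax-context (pos)
  "Describe the syntactic context of POS in the current buffer.
Return `string' or `comment' if POS is inside one; otherwise return
the parenthesis depth at POS as an integer."
  (save-excursion                       ; parsing moves point
    (let ((state (parse-partial-sexp (point-min) pos)))
      (cond ((nth 3 state) 'string)     ; element 3: inside a string
            ((nth 4 state) 'comment)    ; element 4: inside a comment
            (t (car state))))))         ; element 0: depth
@end example

  Parsing from the beginning each time is simple but can be slow in a
large buffer; a program that classifies many positions may prefer to
keep the state from one call and pass it as @var{state} to the next.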
| 589 | |
| 590 | @defun scan-lists from count depth |
| 591 | This function scans forward @var{count} balanced parenthetical groupings |
| 592 | from character position @var{from}. It returns the character position |
| 593 | where the scan stops. |
| 594 | |
| 595 | If @var{depth} is nonzero, parenthesis depth counting begins from that |
| 596 | value. The only candidates for stopping are places where the depth in |
| 597 | parentheses becomes zero; @code{scan-lists} counts @var{count} such |
| 598 | places and then stops. Thus, a positive value for @var{depth} means go |
| 599 | out @var{depth} levels of parenthesis. |
| 600 | |
| 601 | Scanning ignores comments if @code{parse-sexp-ignore-comments} is |
| 602 | non-@code{nil}. |
| 603 | |
| 604 | If the scan reaches the beginning or end of the buffer (or its |
| 605 | accessible portion), and the depth is not zero, an error is signaled. |
| 606 | If the depth is zero but the count is not used up, @code{nil} is |
| 607 | returned. |
| 608 | @end defun |
| 609 | |
| 610 | @defun scan-sexps from count |
| 611 | This function scans forward @var{count} sexps from character position |
| 612 | @var{from}. It returns the character position where the scan stops. |
| 613 | |
| 614 | Scanning ignores comments if @code{parse-sexp-ignore-comments} is |
| 615 | non-@code{nil}. |
| 616 | |
| 617 | If the scan reaches the beginning or end of (the accessible part of) the |
| 618 | buffer in the middle of a parenthetical grouping, an error is signaled. |
| 619 | If it reaches the beginning or end between groupings but before @var{count} is |
| 620 | used up, @code{nil} is returned. |
| 621 | @end defun |
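
  The difference between the two scanning functions, and the effect of
@var{depth}, can be seen on a small buffer. This is only a sketch: the
function name @code{my-scan-demo} is hypothetical, and it assumes an
Emacs that provides @code{with-temp-buffer} and the standard syntax
table, in which @samp{(} and @samp{)} are parenthesis characters.

@example
(defun my-scan-demo ()
  "Return positions found by several scans of a sample buffer."
  (with-temp-buffer
    (insert "(a (b c)) d")
    (list (scan-sexps 1 1)       ; over the whole outer list
          (scan-sexps 2 1)       ; over the symbol `a'
          (scan-lists 2 1 0)     ; over the next list, (b c)
          (scan-lists 2 1 1))))  ; up and out of the enclosing list

(my-scan-demo)
     @result{} (10 3 9 10)
@end example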
| 622 | |
| 623 | @defvar parse-sexp-ignore-comments |
| 624 | @cindex skipping comments |
| 625 | If the value is non-@code{nil}, then comments are treated as |
| 626 | whitespace by the functions in this section and by @code{forward-sexp}. |
| 627 | |
| 628 | In older Emacs versions, this feature worked only when the comment |
| 629 | terminator was something like @samp{*/}, and appeared only to end a |
| 630 | comment. In languages where newlines terminate comments, it was |
| 631 | necessary to make this variable @code{nil}, since not every newline is the |
| 632 | end of a comment. This limitation no longer exists. |
| 633 | @end defvar |
| 634 | |
| 635 | You can use @code{forward-comment} to move forward or backward over |
| 636 | one comment or several comments. |
| 637 | |
| 638 | @defun forward-comment count |
| 639 | This function moves point forward across @var{count} comments (backward, |
| 640 | if @var{count} is negative). If it finds anything other than a comment |
| 641 | or whitespace, it stops, leaving point at the place where it stopped. |
| 642 | It also stops after satisfying @var{count}. |
| 643 | @end defun |
| 644 | |
| 645 | To move forward over all comments and whitespace following point, use |
| 646 | @code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a good |
| 647 | argument to use, because the number of comments in the buffer cannot |
| 648 | exceed that many. |
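
  Wrapped in a function, that idiom might look like the following
sketch; the name @code{my-skip-comments-and-whitespace} is
hypothetical.

@example
(defun my-skip-comments-and-whitespace ()
  "Move point forward over all comments and whitespace after point.
Return the new value of point."
  (forward-comment (buffer-size))
  (point))
@end example

  In a buffer using the Emacs Lisp mode syntax table, calling this at
the start of text that consists of two comment lines, a blank line, and
then @samp{(foo)} leaves point just before the open parenthesis.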
| 649 | |
| 650 | @node Standard Syntax Tables |
| 651 | @section Some Standard Syntax Tables |
| 652 | |
| 653 | Most of the major modes in Emacs have their own syntax tables. Here |
| 654 | are several of them: |
| 655 | |
| 656 | @defun standard-syntax-table |
| 657 | This function returns the standard syntax table, which is the syntax |
| 658 | table used in Fundamental mode. |
| 659 | @end defun |
| 660 | |
| 661 | @defvar text-mode-syntax-table |
| 662 | The value of this variable is the syntax table used in Text mode. |
| 663 | @end defvar |
| 664 | |
| 665 | @defvar c-mode-syntax-table |
| 666 | The value of this variable is the syntax table for C-mode buffers. |
| 667 | @end defvar |
| 668 | |
| 669 | @defvar emacs-lisp-mode-syntax-table |
| 670 | The value of this variable is the syntax table used in Emacs Lisp mode |
| 671 | by editing commands. (It has no effect on the Lisp @code{read} |
| 672 | function.) |
| 673 | @end defvar |
| 674 | |
| 675 | @node Syntax Table Internals |
| 676 | @section Syntax Table Internals |
| 677 | @cindex syntax table internals |
| 678 | |
| 679 | Each element of a syntax table is an integer that encodes the syntax |
| 680 | of one character: the syntax class, possible matching character, and |
| 681 | flags. Lisp programs don't usually work with the elements directly; the |
| 682 | Lisp-level syntax table functions usually work with syntax descriptors |
| 683 | (@pxref{Syntax Descriptors}). |
| 684 | |
| 685 | The low 8 bits of each element of a syntax table indicate the |
| 686 | syntax class. |
| 687 | |
| 688 | @table @asis |
| 689 | @item @i{Integer} |
| 690 | @i{Class} |
| 691 | @item 0 |
| 692 | whitespace |
| 693 | @item 1 |
| 694 | punctuation |
| 695 | @item 2 |
| 696 | word |
| 697 | @item 3 |
| 698 | symbol |
| 699 | @item 4 |
| 700 | open parenthesis |
| 701 | @item 5 |
| 702 | close parenthesis |
| 703 | @item 6 |
| 704 | expression prefix |
| 705 | @item 7 |
| 706 | string quote |
| 707 | @item 8 |
| 708 | paired delimiter |
| 709 | @item 9 |
| 710 | escape |
| 711 | @item 10 |
| 712 | character quote |
| 713 | @item 11 |
| 714 | comment-start |
| 715 | @item 12 |
| 716 | comment-end |
| 717 | @item 13 |
| 718 | inherit |
| 719 | @end table |
| 720 | |
| 721 | The next 8 bits are the matching opposite parenthesis (if the |
| 722 | character has parenthesis syntax); otherwise, they are not meaningful. |
| 723 | The next 6 bits are the flags. |
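
  The layout just described can be unpacked with ordinary integer
arithmetic. The following is a sketch only: the function name
@code{my-decode-syntax-entry} is hypothetical, it operates on an integer
of the form described here rather than on a syntax table, and it returns
the flags as a single 6-bit number because the assignment of individual
flags to bits is not specified in this section.

@example
(defun my-decode-syntax-entry (entry)
  "Split the integer ENTRY into a list (CLASS MATCH FLAGS).
CLASS is the low 8 bits, MATCH the next 8 bits, and FLAGS the
6 bits after that."
  (list (logand entry 255)
        (logand (ash entry -8) 255)
        (logand (ash entry -16) 63)))

;; Open parenthesis class (4) with `)' as the matching character.
(my-decode-syntax-entry (+ 4 (ash ?\) 8)))
     @result{} (4 41 0)
@end example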