@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
@c   Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Syntax Tables, Abbrevs, Searching and Matching, Top
@chapter Syntax Tables
@cindex parsing buffer text
@cindex syntax table
@cindex text parsing

  A @dfn{syntax table} specifies the syntactic role of each character
in a buffer.  It can be used to determine where words, symbols, and
other syntactic constructs begin and end.  This information is used by
many Emacs facilities, including Font Lock mode (@pxref{Font Lock
Mode}) and the various complex movement commands (@pxref{Motion}).

@menu
* Basics: Syntax Basics.     Basic concepts of syntax tables.
* Syntax Descriptors::       How characters are classified.
* Syntax Table Functions::   How to create, examine and alter syntax tables.
* Syntax Properties::        Overriding syntax with text properties.
* Motion and Syntax::        Moving over characters with certain syntaxes.
* Parsing Expressions::      Parsing balanced expressions
                               using the syntax table.
* Standard Syntax Tables::   Syntax tables used by various major modes.
* Syntax Table Internals::   How syntax table information is stored.
* Categories::               Another way of classifying character syntax.
@end menu

@node Syntax Basics
@section Syntax Table Concepts

  A syntax table is a char-table (@pxref{Char-Tables}).  The element at
index @var{c} describes the character with code @var{c}.  The element's
value should be a list that encodes the syntax of the character in
question.

  Syntax tables are used only for moving across text, not for the Emacs
Lisp reader.  Emacs Lisp uses built-in syntactic rules when reading Lisp
expressions, and these rules cannot be changed.  (Some Lisp systems
provide ways to redefine the read syntax, but we decided to leave this
feature out of Emacs Lisp for simplicity.)

  Each buffer has its own major mode, and each major mode has its own
idea of the syntactic class of various characters.  For example, in
Lisp mode, the character @samp{;} begins a comment, but in C mode, it
terminates a statement.  To support these variations, Emacs makes the
syntax table local to each buffer.  Typically, each major mode has its
own syntax table and installs that table in each buffer that uses that
mode.  Changing this table alters the syntax in all those buffers as
well as in any buffers subsequently put in that mode.  Occasionally
several similar modes share one syntax table.  @xref{Example Major
Modes}, for an example of how to set up a syntax table.

A syntax table can inherit the data for some characters from the
standard syntax table, while specifying other characters itself.  The
``inherit'' syntax class means ``inherit this character's syntax from
the standard syntax table''.  Just changing the standard syntax for a
character affects all syntax tables that inherit from it.

@defun syntax-table-p object
This function returns @code{t} if @var{object} is a syntax table.
@end defun
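
  For instance, the predicate can be tried on a few objects.  This is
only a small sketch; it assumes nothing beyond functions described in
this chapter:

@example
@group
;; The current buffer's table and a freshly made table both qualify.
(syntax-table-p (syntax-table))
     @result{} t
(syntax-table-p (make-syntax-table))
     @result{} t
;; A syntax descriptor string is not itself a syntax table.
(syntax-table-p "w")
     @result{} nil
@end group
@end example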

@node Syntax Descriptors
@section Syntax Descriptors
@cindex syntax class

  The syntactic role of a character is called its @dfn{syntax class}.
Each syntax table specifies the syntax class of each character.  There
is no necessary relationship between the class of a character in one
syntax table and its class in any other table.

  Each syntax class is designated by a mnemonic character, which
serves as the name of the class when you need to specify a class.
Usually, this designator character is one that is often assigned that
class; however, its meaning as a designator is unvarying and
independent of what syntax that character currently has.  Thus,
@samp{\} as a designator character always means ``escape character''
syntax, regardless of whether the @samp{\} character actually has that
syntax in the current syntax table.
@ifnottex
@xref{Syntax Class Table}, for a list of syntax classes.
@end ifnottex

@cindex syntax descriptor
  A @dfn{syntax descriptor} is a Lisp string that describes the syntax
class and other syntactic properties of a character.  To modify the
syntax of a character, call the function @code{modify-syntax-entry}
and pass a syntax descriptor as one of its arguments (@pxref{Syntax
Table Functions}).

  The first character in a syntax descriptor designates the syntax
class.  The second character specifies a matching character (e.g.@: in
Lisp, the matching character for @samp{(} is @samp{)}); if there is no
matching character, put a space there.  Then come the characters for
any desired flags.

  If no matching character or flags are needed, only one character
(specifying the syntax class) is sufficient.

  For example, the syntax descriptor for the character @samp{*} in C
mode is @samp{@w{. 23}} (i.e., punctuation, matching character slot
unused, second character of a comment-starter, first character of a
comment-ender), and the entry for @samp{/} is @samp{@w{. 14}} (i.e.,
punctuation, matching character slot unused, first character of a
comment-starter, second character of a comment-ender).
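
  The two C mode descriptors can be tried out on a scratch syntax
table.  This is only a sketch: the variable name @code{table} is
arbitrary, and the table is a fresh one rather than C mode's own.

@example
@group
(let ((table (make-syntax-table)))
  ;; "." is the class, the space fills the unused matching
  ;; character slot, and the digits are comment flags.
  (modify-syntax-entry ?* ". 23" table)
  (modify-syntax-entry ?/ ". 14" table)
  (with-syntax-table table
    ;; char-syntax reports only the class designator.
    (list (string (char-syntax ?*))
          (string (char-syntax ?/)))))
     @result{} ("." ".")
@end group
@end example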

@menu
* Syntax Class Table::      Table of syntax classes.
* Syntax Flags::            Additional flags each character can have.
@end menu

@node Syntax Class Table
@subsection Table of Syntax Classes

  Here is a table of syntax classes, the characters that designate
them, their meanings, and examples of their use.

@table @asis
@item Whitespace characters: @samp{@ } or @samp{-}
Characters that separate symbols and words from each other.
Typically, whitespace characters have no other syntactic significance,
and multiple whitespace characters are syntactically equivalent to a
single one.  Space, tab, and formfeed are classified as whitespace in
almost all major modes.

This syntax class can be designated by either @w{@samp{@ }} or
@samp{-}.  Both designators are equivalent.

@item Word constituents: @samp{w}
Parts of words in human languages.  These are typically used in
variable and command names in programs.  All upper- and lower-case
letters, and the digits, are typically word constituents.

@item Symbol constituents: @samp{_}
Extra characters used in variable and command names along with word
constituents.  Examples include the characters @samp{$&*+-_<>} in Lisp
mode, which may be part of a symbol name even though they are not part
of English words.  In standard C, the only non-word-constituent
character that is valid in symbols is underscore (@samp{_}).

@item Punctuation characters: @samp{.}
Characters used as punctuation in a human language, or used in a
programming language to separate symbols from one another.  Some
programming language modes, such as Emacs Lisp mode, have no
characters in this class since the few characters that are not symbol
or word constituents all have other uses.  Other programming language
modes, such as C mode, use punctuation syntax for operators.

@item Open parenthesis characters: @samp{(}
@itemx Close parenthesis characters: @samp{)}
Characters used in dissimilar pairs to surround sentences or
expressions.  Such a grouping is begun with an open parenthesis
character and terminated with a close.  Each open parenthesis
character matches a particular close parenthesis character, and vice
versa.  Normally, Emacs indicates momentarily the matching open
parenthesis when you insert a close parenthesis.  @xref{Blinking}.

In human languages, and in C code, the parenthesis pairs are
@samp{()}, @samp{[]}, and @samp{@{@}}.  In Emacs Lisp, the delimiters
for lists and vectors (@samp{()} and @samp{[]}) are classified as
parenthesis characters.

@item String quotes: @samp{"}
Characters used to delimit string constants.  The same string quote
character appears at the beginning and the end of a string.  Such
quoted strings do not nest.

The parsing facilities of Emacs consider a string as a single token.
The usual syntactic meanings of the characters in the string are
suppressed.

The Lisp modes have two string quote characters: double-quote (@samp{"})
and vertical bar (@samp{|}).  @samp{|} is not used in Emacs Lisp, but it
is used in Common Lisp.  C also has two string quote characters:
double-quote for strings, and single-quote (@samp{'}) for character
constants.

Human text has no string quote characters.  We do not want quotation
marks to turn off the usual syntactic properties of other characters
in the quotation.

@item Escape-syntax characters: @samp{\}
Characters that start an escape sequence, such as is used in string
and character constants.  The character @samp{\} belongs to this class
in both C and Lisp.  (In C, it is used thus only inside strings, but
it turns out to cause no trouble to treat it this way throughout C
code.)

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}.  @xref{Word Motion}.

@item Character quotes: @samp{/}
Characters used to quote the following character so that it loses its
normal syntactic meaning.  This differs from an escape character in
that only the character immediately following is ever affected.

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}.  @xref{Word Motion}.

This class is used for backslash in @TeX{} mode.

@item Paired delimiters: @samp{$}
Similar to string quote characters, except that the syntactic
properties of the characters between the delimiters are not
suppressed.  Only @TeX{} mode uses a paired delimiter presently---the
@samp{$} that both enters and leaves math mode.

@item Expression prefixes: @samp{'}
Characters used for syntactic operators that are considered as part of
an expression if they appear next to one.  In Lisp modes, these
characters include the apostrophe, @samp{'} (used for quoting), the
comma, @samp{,} (used in macros), and @samp{#} (used in the read
syntax for certain data types).

@item Comment starters: @samp{<}
@itemx Comment enders: @samp{>}
@cindex comment syntax
Characters used in various languages to delimit comments.  Human text
has no comment characters.  In Lisp, the semicolon (@samp{;}) starts a
comment and a newline or formfeed ends one.

@item Inherit standard syntax: @samp{@@}
This syntax class does not specify a particular syntax.  It says to
look in the standard syntax table to find the syntax of this
character.

@item Generic comment delimiters: @samp{!}
Characters that start or end a special kind of comment.  @emph{Any}
generic comment delimiter matches @emph{any} generic comment
delimiter, but they cannot match a comment starter or comment ender;
generic comment delimiters can only match each other.

This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}).  You
can mark any range of characters as forming a comment, by giving the
first and last characters of the range @code{syntax-table} properties
identifying them as generic comment delimiters.

@item Generic string delimiters: @samp{|}
Characters that start or end a string.  This class differs from the
string quote class in that @emph{any} generic string delimiter can
match any other generic string delimiter; but they do not match
ordinary string quote characters.

This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}).  You
can mark any range of characters as forming a string constant, by
giving the first and last characters of the range @code{syntax-table}
properties identifying them as generic string delimiters.
@end table

@node Syntax Flags
@subsection Syntax Flags
@cindex syntax flags

  In addition to the classes, entries for characters in a syntax table
can specify flags.  There are eight possible flags, represented by the
characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
@samp{n}, and @samp{p}.

  All the flags except @samp{p} are used to describe comment
delimiters.  The digit flags are used for comment delimiters made up
of two characters.  They indicate that a character can @emph{also} be
part of a comment sequence, in addition to the syntactic properties
associated with its character class.  The flags are independent of the
class and each other for the sake of characters such as @samp{*} in
C mode, which is a punctuation character, @emph{and} the second
character of a start-of-comment sequence (@samp{/*}), @emph{and} the
first character of an end-of-comment sequence (@samp{*/}).  The flags
@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
comment delimiter.

  Here is a table of the possible flags for a character @var{c},
and what they mean:

@itemize @bullet
@item
@samp{1} means @var{c} is the start of a two-character comment-start
sequence.

@item
@samp{2} means @var{c} is the second character of such a sequence.

@item
@samp{3} means @var{c} is the start of a two-character comment-end
sequence.

@item
@samp{4} means @var{c} is the second character of such a sequence.

@item
@samp{b} means that @var{c} as a comment delimiter belongs to the
alternative ``b'' comment style.  For a two-character comment starter,
this flag is only significant on the second character, and for a
two-character comment ender it is only significant on the first
character.

@item
@samp{c} means that @var{c} as a comment delimiter belongs to the
alternative ``c'' comment style.  For a two-character comment
delimiter, @samp{c} on either character makes it of style ``c''.

@item
@samp{n} on a comment delimiter character specifies
that this kind of comment can be nested.  For a two-character
comment delimiter, @samp{n} on either character makes it
nestable.

Emacs supports several comment styles simultaneously in any one syntax
table.  A comment style is a set of flags @samp{b}, @samp{c}, and
@samp{n}, so there can be up to 8 different comment styles.  A
delimiter with none of these flags belongs to the default style,
called ``a''.  Each comment delimiter has a style and only matches
comment delimiters of the same style.  Thus if a comment starts with
the comment-start sequence of style ``bn'', it will extend until the
next matching comment-end sequence of style ``bn''.

The appropriate comment syntax settings for C++ can be as follows:

@table @asis
@item @samp{/}
@samp{124}
@item @samp{*}
@samp{23b}
@item newline
@samp{>}
@end table

This defines four comment-delimiting sequences:

@table @asis
@item @samp{/*}
This is a comment-start sequence for ``b'' style because the
second character, @samp{*}, has the @samp{b} flag.

@item @samp{//}
This is a comment-start sequence for ``a'' style because the second
character, @samp{/}, does not have the @samp{b} flag.

@item @samp{*/}
This is a comment-end sequence for ``b'' style because the first
character, @samp{*}, has the @samp{b} flag.

@item newline
This is a comment-end sequence for ``a'' style, because the newline
character does not have the @samp{b} flag.
@end table

@item
@c Emacs 19 feature
@samp{p} identifies an additional ``prefix character'' for Lisp syntax.
These characters are treated as whitespace when they appear between
expressions.  When they appear within an expression, they are handled
according to their usual syntax classes.

The function @code{backward-prefix-chars} moves back over these
characters, as well as over characters whose primary syntax class is
prefix (@samp{'}).  @xref{Motion and Syntax}.
@end itemize
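
  The C++ comment settings listed above can be exercised in a
temporary buffer.  The following is a sketch under stated assumptions:
the table is built from scratch, punctuation (@samp{.}) is chosen as
the class for @samp{/} and @samp{*}, and the buffer text is made up
for the illustration.

@example
@group
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?/ ". 124" table)
  (modify-syntax-entry ?* ". 23b" table)
  (modify-syntax-entry ?\n ">" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "// line comment\n/* block */ x")
    (goto-char (point-min))
    ;; Skip both comment styles and the whitespace after them.
    (forward-comment (buffer-size))
    (string (char-after))))
     @result{} "x"
@end group
@end example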

@node Syntax Table Functions
@section Syntax Table Functions

  In this section we describe functions for creating, accessing and
altering syntax tables.

@defun make-syntax-table &optional table
This function creates a new syntax table, with all values initialized
to @code{nil}.  If @var{table} is non-@code{nil}, it becomes the
parent of the new syntax table, otherwise the standard syntax table is
the parent.  Like all char-tables, a syntax table inherits from its
parent.  Thus the original syntax of all characters in the returned
syntax table is determined by the parent.  @xref{Char-Tables}.

Most major mode syntax tables are created in this way.
@end defun
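
  The following sketch illustrates inheritance from a parent table.
The names @code{parent} and @code{child} are arbitrary; the point is
that a change made to the parent shows through a child that has no
entry of its own for that character.

@example
@group
(let* ((parent (make-syntax-table))
       (child (make-syntax-table parent)))
  ;; Change the parent only.
  (modify-syntax-entry ?_ "w" parent)
  ;; The child has no entry for `_', so it consults its parent.
  (with-syntax-table child
    (string (char-syntax ?_))))
     @result{} "w"
@end group
@end example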

@defun copy-syntax-table &optional table
This function constructs a copy of @var{table} and returns it.  If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
standard syntax table.  Otherwise, an error is signaled if @var{table} is
not a syntax table.
@end defun

@deffn Command modify-syntax-entry char syntax-descriptor &optional table
This function sets the syntax entry for @var{char} according to
@var{syntax-descriptor}.  @var{char} must be a character, or a cons
cell of the form @code{(@var{min} . @var{max})}; in the latter case,
the function sets the syntax entries for all characters in the range
between @var{min} and @var{max}, inclusive.

The syntax is changed only for @var{table}, which defaults to the
current buffer's syntax table, and not in any other syntax table.

The argument @var{syntax-descriptor} is a syntax descriptor for the
desired syntax (i.e.@: a string beginning with a class designator
character, and optionally containing a matching character and syntax
flags).  An error is signaled if the first character is not one of the
seventeen syntax class designators.  @xref{Syntax Descriptors}.

This function always returns @code{nil}.  The old syntax information in
the table for this character is discarded.

@example
@group
@exdent @r{Examples:}

;; @r{Put the space character in class whitespace.}
(modify-syntax-entry ?\s " ")
     @result{} nil
@end group

@group
;; @r{Make @samp{$} an open parenthesis character,}
;;   @r{with @samp{^} as its matching close.}
(modify-syntax-entry ?$ "(^")
     @result{} nil
@end group

@group
;; @r{Make @samp{^} a close parenthesis character,}
;;   @r{with @samp{$} as its matching open.}
(modify-syntax-entry ?^ ")$")
     @result{} nil
@end group

@group
;; @r{Make @samp{/} a punctuation character,}
;;   @r{the first character of a start-comment sequence,}
;;   @r{and the second character of an end-comment sequence.}
;;   @r{This is used in C mode.}
(modify-syntax-entry ?/ ". 14")
     @result{} nil
@end group
@end example
@end deffn

@defun char-syntax character
This function returns the syntax class of @var{character}, represented
by its mnemonic designator character.  This returns @emph{only} the
class, not any matching parenthesis or flags.

An error is signaled if @var{character} is not a character.

The following examples apply to C mode.  The first example shows that
the syntax class of space is whitespace (represented by a space).  The
second example shows that the syntax of @samp{/} is punctuation.  This
does not show the fact that it is also part of comment-start and -end
sequences.  The third example shows that open parenthesis is in the class
of open parentheses.  This does not show the fact that it has a matching
character, @samp{)}.

@example
@group
(string (char-syntax ?\s))
     @result{} " "
@end group

@group
(string (char-syntax ?/))
     @result{} "."
@end group

@group
(string (char-syntax ?\())
     @result{} "("
@end group
@end example

We use @code{string} to make it easier to see the character returned by
@code{char-syntax}.
@end defun

@defun set-syntax-table table
This function makes @var{table} the syntax table for the current buffer.
It returns @var{table}.
@end defun

@defun syntax-table
This function returns the current syntax table, which is the table for
the current buffer.
@end defun

@defmac with-syntax-table table body@dots{}
This macro executes @var{body} using @var{table} as the current syntax
table.  It returns the value of the last form in @var{body}, after
restoring the old current syntax table.

Since each buffer has its own current syntax table, we should make that
more precise: @code{with-syntax-table} temporarily alters the current
syntax table of whichever buffer is current at the time the macro
execution starts.  Other buffers are not affected.
@end defmac
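
  As a sketch of how @code{with-syntax-table} scopes a change, the
following compares a scan made under a temporary table with the same
scan made under the buffer's own table.  The buffer text and the
variable name @code{table} are made up for the illustration, and the
buffer is assumed to start out with the standard syntax table.

@example
@group
(with-temp-buffer
  (insert "foo-bar baz")
  (goto-char (point-min))
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- "w" table)
    (list
     ;; Under `table', `-' is a word constituent.
     (with-syntax-table table
       (skip-syntax-forward "w"))
     ;; Afterwards the buffer's own table is back in effect.
     (progn (goto-char (point-min))
            (skip-syntax-forward "w")))))
     @result{} (7 3)
@end group
@end example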

@node Syntax Properties
@section Syntax Properties
@kindex syntax-table @r{(text property)}

When the syntax table is not flexible enough to specify the syntax of
a language, you can override the syntax table for specific character
occurrences in the buffer, by applying a @code{syntax-table} text
property.  @xref{Text Properties}, for how to apply text properties.

  The valid values of the @code{syntax-table} text property are:

@table @asis
@item @var{syntax-table}
If the property value is a syntax table, that table is used instead of
the current buffer's syntax table to determine the syntax for the
underlying text character.

@item @code{(@var{syntax-code} . @var{matching-char})}
A cons cell of this format specifies the syntax for the underlying
text character (@pxref{Syntax Table Internals}).

@item @code{nil}
If the property is @code{nil}, the character's syntax is determined from
the current syntax table in the usual way.
@end table

@defvar parse-sexp-lookup-properties
If this is non-@code{nil}, the syntax scanning functions, like
@code{forward-sexp}, pay attention to syntax text properties.
Otherwise they use only the current syntax table.
@end defvar
528@defvar syntax-propertize-function
529This variable, if non-@code{nil}, should store a function for applying
530@code{syntax-table} properties to a specified stretch of text. It is
531intended to be used by major modes to install a function which applies
532@code{syntax-table} properties in some mode-appropriate way.
533
534The function is called by @code{syntax-ppss} (@pxref{Position Parse}),
535and by Font Lock mode during syntactic fontification (@pxref{Syntactic
536Font Lock}). It is called with two arguments, @var{start} and
537@var{end}, which are the starting and ending positions of the text on
538which it should act. It is allowed to call @code{syntax-ppss} on any
539position before @var{end}. However, it should not call
540@code{syntax-ppss-flush-cache}; so, it is not allowed to call
541@code{syntax-ppss} on some position and later modify the buffer at an
542earlier position.
543@end defvar
544
545@defvar syntax-propertize-extend-region-functions
546This abnormal hook is run by the syntax parsing code prior to calling
547@code{syntax-propertize-function}. Its role is to help locate safe
548starting and ending buffer positions for passing to
549@code{syntax-propertize-function}. For example, a major mode can add
550a function to this hook to identify multi-line syntactic constructs,
551and ensure that the boundaries do not fall in the middle of one.
552
553Each function in this hook should accept two arguments, @var{start}
554and @var{end}. It should return either a cons cell of two adjusted
555buffer positions, @code{(@var{new-start} . @var{new-end})}, or
556@code{nil} if no adjustment is necessary. The hook functions are run
557in turn, repeatedly, until they all return @code{nil}.
b8d4c8d0
GM
558@end defvar
559
@node Motion and Syntax
@section Motion and Syntax

  This section describes functions for moving across characters that
have certain syntax classes.

@defun skip-syntax-forward syntaxes &optional limit
This function moves point forward across characters having syntax
classes mentioned in @var{syntaxes} (a string of syntax class
characters).  It stops when it encounters the end of the buffer, or
position @var{limit} (if specified), or a character it is not supposed
to skip.

If @var{syntaxes} starts with @samp{^}, then the function skips
characters whose syntax is @emph{not} in @var{syntaxes}.

The return value is the distance traveled, which is a nonnegative
integer.
@end defun

@defun skip-syntax-backward syntaxes &optional limit
This function moves point backward across characters whose syntax
classes are mentioned in @var{syntaxes}.  It stops when it encounters
the beginning of the buffer, or position @var{limit} (if specified), or
a character it is not supposed to skip.

If @var{syntaxes} starts with @samp{^}, then the function skips
characters whose syntax is @emph{not} in @var{syntaxes}.

The return value indicates the distance traveled.  It is an integer that
is zero or less.
@end defun
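
  The return values of the two skipping functions can be compared in
a short sketch.  It assumes a temporary buffer with the standard
syntax table, in which @samp{_} is a symbol constituent; the buffer
text is invented for the illustration.

@example
@group
(with-temp-buffer
  (insert "foo_bar  baz")
  (goto-char (point-min))
  (list (skip-syntax-forward "w_")   ; over "foo_bar"
        (skip-syntax-forward " ")    ; over the two spaces
        (skip-syntax-backward " ")   ; and back over them
        (point)))
     @result{} (7 2 -2 8)
@end group
@end example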

@defun backward-prefix-chars
This function moves point backward over any number of characters with
expression prefix syntax.  This includes both characters in the
expression prefix syntax class, and characters with the @samp{p} flag.
@end defun

@node Parsing Expressions
@section Parsing Expressions

  This section describes functions for parsing and scanning balanced
expressions.  We will refer to such expressions as @dfn{sexps},
following the terminology of Lisp, even though these functions can act
on languages other than Lisp.  Basically, a sexp is either a balanced
parenthetical grouping, a string, or a ``symbol'' (i.e.@: a sequence
of characters whose syntax is either word constituent or symbol
constituent).  However, characters in the expression prefix syntax
class (@pxref{Syntax Class Table}) are treated as part of the sexp if
they appear next to it.

  The syntax table controls the interpretation of characters, so these
functions can be used for Lisp expressions when in Lisp mode and for C
expressions when in C mode.  @xref{List Motion}, for convenient
higher-level functions for moving over balanced expressions.

  A character's syntax controls how it changes the state of the
parser, rather than describing the state itself.  For example, a
string delimiter character toggles the parser state between
``in-string'' and ``in-code'', but the syntax of characters does not
directly say whether they are inside a string.  For example (note that
15 is the syntax code for generic string delimiters),

@example
(put-text-property 1 9 'syntax-table '(15 . nil))
@end example

@noindent
does not tell Emacs that the first eight chars of the current buffer
are a string, but rather that they are all string delimiters.  As a
result, Emacs treats them as four consecutive empty string constants.

@menu
* Motion via Parsing::       Motion functions that work by parsing.
* Position Parse::           Determining the syntactic state of a position.
* Parser State::             How Emacs represents a syntactic state.
* Low-Level Parsing::        Parsing across a specified region.
* Control Parsing::          Parameters that affect parsing.
@end menu

@node Motion via Parsing
@subsection Motion Commands Based on Parsing

  This section describes simple point-motion functions that operate
based on parsing expressions.

@defun scan-lists from count depth
This function scans forward @var{count} balanced parenthetical
groupings from position @var{from}.  It returns the position where the
scan stops.  If @var{count} is negative, the scan moves backwards.

If @var{depth} is nonzero, treat the starting position as being
@var{depth} parentheses deep.  The scanner moves forward or backward
through the buffer until the depth changes to zero @var{count} times.
Hence, a positive value for @var{depth} has the effect of moving out
@var{depth} levels of parenthesis from the starting position, while a
negative @var{depth} has the effect of moving deeper by @var{-depth}
levels of parenthesis.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of the accessible part of the
buffer before it has scanned over @var{count} parenthetical groupings,
the return value is @code{nil} if the depth at that point is zero; if
the depth is non-zero, a @code{scan-error} error is signaled.
@end defun
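
  The role of @var{depth} may be easier to see in a sketch.  It
assumes a temporary buffer with the standard syntax table and some
invented text in which the first grouping occupies positions 1 to 9
and the nested grouping starts at position 4.

@example
@group
(with-temp-buffer
  (insert "(a (b c)) (d)")
  (list (scan-lists 1 1 0)     ; over the first grouping
        (scan-lists 1 2 0)     ; over both groupings
        (scan-lists 5 1 1)     ; out of the nested grouping
        (scan-lists 1 1 -1)))  ; into the first grouping
     @result{} (10 14 9 2)
@end group
@end example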

@defun scan-sexps from count
This function scans forward @var{count} sexps from position @var{from}.
It returns the position where the scan stops.  If @var{count} is
negative, the scan moves backwards.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of (the accessible part of) the
buffer while in the middle of a parenthetical grouping, an error is
signaled.  If it reaches the beginning or end between groupings but
before @var{count} is used up, @code{nil} is returned.
@end defun
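
  For comparison with @code{scan-lists}, here is a sketch of
@code{scan-sexps} on invented text, again assuming a temporary buffer
with the standard syntax table.  A symbol counts as one sexp, and so
does a whole parenthetical grouping; asking for a third sexp runs off
the end of the buffer between groupings.

@example
@group
(with-temp-buffer
  (insert "foo (bar baz)")
  (list (scan-sexps 1 1)     ; over "foo"
        (scan-sexps 1 2)     ; over "foo" and the grouping
        (scan-sexps 1 3)))   ; nothing left to scan
     @result{} (4 14 nil)
@end group
@end example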
682
683@defun forward-comment count
684This function moves point forward across @var{count} complete comments
685 (that is, including the starting delimiter and the terminating
686delimiter if any), plus any whitespace encountered on the way. It
687moves backward if @var{count} is negative. If it encounters anything
688other than a comment or whitespace, it stops, leaving point at the
689place where it stopped. This includes (for instance) finding the end
690of a comment when moving forward and expecting the beginning of one.
691The function also stops immediately after moving over the specified
692number of complete comments. If @var{count} comments are found as
693expected, with nothing except whitespace between them, it returns
694@code{t}; otherwise it returns @code{nil}.
695
696This function cannot tell whether the ``comments'' it traverses are
697embedded within a string. If they look like comments, it treats them
698as comments.
b8d4c8d0
GM
699
700To move forward over all comments and whitespace following point, use
4230351b
CY
701@code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a
702good argument to use, because the number of comments in the buffer
703cannot exceed that many.
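
The idiom above might be used as in the following sketch. The name
@code{my-skip-comments-demo} and the sample text are made up for
illustration, and the Emacs Lisp mode syntax table is assumed so that
@samp{;} starts a comment.

@example
(defun my-skip-comments-demo ()
  "Skip leading comments; return t if point lands on \"(foo)\"."
  (with-temp-buffer
    (insert ";; one\n  ;; two\n(foo)")
    (goto-char (point-min))
    (with-syntax-table emacs-lisp-mode-syntax-table
      ;; The return value is ignored here: only the motion matters.
      (forward-comment (buffer-size))
      (looking-at "(foo)"))))
@end example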
704@end defun
705
706@node Position Parse
707@subsection Finding the Parse State for a Position
708
709 For syntactic analysis, such as in indentation, it is often useful
710to compute the syntactic state corresponding to a given buffer
711position. This function does that conveniently.
712
713@defun syntax-ppss &optional pos
714This function returns the parser state that the parser would reach at
715position @var{pos} starting from the beginning of the buffer.
716@iftex
717See the next section for
718@end iftex
719@ifnottex
720@xref{Parser State},
721@end ifnottex
722for a description of the parser state.
723
724The return value is the same as if you call the low-level parsing
725function @code{parse-partial-sexp} to parse from the beginning of the
726buffer to @var{pos} (@pxref{Low-Level Parsing}). However,
727@code{syntax-ppss} uses a cache to speed up the computation. Due to
728this optimization, element 2 (previous complete subexpression)
729and element 6 (minimum parenthesis depth) of the returned parser
730state are not meaningful.
731
732This function has a side effect: it adds a buffer-local entry to
733@code{before-change-functions} (@pxref{Change Hooks}) for
734@code{syntax-ppss-flush-cache} (see below). This entry keeps the
735cache consistent as the buffer is modified. However, the cache might
736not be updated if @code{syntax-ppss} is called while
737@code{before-change-functions} is temporarily let-bound, or if the
738buffer is modified without running the hook, such as when using
739@code{inhibit-modification-hooks}. In those cases, it is necessary to
740call @code{syntax-ppss-flush-cache} explicitly.
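
A common use of this function is a predicate that tests whether a
position lies inside a string or comment. The following is only a
sketch; the name @code{my-in-string-or-comment-p} is hypothetical, and
@code{save-excursion} is used on the assumption that the caller does
not want point to move.

@example
(defun my-in-string-or-comment-p (&optional pos)
  "Return non-nil if POS (default: point) is in a string or comment."
  (let ((state (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string, element 4 inside a comment.
    (or (nth 3 state) (nth 4 state))))
@end example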
741@end defun
742
743@defun syntax-ppss-flush-cache beg &rest ignored-args
744This function flushes the cache used by @code{syntax-ppss}, starting
745at position @var{beg}. The remaining arguments, @var{ignored-args},
746are ignored; this function accepts them so that it can be directly
747used on hooks such as @code{before-change-functions} (@pxref{Change
748Hooks}).
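
For instance, code that edits the buffer with modification hooks
inhibited could flush the cache by hand, roughly as follows. The
helper name @code{my-insert-quietly} is an assumption made for this
sketch.

@example
(defun my-insert-quietly (text)
  "Insert TEXT without running change hooks, then flush the cache."
  (let ((start (point))
        (inhibit-modification-hooks t))
    (insert text)
    ;; The hooks did not run, so invalidate the cached states
    ;; from the start of the insertion onward.
    (syntax-ppss-flush-cache start)))
@end example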
749@end defun
750
751 Major modes can make @code{syntax-ppss} run faster by specifying
752where it needs to start parsing.
753
754@defvar syntax-begin-function
755If this is non-@code{nil}, it should be a function that moves to an
756earlier buffer position where the parser state is equivalent to
757@code{nil}---in other words, a position outside of any comment,
758string, or parenthesis. @code{syntax-ppss} uses it to further
759optimize its computations, when the cache gives no help.
760@end defvar
761
762@node Parser State
763@subsection Parser State
764@cindex parser state
765
766 A @dfn{parser state} is a list of ten elements describing the state
767of the syntactic parser, after it parses the text between a specified
768starting point and a specified end point in the buffer. Parsing
769functions such as @code{syntax-ppss}
770@ifnottex
771(@pxref{Position Parse})
772@end ifnottex
773return a parser state as the value. Some parsing functions accept a
774parser state as an argument, for resuming parsing.
775
776 Here are the meanings of the elements of the parser state:
777
778@enumerate 0
779@item
780The depth in parentheses, counting from 0. @strong{Warning:} this can
781be negative if there are more close parens than open parens between
782the parser's starting point and end point.
783
784@item
785@cindex innermost containing parentheses
786The character position of the start of the innermost parenthetical
787grouping containing the stopping point; @code{nil} if none.
788
789@item
790@cindex previous complete subexpression
791The character position of the start of the last complete subexpression
792terminated; @code{nil} if none.
793
794@item
795@cindex inside string
796Non-@code{nil} if inside a string. More precisely, this is the
797character that will terminate the string, or @code{t} if a generic
798string delimiter character should terminate it.
799
800@item
801@cindex inside comment
802@code{t} if inside a non-nestable comment (of any comment style;
803@pxref{Syntax Flags}); or the comment nesting level if inside a
804comment that can be nested.
805
806@item
807@cindex quote character
808@code{t} if the end point is just after a quote character.
809
810@item
811The minimum parenthesis depth encountered during this scan.
812
813@item
814What kind of comment is active: @code{nil} if not in a comment or in a
815comment of style @samp{a}; 1 for a comment of style @samp{b}; 2 for a
816comment of style @samp{c}; and @code{syntax-table} for a comment that
817should be ended by a generic comment delimiter character.
818
819@item
820The string or comment start position. While inside a comment, this is
821the position where the comment began; while inside a string, this is the
822position where the string began. When outside of strings and comments,
823this element is @code{nil}.
824
825@item
826Internal data for continuing the parsing. The meaning of this
827data is subject to change; it is used if you pass this list
828as the @var{state} argument to another call.
829@end enumerate
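
 The elements are ordinarily extracted with @code{nth}. As a small
sketch, with a hypothetical function name and made-up sample text,
parsing the unterminated text @samp{(foo "bar} to its end leaves the
parser one level deep and inside a string:

@example
(defun my-parser-state-demo ()
  "Return elements 0, 1, 3 and 8 after parsing the text (foo \"bar."
  (with-temp-buffer
    (insert "(foo \"bar")
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let ((state (parse-partial-sexp (point-min) (point-max))))
        (list (nth 0 state)     ; depth in parentheses
              (nth 1 state)     ; start of innermost grouping
              (nth 3 state)     ; character that ends the string
              (nth 8 state))))))  ; where the string began

(my-parser-state-demo)
     @result{} (1 1 34 6)
@end example

@noindent
Here 34 is the character code of the double-quote character.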
830
831 Elements 1, 2, and 6 are ignored in a state which you pass as an
832argument to continue parsing, and elements 8 and 9 are used only in
833trivial cases. Those elements are mainly used internally by the
834parser code.
835
836 One additional piece of useful information is available from a
837parser state using this function:
838
839@defun syntax-ppss-toplevel-pos state
840This function extracts, from parser state @var{state}, the last
841position scanned in the parse which was at top level in grammatical
842structure. ``At top level'' means outside of any parentheses,
843comments, or strings.
844
845The value is @code{nil} if @var{state} represents a parse which has
846arrived at a top level position.
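
One possible use, sketched here under the assumption that the caller
wants the result for an arbitrary position, is to combine it with
@code{syntax-ppss}. The name @code{my-toplevel-start} is hypothetical.

@example
(defun my-toplevel-start (&optional pos)
  "Return the start of the top-level construct around POS, or nil."
  (save-excursion
    (syntax-ppss-toplevel-pos (syntax-ppss pos))))
@end example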
847@end defun
848
849@node Low-Level Parsing
850@subsection Low-Level Parsing
851
852 The most basic way to use the expression parser is to tell it
853to start at a given position with a certain state, and parse up to
854a specified end position.
855
856@defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
857This function parses a sexp in the current buffer starting at
858@var{start}, not scanning past @var{limit}. It stops at position
859@var{limit} or when certain criteria described below are met, and sets
860point to the location where parsing stops. It returns a parser state
861describing the status of the parse at the point where it stops.
862
863@cindex parenthesis depth
864If the third argument @var{target-depth} is non-@code{nil}, parsing
865stops if the depth in parentheses becomes equal to @var{target-depth}.
866The depth starts at 0, or at whatever is given in @var{state}.
867
868If the fourth argument @var{stop-before} is non-@code{nil}, parsing
869stops when it comes to any character that starts a sexp. If
870@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
871start of a comment. If @var{stop-comment} is the symbol
872@code{syntax-table}, parsing stops after the start of a comment or a
873string, or the end of a comment or a string, whichever comes first.
874
875If @var{state} is @code{nil}, @var{start} is assumed to be at the top
876level of parenthesis structure, such as the beginning of a function
877definition. Alternatively, you might wish to resume parsing in the
878middle of the structure. To do this, you must provide a @var{state}
879argument that describes the initial status of parsing. The value
880returned by a previous call to @code{parse-partial-sexp} will do
881nicely.
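
The following sketch illustrates resuming a parse. The function name,
the sample text and the split position 6 are arbitrary choices for the
example. Parsing in two steps, passing the first state to the second
call, should reach the same depth as a single parse of the whole text.

@example
(defun my-resume-parse-demo ()
  "Compare a two-step parse with a one-step parse."
  (with-temp-buffer
    (insert "(a (b) c) (d")
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let* ((mid 6)
             (first (parse-partial-sexp (point-min) mid))
             ;; Resume at MID, starting from the state reached there.
             (second (parse-partial-sexp mid (point-max)
                                         nil nil first))
             (whole (parse-partial-sexp (point-min) (point-max))))
        (list (car first) (car second) (car whole))))))

(my-resume-parse-demo)
     @result{} (2 1 1)
@end example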
882@end defun
883
884@node Control Parsing
885@subsection Parameters to Control Parsing
886
887@defvar multibyte-syntax-as-symbol
888If this variable is non-@code{nil}, @code{scan-sexps} treats all
889non-@acronym{ASCII} characters as symbol constituents regardless
890of what the syntax table says about them. (However, text properties
891can still override the syntax.)
892@end defvar
893
894@defopt parse-sexp-ignore-comments
895@cindex skipping comments
896If the value is non-@code{nil}, then comments are treated as
897whitespace by the functions in this section and by @code{forward-sexp},
898@code{scan-lists} and @code{scan-sexps}.
899@end defopt
900
901@vindex parse-sexp-lookup-properties
902The behavior of @code{parse-partial-sexp} is also affected by
903@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).
904
905You can use @code{forward-comment} to move forward or backward over
906one comment or several comments.
907
908@node Standard Syntax Tables
909@section Some Standard Syntax Tables
910
911 Most of the major modes in Emacs have their own syntax tables. Here
912are several of them:
913
914@defun standard-syntax-table
915This function returns the standard syntax table, which is the syntax
916table used in Fundamental mode.
917@end defun
918
919@defvar text-mode-syntax-table
920The value of this variable is the syntax table used in Text mode.
921@end defvar
922
923@defvar c-mode-syntax-table
924The value of this variable is the syntax table for C-mode buffers.
925@end defvar
926
927@defvar emacs-lisp-mode-syntax-table
928The value of this variable is the syntax table used in Emacs Lisp mode
929by editing commands. (It has no effect on the Lisp @code{read}
930function.)
931@end defvar
932
933@node Syntax Table Internals
934@section Syntax Table Internals
935@cindex syntax table internals
936
937 Lisp programs don't usually work with the elements directly; the
938Lisp-level syntax table functions usually work with syntax descriptors
939(@pxref{Syntax Descriptors}). Nonetheless, here we document the
940internal format. This format is used mostly when manipulating
941syntax properties.
942
943 Each element of a syntax table is a cons cell of the form
944@code{(@var{syntax-code} . @var{matching-char})}. The @sc{car},
945@var{syntax-code}, is an integer that encodes the syntax class, and any
946flags. The @sc{cdr}, @var{matching-char}, is non-@code{nil} if
947a character to match was specified.
948
949 This table gives the value of @var{syntax-code} which corresponds
950to each syntactic type.
951
952@multitable @columnfractions .05 .3 .3 .31
953@item
954@tab
955@i{Integer} @i{Class}
956@tab
957@i{Integer} @i{Class}
958@tab
959@i{Integer} @i{Class}
960@item
961@tab
9620 @ @ whitespace
963@tab
9645 @ @ close parenthesis
965@tab
96610 @ @ character quote
967@item
968@tab
9691 @ @ punctuation
970@tab
9716 @ @ expression prefix
972@tab
97311 @ @ comment-start
974@item
975@tab
9762 @ @ word
977@tab
9787 @ @ string quote
979@tab
98012 @ @ comment-end
981@item
982@tab
9833 @ @ symbol
984@tab
9858 @ @ paired delimiter
986@tab
98713 @ @ inherit
988@item
989@tab
9904 @ @ open parenthesis
991@tab
9929 @ @ escape
993@tab
99414 @ @ generic comment
995@item
996@tab
99715 @ generic string
998@end multitable
999
1000 For example, the usual syntax value for @samp{(} is @code{(4 . 41)}.
1001(41 is the character code for @samp{)}.)
1002
1003 The flags are encoded in higher order bits, starting 16 bits from the
1004least significant bit. This table gives the power of two which
1005corresponds to each syntax flag.
1006
1007@multitable @columnfractions .05 .3 .3 .3
1008@item
1009@tab
1010@i{Prefix} @i{Flag}
1011@tab
1012@i{Prefix} @i{Flag}
1013@tab
1014@i{Prefix} @i{Flag}
1015@item
1016@tab
1017@samp{1} @ @ @code{(lsh 1 16)}
1018@tab
1019@samp{4} @ @ @code{(lsh 1 19)}
1020@tab
1021@samp{b} @ @ @code{(lsh 1 21)}
1022@item
1023@tab
1024@samp{2} @ @ @code{(lsh 1 17)}
1025@tab
1026@samp{p} @ @ @code{(lsh 1 20)}
1027@tab
1028@samp{n} @ @ @code{(lsh 1 22)}
1029@item
1030@tab
1031@samp{3} @ @ @code{(lsh 1 18)}
1032@end multitable
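
 As a worked illustration of this encoding, the sketch below decodes
the descriptor @samp{. 23} (punctuation, with flags @samp{2} and
@samp{3}). The function name is hypothetical, and @code{ash} is used
for the shifts, which gives the same result as @code{lsh} for these
small positive values.

@example
(defun my-decode-syntax-demo ()
  "Return the class and two flag tests for the descriptor \". 23\"."
  (let ((code (car (string-to-syntax ". 23"))))
    (list code
          (logand code #xffff)                  ; syntax class
          (/= 0 (logand code (ash 1 17)))       ; flag `2' set?
          (/= 0 (logand code (ash 1 16))))))    ; flag `1' set?

(my-decode-syntax-demo)
     @result{} (393217 1 t nil)
@end example

@noindent
393217 is 1 (punctuation) plus 131072 (flag @samp{2}) plus 262144
(flag @samp{3}).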
1033
1034@defun string-to-syntax desc
1035This function returns the internal form corresponding to the syntax
1036descriptor @var{desc}, a cons cell @code{(@var{syntax-code}
1037. @var{matching-char})}.
1038@end defun
1039
1040@defun syntax-after pos
1041This function returns the syntax of the character in the buffer after
1042position @var{pos}, in the internal form described above, taking
1043account of syntax properties as well as the syntax table. If @var{pos}
1044is outside the buffer's accessible portion (@pxref{Narrowing, accessible
1045portion}), this function returns @code{nil}.
1046@end defun
1047
1048@defun syntax-class syntax
1049This function returns the syntax class of @var{syntax}, a syntax value
1050in internal form. (It masks off the high-order bits that hold the flags
1051encoded in the syntax descriptor.) If @var{syntax} is @code{nil}, it
1052returns @code{nil}; this is so that evaluating the expression
1053
1054@example
(syntax-class (syntax-after pos))
1056@end example
1057
1058@noindent
1059where @code{pos} is outside the buffer's accessible portion, yields
1060@code{nil} without signaling an error or producing a wrong syntax
1061class.
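
For example, a predicate built on this combination needs no bounds
check of its own. The name @code{my-open-paren-at-p} is hypothetical;
4 is the class code for open parenthesis from the table above.

@example
(defun my-open-paren-at-p (pos)
  "Return t if the character after POS has open-parenthesis syntax."
  (eq (syntax-class (syntax-after pos)) 4))
@end example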
1062@end defun
1063
1064@node Categories
1065@section Categories
1066@cindex categories of characters
1067@cindex character categories
1068
1069 @dfn{Categories} provide an alternate way of classifying characters
1070syntactically. You can define several categories as needed, then
1071independently assign each character to one or more categories. Unlike
1072syntax classes, categories are not mutually exclusive; it is normal for
1073one character to belong to several categories.
1074
1075@cindex category table
1076 Each buffer has a @dfn{category table} which records which categories
1077are defined and also which characters belong to each category. Each
1078category table defines its own categories, but normally these are
1079initialized by copying from the standard category table, so that the
1080standard categories are available in all modes.
1081
1082 Each category has a name, which is an @acronym{ASCII} printing character in
1083the range @w{@samp{ }} to @samp{~}. You specify the name of a category
1084when you define it with @code{define-category}.
1085
1086 The category table is actually a char-table (@pxref{Char-Tables}).
1087The element of the category table at index @var{c} is a @dfn{category
1088set}---a bool-vector---that indicates which categories character @var{c}
1089belongs to. In this category set, if the element at index @var{cat} is
1090@code{t}, that means category @var{cat} is a member of the set, and that
1091character @var{c} belongs to category @var{cat}.
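
 Since a category set is a bool-vector indexed by category name,
membership can be tested with @code{aref}. The following is a minimal
sketch; the function name is hypothetical, and it consults the current
buffer's category table.

@example
(defun my-char-in-category-p (char cat)
  "Return t if CHAR belongs to category CAT in the current buffer."
  (aref (char-category-set char) cat))

;; `a' belongs to the standard Latin category `l'.
(my-char-in-category-p ?a ?l)
     @result{} t
@end example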
1092
1093For the next three functions, the optional argument @var{table}
1094defaults to the current buffer's category table.
1095
1096@defun define-category char docstring &optional table
1097This function defines a new category, with name @var{char} and
1098documentation @var{docstring}, for the category table @var{table}.
1099
1100Here's an example of defining a new category for characters that have
1101strong right-to-left directionality (@pxref{Bidirectional Display})
1102and using it in a special category table:
1103
1104@example
(defvar special-category-table-for-bidi
  (let ((category-table (make-category-table))
        (uniprop-table (unicode-property-table-internal 'bidi-class)))
    (define-category ?R "Characters of bidi-class R, AL, or RLO"
                     category-table)
    (map-char-table
     #'(lambda (key val)
         (if (memq val '(R AL RLO))
             (modify-category-entry key ?R category-table)))
     uniprop-table)
    category-table))
1116@end example
1117@end defun
1118
1119@defun category-docstring category &optional table
1120This function returns the documentation string of category @var{category}
1121in category table @var{table}.
1122
1123@example
(category-docstring ?a)
     @result{} "ASCII"
(category-docstring ?l)
     @result{} "Latin"
1128@end example
1129@end defun
1130
1131@defun get-unused-category &optional table
1132This function returns a category name (a character) which is not
1133currently defined in @var{table}. If all possible categories are in use
1134in @var{table}, it returns @code{nil}.
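
A program that needs a private category might use it as in this
sketch. The function name and its arguments are assumptions made for
illustration.

@example
(defun my-define-scratch-category (docstring table)
  "Define a category in TABLE under an unused name; return the name.
Return nil if TABLE has no unused category names left."
  (let ((cat (get-unused-category table)))
    (when cat
      (define-category cat docstring table))
    cat))
@end example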
1135@end defun
1136
1137@defun category-table
1138This function returns the current buffer's category table.
1139@end defun
1140
1141@defun category-table-p object
1142This function returns @code{t} if @var{object} is a category table,
1143otherwise @code{nil}.
1144@end defun
1145
1146@defun standard-category-table
1147This function returns the standard category table.
1148@end defun
1149
1150@defun copy-category-table &optional table
1151This function constructs a copy of @var{table} and returns it. If
1152@var{table} is not supplied (or is @code{nil}), it returns a copy of the
1153standard category table. Otherwise, an error is signaled if @var{table}
1154is not a category table.
1155@end defun
1156
1157@defun set-category-table table
1158This function makes @var{table} the category table for the current
1159buffer. It returns @var{table}.
1160@end defun
1161
1162@defun make-category-table
1163This creates and returns an empty category table. In an empty category
1164table, no categories have been allocated, and no characters belong to
1165any categories.
1166@end defun
1167
1168@defun make-category-set categories
1169This function returns a new category set---a bool-vector---whose initial
1170contents are the categories listed in the string @var{categories}. The
1171elements of @var{categories} should be category names; the new category
1172set has @code{t} for each of those categories, and @code{nil} for all
1173other categories.
1174
1175@example
(make-category-set "al")
     @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1178@end example
1179@end defun
1180
1181@defun char-category-set char
1182This function returns the category set for character @var{char} in the
1183current buffer's category table. This is the bool-vector which
1184records which categories the character @var{char} belongs to. The
1185function @code{char-category-set} does not allocate storage, because
1186it returns the same bool-vector that exists in the category table.
1187
1188@example
(char-category-set ?a)
     @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1191@end example
1192@end defun
1193
1194@defun category-set-mnemonics category-set
1195This function converts the category set @var{category-set} into a string
1196containing the characters that designate the categories that are members
1197of the set.
1198
1199@example
(category-set-mnemonics (char-category-set ?a))
     @result{} "al"
1202@end example
1203@end defun
1204
1205@defun modify-category-entry char category &optional table reset
1206This function modifies the category set of @var{char} in category
1207table @var{table} (which defaults to the current buffer's category
1208table). @var{char} can be a character, or a cons cell of the form
1209@code{(@var{min} . @var{max})}; in the latter case, the function
1210modifies the category sets of all characters in the range between
1211@var{min} and @var{max}, inclusive.
1212
1213Normally, it modifies a category set by adding @var{category} to it.
1214But if @var{reset} is non-@code{nil}, then it deletes @var{category}
1215instead.
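
The sketch below exercises both the range form and the @var{reset}
argument. The function name, the category @samp{D} and its docstring
are invented for the example, and a fresh table is used so that
nothing else is affected.

@example
(defun my-digit-category-demo ()
  "Add the digits to a new category `D', then remove `5' again."
  (let ((table (make-category-table)))
    (define-category ?D "Decimal digits (example only)" table)
    (modify-category-entry '(?0 . ?9) ?D table)
    (modify-category-entry ?5 ?D table t)
    (with-temp-buffer
      (set-category-table table)
      (list (category-set-mnemonics (char-category-set ?3))
            (category-set-mnemonics (char-category-set ?5))))))

(my-digit-category-demo)
     @result{} ("D" "")
@end example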
1216@end defun
1217
1218@deffn Command describe-categories &optional buffer-or-name
1219This function describes the category specifications in the current
1220category table. It inserts the descriptions in a buffer, and then
1221displays that buffer. If @var{buffer-or-name} is non-@code{nil}, it
1222describes the category table of that buffer instead.
1223@end deffn