1@c -*-texinfo-*-
2@c This is part of the GNU Emacs Lisp Reference Manual.
3@c Copyright (C) 1990-1995, 1998-1999, 2001-2013 Free Software
4@c Foundation, Inc.
5@c See the file elisp.texi for copying conditions.
6@node Searching and Matching
7@chapter Searching and Matching
8@cindex searching
9
10 GNU Emacs provides two ways to search through a buffer for specified
11text: exact string searches and regular expression searches. After a
12regular expression search, you can examine the @dfn{match data} to
13determine which text matched the whole regular expression or various
14portions of it.
15
16@menu
17* String Search:: Search for an exact match.
18* Searching and Case:: Case-independent or case-significant searching.
19* Regular Expressions:: Describing classes of strings.
20* Regexp Search:: Searching for a match for a regexp.
21* POSIX Regexps:: Searching POSIX-style for the longest match.
22* Match Data:: Finding out which part of the text matched,
23 after a string or regexp search.
24* Search and Replace:: Commands that loop, searching and replacing.
25* Standard Regexps:: Useful regexps for finding sentences, pages,...
26@end menu
27
28 The @samp{skip-chars@dots{}} functions also perform a kind of searching.
29@xref{Skipping Characters}. To search for changes in character
30properties, see @ref{Property Search}.
31
32@node String Search
33@section Searching for Strings
34@cindex string search
35
36 These are the primitive functions for searching through the text in a
37buffer. They are meant for use in programs, but you may call them
38interactively. If you do so, they prompt for the search string; the
39arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
40is 1. For more details on interactive searching, @pxref{Search,,
41Searching and Replacement, emacs, The GNU Emacs Manual}.
42
43 These search functions convert the search string to multibyte if the
44buffer is multibyte; they convert the search string to unibyte if the
45buffer is unibyte. @xref{Text Representations}.
46
47@deffn Command search-forward string &optional limit noerror repeat
48This function searches forward from point for an exact match for
49@var{string}. If successful, it sets point to the end of the occurrence
50found, and returns the new value of point. If no match is found, the
51value and side effects depend on @var{noerror} (see below).
52
53In the following example, point is initially at the beginning of the
54line. Then @code{(search-forward "fox")} moves point after the last
55letter of @samp{fox}:
56
57@example
58@group
59---------- Buffer: foo ----------
60@point{}The quick brown fox jumped over the lazy dog.
61---------- Buffer: foo ----------
62@end group
63
64@group
65(search-forward "fox")
66 @result{} 20
67
68---------- Buffer: foo ----------
69The quick brown fox@point{} jumped over the lazy dog.
70---------- Buffer: foo ----------
71@end group
72@end example
73
74The argument @var{limit} specifies the bound to the search, and should
75be a position in the current buffer. No match extending after
76that position is accepted. If @var{limit} is omitted or @code{nil}, it
77defaults to the end of the accessible portion of the buffer.
78
79@kindex search-failed
80What happens when the search fails depends on the value of
81@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
82error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
83returns @code{nil} and does nothing. If @var{noerror} is neither
84@code{nil} nor @code{t}, then @code{search-forward} moves point to the
85upper bound and returns @code{nil}.
86@c I see no prospect of this ever changing, and frankly the current
87@c behavior seems better, so there seems no need to mention this.
88@ignore
89(It would be more consistent now to return the new position of point
90in that case, but some existing programs may depend on a value of
91@code{nil}.)
92@end ignore
93
94The argument @var{noerror} only affects valid searches which fail to
95find a match. Invalid arguments cause errors regardless of
96@var{noerror}.
97
98If @var{repeat} is a positive number @var{n}, it serves as a repeat
99count: the search is repeated @var{n} times, each time starting at the
100end of the previous time's match. If these successive searches
101succeed, the function succeeds, moving point and returning its new
102value. Otherwise the search fails, with results depending on the
103value of @var{noerror}, as described above. If @var{repeat} is a
104negative number -@var{n}, it serves as a repeat count of @var{n} for a
105search in the opposite (backward) direction.
106@end deffn
107
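The usual way to call @code{search-forward} from a program is in a
loop, with @var{noerror} set to @code{t} so that a failed search ends
the loop.  The following is only a minimal sketch: the name
@code{my-count-matches} is hypothetical and not part of Emacs, and the
code assumes that @var{string} is non-empty (an empty search string
matches without moving point, so the loop would never end).

@example
@group
(defun my-count-matches (string)
  "Return the number of matches for STRING after point.
Hypothetical helper for illustration; STRING must be non-empty."
  (save-excursion
    (let ((count 0))
      ;; NOERROR is t, so a failed search returns nil instead of
      ;; signaling `search-failed', and the loop simply ends.
      (while (search-forward string nil t)
        (setq count (1+ count)))
      count)))
@end group
@end example

Because the loop runs inside @code{save-excursion}, point is restored
after counting.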
108@deffn Command search-backward string &optional limit noerror repeat
109This function searches backward from point for @var{string}. It is
110like @code{search-forward}, except that it searches backwards rather
111than forwards. Backward searches leave point at the beginning of the
112match.
113@end deffn
114
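As a small illustration of where a backward search leaves point, the
following sketch runs @code{search-backward} in a temporary buffer.
The buffer contents are the same sample sentence used above; nothing
else is assumed.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; After `insert', point is at the end of the buffer.
  (search-backward "fox")
  ;; Point is now at the start of the match, just before the "f".
  (list (point) (looking-at "fox")))
     @result{} (17 t)
@end group
@end example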
115@deffn Command word-search-forward string &optional limit noerror repeat
116This function searches forward from point for a ``word'' match for
117@var{string}. If it finds a match, it sets point to the end of the
118match found, and returns the new value of point.
119
120Word matching regards @var{string} as a sequence of words, disregarding
121punctuation that separates them. It searches the buffer for the same
122sequence of words. Each word must be distinct in the buffer (searching
123for the word @samp{ball} does not match the word @samp{balls}), but the
124details of punctuation and spacing are ignored (searching for @samp{ball
125boy} does match @samp{ball. Boy!}).
126
127In this example, point is initially at the beginning of the buffer; the
128search leaves it between the @samp{y} and the @samp{!}.
129
130@example
131@group
132---------- Buffer: foo ----------
133@point{}He said "Please! Find
134the ball boy!"
135---------- Buffer: foo ----------
136@end group
137
138@group
139(word-search-forward "Please find the ball, boy.")
140 @result{} 36
141
142---------- Buffer: foo ----------
143He said "Please! Find
144the ball boy@point{}!"
145---------- Buffer: foo ----------
146@end group
147@end example
148
149If @var{limit} is non-@code{nil}, it must be a position in the current
150buffer; it specifies the upper bound to the search. The match found
151must not extend after that position.
152
153If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
154an error if the search fails. If @var{noerror} is @code{t}, then it
155returns @code{nil} instead of signaling an error. If @var{noerror} is
156neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
157end of the accessible portion of the buffer) and returns @code{nil}.
158
159If @var{repeat} is non-@code{nil}, then the search is repeated that many
160times. Point is positioned at the end of the last match.
161
162@findex word-search-regexp
163Internally, @code{word-search-forward} and related functions use the
164function @code{word-search-regexp} to convert @var{string} to a
165regular expression that ignores punctuation.
166@end deffn
167
168@deffn Command word-search-forward-lax string &optional limit noerror repeat
169This command is identical to @code{word-search-forward}, except that
170the end of @var{string} need not match a word boundary, unless @var{string} ends
171in whitespace. For instance, searching for @samp{ball boy} matches
172@samp{ball boyee}, but does not match @samp{aball boy}.
173@end deffn
174
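The difference between the strict and the lax word search can be seen
by running both against the same text.  This is a sketch under the
behavior described above, using the @samp{ball boyee} text from the
description of @code{word-search-forward-lax}:

@example
@group
(with-temp-buffer
  (insert "ball boyee")
  (list
   ;; Strict: "boy" must end at a word boundary, so this fails.
   (progn (goto-char (point-min))
          (word-search-forward "ball boy" nil t))
   ;; Lax: the final word may be the start of a longer word.
   (progn (goto-char (point-min))
          (word-search-forward-lax "ball boy" nil t))))
@end group
@end example

The first element of the result should be @code{nil}, and the second
should be the buffer position just after @samp{boy}.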
175@deffn Command word-search-backward string &optional limit noerror repeat
176This function searches backward from point for a word match to
177@var{string}. This function is just like @code{word-search-forward}
178except that it searches backward and normally leaves point at the
179beginning of the match.
180@end deffn
181
182@deffn Command word-search-backward-lax string &optional limit noerror repeat
183This command is identical to @code{word-search-backward}, except that
184the end of @var{string} need not match a word boundary, unless @var{string} ends
185in whitespace.
186@end deffn
187
188@node Searching and Case
189@section Searching and Case
190@cindex searching and case
191
192 By default, searches in Emacs ignore the case of the text they are
193searching through; if you specify searching for @samp{FOO}, then
194@samp{Foo} or @samp{foo} is also considered a match. This applies to
195regular expressions, too; thus, @samp{[aB]} would match @samp{a} or
196@samp{A} or @samp{b} or @samp{B}.
197
198 If you do not want this feature, set the variable
199@code{case-fold-search} to @code{nil}. Then all letters must match
200exactly, including case. This is a buffer-local variable; altering the
201variable affects only the current buffer. (@xref{Intro to
202Buffer-Local}.) Alternatively, you may change the default value.
203In Lisp code, you will more typically use @code{let} to bind
204@code{case-fold-search} to the desired value.
205
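Binding @code{case-fold-search} with @code{let} around a search might
look like the following sketch, which runs the same search twice in a
temporary buffer, once with each setting:

@example
@group
(with-temp-buffer
  (insert "Foo")
  (list
   ;; Case is ignored, so "FOO" matches "Foo".
   (let ((case-fold-search t))
     (goto-char (point-min))
     (search-forward "FOO" nil t))
   ;; Case is significant, so the search fails.
   (let ((case-fold-search nil))
     (goto-char (point-min))
     (search-forward "FOO" nil t))))
     @result{} (4 nil)
@end group
@end example

The binding lasts only for the body of the @code{let}, so the caller's
setting is left untouched.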
206 Note that the user-level incremental search feature handles case
207distinctions differently. When the search string contains only lower
208case letters, the search ignores case, but when the search string
209contains one or more upper case letters, the search becomes
210case-sensitive. But this has nothing to do with the searching
211functions used in Lisp code. @xref{Incremental Search,,, emacs,
212The GNU Emacs Manual}.
213
214@defopt case-fold-search
215This buffer-local variable determines whether searches should ignore
216case. If the variable is @code{nil} they do not ignore case; otherwise
217(and by default) they do ignore case.
218@end defopt
219
220@defopt case-replace
221This variable determines whether the higher-level replacement
222functions should preserve case. If the variable is @code{nil}, that
223means to use the replacement text verbatim. A non-@code{nil} value
224means to convert the case of the replacement text according to the
225text being replaced.
226
227This variable is used by passing it as an argument to the function
228@code{replace-match}. @xref{Replacing Match}.
229@end defopt
230
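The case conversion itself is done by @code{replace-match}.  The
following sketch shows the effect of a @code{nil} @var{fixedcase}
argument, which is what a caller would pass when case is to be
preserved.  The sketch hard-codes that argument instead of consulting
@code{case-replace}; how a particular command derives it from the
variable is not shown here.

@example
@group
(with-temp-buffer
  (insert "Foo FOO foo")
  (goto-char (point-min))
  (let ((case-fold-search t))
    (while (search-forward "foo" nil t)
      ;; FIXEDCASE is nil, so the replacement takes on the case
      ;; pattern of the matched text.  LITERAL is t.
      (replace-match "bar" nil t)))
  (buffer-string))
     @result{} "Bar BAR bar"
@end group
@end example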
231@node Regular Expressions
232@section Regular Expressions
233@cindex regular expression
234@cindex regexp
235
236 A @dfn{regular expression}, or @dfn{regexp} for short, is a pattern that
237denotes a (possibly infinite) set of strings. Searching for matches for
238a regexp is a very powerful operation. This section explains how to write
239regexps; the following section says how to search for them.
240
241@findex re-builder
242@cindex regular expressions, developing
243 For interactive development of regular expressions, you
244can use the @kbd{M-x re-builder} command. It provides a convenient
245interface for creating regular expressions, by giving immediate visual
246feedback in a separate buffer. As you edit the regexp, all its
247matches in the target buffer are highlighted. Each parenthesized
248sub-expression of the regexp is shown in a distinct face, which makes
249it easier to verify even very complex regexps.
250
251@menu
252* Syntax of Regexps:: Rules for writing regular expressions.
253* Regexp Example:: Illustrates regular expression syntax.
254* Regexp Functions:: Functions for operating on regular expressions.
255@end menu
256
257@node Syntax of Regexps
258@subsection Syntax of Regular Expressions
259
260 Regular expressions have a syntax in which a few characters are
261special constructs and the rest are @dfn{ordinary}. An ordinary
262character is a simple regular expression that matches that character
263and nothing else. The special characters are @samp{.}, @samp{*},
264@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
265special characters will be defined in the future. The character
266@samp{]} is special if it ends a character alternative (see later).
267The character @samp{-} is special inside a character alternative. A
268@samp{[:} and balancing @samp{:]} enclose a character class inside a
269character alternative. Any other character appearing in a regular
270expression is ordinary, unless a @samp{\} precedes it.
271
272 For example, @samp{f} is not a special character, so it is ordinary, and
273therefore @samp{f} is a regular expression that matches the string
274@samp{f} and no other string. (It does @emph{not} match the string
275@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
276@samp{o} is a regular expression that matches only @samp{o}.
277
278 Any two regular expressions @var{a} and @var{b} can be concatenated. The
279result is a regular expression that matches a string if @var{a} matches
280some amount of the beginning of that string and @var{b} matches the rest of
281the string.
282
283 As a simple example, we can concatenate the regular expressions @samp{f}
284and @samp{o} to get the regular expression @samp{fo}, which matches only
285the string @samp{fo}. Still trivial. To do something more powerful, you
286need to use one of the special regular expression constructs.
287
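The distinction between matching a whole string and matching a part of
it can be tried out with @code{string-match}, which returns the index
of the first match or @code{nil}.  In this sketch the constructs
@samp{\`} and @samp{\'}, which anchor a match to the start and end of
the string, are used only to force a whole-string match:

@example
@group
(list
 ;; "f" matches a part of "fg", so the search succeeds at index 0.
 (string-match "f" "fg")
 ;; Anchored at both ends, "f" does not match the whole string "fg".
 (string-match "\\`f\\'" "fg")
 ;; The concatenation "fo" matches the whole string "fo".
 (string-match "\\`fo\\'" "fo"))
     @result{} (0 nil 0)
@end group
@end example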
288@menu
289* Regexp Special:: Special characters in regular expressions.
290* Char Classes:: Character classes used in regular expressions.
291* Regexp Backslash:: Backslash-sequences in regular expressions.
292@end menu
293
294@node Regexp Special
295@subsubsection Special Characters in Regular Expressions
296
297 Here is a list of the characters that are special in a regular
298expression.
299
300@need 800
301@table @asis
302@item @samp{.}@: @r{(Period)}
303@cindex @samp{.} in regexp
304is a special character that matches any single character except a newline.
305Using concatenation, we can make regular expressions like @samp{a.b}, which
306matches any three-character string that begins with @samp{a} and ends with
307@samp{b}.
308
309@item @samp{*}
310@cindex @samp{*} in regexp
311is not a construct by itself; it is a postfix operator that means to
312match the preceding regular expression repetitively as many times as
313possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
314@samp{o}s).
315
316@samp{*} always applies to the @emph{smallest} possible preceding
317expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
318@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
319
320@cindex backtracking and regular expressions
321The matcher processes a @samp{*} construct by matching, immediately, as
322many repetitions as can be found. Then it continues with the rest of
323the pattern. If that fails, backtracking occurs, discarding some of the
324matches of the @samp{*}-modified construct in the hope that that will
325make it possible to match the rest of the pattern. For example, in
326matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*}
327first tries to match all three @samp{a}s; but the rest of the pattern is
328@samp{ar} and there is only @samp{r} left to match, so this try fails.
329The next alternative is for @samp{a*} to match only two @samp{a}s. With
330this choice, the rest of the regexp matches successfully.
331
332@strong{Warning:} Nested repetition operators can run for an
333indefinitely long time, if they lead to ambiguous matching. For
334example, trying to match the regular expression @samp{\(x+y*\)*a}
335against the string @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz} could
336take hours before it ultimately fails. Emacs must try each way of
337grouping the @samp{x}s before concluding that none of them can work.
338Even worse, @samp{\(x*\)*} can match the null string in infinitely
339many ways, so it causes an infinite loop. To avoid these problems,
340check nested repetitions carefully, to make sure that they do not
341cause combinatorial explosions in backtracking.
342
343@item @samp{+}
344@cindex @samp{+} in regexp
345is a postfix operator, similar to @samp{*} except that it must match
346the preceding expression at least once. So, for example, @samp{ca+r}
347matches the strings @samp{car} and @samp{caaaar} but not the string
348@samp{cr}, whereas @samp{ca*r} matches all three strings.
349
350@item @samp{?}
351@cindex @samp{?} in regexp
352is a postfix operator, similar to @samp{*} except that it must match the
353preceding expression either once or not at all. For example,
354@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
355
356@item @samp{*?}, @samp{+?}, @samp{??}
357@cindex non-greedy repetition characters in regexp
358These are ``non-greedy'' variants of the operators @samp{*}, @samp{+}
359and @samp{?}. Where those operators match the largest possible
360substring (consistent with matching the entire containing expression),
361the non-greedy variants match the smallest possible substring
362(consistent with matching the entire containing expression).
363
364For example, the regular expression @samp{c[ad]*a} when applied to the
365string @samp{cdaaada} matches the whole string; but the regular
366expression @samp{c[ad]*?a}, applied to that same string, matches just
367@samp{cda}. (The smallest possible match here for @samp{[ad]*?} that
368permits the whole expression to match is @samp{d}.)
369
370@item @samp{[ @dots{} ]}
371@cindex character alternative (in regexp)
372@cindex @samp{[} in regexp
373@cindex @samp{]} in regexp
374is a @dfn{character alternative}, which begins with @samp{[} and is
375terminated by @samp{]}. In the simplest case, the characters between
376the two brackets are what this character alternative can match.
377
378Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
379@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
380(including the empty string). It follows that @samp{c[ad]*r}
381matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
382
383You can also include character ranges in a character alternative, by
384writing the starting and ending characters with a @samp{-} between them.
385Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
386Ranges may be intermixed freely with individual characters, as in
387@samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
388or @samp{$}, @samp{%} or period.
389
390If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
391matches upper-case letters. Note that a range like @samp{[a-z]} is
392not affected by the locale's collation sequence; it always represents
393a sequence in @acronym{ASCII} order.
394@c This wasn't obvious to me, since, e.g., the grep manual "Character
395@c Classes and Bracket Expressions" specifically notes the opposite
396@c behavior. But by experiment Emacs seems unaffected by LC_COLLATE
397@c in this regard.
398
399Note also that the usual regexp special characters are not special inside a
400character alternative. A completely different set of characters is
401special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
402
403To include a @samp{]} in a character alternative, you must make it the
404first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
405To include a @samp{-}, write @samp{-} as the first or last character of
406the character alternative, or put it after a range. Thus, @samp{[]-]}
407matches both @samp{]} and @samp{-}. (As explained below, you cannot
408use @samp{\]} to include a @samp{]} inside a character alternative,
409since @samp{\} is not special there.)
410
411To include @samp{^} in a character alternative, put it anywhere but at
412the beginning.
413
414@c What if it starts with a multibyte and ends with a unibyte?
415@c That doesn't seem to match anything...?
416If a range starts with a unibyte character @var{c} and ends with a
417multibyte character @var{c2}, the range is divided into two parts: one
418spans the unibyte characters @samp{@var{c}..?\377}, the other the
419multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
420first character of the charset to which @var{c2} belongs.
421
422A character alternative can also specify named character classes
423(@pxref{Char Classes}). This is a POSIX feature. For example,
424@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
425Using a character class is equivalent to mentioning each of the
426characters in that class; but the latter is not feasible in practice,
427since some classes include thousands of different characters.
428
429@item @samp{[^ @dots{} ]}
430@cindex @samp{^} in regexp
431@samp{[^} begins a @dfn{complemented character alternative}. This
432matches any character except the ones specified. Thus,
433@samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and
434digits.
435
436@samp{^} is not special in a character alternative unless it is the first
437character. The character following the @samp{^} is treated as if it
438were first (in other words, @samp{-} and @samp{]} are not special there).
439
440A complemented character alternative can match a newline, unless newline is
441mentioned as one of the characters not to match. This is in contrast to
442the handling of regexps in programs such as @code{grep}.
443
444You can specify named character classes, just like in character
445alternatives. For instance, @samp{[^[:ascii:]]} matches any
446non-@acronym{ASCII} character. @xref{Char Classes}.
447
448@item @samp{^}
449@cindex beginning of line in regexp
450When matching a buffer, @samp{^} matches the empty string, but only at the
451beginning of a line in the text being matched (or the beginning of the
452accessible portion of the buffer). Otherwise it fails to match
453anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at the
454beginning of a line.
455
456When matching a string instead of a buffer, @samp{^} matches at the
457beginning of the string or after a newline character.
458
459For historical compatibility reasons, @samp{^} can be used only at the
460beginning of the regular expression, or after @samp{\(}, @samp{\(?:}
461or @samp{\|}.
462
463@item @samp{$}
464@cindex @samp{$} in regexp
465@cindex end of line in regexp
466is similar to @samp{^} but matches only at the end of a line (or the
467end of the accessible portion of the buffer). Thus, @samp{x+$}
468matches a string of one @samp{x} or more at the end of a line.
469
470When matching a string instead of a buffer, @samp{$} matches at the end
471of the string or before a newline character.
472
473For historical compatibility reasons, @samp{$} can be used only at the
474end of the regular expression, or before @samp{\)} or @samp{\|}.
475
476@item @samp{\}
477@cindex @samp{\} in regexp
478has two functions: it quotes the special characters (including
479@samp{\}), and it introduces additional special constructs.
480
481Because @samp{\} quotes special characters, @samp{\$} is a regular
482expression that matches only @samp{$}, and @samp{\[} is a regular
483expression that matches only @samp{[}, and so on.
484
485Note that @samp{\} also has special meaning in the read syntax of Lisp
486strings (@pxref{String Type}), and must be quoted with @samp{\}. For
487example, the regular expression that matches the @samp{\} character is
488@samp{\\}. To write a Lisp string that contains the characters
489@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
490@samp{\}. Therefore, the read syntax for a regular expression matching
491@samp{\} is @code{"\\\\"}.
492@end table
493
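Several of the examples from the table above can be checked directly
with @code{string-match} and @code{match-string}.  The following
sketch only restates those examples as Lisp; each regexp is written in
Lisp string syntax, so every @samp{\} of the regexp appears doubled.

@example
@group
(let ((s "cdaaada"))
  (list
   ;; Greedy `*' takes the longest match that lets the rest succeed.
   (progn (string-match "c[ad]*a" s) (match-string 0 s))
   ;; Non-greedy `*?' takes the shortest such match.
   (progn (string-match "c[ad]*?a" s) (match-string 0 s))
   ;; "ca*ar" matches "caaar" once `a*' gives back one "a".
   (string-match "ca*ar" "caaar")
   ;; `+' needs at least one repetition; `?' allows none.
   (string-match "ca+r" "cr")
   (string-match "ca?r" "cr")
   ;; The regexp for a single backslash is written "\\\\" in Lisp.
   (string-match "\\\\" "a\\b")))
     @result{} ("cdaaada" "cda" 0 nil 0 1)
@end group
@end example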
494@strong{Please note:} For historical compatibility, special characters
495are treated as ordinary ones if they are in contexts where their special
496meanings make no sense. For example, @samp{*foo} treats @samp{*} as
497ordinary since there is no preceding expression on which the @samp{*}
498can act. It is poor practice to depend on this behavior; quote the
499special character anyway, regardless of where it appears.
500
501As a @samp{\} is not special inside a character alternative, it can
502never remove the special meaning of @samp{-} or @samp{]}. So you
503should not quote these characters when they have no special meaning
504either. This would not clarify anything, since backslashes can
505legitimately precede these characters where they @emph{have} special
506meaning, as in @samp{[^\]} (@code{"[^\\]"} for Lisp string syntax),
507which matches any single character except a backslash.
508
509In practice, most @samp{]} that occur in regular expressions close a
510character alternative and hence are special. However, occasionally a
511regular expression may try to match a complex pattern of literal
512@samp{[} and @samp{]}. In such situations, it sometimes may be
513necessary to carefully parse the regexp from the start to determine
514which square brackets enclose a character alternative. For example,
515@samp{[^][]]} consists of the complemented character alternative
516@samp{[^][]} (which matches any single character that is not a square
517bracket), followed by a literal @samp{]}.
518
519The exact rules are that at the beginning of a regexp, @samp{[} is
520special and @samp{]} not. This lasts until the first unquoted
521@samp{[}, after which we are in a character alternative; @samp{[} is
522no longer special (except when it starts a character class) but @samp{]}
523is special, unless it immediately follows the special @samp{[} or that
524@samp{[} followed by a @samp{^}. This lasts until the next special
525@samp{]} that does not end a character class. This ends the character
526alternative and restores the ordinary syntax of regular expressions;
527an unquoted @samp{[} is special again and a @samp{]} not.
528
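The bracket rules above are easier to follow with concrete cases.  The
following sketch applies three of the regexps discussed in this
section to short test strings; the test strings themselves are chosen
only for illustration.

@example
@group
(list
 ;; "[]a]" matches "]" or "a": a leading "]" is not special.
 (string-match "[]a]" "x]")
 ;; "[^][]]" is the alternative "[^][]" followed by a literal "]",
 ;; so it first matches the "a]" at index 2.
 (string-match "[^][]]" "[]a]")
 ;; "[^\\]" (the regexp [^\]) matches any character but a backslash.
 (string-match "[^\\]" "\\a"))
     @result{} (1 2 1)
@end group
@end example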
529@node Char Classes
530@subsubsection Character Classes
531@cindex character classes in regexp
532
533 Here is a table of the classes you can use in a character alternative,
534and what they mean:
535
536@table @samp
537@item [:ascii:]
538This matches any @acronym{ASCII} character (codes 0--127).
539@item [:alnum:]
540This matches any letter or digit. (At present, for multibyte
541characters, it matches anything that has word syntax.)
542@item [:alpha:]
543This matches any letter. (At present, for multibyte characters, it
544matches anything that has word syntax.)
545@item [:blank:]
546This matches space and tab only.
547@item [:cntrl:]
548This matches any @acronym{ASCII} control character.
549@item [:digit:]
550This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
551matches any digit, as well as @samp{+} and @samp{-}.
552@item [:graph:]
553This matches graphic characters---everything except @acronym{ASCII} control
554characters, space, and the delete character.
555@item [:lower:]
556This matches any lower-case letter, as determined by the current case
557table (@pxref{Case Tables}). If @code{case-fold-search} is
558non-@code{nil}, this also matches any upper-case letter.
559@item [:multibyte:]
560This matches any multibyte character (@pxref{Text Representations}).
561@item [:nonascii:]
562This matches any non-@acronym{ASCII} character.
563@item [:print:]
564This matches printing characters---everything except @acronym{ASCII} control
565characters and the delete character.
566@item [:punct:]
567This matches any punctuation character. (At present, for multibyte
568characters, it matches anything that has non-word syntax.)
569@item [:space:]
570This matches any character that has whitespace syntax
571(@pxref{Syntax Class Table}).
572@item [:unibyte:]
573This matches any unibyte character (@pxref{Text Representations}).
574@item [:upper:]
575This matches any upper-case letter, as determined by the current case
576table (@pxref{Case Tables}). If @code{case-fold-search} is
577non-@code{nil}, this also matches any lower-case letter.
578@item [:word:]
579This matches any character that has word syntax (@pxref{Syntax Class
580Table}).
581@item [:xdigit:]
582This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
583through @samp{f} and @samp{A} through @samp{F}.
584@end table
585
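A character class is always written inside a character alternative,
which gives the doubled brackets seen below.  This sketch tries two of
the classes from the table; the test strings are arbitrary.

@example
@group
(let ((s "x = -42;"))
  (list
   ;; A sign or digit, repeated: finds the signed number.
   (progn (string-match "[-+[:digit:]]+" s) (match-string 0 s))
   ;; Whole-string tests for hexadecimal digits.
   (string-match "\\`[[:xdigit:]]+\\'" "1aF")
   (string-match "\\`[[:xdigit:]]+\\'" "1aG")))
     @result{} ("-42" 0 nil)
@end group
@end example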
586@node Regexp Backslash
587@subsubsection Backslash Constructs in Regular Expressions
588@cindex backslash in regular expressions
589
590 For the most part, @samp{\} followed by any character matches only
591that character. However, there are several exceptions: certain
592sequences starting with @samp{\} that have special meanings. Here is
593a table of the special @samp{\} constructs.
594
595@table @samp
596@item \|
597@cindex @samp{|} in regexp
598@cindex regexp alternative
599specifies an alternative.
600Two regular expressions @var{a} and @var{b} with @samp{\|} in
601between form an expression that matches anything that either @var{a} or
602@var{b} matches.
603
604Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
605but no other string.
606
607@samp{\|} applies to the largest possible surrounding expressions. Only a
608surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
609@samp{\|}.
610
611If you need full backtracking capability to handle multiple uses of
612@samp{\|}, use the POSIX regular expression functions (@pxref{POSIX
613Regexps}).
614
615@item \@{@var{m}\@}
616is a postfix operator that repeats the previous pattern exactly @var{m}
617times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
618and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
619@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.
620
621@item \@{@var{m},@var{n}\@}
622is a more general postfix operator that specifies repetition with a
623minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
624is omitted, the minimum is 0; if @var{n} is omitted, there is no
625maximum.
626
627For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
628@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
629nothing else.@*
630@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}.@*
631@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}.@*
632@samp{\@{1,\@}} is equivalent to @samp{+}.
633
634@item \( @dots{} \)
635@cindex @samp{(} in regexp
636@cindex @samp{)} in regexp
637@cindex regexp grouping
638is a grouping construct that serves three purposes:
639
640@enumerate
641@item
642To enclose a set of @samp{\|} alternatives for other operations. Thus,
643the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
644or @samp{barx}.
645
646@item
647To enclose a complicated expression for the postfix operators @samp{*},
648@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
649@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
650number (zero or more) of @samp{na} strings.
651
652@item
653To record a matched substring for future reference with
654@samp{\@var{digit}} (see below).
655@end enumerate
656
657This last application is not a consequence of the idea of a
658parenthetical grouping; it is a separate feature that was assigned as a
659second meaning to the same @samp{\( @dots{} \)} construct because, in
660practice, there was usually no conflict between the two meanings. But
661occasionally there is a conflict, and that led to the introduction of
662shy groups.
663
@item \(?: @dots{} \)
@cindex shy groups
@cindex non-capturing group
@cindex unnumbered group
@cindex @samp{(?:} in regexp
is the @dfn{shy group} construct. A shy group serves the first two
purposes of an ordinary group (controlling the nesting of other
operators), but it does not get a number, so you cannot refer back to
its value with @samp{\@var{digit}}. Shy groups are particularly
useful for mechanically-constructed regular expressions, because they
can be added automatically without altering the numbering of ordinary,
non-shy groups.

Shy groups are also called @dfn{non-capturing} or @dfn{unnumbered
groups}.
679
@item \(?@var{num}: @dots{} \)
is the @dfn{explicitly numbered group} construct. Normal groups get
their number implicitly, based on their position, which can be
inconvenient. This construct allows you to force a particular group
number. There is no particular restriction on the numbering;
e.g., you can have several groups with the same number, in which case
the last one to match (i.e., the rightmost match) will win.
Implicitly numbered groups always get the smallest integer larger than
that of any previous group.
689
690@item \@var{digit}
691matches the same text that matched the @var{digit}th occurrence of a
692grouping (@samp{\( @dots{} \)}) construct.
693
694In other words, after the end of a group, the matcher remembers the
695beginning and end of the text matched by that group. Later on in the
696regular expression you can use @samp{\} followed by @var{digit} to
697match that same text, whatever it may have been.
698
699The strings matching the first nine grouping constructs appearing in
700the entire regular expression passed to a search or matching function
701are assigned numbers 1 through 9 in the order that the open
702parentheses appear in the regular expression. So you can use
703@samp{\1} through @samp{\9} to refer to the text matched by the
704corresponding grouping constructs.
705
706For example, @samp{\(.*\)\1} matches any newline-free string that is
707composed of two identical halves. The @samp{\(.*\)} matches the first
708half, which may be anything, but the @samp{\1} that follows must match
709the same exact text.
710
711If a @samp{\( @dots{} \)} construct matches more than once (which can
712happen, for instance, if it is followed by @samp{*}), only the last
713match is recorded.
714
If a particular grouping construct in the regular expression was never
matched---for instance, if it appears inside of an alternative that
wasn't used, or inside of a repetition that repeated zero times---then
the corresponding @samp{\@var{digit}} construct never matches
anything. To use an artificial example, @samp{\(foo\(b*\)\|lose\)\2}
cannot match @samp{lose}: the second alternative inside the larger
group matches it, but then @samp{\2} is undefined and can't match
anything. But it can match @samp{foobb}, because the first
alternative matches @samp{foob} and @samp{\2} matches @samp{b}.
724
725@item \w
726@cindex @samp{\w} in regexp
727matches any word-constituent character. The editor syntax table
728determines which characters these are. @xref{Syntax Tables}.
729
730@item \W
731@cindex @samp{\W} in regexp
732matches any character that is not a word constituent.
733
734@item \s@var{code}
735@cindex @samp{\s} in regexp
736matches any character whose syntax is @var{code}. Here @var{code} is a
737character that represents a syntax code: thus, @samp{w} for word
738constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
739etc. To represent whitespace syntax, use either @samp{-} or a space
740character. @xref{Syntax Class Table}, for a list of syntax codes and
741the characters that stand for them.
742
743@item \S@var{code}
744@cindex @samp{\S} in regexp
745matches any character whose syntax is not @var{code}.
746
@cindex category, regexp search for
@item \c@var{c}
matches any character whose category is @var{c}. Here @var{c} is a
character that represents a category: thus, @samp{c} for Chinese
characters or @samp{g} for Greek characters in the standard category
table. You can see the list of all the currently defined categories
with @kbd{M-x describe-categories @key{RET}}. You can also define
your own categories in addition to the standard ones using the
@code{define-category} function (@pxref{Categories}).
756
757@item \C@var{c}
758matches any character whose category is not @var{c}.
759@end table
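
  As a brief illustration of the backslash constructs above, the
following forms exercise an interval, a back reference, a shy group
and an explicitly numbered group with @code{string-match}
(@pxref{Regexp Search}). This is only a sketch; the sample strings
are arbitrary.

@example
@group
;; Interval: one or two of `a' or `d' between `c' and `r'.
(string-match "\\`c[ad]\\@{1,2\\@}r\\'" "cadr")
     @result{} 0
(string-match "\\`c[ad]\\@{1,2\\@}r\\'" "caddr")
     @result{} nil
@end group

@group
;; Back reference: two identical halves.
(string-match "\\`\\(.*\\)\\1\\'" "abcabc")
     @result{} 0
(match-string 1 "abcabc")
     @result{} "abc"
@end group

@group
;; The shy group gets no number, so the digits are group 1.
(string-match "\\(?:foo\\|bar\\)\\([0-9]+\\)" "bar42")
     @result{} 0
(match-string 1 "bar42")
     @result{} "42"
@end group

@group
;; Explicit numbering: the digits are recorded as group 3,
;; and the next implicit group becomes group 4.
(string-match "\\(?3:[0-9]+\\)-\\([a-z]+\\)" "12-ab")
     @result{} 0
(list (match-string 3 "12-ab") (match-string 4 "12-ab"))
     @result{} ("12" "ab")
@end group
@end example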
760
761 The following regular expression constructs match the empty string---that is,
762they don't use up any characters---but whether they match depends on the
763context. For all, the beginning and end of the accessible portion of
764the buffer are treated as if they were the actual beginning and end of
765the buffer.
766
767@table @samp
768@item \`
769@cindex @samp{\`} in regexp
770matches the empty string, but only at the beginning
771of the buffer or string being matched against.
772
773@item \'
774@cindex @samp{\'} in regexp
775matches the empty string, but only at the end of
776the buffer or string being matched against.
777
778@item \=
779@cindex @samp{\=} in regexp
780matches the empty string, but only at point.
781(This construct is not defined when matching against a string.)
782
783@item \b
784@cindex @samp{\b} in regexp
785matches the empty string, but only at the beginning or
786end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
787@samp{foo} as a separate word. @samp{\bballs?\b} matches
@samp{ball} or @samp{balls} as a separate word.
789
790@samp{\b} matches at the beginning or end of the buffer (or string)
791regardless of what text appears next to it.
792
793@item \B
794@cindex @samp{\B} in regexp
795matches the empty string, but @emph{not} at the beginning or
796end of a word, nor at the beginning or end of the buffer (or string).
797
798@item \<
799@cindex @samp{\<} in regexp
800matches the empty string, but only at the beginning of a word.
801@samp{\<} matches at the beginning of the buffer (or string) only if a
802word-constituent character follows.
803
804@item \>
805@cindex @samp{\>} in regexp
806matches the empty string, but only at the end of a word. @samp{\>}
807matches at the end of the buffer (or string) only if the contents end
808with a word-constituent character.
809
810@item \_<
811@cindex @samp{\_<} in regexp
812matches the empty string, but only at the beginning of a symbol. A
813symbol is a sequence of one or more word or symbol constituent
814characters. @samp{\_<} matches at the beginning of the buffer (or
815string) only if a symbol-constituent character follows.
816
817@item \_>
818@cindex @samp{\_>} in regexp
819matches the empty string, but only at the end of a symbol. @samp{\_>}
820matches at the end of the buffer (or string) only if the contents end
821with a symbol-constituent character.
822@end table
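
  The following sketch contrasts word boundaries with symbol
boundaries. It binds the standard syntax table explicitly, on the
assumption that @samp{-} is a symbol constituent there, so that the
results do not depend on the current major mode. The sample strings
are arbitrary.

@example
@group
(with-syntax-table (standard-syntax-table)
  (list (string-match "\\<bar\\>" "foo-bar bar")
        (string-match "\\_<bar\\_>" "foo-bar bar")
        (string-match "\\bfoo\\b" "a foo b")
        (string-match "\\bfoo\\b" "afoo b")))
     @result{} (4 8 2 nil)
@end group
@end example

@noindent
The word search finds the @samp{bar} inside @samp{foo-bar}, whereas the
symbol search skips it, because @samp{foo-bar} counts as a single
symbol.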
823
824@kindex invalid-regexp
  Not every string is a valid regular expression. For example, a string
that ends inside a character alternative without a terminating @samp{]}
is invalid, and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
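
  One way to guard against a malformed pattern, for instance one typed
by the user, is to catch the error with @code{condition-case}. This is
only a sketch; the symbol @code{bad-regexp} returned by the handler is
an arbitrary choice made for this example.

@example
@group
(condition-case nil
    (string-match "[a-z" "abc")
  (invalid-regexp 'bad-regexp))
     @result{} bad-regexp
@end group
@end example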
830
831@node Regexp Example
832@subsection Complex Regexp Example
833
834 Here is a complicated regexp which was formerly used by Emacs to
835recognize the end of a sentence together with any whitespace that
836follows. (Nowadays Emacs uses a similar but more complex default
837regexp constructed by the function @code{sentence-end}.
838@xref{Standard Regexps}.)
839
  Below, we show first the regexp as a string in Lisp syntax (to
distinguish spaces from tab characters), and then the result of
evaluating it. The string constant begins and ends with a
double-quote. @samp{\"} stands for a double-quote as part of the
string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
tab and @samp{\n} for a newline.
846
@example
@group
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
     @result{} "[.?!][]\"')@}]*\\($\\| $\\|	\\|@ @ \\)[ 	
]*"
@end group
@end example
854
855@noindent
In the output, tab and newline appear as themselves.
857
858 This regular expression contains four parts in succession and can be
859deciphered as follows:
860
861@table @code
862@item [.?!]
863The first part of the pattern is a character alternative that matches
864any one of three characters: period, question mark, and exclamation
865mark. The match must begin with one of these three characters. (This
866is one point where the new default regexp used by Emacs differs from
867the old. The new value also allows some non-@acronym{ASCII}
868characters that end a sentence without any following whitespace.)
869
870@item []\"')@}]*
871The second part of the pattern matches any closing braces and quotation
872marks, zero or more of them, that may follow the period, question mark
873or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
874a string. The @samp{*} at the end indicates that the immediately
875preceding regular expression (a character alternative, in this case) may be
876repeated zero or more times.
877
878@item \\($\\|@ $\\|\t\\|@ @ \\)
879The third part of the pattern matches the whitespace that follows the
880end of a sentence: the end of a line (optionally with a space), or a
881tab, or two spaces. The double backslashes mark the parentheses and
882vertical bars as regular expression syntax; the parentheses delimit a
883group and the vertical bars separate alternatives. The dollar sign is
884used to match the end of a line.
885
886@item [ \t\n]*
887Finally, the last part of the pattern matches any additional whitespace
888beyond the minimum needed to end a sentence.
889@end table
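
  To see the four parts work together, the old regexp can be tried on
a sample string. The string below is made up for this sketch.

@example
@group
(string-match
 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
 "Hello world.  Next")
     @result{} 11
(match-end 0)
     @result{} 14
@end group
@end example

@noindent
The match starts at the period and takes in the two spaces after it.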
890
891@node Regexp Functions
892@subsection Regular Expression Functions
893
894 These functions operate on regular expressions.
895
896@defun regexp-quote string
897This function returns a regular expression whose only exact match is
898@var{string}. Using this regular expression in @code{looking-at} will
899succeed only if the next characters in the buffer are @var{string};
900using it in a search function will succeed if the text being searched
contains @var{string}. @xref{Regexp Search}.
902
903This allows you to request an exact string match or search when calling
904a function that wants a regular expression.
905
@example
@group
(regexp-quote "^The cat$")
     @result{} "\\^The cat\\$"
@end group
@end example
912
913One use of @code{regexp-quote} is to combine an exact string match with
914context described as a regular expression. For example, this searches
915for the string that is the value of @var{string}, surrounded by
916whitespace:
917
@example
@group
(re-search-forward
 (concat "\\s-" (regexp-quote string) "\\s-"))
@end group
@end example
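
Quoting matters most when the string contains characters that are
special in regular expressions. In this sketch, the version strings
are arbitrary examples:

@example
@group
(string-match "1.5" "version 105")
     @result{} 8
(string-match (regexp-quote "1.5") "version 105")
     @result{} nil
(string-match (regexp-quote "1.5") "version 1.5")
     @result{} 8
@end group
@end example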
924@end defun
925
926@defun regexp-opt strings &optional paren
This function returns an efficient regular expression that will match
any of the strings in the list @var{strings}. This is useful when you
need to make matching or searching as fast as possible---for example,
for Font Lock mode@footnote{Note that @code{regexp-opt} does not
guarantee that its result is absolutely the most efficient form
possible. A hand-tuned regular expression can sometimes be slightly
more efficient, but is almost never worth the effort.}.
@c E.g., see http://debbugs.gnu.org/2816
935
If the optional argument @var{paren} is non-@code{nil}, then the
returned regular expression is always enclosed by at least one
parentheses-grouping construct. If @var{paren} is @code{words}, then
that construct is additionally surrounded by @samp{\<} and @samp{\>};
alternatively, if @var{paren} is @code{symbols}, then that construct
is additionally surrounded by @samp{\_<} and @samp{\_>}
(@code{symbols} is often appropriate when matching
programming-language keywords and the like).
944
945This simplified definition of @code{regexp-opt} produces a
946regular expression which is equivalent to the actual value
947(but not as efficient):
948
@example
(defun regexp-opt (strings &optional paren)
  (let ((open-paren (if paren "\\(" ""))
        (close-paren (if paren "\\)" "")))
    (concat open-paren
            (mapconcat 'regexp-quote strings "\\|")
            close-paren)))
@end example
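
Since the exact string returned by @code{regexp-opt} may differ between
Emacs versions, it is usually safer to rely on what the result matches
than on its text. A small sketch, with an arbitrary keyword list:

@example
@group
(let ((re (regexp-opt '("when" "unless" "while") 'words)))
  (list (string-match-p re "do this unless that")
        (string-match-p re "meanwhile")))
     @result{} (8 nil)
@end group
@end example

@noindent
The second search fails because @code{words} requires the keyword to
start at a word boundary.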
957@end defun
958
959@defun regexp-opt-depth regexp
This function returns the total number of grouping constructs
(parenthesized expressions) in @var{regexp}. This does not include
shy groups (@pxref{Regexp Backslash}).
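
For instance, with a hand-written regexp (chosen only for this sketch)
that contains two ordinary groups and one shy group:

@example
@group
(regexp-opt-depth "\\(a\\)\\(?:b\\)\\(c\\)")
     @result{} 2
@end group
@end example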
963@end defun
964
965@c Supposedly an internal regexp-opt function, but table.el uses it at least.
966@defun regexp-opt-charset chars
967This function returns a regular expression matching a character in the
968list of characters @var{chars}.
969
@example
(regexp-opt-charset '(?a ?b ?c ?d ?e))
     @result{} "[a-e]"
@end example
974@end defun
975
976@c Internal functions: regexp-opt-group
977
978@node Regexp Search
979@section Regular Expression Searching
980@cindex regular expression searching
981@cindex regexp searching
982@cindex searching for regexp
983
984 In GNU Emacs, you can search for the next match for a regular
985expression either incrementally or not. For incremental search
986commands, see @ref{Regexp Search, , Regular Expression Search, emacs,
987The GNU Emacs Manual}. Here we describe only the search functions
988useful in programs. The principal one is @code{re-search-forward}.
989
990 These search functions convert the regular expression to multibyte if
991the buffer is multibyte; they convert the regular expression to unibyte
992if the buffer is unibyte. @xref{Text Representations}.
993
994@deffn Command re-search-forward regexp &optional limit noerror repeat
995This function searches forward in the current buffer for a string of
996text that is matched by the regular expression @var{regexp}. The
997function skips over any amount of text that is not matched by
998@var{regexp}, and leaves point at the end of the first match found.
999It returns the new value of point.
1000
1001If @var{limit} is non-@code{nil}, it must be a position in the current
1002buffer. It specifies the upper bound to the search. No match
1003extending after that position is accepted.
1004
1005If @var{repeat} is supplied, it must be a positive number; the search
1006is repeated that many times; each repetition starts at the end of the
1007previous match. If all these successive searches succeed, the search
1008succeeds, moving point and returning its new value. Otherwise the
1009search fails. What @code{re-search-forward} does when the search
1010fails depends on the value of @var{noerror}:
1011
1012@table @asis
1013@item @code{nil}
1014Signal a @code{search-failed} error.
1015@item @code{t}
1016Do nothing and return @code{nil}.
1017@item anything else
1018Move point to @var{limit} (or the end of the accessible portion of the
1019buffer) and return @code{nil}.
1020@end table
1021
1022In the following example, point is initially before the @samp{T}.
1023Evaluating the search call moves point to the end of that line (between
1024the @samp{t} of @samp{hat} and the newline).
1025
@example
@group
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------
@end group

@group
(re-search-forward "[a-z]+" nil t 5)
     @result{} 27

---------- Buffer: foo ----------
I read "The cat in the hat@point{}
comes back" twice.
---------- Buffer: foo ----------
@end group
@end example
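
A common idiom is to call @code{re-search-forward} in a loop with
@var{noerror} set to @code{t}, so that the loop ends quietly when no
further match exists. The following sketch collects every run of
digits in a temporary buffer; the buffer text is arbitrary.

@example
@group
(with-temp-buffer
  (insert "a1 b22 c333")
  (goto-char (point-min))
  (let ((found '()))
    (while (re-search-forward "[0-9]+" nil t)
      (push (match-string 0) found))
    (nreverse found)))
     @result{} ("1" "22" "333")
@end group
@end example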
1044@end deffn
1045
1046@deffn Command re-search-backward regexp &optional limit noerror repeat
1047This function searches backward in the current buffer for a string of
1048text that is matched by the regular expression @var{regexp}, leaving
1049point at the beginning of the first text found.
1050
1051This function is analogous to @code{re-search-forward}, but they are not
1052simple mirror images. @code{re-search-forward} finds the match whose
1053beginning is as close as possible to the starting point. If
1054@code{re-search-backward} were a perfect mirror image, it would find the
1055match whose end is as close as possible. However, in fact it finds the
1056match whose beginning is as close as possible (and yet ends before the
1057starting point). The reason for this is that matching a regular
1058expression at a given spot always works from beginning to end, and
1059starts at a specified beginning position.
1060
1061A true mirror-image of @code{re-search-forward} would require a special
1062feature for matching regular expressions from end to beginning. It's
1063not worth the trouble of implementing that.
1064@end deffn
1065
1066@defun string-match regexp string &optional start
1067This function returns the index of the start of the first match for
1068the regular expression @var{regexp} in @var{string}, or @code{nil} if
1069there is no match. If @var{start} is non-@code{nil}, the search starts
1070at that index in @var{string}.
1071
1072For example,
1073
@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly.")
     @result{} 4
@end group
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group
@end example
1086
1087@noindent
1088The index of the first character of the
1089string is 0, the index of the second character is 1, and so on.
1090
1091After this function returns, the index of the first character beyond
1092the match is available as @code{(match-end 0)}. @xref{Match Data}.
1093
@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group

@group
(match-end 0)
     @result{} 32
@end group
@end example
1106@end defun
1107
@defun string-match-p regexp string &optional start
This predicate function does what @code{string-match} does, but it
avoids modifying the match data.
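
The following sketch shows that the match data from an earlier
@code{string-match} call survives a later @code{string-match-p} call.
The sample string is arbitrary.

@example
@group
(list (string-match "quick" "The quick brown fox")
      (string-match-p "fox" "The quick brown fox")
      (match-beginning 0))
     @result{} (4 16 4)
@end group
@end example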
1111@end defun
1112
1113@defun looking-at regexp
1114This function determines whether the text in the current buffer directly
1115following point matches the regular expression @var{regexp}. ``Directly
1116following'' means precisely that: the search is ``anchored'' and it can
1117succeed only starting with the first character following point. The
1118result is @code{t} if so, @code{nil} otherwise.
1119
This function does not move point, but it does update the match data.
@xref{Match Data}. If you need to test for a match without modifying
the match data, use @code{looking-at-p}, described below.
1123
1124In this example, point is located directly before the @samp{T}. If it
1125were anywhere else, the result would be @code{nil}.
1126
@example
@group
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(looking-at "The cat in the hat$")
     @result{} t
@end group
@end example
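
Because @code{looking-at} sets the match data, it can be combined with
@code{match-string} to pick apart the text at point. A minimal sketch,
using made-up buffer contents:

@example
@group
(with-temp-buffer
  (insert "foo = 42")
  (goto-char (point-min))
  (when (looking-at "\\([a-z]+\\) = \\([0-9]+\\)")
    (list (match-string 1) (match-string 2))))
     @result{} ("foo" "42")
@end group
@end example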
1138@end defun
1139
@defun looking-back regexp &optional limit greedy
This function returns @code{t} if @var{regexp} matches the text
immediately before point (i.e., ending at point), and @code{nil} otherwise.
1143
1144Because regular expression matching works only going forward, this is
1145implemented by searching backwards from point for a match that ends at
1146point. That can be quite slow if it has to search a long distance.
1147You can bound the time required by specifying @var{limit}, which says
1148not to search before @var{limit}. In this case, the match that is
found must begin at or after @var{limit}. Here's an example:

@example
@group
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(looking-back "read \"" 3)
     @result{} t
(looking-back "read \"" 4)
     @result{} nil
@end group
@end example

If @var{greedy} is non-@code{nil}, this function extends the match
backwards as far as possible, stopping when a single additional
previous character cannot be part of a match for @var{regexp}. When the
match is extended, its starting position is allowed to occur before
@var{limit}.
1171@c http://debbugs.gnu.org/5689
1172As a general recommendation, try to avoid using @code{looking-back}
1173wherever possible, since it is slow. For this reason, there are no
1174plans to add a @code{looking-back-p} function.
1175@end defun
1176
1177@defun looking-at-p regexp
1178This predicate function works like @code{looking-at}, but without
1179updating the match data.
1180@end defun
1181
1182@defvar search-spaces-regexp
1183If this variable is non-@code{nil}, it should be a regular expression
1184that says how to search for whitespace. In that case, any group of
1185spaces in a regular expression being searched for stands for use of
1186this regular expression. However, spaces inside of constructs such as
1187@samp{[@dots{}]} and @samp{*}, @samp{+}, @samp{?} are not affected by
1188@code{search-spaces-regexp}.
1189
Since this variable affects all regular expression search and match
constructs, you should bind it temporarily around as small a part of
the code as possible.
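
For example, this sketch binds the variable around a single call only,
so that one space in the pattern matches any run of whitespace. The
pattern and the subject string are arbitrary, and the second form
assumes the variable has its default value of @code{nil}.

@example
@group
(let ((search-spaces-regexp "[ \t\n]+"))
  (string-match "foo bar" "foo \t\n bar"))
     @result{} 0
(string-match "foo bar" "foo \t\n bar")
     @result{} nil
@end group
@end example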
1193@end defvar
1194
1195@node POSIX Regexps
1196@section POSIX Regular Expression Searching
1197
@cindex backtracking and POSIX regular expressions
1199 The usual regular expression functions do backtracking when necessary
1200to handle the @samp{\|} and repetition constructs, but they continue
1201this only until they find @emph{some} match. Then they succeed and
1202report the first match found.
1203
  This section describes alternative search functions which perform the
full backtracking specified by the POSIX standard for regular expression
matching. They continue backtracking until they have tried all
possibilities and found all matches, so they can report the longest
match, as required by POSIX@. This is much slower, so use these
functions only when you really need the longest match.
1210
  The POSIX search and match functions do not properly support the
non-greedy repetition operators (@pxref{Regexp Special, non-greedy}).
This is because POSIX backtracking conflicts with the semantics of
non-greedy repetition.

@deffn Command posix-search-forward regexp &optional limit noerror repeat
This is like @code{re-search-forward} except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
@end deffn

@deffn Command posix-search-backward regexp &optional limit noerror repeat
This is like @code{re-search-backward} except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
@end deffn
1227
1228@defun posix-looking-at regexp
1229This is like @code{looking-at} except that it performs the full
1230backtracking specified by the POSIX standard for regular expression
1231matching.
1232@end defun
1233
1234@defun posix-string-match regexp string &optional start
1235This is like @code{string-match} except that it performs the full
1236backtracking specified by the POSIX standard for regular expression
1237matching.
1238@end defun
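
  The difference from the ordinary functions shows up when an earlier
alternative matches a shorter string than a later one. In this sketch,
the pattern and the subject string are arbitrary:

@example
@group
(progn (string-match "a\\|ab" "abc")
       (match-end 0))
     @result{} 1
(progn (posix-string-match "a\\|ab" "abc")
       (match-end 0))
     @result{} 2
@end group
@end example

@noindent
The ordinary function stops at the first alternative that succeeds,
while the POSIX variant keeps going and reports the longer match.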
1239
1240@node Match Data
1241@section The Match Data
1242@cindex match data
1243
1244 Emacs keeps track of the start and end positions of the segments of
1245text found during a search; this is called the @dfn{match data}.
1246Thanks to the match data, you can search for a complex pattern, such
1247as a date in a mail message, and then extract parts of the match under
1248control of the pattern.
1249
1250 Because the match data normally describe the most recent search only,
1251you must be careful not to do another search inadvertently between the
1252search you wish to refer back to and the use of the match data. If you
1253can't avoid another intervening search, you must save and restore the
1254match data around it, to prevent it from being overwritten.
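
  One way to do this is the @code{save-match-data} macro
(@pxref{Saving Match Data}). In the following sketch, with arbitrary
sample strings, the inner search does not disturb the match data of
the outer one:

@example
@group
(progn
  (string-match "b+" "aabbbcc")
  (save-match-data
    (string-match "x" "xyz"))
  (list (match-beginning 0) (match-end 0)))
     @result{} (2 5)
@end group
@end example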
1255
  Notice that all functions are allowed to overwrite the match data
unless they're explicitly documented not to do so. A consequence is
that functions that are run implicitly in the background
(@pxref{Timers}, and @ref{Idle Timers}) should likely save and restore
the match data explicitly.
1261
@menu
* Replacing Match::       Replacing a substring that was matched.
* Simple Match Data::     Accessing single items of match data,
                            such as where a particular subexpression started.
* Entire Match Data::     Accessing the entire match data at once, as a list.
* Saving Match Data::     Saving and restoring the match data.
@end menu
1269
1270@node Replacing Match
1271@subsection Replacing the Text that Matched
1272@cindex replace matched text
1273
1274 This function replaces all or part of the text matched by the last
1275search. It works by means of the match data.
1276
1277@cindex case in replacements
1278@defun replace-match replacement &optional fixedcase literal string subexp
1279This function performs a replacement operation on a buffer or string.
1280
1281If you did the last search in a buffer, you should omit the
1282@var{string} argument or specify @code{nil} for it, and make sure that
1283the current buffer is the one in which you performed the last search.
1284Then this function edits the buffer, replacing the matched text with
1285@var{replacement}. It leaves point at the end of the replacement
1286text, and returns @code{t}.
1287
1288If you performed the last search on a string, pass the same string as
1289@var{string}. Then this function returns a new string, in which the
1290matched text is replaced by @var{replacement}.
1291
1292If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
1293the replacement text without case conversion; otherwise, it converts
1294the replacement text depending upon the capitalization of the text to
1295be replaced. If the original text is all upper case, this converts
1296the replacement text to upper case. If all words of the original text
1297are capitalized, this capitalizes all the words of the replacement
1298text. If all the words are one-letter and they are all upper case,
1299they are treated as capitalized words rather than all-upper-case
1300words.
1301
1302If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1303exactly as it is, the only alterations being case changes as needed.
1304If it is @code{nil} (the default), then the character @samp{\} is treated
1305specially. If a @samp{\} appears in @var{replacement}, then it must be
1306part of one of the following sequences:
1307
@table @asis
@item @samp{\&}
@cindex @samp{&} in replacement
This stands for the entire text being replaced.

@item @samp{\@var{n}}, where @var{n} is a digit
@cindex @samp{\@var{n}} in replacement
This stands for the text that matched the @var{n}th subexpression in
the original regexp. Subexpressions are those expressions grouped
inside @samp{\(@dots{}\)}. If the @var{n}th subexpression never
matched, an empty string is substituted.

@item @samp{\\}
@cindex @samp{\} in replacement
This stands for a single @samp{\} in the replacement text.

@item @samp{\?}
This stands for itself (for compatibility with @code{replace-regexp}
and related commands; @pxref{Regexp Replace,,, emacs, The GNU
Emacs Manual}).
@end table
1329
1330@noindent
1331Any other character following @samp{\} signals an error.
1332
1333The substitutions performed by @samp{\&} and @samp{\@var{n}} occur
1334after case conversion, if any. Therefore, the strings they substitute
1335are never case-converted.
1336
1337If @var{subexp} is non-@code{nil}, that says to replace just
1338subexpression number @var{subexp} of the regexp that was matched, not
1339the entire match. For example, after matching @samp{foo \(ba*r\)},
1340calling @code{replace-match} with 1 as @var{subexp} means to replace
1341just the text that matched @samp{\(ba*r\)}.
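
The following sketch applies @code{replace-match} to strings. The
sample strings are arbitrary, and @code{case-fold-search} is bound
explicitly so that the second form does not depend on the user's
settings.

@example
@group
;; Swap two groups; FIXEDCASE is t, so no case conversion occurs.
(let ((s "foo bar"))
  (string-match "\\(foo\\) \\(bar\\)" s)
  (replace-match "\\2 \\1" t nil s))
     @result{} "bar foo"
@end group

@group
;; The matched text is all upper case, so the replacement is too.
(let ((case-fold-search t)
      (s "FOO"))
  (string-match "foo" s)
  (replace-match "bar" nil nil s))
     @result{} "BAR"
@end group

@group
;; Replace only subexpression 1, inserting the text literally.
(let ((s "foo baaar"))
  (string-match "foo \\(ba*r\\)" s)
  (replace-match "X" t t s 1))
     @result{} "foo X"
@end group
@end example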
1342@end defun
1343
1344@defun match-substitute-replacement replacement &optional fixedcase literal string subexp
This function returns the text that would be inserted into the buffer
by @code{replace-match}, but without modifying the buffer. It is
useful if you want to present the user with the actual replacement
result, with constructs like @samp{\@var{n}} or @samp{\&} replaced by
the matched groups. Arguments @var{replacement} and optional
@var{fixedcase}, @var{literal}, @var{string} and @var{subexp} have the
same meaning as for @code{replace-match}.
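
For example, in this sketch (with an arbitrary sample string) only the
replacement text is returned, and the original string is left
untouched:

@example
@group
(let ((s "foo bar"))
  (string-match "\\(foo\\) \\(bar\\)" s)
  (list (match-substitute-replacement "\\2-\\1" t nil s)
        s))
     @result{} ("bar-foo" "foo bar")
@end group
@end example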
1352@end defun
1353
1354@node Simple Match Data
1355@subsection Simple Match Data Access
1356
1357 This section explains how to use the match data to find out what was
1358matched by the last search or match operation, if it succeeded.
1359
1360 You can ask about the entire matching text, or about a particular
1361parenthetical subexpression of a regular expression. The @var{count}
1362argument in the functions below specifies which. If @var{count} is
1363zero, you are asking about the entire match. If @var{count} is
1364positive, it specifies which subexpression you want.
1365
1366 Recall that the subexpressions of a regular expression are those
1367expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
1368@var{count}th subexpression is found by counting occurrences of
1369@samp{\(} from the beginning of the whole regular expression. The first
1370subexpression is numbered 1, the second 2, and so on. Only regular
1371expressions can have subexpressions---after a simple string search, the
1372only information available is about the entire match.
1373
  Every successful search sets the match data. Therefore, you should
query the match data immediately after searching, before calling any
other function that might perform another search. Alternatively, you
may save and restore the match data (@pxref{Saving Match Data}) around
the call to functions that could perform another search. Or use the
functions that explicitly do not modify the match data;
e.g., @code{string-match-p}.

@c This is an old comment and presumably there is no prospect of this
@c changing now. But still the advice stands.
  A search which fails may or may not alter the match data. In the
current implementation, it does not, but we may change it in the
future. Don't try to rely on the value of the match data after a
failing search.
1388
1389@defun match-string count &optional in-string
1390This function returns, as a string, the text matched in the last search
1391or match operation. It returns the entire text if @var{count} is zero,
1392or just the portion corresponding to the @var{count}th parenthetical
1393subexpression, if @var{count} is positive.
1394
1395If the last such operation was done against a string with
1396@code{string-match}, then you should pass the same string as the
1397argument @var{in-string}. After a buffer search or match,
1398you should omit @var{in-string} or pass @code{nil} for it; but you
1399should make sure that the current buffer when you call
1400@code{match-string} is the one in which you did the searching or
fee88ca0 1401matching. Failure to follow this advice will lead to incorrect results.
b8d4c8d0
GM
1402
1403The value is @code{nil} if @var{count} is out of range, or for a
1404subexpression inside a @samp{\|} alternative that wasn't used or a
1405repetition that repeated zero times.
1406@end defun
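
The @code{nil} case for an unused alternative can be seen in a small sketch like the one below. The function name @code{demo-unused-alternative} is an assumption made for this illustration only.

```elisp
(defun demo-unused-alternative (text)
  "Match TEXT against two alternatives; return the whole match and both groups."
  (when (string-match "\\(dog\\)\\|\\(cat\\)" text)
    (list (match-string 0 text)     ; the entire match
          (match-string 1 text)     ; nil unless the first alternative matched
          (match-string 2 text))))  ; nil unless the second alternative matched

(demo-unused-alternative "cat")
;; => ("cat" nil "cat")
(demo-unused-alternative "dog")
;; => ("dog" "dog" nil)
```

Code that inspects several subexpressions should therefore test each value for @code{nil} before using it as a string.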
1407
1408@defun match-string-no-properties count &optional in-string
1409This function is like @code{match-string} except that the result
1410has no text properties.
1411@end defun
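
A short sketch of the difference between the two functions, assuming a string that carries a @code{face} property (the function name @code{demo-match-properties} is hypothetical):

```elisp
(defun demo-match-properties ()
  "Return the text properties of a match, with and without stripping them."
  (let ((text (propertize "The quick fox" 'face 'bold)))
    (string-match "quick" text)
    (list
     ;; `match-string' keeps the properties of the matched text.
     (text-properties-at 0 (match-string 0 text))
     ;; `match-string-no-properties' returns a plain string.
     (text-properties-at 0 (match-string-no-properties 0 text)))))

(demo-match-properties)
;; => ((face bold) nil)
```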
1412
1413@defun match-beginning count
fee88ca0 1414This function returns the position of the start of the text matched by the
b8d4c8d0
GM
1415last regular expression searched for, or a subexpression of it.
1416
1417If @var{count} is zero, then the value is the position of the start of
1418the entire match. Otherwise, @var{count} specifies a subexpression in
1419the regular expression, and the value of the function is the starting
1420position of the match for that subexpression.
1421
1422The value is @code{nil} for a subexpression inside a @samp{\|}
1423alternative that wasn't used or a repetition that repeated zero times.
1424@end defun
1425
1426@defun match-end count
1427This function is like @code{match-beginning} except that it returns the
1428position of the end of the match, rather than the position of the
1429beginning.
1430@end defun
1431
1432 Here is an example of using the match data, with a comment showing the
1433positions within the text:
1434
1435@example
1436@group
1437(string-match "\\(qu\\)\\(ick\\)"
1438              "The quick fox jumped quickly.")
1439              ;0123456789
1440     @result{} 4
1441@end group
1442
1443@group
1444(match-string 0 "The quick fox jumped quickly.")
1445 @result{} "quick"
1446(match-string 1 "The quick fox jumped quickly.")
1447 @result{} "qu"
1448(match-string 2 "The quick fox jumped quickly.")
1449 @result{} "ick"
1450@end group
1451
1452@group
1453(match-beginning 1) ; @r{The beginning of the match}
1454 @result{} 4 ; @r{with @samp{qu} is at index 4.}
1455@end group
1456
1457@group
1458(match-beginning 2) ; @r{The beginning of the match}
1459 @result{} 6 ; @r{with @samp{ick} is at index 6.}
1460@end group
1461
1462@group
1463(match-end 1) ; @r{The end of the match}
1464 @result{} 6 ; @r{with @samp{qu} is at index 6.}
1465
1466(match-end 2) ; @r{The end of the match}
1467 @result{} 9 ; @r{with @samp{ick} is at index 9.}
1468@end group
1469@end example
1470
1471 Here is another example. Point is initially located at the beginning
1472of the line. Searching moves point to between the space and the word
1473@samp{in}. The beginning of the entire match is at the 9th character of
1474the buffer (@samp{T}), and the beginning of the match for the first
1475subexpression is at the 13th character (@samp{c}).
1476
1477@example
1478@group
1479(list
1480 (re-search-forward "The \\(cat \\)")
1481 (match-beginning 0)
1482 (match-beginning 1))
1899a5d0 1483 @result{} (17 9 13)
b8d4c8d0
GM
1484@end group
1485
1486@group
1487---------- Buffer: foo ----------
1488I read "The cat @point{}in the hat comes back" twice.
1489        ^   ^
1490        9  13
1491---------- Buffer: foo ----------
1492@end group
1493@end example
1494
1495@noindent
1496(In this case, the index returned is a buffer position; the first
1497character of the buffer counts as 1.)
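
The buffer example above can be reproduced in a self-contained way with a temporary buffer. This is only a sketch; the function name @code{demo-buffer-match-positions} is hypothetical.

```elisp
(defun demo-buffer-match-positions ()
  "Search a temporary buffer and return point plus some match data."
  (with-temp-buffer
    (insert "I read \"The cat in the hat comes back\" twice.")
    (goto-char (point-min))
    (list (re-search-forward "The \\(cat \\)") ; position after the match
          (match-beginning 0)                  ; start of the entire match
          (match-beginning 1)                  ; start of subexpression 1
          ;; No IN-STRING argument: the match was made in the current buffer.
          (match-string 1))))

(demo-buffer-match-positions)
;; => (17 9 13 "cat ")
```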
1498
1499@node Entire Match Data
1500@subsection Accessing the Entire Match Data
1501
1502 The functions @code{match-data} and @code{set-match-data} read or
1503write the entire match data, all at once.
1504
1505@defun match-data &optional integers reuse reseat
1506This function returns a list of positions (markers or integers) that
fee88ca0 1507record all the information on the text that the last search matched.
b8d4c8d0
GM
1508Element zero is the position of the beginning of the match for the
1509whole expression; element one is the position of the end of the match
1510for the expression. The next two elements are the positions of the
1511beginning and end of the match for the first subexpression, and so on.
1512In general, element
1513@ifnottex
1514number 2@var{n}
1515@end ifnottex
1516@tex
1517number {\mathsurround=0pt $2n$}
1518@end tex
1519corresponds to @code{(match-beginning @var{n})}; and
1520element
1521@ifnottex
1522number 2@var{n} + 1
1523@end ifnottex
1524@tex
1525number {\mathsurround=0pt $2n+1$}
1526@end tex
1527corresponds to @code{(match-end @var{n})}.
1528
1529Normally all the elements are markers or @code{nil}, but if
1530@var{integers} is non-@code{nil}, that means to use integers instead
1531of markers. (In that case, the buffer itself is appended as an
1532additional element at the end of the list, to facilitate complete
1533restoration of the match data.) If the last match was done on a
1534string with @code{string-match}, then integers are always used,
1535since markers can't point into a string.
1536
1537If @var{reuse} is non-@code{nil}, it should be a list. In that case,
1538@code{match-data} stores the match data in @var{reuse}. That is,
1539@var{reuse} is destructively modified. @var{reuse} does not need to
1540have the right length. If it is not long enough to contain the match
1541data, it is extended. If it is too long, the length of @var{reuse}
1542stays the same, but the elements that were not used are set to
1543@code{nil}. The purpose of this feature is to reduce the need for
1544garbage collection.
1545
1546If @var{reseat} is non-@code{nil}, all markers on the @var{reuse} list
1547are reseated to point to nowhere.
1548
1549As always, there must be no possibility of intervening searches between
1550the call to a search function and the call to @code{match-data} that is
1551intended to access the match data for that search.
1552
1553@example
1554@group
1555(match-data)
1556     @result{} (#<marker at 9 in foo>
1557         #<marker at 17 in foo>
1558         #<marker at 13 in foo>
1559         #<marker at 17 in foo>)
1560@end group
1561@end example
1562@end defun
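
The following sketch illustrates the @var{integers} argument and the string case described above. Both function names are hypothetical and exist only for this illustration.

```elisp
(defun demo-match-data-as-integers ()
  "Return integer match data for a buffer search, split from the buffer element."
  (with-temp-buffer
    (insert "I read \"The cat in the hat comes back\" twice.")
    (goto-char (point-min))
    (re-search-forward "The \\(cat \\)")
    (let ((data (match-data t)))
      ;; With INTEGERS non-nil, the searched buffer is the final element.
      (list (butlast data)
            (bufferp (car (last data)))))))

(demo-match-data-as-integers)
;; => ((9 17 13 17) t)

(defun demo-string-match-data ()
  "Return the match data after a `string-match'; it holds integers."
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (match-data))

(demo-string-match-data)
;; => (4 9 4 6 6 9)
```

In both results, element 2@var{n} is @code{(match-beginning @var{n})} and element 2@var{n} + 1 is @code{(match-end @var{n})}.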
1563
1564@defun set-match-data match-list &optional reseat
1565This function sets the match data from the elements of @var{match-list},
1566which should be a list that was the value of a previous call to
1567@code{match-data}. (More precisely, anything that has the same format
1568will work.)
1569
1570If @var{match-list} refers to a buffer that doesn't exist, you don't get
1571an error; that sets the match data in a meaningless but harmless way.
1572
1573If @var{reseat} is non-@code{nil}, all markers on the @var{match-list} list
1574are reseated to point to nowhere.
1575
fee88ca0 1576@c TODO Make it properly obsolete.
b8d4c8d0
GM
1577@findex store-match-data
1578@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
1579@end defun
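
A minimal sketch of a round trip through @code{match-data} and @code{set-match-data} (the function name @code{demo-restore-match-data} is hypothetical):

```elisp
(defun demo-restore-match-data ()
  "Overwrite the match data, restore it, and return both match beginnings."
  (let ((text "The quick fox")
        saved)
    (string-match "quick" text)
    (setq saved (match-data))
    ;; A later search overwrites the match data.
    (string-match "fox" text)
    (let ((clobbered (match-beginning 0)))
      ;; Restore the data recorded for the first search.
      (set-match-data saved)
      (list clobbered (match-beginning 0)))))

(demo-restore-match-data)
;; => (10 4)
```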
1580
1581@node Saving Match Data
1582@subsection Saving and Restoring the Match Data
1583
fee88ca0 1584 When you call a function that may search, you may need to save
b8d4c8d0
GM
1585and restore the match data around that call, if you want to preserve the
1586match data from an earlier search for later use. Here is an example
1587that shows the problem that arises if you fail to save the match data:
1588
1589@example
1590@group
1591(re-search-forward "The \\(cat \\)")
1592 @result{} 48
fee88ca0 1593(foo) ; @r{@code{foo} does more searching.}
b8d4c8d0
GM
1594(match-end 0)
1595 @result{} 61 ; @r{Unexpected result---not 48!}
1596@end group
1597@end example
1598
1599 You can save and restore the match data with @code{save-match-data}:
1600
1601@defmac save-match-data body@dots{}
1602This macro executes @var{body}, saving and restoring the match
1603data around it. The return value is the value of the last form in
1604@var{body}.
1605@end defmac
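
As a sketch of typical usage (the function name @code{demo-save-match-data} is hypothetical), wrapping an inner search in @code{save-match-data} keeps the outer match data available afterwards:

```elisp
(defun demo-save-match-data ()
  "Run an inner search without losing the outer match data."
  (let ((text "The quick fox"))
    (string-match "\\(qu\\)\\(ick\\)" text)
    (save-match-data
      ;; This search would otherwise replace the match data.
      (string-match "fox" text))
    ;; Subexpression 2 of the first search is still available.
    (match-string 2 text)))

(demo-save-match-data)
;; => "ick"
```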
1606
1607 You could use @code{set-match-data} together with @code{match-data} to
1608imitate the effect of the macro @code{save-match-data}.  Here is
1609how:
1610
1611@example
1612@group
1613(let ((data (match-data)))
1614  (unwind-protect
1615      @dots{}   ; @r{Ok to change the original match data.}
1616    (set-match-data data)))
1617@end group
1618@end example
1619
1620 Emacs automatically saves and restores the match data when it runs
1621process filter functions (@pxref{Filter Functions}) and process
1622sentinels (@pxref{Sentinels}).
1623
1624@ignore
1625 Here is a function which restores the match data provided the buffer
1626associated with it still exists.
1627
1628@smallexample
1629@group
1630(defun restore-match-data (data)
1631@c It is incorrect to split the first line of a doc string.
1632@c If there's a problem here, it should be solved in some other way.
1633 "Restore the match data DATA unless the buffer is missing."
1634 (catch 'foo
1635 (let ((d data))
1636@end group
1637 (while d
1638 (and (car d)
1639 (null (marker-buffer (car d)))
1640@group
1641 ;; @file{match-data} @r{buffer is deleted.}
1642 (throw 'foo nil))
1643 (setq d (cdr d)))
1644 (set-match-data data))))
1645@end group
1646@end smallexample
1647@end ignore
1648
1649@node Search and Replace
1650@section Search and Replace
1651@cindex replacement after search
1652@cindex searching and replacing
1653
1654 If you want to find all matches for a regexp in part of the buffer,
1655and replace them, the best way is to write an explicit loop using
1656@code{re-search-forward} and @code{replace-match}, like this:
1657
1658@example
1659(while (re-search-forward "foo[ \t]+bar" nil t)
1660  (replace-match "foobar"))
1661@end example
1662
1663@noindent
1664@xref{Replacing Match,, Replacing the Text that Matched}, for a
1665description of @code{replace-match}.
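
For experimenting with the loop outside a real buffer, it can be wrapped in a temporary buffer as in this sketch. The function name @code{demo-join-foo-bar} is an assumption made for illustration.

```elisp
(defun demo-join-foo-bar (text)
  "Return TEXT with the whitespace removed between `foo' and `bar'."
  (with-temp-buffer
    (insert text)
    ;; The search starts at point, so go back to the beginning first.
    (goto-char (point-min))
    ;; NOERROR is t, so the loop ends quietly when no match remains.
    (while (re-search-forward "foo[ \t]+bar" nil t)
      (replace-match "foobar"))
    (buffer-string)))

(demo-join-foo-bar "foo   bar, foo\tbar")
;; => "foobar, foobar"
```

To restrict the replacement to part of a buffer, pass the end position of that part as the second argument of @code{re-search-forward} instead of @code{nil}.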
1666
1667 However, replacing matches in a string is more complex, especially
1668if you want to do it efficiently. So Emacs provides a function to do
1669this.
1670
1671@defun replace-regexp-in-string regexp rep string &optional fixedcase literal subexp start
1672This function copies @var{string} and searches it for matches for
1673@var{regexp}, and replaces them with @var{rep}. It returns the
1674modified copy.  If @var{start} is non-@code{nil}, the search for
1675matches starts at that index in @var{string}, and the returned value
1676does not include the first @var{start} characters of @var{string}.
1677
1678This function uses @code{replace-match} to do the replacement, and it
1679passes the optional arguments @var{fixedcase}, @var{literal} and
1680@var{subexp} along to @code{replace-match}.
1681
1682Instead of a string, @var{rep} can be a function. In that case,
1683@code{replace-regexp-in-string} calls @var{rep} for each match,
1684passing the text of the match as its sole argument. It collects the
1685value @var{rep} returns and passes that to @code{replace-match} as the
fee88ca0 1686replacement string. The match data at this point are the result
b8d4c8d0
GM
1687of matching @var{regexp} against a substring of @var{string}.
1688@end defun
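
The calls below sketch the three main ways of using @code{replace-regexp-in-string}: with a string replacement, with a function replacement, and with @var{start}. The sample strings are invented for illustration.

```elisp
;; REP is a string.
(replace-regexp-in-string "foo[ \t]+bar" "foobar" "foo  bar and foo\tbar")
;; => "foobar and foobar"

;; REP is a function; it receives the text of each match.
(replace-regexp-in-string
 "[0-9]+"
 (lambda (digits)
   (number-to-string (* 2 (string-to-number digits))))
 "3 apples, 10 pears")
;; => "6 apples, 20 pears"

;; START is 2: the first two characters are not searched, and they are
;; also left out of the returned string.
(replace-regexp-in-string "a" "x" "banana" nil nil nil 2)
;; => "nxnx"
```

To get the whole transformed string when using @var{start}, concatenate the first @var{start} characters of the original string with the return value.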
1689
1690 If you want to write a command along the lines of @code{query-replace},
1691you can use @code{perform-replace} to do the work.
1692
1693@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map start end
1694This function is the guts of @code{query-replace} and related
1695commands. It searches for occurrences of @var{from-string} in the
1696text between positions @var{start} and @var{end} and replaces some or
1697all of them. If @var{start} is @code{nil} (or omitted), point is used
1698instead, and the end of the buffer's accessible portion is used for
1699@var{end}.
1700
1701If @var{query-flag} is @code{nil}, it replaces all
1702occurrences; otherwise, it asks the user what to do about each one.
1703
1704If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
1705considered a regular expression; otherwise, it must match literally. If
1706@var{delimited-flag} is non-@code{nil}, then only occurrences
1707surrounded by word boundaries are considered.
1708
1709The argument @var{replacements} specifies what to replace occurrences
1710with. If it is a string, that string is used. It can also be a list of
1711strings, to be used in cyclic order.
1712
80120f13
EZ
1713If @var{replacements} is a cons cell, @w{@code{(@var{function}
1714. @var{data})}}, this means to call @var{function} after each match to
b8d4c8d0
GM
1715get the replacement text. This function is called with two arguments:
1716@var{data}, and the number of replacements already made.
1717
1718If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
1719it specifies how many times to use each of the strings in the
1720@var{replacements} list before advancing cyclically to the next one.
1721
1722If @var{from-string} contains upper-case letters, then
1723@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
fee88ca0 1724it uses the @var{replacements} without altering their case.
b8d4c8d0
GM
1725
1726Normally, the keymap @code{query-replace-map} defines the possible
1727user responses for queries. The argument @var{map}, if
1728non-@code{nil}, specifies a keymap to use instead of
1729@code{query-replace-map}.
80120f13
EZ
1730
1731This function uses one of two functions to search for the next
1732occurrence of @var{from-string}. These functions are specified by the
1733values of two variables: @code{replace-re-search-function} and
1734@code{replace-search-function}. The former is called when the
1735argument @var{regexp-flag} is non-@code{nil}, the latter when it is
1736@code{nil}.
b8d4c8d0
GM
1737@end defun
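
A minimal non-interactive sketch, assuming a literal search with no querying (the function name @code{demo-replace-all} is hypothetical). For plain programmatic replacement, the explicit @code{re-search-forward} loop shown earlier in this section is usually the better choice; this example only illustrates the argument order.

```elisp
(defun demo-replace-all (from to text)
  "Return TEXT with every literal occurrence of FROM replaced by TO."
  (with-temp-buffer
    (insert text)
    ;; QUERY-FLAG, REGEXP-FLAG and DELIMITED-FLAG are nil: replace every
    ;; literal occurrence without asking.  REPEAT-COUNT and MAP are nil
    ;; as well; START and END cover the whole buffer.
    (perform-replace from to nil nil nil nil nil (point-min) (point-max))
    (buffer-string)))

(demo-replace-all "one" "1" "one two one two one")
;; => "1 two 1 two 1"
```

Note that @code{perform-replace} reports the number of replacements in the echo area, which is one reason it suits commands better than library code.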
1738
1739@defvar query-replace-map
1740This variable holds a special keymap that defines the valid user
1741responses for @code{perform-replace} and the commands that use it, as
1742well as @code{y-or-n-p} and @code{map-y-or-n-p}. This map is unusual
1743in two ways:
1744
1745@itemize @bullet
1746@item
1747The ``key bindings'' are not commands, just symbols that are meaningful
1748to the functions that use this map.
1749
1750@item
1751Prefix keys are not supported; each key binding must be for a
1752single-event key sequence. This is because the functions don't use
1753@code{read-key-sequence} to get the input; instead, they read a single
fee88ca0 1754event and look it up ``by hand''.
b8d4c8d0
GM
1755@end itemize
1756@end defvar
1757
1758Here are the meaningful ``bindings'' for @code{query-replace-map}.
1759Several of them are meaningful only for @code{query-replace} and
1760friends.
1761
1762@table @code
1763@item act
fee88ca0 1764Do take the action being considered---in other words, ``yes''.
b8d4c8d0
GM
1765
1766@item skip
fee88ca0 1767Do not take action for this question---in other words, ``no''.
b8d4c8d0
GM
1768
1769@item exit
fee88ca0
GM
1770Answer this question ``no'', and give up on the entire series of
1771questions, assuming that the answers will be ``no''.
1772
1773@item exit-prefix
1774Like @code{exit}, but add the key that was pressed to
c085e5b9 1775@code{unread-command-events} (@pxref{Event Input Misc}).
b8d4c8d0
GM
1776
1777@item act-and-exit
fee88ca0
GM
1778Answer this question ``yes'', and give up on the entire series of
1779questions, assuming that subsequent answers will be ``no''.
b8d4c8d0
GM
1780
1781@item act-and-show
fee88ca0 1782Answer this question ``yes'', but show the results---don't advance yet
b8d4c8d0
GM
1783to the next question.
1784
1785@item automatic
1786Answer this question and all subsequent questions in the series with
fee88ca0 1787``yes'', without further user interaction.
b8d4c8d0
GM
1788
1789@item backup
1790Move back to the previous place that a question was asked about.
1791
1792@item edit
1793Enter a recursive edit to deal with this question---instead of any
1794other action that would normally be taken.
1795
fee88ca0
GM
1796@item edit-replacement
1797Edit the replacement for this question in the minibuffer.
1798
b8d4c8d0
GM
1799@item delete-and-edit
1800Delete the text being considered, then enter a recursive edit to replace
1801it.
1802
1803@item recenter
011474aa
CY
1804@itemx scroll-up
1805@itemx scroll-down
1806@itemx scroll-other-window
1807@itemx scroll-other-window-down
1808Perform the specified window scroll operation, then ask the same
1809question again. Only @code{y-or-n-p} and related functions use this
1810answer.
b8d4c8d0
GM
1811
1812@item quit
1813Perform a quit right away. Only @code{y-or-n-p} and related functions
1814use this answer.
1815
1816@item help
1817Display some help, then ask again.
1818@end table
1819
2c0b8144
EZ
1820@defvar multi-query-replace-map
1821This variable holds a keymap that extends @code{query-replace-map} by
1822providing additional keybindings that are useful in multi-buffer
fee88ca0
GM
1823replacements. The additional ``bindings'' are:
1824
1825@table @code
1826@item automatic-all
1827Answer this question and all subsequent questions in the series with
1828``yes'', without further user interaction, for all remaining buffers.
1829
1830@item exit-current
1831Answer this question ``no'', and give up on the entire series of
1832questions for the current buffer. Continue to the next buffer in the
1833sequence.
1834@end table
2c0b8144
EZ
1835@end defvar
1836
80120f13
EZ
1837@defvar replace-search-function
1838This variable specifies a function that @code{perform-replace} calls
1839to search for the next string to replace. Its default value is
1840@code{search-forward}. Any other value should name a function of 3
1841arguments: the first 3 arguments of @code{search-forward}
1842(@pxref{String Search}).
1843@end defvar
1844
1845@defvar replace-re-search-function
1846This variable specifies a function that @code{perform-replace} calls
1847to search for the next regexp to replace. Its default value is
1848@code{re-search-forward}. Any other value should name a function of 3
1849arguments: the first 3 arguments of @code{re-search-forward}
1850(@pxref{Regexp Search}).
1851@end defvar
1852
b8d4c8d0
GM
1853@node Standard Regexps
1854@section Standard Regular Expressions Used in Editing
1855@cindex regexps used standardly in editing
1856@cindex standard regexps used in editing
1857
1858 This section describes some variables that hold regular expressions
1859used for certain purposes in editing:
1860
01f17ae2 1861@defopt page-delimiter
b8d4c8d0
GM
1862This is the regular expression describing line-beginnings that separate
1863pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
1864@code{"^\C-l"}); this matches a line that starts with a formfeed
1865character.
01f17ae2 1866@end defopt
b8d4c8d0
GM
1867
1868 The following two regular expressions should @emph{not} assume the
1869match always starts at the beginning of a line; they should not use
1870@samp{^} to anchor the match. Most often, the paragraph commands do
1871check for a match only at the beginning of a line, which means that
1872@samp{^} would be superfluous. When there is a nonzero left margin,
1873they accept matches that start after the left margin. In that case, a
1874@samp{^} would be incorrect. However, a @samp{^} is harmless in modes
1875where a left margin is never used.
1876
01f17ae2 1877@defopt paragraph-separate
b8d4c8d0
GM
1878This is the regular expression for recognizing the beginning of a line
1879that separates paragraphs. (If you change this, you may have to
1880change @code{paragraph-start} also.) The default value is
1881@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
1882spaces, tabs, and form feeds (after its left margin).
01f17ae2 1883@end defopt
b8d4c8d0 1884
01f17ae2 1885@defopt paragraph-start
b8d4c8d0
GM
1886This is the regular expression for recognizing the beginning of a line
1887that starts @emph{or} separates paragraphs. The default value is
1888@w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
1889whitespace or starting with a form feed (after its left margin).
01f17ae2 1890@end defopt
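
These regexps are meant to be matched at the start of a line, not searched for anywhere in the text. The sketch below classifies the lines of a string with @code{looking-at}; it binds @code{paragraph-separate} to the default value quoted above so that the result does not depend on the current major mode. The function name @code{demo-classify-lines} is hypothetical.

```elisp
(defun demo-classify-lines (text)
  "Return a list saying whether each line of TEXT separates paragraphs."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((paragraph-separate "[ \t\f]*$")
          kinds)
      (while (not (eobp))
        ;; `looking-at' anchors the match at point, here the line start.
        (push (if (looking-at paragraph-separate) 'separator 'text) kinds)
        (forward-line 1))
      (nreverse kinds))))

(demo-classify-lines "First para.\n  \t\nSecond para.\n")
;; => (text separator text)
```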
b8d4c8d0 1891
01f17ae2 1892@defopt sentence-end
b8d4c8d0
GM
1893If non-@code{nil}, the value should be a regular expression describing
1894the end of a sentence, including the whitespace following the
1895sentence. (All paragraph boundaries also end sentences, regardless.)
1896
fee88ca0
GM
1897If the value is @code{nil}, as it is by default, then the function
1898@code{sentence-end} constructs the regexp. That is why you
b8d4c8d0
GM
1899should always call the function @code{sentence-end} to obtain the
1900regexp to be used to recognize the end of a sentence.
01f17ae2 1901@end defopt
b8d4c8d0
GM
1902
1903@defun sentence-end
1904This function returns the value of the variable @code{sentence-end},
1905if non-@code{nil}. Otherwise it returns a default value based on the
1906values of the variables @code{sentence-end-double-space}
1907(@pxref{Definition of sentence-end-double-space}),
fee88ca0 1908@code{sentence-end-without-period}, and
b8d4c8d0
GM
1909@code{sentence-end-without-space}.
1910@end defun
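
A minimal sketch of calling the function rather than reading the variable (the function name @code{demo-skip-one-sentence} is hypothetical). It binds @code{sentence-end} to @code{nil} and @code{sentence-end-double-space} to @code{t} so that the computed default is used regardless of user settings.

```elisp
(defun demo-skip-one-sentence (text)
  "Return the part of TEXT that follows its first sentence end."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((sentence-end nil)             ; force the computed default
          (sentence-end-double-space t))
      ;; The regexp includes the whitespace after the sentence, so point
      ;; ends up at the start of the next sentence.
      (re-search-forward (sentence-end))
      (buffer-substring (point) (point-max)))))

(demo-skip-one-sentence "First sentence.  Second sentence.")
;; => "Second sentence."
```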