1@c -*-texinfo-*-
2@c This is part of the GNU Emacs Lisp Reference Manual.
3@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
4@c See the file elisp.texi for copying conditions.
5@setfilename ../info/searching
6@node Searching and Matching, Syntax Tables, Text, Top
7@chapter Searching and Matching
8@cindex searching
9
10 GNU Emacs provides two ways to search through a buffer for specified
11text: exact string searches and regular expression searches. After a
12regular expression search, you can examine the @dfn{match data} to
13determine which text matched the whole regular expression or various
14portions of it.
15
16@menu
17* String Search:: Search for an exact match.
18* Regular Expressions:: Describing classes of strings.
19* Regexp Search:: Searching for a match for a regexp.
20* Search and Replace:: Internals of @code{query-replace}.
21* Match Data:: Finding out which part of the text matched
22 various parts of a regexp, after regexp search.
23* Searching and Case:: Case-independent or case-significant searching.
24* Standard Regexps:: Useful regexps for finding sentences, pages,...
25@end menu
26
27 The @samp{skip-chars@dots{}} functions also perform a kind of searching.
28@xref{Skipping Characters}.
29
30@node String Search
31@section Searching for Strings
32@cindex string search
33
34 These are the primitive functions for searching through the text in a
35buffer. They are meant for use in programs, but you may call them
36interactively. If you do so, they prompt for the search string;
37@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
38is set to 1.
39
40@deffn Command search-forward string &optional limit noerror repeat
41 This function searches forward from point for an exact match for
42@var{string}. If successful, it sets point to the end of the occurrence
43found, and returns the new value of point. If no match is found, the
44value and side effects depend on @var{noerror} (see below).
45@c Emacs 19 feature
46
47 In the following example, point is initially at the beginning of the
48line. Then @code{(search-forward "fox")} moves point after the last
49letter of @samp{fox}:
50
51@example
52@group
53---------- Buffer: foo ----------
54@point{}The quick brown fox jumped over the lazy dog.
55---------- Buffer: foo ----------
56@end group
57
58@group
59(search-forward "fox")
60 @result{} 20
61
62---------- Buffer: foo ----------
63The quick brown fox@point{} jumped over the lazy dog.
64---------- Buffer: foo ----------
65@end group
66@end example
67
68 The argument @var{limit} specifies the upper bound to the search. (It
69must be a position in the current buffer.) No match extending after
70that position is accepted. If @var{limit} is omitted or @code{nil}, it
71defaults to the end of the accessible portion of the buffer.
72
73@kindex search-failed
74 What happens when the search fails depends on the value of
75@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
76error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
77returns @code{nil} and does nothing. If @var{noerror} is neither
78@code{nil} nor @code{t}, then @code{search-forward} moves point to the
79upper bound and returns @code{nil}. (It would be more consistent now
80to return the new position of point in that case, but some programs
81may depend on a value of @code{nil}.)
82
83If @var{repeat} is supplied (it must be a positive number), then the
84search is repeated that many times (each time starting at the end of the
85previous time's match). If these successive searches succeed, the
86function succeeds, moving point and returning its new value. Otherwise
87the search fails.
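
As a minimal sketch of how @var{noerror} affects the result, the
following form searches a temporary buffer twice.  The buffer text is
the sentence from the example above, and @code{with-temp-buffer} is used
here only so that the form is self-contained:

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  (goto-char (point-min))
  (list (search-forward "fox" nil t)   ; success: position after "fox"
        (search-forward "cat" nil t)   ; failure: nil, point unchanged
        (point)))
     @result{} (20 nil 20)
@end group
@end example

Passing @code{t} for @var{noerror} is a common choice in programs that
test the return value rather than handle the @code{search-failed} error.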
88@end deffn
89
90@deffn Command search-backward string &optional limit noerror repeat
91This function searches backward from point for @var{string}. It is
92just like @code{search-forward} except that it searches backwards and
93leaves point at the beginning of the match.
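
A small sketch, assuming the same sample sentence in a temporary buffer:
after the insertion point is at the end of the buffer, so the backward
search finds @samp{lazy} and returns the position of its first letter.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; Point is now at the end of the buffer.
  (search-backward "lazy"))
     @result{} 37
@end group
@end example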
94@end deffn
95
96@deffn Command word-search-forward string &optional limit noerror repeat
97@cindex word search
98This function searches forward from point for a ``word'' match for
99@var{string}. If it finds a match, it sets point to the end of the
100match found, and returns the new value of point.
101@c Emacs 19 feature
102
103Word matching regards @var{string} as a sequence of words, disregarding
104punctuation that separates them. It searches the buffer for the same
105sequence of words. Each word must be distinct in the buffer (searching
106for the word @samp{ball} does not match the word @samp{balls}), but the
107details of punctuation and spacing are ignored (searching for @samp{ball
108boy} does match @samp{ball. Boy!}).
109
110In this example, point is initially at the beginning of the buffer; the
111search leaves it between the @samp{y} and the @samp{!}.
112
113@example
114@group
115---------- Buffer: foo ----------
116@point{}He said "Please! Find
117the ball boy!"
118---------- Buffer: foo ----------
119@end group
120
121@group
122(word-search-forward "Please find the ball, boy.")
123 @result{} 35
124
125---------- Buffer: foo ----------
126He said "Please! Find
127the ball boy@point{}!"
128---------- Buffer: foo ----------
129@end group
130@end example
131
132If @var{limit} is non-@code{nil} (it must be a position in the current
133buffer), then it is the upper bound to the search. The match found must
134not extend after that position.
135
136If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
137an error if the search fails. If @var{noerror} is @code{t}, then it
138returns @code{nil} instead of signaling an error. If @var{noerror} is
139neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
140end of the buffer) and returns @code{nil}.
141
142If @var{repeat} is non-@code{nil}, then the search is repeated that many
143times. Point is positioned at the end of the last match.
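
The rule that each word must be distinct can be sketched as follows.
The buffer text is made up for illustration; the search skips
@samp{balls} and stops after the separate word @samp{ball}:

@example
@group
(with-temp-buffer
  (insert "Several balls. One ball!")
  (goto-char (point-min))
  ;; "balls" is not a word match for "ball"; "ball!" is.
  (word-search-forward "ball" nil t))
     @result{} 24
@end group
@end example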
144@end deffn
145
146@deffn Command word-search-backward string &optional limit noerror repeat
147This function searches backward from point for a word match to
148@var{string}. This function is just like @code{word-search-forward}
149except that it searches backward and normally leaves point at the
150beginning of the match.
151@end deffn
152
153@node Regular Expressions
154@section Regular Expressions
155@cindex regular expression
156@cindex regexp
157
158 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
159denotes a (possibly infinite) set of strings. Searching for matches for
160a regexp is a very powerful operation. This section explains how to write
161regexps; the following section says how to search for them.
162
163@menu
164* Syntax of Regexps:: Rules for writing regular expressions.
165* Regexp Example:: Illustrates regular expression syntax.
166@end menu
167
168@node Syntax of Regexps
169@subsection Syntax of Regular Expressions
170
171 Regular expressions have a syntax in which a few characters are
172special constructs and the rest are @dfn{ordinary}. An ordinary
173character is a simple regular expression that matches that character and
174nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
175@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
176special characters will be defined in the future. Any other character
177appearing in a regular expression is ordinary, unless a @samp{\}
178precedes it.
179
180For example, @samp{f} is not a special character, so it is ordinary, and
181therefore @samp{f} is a regular expression that matches the string
182@samp{f} and no other string. (It does @emph{not} match the string
183@samp{ff}.) Likewise, @samp{o} is a regular expression that matches
184only @samp{o}.@refill
185
186Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
188some amount of the beginning of that string and @var{b} matches the rest of
189the string.@refill
190
191As a simple example, we can concatenate the regular expressions @samp{f}
192and @samp{o} to get the regular expression @samp{fo}, which matches only
193the string @samp{fo}. Still trivial. To do something more powerful, you
194need to use one of the special characters. Here is a list of them:
195
196@need 1200
197@table @kbd
198@item .@: @r{(Period)}
199@cindex @samp{.} in regexp
200is a special character that matches any single character except a newline.
201Using concatenation, we can make regular expressions like @samp{a.b}, which
202matches any three-character string that begins with @samp{a} and ends with
203@samp{b}.@refill
204
205@item *
206@cindex @samp{*} in regexp
207is not a construct by itself; it is a suffix operator that means to
208repeat the preceding regular expression as many times as possible. In
209@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
210one @samp{f} followed by any number of @samp{o}s. The case of zero
211@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
212
213@samp{*} always applies to the @emph{smallest} possible preceding
214expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
215repeating @samp{fo}.@refill
216
217The matcher processes a @samp{*} construct by matching, immediately,
218as many repetitions as can be found. Then it continues with the rest
219of the pattern. If that fails, backtracking occurs, discarding some
220of the matches of the @samp{*}-modified construct in case that makes
221it possible to match the rest of the pattern. For example, in matching
222@samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
223tries to match all three @samp{a}s; but the rest of the pattern is
224@samp{ar} and there is only @samp{r} left to match, so this try fails.
225The next alternative is for @samp{a*} to match only two @samp{a}s.
226With this choice, the rest of the regexp matches successfully.@refill
227
228@item +
229@cindex @samp{+} in regexp
230is a suffix operator similar to @samp{*} except that the preceding
231expression must match at least once. So, for example, @samp{ca+r}
232matches the strings @samp{car} and @samp{caaaar} but not the string
233@samp{cr}, whereas @samp{ca*r} matches all three strings.
234
235@item ?
236@cindex @samp{?} in regexp
237is a suffix operator similar to @samp{*} except that the preceding
238expression can match either once or not at all. For example,
@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
else.
241
242@item [ @dots{} ]
243@cindex character set (in regexp)
244@cindex @samp{[} in regexp
245@cindex @samp{]} in regexp
246@samp{[} begins a @dfn{character set}, which is terminated by a
247@samp{]}. In the simplest case, the characters between the two brackets
248form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
249@samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
250and @samp{d}s (including the empty string), from which it follows that
251@samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
252@samp{caddaar}, etc.@refill
253
254The usual regular expression special characters are not special inside a
255character set. A completely different set of special characters exists
256inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
257
258@samp{-} is used for ranges of characters. To write a range, write two
259characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
260lower case letter. Ranges may be intermixed freely with individual
261characters, as in @samp{[a-z$%.]}, which matches any lower case letter
or @samp{$}, @samp{%}, or a period.@refill
263
To include a @samp{]} in a character set, make it the first character.
For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
@samp{-}, write @samp{-} as the first character in the set, or put it
immediately after a range. (You can replace one individual character
@var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
@samp{-}.) There is no way to write a set containing just @samp{-} and
@samp{]}.
271
272To include @samp{^} in a set, put it anywhere but at the beginning of
273the set.
274
275@item [^ @dots{} ]
276@cindex @samp{^} in regexp
277@samp{[^} begins a @dfn{complement character set}, which matches any
278character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
279matches all characters @emph{except} letters and digits.@refill
280
281@samp{^} is not special in a character set unless it is the first
282character. The character following the @samp{^} is treated as if it
283were first (thus, @samp{-} and @samp{]} are not special there).
284
285Note that a complement character set can match a newline, unless
286newline is mentioned as one of the characters not to match.
287
288@item ^
289@cindex @samp{^} in regexp
290@cindex beginning of line in regexp
is a special character that matches the empty string, but only at the
beginning of a line in the text being matched. Otherwise it fails to
match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
the beginning of a line.

When matching a string instead of a buffer, @samp{^} matches at the
beginning of the string or after a newline character @samp{\n}.
298
299@item $
300@cindex @samp{$} in regexp
301is similar to @samp{^} but matches only at the end of a line. Thus,
302@samp{x+$} matches a string of one @samp{x} or more at the end of a line.
303
When matching a string instead of a buffer, @samp{$} matches at the end
of the string or before a newline character @samp{\n}.
306
307@item \
308@cindex @samp{\} in regexp
309has two functions: it quotes the special characters (including
310@samp{\}), and it introduces additional special constructs.
311
312Because @samp{\} quotes special characters, @samp{\$} is a regular
expression that matches only @samp{$}, and @samp{\[} is a regular
expression that matches only @samp{[}, and so on.
315
316Note that @samp{\} also has special meaning in the read syntax of Lisp
317strings (@pxref{String Type}), and must be quoted with @samp{\}. For
318example, the regular expression that matches the @samp{\} character is
319@samp{\\}. To write a Lisp string that contains the characters
320@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
321@samp{\}. Therefore, the read syntax for a regular expression matching
322@samp{\} is @code{"\\\\"}.@refill
323@end table
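
The following forms sketch how several of these special characters
behave.  They use @code{string-match}, which returns the index of the
start of the match or @code{nil} (@pxref{Regexp Search}).  The sample
strings are arbitrary, and each regexp is written in Lisp string syntax,
so a backslash in the regexp appears doubled.

@example
@group
(string-match "a.b" "xaxb")          @result{} 1
(string-match "ca+r" "cr")           @result{} nil
(string-match "ca?r" "caar")         @result{} nil
(string-match "c[ad]*r" "caddaar")   @result{} 0
(string-match "[]a]" "x]")           @result{} 1
(string-match "^foo" "bar\nfoo")     @result{} 4
(string-match "x+$" "axx\nb")        @result{} 1
(string-match "\\$" "cost: $5")      @result{} 6
@end group
@end example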
324
@strong{Please note:} For historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
special character anyway, regardless of where it appears.@refill
331
332For the most part, @samp{\} followed by any character matches only
333that character. However, there are several exceptions: characters
that, when preceded by @samp{\}, are special constructs. Such
335characters are always ordinary when encountered on their own. Here
336is a table of @samp{\} constructs:
337
338@table @kbd
339@item \|
340@cindex @samp{|} in regexp
341@cindex regexp alternative
342specifies an alternative.
343Two regular expressions @var{a} and @var{b} with @samp{\|} in
344between form an expression that matches anything that either @var{a} or
345@var{b} matches.@refill
346
347Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
348but no other string.@refill
349
350@samp{\|} applies to the largest possible surrounding expressions. Only a
351surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
352@samp{\|}.@refill
353
354Full backtracking capability exists to handle multiple uses of @samp{\|}.
355
356@item \( @dots{} \)
357@cindex @samp{(} in regexp
358@cindex @samp{)} in regexp
359@cindex regexp grouping
360is a grouping construct that serves three purposes:
361
362@enumerate
363@item
364To enclose a set of @samp{\|} alternatives for other operations.
365Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
366
367@item
368To enclose an expression for a suffix operator such as @samp{*} to act
369on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
370(zero or more) number of @samp{na} strings.@refill
371
372@item
373To record a matched substring for future reference.
374@end enumerate
375
376This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that happens to be
378assigned as a second meaning to the same @samp{\( @dots{} \)} construct
379because there is no conflict in practice between the two meanings.
380Here is an explanation of this feature:
381
382@item \@var{digit}
matches the same text that matched the @var{digit}th occurrence of a
@samp{\( @dots{} \)} construct.

In other words, after the end of a @samp{\( @dots{} \)} construct, the
387matcher remembers the beginning and end of the text matched by that
388construct. Then, later on in the regular expression, you can use
389@samp{\} followed by @var{digit} to match that same text, whatever it
390may have been.
391
392The strings matching the first nine @samp{\( @dots{} \)} constructs
393appearing in a regular expression are assigned numbers 1 through 9 in
394the order that the open parentheses appear in the regular expression.
395So you can use @samp{\1} through @samp{\9} to refer to the text matched
396by the corresponding @samp{\( @dots{} \)} constructs.
397
398For example, @samp{\(.*\)\1} matches any newline-free string that is
399composed of two identical halves. The @samp{\(.*\)} matches the first
400half, which may be anything, but the @samp{\1} that follows must match
401the same exact text.
402
403@item \w
404@cindex @samp{\w} in regexp
405matches any word-constituent character. The editor syntax table
406determines which characters these are. @xref{Syntax Tables}.
407
408@item \W
409@cindex @samp{\W} in regexp
matches any character that is not a word constituent.
411
412@item \s@var{code}
413@cindex @samp{\s} in regexp
414matches any character whose syntax is @var{code}. Here @var{code} is a
character that represents a syntax code: thus, @samp{w} for word
416constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
417etc. @xref{Syntax Tables}, for a list of syntax codes and the
418characters that stand for them.
419
420@item \S@var{code}
421@cindex @samp{\S} in regexp
422matches any character whose syntax is not @var{code}.
423@end table
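
As a rough illustration of these @samp{\} constructs, the forms below
apply a few of them with @code{string-match}.  The sample strings are
arbitrary, and the last two results assume the standard syntax table, in
which a space is whitespace and not a word constituent.

@example
@group
(string-match "\\(foo\\|bar\\)x" "a barx")   @result{} 2
(string-match "ba\\(na\\)*" "bananana")      @result{} 0
(match-end 0)                                @result{} 8
(string-match "\\(.*\\)\\1" "abcabc")        @result{} 0
(match-end 0)                                @result{} 6
(string-match "\\W" "foo bar")               @result{} 3
(string-match "\\s-" "foo bar")              @result{} 3
@end group
@end example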
424
  The following regular expression constructs match the empty string---that is,
426they don't use up any characters---but whether they match depends on the
427context.
428
429@table @kbd
430@item \`
431@cindex @samp{\`} in regexp
432matches the empty string, but only at the beginning
433of the buffer or string being matched against.
434
435@item \'
436@cindex @samp{\'} in regexp
437matches the empty string, but only at the end of
438the buffer or string being matched against.
439
440@item \=
441@cindex @samp{\=} in regexp
442matches the empty string, but only at point.
443(This construct is not defined when matching against a string.)
444
445@item \b
446@cindex @samp{\b} in regexp
447matches the empty string, but only at the beginning or
448end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
449@samp{foo} as a separate word. @samp{\bballs?\b} matches
450@samp{ball} or @samp{balls} as a separate word.@refill
451
452@item \B
453@cindex @samp{\B} in regexp
454matches the empty string, but @emph{not} at the beginning or
455end of a word.
456
457@item \<
458@cindex @samp{\<} in regexp
459matches the empty string, but only at the beginning of a word.
460
461@item \>
462@cindex @samp{\>} in regexp
463matches the empty string, but only at the end of a word.
464@end table
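
  A short sketch of the difference between some of these constructs,
again using @code{string-match} on arbitrary sample strings and assuming
the standard syntax table:

@example
@group
(string-match "\\bfoo\\b" "a foo b")       @result{} 2
(string-match "\\bfoo\\b" "food")          @result{} nil
(string-match "\\bballs?\\b" "the balls")  @result{} 4
(string-match "^foo" "bar\nfoo")           @result{} 4
(string-match "\\`foo" "bar\nfoo")         @result{} nil
(string-match "o\\>" "foo bar")            @result{} 2
@end group
@end example

@noindent
Note that @samp{^} matches after the newline inside the string, whereas
@samp{\`} matches only at the very beginning of the string.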
465
466@kindex invalid-regexp
467 Not every string is a valid regular expression. For example, a string
468with unbalanced square brackets is invalid (with a few exceptions, such
as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
470an invalid regular expression is passed to any of the search functions,
471an @code{invalid-regexp} error is signaled.
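
  One way a program might guard against an invalid regexp is sketched
below.  The handler simply returns the error symbol; what a real program
does in the handler is up to its author.

@example
@group
(condition-case err
    (string-match "[a-z" "foo")      ; unbalanced square bracket
  (invalid-regexp (car err)))
     @result{} invalid-regexp
@end group
@end example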
472
473@defun regexp-quote string
474This function returns a regular expression string that matches exactly
475@var{string} and nothing else. This allows you to request an exact
476string match when calling a function that wants a regular expression.
477
478@example
479@group
480(regexp-quote "^The cat$")
481 @result{} "\\^The cat\\$"
482@end group
483@end example
484
485One use of @code{regexp-quote} is to combine an exact string match with
486context described as a regular expression. For example, this searches
for the string that is the value of @code{string}, surrounded by
488whitespace:
489
490@example
491@group
492(re-search-forward
 (concat "\\s-" (regexp-quote string) "\\s-"))
494@end group
495@end example
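
The effect of quoting can be seen in this small sketch (the sample
strings are arbitrary).  Unquoted, the period matches any character;
quoted, it matches only a literal period:

@example
@group
(regexp-quote "a.b")
     @result{} "a\\.b"
(string-match "a.b" "axb a.b")
     @result{} 0
(string-match (regexp-quote "a.b") "axb a.b")
     @result{} 4
@end group
@end example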
496@end defun
497
498@node Regexp Example
499@comment node-name, next, previous, up
500@subsection Complex Regexp Example
501
502 Here is a complicated regexp, used by Emacs to recognize the end of a
503sentence together with any whitespace that follows. It is the value of
504the variable @code{sentence-end}.
505
506 First, we show the regexp as a string in Lisp syntax to distinguish
507spaces from tab characters. The string constant begins and ends with a
508double-quote. @samp{\"} stands for a double-quote as part of the
509string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
510tab and @samp{\n} for a newline.
511
512@example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
514@end example
515
516 In contrast, if you evaluate the variable @code{sentence-end}, you
517will see the following:
518
519@example
520@group
521sentence-end
522@result{}
"[.?!][]\"')@}]*\\($\\| $\\|	\\|  \\)[ 	
]*"
525@end group
526@end example
527
528@noindent
529In this output, tab and newline appear as themselves.
530
531 This regular expression contains four parts in succession and can be
532deciphered as follows:
533
534@table @code
535@item [.?!]
536The first part of the pattern is a character set that matches any one of
537three characters: period, question mark, and exclamation mark. The
538match must begin with one of these three characters.
539
540@item []\"')@}]*
The second part of the pattern matches any closing brackets, parentheses,
braces, and quotation marks, zero or more of them, that may follow the
period, question mark
543or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
544a string. The @samp{*} at the end indicates that the immediately
545preceding regular expression (a character set, in this case) may be
546repeated zero or more times.
547
@item \\($\\|@ $\\|\t\\|@ @ \\)
549The third part of the pattern matches the whitespace that follows the
550end of a sentence: the end of a line, or a tab, or two spaces. The
551double backslashes mark the parentheses and vertical bars as regular
expression syntax; the parentheses delimit a group and the vertical bars
553separate alternatives. The dollar sign is used to match the end of a
554line.
555
556@item [ \t\n]*
557Finally, the last part of the pattern matches any additional whitespace
558beyond the minimum needed to end a sentence.
559@end table
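
  To see the four parts work together, the sketch below applies the
same regexp to a sample string with @code{string-match}.  The variable
name @code{re} and the sample text are only for illustration.  The match
starts at the period and covers the two spaces that follow it:

@example
@group
(let ((re "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
  (string-match re "Hello there.  How are you?")
  ;; Start and end of the sentence-end match.
  (list (match-beginning 0) (match-end 0)))
     @result{} (11 14)
@end group
@end example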
560
561@node Regexp Search
562@section Regular Expression Searching
563@cindex regular expression searching
564@cindex regexp searching
565@cindex searching for regexp
566
567 In GNU Emacs, you can search for the next match for a regexp either
568incrementally or not. For incremental search commands, see @ref{Regexp
569Search, , Regular Expression Search, emacs, The GNU Emacs Manual}. Here
570we describe only the search functions useful in programs. The principal
571one is @code{re-search-forward}.
572
573@deffn Command re-search-forward regexp &optional limit noerror repeat
574This function searches forward in the current buffer for a string of
575text that is matched by the regular expression @var{regexp}. The
576function skips over any amount of text that is not matched by
577@var{regexp}, and leaves point at the end of the first match found.
578It returns the new value of point.
579
580If @var{limit} is non-@code{nil} (it must be a position in the current
581buffer), then it is the upper bound to the search. No match extending
582after that position is accepted.
583
584What happens when the search fails depends on the value of
585@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
586error is signaled. If @var{noerror} is @code{t},
587@code{re-search-forward} does nothing and returns @code{nil}. If
588@var{noerror} is neither @code{nil} nor @code{t}, then
589@code{re-search-forward} moves point to @var{limit} (or the end of the
590buffer) and returns @code{nil}.
591
592If @var{repeat} is supplied (it must be a positive number), then the
593search is repeated that many times (each time starting at the end of the
594previous time's match). If these successive searches succeed, the
595function succeeds, moving point and returning its new value. Otherwise
596the search fails.
597
598In the following example, point is initially before the @samp{T}.
599Evaluating the search call moves point to the end of that line (between
600the @samp{t} of @samp{hat} and the newline).
601
602@example
603@group
604---------- Buffer: foo ----------
605I read "@point{}The cat in the hat
606comes back" twice.
607---------- Buffer: foo ----------
608@end group
609
610@group
611(re-search-forward "[a-z]+" nil t 5)
612 @result{} 27
613
614---------- Buffer: foo ----------
615I read "The cat in the hat@point{}
616comes back" twice.
617---------- Buffer: foo ----------
618@end group
619@end example
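
Programs often call @code{re-search-forward} in a loop with
@var{noerror} set to @code{t}, so that the loop ends when no further
match exists.  The following is a minimal sketch of that idiom; the
buffer text and the variable name @code{count} are made up for the
example.

@example
@group
(with-temp-buffer
  (insert "one 22 three 4444")
  (goto-char (point-min))
  (let ((count 0))
    ;; Each successful search leaves point after the match,
    ;; so the next search continues from there.
    (while (re-search-forward "[0-9]+" nil t)
      (setq count (1+ count)))
    count))
     @result{} 2
@end group
@end example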
620@end deffn
621
622@deffn Command re-search-backward regexp &optional limit noerror repeat
623This function searches backward in the current buffer for a string of
624text that is matched by the regular expression @var{regexp}, leaving
625point at the beginning of the first text found.
626
627This function is analogous to @code{re-search-forward}, but they are not
628simple mirror images. @code{re-search-forward} finds the match whose
629beginning is as close as possible to the starting point. If
630@code{re-search-backward} were a perfect mirror image, it would find the
631match whose end is as close as possible. However, in fact it finds the
632match whose beginning is as close as possible. The reason is that
633matching a regular expression at a given spot always works from
634beginning to end, and starts at a specified beginning position.
635
636A true mirror-image of @code{re-search-forward} would require a special
637feature for matching regexps from end to beginning. It's not worth the
638trouble of implementing that.
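
The difference can be seen in a small sketch (the buffer contents are
arbitrary).  Searching backward for @samp{a+} from the end of
@samp{aaaa} finds the match whose beginning is closest to point, which
is just the last @samp{a}, not the whole run:

@example
@group
(with-temp-buffer
  (insert "aaaa")
  ;; Point is at the end of the buffer, position 5.
  (re-search-backward "a+")
  (list (match-beginning 0) (match-end 0)))
     @result{} (4 5)
@end group
@end example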
639@end deffn
640
641@defun string-match regexp string &optional start
642This function returns the index of the start of the first match for
643the regular expression @var{regexp} in @var{string}, or @code{nil} if
644there is no match. If @var{start} is non-@code{nil}, the search starts
645at that index in @var{string}.
646
647For example,
648
649@example
650@group
651(string-match
652 "quick" "The quick brown fox jumped quickly.")
653 @result{} 4
654@end group
655@group
656(string-match
657 "quick" "The quick brown fox jumped quickly." 8)
658 @result{} 27
659@end group
660@end example
661
662@noindent
663The index of the first character of the
664string is 0, the index of the second character is 1, and so on.
665
666After this function returns, the index of the first character beyond
667the match is available as @code{(match-end 0)}. @xref{Match Data}.
668
669@example
670@group
671(string-match
672 "quick" "The quick brown fox jumped quickly." 8)
673 @result{} 27
674@end group
675
676@group
677(match-end 0)
678 @result{} 32
679@end group
680@end example
681@end defun
682
683@defun looking-at regexp
684This function determines whether the text in the current buffer directly
685following point matches the regular expression @var{regexp}. ``Directly
686following'' means precisely that: the search is ``anchored'' and it can
687succeed only starting with the first character following point. The
688result is @code{t} if so, @code{nil} otherwise.
689
690This function does not move point, but it updates the match data, which
691you can access using @code{match-beginning} and @code{match-end}.
692@xref{Match Data}.
693
694In this example, point is located directly before the @samp{T}. If it
695were anywhere else, the result would be @code{nil}.
696
697@example
698@group
699---------- Buffer: foo ----------
700I read "@point{}The cat in the hat
701comes back" twice.
702---------- Buffer: foo ----------
703
704(looking-at "The cat in the hat$")
705 @result{} t
706@end group
707@end example
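
The anchoring can also be sketched in a self-contained form; the buffer
text here is arbitrary.  With point at the beginning of the buffer,
@samp{foo} matches but @samp{bar} does not, even though @samp{bar}
occurs later in the buffer:

@example
@group
(with-temp-buffer
  (insert "foo bar")
  (goto-char (point-min))
  (list (looking-at "foo") (looking-at "bar")))
     @result{} (t nil)
@end group
@end example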
708@end defun
709
710@ignore
711@deffn Command delete-matching-lines regexp
712This function is identical to @code{delete-non-matching-lines}, save
713that it deletes what @code{delete-non-matching-lines} keeps.
714
715In the example below, point is located on the first line of text.
716
717@example
718@group
719---------- Buffer: foo ----------
720We hold these truths
721to be self-evident,
722that all men are created
723equal, and that they are
724---------- Buffer: foo ----------
725@end group
726
727@group
728(delete-matching-lines "the")
729 @result{} nil
730
731---------- Buffer: foo ----------
732to be self-evident,
733that all men are created
734---------- Buffer: foo ----------
735@end group
736@end example
737@end deffn
738
739@deffn Command flush-lines regexp
740This function is the same as @code{delete-matching-lines}.
741@end deffn
742
743@defun delete-non-matching-lines regexp
744This function deletes all lines following point which don't
745contain a match for the regular expression @var{regexp}.
746@end defun
747
748@deffn Command keep-lines regexp
749This function is the same as @code{delete-non-matching-lines}.
750@end deffn
751
752@deffn Command how-many regexp
753This function counts the number of matches for @var{regexp} there are in
754the current buffer following point. It prints this number in
755the echo area, returning the string printed.
756@end deffn
757
758@deffn Command count-matches regexp
759This function is a synonym of @code{how-many}.
760@end deffn
761
762@deffn Command list-matching-lines regexp nlines
763This function is a synonym of @code{occur}.
764Show all lines following point containing a match for @var{regexp}.
765Display each line with @var{nlines} lines before and after,
766or @code{-}@var{nlines} before if @var{nlines} is negative.
767@var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
768Interactively it is the prefix arg.
769
770The lines are shown in a buffer named @samp{*Occur*}.
771It serves as a menu to find any of the occurrences in this buffer.
@kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
773@end deffn
774
775@defopt list-matching-lines-default-context-lines
776Default value is 0.
777Default number of context lines to include around a @code{list-matching-lines}
778match. A negative number means to include that many lines before the match.
779A positive number means to include that many lines both before and after.
780@end defopt
781@end ignore
782
783@node Search and Replace
784@section Search and Replace
785@cindex replacement
786
787@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
788This function is the guts of @code{query-replace} and related commands.
789It searches for occurrences of @var{from-string} and replaces some or
790all of them. If @var{query-flag} is @code{nil}, it replaces all
791occurrences; otherwise, it asks the user what to do about each one.
792
793If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
794considered a regular expression; otherwise, it must match literally. If
795@var{delimited-flag} is non-@code{nil}, then only replacements
796surrounded by word boundaries are considered.
797
798The argument @var{replacements} specifies what to replace occurrences
799with. If it is a string, that string is used. It can also be a list of
800strings, to be used in cyclic order.
801
802If @var{repeat-count} is non-@code{nil}, it should be an integer, the
803number of occurrences to consider. In this case, @code{perform-replace}
804returns after considering that many occurrences.
805
806Normally, the keymap @code{query-replace-map} defines the possible user
807responses for queries. The argument @var{map}, if non-@code{nil}, is a
808keymap to use instead of @code{query-replace-map}.
809@end defun
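
  As a rough illustration of the arguments described above, the
following sketch replaces every occurrence of a literal string without
querying.  It assumes an Emacs that provides @code{with-temp-buffer};
the buffer text is made up for the example.

@example
@group
(with-temp-buffer
  (insert "foo bar foo")
  (goto-char (point-min))
  ;; @r{query-flag, regexp-flag and delimited-flag are all nil:}
  ;; @r{replace every literal match after point, asking nothing.}
  (perform-replace "foo" "baz" nil nil nil)
  (buffer-string))
     @result{} "baz bar baz"
@end group
@end example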
810
811@defvar query-replace-map
812This variable holds a special keymap that defines the valid user
813responses for @code{query-replace} and related functions, as well as
814@code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
815
816@itemize @bullet
817@item
818The ``key bindings'' are not commands, just symbols that are meaningful
819to the functions that use this map.
820
821@item
822Prefix keys are not supported; each key binding must be for a single event
823key sequence. This is because the functions don't use @code{read-key-sequence} to
824get the input; instead, they read a single event and look it up ``by hand.''
825@end itemize
826@end defvar
827
828Here are the meaningful ``bindings'' for @code{query-replace-map}.
829Several of them are meaningful only for @code{query-replace} and
830friends.
831
832@table @code
833@item act
834Do take the action being considered---in other words, ``yes.''
835
836@item skip
837Do not take action for this question---in other words, ``no.''
838
839@item exit
840Answer this question ``no,'' and give up on the entire series of
841questions, assuming that subsequent answers will be ``no.''
842
843@item act-and-exit
844Answer this question ``yes,'' and give up on the entire series of
845questions, assuming that subsequent answers will be ``no.''
846
847@item act-and-show
848Answer this question ``yes,'' but show the results---don't advance yet
849to the next question.
850
851@item automatic
852Answer this question and all subsequent questions in the series with
853``yes,'' without further user interaction.
854
855@item backup
856Move back to the previous place that a question was asked about.
857
858@item edit
859Enter a recursive edit to deal with this question---instead of any
860other action that would normally be taken.
861
862@item delete-and-edit
863Delete the text being considered, then enter a recursive edit to replace
864it.
865
866@item recenter
867Redisplay and center the window, then ask the same question again.
868
869@item quit
870Perform a quit right away. Only @code{y-or-n-p} and related functions
871use this answer.
872
873@item help
874Display some help, then ask again.
875@end table
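
  The ``by hand'' lookup described above might look something like the
following sketch.  The function name @code{my-ask-yes-or-no} is
hypothetical, and the sketch handles only a few of the symbols in the
table; it assumes the standard map binds @kbd{y} to @code{act} and
@kbd{n} to @code{skip}.

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
@end group

@group
(defun my-ask-yes-or-no (prompt)
  "Ask PROMPT; return t for yes, nil for no."
  (let ((answer nil) (done nil))
    (while (not done)
      (message "%s" prompt)
      ;; @r{Read one event and look it up by hand.}
      (let ((binding (lookup-key query-replace-map
                                 (vector (read-event)))))
        (cond ((eq binding 'act) (setq answer t done t))
              ((memq binding '(skip exit))
               (setq answer nil done t))
              ((eq binding 'recenter) (recenter))
              (t (ding)))))
    answer))
@end group
@end example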
876
877@node Match Data
878@section The Match Data
879@cindex match data
880
881 Emacs keeps track of the positions of the start and end of segments of
882text found during a regular expression search. This means, for example,
883that you can search for a complex pattern, such as a date in an Rmail
884message, and then extract parts of the match under control of the
885pattern.
886
887 Because the match data normally describe the most recent search only,
888you must be careful not to do another search inadvertently between the
889search you wish to refer back to and the use of the match data. If you
890can't avoid another intervening search, you must save and restore the
891match data around it, to prevent it from being overwritten.
892
893@menu
894* Simple Match Data:: Accessing single items of match data,
895 such as where a particular subexpression started.
896* Replacing Match:: Replacing a substring that was matched.
897* Entire Match Data:: Accessing the entire match data at once, as a list.
898* Saving Match Data:: Saving and restoring the match data.
899@end menu
900
901@node Simple Match Data
902@subsection Simple Match Data Access
903
904 This section explains how to use the match data to find the starting
905point or ending point of the text that was matched by a particular
906search, or by a particular parenthetical subexpression of a regular
907expression.
908
909@defun match-beginning count
910This function returns the position of the start of text matched by the
911last regular expression searched for, or a subexpression of it.
912
913If @var{count} is zero, then the value is the position of the start of
914the text matched by the whole regexp. Otherwise, @var{count} specifies
915a subexpression in the regular expression. The value of the function is
916the starting position of the match for that subexpression.
917
918Subexpressions of a regular expression are those expressions grouped
919with escaped parentheses, @samp{\(@dots{}\)}. The @var{count}th
920subexpression is found by counting occurrences of @samp{\(} from the
921beginning of the whole regular expression. The first subexpression is
922numbered 1, the second 2, and so on.
923
924The value is @code{nil} for a subexpression inside a
925@samp{\|} alternative that wasn't used in the match.
926@end defun
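
  For instance, in the following sketch only the second alternative
takes part in the match, so the first subexpression has no position:

@example
@group
(string-match "\\(a\\)\\|\\(b\\)" "b")
     @result{} 0
(match-beginning 1)
     @result{} nil
(match-beginning 2)
     @result{} 0
@end group
@end example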
927
928@defun match-end count
929This function returns the position of the end of the text that matched
930the last regular expression searched for, or a subexpression of it.
931This function is otherwise similar to @code{match-beginning}.
932@end defun
933
934 Here is an example of using the match data, with a comment showing the
935positions within the text:
936
937@example
938@group
939(string-match "\\(qu\\)\\(ick\\)"
940 "The quick fox jumped quickly.")
941 ;0123456789
942 @result{} 4
943@end group
944
945@group
946(match-beginning 1) ; @r{The beginning of the match}
947 @result{} 4 ; @r{with @samp{qu} is at index 4.}
948@end group
949
950@group
951(match-beginning 2) ; @r{The beginning of the match}
952 @result{} 6 ; @r{with @samp{ick} is at index 6.}
953@end group
954
955@group
956(match-end 1) ; @r{The end of the match}
957 @result{} 6 ; @r{with @samp{qu} is at index 6.}
958
959(match-end 2) ; @r{The end of the match}
960 @result{} 9 ; @r{with @samp{ick} is at index 9.}
961@end group
962@end example
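
  Since these positions are string indices, they can be passed to
@code{substring} to extract the text of a subexpression.  A small
sketch, reusing the string from the example above:

@example
@group
(let ((str "The quick fox jumped quickly."))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (substring str (match-beginning 2) (match-end 2)))
     @result{} "ick"
@end group
@end example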
963
964 Here is another example. Point is initially located at the beginning
965of the line. Searching moves point to between the space and the word
966@samp{in}. The beginning of the entire match is at the 9th character of
967the buffer (@samp{T}), and the beginning of the match for the first
968subexpression is at the 13th character (@samp{c}).
969
970@example
971@group
972(list
973 (re-search-forward "The \\(cat \\)")
974 (match-beginning 0)
975 (match-beginning 1))
976 @result{} (17 9 13)
977@end group
978
979@group
980---------- Buffer: foo ----------
981I read "The cat @point{}in the hat comes back" twice.
982 ^ ^
983 9 13
984---------- Buffer: foo ----------
985@end group
986@end example
987
988@noindent
989(In this case, the index returned is a buffer position; the first
990character of the buffer counts as 1.)
991
992@node Replacing Match
993@subsection Replacing the Text That Matched
994
995 This function replaces the text matched by the last search with
996@var{replacement}.
997
998@cindex case in replacements
999@defun replace-match replacement &optional fixedcase literal
1000This function replaces the buffer text matched by the last search, with
1001@var{replacement}. It applies only to buffers; you can't use
1002@code{replace-match} to replace a substring found with
1003@code{string-match}.
1004
1005If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1006text is not changed; otherwise, the replacement text is converted to a
1007different case depending upon the capitalization of the text to be
1008replaced. If the original text is all upper case, the replacement text
1009is converted to upper case. If the first word of the original text is
1010capitalized, then the first word of the replacement text is capitalized.
1011If the original text contains just one word, and that word is a capital
1012letter, @code{replace-match} considers this a capitalized first word
1013rather than all upper case.
1014
1015If @code{case-replace} is @code{nil}, then case conversion is not done,
1016regardless of the value of @var{fixedcase}. @xref{Searching and Case}.
1017
1018If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1019exactly as it is, the only alterations being case changes as needed.
1020If it is @code{nil} (the default), then the character @samp{\} is treated
1021specially. If a @samp{\} appears in @var{replacement}, then it must be
1022part of one of the following sequences:
1023
1024@table @asis
1025@item @samp{\&}
1026@cindex @samp{&} in replacement
1027@samp{\&} stands for the entire text being replaced.
1028
1029@item @samp{\@var{n}}
1030@cindex @samp{\@var{n}} in replacement
1031@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
1032matched the @var{n}th subexpression in the original regexp.
1033Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
1034
1035@item @samp{\\}
1036@cindex @samp{\} in replacement
1037@samp{\\} stands for a single @samp{\} in the replacement text.
1038@end table
1039
1040@code{replace-match} leaves point at the end of the replacement text,
1041and returns @code{t}.
1042@end defun
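
  The following sketch shows the @samp{\@var{n}} sequences and the case
conversion at work.  It assumes an Emacs that provides
@code{with-temp-buffer}, and that @code{case-fold-search} and
@code{case-replace} have their default non-@code{nil} values; the
buffer text is invented for the example.

@example
@group
(with-temp-buffer
  (insert "hello world")
  (goto-char (point-min))
  (re-search-forward "\\(hello\\) \\(world\\)")
  (replace-match "\\2 \\1")       ; @r{Swap the two subexpressions.}
  (buffer-string))
     @result{} "world hello"
@end group

@group
(with-temp-buffer
  (insert "HELLO there")
  (goto-char (point-min))
  (re-search-forward "hello")
  (replace-match "goodbye")       ; @r{Original is all upper case.}
  (buffer-string))
     @result{} "GOODBYE there"
@end group
@end example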
1043
1044@node Entire Match Data
1045@subsection Accessing the Entire Match Data
1046
1047 The functions @code{match-data} and @code{set-match-data} read or
1048write the entire match data, all at once.
1049
1050@defun match-data
1051This function returns a newly constructed list containing all the
1052information on what text the last search matched. Element zero is the
1053position of the beginning of the match for the whole expression; element
1054one is the position of the end of the match for the expression. The
1055next two elements are the positions of the beginning and end of the
1056match for the first subexpression, and so on. In general, element
1057@ifinfo
1058number 2@var{n}
1059@end ifinfo
1060@tex
1061number {\mathsurround=0pt $2n$}
1062@end tex
1063corresponds to @code{(match-beginning @var{n})}; and
1064element
1065@ifinfo
1066number 2@var{n} + 1
1067@end ifinfo
1068@tex
1069number {\mathsurround=0pt $2n+1$}
1070@end tex
1071corresponds to @code{(match-end @var{n})}.
1072
1073All the elements are markers or @code{nil} if matching was done on a
1074buffer, and all are integers or @code{nil} if matching was done on a
1075string with @code{string-match}. (In Emacs 18 and earlier versions,
1076markers were used even for matching on a string, except in the case
1077of the integer 0.)
1078
1079As always, there must be no possibility of intervening searches between
1080the call to a search function and the call to @code{match-data} that is
1081intended to access the match data for that search.
1082
1083@example
1084@group
1085(match-data)
1086 @result{} (#<marker at 9 in foo>
1087 #<marker at 17 in foo>
1088 #<marker at 13 in foo>
1089 #<marker at 17 in foo>)
1090@end group
1091@end example
1092@end defun
1093
1094@defun set-match-data match-list
1095This function sets the match data from the elements of @var{match-list},
1096which should be a list that was the value of a previous call to
1097@code{match-data}.
1098
1099If @var{match-list} refers to a buffer that doesn't exist, you don't get
1100an error; that sets the match data in a meaningless but harmless way.
1101
1102@findex store-match-data
1103@code{store-match-data} is an alias for @code{set-match-data}.
1104@end defun
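
  As a sketch of how the two functions fit together, the following
saves the match data from one @code{string-match}, lets a second search
overwrite it, and then puts the saved list back.  The variable name
@code{saved} is arbitrary.

@example
@group
(let (saved)
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (setq saved (match-data))       ; @r{(4 9 4 6 6 9)}
  (string-match "fox" "The quick fox")
  (set-match-data saved)
  (list saved (match-beginning 2) (match-end 2)))
     @result{} ((4 9 4 6 6 9) 6 9)
@end group
@end example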
1105
1106@node Saving Match Data
1107@subsection Saving and Restoring the Match Data
1108
1109 When you call a function that may do a search, you may need to save
1110and restore the match data around that call, if you want to preserve the
1111match data from an earlier search for later use. Here is an example
1112that shows the problem that arises if you fail to save the match data:
1113
1114@example
1115@group
1116(re-search-forward "The \\(cat \\)")
1117 @result{} 48
1118(foo) ; @r{Perhaps @code{foo} does}
1119 ; @r{more searching.}
1120(match-end 0)
1121 @result{} 61 ; @r{Unexpected result---not 48!}
1122@end group
1123@end example
1124
1125 You can save and restore the match data with @code{save-match-data}:
1126
1127@defspec save-match-data body@dots{}
1128This special form executes @var{body}, saving and restoring the match
1129data around it.
1130@end defspec
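
  For instance, in this sketch the inner search does not disturb the
match data left by the outer one:

@example
@group
(let ((str "The quick fox"))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (list (save-match-data
          (string-match "fox" str))  ; @r{Inner search matches at 10.}
        (match-end 0)))              ; @r{Still refers to @samp{quick}.}
     @result{} (10 9)
@end group
@end example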
1131
1132 You can use @code{set-match-data} together with @code{match-data} to
1133imitate the effect of the special form @code{save-match-data}. This is
1134useful for writing code that can run in Emacs 18. Here is how:
1135
1136@example
1137@group
1138(let ((data (match-data)))
1139 (unwind-protect
1140 @dots{} ; @r{May change the original match data.}
1141 (set-match-data data)))
1142@end group
1143@end example
1144
1145 Emacs automatically saves and restores the match data when it runs
1146process filter functions (@pxref{Filter Functions}) and process
1147sentinels (@pxref{Sentinels}).
1148
1149@ignore
1150 Here is a function which restores the match data provided the buffer
1151associated with it still exists.
1152
1153@smallexample
1154@group
1155(defun restore-match-data (data)
1156@c It is incorrect to split the first line of a doc string.
1157@c If there's a problem here, it should be solved in some other way.
1158 "Restore the match data DATA unless the buffer is missing."
1159 (catch 'foo
1160 (let ((d data))
1161@end group
1162 (while d
1163 (and (car d)
1164 (null (marker-buffer (car d)))
1165@group
1166 ;; @file{match-data} @r{buffer is deleted.}
1167 (throw 'foo nil))
1168 (setq d (cdr d)))
1169 (set-match-data data))))
1170@end group
1171@end smallexample
1172@end ignore
1173
1174@node Searching and Case
1175@section Searching and Case
1176@cindex searching and case
1177
1178 By default, searches in Emacs ignore the case of the text they are
1179searching through; if you specify searching for @samp{FOO}, then
1180@samp{Foo} or @samp{foo} is also considered a match. Regexps, and in
1181particular character sets, are included: thus, @samp{[aB]} would match
1182@samp{a} or @samp{A} or @samp{b} or @samp{B}.
1183
1184 If you do not want this feature, set the variable
1185@code{case-fold-search} to @code{nil}. Then all letters must match
1186exactly, including case. This is a buffer-local variable; altering the
1187variable affects only the current buffer. (@xref{Intro to
1188Buffer-Local}.) Alternatively, you may change the value of
1189@code{default-case-fold-search}, which is the default value of
1190@code{case-fold-search} for buffers that do not override it.
1191
1192 Note that the user-level incremental search feature handles case
1193distinctions differently. When given a lower case letter, it looks for
1194a match of either case, but when given an upper case letter, it looks
1195for an upper case letter only. But this has nothing to do with the
1196searching functions used in Lisp code.
1197
1198@defopt case-replace
1199This variable determines whether the replacement functions should
1200preserve case. If the variable is @code{nil}, that means to use the
1201replacement text verbatim. A non-@code{nil} value means to convert the
1202case of the replacement text according to the text being replaced.
1203
1204The function @code{replace-match} is where this variable actually has
1205its effect. @xref{Replacing Match}.
1206@end defopt
1207
1208@defopt case-fold-search
1209This buffer-local variable determines whether searches should ignore
1210case. If the variable is @code{nil} they do not ignore case; otherwise
1211they do ignore case.
1212@end defopt
1213
1214@defvar default-case-fold-search
1215The value of this variable is the default value for
1216@code{case-fold-search} in buffers that do not override it. This is the
1217same as @code{(default-value 'case-fold-search)}.
1218@end defvar
1219
1220@node Standard Regexps
1221@section Standard Regular Expressions Used in Editing
1222@cindex regexps used standardly in editing
1223@cindex standard regexps used in editing
1224
1225 This section describes some variables that hold regular expressions
1226used for certain purposes in editing:
1227
1228@defvar page-delimiter
1229This is the regexp describing line-beginnings that separate pages. The
1230default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"});
1231this matches a line that starts with a formfeed character.
1232@end defvar
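
  As a small sketch of how such a regexp behaves, the default value
matches at the formfeed that begins the second line of this made-up
string.  The variable is bound explicitly here in case it has been
customized.

@example
@group
(let ((page-delimiter "^\014"))
  (string-match page-delimiter "page one\n\014page two"))
     @result{} 9
@end group
@end example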
1233
1234@defvar paragraph-separate
1235This is the regular expression for recognizing the beginning of a line
1236that separates paragraphs. (If you change this, you may have to
1237change @code{paragraph-start} also.) The default value is
1238@w{@code{"^[@ \t\f]*$"}}, which matches a line that consists entirely of
1239spaces, tabs, and form feeds.
1240@end defvar
1241
1242@defvar paragraph-start
1243This is the regular expression for recognizing the beginning of a line
1244that starts @emph{or} separates paragraphs. The default value is
1245@w{@code{"^[@ \t\n\f]"}}, which matches a line starting with a space, tab,
1246newline, or form feed.
1247@end defvar
1248
1249@defvar sentence-end
1250This is the regular expression describing the end of a sentence. (All
1251paragraph boundaries also end sentences, regardless.) The default value
1252is:
1253
1254@example
1255"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
1256@end example
1257
1258This means a period, question mark or exclamation mark, followed
1259optionally by a closing parenthetical character, followed by tabs,
1260spaces or new lines.
1261
1262For a detailed explanation of this regular expression, see @ref{Regexp
1263Example}.
1264@end defvar