@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/searching
@node Searching and Matching, Syntax Tables, Text, Top
@chapter Searching and Matching
@cindex searching

  GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can examine the @dfn{match data} to
determine which text matched the whole regular expression or various
portions of it.

@menu
* String Search::         Search for an exact match.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* Search and Replace::    Internals of @code{query-replace}.
* Match Data::            Finding out which part of the text matched
                            various parts of a regexp, after regexp search.
* Searching and Case::    Case-independent or case-significant searching.
* Standard Regexps::      Useful regexps for finding sentences, pages,...
@end menu

  The @samp{skip-chars@dots{}} functions also perform a kind of searching.
@xref{Skipping Characters}.

@node String Search
@section Searching for Strings
@cindex string search

  These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string;
@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
is set to 1.

@deffn Command search-forward string &optional limit noerror repeat
  This function searches forward from point for an exact match for
@var{string}. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found, the
value and side effects depend on @var{noerror} (see below).
@c Emacs 19 feature

  In the following example, point is initially at the beginning of the
line. Then @code{(search-forward "fox")} moves point after the last
letter of @samp{fox}:

@example
@group
---------- Buffer: foo ----------
@point{}The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
@end group

@group
(search-forward "fox")
     @result{} 20

---------- Buffer: foo ----------
The quick brown fox@point{} jumped over the lazy dog.
---------- Buffer: foo ----------
@end group
@end example

  The argument @var{limit} specifies the upper bound to the search. (It
must be a position in the current buffer.) No match extending after
that position is accepted. If @var{limit} is omitted or @code{nil}, it
defaults to the end of the accessible portion of the buffer.

@kindex search-failed
  What happens when the search fails depends on the value of
@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
returns @code{nil} and does nothing. If @var{noerror} is neither
@code{nil} nor @code{t}, then @code{search-forward} moves point to the
upper bound and returns @code{nil}. (It would be more consistent now
to return the new position of point in that case, but some programs
may depend on a value of @code{nil}.)

  If @var{repeat} is non-@code{nil}, then the search is repeated that
many times. Point is positioned at the end of the last match.
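
  As a rough sketch of how @var{noerror} and @var{limit} are commonly
used in programs, the first form below counts the occurrences of a
string, and the second shows a @var{noerror} value that is neither
@code{nil} nor @code{t}. The buffer contents are invented for
illustration, and the sketch assumes an Emacs that provides
@code{with-temp-buffer}.

@example
@group
(with-temp-buffer
  (insert "fox, box, fox, ox, fox")
  (goto-char (point-min))
  (let ((count 0))
    ;; With NOERROR t, a failed search returns nil
    ;; instead of signaling search-failed.
    (while (search-forward "fox" nil t)
      (setq count (1+ count)))
    count))
     @result{} 3
@end group

@group
(with-temp-buffer
  (insert "abc def")
  (goto-char (point-min))
  ;; No match ends at or before position 4,
  ;; so point moves to that limit.
  (list (search-forward "zzz" 4 'move) (point)))
     @result{} (nil 4)
@end group
@end example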
@end deffn

@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}. It is
just like @code{search-forward} except that it searches backwards and
leaves point at the beginning of the match.
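
For instance, in the following sketch (the buffer text is made up, and
@code{with-temp-buffer} is assumed to be available), point starts at
the end of the buffer and ends up just before the @samp{q}:

@example
@group
(with-temp-buffer
  (insert "The quick brown fox")
  ;; After the insertion, point is at the end of the buffer.
  (list (search-backward "quick") (point)))
     @result{} (5 5)
@end group
@end example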
@end deffn

@deffn Command word-search-forward string &optional limit noerror repeat
@cindex word search
This function searches forward from point for a ``word'' match for
@var{string}. If it finds a match, it sets point to the end of the
match found, and returns the new value of point.
@c Emacs 19 feature

Word matching regards @var{string} as a sequence of words, disregarding
punctuation that separates them. It searches the buffer for the same
sequence of words. Each word must be distinct in the buffer (searching
for the word @samp{ball} does not match the word @samp{balls}), but the
details of punctuation and spacing are ignored (searching for @samp{ball
boy} does match @samp{ball. Boy!}).

In this example, point is initially at the beginning of the buffer; the
search leaves it between the @samp{y} and the @samp{!}.

@example
@group
---------- Buffer: foo ----------
@point{}He said "Please! Find
the ball boy!"
---------- Buffer: foo ----------
@end group

@group
(word-search-forward "Please find the ball, boy.")
     @result{} 35

---------- Buffer: foo ----------
He said "Please! Find
the ball boy@point{}!"
---------- Buffer: foo ----------
@end group
@end example

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search. The match found must
not extend after that position.

If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
an error if the search fails. If @var{noerror} is @code{t}, then it
returns @code{nil} instead of signaling an error. If @var{noerror} is
neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
end of the buffer) and returns @code{nil}.

If @var{repeat} is non-@code{nil}, then the search is repeated that many
times. Point is positioned at the end of the last match.
@end deffn

@deffn Command word-search-backward string &optional limit noerror repeat
This function searches backward from point for a word match to
@var{string}. This function is just like @code{word-search-forward}
except that it searches backward and normally leaves point at the
beginning of the match.
@end deffn

@node Regular Expressions
@section Regular Expressions
@cindex regular expression
@cindex regexp

  A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to write
regexps; the following section says how to search for them.

@menu
* Syntax of Regexps::     Rules for writing regular expressions.
* Regexp Example::        Illustrates regular expression syntax.
@end menu

@node Syntax of Regexps
@subsection Syntax of Regular Expressions

  Regular expressions have a syntax in which a few characters are special
constructs and the rest are @dfn{ordinary}. An ordinary character is a
simple regular expression which matches that character and nothing else.
The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
@samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new special
characters will be defined in the future. Any other character appearing
in a regular expression is ordinary, unless a @samp{\} precedes it.

For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
@samp{ff}.) Likewise, @samp{o} is a regular expression that matches
only @samp{o}.@refill

Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression which matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
the string.@refill

As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
the string @samp{fo}. Still trivial. To do something more powerful, you
need to use one of the special characters. Here is a list of them:

@need 1200
@table @kbd
@item .@: @r{(Period)}
@cindex @samp{.} in regexp
is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
@samp{b}.@refill

@item *
@cindex @samp{*} in regexp
is not a construct by itself; it is a suffix operator that means to
repeat the preceding regular expression as many times as possible. In
@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
one @samp{f} followed by any number of @samp{o}s. The case of zero
@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill

@samp{*} always applies to the @emph{smallest} possible preceding
expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
repeating @samp{fo}.@refill

The matcher processes a @samp{*} construct by matching, immediately,
as many repetitions as can be found. Then it continues with the rest
of the pattern. If that fails, backtracking occurs, discarding some
of the matches of the @samp{*}-modified construct in case that makes
it possible to match the rest of the pattern. For example, in matching
@samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
tries to match all three @samp{a}s; but the rest of the pattern is
@samp{ar} and there is only @samp{r} left to match, so this try fails.
The next alternative is for @samp{a*} to match only two @samp{a}s.
With this choice, the rest of the regexp matches successfully.@refill

@item +
@cindex @samp{+} in regexp
is a suffix operator similar to @samp{*} except that the preceding
expression must match at least once. So, for example, @samp{ca+r}
matches the strings @samp{car} and @samp{caaaar} but not the string
@samp{cr}, whereas @samp{ca*r} matches all three strings.

@item ?
@cindex @samp{?} in regexp
is a suffix operator similar to @samp{*} except that the preceding
expression can match either once or not at all. For example,
@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
else.

@item [ @dots{} ]
@cindex character set (in regexp)
@cindex @samp{[} in regexp
@cindex @samp{]} in regexp
@samp{[} begins a @dfn{character set}, which is terminated by a
@samp{]}. In the simplest case, the characters between the two brackets
form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
@samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
and @samp{d}s (including the empty string), from which it follows that
@samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
@samp{caddaar}, etc.@refill

The usual regular expression special characters are not special inside a
character set. A completely different set of special characters exists
inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill

@samp{-} is used for ranges of characters. To write a range, write two
characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
lower case letter. Ranges may be intermixed freely with individual
characters, as in @samp{[a-z$%.]}, which matches any lower case letter
or @samp{$}, @samp{%} or a period.@refill

To include a @samp{]} in a character set, make it the first character.
For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
@samp{-}, write @samp{-} as the first character in the set, or put it
immediately after a range. (You can replace one individual character
@var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
@samp{-}.) There is no way to write a set containing just @samp{-} and
@samp{]}.

To include @samp{^} in a set, put it anywhere but at the beginning of
the set.

@item [^ @dots{} ]
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complement character set}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
matches all characters @emph{except} letters and digits.@refill

@samp{^} is not special in a character set unless it is the first
character. The character following the @samp{^} is treated as if it
were first (thus, @samp{-} and @samp{]} are not special there).

Note that a complement character set can match a newline, unless
newline is mentioned as one of the characters not to match.

@item ^
@cindex @samp{^} in regexp
@cindex beginning of line in regexp
is a special character that matches the empty string, but only at
the beginning of a line in the text being matched. Otherwise it fails
to match anything. Thus, @samp{^foo} matches a @samp{foo} which occurs
at the beginning of a line.

When matching a string, @samp{^} matches at the beginning of the string
or after a newline character @samp{\n}.

@item $
@cindex @samp{$} in regexp
is similar to @samp{^} but matches only at the end of a line. Thus,
@samp{x+$} matches a string of one @samp{x} or more at the end of a line.

When matching a string, @samp{$} matches at the end of the string
or before a newline character @samp{\n}.

@item \
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@samp{\}), and it introduces additional special constructs.

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression which matches only @samp{$}, and @samp{\[} is a regular
expression which matches only @samp{[}, and so on.

Note that @samp{\} also has special meaning in the read syntax of Lisp
strings (@pxref{String Type}), and must be quoted with @samp{\}. For
example, the regular expression that matches the @samp{\} character is
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
@samp{\} is @code{"\\\\"}.@refill
@end table
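
To see some of these rules in action, the forms below try a few of the
special characters with @code{string-match} (@pxref{Regexp Search}),
which returns the index where the first match starts, or @code{nil} if
there is none. This is only an illustrative sketch; the sample strings
are invented here.

@example
@group
(string-match "a.b" "xaYb")
     @result{} 1
;; Backtracking: a* first takes all three a's, then gives one back.
(string-match "ca*ar" "caaar")
     @result{} 0
(string-match "ca+r" "cr")
     @result{} nil
(string-match "ca?r" "caar")
     @result{} nil
@end group

@group
;; A ] placed first in a character set is an ordinary member.
(string-match "[]a]" "x]")
     @result{} 1
(string-match "[^a-z0-9A-Z]" "ab-c")
     @result{} 2
(string-match "^foo" "bar\nfoo")
     @result{} 4
(string-match "x+$" "axx\nb")
     @result{} 1
;; In Lisp read syntax the quoting backslash is itself doubled.
(string-match "\\$" "cost: $5")
     @result{} 6
@end group
@end example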

@strong{Please note:} For historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; better to
quote the special character anyway, regardless of where it
appears.@refill

For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: characters
which, when preceded by @samp{\}, are special constructs. Such
characters are always ordinary when encountered on their own. Here
is a table of @samp{\} constructs:

@table @kbd
@item \|
@cindex @samp{|} in regexp
@cindex regexp alternative
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} matches.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of @samp{\|}.

@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations.
Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.

@item
To enclose an expression for a suffix operator such as @samp{*} to act
on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
(zero or more) number of @samp{na} strings.@refill

@item
To record a matched substring for future reference.
@end enumerate

This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature which happens to be
assigned as a second meaning to the same @samp{\( @dots{} \)} construct
because there is no conflict in practice between the two meanings.
Here is an explanation of this feature:

@item \@var{digit}
matches the same text which matched the @var{digit}th occurrence of a
@samp{\( @dots{} \)} construct.

In other words, after the end of a @samp{\( @dots{} \)} construct, the
matcher remembers the beginning and end of the text matched by that
construct. Then, later on in the regular expression, you can use
@samp{\} followed by @var{digit} to match that same text, whatever it
may have been.

The strings matching the first nine @samp{\( @dots{} \)} constructs
appearing in a regular expression are assigned numbers 1 through 9 in
the order that the open parentheses appear in the regular expression.
So you can use @samp{\1} through @samp{\9} to refer to the text matched
by the corresponding @samp{\( @dots{} \)} constructs.

For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves. The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

@item \w
@cindex @samp{\w} in regexp
matches any word-constituent character. The editor syntax table
determines which characters these are. @xref{Syntax Tables}.

@item \W
@cindex @samp{\W} in regexp
matches any character that is not a word-constituent.

@item \s@var{code}
@cindex @samp{\s} in regexp
matches any character whose syntax is @var{code}. Here @var{code} is a
character which represents a syntax code: thus, @samp{w} for word
constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
etc. @xref{Syntax Tables}, for a list of syntax codes and the
characters that stand for them.

@item \S@var{code}
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.
@end table
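
As a brief sketch of the @samp{\} constructs above, here they are
written in Lisp string syntax, where each @samp{\} is doubled, and
tried with @code{string-match} and @code{match-end} (@pxref{Match
Data}). The sample strings are invented, and the last two results
assume a syntax table in which letters are word constituents and the
space character is whitespace, as in the standard syntax table.

@example
@group
(string-match "\\(foo\\|bar\\)x" "a barx")
     @result{} 2
(string-match "ba\\(na\\)*" "bananana")
     @result{} 0
(match-end 0)
     @result{} 8
@end group

@group
;; The group matches "abc", and \1 must then match "abc" again.
(string-match "\\(.*\\)\\1" "abcabc")
     @result{} 0
(match-end 0)
     @result{} 6
@end group

@group
(string-match "\\w+" "  hello!")
     @result{} 2
(string-match "\\s-+" "foo   bar")
     @result{} 3
@end group
@end example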

  The following regular expression constructs match the empty
string---that is, they don't use up any characters---but whether they
match depends on the context.

@table @kbd
@item \`
@cindex @samp{\`} in regexp
matches the empty string, but only at the beginning
of the buffer or string being matched against.

@item \'
@cindex @samp{\'} in regexp
matches the empty string, but only at the end of
the buffer or string being matched against.

@item \=
@cindex @samp{\=} in regexp
matches the empty string, but only at point.
(This construct is not defined when matching against a string.)

@item \b
@cindex @samp{\b} in regexp
matches the empty string, but only at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bballs?\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@item \B
@cindex @samp{\B} in regexp
matches the empty string, but @emph{not} at the beginning or
end of a word.

@item \<
@cindex @samp{\<} in regexp
matches the empty string, but only at the beginning of a word.

@item \>
@cindex @samp{\>} in regexp
matches the empty string, but only at the end of a word.
@end table
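
The difference between these anchors and @samp{^} and @samp{$} is
easiest to see on a string containing a newline. The forms below are
a small sketch with invented strings; the word-boundary results assume
that letters are word constituents and the space is not.

@example
@group
;; ^ matches after a newline, but \` only at the very start.
(string-match "^foo" "bar\nfoo")
     @result{} 4
(string-match "\\`foo" "bar\nfoo")
     @result{} nil
@end group

@group
;; The "ball" inside "football" does not start at a word boundary.
(string-match "\\bballs?\\b" "football balls")
     @result{} 9
(string-match "\\<ball" "football ball")
     @result{} 9
(string-match "ball\\>" "balls ball")
     @result{} 6
@end group
@end example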

@kindex invalid-regexp
  Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
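
  A program that accepts regexps from the user may want to catch this
error. The following is a minimal sketch using @code{condition-case};
the symbol @code{invalid} returned by the handler is an arbitrary
choice made for this illustration.

@example
@group
(condition-case nil
    ;; The opening bracket is never closed.
    (string-match "[a-z" "abc")
  (invalid-regexp 'invalid))
     @result{} invalid
@end group
@end example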

@defun regexp-quote string
This function returns a regular expression string that matches exactly
@var{string} and nothing else. This allows you to request an exact
string match when calling a function that wants a regular expression.

@example
@group
(regexp-quote "^The cat$")
     @result{} "\\^The cat\\$"
@end group
@end example

One use of @code{regexp-quote} is to combine an exact string match with
context described as a regular expression. For example, this searches
for the string which is the value of @code{string}, surrounded by
whitespace:

@example
@group
(re-search-forward
 (concat "\\s " (regexp-quote string) "\\s "))
@end group
@end example
@end defun

@node Regexp Example
@comment node-name, next, previous, up
@subsection Complex Regexp Example

  Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows. It is the value of
the variable @code{sentence-end}.

  First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters. The string constant begins and ends with a
double-quote. @samp{\"} stands for a double-quote as part of the
string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
tab and @samp{\n} for a newline.

@example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
@end example

  In contrast, if you evaluate the variable @code{sentence-end}, you
will see the following:

@example
@group
sentence-end
@result{}
"[.?!][]\"')@}]*\\($\\| $\\|	\\|  \\)[ 	
]*"
@end group
@end example

@noindent
In this output, tab and newline appear as themselves.

  This regular expression contains four parts in succession and can be
deciphered as follows:

@table @code
@item [.?!]
The first part of the pattern consists of three characters, a period, a
question mark and an exclamation mark, within square brackets. The
match must begin with one of these three characters.

@item []\"')@}]*
The second part of the pattern matches any closing braces and quotation
marks, zero or more of them, that may follow the period, question mark
or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
a string. The @samp{*} at the end indicates that the immediately
preceding regular expression (a character set, in this case) may be
repeated zero or more times.

@item \\($\\|@ $\\|\t\\|@ @ \\)
The third part of the pattern matches the whitespace that follows the
end of a sentence: the end of a line, or a tab, or two spaces. The
double backslashes mark the parentheses and vertical bars as regular
expression syntax; the parentheses mark the group and the vertical bars
separate alternatives. The dollar sign is used to match the end of a
line.

@item [ \t\n]*
Finally, the last part of the pattern matches any additional whitespace
beyond the minimum needed to end a sentence.
@end table

@node Regexp Search
@section Regular Expression Searching
@cindex regular expression searching
@cindex regexp searching
@cindex searching for regexp

  In GNU Emacs, you can search for the next match for a regexp either
incrementally or not. For incremental search commands, see @ref{Regexp
Search, , Regular Expression Search, emacs, The GNU Emacs Manual}. Here
we describe only the search functions useful in programs. The principal
one is @code{re-search-forward}.

@deffn Command re-search-forward regexp &optional limit noerror repeat
This function searches forward in the current buffer for a string of
text that is matched by the regular expression @var{regexp}. The
function skips over any amount of text that is not matched by
@var{regexp}, and leaves point at the end of the first match found.
It returns the new value of point.

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search. No match extending
after that position is accepted.

What happens when the search fails depends on the value of
@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
error is signaled. If @var{noerror} is @code{t},
@code{re-search-forward} does nothing and returns @code{nil}. If
@var{noerror} is neither @code{nil} nor @code{t}, then
@code{re-search-forward} moves point to @var{limit} (or the end of the
buffer) and returns @code{nil}.

If @var{repeat} is supplied (it must be a positive number), then the
search is repeated that many times (each time starting at the end of the
previous time's match). If these successive searches succeed, the
function succeeds, moving point and returning its new value. Otherwise
the search fails.

In the following example, point is initially before the @samp{T}.
Evaluating the search call moves point to the end of that line (between
the @samp{t} of @samp{hat} and the newline).

@example
@group
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------
@end group

@group
(re-search-forward "[a-z]+" nil t 5)
     @result{} 27

---------- Buffer: foo ----------
I read "The cat in the hat@point{}
comes back" twice.
---------- Buffer: foo ----------
@end group
@end example
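
A common way to use this function in a program is in a loop with
@var{noerror} set to @code{t}, so that the loop simply ends when no
further match exists. The following sketch collects every run of
digits in a buffer; the buffer text is invented, and
@code{with-temp-buffer} is assumed to be available.

@example
@group
(with-temp-buffer
  (insert "a1 b22 c333")
  (goto-char (point-min))
  (let ((found nil))
    ;; Each successful search leaves point after the match,
    ;; so the next iteration continues from there.
    (while (re-search-forward "[0-9]+" nil t)
      (setq found
            (cons (buffer-substring (match-beginning 0)
                                    (match-end 0))
                  found)))
    (nreverse found)))
     @result{} ("1" "22" "333")
@end group
@end example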
@end deffn

@deffn Command re-search-backward regexp &optional limit noerror repeat
This function searches backward in the current buffer for a string of
text that is matched by the regular expression @var{regexp}, leaving
point at the beginning of the first text found.

This function is analogous to @code{re-search-forward}, but they are
not simple mirror images. @code{re-search-forward} finds the match
whose beginning is as close as possible. If @code{re-search-backward}
were a perfect mirror image, it would find the match whose end is as
close as possible. However, in fact it finds the match whose beginning
is as close as possible. The reason is that matching a regular
expression at a given spot always works from beginning to end, and is
done at a specified beginning position.

A true mirror-image of @code{re-search-forward} would require a special
feature for matching regexps from end to beginning. It's not worth the
trouble of implementing that.
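
The asymmetry can be seen in a small sketch such as the following
(the buffer text is invented, and @code{with-temp-buffer} is assumed
to be available). Searching forward, @samp{a+} matches all three
characters; searching backward from the end, the nearest possible
beginning is tried first, so the match covers only the last one.

@example
@group
(with-temp-buffer
  (insert "aaa")
  (goto-char (point-min))
  (re-search-forward "a+")
  (list (match-beginning 0) (match-end 0)))
     @result{} (1 4)
@end group

@group
(with-temp-buffer
  (insert "aaa")
  ;; Point is at the end of the buffer, position 4.
  (re-search-backward "a+")
  (list (match-beginning 0) (match-end 0)))
     @result{} (3 4)
@end group
@end example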
@end deffn

@defun string-match regexp string &optional start
This function returns the index of the start of the first match for
the regular expression @var{regexp} in @var{string}, or @code{nil} if
there is no match. If @var{start} is non-@code{nil}, the search starts
at that index in @var{string}.

For example,

@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly.")
     @result{} 4
@end group
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group
@end example

@noindent
The index of the first character of the
string is 0, the index of the second character is 1, and so on.

After this function returns, the index of the first character beyond
the match is available as @code{(match-end 0)}. @xref{Match Data}.

@example
@group
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end group

@group
(match-end 0)
     @result{} 32
@end group
@end example
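
Combining @var{start} with @code{(match-end 0)} gives a simple way to
visit every match in a string. The loop below is a sketch of that
idea; the variable names are chosen only for this illustration.

@example
@group
(let ((string "The quick brown fox jumped quickly.")
      (start 0)
      (positions nil))
  ;; Resume each search just past the previous match.
  (while (string-match "quick" string start)
    (setq positions (cons (match-beginning 0) positions))
    (setq start (match-end 0)))
  (nreverse positions))
     @result{} (4 27)
@end group
@end example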
@end defun

@defun looking-at regexp
This function determines whether the text in the current buffer directly
following point matches the regular expression @var{regexp}. ``Directly
following'' means precisely that: the search is ``anchored'' and it can
succeed only starting with the first character following point. The
result is @code{t} if so, @code{nil} otherwise.

This function does not move point, but it updates the match data, which
you can access using @code{match-beginning} and @code{match-end}.
@xref{Match Data}.

In this example, point is located directly before the @samp{T}. If it
were anywhere else, the result would be @code{nil}.

@example
@group
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(looking-at "The cat in the hat$")
     @result{} t
@end group
@end example
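
The following sketch, with an invented buffer and assuming
@code{with-temp-buffer} is available, shows all three properties at
once: the match is anchored at point, point does not move, and the
match data is updated.

@example
@group
(with-temp-buffer
  (insert "foo bar")
  (goto-char (point-min))
  (list (looking-at "bar")     ; "bar" is not directly after point
        (looking-at "fo+")     ; anchored match at point
        (point)                ; point has not moved
        (match-end 0)))        ; end of the text matched by "fo+"
     @result{} (nil t 1 4)
@end group
@end example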
@end defun

@ignore
@deffn Command delete-matching-lines regexp
This function is identical to @code{delete-non-matching-lines}, save
that it deletes what @code{delete-non-matching-lines} keeps.

In the example below, point is located on the first line of text.

@example
@group
---------- Buffer: foo ----------
We hold these truths
to be self-evident,
that all men are created
equal, and that they are
---------- Buffer: foo ----------
@end group

@group
(delete-matching-lines "the")
     @result{} nil

---------- Buffer: foo ----------
to be self-evident,
that all men are created
---------- Buffer: foo ----------
@end group
@end example
@end deffn

@deffn Command flush-lines regexp
This function is the same as @code{delete-matching-lines}.
@end deffn

@defun delete-non-matching-lines regexp
This function deletes all lines following point which don't
contain a match for the regular expression @var{regexp}.
@end defun

@deffn Command keep-lines regexp
This function is the same as @code{delete-non-matching-lines}.
@end deffn

@deffn Command how-many regexp
This function counts the matches for @var{regexp} in
the current buffer following point. It prints this number in
the echo area, returning the string printed.
@end deffn

@deffn Command count-matches regexp
This function is a synonym of @code{how-many}.
@end deffn

@deffn Command list-matching-lines regexp nlines
This function is a synonym of @code{occur}.
Show all lines following point containing a match for @var{regexp}.
Display each line with @var{nlines} lines before and after,
or @code{-}@var{nlines} before if @var{nlines} is negative.
@var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
Interactively it is the prefix arg.

The lines are shown in a buffer named @samp{*Occur*}.
It serves as a menu to find any of the occurrences in this buffer.
@kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
@end deffn

@defopt list-matching-lines-default-context-lines
Default value is 0.
Default number of context lines to include around a @code{list-matching-lines}
match. A negative number means to include that many lines before the match.
A positive number means to include that many lines both before and after.
@end defopt
@end ignore

@node Search and Replace
@section Search and Replace
@cindex replacement

@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
This function is the guts of @code{query-replace} and related commands.
It searches for occurrences of @var{from-string} and replaces some or
all of them. If @var{query-flag} is @code{nil}, it replaces all
occurrences; otherwise, it asks the user what to do about each one.

If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
considered a regular expression; otherwise, it must match literally. If
@var{delimited-flag} is non-@code{nil}, then only replacements
surrounded by word boundaries are considered.

The argument @var{replacements} specifies what to replace occurrences
with. If it is a string, that string is used. It can also be a list of
strings, to be used in cyclic order.

If @var{repeat-count} is non-@code{nil}, it should be an integer, the
number of occurrences to consider. In this case, @code{perform-replace}
returns after considering that many occurrences.

Normally, the keymap @code{query-replace-map} defines the possible user
responses. The argument @var{map}, if non-@code{nil}, is a keymap to
use instead of @code{query-replace-map}.
806@end defun
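
  As a minimal sketch of a non-interactive call, the following form
replaces every occurrence of a literal string without querying.  It
assumes the macro @code{with-temp-buffer}, which is not described in
this chapter and may be missing from older Emacs versions; the strings
are purely illustrative.  @code{case-fold-search} and
@code{case-replace} are bound explicitly so that the result does not
depend on user settings.

@example
@group
(with-temp-buffer
  (insert "foo baz foo")
  (goto-char (point-min))     ; @r{Start searching at the beginning.}
  (let ((case-fold-search nil)
        (case-replace nil))
    ;; @r{@var{query-flag}, @var{regexp-flag} and @var{delimited-flag}}
    ;; @r{are all @code{nil}: replace every literal occurrence.}
    (perform-replace "foo" "bar" nil nil nil))
  (buffer-string))
     @result{} "bar baz bar"
@end group
@end example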
807
808@defvar query-replace-map
809This variable holds a special keymap that defines the valid user
810responses for @code{query-replace} and related functions, as well as
811@code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
812
813@itemize @bullet
814@item
815The ``key bindings'' are not commands, just symbols that are meaningful
816to the functions that use this map.
817
818@item
819Prefix keys are not supported; each key binding must be for a single-event
820key sequence. This is because the functions don't use @code{read-key-sequence} to
821get the input; instead, they read a single event and look it up ``by hand.''
822@end itemize
823@end defvar
824
825Here are the meaningful ``bindings'' for @code{query-replace-map}.
826Several of them are meaningful only for @code{query-replace} and
827friends.
828
829@table @code
830@item act
831Do take the action being considered---in other words, ``yes.''
832
833@item skip
834Do not take action for this question---in other words, ``no.''
835
836@item exit
837Answer this question ``no,'' and don't ask any more.
838
839@item act-and-exit
840Answer this question ``yes,'' and don't ask any more.
841
842@item act-and-show
843Answer this question ``yes,'' but show the results---don't advance yet
844to the next question.
845
846@item automatic
847Answer this question and all subsequent questions in the series with
848``yes,'' without further user interaction.
849
850@item backup
851Move back to the previous place that a question was asked about.
852
853@item edit
854Enter a recursive edit to deal with this question---instead of any
855other action that would normally be taken.
856
857@item delete-and-edit
858Delete the text being considered, then enter a recursive edit to replace
859it.
860
861@item recenter
862Redisplay and center the window, then ask the same question again.
863
864@item quit
865Perform a quit right away. Only @code{y-or-n-p} and related functions
866use this answer.
867
868@item help
869Display some help, then ask again.
870@end table
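
  Because the ``bindings'' are plain symbols, a program can look up a
key in the map itself.  The following sketch assumes the standard
contents of @code{query-replace-map}, in which @kbd{y}, @kbd{n} and
@kbd{!} have their usual @code{query-replace} meanings; a customized
map could return different symbols.

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
(lookup-key query-replace-map "!")
     @result{} automatic
@end group
@end example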
871
872@node Match Data
873@section The Match Data
874@cindex match data
875
876 Emacs keeps track of the positions of the start and end of segments of
877text found during a regular expression search. This means, for example,
878that you can search for a complex pattern, such as a date in an Rmail
879message, and then extract parts of the match under control of the
880pattern.
881
882 Because the match data normally describe the most recent search only,
883you must be careful not to do another search inadvertently between the
884search you wish to refer back to and the use of the match data. If you
885can't avoid another intervening search, you must save and restore the
886match data around it, to prevent it from being overwritten.
887
888@menu
889* Simple Match Data:: Accessing single items of match data,
890 such as where a particular subexpression started.
891* Replacing Match:: Replacing a substring that was matched.
892* Entire Match Data:: Accessing the entire match data at once, as a list.
893* Saving Match Data:: Saving and restoring the match data.
894@end menu
895
896@node Simple Match Data
897@subsection Simple Match Data Access
898
899 This section explains how to use the match data to find the starting
900point or ending point of the text that was matched by a particular
901search, or by a particular parenthetical subexpression of a regular
902expression.
903
904@defun match-beginning count
905This function returns the position of the start of text matched by the
906last regular expression searched for, or a subexpression of it.
907
908The argument @var{count}, a number, specifies a subexpression whose
909start position is the value. If @var{count} is zero, then the value is
910the position of the start of the text matched by the whole regexp. If @var{count} is
911greater than zero, then the value is the position of the beginning of
912the text matched by the @var{count}th subexpression.
913
914Subexpressions of a regular expression are those expressions grouped
915inside of parentheses, @samp{\(@dots{}\)}. The @var{count}th
916subexpression is found by counting occurrences of @samp{\(} from the
917beginning of the whole regular expression. The first subexpression is
918numbered 1, the second 2, and so on.
919
920The value is @code{nil} for a parenthetical grouping inside of a
921@samp{\|} alternative that wasn't used in the match.
922@end defun
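
  The following sketch illustrates the @code{nil} case: only the second
alternative takes part in the match, so the first subexpression has no
position.  The regexp and string are made up for illustration.

@example
@group
(string-match "\\(a\\)\\|\\(b\\)" "b")
     @result{} 0
(match-beginning 1)   ; @r{The @samp{\(a\)} alternative was not used.}
     @result{} nil
(match-beginning 2)
     @result{} 0
@end group
@end example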
923
924@defun match-end count
925This function returns the position of the end of the text that matched
926the last regular expression searched for, or a subexpression of it.
927This function is otherwise similar to @code{match-beginning}.
928@end defun
929
930 Here is an example of using the match data, with a comment showing the
931positions within the text:
932
933@example
934@group
935(string-match "\\(qu\\)\\(ick\\)"
936 "The quick fox jumped quickly.")
937 ;0123456789
938 @result{} 4
939@end group
940
941@group
942(match-beginning 1) ; @r{The beginning of the match}
943 @result{} 4 ; @r{with @samp{qu} is at index 4.}
944@end group
945
946@group
947(match-beginning 2) ; @r{The beginning of the match}
948 @result{} 6 ; @r{with @samp{ick} is at index 6.}
949@end group
950
951@group
952(match-end 1) ; @r{The end of the match}
953 @result{} 6 ; @r{with @samp{qu} is at index 6.}
954
955(match-end 2) ; @r{The end of the match}
956 @result{} 9 ; @r{with @samp{ick} is at index 9.}
957@end group
958@end example
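
  A common use of these positions is to extract the matched text.  As a
sketch, the positions from the previous example can be passed to
@code{substring}:

@example
@group
(let ((str "The quick fox jumped quickly."))
  (if (string-match "\\(qu\\)\\(ick\\)" str)
      ;; @r{Whole match, then each subexpression.}
      (list (substring str (match-beginning 0) (match-end 0))
            (substring str (match-beginning 1) (match-end 1))
            (substring str (match-beginning 2) (match-end 2)))))
     @result{} ("quick" "qu" "ick")
@end group
@end example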
959
960 Here is another example. Point is initially located at the beginning
961of the line. Searching moves point to between the space and the word
962@samp{in}. The beginning of the entire match is at the 9th character of
963the buffer (@samp{T}), and the beginning of the match for the first
964subexpression is at the 13th character (@samp{c}).
965
966@example
967@group
968(list
969 (re-search-forward "The \\(cat \\)")
970 (match-beginning 0)
971 (match-beginning 1))
972 @result{} (17 9 13)
973@end group
974
975@group
976---------- Buffer: foo ----------
977I read "The cat @point{}in the hat comes back" twice.
978 ^ ^
979 9 13
980---------- Buffer: foo ----------
981@end group
982@end example
983
984@noindent
985(In this case, the index returned is a buffer position; the first
986character of the buffer counts as 1.)
987
988@node Replacing Match
989@subsection Replacing the Text That Matched
990
991 The function @code{replace-match} replaces the text matched by the
992last search.
993
994@cindex case in replacements
995@defun replace-match replacement &optional fixedcase literal
996This function replaces the buffer text matched by the last search, with
997@var{replacement}. It applies only to buffers; you can't use
998@code{replace-match} to replace a substring found with
999@code{string-match}.
1000
1001If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1002text is not changed; otherwise, the replacement text is converted to a
1003different case depending upon the capitalization of the text to be
1004replaced. If the original text is all upper case, the replacement text
1005is converted to upper case. If the first word of the original text is
1006capitalized, then the first word of the replacement text is capitalized.
1007If the original text contains just one word, and that word is a capital
1008letter, @code{replace-match} considers this a capitalized first word
1009rather than all upper case.
1010
1011If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1012exactly as it is, the only alterations being case changes as needed.
1013If it is @code{nil} (the default), then the character @samp{\} is treated
1014specially. If a @samp{\} appears in @var{replacement}, then it must be
1015part of one of the following sequences:
1016
1017@table @asis
1018@item @samp{\&}
1019@cindex @samp{&} in replacement
1020@samp{\&} stands for the entire text being replaced.
1021
1022@item @samp{\@var{n}}
1023@cindex @samp{\@var{n}} in replacement
1024@samp{\@var{n}} stands for the text that matched the @var{n}th
1025subexpression in the original regexp. Subexpressions are those
1026expressions grouped inside of @samp{\(@dots{}\)}. @var{n} is a digit.
1027
1028@item @samp{\\}
1029@cindex @samp{\} in replacement
1030@samp{\\} stands for a single @samp{\} in the replacement text.
1031@end table
1032
1033@code{replace-match} leaves point at the end of the replacement text,
1034and returns @code{t}.
1035@end defun
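
  The two sketches below illustrate the @samp{\@var{n}} sequences and
the case conversion described above.  They assume the macro
@code{with-temp-buffer}, which is not described in this chapter and may
be missing from older Emacs versions; the buffer text is made up for
illustration.

@example
@group
;; @r{Swap the two subexpressions; @var{fixedcase} is non-@code{nil},}
;; @r{so the replacement text is inserted without case conversion.}
(with-temp-buffer
  (insert "The cat in the hat")
  (goto-char (point-min))
  (re-search-forward "\\(cat\\) in the \\(hat\\)")
  (replace-match "\\2 in the \\1" t)
  (buffer-string))
     @result{} "The hat in the cat"
@end group

@group
;; @r{The matched text is all upper case and @var{fixedcase} is}
;; @r{omitted, so the replacement is converted to upper case.}
(with-temp-buffer
  (insert "FOO")
  (goto-char (point-min))
  (let ((case-fold-search t))
    (re-search-forward "foo"))
  (replace-match "bar")
  (buffer-string))
     @result{} "BAR"
@end group
@end example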
1036
1037@node Entire Match Data
1038@subsection Accessing the Entire Match Data
1039
1040 The functions @code{match-data} and @code{set-match-data} read or
1041write the entire match data, all at once.
1042
1043@defun match-data
1044This function returns a newly constructed list containing all the
1045information on what text the last search matched. Element zero is the
1046position of the beginning of the match for the whole expression; element
1047one is the position of the end of the match for the expression. The
1048next two elements are the positions of the beginning and end of the
1049match for the first subexpression, and so on. In general, element
1050@ifinfo
1051number 2@var{n}
1052@end ifinfo
1053@tex
1054number {\mathsurround=0pt $2n$}
1055@end tex
1056corresponds to @code{(match-beginning @var{n})}; and
1057element
1058@ifinfo
1059number 2@var{n} + 1
1060@end ifinfo
1061@tex
1062number {\mathsurround=0pt $2n+1$}
1063@end tex
1064corresponds to @code{(match-end @var{n})}.
1065
1066All the elements are markers or @code{nil} if matching was done on a
1067buffer, and all are integers or @code{nil} if matching was done on a
1068string with @code{string-match}. (In Emacs 18 and earlier versions,
1069markers were used even for matching on a string, except in the case
1070of the integer 0.)
1071
1072As always, there must be no possibility of intervening searches between
1073the call to a search function and the call to @code{match-data} that is
1074intended to access the match data for that search.
1075
1076@example
1077@group
1078(match-data)
1079 @result{} (#<marker at 9 in foo>
1080 #<marker at 17 in foo>
1081 #<marker at 13 in foo>
1082 #<marker at 17 in foo>)
1083@end group
1084@end example
1085@end defun
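
  For comparison, here is a sketch of the match data after matching on
a string, where the elements are integers rather than markers.  The
elements come in pairs: the whole match (4 to 9), then the first
subexpression (4 to 6), then the second (6 to 9).

@example
@group
(progn
  (string-match "\\(qu\\)\\(ick\\)"
                "The quick fox jumped quickly.")
  (match-data))
     @result{} (4 9 4 6 6 9)
@end group
@end example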
1086
1087@defun set-match-data match-list
1088This function sets the match data from the elements of @var{match-list},
1089which should be a list that was the value of a previous call to
1090@code{match-data}.
1091
1092If @var{match-list} refers to a buffer that doesn't exist, you don't get
1093an error; that sets the match data in a meaningless but harmless way.
1094
1095@findex store-match-data
1096@code{store-match-data} is an alias for @code{set-match-data}.
1097@end defun
1098
1099@node Saving Match Data
1100@subsection Saving and Restoring the Match Data
1101
1102 All asynchronous process functions (filters and sentinels) and
1103functions that use @code{recursive-edit} should save and restore the
1104match data if they do a search or if they let the user type arbitrary
1105commands. Saving the match data is useful in other cases as
1106well---whenever you want to access the match data resulting from an
1107earlier search, notwithstanding another intervening search.
1108
1109 This example shows the problem that can arise if you fail to
1110attend to this requirement:
1111
1112@example
1113@group
1114(re-search-forward "The \\(cat \\)")
1115 @result{} 48
1116(foo) ; @r{Perhaps @code{foo} does}
1117 ; @r{more searching.}
1118(match-end 0)
1119 @result{} 61 ; @r{Unexpected result---not 48!}
1120@end group
1121@end example
1122
1123 In Emacs versions 19 and later, you can save and restore the match
1124data with @code{save-match-data}:
1125
1126@defspec save-match-data body@dots{}
1127This special form executes @var{body}, saving and restoring the match
1128data around it. This is useful if you wish to do a search without
1129altering the match data that resulted from an earlier search.
1130@end defspec
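
  As a small sketch of the effect, the inner search below would
normally replace the match data of the outer one; wrapping it in
@code{save-match-data} keeps the outer result available.  The strings
are made up for illustration.

@example
@group
(let ((str "The quick fox"))
  (string-match "quick" str)      ; @r{Outer search: match starts at 4.}
  (save-match-data
    (string-match "fox" str))     ; @r{Inner search: match starts at 10.}
  (match-beginning 0))            ; @r{Still refers to the outer search.}
     @result{} 4
@end group
@end example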
1131
1132 You can use @code{set-match-data} together with @code{match-data} to
1133imitate the effect of the special form @code{save-match-data}. This is
1134useful for writing code that can run in Emacs 18. Here is how:
1135
1136@example
1137@group
1138(let ((data (match-data)))
1139 (unwind-protect
1140 @dots{} ; @r{May change the original match data.}
1141 (set-match-data data)))
1142@end group
1143@end example
1144
1145@ignore
1146 Here is a function which restores the match data provided the buffer
1147associated with it still exists.
1148
1149@smallexample
1150@group
1151(defun restore-match-data (data)
1152@c It is incorrect to split the first line of a doc string.
1153@c If there's a problem here, it should be solved in some other way.
1154 "Restore the match data DATA unless the buffer is missing."
1155 (catch 'foo
1156 (let ((d data))
1157@end group
1158 (while d
1159 (and (car d)
1160 (null (marker-buffer (car d)))
1161@group
1162 ;; @file{match-data} @r{buffer is deleted.}
1163 (throw 'foo nil))
1164 (setq d (cdr d)))
1165 (set-match-data data))))
1166@end group
1167@end smallexample
1168@end ignore
1169
1170@node Searching and Case
1171@section Searching and Case
1172@cindex searching and case
1173
1174 By default, searches in Emacs ignore the case of the text they are
1175searching through; if you specify searching for @samp{FOO}, then
1176@samp{Foo} or @samp{foo} is also considered a match. Regexps, and in
1177particular character sets, are included: thus, @samp{[aB]} would match
1178@samp{a} or @samp{A} or @samp{b} or @samp{B}.
1179
1180 If you do not want this feature, set the variable
1181@code{case-fold-search} to @code{nil}. Then all letters must match
1182exactly, including case. This is a buffer-local variable; altering
1183the variable affects only the current buffer. (@xref{Intro to
1184Buffer-Local}.) Alternatively, you may change the value of
1185@code{default-case-fold-search}, which is the default value of
1186@code{case-fold-search} for buffers that do not override it.
1187
1188 Note that the user-level incremental search feature handles case
1189distinctions differently. When given a lower case letter, it looks for
1190a match of either case, but when given an upper case letter, it looks
1191for an upper case letter only. But this has nothing to do with the
1192searching functions used in Lisp code.
1193
1194@defopt case-replace
1195This variable determines whether @code{query-replace} should preserve
1196case in replacements. If the variable is @code{nil}, then
1197@code{replace-match} should not try to convert case.
1198@end defopt
1199
1200@defopt case-fold-search
1201This buffer-local variable determines whether searches should ignore
1202case. If the variable is @code{nil} they do not ignore case; otherwise
1203they do ignore case.
1204@end defopt
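
  In Lisp code it is often simplest to bind @code{case-fold-search}
with @code{let} around a particular search, rather than to set it for
the whole buffer.  A minimal sketch:

@example
@group
(let ((case-fold-search nil))
  (string-match "foo" "FOO"))    ; @r{Case is significant.}
     @result{} nil
@end group

@group
(let ((case-fold-search t))
  (string-match "foo" "FOO"))    ; @r{Case is ignored.}
     @result{} 0
@end group
@end example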
1205
1206@defvar default-case-fold-search
1207The value of this variable is the default value for
1208@code{case-fold-search} in buffers that do not override it. This is the
1209same as @code{(default-value 'case-fold-search)}.
1210@end defvar
1211
1212@node Standard Regexps
1213@section Standard Regular Expressions Used in Editing
1214@cindex regexps used standardly in editing
1215@cindex standard regexps used in editing
1216
1217 This section describes some variables that hold regular expressions
1218used for certain purposes in editing:
1219
1220@defvar page-delimiter
1221This is the regexp describing line-beginnings that separate pages. The
1222default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"}).
1223@end defvar
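
  As a sketch of how a program might use this variable, the following
function counts the pages in the current buffer by searching for each
delimiter in turn.  The name @code{my-count-pages} is hypothetical, and
the loop assumes that @code{page-delimiter} never matches the empty
string (otherwise it would not terminate).

@example
@group
(defun my-count-pages ()
  "Return the number of pages in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((count 1))               ; @r{Text before the first delimiter.}
      (while (re-search-forward page-delimiter nil t)
        (setq count (1+ count)))   ; @r{Each delimiter starts a new page.}
      count)))
@end group
@end example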
1224
1225@defvar paragraph-separate
1226This is the regular expression for recognizing the beginning of a line
1227that separates paragraphs. (If you change this, you may have to
1228change @code{paragraph-start} also.) The default value is @code{"^[
1229\t\f]*$"}, which is a line that consists entirely of spaces, tabs, and
1230form feeds.
1231@end defvar
1232
1233@defvar paragraph-start
1234This is the regular expression for recognizing the beginning of a line
1235that starts @emph{or} separates paragraphs. The default value is
1236@code{"^[ \t\n\f]"}, which matches a line starting with a space, tab,
1237newline, or form feed.
1238@end defvar
1239
1240@defvar sentence-end
1241This is the regular expression describing the end of a sentence. (All
1242paragraph boundaries also end sentences, regardless.) The default value
1243is:
1244
1245@example
1246"[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"
1247@end example
1248
1249This means a period, question mark or exclamation mark, optionally followed by
1250closing brackets, quotes, parentheses or braces, then by the end of a line, a tab or two spaces, and then by any further spaces, tabs or newlines.
1251
1252For a detailed explanation of this regular expression, see @ref{Regexp
1253Example}.
1254@end defvar