@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2011
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2013 Free Software
+@c Foundation, Inc.
@c See the file elisp.texi for copying conditions.
-@setfilename ../../info/searching
-@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
+@node Searching and Matching
@chapter Searching and Matching
@cindex searching
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string; the
arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
-is 1.
+is 1. For more details on interactive searching, @pxref{Search,,
+Searching and Replacement, emacs, The GNU Emacs Manual}.
These search functions convert the search string to multibyte if the
buffer is multibyte; they convert the search string to unibyte if the
@var{string}. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found, the
value and side effects depend on @var{noerror} (see below).
-@c Emacs 19 feature
In the following example, point is initially at the beginning of the
line. Then @code{(search-forward "fox")} moves point after the last
@end group
@end example
-The argument @var{limit} specifies the upper bound to the search. (It
-must be a position in the current buffer.) No match extending after
+The argument @var{limit} specifies the bound to the search, and should
+be a position in the current buffer. No match extending after
that position is accepted. If @var{limit} is omitted or @code{nil}, it
defaults to the end of the accessible portion of the buffer.
error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
returns @code{nil} and does nothing. If @var{noerror} is neither
@code{nil} nor @code{t}, then @code{search-forward} moves point to the
-upper bound and returns @code{nil}. (It would be more consistent now to
-return the new position of point in that case, but some existing
-programs may depend on a value of @code{nil}.)
+upper bound and returns @code{nil}.
+@c I see no prospect of this ever changing, and frankly the current
+@c behavior seems better, so there seems no need to mention this.
+@ignore
+(It would be more consistent now to return the new position of point
+in that case, but some existing programs may depend on a value of
+@code{nil}.)
+@end ignore
The argument @var{noerror} only affects valid searches which fail to
find a match. Invalid arguments cause errors regardless of
@var{noerror}.
-If @var{repeat} is supplied (it must be a positive number), then the
-search is repeated that many times (each time starting at the end of the
-previous time's match). If these successive searches succeed, the
-function succeeds, moving point and returning its new value. Otherwise
-the search fails, with results depending on the value of
-@var{noerror}, as described above.
+If @var{repeat} is a positive number @var{n}, it serves as a repeat
+count: the search is repeated @var{n} times, each time starting at the
+end of the previous time's match. If these successive searches
+succeed, the function succeeds, moving point and returning its new
+value. Otherwise the search fails, with results depending on the
+value of @var{noerror}, as described above. If @var{repeat} is a
+negative number -@var{n}, it serves as a repeat count of @var{n} for a
+search in the opposite (backward) direction.
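+
+For example, here is a minimal sketch of a negative repeat count, run
+in a temporary buffer whose contents are made up for illustration.
+Point starts at the end of the buffer:
+
+@example
+@group
+(with-temp-buffer
+  (insert "foo bar foo bar")
+  ;; @r{Search backward for the second @samp{foo} before point.}
+  (search-forward "foo" nil t -2))
+     @result{} 1
+@end group
+@end example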
@end deffn
@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}. It is
-just like @code{search-forward} except that it searches backwards and
-leaves point at the beginning of the match.
+like @code{search-forward}, except that it searches backwards rather
+than forwards. Backward searches leave point at the beginning of the
+match.
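+
+Here is a small illustration with hypothetical buffer text. Point is
+left at the start of the last @samp{foo}:
+
+@example
+@group
+(with-temp-buffer
+  (insert "foo bar foo")
+  (search-backward "foo"))
+     @result{} 9
+@end group
+@end example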
@end deffn
@deffn Command word-search-forward string &optional limit noerror repeat
@group
(word-search-forward "Please find the ball, boy.")
- @result{} 35
+ @result{} 36
---------- Buffer: foo ----------
He said "Please! Find
If @var{repeat} is non-@code{nil}, then the search is repeated that many
times. Point is positioned at the end of the last match.
+
+@findex word-search-regexp
+Internally, @code{word-search-forward} and related functions use the
+function @code{word-search-regexp} to convert @var{string} to a
+regular expression that ignores punctuation.
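+
+As a rough sketch, assuming the default (non-lax) conversion, the
+following form should find the same match as
+@code{(word-search-forward "ball boy" nil t)}:
+
+@example
+(re-search-forward (word-search-regexp "ball boy") nil t)
+@end example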
@end deffn
@deffn Command word-search-forward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-forward}, except that
-the end of @code{string} need not match a word boundary unless it ends
+the end of @var{string} need not match a word boundary, unless @var{string} ends
in whitespace. For instance, searching for @samp{ball boy} matches
@samp{ball boyee}, but does not match @samp{aball boy}.
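+
+As a sketch with made-up buffer text, the lax search stops just after
+@samp{boy} inside @samp{boyee}:
+
+@example
+@group
+(with-temp-buffer
+  (insert "the ball boyee")
+  (goto-char (point-min))
+  (word-search-forward-lax "ball boy" nil t))
+     @result{} 13
+@end group
+@end example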
@end deffn
@deffn Command word-search-backward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-backward}, except that
-the end of @code{string} need not match a word boundary unless it ends
+the end of @var{string} need not match a word boundary, unless @var{string} ends
in whitespace.
@end deffn
@code{case-fold-search} to @code{nil}. Then all letters must match
exactly, including case. This is a buffer-local variable; altering the
variable affects only the current buffer. (@xref{Intro to
-Buffer-Local}.) Alternatively, you may change the default value of
-@code{case-fold-search}.
+Buffer-Local}.) Alternatively, you may change the default value.
+In Lisp code, you will more typically use @code{let} to bind
+@code{case-fold-search} to the desired value.
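+
+For example, this sketch (with hypothetical buffer text) performs a
+case-sensitive search regardless of the user's setting:
+
+@example
+@group
+(with-temp-buffer
+  (insert "Hello World")
+  (goto-char (point-min))
+  (let ((case-fold-search nil))
+    (search-forward "world" nil t)))
+     @result{} nil
+@end group
+@end example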
Note that the user-level incremental search feature handles case
distinctions differently. When the search string contains only lower
case letters, the search ignores case, but when the search string
contains one or more upper case letters, the search becomes
case-sensitive. But this has nothing to do with the searching
-functions used in Lisp code.
+functions used in Lisp code. @xref{Incremental Search,,, emacs,
+The GNU Emacs Manual}.
@defopt case-fold-search
This buffer-local variable determines whether searches should ignore
case. If the variable is @code{nil} they do not ignore case; otherwise
-they do ignore case.
+(and by default) they do ignore case.
@end defopt
@defopt case-replace
-This variable determines whether the higher level replacement
+This variable determines whether the higher-level replacement
functions should preserve case. If the variable is @code{nil}, that
means to use the replacement text verbatim. A non-@code{nil} value
means to convert the case of the replacement text according to the
@findex re-builder
@cindex regular expressions, developing
- For convenient interactive development of regular expressions, you
+ For interactive development of regular expressions, you
can use the @kbd{M-x re-builder} command. It provides a convenient
interface for creating regular expressions, by giving immediate visual
feedback in a separate buffer. As you edit the regexp, all its
expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
+@cindex backtracking and regular expressions
The matcher processes a @samp{*} construct by matching, immediately, as
many repetitions as can be found. Then it continues with the rest of
the pattern. If that fails, backtracking occurs, discarding some of the
@samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
or @samp{$}, @samp{%} or period.
-Note that the usual regexp special characters are not special inside a
+If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
+matches upper-case letters. Note that a range like @samp{[a-z]} is
+not affected by the locale's collation sequence; it always represents
+a sequence in @acronym{ASCII} order.
+@c This wasn't obvious to me, since, e.g., the grep manual "Character
+@c Classes and Bracket Expressions" specifically notes the opposite
+@c behavior. But by experiment Emacs seems unaffected by LC_COLLATE
+@c in this regard.
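+
+For instance, the same range behaves differently depending on
+@code{case-fold-search}:
+
+@example
+@group
+(let ((case-fold-search t))
+  (string-match "[a-z]+" "ABC"))
+     @result{} 0
+(let ((case-fold-search nil))
+  (string-match "[a-z]+" "ABC"))
+     @result{} nil
+@end group
+@end example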
+
+Note also that the usual regexp special characters are not special inside a
character alternative. A completely different set of characters is
special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
To include a @samp{-}, write @samp{-} as the first or last character of
the character alternative, or put it after a range. Thus, @samp{[]-]}
-matches both @samp{]} and @samp{-}.
+matches both @samp{]} and @samp{-}. (As explained below, you cannot
+use @samp{\]} to include a @samp{]} inside a character alternative,
+since @samp{\} is not special there.)
To include @samp{^} in a character alternative, put it anywhere but at
the beginning.
+@c What if it starts with a multibyte and ends with a unibyte?
+@c That doesn't seem to match anything...?
If a range starts with a unibyte character @var{c} and ends with a
multibyte character @var{c2}, the range is divided into two parts: one
-is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
-@var{c1} is the first character of the charset to which @var{c2}
-belongs.
+spans the unibyte characters @samp{@var{c}..?\377}, the other the
+multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
+first character of the charset to which @var{c2} belongs.
A character alternative can also specify named character classes
-(@pxref{Char Classes}). This is a POSIX feature whose syntax is
-@samp{[:@var{class}:]}. Using a character class is equivalent to
-mentioning each of the characters in that class; but the latter is not
-feasible in practice, since some classes include thousands of
-different characters.
+(@pxref{Char Classes}). This is a POSIX feature. For example,
+@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
+Using a character class is equivalent to mentioning each of the
+characters in that class; but the latter is not feasible in practice,
+since some classes include thousands of different characters.
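+
+Here is a short example using the @samp{[:digit:]} class. The string
+is made up for illustration:
+
+@example
+@group
+(let ((s "abc123def"))
+  (when (string-match "[[:digit:]]+" s)
+    (match-string 0 s)))
+     @result{} "123"
+@end group
+@end example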
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@node Regexp Backslash
@subsubsection Backslash Constructs in Regular Expressions
+@cindex backslash in regular expressions
For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
their number implicitly, based on their position, which can be
inconvenient. This construct allows you to force a particular group
number. There is no particular restriction on the numbering,
-e.g.@: you can have several groups with the same number in which case
-the last one to match (i.e.@: the rightmost match) will win.
+e.g., you can have several groups with the same number, in which case
+the last one to match (i.e., the rightmost match) will win.
Implicitly numbered groups always get the smallest integer larger than
the one of any previous group.
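+
+For example, in the following sketch the explicit group is number 2,
+so the implicitly numbered group after it becomes number 3:
+
+@example
+@group
+(let ((s "abc-42"))
+  (when (string-match "\\(?2:[a-z]+\\)-\\([0-9]+\\)" s)
+    (list (match-string 2 s) (match-string 3 s))))
+     @result{} ("abc" "42")
+@end group
+@end example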
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.
+@cindex category, regexp search for
@item \c@var{c}
matches any character whose category is @var{c}. Here @var{c} is a
character that represents a category: thus, @samp{c} for Chinese
characters or @samp{g} for Greek characters in the standard category
-table.
+table. You can see the list of all the currently defined categories
+with @kbd{M-x describe-categories @key{RET}}. You can also define
+your own categories in addition to the standard ones using the
+@code{define-category} function (@pxref{Categories}).
@item \C@var{c}
matches any character whose category is not @var{c}.
@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
-that ends inside a character alternative without terminating @samp{]}
+that ends inside a character alternative without a terminating @samp{]}
is invalid, and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
@node Regexp Example
-@comment node-name, next, previous, up
@subsection Complex Regexp Example
Here is a complicated regexp which was formerly used by Emacs to
regexp constructed by the function @code{sentence-end}.
@xref{Standard Regexps}.)
- First, we show the regexp as a string in Lisp syntax to distinguish
-spaces from tab characters. The string constant begins and ends with a
+ Below, we first show the regexp as a string in Lisp syntax (to
+distinguish spaces from tab characters), and then the result of
+evaluating it. The string constant begins and ends with a
double-quote. @samp{\"} stands for a double-quote as part of the
string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
tab and @samp{\n} for a newline.
-@example
-"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
-@end example
-
-@noindent
-In contrast, if you evaluate this string, you will see the following:
-
@example
@group
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
@end example
@noindent
-In this output, tab and newline appear as themselves.
+In the output, tab and newline appear as themselves.
This regular expression contains four parts in succession and can be
deciphered as follows:
@var{string}. Using this regular expression in @code{looking-at} will
succeed only if the next characters in the buffer are @var{string};
using it in a search function will succeed if the text being searched
-contains @var{string}.
+contains @var{string}. @xref{Regexp Search}.
This allows you to request an exact string match or search when calling
a function that wants a regular expression.
This function returns an efficient regular expression that will match
any of the strings in the list @var{strings}. This is useful when you
need to make matching or searching as fast as possible---for example,
-for Font Lock mode.
+for Font Lock mode@footnote{Note that @code{regexp-opt} does not
+guarantee that its result is absolutely the most efficient form
+possible. A hand-tuned regular expression can sometimes be slightly
+more efficient, but is almost never worth the effort.}.
+@c E.g., see http://debbugs.gnu.org/2816
If the optional argument @var{paren} is non-@code{nil}, then the
returned regular expression is always enclosed by at least one
(but not as efficient):
@example
-(defun regexp-opt (strings paren)
+(defun regexp-opt (strings &optional paren)
(let ((open-paren (if paren "\\(" ""))
(close-paren (if paren "\\)" "")))
(concat open-paren
shy groups (@pxref{Regexp Backslash}).
@end defun
+@c Supposedly an internal regexp-opt function, but table.el uses it at least.
+@defun regexp-opt-charset chars
+This function returns a regular expression matching a character in the
+list of characters @var{chars}.
+
+@example
+(regexp-opt-charset '(?a ?b ?c ?d ?e))
+ @result{} "[a-e]"
+@end example
+@end defun
+
+@c Internal functions: regexp-opt-group
+
@node Regexp Search
@section Regular Expression Searching
@cindex regular expression searching
succeed only starting with the first character following point. The
result is @code{t} if so, @code{nil} otherwise.
-This function does not move point, but it updates the match data, which
-you can access using @code{match-beginning} and @code{match-end}.
+This function does not move point, but it does update the match data.
@xref{Match Data}. If you need to test for a match without modifying
the match data, use @code{looking-at-p}, described below.
@end defun
@defun looking-back regexp &optional limit greedy
-This function returns @code{t} if @var{regexp} matches text before
-point, ending at point, and @code{nil} otherwise.
+This function returns @code{t} if @var{regexp} matches the text
+immediately before point (i.e., ending at point), and @code{nil} otherwise.
Because regular expression matching works only going forward, this is
implemented by searching backwards from point for a match that ends at
@result{} nil
@end group
@end example
+
+@c http://debbugs.gnu.org/5689
+As a general recommendation, try to avoid using @code{looking-back}
+wherever possible, since it is slow. For this reason, there are no
+plans to add a @code{looking-back-p} function.
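+
+If you do use it, passing a @var{limit} bounds the backward search.
+For example, with made-up buffer text:
+
+@example
+@group
+(with-temp-buffer
+  (insert "foo bar")
+  (looking-back "bar" (line-beginning-position)))
+     @result{} t
+@end group
+@end example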
@end defun
@defun looking-at-p regexp
@node POSIX Regexps
@section POSIX Regular Expression Searching
+@cindex backtracking and POSIX regular expressions
The usual regular expression functions do backtracking when necessary
to handle the @samp{\|} and repetition constructs, but they continue
this only until they find @emph{some} match. Then they succeed and
full backtracking specified by the POSIX standard for regular expression
matching. They continue backtracking until they have tried all
possibilities and found all matches, so they can report the longest
-match, as required by POSIX. This is much slower, so use these
+match, as required by POSIX@. This is much slower, so use these
functions only when you really need the longest match.
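+
+The difference shows up with an alternation whose first branch is a
+prefix of the second:
+
+@example
+@group
+(progn (string-match "a\\|ab" "ab") (match-end 0))
+     @result{} 1
+(progn (posix-string-match "a\\|ab" "ab") (match-end 0))
+     @result{} 2
+@end group
+@end example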
The POSIX search and match functions do not properly support the
can't avoid another intervening search, you must save and restore the
match data around it, to prevent it from being overwritten.
+ Notice that all functions are allowed to overwrite the match data
+unless they're explicitly documented not to do so. Consequently,
+functions that run implicitly in the background (@pxref{Timers}, and
+@ref{Idle Timers}) should usually save and restore the match data
+explicitly.
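+
+Here is a minimal sketch of such a background function. The name
+@code{my-idle-function} is hypothetical:
+
+@example
+@group
+(defun my-idle-function ()
+  ;; @r{Save and restore the match data around this search.}
+  (save-match-data
+    (string-match "[0-9]+" (buffer-name))))
+(run-with-idle-timer 2 nil #'my-idle-function)
+@end group
+@end example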
+
@menu
* Replacing Match:: Replacing a substring that was matched.
* Simple Match Data:: Accessing single items of match data,
@cindex case in replacements
@defun replace-match replacement &optional fixedcase literal string subexp
-This function replaces the text in the buffer (or in @var{string}) that
-was matched by the last search. It replaces that text with
-@var{replacement}.
+This function performs a replacement operation on a buffer or string.
-If you did the last search in a buffer, you should specify @code{nil}
-for @var{string} and make sure that the current buffer when you call
-@code{replace-match} is the one in which you did the searching or
-matching. Then @code{replace-match} does the replacement by editing
-the buffer; it leaves point at the end of the replacement text, and
-returns @code{t}.
+If you did the last search in a buffer, you should omit the
+@var{string} argument or specify @code{nil} for it, and make sure that
+the current buffer is the one in which you performed the last search.
+Then this function edits the buffer, replacing the matched text with
+@var{replacement}. It leaves point at the end of the replacement
+text, and returns @code{t}.
-If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
+If you performed the last search on a string, pass the same string as
+@var{string}. Then this function returns a new string, in which the
+matched text is replaced by @var{replacement}.
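+
+For example, to replace the first hyphen in a (made-up) string:
+
+@example
+@group
+(let ((s "foo-bar-baz"))
+  (when (string-match "-" s)
+    (replace-match "_" t t s)))
+     @result{} "foo_bar-baz"
+@end group
+@end example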
If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
the replacement text without case conversion; otherwise, it converts
@table @asis
@item @samp{\&}
@cindex @samp{&} in replacement
-@samp{\&} stands for the entire text being replaced.
+This stands for the entire text being replaced.
-@item @samp{\@var{n}}
+@item @samp{\@var{n}}, where @var{n} is a digit
@cindex @samp{\@var{n}} in replacement
-@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
-matched the @var{n}th subexpression in the original regexp.
-Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
-If the @var{n}th subexpression never matched, an empty string is substituted.
+This stands for the text that matched the @var{n}th subexpression in
+the original regexp. Subexpressions are those expressions grouped
+inside @samp{\(@dots{}\)}. If the @var{n}th subexpression never
+matched, an empty string is substituted.
@item @samp{\\}
@cindex @samp{\} in replacement
-@samp{\\} stands for a single @samp{\} in the replacement text.
+This stands for a single @samp{\} in the replacement text.
+
+@item @samp{\?}
+This stands for itself (for compatibility with @code{replace-regexp}
+and related commands; @pxref{Regexp Replacement,,, emacs, The GNU
+Emacs Manual}).
@end table
-These substitutions occur after case conversion, if any,
-so the strings they substitute are never case-converted.
+@noindent
+Any other character following @samp{\} signals an error.
+
+The substitutions performed by @samp{\&} and @samp{\@var{n}} occur
+after case conversion, if any. Therefore, the strings they substitute
+are never case-converted.
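+
+For instance, this sketch swaps two words using @samp{\1} and
+@samp{\2}. Each backslash must be doubled inside a Lisp string:
+
+@example
+@group
+(let ((s "John Smith"))
+  (when (string-match "\\([A-Za-z]+\\) \\([A-Za-z]+\\)" s)
+    (replace-match "\\2, \\1" t nil s)))
+     @result{} "Smith, John"
+@end group
+@end example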
If @var{subexp} is non-@code{nil}, that says to replace just
subexpression number @var{subexp} of the regexp that was matched, not
query the match data immediately after searching, before calling any
other function that might perform another search. Alternatively, you
may save and restore the match data (@pxref{Saving Match Data}) around
-the call to functions that could perform another search.
+the call to functions that could perform another search. Or use
+functions that are documented not to modify the match data, such as
+@code{string-match-p}.
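+
+For example, the match data from the first search below survives the
+call to @code{string-match-p}:
+
+@example
+@group
+(progn
+  (string-match "b+" "abbbc")
+  ;; @r{This call does not modify the match data.}
+  (string-match-p "c" "abbbc")
+  (match-end 0))
+     @result{} 4
+@end group
+@end example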
+@c This is an old comment and presumably there is no prospect of this
+@c changing now. But still the advice stands.
A search which fails may or may not alter the match data. In the
-past, a failing search did not do this, but we may change it in the
-future. So don't try to rely on the value of the match data after
-a failing search.
+current implementation, it does not, but we may change it in the
+future. Don't try to rely on the value of the match data after a
+failing search.
@defun match-string count &optional in-string
This function returns, as a string, the text matched in the last search
you should omit @var{in-string} or pass @code{nil} for it; but you
should make sure that the current buffer when you call
@code{match-string} is the one in which you did the searching or
-matching.
+matching. Failure to follow this advice will lead to incorrect results.
The value is @code{nil} if @var{count} is out of range, or for a
subexpression inside a @samp{\|} alternative that wasn't used or a
@end defun
@defun match-beginning count
-This function returns the position of the start of text matched by the
+This function returns the position of the start of the text matched by the
last regular expression searched for, or a subexpression of it.
If @var{count} is zero, then the value is the position of the start of
@defun match-data &optional integers reuse reseat
This function returns a list of positions (markers or integers) that
-record all the information on what text the last search matched.
+record all the information on the text that the last search matched.
Element zero is the position of the beginning of the match for the
whole expression; element one is the position of the end of the match
for the expression. The next two elements are the positions of the
If @var{reseat} is non-@code{nil}, all markers on the @var{match-list} list
are reseated to point to nowhere.
+@c TODO Make it properly obsolete.
@findex store-match-data
@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
@end defun
@node Saving Match Data
@subsection Saving and Restoring the Match Data
- When you call a function that may do a search, you may need to save
+ When you call a function that may search, you may need to save
and restore the match data around that call, if you want to preserve the
match data from an earlier search for later use. Here is an example
that shows the problem that arises if you fail to save the match data:
@group
(re-search-forward "The \\(cat \\)")
@result{} 48
-(foo) ; @r{Perhaps @code{foo} does}
- ; @r{more searching.}
+(foo) ; @r{@code{foo} does more searching.}
(match-end 0)
@result{} 61 ; @r{Unexpected result---not 48!}
@end group
@code{replace-regexp-in-string} calls @var{rep} for each match,
passing the text of the match as its sole argument. It collects the
value @var{rep} returns and passes that to @code{replace-match} as the
-replacement string. The match-data at this point are the result
+replacement string. The match data at this point are the result
of matching @var{regexp} against a substring of @var{string}.
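+
+For example, this sketch doubles every number in a (made-up) string
+by passing a function as @var{rep}:
+
+@example
+@group
+(replace-regexp-in-string
+ "[0-9]+"
+ (lambda (match)
+   (number-to-string (* 2 (string-to-number match))))
+ "a1 b22 c333")
+     @result{} "a2 b44 c666"
+@end group
+@end example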
@end defun
If @var{from-string} contains upper-case letters, then
@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
-it uses the @code{replacements} without altering the case of them.
+it uses the @var{replacements} without altering their case.
Normally, the keymap @code{query-replace-map} defines the possible
user responses for queries. The argument @var{map}, if
Prefix keys are not supported; each key binding must be for a
single-event key sequence. This is because the functions don't use
@code{read-key-sequence} to get the input; instead, they read a single
-event and look it up ``by hand.''
+event and look it up ``by hand''.
@end itemize
@end defvar
@table @code
@item act
-Do take the action being considered---in other words, ``yes.''
+Do take the action being considered---in other words, ``yes''.
@item skip
-Do not take action for this question---in other words, ``no.''
+Do not take action for this question---in other words, ``no''.
@item exit
-Answer this question ``no,'' and give up on the entire series of
-questions, assuming that the answers will be ``no.''
+Answer this question ``no'', and give up on the entire series of
+questions, assuming that the answers will be ``no''.
+
+@item exit-prefix
+Like @code{exit}, but add the key that was pressed to
+@code{unread-command-events}.
@item act-and-exit
-Answer this question ``yes,'' and give up on the entire series of
-questions, assuming that subsequent answers will be ``no.''
+Answer this question ``yes'', and give up on the entire series of
+questions, assuming that subsequent answers will be ``no''.
@item act-and-show
-Answer this question ``yes,'' but show the results---don't advance yet
+Answer this question ``yes'', but show the results---don't advance yet
to the next question.
@item automatic
Answer this question and all subsequent questions in the series with
-``yes,'' without further user interaction.
+``yes'', without further user interaction.
@item backup
Move back to the previous place that a question was asked about.
Enter a recursive edit to deal with this question---instead of any
other action that would normally be taken.
+@item edit-replacement
+Edit the replacement for this question in the minibuffer.
+
@item delete-and-edit
Delete the text being considered, then enter a recursive edit to replace
it.
@item recenter
-Redisplay and center the window, then ask the same question again.
+@itemx scroll-up
+@itemx scroll-down
+@itemx scroll-other-window
+@itemx scroll-other-window-down
+Perform the specified window scroll operation, then ask the same
+question again. Only @code{y-or-n-p} and related functions use this
+answer.
@item quit
Perform a quit right away. Only @code{y-or-n-p} and related functions
@defvar multi-query-replace-map
This variable holds a keymap that extends @code{query-replace-map} by
providing additional keybindings that are useful in multi-buffer
-replacements.
+replacements. The additional ``bindings'' are:
+
+@table @code
+@item automatic-all
+Answer this question and all subsequent questions in the series with
+``yes'', without further user interaction, for all remaining buffers.
+
+@item exit-current
+Answer this question ``no'', and give up on the entire series of
+questions for the current buffer. Continue to the next buffer in the
+sequence.
+@end table
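+
+For instance, these additional bindings can be looked up directly:
+
+@example
+@group
+(lookup-key multi-query-replace-map "Y")
+     @result{} automatic-all
+(lookup-key multi-query-replace-map "N")
+     @result{} exit-current
+@end group
+@end example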
@end defvar
@defvar replace-search-function
the end of a sentence, including the whitespace following the
sentence. (All paragraph boundaries also end sentences, regardless.)
-If the value is @code{nil}, the default, then the function
-@code{sentence-end} has to construct the regexp. That is why you
+If the value is @code{nil}, as it is by default, then the function
+@code{sentence-end} constructs the regexp. That is why you
should always call the function @code{sentence-end} to obtain the
regexp to be used to recognize the end of a sentence.
@end defopt
if non-@code{nil}. Otherwise it returns a default value based on the
values of the variables @code{sentence-end-double-space}
(@pxref{Definition of sentence-end-double-space}),
-@code{sentence-end-without-period} and
+@code{sentence-end-without-period}, and
@code{sentence-end-without-space}.
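+
+A typical use, sketched here with made-up buffer text, is to pass the
+result directly to a regexp search function:
+
+@example
+@group
+(with-temp-buffer
+  (insert "One.  Two.")
+  (goto-char (point-min))
+  (re-search-forward (sentence-end) nil t))
+@end group
+@end example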
@end defun