@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2004
@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/searching
@end group
@end example
-If @var{limit} is non-@code{nil} (it must be a position in the current
-buffer), then it is the upper bound to the search. The match found must
-not extend after that position.
+If @var{limit} is non-@code{nil}, it must be a position in the current
+buffer; it specifies the upper bound to the search. The match found
+must not extend after that position.
If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
an error if the search fails. If @var{noerror} is @code{t}, then it
in both multibyte and unibyte representations, because only the
@acronym{ASCII} characters are excluded.
-Starting in Emacs 21, a character alternative can also specify named
+A character alternative can also specify named
character classes (@pxref{Char Classes}). This is a POSIX feature whose
syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
to mentioning each of the characters in that class; but the latter is
@cindex character classes in regexp
Here is a table of the classes you can use in a character alternative,
-in Emacs 21, and what they mean:
+and what they mean:
@table @samp
@item [:ascii:]
Here is a complicated regexp which was formerly used by Emacs to
recognize the end of a sentence together with any whitespace that
-follows. It was used as the variable @code{sentence-end}. (Its value
-nowadays contains alternatives for @samp{.}, @samp{?} and @samp{!} in
-other character sets.)
+follows. (Nowadays Emacs uses a similar but more complex default
+regexp constructed by the function @code{sentence-end}.
+@xref{Standard Regexps}.)
First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters. The string constant begins and ends with a
The first part of the pattern is a character alternative that matches
any one of three characters: period, question mark, and exclamation
mark. The match must begin with one of these three characters. (This
-is the one point where the new value of @code{sentence-end} differs
-from the old. The new value also lists sentence ending
-non-@acronym{ASCII} characters.)
+is one point where the new default regexp used by Emacs differs from
+the old. The new value also allows some non-@acronym{ASCII}
+characters that end a sentence without any following whitespace.)
@item []\"')@}]*
The second part of the pattern matches any closing braces and quotation
@var{regexp}, and leaves point at the end of the first match found.
It returns the new value of point.
-If @var{limit} is non-@code{nil} (it must be a position in the current
-buffer), then it is the upper bound to the search. No match extending
-after that position is accepted.
+If @var{limit} is non-@code{nil}, it must be a position in the current
+buffer. It specifies the upper bound to the search. No match
+extending after that position is accepted.
-If @var{repeat} is supplied (it must be a positive number), then the
-search is repeated that many times (each time starting at the end of the
-previous time's match). If all these successive searches succeed, the
-function succeeds, moving point and returning its new value. Otherwise
-the function fails.
+If @var{repeat} is supplied, it must be a positive number; the search
+is repeated that many times, each repetition starting at the end of
+the previous match. If all these successive searches succeed, the
+function succeeds, moving point and returning its new value.
+Otherwise the function fails.
What happens when the function fails depends on the value of
@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
@end example
@end defun
+@defun looking-back regexp &optional limit
+This function returns @code{t} if @var{regexp} matches text before
+point, ending at point, and @code{nil} otherwise.
+
+Because regular expression matching works only going forward, this is
+implemented by searching backwards from point for a match that ends at
+point. That can be quite slow if it has to search a long distance.
+You can bound the time required by specifying @var{limit}, which says
+not to search before @var{limit}. In this case, the match that is
+found must begin at or after @var{limit}.
+
+@example
+@group
+---------- Buffer: foo ----------
+I read "@point{}The cat in the hat
+comes back" twice.
+---------- Buffer: foo ----------
+
+(looking-back "read \"" 3)
+ @result{} t
+(looking-back "read \"" 4)
+ @result{} nil
+@end group
+@end example
+@end defun
+
+@defvar search-spaces-regexp
+If this variable is non-@code{nil}, it should be a regular expression
+that says how to search for whitespace. In that case, each run of
+spaces in a regular expression being searched for is treated as if it
+were this regular expression. However, spaces inside a character
+alternative such as @samp{[@dots{}]}, and spaces immediately followed
+by a postfix operator such as @samp{*}, @samp{+} or @samp{?}, are not
+affected by @code{search-spaces-regexp}; they are treated as ordinary
+characters.
+
+Since this variable affects all regular expression search and match
+constructs, you should bind it temporarily, around as small a part
+of the code as possible.
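+
+For example, the following sketch binds the variable around a single
+search only. The pattern @samp{foo bar} is purely illustrative. The
+space in it then also matches a tab, a newline, or a longer run of
+whitespace:
+
+@example
+(let ((search-spaces-regexp "[ \t\n]+"))
+  (re-search-forward "foo bar" nil t))
+@end example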
+@end defvar
+
@node POSIX Regexps
@section POSIX Regular Expression Searching
write the entire match data, all at once.
@defun match-data &optional integers reuse
-This function returns a newly constructed list containing all the
-information on what text the last search matched. Element zero is the
-position of the beginning of the match for the whole expression; element
-one is the position of the end of the match for the expression. The
-next two elements are the positions of the beginning and end of the
-match for the first subexpression, and so on. In general, element
+This function returns a list of positions (markers or integers) that
+record all the information on what text the last search matched.
+Element zero is the position of the beginning of the match for the
+whole expression; element one is the position of the end of the match
+for the expression. The next two elements are the positions of the
+beginning and end of the match for the first subexpression, and so on.
+In general, element
@ifnottex
number 2@var{n}
@end ifnottex
@end tex
corresponds to @code{(match-end @var{n})}.
-All the elements are markers or @code{nil} if matching was done on a
-buffer and all are integers or @code{nil} if matching was done on a
-string with @code{string-match}. If @var{integers} is
-non-@code{nil}, then the elements are integers or @code{nil}, even if
-matching was done on a buffer. In that case, the buffer itself is
-appended as an additional element at the end of the list
-to facilitate complete restoration of the match data. Also,
-@code{match-beginning} and
-@code{match-end} always return integers or @code{nil}.
+Normally all the elements are markers or @code{nil}, but if
+@var{integers} is non-@code{nil}, they are integers or @code{nil}
+instead. (In that case, the buffer itself is appended as an
+additional element at the end of the list, to facilitate complete
+restoration of the match data.) If the last match was done on a
+string with @code{string-match}, then integers are always used,
+since markers can't point into a string.
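+
+For instance, after a successful @code{string-match}, the match data
+consists of integer indices into the string. The pattern and string
+in this minimal illustration are arbitrary:
+
+@example
+@group
+(progn
+  (string-match "b\\(c\\)" "abcd")
+  (match-data))
+     @result{} (1 3 2 3)
+@end group
+@end example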
If @var{reuse} is non-@code{nil}, it should be a list. In that case,
@code{match-data} stores the match data in @var{reuse}. That is,
have the right length. If it is not long enough to contain the match
data, it is extended. If it is too long, the length of @var{reuse}
stays the same, but the elements that were not used are set to
-@code{nil}. The purpose of this feature is to avoid producing too
-much garbage, that would later have to be collected.
+@code{nil}. The purpose of this feature is to reduce the need for
+garbage collection.
As always, there must be no possibility of intervening searches between
the call to a search function and the call to @code{match-data} that is
@end defvar
@defvar sentence-end
-This is the regular expression describing the end of a sentence. (All
-paragraph boundaries also end sentences, regardless.) The (slightly
-simplified) default value is:
-
-@example
-"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
-@end example
-
-This means a period, question mark or exclamation mark (the actual
-default value also lists their alternatives in other character sets),
-followed optionally by closing parenthetical characters, followed by
-tabs, spaces or new lines.
-
-For a detailed explanation of this regular expression, see @ref{Regexp
-Example}.
+If non-@code{nil}, the value should be a regular expression describing
+the end of a sentence, including the whitespace following the
+sentence. (All paragraph boundaries also end sentences, regardless.)
+
+If the value is @code{nil}, the default, then the function
+@code{sentence-end} has to construct the regexp. That is why you
+should always call the function @code{sentence-end} to obtain the
+regexp to be used to recognize the end of a sentence.
@end defvar
+@defun sentence-end
+This function returns the value of the variable @code{sentence-end},
+if non-@code{nil}. Otherwise it returns a default value based on the
+values of the variables @code{sentence-end-double-space}
+(@pxref{Definition of sentence-end-double-space}),
+@code{sentence-end-without-period} and
+@code{sentence-end-without-space}.
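+
+For example, code that looks for the end of the next sentence can
+call the function instead of reading the variable directly. The
+following is a minimal sketch. If a match is found, it moves point
+past the end of the next sentence, including any whitespace that
+follows it. If no match is found, it returns @code{nil}:
+
+@example
+(re-search-forward (sentence-end) nil t)
+@end example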
+@end defun
+
@ignore
arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
@end ignore