(Entire Match Data): Clarify when match-data

diff --git a/lispref/searching.texi b/lispref/searching.texi

index fd0d0e1..01d055c 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -1,6 +1,6 @@
  @c -*-texinfo-*-
  @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2004
  @c   Free Software Foundation, Inc.
  @c See the file elisp.texi for copying conditions.
  @setfilename ../info/searching
@@ -136,9 +136,9 @@ the ball boy@point{}!"
  @end group
  @end example
  
-If @var{limit} is non-@code{nil} (it must be a position in the current
-buffer), then it is the upper bound to the search.  The match found must
-not extend after that position.
+If @var{limit} is non-@code{nil}, it must be a position in the current
+buffer; it specifies the upper bound to the search.  The match found
+must not extend after that position.
  
  If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
  an error if the search fails.  If @var{noerror} is @code{t}, then it
@@ -338,7 +338,7 @@ does match all non-@acronym{ASCII} characters (see below regarding @samp{^}),
  in both multibyte and unibyte representations, because only the
  @acronym{ASCII} characters are excluded.
  
-Starting in Emacs 21, a character alternative can also specify named
+A character alternative can also specify named
  character classes (@pxref{Char Classes}).  This is a POSIX feature whose
  syntax is @samp{[:@var{class}:]}.  Using a character class is equivalent
  to mentioning each of the characters in that class; but the latter is
@@ -416,7 +416,7 @@ special character anyway, regardless of where it appears.@refill
  @cindex character classes in regexp
  
    Here is a table of the classes you can use in a character alternative,
-in Emacs 21, and what they mean:
+and what they mean:
  
  @table @samp
  @item [:ascii:]
@@ -694,9 +694,9 @@ an @code{invalid-regexp} error is signaled.
  
    Here is a complicated regexp which was formerly used by Emacs to
  recognize the end of a sentence together with any whitespace that
-follows.  It was used as the variable @code{sentence-end}.  (Its value
-nowadays contains alternatives for @samp{.}, @samp{?} and @samp{!} in
-other character sets.)
+follows.  (Nowadays Emacs uses a similar but more complex default
+regexp constructed by the function @code{sentence-end}.
+@xref{Standard Regexps}.)
  
    First, we show the regexp as a string in Lisp syntax to distinguish
  spaces from tab characters.  The string constant begins and ends with a
@@ -730,9 +730,9 @@ deciphered as follows:
  The first part of the pattern is a character alternative that matches
  any one of three characters: period, question mark, and exclamation
  mark.  The match must begin with one of these three characters.  (This
-is the one point where the new value of @code{sentence-end} differs
-from the old.  The new value also lists sentence ending
-non-@acronym{ASCII} characters.)
+is one point where the new default regexp used by Emacs differs from
+the old.  The new value also allows some non-@acronym{ASCII}
+characters that end a sentence without any following whitespace.)
  
  @item []\"')@}]*
  The second part of the pattern matches any closing braces and quotation
@@ -844,15 +844,15 @@ function skips over any amount of text that is not matched by
  @var{regexp}, and leaves point at the end of the first match found.
  It returns the new value of point.
  
-If @var{limit} is non-@code{nil} (it must be a position in the current
-buffer), then it is the upper bound to the search.  No match extending
-after that position is accepted.
+If @var{limit} is non-@code{nil}, it must be a position in the current
+buffer.  It specifies the upper bound to the search.  No match
+extending after that position is accepted.
  
-If @var{repeat} is supplied (it must be a positive number), then the
-search is repeated that many times (each time starting at the end of the
-previous time's match).  If all these successive searches succeed, the
-function succeeds, moving point and returning its new value.  Otherwise
-the function fails.
+If @var{repeat} is supplied, it must be a positive number; the search
+is repeated that many times; each repetition starts at the end of the
+previous match.  If all these successive searches succeed, the
+function succeeds, moving point and returning its new value.
+Otherwise the function fails.
  
  What happens when the function fails depends on the value of
  @var{noerror}.  If @var{noerror} is @code{nil}, a @code{search-failed}
@@ -975,6 +975,45 @@ comes back" twice.
  @end example
  @end defun
  
+@defun looking-back regexp &optional limit
+This function returns @code{t} if @var{regexp} matches text before
+point, ending at point, and @code{nil} otherwise.
+
+Because regular expression matching works only going forward, this is
+implemented by searching backwards from point for a match that ends at
+point.  That can be quite slow if it has to search a long distance.
+You can bound the time required by specifying @var{limit}, which says
+not to search before @var{limit}.  In this case, the match that is
+found must begin at or after @var{limit}.
+
+@example
+@group
+---------- Buffer: foo ----------
+I read "@point{}The cat in the hat
+comes back" twice.
+---------- Buffer: foo ----------
+
+(looking-back "read \"" 3)
+     @result{} t
+(looking-back "read \"" 4)
+     @result{} nil
+@end group
+@end example
+@end defun
+
+@defvar search-spaces-regexp
+If this variable is non-@code{nil}, it should be a regular expression
+that says how to search for whitespace.  In that case, any group of
+spaces in a regular expression being searched for stands for use of
+this regular expression.  However, spaces inside of constructs such as
+@samp{[@dots{}]} and @samp{*}, @samp{+}, @samp{?} are not affected by
+@code{search-spaces-regexp}.
+
+Since this variable affects all regular expression search and match
+constructs, you should bind it temporarily for as small as possible
+a part of the code.
+@end defvar
+
  @node POSIX Regexps
  @section POSIX Regular Expression Searching
  
@@ -1447,12 +1486,13 @@ character of the buffer counts as 1.)
  write the entire match data, all at once.
  
  @defun match-data &optional integers reuse
-This function returns a newly constructed list containing all the
-information on what text the last search matched.  Element zero is the
-position of the beginning of the match for the whole expression; element
-one is the position of the end of the match for the expression.  The
-next two elements are the positions of the beginning and end of the
-match for the first subexpression, and so on.  In general, element
+This function returns a list of positions (markers or integers) that
+record all the information on what text the last search matched.
+Element zero is the position of the beginning of the match for the
+whole expression; element one is the position of the end of the match
+for the expression.  The next two elements are the positions of the
+beginning and end of the match for the first subexpression, and so on.
+In general, element
  @ifnottex
  number 2@var{n}
  @end ifnottex
@@ -1469,15 +1509,13 @@ number {\mathsurround=0pt $2n+1$}
  @end tex
  corresponds to @code{(match-end @var{n})}.
  
-All the elements are markers or @code{nil} if matching was done on a
-buffer and all are integers or @code{nil} if matching was done on a
-string with @code{string-match}.   If @var{integers} is
-non-@code{nil}, then the elements are integers or @code{nil}, even if
-matching was done on a buffer.  In that case, the buffer itself is
-appended as an additional element at the end of the list
-to facilitate complete restoration of the match data.  Also,
-@code{match-beginning} and
-@code{match-end} always return integers or @code{nil}.
+Normally all the elements are markers or @code{nil}, but if
+@var{integers} is non-@code{nil}, that means to use integers instead
+of markers.  (In that case, the buffer itself is appended as an
+additional element at the end of the list, to facilitate complete
+restoration of the match data.)  If the last match was done on a
+string with @code{string-match}, then integers are always used,
+since markers can't point into a string.
  
  If @var{reuse} is non-@code{nil}, it should be a list.  In that case,
  @code{match-data} stores the match data in @var{reuse}.  That is,
@@ -1485,8 +1523,8 @@ If @var{reuse} is non-@code{nil}, it should be a list.  In that case,
  have the right length.  If it is not long enough to contain the match
  data, it is extended.  If it is too long, the length of @var{reuse}
  stays the same, but the elements that were not used are set to
-@code{nil}.  The purpose of this feature is to avoid producing too
-much garbage, that would later have to be collected.
+@code{nil}.  The purpose of this feature is to reduce the need for
+garbage collection.
  
  As always, there must be no possibility of intervening searches between
  the call to a search function and the call to @code{match-data} that is
@@ -1672,23 +1710,25 @@ whitespace or starting with a form feed (after its left margin).
  @end defvar
  
  @defvar sentence-end
-This is the regular expression describing the end of a sentence.  (All
-paragraph boundaries also end sentences, regardless.)  The (slightly
-simplified) default value is:
-
-@example
-"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
-@end example
-
-This means a period, question mark or exclamation mark (the actual
-default value also lists their alternatives in other character sets),
-followed optionally by closing parenthetical characters, followed by
-tabs, spaces or new lines.
-
-For a detailed explanation of this regular expression, see @ref{Regexp
-Example}.
+If non-@code{nil}, the value should be a regular expression describing
+the end of a sentence, including the whitespace following the
+sentence.  (All paragraph boundaries also end sentences, regardless.)
+
+If the value is @code{nil}, the default, then the function
+@code{sentence-end} has to construct the regexp.  That is why you
+should always call the function @code{sentence-end} to obtain the
+regexp to be used to recognize the end of a sentence.
  @end defvar
  
+@defun sentence-end
+This function returns the value of the variable @code{sentence-end},
+if non-@code{nil}.  Otherwise it returns a default value based on the
+values of the variables @code{sentence-end-double-space}
+(@pxref{Definition of sentence-end-double-space}),
+@code{sentence-end-without-period} and
+@code{sentence-end-without-space}.
+@end defun
+
  @ignore
     arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
  @end ignore