HCoop Git - bpt/emacs.git/blame_incremental

	1	@c -*-texinfo-*-
	2	@c This is part of the GNU Emacs Lisp Reference Manual.
	3	@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2002, 2003,
	4	@c 2004, 2005, 2006 Free Software Foundation, Inc.
	5	@c See the file elisp.texi for copying conditions.
	6	@setfilename ../info/searching
	7	@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
	8	@chapter Searching and Matching
	9	@cindex searching
	10
	11	GNU Emacs provides two ways to search through a buffer for specified
	12	text: exact string searches and regular expression searches. After a
	13	regular expression search, you can examine the @dfn{match data} to
	14	determine which text matched the whole regular expression or various
	15	portions of it.
	16
	17	@menu
	18	* String Search:: Search for an exact match.
	19	* Searching and Case:: Case-independent or case-significant searching.
	20	* Regular Expressions:: Describing classes of strings.
	21	* Regexp Search:: Searching for a match for a regexp.
	22	* POSIX Regexps:: Searching POSIX-style for the longest match.
	23	* Match Data:: Finding out which part of the text matched,
	24	after a string or regexp search.
	25	* Search and Replace:: Commands that loop, searching and replacing.
	26	* Standard Regexps:: Useful regexps for finding sentences, pages,...
	27	@end menu
	28
	29	The @samp{skip-chars@dots{}} functions also perform a kind of searching.
	30	@xref{Skipping Characters}.
	31
	32	@node String Search
	33	@section Searching for Strings
	34	@cindex string search
	35
	36	These are the primitive functions for searching through the text in a
	37	buffer. They are meant for use in programs, but you may call them
	38	interactively. If you do so, they prompt for the search string; the
	39	arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
	40	is 1.
	41
	42	These search functions convert the search string to multibyte if the
	43	buffer is multibyte; they convert the search string to unibyte if the
	44	buffer is unibyte. @xref{Text Representations}.
	45
	46	@deffn Command search-forward string &optional limit noerror repeat
	47	This function searches forward from point for an exact match for
	48	@var{string}. If successful, it sets point to the end of the occurrence
	49	found, and returns the new value of point. If no match is found, the
	50	value and side effects depend on @var{noerror} (see below).
	51	@c Emacs 19 feature
	52
	53	In the following example, point is initially at the beginning of the
	54	line. Then @code{(search-forward "fox")} moves point after the last
	55	letter of @samp{fox}:
	56
	57	@example
	58	@group
	59	---------- Buffer: foo ----------
	60	@point{}The quick brown fox jumped over the lazy dog.
	61	---------- Buffer: foo ----------
	62	@end group
	63
	64	@group
	65	(search-forward "fox")
	66	@result{} 20
	67
	68	---------- Buffer: foo ----------
	69	The quick brown fox@point{} jumped over the lazy dog.
	70	---------- Buffer: foo ----------
	71	@end group
	72	@end example
	73
	74	The argument @var{limit} specifies the upper bound to the search. (It
	75	must be a position in the current buffer.) No match extending after
	76	that position is accepted. If @var{limit} is omitted or @code{nil}, it
	77	defaults to the end of the accessible portion of the buffer.
	78
	79	@kindex search-failed
	80	What happens when the search fails depends on the value of
	81	@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
	82	error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
	83	returns @code{nil} and does nothing. If @var{noerror} is neither
	84	@code{nil} nor @code{t}, then @code{search-forward} moves point to the
	85	upper bound and returns @code{nil}. (It would be more consistent now to
	86	return the new position of point in that case, but some existing
	87	programs may depend on a value of @code{nil}.)
	88
	89	The argument @var{noerror} only affects valid searches which fail to
	90	find a match. Invalid arguments cause errors regardless of
	91	@var{noerror}.
	92
	93	If @var{repeat} is supplied (it must be a positive number), then the
	94	search is repeated that many times (each time starting at the end of the
	95	previous time's match). If these successive searches succeed, the
	96	function succeeds, moving point and returning its new value. Otherwise
	97	the search fails, with results depending on the value of
	98	@var{noerror}, as described above.
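
The following sketch, offered only as an illustration, exercises
@var{limit}, @var{noerror} and @var{repeat} together in a temporary
buffer. The function name @code{example-search-forward-demo} is
invented for this example and is not part of Emacs.

@example
(defun example-search-forward-demo ()
  "Return the results of several search-forward calls."
  (with-temp-buffer
    (insert "one fox, two fox, red fox")
    (goto-char (point-min))
    (list
     ;; LIMIT = 6: the first "fox" ends at 8, past the bound.
     (search-forward "fox" 6 t)
     ;; REPEAT = 2: point ends up after the second "fox".
     (search-forward "fox" nil nil 2)
     ;; NOERROR = t: return nil on failure; point stays put.
     (search-forward "dog" nil t)
     (point)
     ;; NOERROR neither nil nor t: return nil, move to the bound.
     (search-forward "dog" nil 'move)
     (point))))

(example-search-forward-demo)
;; @result{} (nil 17 nil 17 nil 26)
@end example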
	99	@end deffn
	100
	101	@deffn Command search-backward string &optional limit noerror repeat
	102	This function searches backward from point for @var{string}. It is
	103	just like @code{search-forward} except that it searches backwards and
	104	leaves point at the beginning of the match.
	105	@end deffn
	106
	107	@deffn Command word-search-forward string &optional limit noerror repeat
	108	@cindex word search
	109	This function searches forward from point for a ``word'' match for
	110	@var{string}. If it finds a match, it sets point to the end of the
	111	match found, and returns the new value of point.
	112	@c Emacs 19 feature
	113
	114	Word matching regards @var{string} as a sequence of words, disregarding
	115	punctuation that separates them. It searches the buffer for the same
	116	sequence of words. Each word must be distinct in the buffer (searching
	117	for the word @samp{ball} does not match the word @samp{balls}), but the
	118	details of punctuation and spacing are ignored (searching for @samp{ball
	119	boy} does match @samp{ball. Boy!}).
	120
	121	In this example, point is initially at the beginning of the buffer; the
	122	search leaves it between the @samp{y} and the @samp{!}.
	123
	124	@example
	125	@group
	126	---------- Buffer: foo ----------
	127	@point{}He said "Please! Find
	128	the ball boy!"
	129	---------- Buffer: foo ----------
	130	@end group
	131
	132	@group
	133	(word-search-forward "Please find the ball, boy.")
	134	@result{} 35
	135
	136	---------- Buffer: foo ----------
	137	He said "Please! Find
	138	the ball boy@point{}!"
	139	---------- Buffer: foo ----------
	140	@end group
	141	@end example
	142
	143	If @var{limit} is non-@code{nil}, it must be a position in the current
	144	buffer; it specifies the upper bound to the search. The match found
	145	must not extend after that position.
	146
	147	If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
	148	an error if the search fails. If @var{noerror} is @code{t}, then it
	149	returns @code{nil} instead of signaling an error. If @var{noerror} is
	150	neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
	151	end of the accessible portion of the buffer) and returns @code{nil}.
	152
	153	If @var{repeat} is non-@code{nil}, then the search is repeated that many
	154	times. Point is positioned at the end of the last match.
	155	@end deffn
	156
	157	@deffn Command word-search-backward string &optional limit noerror repeat
	158	This function searches backward from point for a word match to
	159	@var{string}. This function is just like @code{word-search-forward}
	160	except that it searches backward and normally leaves point at the
	161	beginning of the match.
	162	@end deffn
	163
	164	@node Searching and Case
	165	@section Searching and Case
	166	@cindex searching and case
	167
	168	By default, searches in Emacs ignore the case of the text they are
	169	searching through; if you specify searching for @samp{FOO}, then
	170	@samp{Foo} or @samp{foo} is also considered a match. This applies to
	171	regular expressions, too; thus, @samp{[aB]} would match @samp{a} or
	172	@samp{A} or @samp{b} or @samp{B}.
	173
	174	If you do not want this feature, set the variable
	175	@code{case-fold-search} to @code{nil}. Then all letters must match
	176	exactly, including case. This is a buffer-local variable; altering the
	177	variable affects only the current buffer. (@xref{Intro to
	178	Buffer-Local}.) Alternatively, you may change the value of
	179	@code{default-case-fold-search}, which is the default value of
	180	@code{case-fold-search} for buffers that do not override it.
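
As a minimal sketch of the Lisp-level behavior, the following binds
@code{case-fold-search} with @code{let} around a search. The function
name @code{example-case-fold-demo} is hypothetical, chosen only for
this illustration.

@example
(defun example-case-fold-demo ()
  "Search for FOO in a buffer containing Foo, with and without folding."
  (with-temp-buffer
    (insert "Foo")
    (list
     ;; Case is ignored: "FOO" matches "Foo"; point ends up at 4.
     (let ((case-fold-search t))
       (goto-char (point-min))
       (search-forward "FOO" nil t))
     ;; Case is significant: no match, so the search returns nil.
     (let ((case-fold-search nil))
       (goto-char (point-min))
       (search-forward "FOO" nil t)))))

(example-case-fold-demo)
;; @result{} (4 nil)
@end example

Binding the variable with @code{let} keeps the change temporary, which
is usually what a Lisp program wants.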
	181
	182	Note that the user-level incremental search feature handles case
	183	distinctions differently. When given a lower case letter, it looks for
	184	a match of either case, but when given an upper case letter, it looks
	185	for an upper case letter only. But this has nothing to do with the
	186	searching functions used in Lisp code.
	187
	188	@defopt case-replace
	189	This variable determines whether the higher level replacement
	190	functions should preserve case. If the variable is @code{nil}, that
	191	means to use the replacement text verbatim. A non-@code{nil} value
	192	means to convert the case of the replacement text according to the
	193	text being replaced.
	194
	195	This variable is used by passing it as an argument to the function
	196	@code{replace-match}. @xref{Replacing Match}.
	197	@end defopt
	198
	199	@defopt case-fold-search
	200	This buffer-local variable determines whether searches should ignore
	201	case. If the variable is @code{nil} they do not ignore case; otherwise
	202	they do ignore case.
	203	@end defopt
	204
	205	@defvar default-case-fold-search
	206	The value of this variable is the default value for
	207	@code{case-fold-search} in buffers that do not override it. This is the
	208	same as @code{(default-value 'case-fold-search)}.
	209	@end defvar
	210
	211	@node Regular Expressions
	212	@section Regular Expressions
	213	@cindex regular expression
	214	@cindex regexp
	215
	216	A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
	217	denotes a (possibly infinite) set of strings. Searching for matches for
	218	a regexp is a very powerful operation. This section explains how to write
	219	regexps; the following section says how to search for them.
	220
	221	@findex re-builder
	222	@cindex authoring regular expressions
	223	For interactive development of regular expressions, you
	224	can use the @kbd{M-x re-builder} command. It provides a convenient
	225	interface for creating regular expressions, by giving immediate visual
	226	feedback in a separate buffer. As you edit the regexp, all its
	227	matches in the target buffer are highlighted. Each parenthesized
	228	sub-expression of the regexp is shown in a distinct face, which makes
	229	it easier to verify even very complex regexps.
	230
	231	@menu
	232	* Syntax of Regexps:: Rules for writing regular expressions.
	233	* Regexp Example:: Illustrates regular expression syntax.
	234	* Regexp Functions:: Functions for operating on regular expressions.
	235	@end menu
	236
	237	@node Syntax of Regexps
	238	@subsection Syntax of Regular Expressions
	239
	240	Regular expressions have a syntax in which a few characters are
	241	special constructs and the rest are @dfn{ordinary}. An ordinary
	242	character is a simple regular expression that matches that character
	243	and nothing else. The special characters are @samp{.}, @samp{*},
	244	@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
	245	special characters will be defined in the future. The character
	246	@samp{]} is special if it ends a character alternative (see later).
	247	The character @samp{-} is special inside a character alternative. A
	248	@samp{[:} and balancing @samp{:]} enclose a character class inside a
	249	character alternative. Any other character appearing in a regular
	250	expression is ordinary, unless a @samp{\} precedes it.
	251
	252	For example, @samp{f} is not a special character, so it is ordinary, and
	253	therefore @samp{f} is a regular expression that matches the string
	254	@samp{f} and no other string. (It does @emph{not} match the string
	255	@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
	256	@samp{o} is a regular expression that matches only @samp{o}.@refill
	257
	258	Any two regular expressions @var{a} and @var{b} can be concatenated. The
	259	result is a regular expression that matches a string if @var{a} matches
	260	some amount of the beginning of that string and @var{b} matches the rest of
	261	the string.@refill
	262
	263	As a simple example, we can concatenate the regular expressions @samp{f}
	264	and @samp{o} to get the regular expression @samp{fo}, which matches only
	265	the string @samp{fo}. Still trivial. To do something more powerful, you
	266	need to use one of the special regular expression constructs.
	267
	268	@menu
	269	* Regexp Special:: Special characters in regular expressions.
	270	* Char Classes:: Character classes used in regular expressions.
	271	* Regexp Backslash:: Backslash-sequences in regular expressions.
	272	@end menu
	273
	274	@node Regexp Special
	275	@subsubsection Special Characters in Regular Expressions
	276
	277	Here is a list of the characters that are special in a regular
	278	expression.
	279
	280	@need 800
	281	@table @asis
	282	@item @samp{.}@: @r{(Period)}
	283	@cindex @samp{.} in regexp
	284	is a special character that matches any single character except a newline.
	285	Using concatenation, we can make regular expressions like @samp{a.b}, which
	286	matches any three-character string that begins with @samp{a} and ends with
	287	@samp{b}.@refill
	288
	289	@item @samp{*}
	290	@cindex @samp{*} in regexp
	291	is not a construct by itself; it is a postfix operator that means to
	292	match the preceding regular expression repetitively as many times as
	293	possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
	294	@samp{o}s).
	295
	296	@samp{*} always applies to the @emph{smallest} possible preceding
	297	expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
	298	@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
	299
	300	The matcher processes a @samp{*} construct by matching, immediately, as
	301	many repetitions as can be found. Then it continues with the rest of
	302	the pattern. If that fails, backtracking occurs, discarding some of the
	303	matches of the @samp{*}-modified construct in the hope that that will
	304	make it possible to match the rest of the pattern. For example, in
	305	matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*}
	306	first tries to match all three @samp{a}s; but the rest of the pattern is
	307	@samp{ar} and there is only @samp{r} left to match, so this try fails.
	308	The next alternative is for @samp{a*} to match only two @samp{a}s. With
	309	this choice, the rest of the regexp matches successfully.
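
As a small sketch, the outcome of this backtracking can be observed
with @code{string-match} (@pxref{Regexp Search}), matching
@samp{ca*ar} against @samp{caaar}:

@example
(string-match "ca*ar" "caaar")
;; @result{} 0
(match-end 0)
;; @result{} 5
@end example

The match covers all five characters, which is possible only if the
repeated @samp{a} ends up with two @samp{a}s and leaves the third for
the rest of the pattern.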
	310
	311	@strong{Warning:} Nested repetition operators take a long time,
	312	or even forever, if they
	313	lead to ambiguous matching. For example, trying to match the regular
	314	expression @samp{\(x+y*\)*a} against the string
	315	@samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz} could take hours before it
	316	ultimately fails. Emacs must try each way of grouping the 35
	317	@samp{x}s before concluding that none of them can work. Even worse,
	318	@samp{\(x*\)*} can match the null string in infinitely many ways, so
	319	it causes an infinite loop. To avoid these problems, check nested
	320	repetitions carefully, to make sure that they do not cause combinatorial
	321	explosions in backtracking.
	322
	323	@item @samp{+}
	324	@cindex @samp{+} in regexp
	325	is a postfix operator, similar to @samp{*} except that it must match
	326	the preceding expression at least once. So, for example, @samp{ca+r}
	327	matches the strings @samp{car} and @samp{caaaar} but not the string
	328	@samp{cr}, whereas @samp{ca*r} matches all three strings.
	329
	330	@item @samp{?}
	331	@cindex @samp{?} in regexp
	332	is a postfix operator, similar to @samp{*} except that it must match the
	333	preceding expression either once or not at all. For example,
	334	@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
	335
	336	@item @samp{*?}, @samp{+?}, @samp{??}
	337	These are ``non-greedy'' variants of the operators @samp{*}, @samp{+}
	338	and @samp{?}. Where those operators match the largest possible
	339	substring (consistent with matching the entire containing expression),
	340	the non-greedy variants match the smallest possible substring
	341	(consistent with matching the entire containing expression).
	342
	343	For example, the regular expression @samp{c[ad]*a} when applied to the
	344	string @samp{cdaaada} matches the whole string; but the regular
	345	expression @samp{c[ad]*?a}, applied to that same string, matches just
	346	@samp{cda}. (The smallest possible match here for @samp{[ad]*?} that
	347	permits the whole expression to match is @samp{d}.)
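
The difference can be checked with a short sketch. The function name
@code{example-greedy-and-non-greedy} is made up for this illustration;
@code{string-match} and @code{match-string} are the standard
functions.

@example
(defun example-greedy-and-non-greedy (string)
  "Return the text matched in STRING by a greedy and a non-greedy regexp."
  (list (and (string-match "c[ad]*a" string)
             (match-string 0 string))
        (and (string-match "c[ad]*?a" string)
             (match-string 0 string))))

(example-greedy-and-non-greedy "cdaaada")
;; @result{} ("cdaaada" "cda")
@end example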
	348
	349	@item @samp{[ @dots{} ]}
	350	@cindex character alternative (in regexp)
	351	@cindex @samp{[} in regexp
	352	@cindex @samp{]} in regexp
	353	is a @dfn{character alternative}, which begins with @samp{[} and is
	354	terminated by @samp{]}. In the simplest case, the characters between
	355	the two brackets are what this character alternative can match.
	356
	357	Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
	358	@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
	359	(including the empty string), from which it follows that @samp{c[ad]*r}
	360	matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
	361
	362	You can also include character ranges in a character alternative, by
	363	writing the starting and ending characters with a @samp{-} between them.
	364	Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
	365	Ranges may be intermixed freely with individual characters, as in
	366	@samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
	367	or @samp{$}, @samp{%} or period.
	368
	369	Note that the usual regexp special characters are not special inside a
	370	character alternative. A completely different set of characters is
	371	special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
	372
	373	To include a @samp{]} in a character alternative, you must make it the
	374	first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
	375	To include a @samp{-}, write @samp{-} as the first or last character of
	376	the character alternative, or put it after a range. Thus, @samp{[]-]}
	377	matches both @samp{]} and @samp{-}.
	378
	379	To include @samp{^} in a character alternative, put it anywhere but at
	380	the beginning.
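
The placement rules for @samp{]}, @samp{-} and @samp{^} can be tried
out directly. In this sketch each call returns the index of the first
character of the string that the alternative matches:

@example
;; "]" placed first is an ordinary member of the set.
(string-match "[]a]" "x]")
;; @result{} 1
;; "]" first and "-" last: the set contains exactly those two.
(string-match "[]-]" "a-b")
;; @result{} 1
;; "^" anywhere but first is an ordinary member of the set.
(string-match "[a^]" "x^")
;; @result{} 1
@end example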
	381
	382	The beginning and end of a range of multibyte characters must be in
	383	the same character set (@pxref{Character Sets}). Thus,
	384	@code{"[\x8e0-\x97c]"} is invalid because character 0x8e0 (@samp{a}
	385	with grave accent) is in the Emacs character set for Latin-1 but the
	386	character 0x97c (@samp{u} with diaeresis) is in the Emacs character
	387	set for Latin-2. (We use Lisp string syntax to write that example,
	388	and a few others in the next few paragraphs, in order to include hex
	389	escape sequences in them.)
	390
	391	If a range starts with a unibyte character @var{c} and ends with a
	392	multibyte character @var{c2}, the range is divided into two parts: one
	393	is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
	394	@var{c1} is the first character of the charset to which @var{c2}
	395	belongs.
	396
	397	You cannot always match all non-@acronym{ASCII} characters with the regular
	398	expression @code{"[\200-\377]"}. This works when searching a unibyte
	399	buffer or string (@pxref{Text Representations}), but not in a multibyte
	400	buffer or string, because many non-@acronym{ASCII} characters have codes
	401	above octal 0377. However, the regular expression @code{"[^\000-\177]"}
	402	does match all non-@acronym{ASCII} characters (see below regarding @samp{^}),
	403	in both multibyte and unibyte representations, because only the
	404	@acronym{ASCII} characters are excluded.
	405
	406	A character alternative can also specify named
	407	character classes (@pxref{Char Classes}). This is a POSIX feature whose
	408	syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
	409	to mentioning each of the characters in that class; but the latter is
	410	not feasible in practice, since some classes include thousands of
	411	different characters.
	412
	413	@item @samp{[^ @dots{} ]}
	414	@cindex @samp{^} in regexp
	415	@samp{[^} begins a @dfn{complemented character alternative}. This
	416	matches any character except the ones specified. Thus,
	417	@samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and
	418	digits.
	419
	420	@samp{^} is not special in a character alternative unless it is the first
	421	character. The character following the @samp{^} is treated as if it
	422	were first (in other words, @samp{-} and @samp{]} are not special there).
	423
	424	A complemented character alternative can match a newline, unless newline is
	425	mentioned as one of the characters not to match. This is in contrast to
	426	the handling of regexps in programs such as @code{grep}.
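
For instance, in the following sketch the first call matches the
newline, while the second excludes it by listing it in the
alternative:

@example
;; The newline at index 2 is the first character outside a-z.
(string-match "[^a-z]" "ab\ncd")
;; @result{} 2
;; With newline listed as well, no character of the string matches.
(string-match "[^a-z\n]" "ab\ncd")
;; @result{} nil
@end example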
	427
	428	@item @samp{^}
	429	@cindex beginning of line in regexp
	430	When matching a buffer, @samp{^} matches the empty string, but only at the
	431	beginning of a line in the text being matched (or the beginning of the
	432	accessible portion of the buffer). Otherwise it fails to match
	433	anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at the
	434	beginning of a line.
	435
	436	When matching a string instead of a buffer, @samp{^} matches at the
	437	beginning of the string or after a newline character.
	438
	439	For historical compatibility reasons, @samp{^} can be used only at the
	440	beginning of the regular expression, or after @samp{\(} or @samp{\|}.
	441
	442	@item @samp{$}
	443	@cindex @samp{$} in regexp
	444	@cindex end of line in regexp
	445	is similar to @samp{^} but matches only at the end of a line (or the
	446	end of the accessible portion of the buffer). Thus, @samp{x+$}
	447	matches a string of one @samp{x} or more at the end of a line.
	448
	449	When matching a string instead of a buffer, @samp{$} matches at the end
	450	of the string or before a newline character.
	451
	452	For historical compatibility reasons, @samp{$} can be used only at the
	453	end of the regular expression, or before @samp{\)} or @samp{\|}.
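
When matching a string, the behavior of @samp{^} and @samp{$} around
newlines can be seen in a sketch like this one:

@example
;; "^" matches just after the newline, at index 4.
(string-match "^foo" "bar\nfoo")
;; @result{} 4
;; "$" matches just before the newline, so "bar" is found at index 0.
(string-match "bar$" "bar\nfoo")
;; @result{} 0
;; "$" also matches at the very end of the string.
(string-match "foo$" "bar\nfoo")
;; @result{} 4
@end example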
	454
	455	@item @samp{\}
	456	@cindex @samp{\} in regexp
	457	has two functions: it quotes the special characters (including
	458	@samp{\}), and it introduces additional special constructs.
	459
	460	Because @samp{\} quotes special characters, @samp{\$} is a regular
	461	expression that matches only @samp{$}, and @samp{\[} is a regular
	462	expression that matches only @samp{[}, and so on.
	463
	464	Note that @samp{\} also has special meaning in the read syntax of Lisp
	465	strings (@pxref{String Type}), and must be quoted with @samp{\}. For
	466	example, the regular expression that matches the @samp{\} character is
	467	@samp{\\}. To write a Lisp string that contains the characters
	468	@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
	469	@samp{\}. Therefore, the read syntax for a regular expression matching
	470	@samp{\} is @code{"\\\\"}.@refill
	471	@end table
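
The two levels of quoting described for @samp{\} can be checked with a
short sketch:

@example
;; Four backslashes in the read syntax make a two-character string,
;; which is the regexp that matches one backslash.
(length "\\\\")
;; @result{} 2
(string-match "\\\\" "a\\b")
;; @result{} 1
;; The regexp \$ is written "\\$" and matches a literal dollar sign.
(string-match "\\$" "cost: $5")
;; @result{} 6
@end example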
	472
	473	@strong{Please note:} For historical compatibility, special characters
	474	are treated as ordinary ones if they are in contexts where their special
	475	meanings make no sense. For example, @samp{*foo} treats @samp{*} as
	476	ordinary since there is no preceding expression on which the @samp{*}
	477	can act. It is poor practice to depend on this behavior; quote the
	478	special character anyway, regardless of where it appears.@refill
	479
	480	As a @samp{\} is not special inside a character alternative, it can
	481	never remove the special meaning of @samp{-} or @samp{]}. So you
	482	should not quote these characters when they have no special meaning
	483	either. This would not clarify anything, since backslashes can
	484	legitimately precede these characters where they @emph{have} special
	485	meaning, as in @samp{[^\]} (@code{"[^\\]"} for Lisp string syntax),
	486	which matches any single character except a backslash.
	487
	488	In practice, most @samp{]} that occur in regular expressions close a
	489	character alternative and hence are special. However, occasionally a
	490	regular expression may try to match a complex pattern of literal
	491	@samp{[} and @samp{]}. In such situations, it sometimes may be
	492	necessary to carefully parse the regexp from the start to determine
	493	which square brackets enclose a character alternative. For example,
	494	@samp{[^][]]} consists of the complemented character alternative
	495	@samp{[^][]} (which matches any single character that is not a square
	496	bracket), followed by a literal @samp{]}.
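
As a sketch of that reading, matching the regexp against @samp{[a]}
skips the opening bracket and matches the two characters @samp{a]}:

@example
(let ((string "[a]"))
  (and (string-match "[^][]]" string)
       (match-string 0 string)))
;; @result{} "a]"
@end example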
	497
	498	The exact rules are that at the beginning of a regexp, @samp{[} is
	499	special and @samp{]} not. This lasts until the first unquoted
	500	@samp{[}, after which we are in a character alternative; @samp{[} is
	501	no longer special (except when it starts a character class) but @samp{]}
	502	is special, unless it immediately follows the special @samp{[} or that
	503	@samp{[} followed by a @samp{^}. This lasts until the next special
	504	@samp{]} that does not end a character class. This ends the character
	505	alternative and restores the ordinary syntax of regular expressions;
	506	an unquoted @samp{[} is special again and a @samp{]} not.
	507
	508	@node Char Classes
	509	@subsubsection Character Classes
	510	@cindex character classes in regexp
	511
	512	Here is a table of the classes you can use in a character alternative,
	513	and what they mean:
	514
	515	@table @samp
	516	@item [:ascii:]
	517	This matches any @acronym{ASCII} (unibyte) character.
	518	@item [:alnum:]
	519	This matches any letter or digit. (At present, for multibyte
	520	characters, it matches anything that has word syntax.)
	521	@item [:alpha:]
	522	This matches any letter. (At present, for multibyte characters, it
	523	matches anything that has word syntax.)
	524	@item [:blank:]
	525	This matches space and tab only.
	526	@item [:cntrl:]
	527	This matches any @acronym{ASCII} control character.
	528	@item [:digit:]
	529	This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
	530	matches any digit, as well as @samp{+} and @samp{-}.
	531	@item [:graph:]
	532	This matches graphic characters---everything except @acronym{ASCII} control
	533	characters, space, and the delete character.
	534	@item [:lower:]
	535	This matches any lower-case letter, as determined by
	536	the current case table (@pxref{Case Tables}).
	537	@item [:nonascii:]
	538	This matches any non-@acronym{ASCII} (multibyte) character.
	539	@item [:print:]
	540	This matches printing characters---everything except @acronym{ASCII} control
	541	characters and the delete character.
	542	@item [:punct:]
	543	This matches any punctuation character. (At present, for multibyte
	544	characters, it matches anything that has non-word syntax.)
	545	@item [:space:]
	546	This matches any character that has whitespace syntax
	547	(@pxref{Syntax Class Table}).
	548	@item [:upper:]
	549	This matches any upper-case letter, as determined by
	550	the current case table (@pxref{Case Tables}).
	551	@item [:word:]
	552	This matches any character that has word syntax (@pxref{Syntax Class
	553	Table}).
	554	@item [:xdigit:]
	555	This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
	556	through @samp{f} and @samp{A} through @samp{F}.
	557	@end table
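
Here is a brief sketch of classes used inside a character alternative.
Note the doubled brackets: the outer pair delimits the alternative and
the inner pair belongs to the class name.

@example
;; A run of digits and signs: matches "-42".
(let ((string "x = -42;"))
  (and (string-match "[-+[:digit:]]+" string)
       (match-string 0 string)))
;; @result{} "-42"
;; [:blank:] matches the tab at index 2.
(string-match "[[:blank:]]" "ab\tcd")
;; @result{} 2
@end example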
	558
	559	@node Regexp Backslash
	560	@subsubsection Backslash Constructs in Regular Expressions
	561
	562	For the most part, @samp{\} followed by any character matches only
	563	that character. However, there are several exceptions: certain
	564	two-character sequences starting with @samp{\} that have special
	565	meanings. (The character after the @samp{\} in such a sequence is
	566	always ordinary when used on its own.) Here is a table of the special
	567	@samp{\} constructs.
	568
	569	@table @samp
	570	@item \|
	571	@cindex @samp{\|} in regexp
	572	@cindex regexp alternative
	573	specifies an alternative.
	574	Two regular expressions @var{a} and @var{b} with @samp{\|} in
	575	between form an expression that matches anything that either @var{a} or
	576	@var{b} matches.@refill
	577
	578	Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
	579	but no other string.@refill
	580
	581	@samp{\|} applies to the largest possible surrounding expressions. Only a
	582	surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
	583	@samp{\|}.@refill
	584
	585	If you need full backtracking capability to handle multiple uses of
	586	@samp{\|}, use the POSIX regular expression functions (@pxref{POSIX
	587	Regexps}).
	588
	589	@item \@{@var{m}\@}
	590	is a postfix operator that repeats the previous pattern exactly @var{m}
	591	times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
	592	and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
	593	@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.
	594
	595	@item \@{@var{m},@var{n}\@}
	596	is a more general postfix operator that specifies repetition with a
	597	minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
	598	is omitted, the minimum is 0; if @var{n} is omitted, there is no
	599	maximum.
	600
	601	For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
	602	@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
	603	nothing else.@*
	604	@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
	605	@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
	606	@samp{\@{1,\@}} is equivalent to @samp{+}.
	607
	608	@item \( @dots{} \)
	609	@cindex @samp{(} in regexp
	610	@cindex @samp{)} in regexp
	611	@cindex regexp grouping
	612	is a grouping construct that serves three purposes:
	613
	614	@enumerate
	615	@item
	616	To enclose a set of @samp{\|} alternatives for other operations. Thus,
	617	the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
	618	or @samp{barx}.
	619
	620	@item
	621	To enclose a complicated expression for the postfix operators @samp{*},
	622	@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
	623	@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
	624	number (zero or more) of @samp{na} strings.
	625
	626	@item
	627	To record a matched substring for future reference with
	628	@samp{\@var{digit}} (see below).
	629	@end enumerate
	630
	631	This last application is not a consequence of the idea of a
	632	parenthetical grouping; it is a separate feature that was assigned as a
	633	second meaning to the same @samp{\( @dots{} \)} construct because, in
	634	practice, there was usually no conflict between the two meanings. But
	635	occasionally there is a conflict, and that led to the introduction of
	636	shy groups.
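
The grouping and recording purposes can both be seen in a short
sketch. In Lisp string syntax each backslash of the regexp is doubled.

@example
;; The group is repeated by "*"; only its last match is recorded.
(let ((string "bananas"))
  (and (string-match "ba\\(na\\)*" string)
       (list (match-string 0 string)
             (match-string 1 string))))
;; @result{} ("banana" "na")
;; The group confines the alternatives, so "x" must follow either one.
(string-match "\\(foo\\|bar\\)x" "a barx")
;; @result{} 2
@end example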
	637
	638	@item \(?: @dots{} \)
	639	is the @dfn{shy group} construct. A shy group serves the first two
	640	purposes of an ordinary group (controlling the nesting of other
	641	operators), but it does not get a number, so you cannot refer back to
	642	its value with @samp{\@var{digit}}.
	643
	644	Shy groups are particularly useful for mechanically-constructed regular
	645	expressions because they can be added automatically without altering the
	646	numbering of any ordinary, non-shy groups.
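
As a sketch of the effect on numbering, the two regexps below match
the same text, but the group holding the @samp{x}s is number 2 in the
first and number 1 in the second:

@example
(let ((string "barxx"))
  (list
   ;; Ordinary group: the alternatives are group 1, so x+ is group 2.
   (and (string-match "\\(foo\\|bar\\)\\(x+\\)" string)
        (match-string 2 string))
   ;; Shy group: the alternatives get no number, so x+ is group 1.
   (and (string-match "\\(?:foo\\|bar\\)\\(x+\\)" string)
        (match-string 1 string))))
;; @result{} ("xx" "xx")
@end example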
	647
	648	@item \@var{digit}
	649	matches the same text that matched the @var{digit}th occurrence of a
	650	grouping (@samp{\( @dots{} \)}) construct.
	651
	652	In other words, after the end of a group, the matcher remembers the
	653	beginning and end of the text matched by that group. Later on in the
	654	regular expression you can use @samp{\} followed by @var{digit} to
	655	match that same text, whatever it may have been.
	656
	657	The strings matching the first nine grouping constructs appearing in
	658	the entire regular expression passed to a search or matching function
	659	are assigned numbers 1 through 9 in the order that the open
	660	parentheses appear in the regular expression. So you can use
	661	@samp{\1} through @samp{\9} to refer to the text matched by the
	662	corresponding grouping constructs.
	663
	664	For example, @samp{\(.*\)\1} matches any newline-free string that is
	665	composed of two identical halves. The @samp{\(.*\)} matches the first
	666	half, which may be anything, but the @samp{\1} that follows must match
	667	the same exact text.
	668
	669	If a @samp{\( @dots{} \)} construct matches more than once (which can
	670	happen, for instance, if it is followed by @samp{*}), only the last
	671	match is recorded.
	672
	673	If a particular grouping construct in the regular expression was never
	674	matched---for instance, if it appears inside of an alternative that
	675	wasn't used, or inside of a repetition that repeated zero times---then
	676	the corresponding @samp{\@var{digit}} construct never matches
	677	anything. To use an artificial example, @samp{\(foo\(b*\)\|lose\)\2}
	678	cannot match @samp{lose}: the second alternative inside the larger
	679	group matches it, but then @samp{\2} is undefined and can't match
	680	anything. But it can match @samp{foobb}, because the first
	681	alternative matches @samp{foob} and @samp{\2} matches @samp{b}.
	682
	683	@item \w
	684	@cindex @samp{\w} in regexp
	685	matches any word-constituent character. The editor syntax table
	686	determines which characters these are. @xref{Syntax Tables}.
	687
	688	@item \W
	689	@cindex @samp{\W} in regexp
	690	matches any character that is not a word constituent.
	691
	692	@item \s@var{code}
	693	@cindex @samp{\s} in regexp
	694	matches any character whose syntax is @var{code}. Here @var{code} is a
	695	character that represents a syntax code: thus, @samp{w} for word
	696	constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
	697	etc. To represent whitespace syntax, use either @samp{-} or a space
	698	character. @xref{Syntax Class Table}, for a list of syntax codes and
	699	the characters that stand for them.
	700
	701	@item \S@var{code}
	702	@cindex @samp{\S} in regexp
	703	matches any character whose syntax is not @var{code}.
	704
	705	@item \c@var{c}
	706	matches any character whose category is @var{c}. Here @var{c} is a
	707	character that represents a category: thus, @samp{c} for Chinese
	708	characters or @samp{g} for Greek characters in the standard category
	709	table.
	710
	711	@item \C@var{c}
	712	matches any character whose category is not @var{c}.
	713	@end table
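
As a brief illustration of the grouping constructs described above, the following sketch applies @code{string-match} to a few literal strings. The sample strings are invented for this illustration, and every backslash of a regexp is doubled because the regexps are written as Lisp strings.

```elisp
;; \(.*\)\1, anchored with \` and \', accepts only a string made of
;; two identical halves.
(string-match "\\`\\(.*\\)\\1\\'" "abcabc")            ; => 0
(match-string 1 "abcabc")                              ; => "abc"
(string-match "\\`\\(.*\\)\\1\\'" "abcabd")            ; => nil

;; A group that matches more than once records only its last match.
(string-match "\\(a\\|b\\)*" "aab")                    ; => 0
(match-string 1 "aab")                                 ; => "b"

;; A shy group takes no number, so \(c\) is still group 1.
(string-match "\\(?:ab\\)+\\(c\\)" "ababc")            ; => 0
(match-string 1 "ababc")                               ; => "c"

;; The artificial example from the text: group 2 is unset when the
;; second alternative is taken, so \2 cannot match.
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")    ; => nil
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")   ; => 0
```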
	714
	715	The following regular expression constructs match the empty string---that is,
	716	they don't use up any characters---but whether they match depends on the
	717	context. For all, the beginning and end of the accessible portion of
	718	the buffer are treated as if they were the actual beginning and end of
	719	the buffer.
	720
	721	@table @samp
	722	@item \`
	723	@cindex @samp{\`} in regexp
	724	matches the empty string, but only at the beginning
	725	of the buffer or string being matched against.
	726
	727	@item \'
	728	@cindex @samp{\'} in regexp
	729	matches the empty string, but only at the end of
	730	the buffer or string being matched against.
	731
	732	@item \=
	733	@cindex @samp{\=} in regexp
	734	matches the empty string, but only at point.
	735	(This construct is not defined when matching against a string.)
	736
	737	@item \b
	738	@cindex @samp{\b} in regexp
	739	matches the empty string, but only at the beginning or
	740	end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
	741	@samp{foo} as a separate word. @samp{\bballs?\b} matches
	742	@samp{ball} or @samp{balls} as a separate word.@refill
	743
	744	@samp{\b} matches at the beginning or end of the buffer (or string)
	745	regardless of what text appears next to it.
	746
	747	@item \B
	748	@cindex @samp{\B} in regexp
	749	matches the empty string, but @emph{not} at the beginning or
	750	end of a word, nor at the beginning or end of the buffer (or string).
	751
	752	@item \<
	753	@cindex @samp{\<} in regexp
	754	matches the empty string, but only at the beginning of a word.
	755	@samp{\<} matches at the beginning of the buffer (or string) only if a
	756	word-constituent character follows.
	757
	758	@item \>
	759	@cindex @samp{\>} in regexp
	760	matches the empty string, but only at the end of a word. @samp{\>}
	761	matches at the end of the buffer (or string) only if the contents end
	762	with a word-constituent character.
	763
	764	@item \_<
	765	@cindex @samp{\_<} in regexp
	766	matches the empty string, but only at the beginning of a symbol. A
	767	symbol is a sequence of one or more word or symbol constituent
	768	characters. @samp{\_<} matches at the beginning of the buffer (or
	769	string) only if a symbol-constituent character follows.
	770
	771	@item \_>
	772	@cindex @samp{\_>} in regexp
	773	matches the empty string, but only at the end of a symbol. @samp{\_>}
	774	matches at the end of the buffer (or string) only if the contents end
	775	with a symbol-constituent character.
	776	@end table
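
The empty-string constructs in the table above can be tried out on strings with @code{string-match}. This is only a sketch: the sample strings are invented, and the last form assumes the standard syntax table, in which @samp{-} is a symbol constituent but not a word constituent.

```elisp
;; \b matches at either edge of a word.
(string-match "\\bfoo\\b" "a foo b")          ; => 2
(string-match "\\bfoo\\b" "afoob")            ; => nil

;; \< matches only where a word begins.
(string-match "\\<ball" "football ball")      ; => 9

;; \` and \' anchor to the start and end of the string.
(string-match "\\`foo" "xfoo")                ; => nil
(string-match "bar\\'" "foobar")              ; => 3

;; Word and symbol boundaries depend on the syntax table in effect.
(with-syntax-table (standard-syntax-table)
  (list (string-match "\\<bar\\>" "foo-bar bar")       ; word edges
        (string-match "\\_<bar\\_>" "foo-bar bar")))   ; symbol edges
;; => (4 8)
```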
	777
	778	@kindex invalid-regexp
	779	Not every string is a valid regular expression. For example, a string
	780	that ends inside a character alternative without terminating @samp{]}
	781	is invalid, and so is a string that ends with a single @samp{\}. If
	782	an invalid regular expression is passed to any of the search functions,
	783	an @code{invalid-regexp} error is signaled.
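
A program that receives a regexp from the user may want to catch this error rather than let it propagate. The following is a minimal sketch; @code{my-regexp-valid-p} is a hypothetical helper name, not a function provided by Emacs.

```elisp
(defun my-regexp-valid-p (regexp)
  "Return t if REGEXP is a valid regular expression, nil otherwise.
This is a hypothetical helper written for illustration."
  (condition-case nil
      ;; Matching against the empty string forces REGEXP to be compiled.
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

(my-regexp-valid-p "[a-z")     ; => nil  (no terminating `]')
(my-regexp-valid-p "foo\\")    ; => nil  (ends with a single `\')
(my-regexp-valid-p "[a-z]+")   ; => t
```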
	784
	785	@node Regexp Example
	786	@comment node-name, next, previous, up
	787	@subsection Complex Regexp Example
	788
	789	Here is a complicated regexp which was formerly used by Emacs to
	790	recognize the end of a sentence together with any whitespace that
	791	follows. (Nowadays Emacs uses a similar but more complex default
	792	regexp constructed by the function @code{sentence-end}.
	793	@xref{Standard Regexps}.)
	794
	795	First, we show the regexp as a string in Lisp syntax to distinguish
	796	spaces from tab characters. The string constant begins and ends with a
	797	double-quote. @samp{\"} stands for a double-quote as part of the
	798	string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
	799	tab and @samp{\n} for a newline.
	800
	801	@example
	802	"[.?!][]\"')@}]*\\($\\|@ $\\|\t\\|@ @ \\)[ \t\n]*"
	803	@end example
	804
	805	@noindent
	806	In contrast, if you evaluate this string, you will see the following:
	807
	808	@example
	809	@group
	810	"[.?!][]\"')@}]*\\($\\|@ $\\|\t\\|@ @ \\)[ \t\n]*"
	811	     @result{} "[.?!][]\"')@}]*\\($\\| $\\|	\\|@ @ \\)[
	812	]*"
	813	@end group
	814	@end example
	815
	816	@noindent
	817	In this output, tab and newline appear as themselves.
	818
	819	This regular expression contains four parts in succession and can be
	820	deciphered as follows:
	821
	822	@table @code
	823	@item [.?!]
	824	The first part of the pattern is a character alternative that matches
	825	any one of three characters: period, question mark, and exclamation
	826	mark. The match must begin with one of these three characters. (This
	827	is one point where the new default regexp used by Emacs differs from
	828	the old. The new value also allows some non-@acronym{ASCII}
	829	characters that end a sentence without any following whitespace.)
	830
	831	@item []\"')@}]*
	832	The second part of the pattern matches any closing braces and quotation
	833	marks, zero or more of them, that may follow the period, question mark
	834	or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
	835	a string. The @samp{*} at the end indicates that the immediately
	836	preceding regular expression (a character alternative, in this case) may be
	837	repeated zero or more times.
	838
	839	@item \\($\\|@ $\\|\t\\|@ @ \\)
	840	The third part of the pattern matches the whitespace that follows the
	841	end of a sentence: the end of a line (optionally with a space), or a
	842	tab, or two spaces. The double backslashes mark the parentheses and
	843	vertical bars as regular expression syntax; the parentheses delimit a
	844	group and the vertical bars separate alternatives. The dollar sign is
	845	used to match the end of a line.
	846
	847	@item [ \t\n]*
	848	Finally, the last part of the pattern matches any additional whitespace
	849	beyond the minimum needed to end a sentence.
	850	@end table
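
To see the four parts working together, the regexp can be applied to a short piece of text. The sample sentence below is invented for this illustration; the regexp is the old value discussed above, written as a Lisp string.

```elisp
(let ((old-sentence-end
       "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
      (text "He said \"Stop!\"  Then he left."))
  (string-match old-sentence-end text)
  ;; The match covers the `!', the closing double-quote and the two
  ;; spaces that follow it.
  (list (match-beginning 0) (match-end 0)))
;; => (13 17)
```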
	851
	852	@node Regexp Functions
	853	@subsection Regular Expression Functions
	854
	855	These functions operate on regular expressions.
	856
	857	@defun regexp-quote string
	858	This function returns a regular expression whose only exact match is
	859	@var{string}. Using this regular expression in @code{looking-at} will
	860	succeed only if the next characters in the buffer are @var{string};
	861	using it in a search function will succeed if the text being searched
	862	contains @var{string}.
	863
	864	This allows you to request an exact string match or search when calling
	865	a function that wants a regular expression.
	866
	867	@example
	868	@group
	869	(regexp-quote "^The cat$")
	870	@result{} "\\^The cat\\$"
	871	@end group
	872	@end example
	873
	874	One use of @code{regexp-quote} is to combine an exact string match with
	875	context described as a regular expression. For example, this searches
	876	for the string that is the value of @var{string}, surrounded by
	877	whitespace:
	878
	879	@example
	880	@group
	881	(re-search-forward
	882	 (concat "\\s-" (regexp-quote string) "\\s-"))
	883	@end group
	884	@end example
	885	@end defun
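
The effect of quoting is easiest to see with a string that contains a special character. In this small sketch (the strings are invented), the unquoted @samp{.} matches any character, while the quoted one matches only a literal period.

```elisp
(string-match "1.5" "125")                    ; => 0
(string-match (regexp-quote "1.5") "125")     ; => nil
(string-match (regexp-quote "1.5") "x1.5")    ; => 1
```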
	886
	887	@defun regexp-opt strings &optional paren
	888	This function returns an efficient regular expression that will match
	889	any of the strings in the list @var{strings}. This is useful when you
	890	need to make matching or searching as fast as possible---for example,
	891	for Font Lock mode.
	892
	893	If the optional argument @var{paren} is non-@code{nil}, then the
	894	returned regular expression is always enclosed by at least one
	895	parentheses-grouping construct. If @var{paren} is @code{words}, then
	896	that construct is additionally surrounded by @samp{\<} and @samp{\>}.
	897
	898	This simplified definition of @code{regexp-opt} produces a
	899	regular expression which is equivalent to the actual value
	900	(but not as efficient):
	901
	902	@example
	903	(defun regexp-opt (strings paren)
	904	  (let ((open-paren (if paren "\\(" ""))
	905	        (close-paren (if paren "\\)" "")))
	906	    (concat open-paren
	907	            (mapconcat 'regexp-quote strings "\\|")
	908	            close-paren)))
	909	@end example
	910	@end defun
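
A short usage sketch follows. The word list is invented, and the example deliberately does not show the returned regexp itself, since its exact form may differ between Emacs versions; only the matching behavior is relied on.

```elisp
;; With PAREN equal to `words', only whole words match.
(let ((re (regexp-opt '("cat" "car" "dog") 'words)))
  (list (string-match re "my car")
        (string-match re "scarf")))
;; => (3 nil)

;; Without PAREN, a match inside a longer word is accepted.
(string-match (regexp-opt '("cat" "car" "dog")) "scarf")   ; => 1
```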
	911
	912	@defun regexp-opt-depth regexp
	913	This function returns the total number of grouping constructs
	914	(parenthesized expressions) in @var{regexp}. (This does not include
	915	shy groups.)
	916	@end defun
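
For instance, in the following sketch the regexp contains two ordinary groups and one shy group, so the count is two.

```elisp
(regexp-opt-depth "\\(a\\)\\(?:b\\)\\(c\\)")   ; => 2
```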
	917
	918	@node Regexp Search
	919	@section Regular Expression Searching
	920	@cindex regular expression searching
	921	@cindex regexp searching
	922	@cindex searching for regexp
	923
	924	In GNU Emacs, you can search for the next match for a regular
	925	expression either incrementally or not. For incremental search
	926	commands, see @ref{Regexp Search, , Regular Expression Search, emacs,
	927	The GNU Emacs Manual}. Here we describe only the search functions
	928	useful in programs. The principal one is @code{re-search-forward}.
	929
	930	These search functions convert the regular expression to multibyte if
	931	the buffer is multibyte; they convert the regular expression to unibyte
	932	if the buffer is unibyte. @xref{Text Representations}.
	933
	934	@deffn Command re-search-forward regexp &optional limit noerror repeat
	935	This function searches forward in the current buffer for a string of
	936	text that is matched by the regular expression @var{regexp}. The
	937	function skips over any amount of text that is not matched by
	938	@var{regexp}, and leaves point at the end of the first match found.
	939	It returns the new value of point.
	940
	941	If @var{limit} is non-@code{nil}, it must be a position in the current
	942	buffer. It specifies the upper bound to the search. No match
	943	extending after that position is accepted.
	944
	945	If @var{repeat} is supplied, it must be a positive number; the search
	946	is repeated that many times; each repetition starts at the end of the
	947	previous match. If all these successive searches succeed, the search
	948	succeeds, moving point and returning its new value. Otherwise the
	949	search fails. What @code{re-search-forward} does when the search
	950	fails depends on the value of @var{noerror}:
	951
	952	@table @asis
	953	@item @code{nil}
	954	Signal a @code{search-failed} error.
	955	@item @code{t}
	956	Do nothing and return @code{nil}.
	957	@item anything else
	958	Move point to @var{limit} (or the end of the accessible portion of the
	959	buffer) and return @code{nil}.
	960	@end table
	961
	962	In the following example, point is initially before the @samp{T}.
	963	Evaluating the search call moves point to the end of that line (between
	964	the @samp{t} of @samp{hat} and the newline).
	965
	966	@example
	967	@group
	968	---------- Buffer: foo ----------
	969	I read "@point{}The cat in the hat
	970	comes back" twice.
	971	---------- Buffer: foo ----------
	972	@end group
	973
	974	@group
	975	(re-search-forward "[a-z]+" nil t 5)
	976	@result{} 27
	977
	978	---------- Buffer: foo ----------
	979	I read "The cat in the hat@point{}
	980	comes back" twice.
	981	---------- Buffer: foo ----------
	982	@end group
	983	@end example
	984	@end deffn
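
In programs, @code{re-search-forward} is commonly called in a loop with @var{noerror} set to @code{t}, so that the loop ends when no further match is found. The sketch below runs in a temporary buffer and binds @code{case-fold-search} explicitly so that the result does not depend on the user's settings; the second form shows what happens when @var{noerror} is neither @code{nil} nor @code{t}.

```elisp
;; Collect every run of letters in the buffer.
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char (point-min))
  (let ((case-fold-search t)
        (words '()))
    (while (re-search-forward "[a-z]+" nil t)
      (push (match-string 0) words))
    (nreverse words)))
;; => ("I" "read" "The" "cat" "in" "the" "hat" "comes" "back" "twice")

;; A failed search with NOERROR `move' leaves point at LIMIT.
(with-temp-buffer
  (insert "foo bar")
  (goto-char (point-min))
  (list (re-search-forward "bar" 4 'move) (point)))
;; => (nil 4)
```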
	985
	986	@deffn Command re-search-backward regexp &optional limit noerror repeat
	987	This function searches backward in the current buffer for a string of
	988	text that is matched by the regular expression @var{regexp}, leaving
	989	point at the beginning of the first text found.
	990
	991	This function is analogous to @code{re-search-forward}, but they are not
	992	simple mirror images. @code{re-search-forward} finds the match whose
	993	beginning is as close as possible to the starting point. If
	994	@code{re-search-backward} were a perfect mirror image, it would find the
	995	match whose end is as close as possible. However, in fact it finds the
	996	match whose beginning is as close as possible (and yet ends before the
	997	starting point). The reason for this is that matching a regular
	998	expression at a given spot always works from beginning to end, and
	999	starts at a specified beginning position.
	1000
	1001	A true mirror-image of @code{re-search-forward} would require a special
	1002	feature for matching regular expressions from end to beginning. It's
	1003	not worth the trouble of implementing that.
	1004	@end deffn
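
The asymmetry is easy to observe with a repetition operator. In this sketch (the buffer text is invented), the backward search stops at the nearest position where a match can begin, so it matches a single @samp{a}, whereas the forward search matches all three.

```elisp
(with-temp-buffer
  (insert "xaaa")
  (list
   ;; Backward from the end: the nearest possible beginning is
   ;; position 4, so only the last "a" is matched.
   (progn (goto-char (point-max))
          (re-search-backward "a+")
          (list (match-beginning 0) (match-end 0)))
   ;; Forward from the start: the match begins at position 2 and
   ;; extends as far as it can.
   (progn (goto-char (point-min))
          (re-search-forward "a+")
          (list (match-beginning 0) (match-end 0)))))
;; => ((4 5) (2 5))
```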
	1005
	1006	@defun string-match regexp string &optional start
	1007	This function returns the index of the start of the first match for
	1008	the regular expression @var{regexp} in @var{string}, or @code{nil} if
	1009	there is no match. If @var{start} is non-@code{nil}, the search starts
	1010	at that index in @var{string}.
	1011
	1012	For example,
	1013
	1014	@example
	1015	@group
	1016	(string-match
	1017	 "quick" "The quick brown fox jumped quickly.")
	1018	     @result{} 4
	1019	@end group
	1020	@group
	1021	(string-match
	1022	 "quick" "The quick brown fox jumped quickly." 8)
	1023	     @result{} 27
	1024	@end group
	1025	@end example
	1026
	1027	@noindent
	1028	The index of the first character of the
	1029	string is 0, the index of the second character is 1, and so on.
	1030
	1031	After this function returns, the index of the first character beyond
	1032	the match is available as @code{(match-end 0)}. @xref{Match Data}.
	1033
	1034	@example
	1035	@group
	1036	(string-match
	1037	 "quick" "The quick brown fox jumped quickly." 8)
	1038	@result{} 27
	1039	@end group
	1040
	1041	@group
	1042	(match-end 0)
	1043	@result{} 32
	1044	@end group
	1045	@end example
	1046	@end defun
	1047
	1048	@defun looking-at regexp
	1049	This function determines whether the text in the current buffer directly
	1050	following point matches the regular expression @var{regexp}. ``Directly
	1051	following'' means precisely that: the search is ``anchored'' and it can
	1052	succeed only starting with the first character following point. The
	1053	result is @code{t} if so, @code{nil} otherwise.
	1054
	1055	This function does not move point, but it updates the match data, which
	1056	you can access using @code{match-beginning} and @code{match-end}.
	1057	@xref{Match Data}.
	1058
	1059	In this example, point is located directly before the @samp{T}. If it
	1060	were anywhere else, the result would be @code{nil}.
	1061
	1062	@example
	1063	@group
	1064	---------- Buffer: foo ----------
	1065	I read "@point{}The cat in the hat
	1066	comes back" twice.
	1067	---------- Buffer: foo ----------
	1068
	1069	(looking-at "The cat in the hat$")
	1070	@result{} t
	1071	@end group
	1072	@end example
	1073	@end defun
	1074
	1075	@defun looking-back regexp &optional limit
	1076	This function returns @code{t} if @var{regexp} matches text before
	1077	point, ending at point, and @code{nil} otherwise.
	1078
	1079	Because regular expression matching works only going forward, this is
	1080	implemented by searching backwards from point for a match that ends at
	1081	point. That can be quite slow if it has to search a long distance.
	1082	You can bound the time required by specifying @var{limit}, which says
	1083	not to search before @var{limit}. In this case, the match that is
	1084	found must begin at or after @var{limit}.
	1085
	1086	@example
	1087	@group
	1088	---------- Buffer: foo ----------
	1089	I read "@point{}The cat in the hat
	1090	comes back" twice.
	1091	---------- Buffer: foo ----------
	1092
	1093	(looking-back "read \"" 3)
	1094	@result{} t
	1095	(looking-back "read \"" 4)
	1096	@result{} nil
	1097	@end group
	1098	@end example
	1099	@end defun
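
The buffer examples for @code{looking-at} and @code{looking-back} can be reproduced in a self-contained form with a temporary buffer. This is a sketch under the assumption that the buffer text is exactly as shown in those examples; position 9 is the position just before the @samp{T}.

```elisp
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char 9)
  (list (looking-at "The cat in the hat$")   ; anchored at point
        (looking-at "cat")                   ; "cat" does not start at point
        (looking-back "read \"" 3)           ; the match begins at 3
        (looking-back "read \"" 4)           ; it would begin before LIMIT
        (point)))                            ; point has not moved
;; => (t nil t nil 9)
```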
	1100
	1101	@defvar search-spaces-regexp
	1102	If this variable is non-@code{nil}, it should be a regular expression
	1103	that says how to search for whitespace. In that case, any group of
	1104	spaces in a regular expression being searched for stands for use of
	1105	this regular expression. However, spaces inside of constructs such as
	1106	@samp{[@dots{}]} and @samp{*}, @samp{+}, @samp{?} are not affected by
	1107	@code{search-spaces-regexp}.
	1108
	1109	Since this variable affects all regular expression search and match
	1110	constructs, you should bind it temporarily around as small a part of
	1111	the code as possible.
	1112	@end defvar
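
A minimal sketch of such a temporary binding follows. The whitespace regexp and the sample strings are chosen only for illustration. Inside the @code{let}, the single space in the pattern stands for any run of spaces, tabs and newlines; outside it, the space is matched literally.

```elisp
(let ((search-spaces-regexp "[ \t\n]+"))
  (string-match "foo bar" "foo \n\t bar"))    ; => 0

(string-match "foo bar" "foo \n\t bar")       ; => nil
```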
	1113
	1114	@node POSIX Regexps
	1115	@section POSIX Regular Expression Searching
	1116
	1117	The usual regular expression functions do backtracking when necessary
	1118	to handle the @samp{\|} and repetition constructs, but they continue
	1119	this only until they find @emph{some} match. Then they succeed and
	1120	report the first match found.
	1121
	1122	This section describes alternative search functions which perform the
	1123	full backtracking specified by the POSIX standard for regular expression
	1124	matching. They continue backtracking until they have tried all
	1125	possibilities and found all matches, so they can report the longest
	1126	match, as required by POSIX. This is much slower, so use these
	1127	functions only when you really need the longest match.
	1128
	1129	The POSIX search and match functions do not properly support the
	1130	non-greedy repetition operators. This is because POSIX backtracking
	1131	conflicts with the semantics of non-greedy repetition.
	1132
	1133	@defun posix-search-forward regexp &optional limit noerror repeat
	1134	This is like @code{re-search-forward} except that it performs the full
	1135	backtracking specified by the POSIX standard for regular expression
	1136	matching.
	1137	@end defun
	1138
	1139	@defun posix-search-backward regexp &optional limit noerror repeat
	1140	This is like @code{re-search-backward} except that it performs the full
	1141	backtracking specified by the POSIX standard for regular expression
	1142	matching.
	1143	@end defun
	1144
	1145	@defun posix-looking-at regexp
	1146	This is like @code{looking-at} except that it performs the full
	1147	backtracking specified by the POSIX standard for regular expression
	1148	matching.
	1149	@end defun
	1150
	1151	@defun posix-string-match regexp string &optional start
	1152	This is like @code{string-match} except that it performs the full
	1153	backtracking specified by the POSIX standard for regular expression
	1154	matching.
	1155	@end defun
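
The difference shows up when an earlier alternative matches a shorter text than a later one. In the following sketch (the pattern and string are invented), the usual function stops at the first alternative that succeeds, while the POSIX function goes on to find the longest match.

```elisp
(let ((s "ab"))
  (list
   ;; The first alternative succeeds, so the match is "a".
   (progn (string-match "a\\|ab" s) (match-end 0))
   ;; Full POSIX backtracking finds the longer match "ab".
   (progn (posix-string-match "a\\|ab" s) (match-end 0))))
;; => (1 2)
```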
	1156
	1157	@node Match Data
	1158	@section The Match Data
	1159	@cindex match data
	1160
	1161	Emacs keeps track of the start and end positions of the segments of
	1162	text found during a search; this is called the @dfn{match data}.
	1163	Thanks to the match data, you can search for a complex pattern, such
	1164	as a date in a mail message, and then extract parts of the match under
	1165	control of the pattern.
	1166
	1167	Because the match data normally describe the most recent search only,
	1168	you must be careful not to do another search inadvertently between the
	1169	search you wish to refer back to and the use of the match data. If you
	1170	can't avoid another intervening search, you must save and restore the
	1171	match data around it, to prevent it from being overwritten.
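
As a sketch of the idea, the following form searches a line resembling a mail header for a date and then extracts its parts with @code{match-string}. The header line and the deliberately loose date pattern are invented for this illustration.

```elisp
(let ((header "Date: 14 Mar 2006 09:26:53"))
  (when (string-match
         "\\([0-9]+\\) \\([A-Za-z]+\\) \\([0-9]+\\)" header)
    ;; Groups 1, 2 and 3 hold the day, month and year.
    (list (match-string 1 header)
          (match-string 2 header)
          (match-string 3 header))))
;; => ("14" "Mar" "2006")
```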
	1172
	1173	@menu
	1174	* Replacing Match:: Replacing a substring that was matched.
	1175	* Simple Match Data:: Accessing single items of match data,
	1176	such as where a particular subexpression started.
	1177	* Entire Match Data:: Accessing the entire match data at once, as a list.
	1178	* Saving Match Data:: Saving and restoring the match data.
	1179	@end menu
	1180
	1181	@node Replacing Match
	1182	@subsection Replacing the Text that Matched
	1183
	1184	This function replaces all or part of the text matched by the last
	1185	search. It works by means of the match data.
	1186
	1187	@cindex case in replacements
	1188	@defun replace-match replacement &optional fixedcase literal string subexp
	1189	This function replaces the text in the buffer (or in @var{string}) that
	1190	was matched by the last search. It replaces that text with
	1191	@var{replacement}.
	1192
	1193	If you did the last search in a buffer, you should specify @code{nil}
	1194	for @var{string} and make sure that the current buffer when you call
	1195	@code{replace-match} is the one in which you did the searching or
	1196	matching. Then @code{replace-match} does the replacement by editing
	1197	the buffer; it leaves point at the end of the replacement text, and
	1198	returns @code{t}.
	1199
	1200	If you did the search in a string, pass the same string as @var{string}.
	1201	Then @code{replace-match} does the replacement by constructing and
	1202	returning a new string.
	1203
	1204	If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
	1205	the replacement text without case conversion; otherwise, it converts
	1206	the replacement text depending upon the capitalization of the text to
	1207	be replaced. If the original text is all upper case, this converts
	1208	the replacement text to upper case. If all words of the original text
	1209	are capitalized, this capitalizes all the words of the replacement
	1210	text. If all the words are one-letter and they are all upper case,
	1211	they are treated as capitalized words rather than all-upper-case
	1212	words.
	1213
	1214	If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
	1215	exactly as it is, the only alterations being case changes as needed.
	1216	If it is @code{nil} (the default), then the character @samp{\} is treated
	1217	specially. If a @samp{\} appears in @var{replacement}, then it must be
	1218	part of one of the following sequences:
	1219
	1220	@table @asis
	1221	@item @samp{\&}
	1222	@cindex @samp{&} in replacement
	1223	@samp{\&} stands for the entire text being replaced.
	1224
	1225	@item @samp{\@var{n}}
	1226	@cindex @samp{\@var{n}} in replacement
	1227	@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
	1228	matched the @var{n}th subexpression in the original regexp.
	1229	Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
	1230	If the @var{n}th subexpression never matched, an empty string is substituted.
	1231
	1232	@item @samp{\\}
	1233	@cindex @samp{\} in replacement
	1234	@samp{\\} stands for a single @samp{\} in the replacement text.
	1235	@end table
	1236
	1237	These substitutions occur after case conversion, if any,
	1238	so the strings they substitute are never case-converted.
	1239
	1240	If @var{subexp} is non-@code{nil}, that says to replace just
	1241	subexpression number @var{subexp} of the regexp that was matched, not
	1242	the entire match. For example, after matching @samp{foo \(ba*r\)},
	1243	calling @code{replace-match} with 1 as @var{subexp} means to replace
	1244	just the text that matched @samp{\(ba*r\)}.
	1245	@end defun
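
The arguments of @code{replace-match} can be explored on strings, where the function returns a new string instead of editing a buffer. The following sketch uses invented sample strings and binds @code{case-fold-search} explicitly so the searches behave the same regardless of the user's settings.

```elisp
(let ((case-fold-search t)
      (s "foo bar"))
  (string-match "\\(foo\\) \\(bar\\)" s)
  (list
   ;; \2 and \1 stand for the text matched by groups 2 and 1.
   (replace-match "\\2 \\1" t nil s)
   ;; With LITERAL non-nil, the backslashes are inserted as they are.
   (replace-match "\\2 \\1" t t s)
   ;; With SUBEXP 2, only the text of group 2 is replaced.
   (replace-match "baz" t t s 2)))
;; => ("bar foo" "\\2 \\1" "foo baz")

(let ((case-fold-search t)
      (s "The CAT sat"))
  (string-match "cat" s)
  (list
   ;; FIXEDCASE nil: the matched text is all upper case, so the
   ;; replacement is converted to upper case.
   (replace-match "dog" nil nil s)
   ;; FIXEDCASE non-nil: the replacement is used as given.
   (replace-match "dog" t nil s)))
;; => ("The DOG sat" "The dog sat")
```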
	1246
	1247	@node Simple Match Data
	1248	@subsection Simple Match Data Access
	1249
	1250	This section explains how to use the match data to find out what was
	1251	matched by the last search or match operation, if it succeeded.
	1252
	1253	You can ask about the entire matching text, or about a particular
	1254	parenthetical subexpression of a regular expression. The @var{count}
	1255	argument in the functions below specifies which. If @var{count} is
	1256	zero, you are asking about the entire match. If @var{count} is
	1257	positive, it specifies which subexpression you want.
	1258
	1259	Recall that the subexpressions of a regular expression are those
	1260	expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
	1261	@var{count}th subexpression is found by counting occurrences of
	1262	@samp{\(} from the beginning of the whole regular expression. The first
	1263	subexpression is numbered 1, the second 2, and so on. Only regular
	1264	expressions can have subexpressions---after a simple string search, the
	1265	only information available is about the entire match.
	1266
	1267	Every successful search sets the match data. Therefore, you should
	1268	query the match data immediately after searching, before calling any
	1269	other function that might perform another search. Alternatively, you
	1270	may save and restore the match data (@pxref{Saving Match Data}) around
	1271	the call to functions that could perform another search.
	1272
	1273	A search which fails may or may not alter the match data. In the
	1274	past, a failing search did not do this, but we may change it in the
	1275	future. So don't try to rely on the value of the match data after
	1276	a failing search.
	1277
	1278	@defun match-string count &optional in-string
	1279	This function returns, as a string, the text matched in the last search
	1280	or match operation. It returns the entire text if @var{count} is zero,
	1281	or just the portion corresponding to the @var{count}th parenthetical
	1282	subexpression, if @var{count} is positive.
	1283
	1284	If the last such operation was done against a string with
	1285	@code{string-match}, then you should pass the same string as the
	1286	argument @var{in-string}. After a buffer search or match,
	1287	you should omit @var{in-string} or pass @code{nil} for it; but you
	1288	should make sure that the current buffer when you call
	1289	@code{match-string} is the one in which you did the searching or
	1290	matching.
	1291
	1292	The value is @code{nil} if @var{count} is out of range, or for a
	1293	subexpression inside a @samp{\|} alternative that wasn't used or a
	1294	repetition that repeated zero times.
	1295	@end defun
	1296
	1297	@defun match-string-no-properties count &optional in-string
	1298	This function is like @code{match-string} except that the result
	1299	has no text properties.
	1300	@end defun
	1301
	1302	@defun match-beginning count
	1303	This function returns the position of the start of text matched by the
	1304	last regular expression searched for, or a subexpression of it.
	1305
	1306	If @var{count} is zero, then the value is the position of the start of
	1307	the entire match. Otherwise, @var{count} specifies a subexpression in
	1308	the regular expression, and the value of the function is the starting
	1309	position of the match for that subexpression.
	1310
	1311	The value is @code{nil} for a subexpression inside a @samp{\|}
	1312	alternative that wasn't used or a repetition that repeated zero times.
	1313	@end defun
	1314
	1315	@defun match-end count
	1316	This function is like @code{match-beginning} except that it returns the
	1317	position of the end of the match, rather than the position of the
	1318	beginning.
	1319	@end defun
	1320
	1321	Here is an example of using the match data, with a comment showing the
	1322	positions within the text:
	1323
	1324	@example
	1325	@group
	1326	(string-match "\\(qu\\)\\(ick\\)"
	1327	              "The quick fox jumped quickly.")
	1328	              ;0123456789
	1329	@result{} 4
	1330	@end group
	1331
	1332	@group
	1333	(match-string 0 "The quick fox jumped quickly.")
	1334	@result{} "quick"
	1335	(match-string 1 "The quick fox jumped quickly.")
	1336	@result{} "qu"
	1337	(match-string 2 "The quick fox jumped quickly.")
	1338	@result{} "ick"
	1339	@end group
	1340
	1341	@group
	1342	(match-beginning 1) ; @r{The beginning of the match}
	1343	@result{} 4 ; @r{with @samp{qu} is at index 4.}
	1344	@end group
	1345
	1346	@group
	1347	(match-beginning 2) ; @r{The beginning of the match}
	1348	@result{} 6 ; @r{with @samp{ick} is at index 6.}
	1349	@end group
	1350
	1351	@group
	1352	(match-end 1) ; @r{The end of the match}
	1353	@result{} 6 ; @r{with @samp{qu} is at index 6.}
	1354
	1355	(match-end 2) ; @r{The end of the match}
	1356	@result{} 9 ; @r{with @samp{ick} is at index 9.}
	1357	@end group
	1358	@end example
	1359
	1360	Here is another example. Point is initially located at the beginning
	1361	of the line. Searching moves point to between the space and the word
	1362	@samp{in}. The beginning of the entire match is at the 9th character of
	1363	the buffer (@samp{T}), and the beginning of the match for the first
	1364	subexpression is at the 13th character (@samp{c}).
	1365
	1366	@example
	1367	@group
	1368	(list
	1369	  (re-search-forward "The \\(cat \\)")
	1370	  (match-beginning 0)
	1371	  (match-beginning 1))
	1372	    @result{} (17 9 13)
	1373	@end group
	1374
	1375	@group
	1376	---------- Buffer: foo ----------
	1377	I read "The cat @point{}in the hat comes back" twice.
	1378	        ^   ^
	1379	        9  13
	1380	---------- Buffer: foo ----------
	1381	@end group
	1382	@end example
	1383
	1384	@noindent
	1385	(In this case, the index returned is a buffer position; the first
	1386	character of the buffer counts as 1.)
	1387
	1388	@node Entire Match Data
	1389	@subsection Accessing the Entire Match Data
	1390
	1391	The functions @code{match-data} and @code{set-match-data} read or
	1392	write the entire match data, all at once.
	1393
	1394	@defun match-data &optional integers reuse reseat
	1395	This function returns a list of positions (markers or integers) that
	1396	record all the information on what text the last search matched.
	1397	Element zero is the position of the beginning of the match for the
	1398	whole expression; element one is the position of the end of the match
	1399	for the expression. The next two elements are the positions of the
	1400	beginning and end of the match for the first subexpression, and so on.
	1401	In general, element
	1402	@ifnottex
	1403	number 2@var{n}
	1404	@end ifnottex
	1405	@tex
	1406	number {\mathsurround=0pt $2n$}
	1407	@end tex
	1408	corresponds to @code{(match-beginning @var{n})}; and
	1409	element
	1410	@ifnottex
	1411	number 2@var{n} + 1
	1412	@end ifnottex
	1413	@tex
	1414	number {\mathsurround=0pt $2n+1$}
	1415	@end tex
	1416	corresponds to @code{(match-end @var{n})}.
	1417
	1418	Normally all the elements are markers or @code{nil}, but if
	1419	@var{integers} is non-@code{nil}, that means to use integers instead
	1420	of markers. (In that case, the buffer itself is appended as an
	1421	additional element at the end of the list, to facilitate complete
	1422	restoration of the match data.) If the last match was done on a
	1423	string with @code{string-match}, then integers are always used,
	1424	since markers can't point into a string.
	1425
	1426	If @var{reuse} is non-@code{nil}, it should be a list. In that case,
	1427	@code{match-data} stores the match data in @var{reuse}. That is,
	1428	@var{reuse} is destructively modified. @var{reuse} does not need to
	1429	have the right length. If it is not long enough to contain the match
	1430	data, it is extended. If it is too long, the length of @var{reuse}
	1431	stays the same, but the elements that were not used are set to
	1432	@code{nil}. The purpose of this feature is to reduce the need for
	1433	garbage collection.
	1434
	1435	If @var{reseat} is non-@code{nil}, all markers on the @var{reuse} list
	1436	are reseated to point to nowhere.
	1437
	1438	As always, there must be no possibility of intervening searches between
	1439	the call to a search function and the call to @code{match-data} that is
	1440	intended to access the match data for that search.
	1441
	1442	@example
	1443	@group
	1444	(match-data)
	1445	     @result{} (#<marker at 9 in foo>
	1446	         #<marker at 17 in foo>
	1447	         #<marker at 13 in foo>
	1448	         #<marker at 17 in foo>)
	1449	@end group
	1450	@end example
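
For instance, after a @code{string-match} the list holds plain integers
laid out in the order described above. The following is a small sketch;
the sample strings and regexps are made up for illustration:

@example
@group
;; The whole match "quick" spans 4..9; the groups "qu" and "ick"
;; span 4..6 and 6..9.
(progn
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (match-data))
     @result{} (4 9 4 6 6 9)
@end group

@group
;; In a buffer, a non-nil INTEGERS argument yields integers
;; followed by the buffer itself; butlast drops the buffer here.
(with-temp-buffer
  (insert "The cat")
  (goto-char (point-min))
  (re-search-forward "c\\(a\\)t")
  (butlast (match-data t)))
     @result{} (5 8 6 7)
@end group
@end example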
	1451	@end defun
	1452
	1453	@defun set-match-data match-list &optional reseat
	1454	This function sets the match data from the elements of @var{match-list},
	1455	which should be a list that was the value of a previous call to
	1456	@code{match-data}. (More precisely, anything that has the same format
	1457	will work.)
	1458
	1459	If @var{match-list} refers to a buffer that doesn't exist, you don't get
	1460	an error; that sets the match data in a meaningless but harmless way.
	1461
	1462	If @var{reseat} is non-@code{nil}, all markers on the @var{match-list} list
	1463	are reseated to point to nowhere.
	1464
	1465	@findex store-match-data
	1466	@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
	1467	@end defun
	1468
	1469	@node Saving Match Data
	1470	@subsection Saving and Restoring the Match Data
	1471
	1472	When you call a function that may do a search, you may need to save
	1473	and restore the match data around that call, if you want to preserve the
	1474	match data from an earlier search for later use. Here is an example
	1475	that shows the problem that arises if you fail to save the match data:
	1476
	1477	@example
	1478	@group
	1479	(re-search-forward "The \\(cat \\)")
	1480	@result{} 48
	1481	(foo) ; @r{Perhaps @code{foo} does}
	1482	; @r{more searching.}
	1483	(match-end 0)
	1484	@result{} 61 ; @r{Unexpected result---not 48!}
	1485	@end group
	1486	@end example
	1487
	1488	You can save and restore the match data with @code{save-match-data}:
	1489
	1490	@defmac save-match-data body@dots{}
	1491	This macro executes @var{body}, saving and restoring the match
	1492	data around it. The return value is the value of the last form in
	1493	@var{body}.
	1494	@end defmac
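
As a minimal sketch of the macro in use (the strings here are invented
for illustration), the inner search below does not disturb the match
data recorded by the outer one:

@example
@group
(progn
  ;; The outer search matches "bbb", which spans 2..5.
  (string-match "b+" "aabbbcc")
  ;; The inner search changes the match data only inside the macro.
  (save-match-data
    (string-match "c" "abc"))
  (list (match-beginning 0) (match-end 0)))
     @result{} (2 5)
@end group
@end example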
	1495
	1496	You could use @code{set-match-data} together with @code{match-data} to
	1497	imitate the effect of the macro @code{save-match-data}. Here is
	1498	how:
	1499
	1500	@example
	1501	@group
	1502	(let ((data (match-data)))
	1503	  (unwind-protect
	1504	      @dots{}   ; @r{Ok to change the original match data.}
	1505	    (set-match-data data)))
	1506	@end group
	1507	@end example
	1508
	1509	Emacs automatically saves and restores the match data when it runs
	1510	process filter functions (@pxref{Filter Functions}) and process
	1511	sentinels (@pxref{Sentinels}).
	1512
	1513	@ignore
	1514	Here is a function which restores the match data provided the buffer
	1515	associated with it still exists.
	1516
	1517	@smallexample
	1518	@group
	1519	(defun restore-match-data (data)
	1520	@c It is incorrect to split the first line of a doc string.
	1521	@c If there's a problem here, it should be solved in some other way.
	1522	"Restore the match data DATA unless the buffer is missing."
	1523	(catch 'foo
	1524	(let ((d data))
	1525	@end group
	1526	(while d
	1527	(and (car d)
	1528	(null (marker-buffer (car d)))
	1529	@group
	1530	;; @file{match-data} @r{buffer is deleted.}
	1531	(throw 'foo nil))
	1532	(setq d (cdr d)))
	1533	(set-match-data data))))
	1534	@end group
	1535	@end smallexample
	1536	@end ignore
	1537
	1538	@node Search and Replace
	1539	@section Search and Replace
	1540	@cindex replacement
	1541
	1542	If you want to find all matches for a regexp in part of the buffer,
	1543	and replace them, the best way is to write an explicit loop using
	1544	@code{re-search-forward} and @code{replace-match}, like this:
	1545
	1546	@example
	1547	(while (re-search-forward "foo[ \t]+bar" nil t)
	1548	  (replace-match "foobar"))
	1549	@end example
	1550
	1551	@noindent
	1552	@xref{Replacing Match,, Replacing the Text that Matched}, for a
	1553	description of @code{replace-match}.
	1554
	1555	However, replacing matches in a string is more complex, especially
	1556	if you want to do it efficiently. So Emacs provides a function to do
	1557	this.
	1558
	1559	@defun replace-regexp-in-string regexp rep string &optional fixedcase literal subexp start
	1560	This function copies @var{string} and searches it for matches for
	1561	@var{regexp}, and replaces them with @var{rep}. It returns the
	1562	modified copy. If @var{start} is non-@code{nil}, the search for
	1563	matches starts at that index in @var{string}, and the returned copy
	1564	omits the first @var{start} characters of @var{string}.
	1565
	1566	This function uses @code{replace-match} to do the replacement, and it
	1567	passes the optional arguments @var{fixedcase}, @var{literal} and
	1568	@var{subexp} along to @code{replace-match}.
	1569
	1570	Instead of a string, @var{rep} can be a function. In that case,
	1571	@code{replace-regexp-in-string} calls @var{rep} for each match,
	1572	passing the text of the match as its sole argument. It collects the
	1573	value @var{rep} returns and passes that to @code{replace-match} as the
	1574	replacement string. The match-data at this point are the result
	1575	of matching @var{regexp} against a substring of @var{string}.
	1576	@end defun
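
The following sketch shows both forms of @var{rep}; the sample strings
and the doubling function are illustrative only:

@example
@group
;; String replacement: collapse runs of spaces and tabs.
(replace-regexp-in-string "[ \t]+" " " "foo   bar\tbaz")
     @result{} "foo bar baz"
@end group

@group
;; Function replacement: REP receives the matched text and
;; returns the string to substitute for it.
(replace-regexp-in-string
 "[0-9]+"
 (lambda (match)
   (number-to-string (* 2 (string-to-number match))))
 "3 apples and 10 pears")
     @result{} "6 apples and 20 pears"
@end group
@end example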
	1577
	1578	If you want to write a command along the lines of @code{query-replace},
	1579	you can use @code{perform-replace} to do the work.
	1580
	1581	@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map start end
	1582	This function is the guts of @code{query-replace} and related
	1583	commands. It searches for occurrences of @var{from-string} in the
	1584	text between positions @var{start} and @var{end} and replaces some or
	1585	all of them. If @var{start} is @code{nil} (or omitted), point is used
	1586	instead, and the end of the buffer's accessible portion is used for
	1587	@var{end}.
	1588
	1589	If @var{query-flag} is @code{nil}, it replaces all
	1590	occurrences; otherwise, it asks the user what to do about each one.
	1591
	1592	If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
	1593	considered a regular expression; otherwise, it must match literally. If
	1594	@var{delimited-flag} is non-@code{nil}, then only matches
	1595	surrounded by word boundaries are considered.
	1596
	1597	The argument @var{replacements} specifies what to replace occurrences
	1598	with. If it is a string, that string is used. It can also be a list of
	1599	strings, to be used in cyclic order.
	1600
	1601	If @var{replacements} is a cons cell, @code{(@var{function}
	1602	. @var{data})}, this means to call @var{function} after each match to
	1603	get the replacement text. This function is called with two arguments:
	1604	@var{data}, and the number of replacements already made.
	1605
	1606	If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
	1607	it specifies how many times to use each of the strings in the
	1608	@var{replacements} list before advancing cyclically to the next one.
	1609
	1610	If @var{from-string} contains upper-case letters, then
	1611	@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
	1612	it uses the @var{replacements} without altering their case.
	1613
	1614	Normally, the keymap @code{query-replace-map} defines the possible
	1615	user responses for queries. The argument @var{map}, if
	1616	non-@code{nil}, specifies a keymap to use instead of
	1617	@code{query-replace-map}.
	1618	@end defun
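
As a rough sketch, a call with a @code{nil} @var{query-flag} replaces
every literal occurrence from point onward without asking. The temporary
buffer and its contents are assumptions made for the example, and the
result shown assumes default case-handling settings:

@example
@group
(with-temp-buffer
  (insert "foo foo foo")
  (goto-char (point-min))
  ;; No querying, literal match, no word-boundary restriction.
  (perform-replace "foo" "bar" nil nil nil)
  (buffer-string))
     @result{} "bar bar bar"
@end group
@end example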
	1619
	1620	@defvar query-replace-map
	1621	This variable holds a special keymap that defines the valid user
	1622	responses for @code{perform-replace} and the commands that use it, as
	1623	well as @code{y-or-n-p} and @code{map-y-or-n-p}. This map is unusual
	1624	in two ways:
	1625
	1626	@itemize @bullet
	1627	@item
	1628	The ``key bindings'' are not commands, just symbols that are meaningful
	1629	to the functions that use this map.
	1630
	1631	@item
	1632	Prefix keys are not supported; each key binding must be for a
	1633	single-event key sequence. This is because the functions don't use
	1634	@code{read-key-sequence} to get the input; instead, they read a single
	1635	event and look it up ``by hand.''
	1636	@end itemize
	1637	@end defvar
	1638
	1639	Here are the meaningful ``bindings'' for @code{query-replace-map}.
	1640	Several of them are meaningful only for @code{query-replace} and
	1641	friends.
	1642
	1643	@table @code
	1644	@item act
	1645	Do take the action being considered---in other words, ``yes.''
	1646
	1647	@item skip
	1648	Do not take action for this question---in other words, ``no.''
	1649
	1650	@item exit
	1651	Answer this question ``no,'' and give up on the entire series of
	1652	questions, assuming that the answers will be ``no.''
	1653
	1654	@item act-and-exit
	1655	Answer this question ``yes,'' and give up on the entire series of
	1656	questions, assuming that subsequent answers will be ``no.''
	1657
	1658	@item act-and-show
	1659	Answer this question ``yes,'' but show the results---don't advance yet
	1660	to the next question.
	1661
	1662	@item automatic
	1663	Answer this question and all subsequent questions in the series with
	1664	``yes,'' without further user interaction.
	1665
	1666	@item backup
	1667	Move back to the previous place that a question was asked about.
	1668
	1669	@item edit
	1670	Enter a recursive edit to deal with this question---instead of any
	1671	other action that would normally be taken.
	1672
	1673	@item delete-and-edit
	1674	Delete the text being considered, then enter a recursive edit to replace
	1675	it.
	1676
	1677	@item recenter
	1678	Redisplay and center the window, then ask the same question again.
	1679
	1680	@item quit
	1681	Perform a quit right away. Only @code{y-or-n-p} and related functions
	1682	use this answer.
	1683
	1684	@item help
	1685	Display some help, then ask again.
	1686	@end table
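
Because the bindings are plain symbols, a program can inspect them with
@code{lookup-key}. The results below assume the default contents of
@code{query-replace-map}; a customized map may answer differently:

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "n")
     @result{} skip
(lookup-key query-replace-map "!")
     @result{} automatic
@end group
@end example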
	1687
	1688	@node Standard Regexps
	1689	@section Standard Regular Expressions Used in Editing
	1690	@cindex regexps used standardly in editing
	1691	@cindex standard regexps used in editing
	1692
	1693	This section describes some variables that hold regular expressions
	1694	used for certain purposes in editing:
	1695
	1696	@defvar page-delimiter
	1697	This is the regular expression describing line-beginnings that separate
	1698	pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
	1699	@code{"^\C-l"}); this matches a line that starts with a formfeed
	1700	character.
	1701	@end defvar
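
A program can pass this variable directly to a regexp search. The
sketch below assumes the default value and uses an invented two-page
buffer; the search leaves point just after the formfeed:

@example
@group
(with-temp-buffer
  (insert "First page.\n\fSecond page.\n")
  (goto-char (point-min))
  ;; The formfeed is at position 13, at the start of a line.
  (re-search-forward page-delimiter nil t)
  (point))
     @result{} 14
@end group
@end example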
	1702
	1703	The following two regular expressions should @emph{not} assume the
	1704	match always starts at the beginning of a line; they should not use
	1705	@samp{^} to anchor the match. Most often, the paragraph commands do
	1706	check for a match only at the beginning of a line, which means that
	1707	@samp{^} would be superfluous. When there is a nonzero left margin,
	1708	they accept matches that start after the left margin. In that case, a
	1709	@samp{^} would be incorrect. However, a @samp{^} is harmless in modes
	1710	where a left margin is never used.
	1711
	1712	@defvar paragraph-separate
	1713	This is the regular expression for recognizing the beginning of a line
	1714	that separates paragraphs. (If you change this, you may have to
	1715	change @code{paragraph-start} also.) The default value is
	1716	@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
	1717	spaces, tabs, and form feeds (after its left margin).
	1718	@end defvar
	1719
	1720	@defvar paragraph-start
	1721	This is the regular expression for recognizing the beginning of a line
	1722	that starts @emph{or} separates paragraphs. The default value is
	1723	@w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
	1724	whitespace or starting with a form feed (after its left margin).
	1725	@end defvar
	1726
	1727	@defvar sentence-end
	1728	If non-@code{nil}, the value should be a regular expression describing
	1729	the end of a sentence, including the whitespace following the
	1730	sentence. (All paragraph boundaries also end sentences, regardless.)
	1731
	1732	If the value is @code{nil}, the default, then the function
	1733	@code{sentence-end} has to construct the regexp. That is why you
	1734	should always call the function @code{sentence-end} to obtain the
	1735	regexp to be used to recognize the end of a sentence.
	1736	@end defvar
	1737
	1738	@defun sentence-end
	1739	This function returns the value of the variable @code{sentence-end},
	1740	if non-@code{nil}. Otherwise it returns a default value based on the
	1741	values of the variables @code{sentence-end-double-space}
	1742	(@pxref{Definition of sentence-end-double-space}),
	1743	@code{sentence-end-without-period} and
	1744	@code{sentence-end-without-space}.
	1745	@end defun
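
For example, a search for the end of the first sentence might look
like the sketch below. It assumes the default settings of the variables
named above, and the buffer text is invented for illustration; the match
includes the whitespace that follows the period:

@example
@group
(with-temp-buffer
  (insert "One.  Two.")
  (goto-char (point-min))
  ;; Call the function, not the variable, to get the regexp.
  (re-search-forward (sentence-end) nil t)
  (point))
     @result{} 7
@end group
@end example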
	1746
	1747	@ignore
	1748	arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
	1749	@end ignore