HCoop Git - bpt/emacs.git/blame_incremental

Commit	Line	Data
	1	@c -*-texinfo-*-
	2	@c This is part of the GNU Emacs Lisp Reference Manual.
	3	@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2002, 2003,
	4	@c 2004, 2005 Free Software Foundation, Inc.
	5	@c See the file elisp.texi for copying conditions.
	6	@setfilename ../info/searching
	7	@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
	8	@chapter Searching and Matching
	9	@cindex searching
	10
	11	GNU Emacs provides two ways to search through a buffer for specified
	12	text: exact string searches and regular expression searches. After a
	13	regular expression search, you can examine the @dfn{match data} to
	14	determine which text matched the whole regular expression or various
	15	portions of it.
	16
	17	@menu
	18	* String Search:: Search for an exact match.
	19	* Searching and Case:: Case-independent or case-significant searching.
	20	* Regular Expressions:: Describing classes of strings.
	21	* Regexp Search:: Searching for a match for a regexp.
	22	* POSIX Regexps:: Searching POSIX-style for the longest match.
	23	* Match Data:: Finding out which part of the text matched,
	24	after a string or regexp search.
	25	* Search and Replace:: Commands that loop, searching and replacing.
	26	* Standard Regexps:: Useful regexps for finding sentences, pages,...
	27	@end menu
	28
	29	The @samp{skip-chars@dots{}} functions also perform a kind of searching.
	30	@xref{Skipping Characters}.
	31
	32	@node String Search
	33	@section Searching for Strings
	34	@cindex string search
	35
	36	These are the primitive functions for searching through the text in a
	37	buffer. They are meant for use in programs, but you may call them
	38	interactively. If you do so, they prompt for the search string; the
	39	arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
	40	is 1.
	41
	42	These search functions convert the search string to multibyte if the
	43	buffer is multibyte; they convert the search string to unibyte if the
	44	buffer is unibyte. @xref{Text Representations}.
	45
	46	@deffn Command search-forward string &optional limit noerror repeat
	47	This function searches forward from point for an exact match for
	48	@var{string}. If successful, it sets point to the end of the occurrence
	49	found, and returns the new value of point. If no match is found, the
	50	value and side effects depend on @var{noerror} (see below).
	51	@c Emacs 19 feature
	52
	53	In the following example, point is initially at the beginning of the
	54	line. Then @code{(search-forward "fox")} moves point after the last
	55	letter of @samp{fox}:
	56
	57	@example
	58	@group
	59	---------- Buffer: foo ----------
	60	@point{}The quick brown fox jumped over the lazy dog.
	61	---------- Buffer: foo ----------
	62	@end group
	63
	64	@group
	65	(search-forward "fox")
	66	@result{} 20
	67
	68	---------- Buffer: foo ----------
	69	The quick brown fox@point{} jumped over the lazy dog.
	70	---------- Buffer: foo ----------
	71	@end group
	72	@end example
	73
	74	The argument @var{limit} specifies the upper bound to the search. (It
	75	must be a position in the current buffer.) No match extending after
	76	that position is accepted. If @var{limit} is omitted or @code{nil}, it
	77	defaults to the end of the accessible portion of the buffer.
	78
	79	@kindex search-failed
	80	What happens when the search fails depends on the value of
	81	@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
	82	error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
	83	returns @code{nil} and does nothing. If @var{noerror} is neither
	84	@code{nil} nor @code{t}, then @code{search-forward} moves point to the
	85	upper bound and returns @code{nil}. (It would be more consistent now to
	86	return the new position of point in that case, but some existing
	87	programs may depend on a value of @code{nil}.)
	88
	89	If @var{repeat} is supplied (it must be a positive number), then the
	90	search is repeated that many times (each time starting at the end of the
	91	previous time's match). If these successive searches succeed, the
	92	function succeeds, moving point and returning its new value. Otherwise
	93	the search fails, with results depending on the value of
	94	@var{noerror}, as described above.
	95	@end deffn
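
As a minimal sketch of how @var{limit} and @var{noerror} are often
combined in a program, the following counts the exact occurrences of a
string.  The name @code{my-count-matches} is hypothetical, chosen only
for this illustration; it is not an Emacs function.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
;; Count the matches for STRING between the start of the accessible
;; portion of the buffer and LIMIT (nil means the end).
;; STRING must be non-empty: an empty string matches without moving
;; point, so the loop below would never terminate.
(defun my-count-matches (string &optional limit)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      ;; NOERROR is t, so a failed search returns nil and ends the
      ;; loop instead of signaling `search-failed'.
      (while (search-forward string limit t)
        (setq count (1+ count)))
      count)))

(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy fox.")
  (list (my-count-matches "fox")       ; whole buffer
        (my-count-matches "fox" 25)))  ; matches ending by position 25
;; => (2 1)
```

Passing a @var{noerror} value other than @code{nil} or @code{t}, such
as @code{'move}, would also leave point at the upper bound when the
search fails.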
	96
	97	@deffn Command search-backward string &optional limit noerror repeat
	98	This function searches backward from point for @var{string}. It is
	99	just like @code{search-forward} except that it searches backwards and
	100	leaves point at the beginning of the match.
	101	@end deffn
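
A short sketch of the backward case, using a temporary buffer so that
it can be evaluated anywhere.  After @code{insert}, point is at the end
of the buffer, so the search runs backward from there.

```elisp
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; Point is now at the end of the buffer.
  (list (search-backward "fox")   ; returns the new value of point
        (looking-at "fox")))      ; point is at the start of the match
;; => (17 t)
```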
	102
	103	@deffn Command word-search-forward string &optional limit noerror repeat
	104	@cindex word search
	105	This function searches forward from point for a ``word'' match for
	106	@var{string}. If it finds a match, it sets point to the end of the
	107	match found, and returns the new value of point.
	108	@c Emacs 19 feature
	109
	110	Word matching regards @var{string} as a sequence of words, disregarding
	111	punctuation that separates them. It searches the buffer for the same
	112	sequence of words. Each word must be distinct in the buffer (searching
	113	for the word @samp{ball} does not match the word @samp{balls}), but the
	114	details of punctuation and spacing are ignored (searching for @samp{ball
	115	boy} does match @samp{ball. Boy!}).
	116
	117	In this example, point is initially at the beginning of the buffer; the
	118	search leaves it between the @samp{y} and the @samp{!}.
	119
	120	@example
	121	@group
	122	---------- Buffer: foo ----------
	123	@point{}He said "Please! Find
	124	the ball boy!"
	125	---------- Buffer: foo ----------
	126	@end group
	127
	128	@group
	129	(word-search-forward "Please find the ball, boy.")
	130	@result{} 35
	131
	132	---------- Buffer: foo ----------
	133	He said "Please! Find
	134	the ball boy@point{}!"
	135	---------- Buffer: foo ----------
	136	@end group
	137	@end example
	138
	139	If @var{limit} is non-@code{nil}, it must be a position in the current
	140	buffer; it specifies the upper bound to the search. The match found
	141	must not extend after that position.
	142
	143	If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
	144	an error if the search fails. If @var{noerror} is @code{t}, then it
	145	returns @code{nil} instead of signaling an error. If @var{noerror} is
	146	neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
	147	end of the accessible portion of the buffer) and returns @code{nil}.
	148
	149	If @var{repeat} is non-@code{nil}, then the search is repeated that many
	150	times. Point is positioned at the end of the last match.
	151	@end deffn
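
The two rules of word matching described above (whole words only,
punctuation and spacing ignored) can be sketched as follows.  The
helper @code{my-word-search} is hypothetical and exists only to keep
the example self-contained; it assumes the standard syntax table of a
temporary buffer.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
;; Word-search for WORDS in a temporary buffer holding TEXT.
(defun my-word-search (text words)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      ;; NOERROR is t: return nil rather than signal an error.
      (word-search-forward words nil t))))

(list (my-word-search "balls" "ball")            ; not a distinct word
      (my-word-search "ball. Boy!" "ball boy"))  ; punctuation ignored
;; => (nil 10)
```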
	152
	153	@deffn Command word-search-backward string &optional limit noerror repeat
	154	This function searches backward from point for a word match to
	155	@var{string}. This function is just like @code{word-search-forward}
	156	except that it searches backward and normally leaves point at the
	157	beginning of the match.
	158	@end deffn
	159
	160	@node Searching and Case
	161	@section Searching and Case
	162	@cindex searching and case
	163
	164	By default, searches in Emacs ignore the case of the text they are
	165	searching through; if you specify searching for @samp{FOO}, then
	166	@samp{Foo} or @samp{foo} is also considered a match. This applies to
	167	regular expressions, too; thus, @samp{[aB]} would match @samp{a} or
	168	@samp{A} or @samp{b} or @samp{B}.
	169
	170	If you do not want this feature, set the variable
	171	@code{case-fold-search} to @code{nil}. Then all letters must match
	172	exactly, including case. This is a buffer-local variable; altering the
	173	variable affects only the current buffer. (@xref{Intro to
	174	Buffer-Local}.) Alternatively, you may change the value of
	175	@code{default-case-fold-search}, which is the default value of
	176	@code{case-fold-search} for buffers that do not override it.
	177
	178	Note that the user-level incremental search feature handles case
	179	distinctions differently. When given a lower case letter, it looks for
	180	a match of either case, but when given an upper case letter, it looks
	181	for an upper case letter only. But this has nothing to do with the
	182	searching functions used in Lisp code.
	183
	184	@defopt case-replace
	185	This variable determines whether the higher level replacement
	186	functions should preserve case. If the variable is @code{nil}, that
	187	means to use the replacement text verbatim. A non-@code{nil} value
	188	means to convert the case of the replacement text according to the
	189	text being replaced.
	190
	191	This variable is used by passing it as an argument to the function
	192	@code{replace-match}. @xref{Replacing Match}.
	193	@end defopt
	194
	195	@defopt case-fold-search
	196	This buffer-local variable determines whether searches should ignore
	197	case. If the variable is @code{nil} they do not ignore case; otherwise
	198	they do ignore case.
	199	@end defopt
	200
	201	@defvar default-case-fold-search
	202	The value of this variable is the default value for
	203	@code{case-fold-search} in buffers that do not override it. This is the
	204	same as @code{(default-value 'case-fold-search)}.
	205	@end defvar
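
A common way to control case sensitivity for one search, without
changing the buffer's setting permanently, is to bind
@code{case-fold-search} with @code{let}.  The sketch below assumes
nothing beyond the functions described in this chapter; the helper name
@code{my-search-with-fold} is hypothetical.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
;; Search for "FOO" in a buffer containing "Foo", with
;; `case-fold-search' temporarily bound to FOLD.
(defun my-search-with-fold (fold)
  (with-temp-buffer
    (insert "Foo")
    (goto-char (point-min))
    (let ((case-fold-search fold))
      (search-forward "FOO" nil t))))

(list (my-search-with-fold t)     ; case ignored: match ends at 4
      (my-search-with-fold nil))  ; case significant: no match
;; => (4 nil)
```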
	206
	207	@node Regular Expressions
	208	@section Regular Expressions
	209	@cindex regular expression
	210	@cindex regexp
	211
	212	A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
	213	denotes a (possibly infinite) set of strings. Searching for matches for
	214	a regexp is a very powerful operation. This section explains how to write
	215	regexps; the following section says how to search for them.
	216
	217	@findex re-builder
	218	@cindex authoring regular expressions
	219	For interactive development of regular expressions, you
	220	can use the @kbd{M-x re-builder} command. It provides a convenient
	221	interface for creating regular expressions, by giving immediate visual
	222	feedback in a separate buffer. As you edit the regexp, all its
	223	matches in the target buffer are highlighted. Each parenthesized
	224	sub-expression of the regexp is shown in a distinct face, which makes
	225	it easier to verify even very complex regexps.
	226
	227	@menu
	228	* Syntax of Regexps:: Rules for writing regular expressions.
	229	* Regexp Example:: Illustrates regular expression syntax.
	230	* Regexp Functions:: Functions for operating on regular expressions.
	231	@end menu
	232
	233	@node Syntax of Regexps
	234	@subsection Syntax of Regular Expressions
	235
	236	Regular expressions have a syntax in which a few characters are
	237	special constructs and the rest are @dfn{ordinary}. An ordinary
	238	character is a simple regular expression that matches that character and
	239	nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
	240	@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
	241	special characters will be defined in the future. Any other character
	242	appearing in a regular expression is ordinary, unless a @samp{\}
	243	precedes it.
	244
	245	For example, @samp{f} is not a special character, so it is ordinary, and
	246	therefore @samp{f} is a regular expression that matches the string
	247	@samp{f} and no other string. (It does @emph{not} match the string
	248	@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
	249	@samp{o} is a regular expression that matches only @samp{o}.@refill
	250
	251	Any two regular expressions @var{a} and @var{b} can be concatenated. The
	252	result is a regular expression that matches a string if @var{a} matches
	253	some amount of the beginning of that string and @var{b} matches the rest of
	254	the string.@refill
	255
	256	As a simple example, we can concatenate the regular expressions @samp{f}
	257	and @samp{o} to get the regular expression @samp{fo}, which matches only
	258	the string @samp{fo}. Still trivial. To do something more powerful, you
	259	need to use one of the special regular expression constructs.
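
The difference between matching a whole string and matching part of it
can be checked with @code{string-match}, which returns the index where
the match starts.  This is only a sketch; it uses the @samp{\`} and
@samp{\'} constructs, described later in this section, to require that
the entire string match.

```elisp
(let ((case-fold-search nil))
  (list
   ;; "fo" is the concatenation of the regexps "f" and "o".
   (string-match "fo" "fo")            ; 0: match starts at index 0
   ;; A search also finds a match for part of a longer string ...
   (string-match "fo" "info")          ; 2
   ;; ... so anchor both ends to require that the whole string match.
   (string-match "\\`fo\\'" "info")    ; nil
   (string-match "\\`fo\\'" "fo")))    ; 0
;; => (0 2 nil 0)
```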
	260
	261	@menu
	262	* Regexp Special:: Special characters in regular expressions.
	263	* Char Classes:: Character classes used in regular expressions.
	264	* Regexp Backslash:: Backslash-sequences in regular expressions.
	265	@end menu
	266
	267	@node Regexp Special
	268	@subsubsection Special Characters in Regular Expressions
	269
	270	Here is a list of the characters that are special in a regular
	271	expression.
	272
	273	@need 800
	274	@table @asis
	275	@item @samp{.}@: @r{(Period)}
	276	@cindex @samp{.} in regexp
	277	is a special character that matches any single character except a newline.
	278	Using concatenation, we can make regular expressions like @samp{a.b}, which
	279	matches any three-character string that begins with @samp{a} and ends with
	280	@samp{b}.@refill
	281
	282	@item @samp{*}
	283	@cindex @samp{*} in regexp
	284	is not a construct by itself; it is a postfix operator that means to
	285	match the preceding regular expression repetitively as many times as
	286	possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
	287	@samp{o}s).
	288
	289	@samp{*} always applies to the @emph{smallest} possible preceding
	290	expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
	291	@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
	292
	293	The matcher processes a @samp{*} construct by matching, immediately, as
	294	many repetitions as can be found. Then it continues with the rest of
	295	the pattern. If that fails, backtracking occurs, discarding some of the
	296	matches of the @samp{*}-modified construct in the hope that that will
	297	make it possible to match the rest of the pattern. For example, in
	298	matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*}
	299	first tries to match all three @samp{a}s; but the rest of the pattern is
	300	@samp{ar} and there is only @samp{r} left to match, so this try fails.
	301	The next alternative is for @samp{a*} to match only two @samp{a}s. With
	302	this choice, the rest of the regexp matches successfully.@refill
	303
	304	Nested repetition operators take a long time, or even forever, if they
	305	lead to ambiguous matching. For example, trying to match the regular
	306	expression @samp{\(x+y*\)*a} against the string
	307	@samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz} could take hours before it
	308	ultimately fails. Emacs must try each way of grouping the 35
	309	@samp{x}s before concluding that none of them can work. Even worse,
	310	@samp{\(x*\)*} can match the null string in infinitely many ways, so
	311	it causes an infinite loop. To avoid these problems, check nested
	312	repetitions carefully.
	313
	314	@item @samp{+}
	315	@cindex @samp{+} in regexp
	316	is a postfix operator, similar to @samp{*} except that it must match
	317	the preceding expression at least once. So, for example, @samp{ca+r}
	318	matches the strings @samp{car} and @samp{caaaar} but not the string
	319	@samp{cr}, whereas @samp{ca*r} matches all three strings.
	320
	321	@item @samp{?}
	322	@cindex @samp{?} in regexp
	323	is a postfix operator, similar to @samp{*} except that it must match the
	324	preceding expression either once or not at all. For example,
	325	@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
	326
	327	@item @samp{*?}, @samp{+?}, @samp{??}
	328	These are ``non-greedy'' variants of the operators @samp{*}, @samp{+}
	329	and @samp{?}. Where those operators match the largest possible
	330	substring (consistent with matching the entire containing expression),
	331	the non-greedy variants match the smallest possible substring
	332	(consistent with matching the entire containing expression).
	333
	334	For example, the regular expression @samp{c[ad]*a} when applied to the
	335	string @samp{cdaaada} matches the whole string; but the regular
	336	expression @samp{c[ad]*?a}, applied to that same string, matches just
	337	@samp{cda}. (The smallest possible match here for @samp{[ad]*?} that
	338	permits the whole expression to match is @samp{d}.)
	339
	340	@item @samp{[ @dots{} ]}
	341	@cindex character alternative (in regexp)
	342	@cindex @samp{[} in regexp
	343	@cindex @samp{]} in regexp
	344	is a @dfn{character alternative}, which begins with @samp{[} and is
	345	terminated by @samp{]}. In the simplest case, the characters between
	346	the two brackets are what this character alternative can match.
	347
	348	Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
	349	@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
	350	(including the empty string), from which it follows that @samp{c[ad]*r}
	351	matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
	352
	353	You can also include character ranges in a character alternative, by
	354	writing the starting and ending characters with a @samp{-} between them.
	355	Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
	356	Ranges may be intermixed freely with individual characters, as in
	357	@samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
	358	or @samp{$}, @samp{%} or period.
	359
	360	Note that the usual regexp special characters are not special inside a
	361	character alternative. A completely different set of characters is
	362	special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
	363
	364	To include a @samp{]} in a character alternative, you must make it the
	365	first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
	366	To include a @samp{-}, write @samp{-} as the first or last character of
	367	the character alternative, or put it after a range. Thus, @samp{[]-]}
	368	matches both @samp{]} and @samp{-}.
	369
	370	To include @samp{^} in a character alternative, put it anywhere but at
	371	the beginning.
	372
	373	The beginning and end of a range of multibyte characters must be in
	374	the same character set (@pxref{Character Sets}). Thus,
	375	@code{"[\x8e0-\x97c]"} is invalid because character 0x8e0 (@samp{a}
	376	with grave accent) is in the Emacs character set for Latin-1 but the
	377	character 0x97c (@samp{u} with diaeresis) is in the Emacs character
	378	set for Latin-2. (We use Lisp string syntax to write that example,
	379	and a few others in the next few paragraphs, in order to include hex
	380	escape sequences in them.)
	381
	382	If a range starts with a unibyte character @var{c} and ends with a
	383	multibyte character @var{c2}, the range is divided into two parts: one
	384	is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
	385	@var{c1} is the first character of the charset to which @var{c2}
	386	belongs.
	387
	388	You cannot always match all non-@acronym{ASCII} characters with the regular
	389	expression @code{"[\200-\377]"}. This works when searching a unibyte
	390	buffer or string (@pxref{Text Representations}), but not in a multibyte
	391	buffer or string, because many non-@acronym{ASCII} characters have codes
	392	above octal 0377. However, the regular expression @code{"[^\000-\177]"}
	393	does match all non-@acronym{ASCII} characters (see below regarding @samp{^}),
	394	in both multibyte and unibyte representations, because only the
	395	@acronym{ASCII} characters are excluded.
	396
	397	A character alternative can also specify named
	398	character classes (@pxref{Char Classes}). This is a POSIX feature whose
	399	syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
	400	to mentioning each of the characters in that class; but the latter is
	401	not feasible in practice, since some classes include thousands of
	402	different characters.
	403
	404	@item @samp{[^ @dots{} ]}
	405	@cindex @samp{^} in regexp
	406	@samp{[^} begins a @dfn{complemented character alternative}. This
	407	matches any character except the ones specified. Thus,
	408	@samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and
	409	digits.
	410
	411	@samp{^} is not special in a character alternative unless it is the first
	412	character. The character following the @samp{^} is treated as if it
	413	were first (in other words, @samp{-} and @samp{]} are not special there).
	414
	415	A complemented character alternative can match a newline, unless newline is
	416	mentioned as one of the characters not to match. This is in contrast to
	417	the handling of regexps in programs such as @code{grep}.
	418
	419	@item @samp{^}
	420	@cindex beginning of line in regexp
	421	When matching a buffer, @samp{^} matches the empty string, but only at the
	422	beginning of a line in the text being matched (or the beginning of the
	423	accessible portion of the buffer). Otherwise it fails to match
	424	anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at the
	425	beginning of a line.
	426
	427	When matching a string instead of a buffer, @samp{^} matches at the
	428	beginning of the string or after a newline character.
	429
	430	For historical compatibility reasons, @samp{^} can be used only at the
	431	beginning of the regular expression, or after @samp{\(} or @samp{\|}.
	432
	433	@item @samp{$}
	434	@cindex @samp{$} in regexp
	435	@cindex end of line in regexp
	436	is similar to @samp{^} but matches only at the end of a line (or the
	437	end of the accessible portion of the buffer). Thus, @samp{x+$}
	438	matches a string of one @samp{x} or more at the end of a line.
	439
	440	When matching a string instead of a buffer, @samp{$} matches at the end
	441	of the string or before a newline character.
	442
	443	For historical compatibility reasons, @samp{$} can be used only at the
	444	end of the regular expression, or before @samp{\)} or @samp{\|}.
	445
	446	@item @samp{\}
	447	@cindex @samp{\} in regexp
	448	has two functions: it quotes the special characters (including
	449	@samp{\}), and it introduces additional special constructs.
	450
	451	Because @samp{\} quotes special characters, @samp{\$} is a regular
	452	expression that matches only @samp{$}, and @samp{\[} is a regular
	453	expression that matches only @samp{[}, and so on.
	454
	455	Note that @samp{\} also has special meaning in the read syntax of Lisp
	456	strings (@pxref{String Type}), and must be quoted with @samp{\}. For
	457	example, the regular expression that matches the @samp{\} character is
	458	@samp{\\}. To write a Lisp string that contains the characters
	459	@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
	460	@samp{\}. Therefore, the read syntax for a regular expression matching
	461	@samp{\} is @code{"\\\\"}.@refill
	462	@end table
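
Several of the special characters in the table above can be tried out
with a small helper that returns the matched text.  This is a sketch
under stated assumptions: @code{my-match} is a hypothetical name, not
an Emacs function, and the regexps are written in Lisp string syntax,
so each regexp @samp{\} appears doubled.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
;; Return the part of STRING matched by REGEXP, or nil if none.
(defun my-match (regexp string)
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

(list
 (my-match "a.b" "xaxbx")          ; "axb"
 (my-match "ca*ar" "caaar")        ; "caaar": a* backs off to two a's
 (my-match "ca+r" "cr")            ; nil: + needs at least one a
 (my-match "ca?r" "cr")            ; "cr"
 (my-match "c[ad]*a" "cdaaada")    ; "cdaaada": greedy
 (my-match "c[ad]*?a" "cdaaada")   ; "cda": non-greedy
 (my-match "[]a]+" "x]a]")         ; "]a]": ] first is ordinary
 (my-match "[^a-z0-9A-Z]" "ab_c")  ; "_"
 (my-match "^foo" "bar\nfoo")      ; "foo": ^ matches after a newline
 (my-match "x+$" "axx\nb")         ; "xx": $ matches before a newline
 (my-match "\\$" "cost: $5"))      ; "$": \ quotes the special character
```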
	463
	464	@strong{Please note:} For historical compatibility, special characters
	465	are treated as ordinary ones if they are in contexts where their special
	466	meanings make no sense. For example, @samp{*foo} treats @samp{*} as
	467	ordinary since there is no preceding expression on which the @samp{*}
	468	can act. It is poor practice to depend on this behavior; quote the
	469	special character anyway, regardless of where it appears.@refill
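
A brief sketch of this advice: a leading @samp{*} happens to be treated
as ordinary, but quoting it says what is meant.  The function
@code{regexp-quote} (@pxref{Regexp Functions}) produces the quoted form
for you.

```elisp
(list
 ;; A leading * has nothing to repeat, so it is treated as ordinary.
 (string-match "*foo" "a*foo")      ; 1
 ;; Clearer: quote the special character explicitly.
 (string-match "\\*foo" "a*foo")    ; 1
 ;; `regexp-quote' adds the quoting for you.
 (regexp-quote "*foo"))             ; "\\*foo"
```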
	470
	471	@node Char Classes
	472	@subsubsection Character Classes
	473	@cindex character classes in regexp
	474
	475	Here is a table of the classes you can use in a character alternative,
	476	and what they mean:
	477
	478	@table @samp
	479	@item [:ascii:]
	480	This matches any @acronym{ASCII} (unibyte) character.
	481	@item [:alnum:]
	482	This matches any letter or digit. (At present, for multibyte
	483	characters, it matches anything that has word syntax.)
	484	@item [:alpha:]
	485	This matches any letter. (At present, for multibyte characters, it
	486	matches anything that has word syntax.)
	487	@item [:blank:]
	488	This matches space and tab only.
	489	@item [:cntrl:]
	490	This matches any @acronym{ASCII} control character.
	491	@item [:digit:]
	492	This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
	493	matches any digit, as well as @samp{+} and @samp{-}.
	494	@item [:graph:]
	495	This matches graphic characters---everything except @acronym{ASCII} control
	496	characters, space, and the delete character.
	497	@item [:lower:]
	498	This matches any lower-case letter, as determined by
	499	the current case table (@pxref{Case Tables}).
	500	@item [:nonascii:]
	501	This matches any non-@acronym{ASCII} (multibyte) character.
	502	@item [:print:]
	503	This matches printing characters---everything except @acronym{ASCII} control
	504	characters and the delete character.
	505	@item [:punct:]
	506	This matches any punctuation character. (At present, for multibyte
	507	characters, it matches anything that has non-word syntax.)
	508	@item [:space:]
	509	This matches any character that has whitespace syntax
	510	(@pxref{Syntax Class Table}).
	511	@item [:upper:]
	512	This matches any upper-case letter, as determined by
	513	the current case table (@pxref{Case Tables}).
	514	@item [:word:]
	515	This matches any character that has word syntax (@pxref{Syntax Class
	516	Table}).
	517	@item [:xdigit:]
	518	This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
	519	through @samp{f} and @samp{A} through @samp{F}.
	520	@end table
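
Because a class is only meaningful inside a character alternative, the
brackets appear doubled in practice: @samp{[[:digit:]]}, not
@samp{[:digit:]}.  The following sketch shows a few classes with
@code{string-match}; @code{case-fold-search} is bound to @code{nil} so
that @samp{[:upper:]} is tested without case folding.

```elisp
(let ((case-fold-search nil))
  (list
   (string-match "[[:digit:]]+" "abc123")            ; 3
   (string-match "[-+[:digit:]]+" "x = -42")         ; 4: sign, then digits
   (string-match "[^[:blank:]]" "  \tfoo")           ; 3: first non-blank
   (string-match "[[:upper:]]" "fooBar")             ; 3
   (string-match "\\`[[:xdigit:]]+\\'" "DEADbeef")   ; 0: all hex digits
   (string-match "\\`[[:xdigit:]]+\\'" "xyz")))      ; nil
;; => (3 4 3 3 0 nil)
```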
	521
	522	@node Regexp Backslash
	523	@subsubsection Backslash Constructs in Regular Expressions
	524
	525	For the most part, @samp{\} followed by any character matches only
	526	that character. However, there are several exceptions: certain
	527	two-character sequences starting with @samp{\} that have special
	528	meanings. (The character after the @samp{\} in such a sequence is
	529	always ordinary when used on its own.) Here is a table of the special
	530	@samp{\} constructs.
	531
	532	@table @samp
	533	@item \|
	534	@cindex @samp{\|} in regexp
	535	@cindex regexp alternative
	536	specifies an alternative.
	537	Two regular expressions @var{a} and @var{b} with @samp{\|} in
	538	between form an expression that matches anything that either @var{a} or
	539	@var{b} matches.@refill
	540
	541	Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
	542	but no other string.@refill
	543
	544	@samp{\|} applies to the largest possible surrounding expressions. Only a
	545	surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
	546	@samp{\|}.@refill
	547
	548	If you need full backtracking capability to handle multiple uses of
	549	@samp{\|}, use the POSIX regular expression functions (@pxref{POSIX
	550	Regexps}).
	551
	552	@item \@{@var{m}\@}
	553	is a postfix operator that repeats the previous pattern exactly @var{m}
	554	times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
	555	and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
	556	@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.
	557
	558	@item \@{@var{m},@var{n}\@}
	559	is a more general postfix operator that specifies repetition with a
	560	minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
	561	is omitted, the minimum is 0; if @var{n} is omitted, there is no
	562	maximum.
	563
	564	For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
	565	@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
	566	nothing else.@*
	567	@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
	568	@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
	569	@samp{\@{1,\@}} is equivalent to @samp{+}.
	570
	571	@item \( @dots{} \)
	572	@cindex @samp{(} in regexp
	573	@cindex @samp{)} in regexp
	574	@cindex regexp grouping
	575	is a grouping construct that serves three purposes:
	576
	577	@enumerate
	578	@item
	579	To enclose a set of @samp{\|} alternatives for other operations. Thus,
	580	the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
	581	or @samp{barx}.
	582
	583	@item
	584	To enclose a complicated expression for the postfix operators @samp{*},
	585	@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
	586	@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
	587	number (zero or more) of @samp{na} strings.
	588
	589	@item
	590	To record a matched substring for future reference with
	591	@samp{\@var{digit}} (see below).
	592	@end enumerate
	593
	594	This last application is not a consequence of the idea of a
	595	parenthetical grouping; it is a separate feature that was assigned as a
	596	second meaning to the same @samp{\( @dots{} \)} construct because, in
	597	practice, there was usually no conflict between the two meanings. But
	598	occasionally there is a conflict, and that led to the introduction of
	599	shy groups.
	600
	601	@item \(?: @dots{} \)
	602	is the @dfn{shy group} construct. A shy group serves the first two
	603	purposes of an ordinary group (controlling the nesting of other
	604	operators), but it does not get a number, so you cannot refer back to
	605	its value with @samp{\@var{digit}}.
	606
	607	Shy groups are particularly useful for mechanically-constructed regular
	608	expressions because they can be added automatically without altering the
	609	numbering of any ordinary, non-shy groups.
	610
	611	@item \@var{digit}
	612	matches the same text that matched the @var{digit}th occurrence of a
	613	grouping (@samp{\( @dots{} \)}) construct.
	614
	615	In other words, after the end of a group, the matcher remembers the
	616	beginning and end of the text matched by that group. Later on in the
	617	regular expression you can use @samp{\} followed by @var{digit} to
	618	match that same text, whatever it may have been.
	619
	620	The strings matching the first nine grouping constructs appearing in
	621	the entire regular expression passed to a search or matching function
	622	are assigned numbers 1 through 9 in the order that the open
	623	parentheses appear in the regular expression. So you can use
	624	@samp{\1} through @samp{\9} to refer to the text matched by the
	625	corresponding grouping constructs.
	626
	627	For example, @samp{\(.*\)\1} matches any newline-free string that is
	628	composed of two identical halves. The @samp{\(.*\)} matches the first
	629	half, which may be anything, but the @samp{\1} that follows must match
	630	the same exact text.
	631
	632	If a @samp{\( @dots{} \)} construct matches more than once (which can
	633	happen, for instance, if it is followed by @samp{*}), only the last
	634	match is recorded.
	635
	636	If a particular grouping construct in the regular expression was never
	637	matched---for instance, if it appears inside of an alternative that
	638	wasn't used, or inside of a repetition that repeated zero times---then
	639	the corresponding @samp{\@var{digit}} construct never matches
	640	anything. To use an artificial example, @samp{\(foo\(b*\)\|lose\)\2}
	641	cannot match @samp{lose}: the second alternative inside the larger
	642	group matches it, but then @samp{\2} is undefined and can't match
	643	anything. But it can match @samp{foobb}, because the first
	644	alternative matches @samp{foob} and @samp{\2} matches @samp{b}.
	645
	646	@item \w
	647	@cindex @samp{\w} in regexp
	648	matches any word-constituent character. The editor syntax table
	649	determines which characters these are. @xref{Syntax Tables}.
	650
	651	@item \W
	652	@cindex @samp{\W} in regexp
	653	matches any character that is not a word constituent.
	654
	655	@item \s@var{code}
	656	@cindex @samp{\s} in regexp
	657	matches any character whose syntax is @var{code}. Here @var{code} is a
	658	character that represents a syntax code: thus, @samp{w} for word
	659	constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
	660	etc. To represent whitespace syntax, use either @samp{-} or a space
	661	character. @xref{Syntax Class Table}, for a list of syntax codes and
	662	the characters that stand for them.
	663
	664	@item \S@var{code}
	665	@cindex @samp{\S} in regexp
	666	matches any character whose syntax is not @var{code}.
	667
	668	@item \c@var{c}
	669	matches any character whose category is @var{c}. Here @var{c} is a
	670	character that represents a category: thus, @samp{c} for Chinese
	671	characters or @samp{g} for Greek characters in the standard category
	672	table.
	673
	674	@item \C@var{c}
	675	matches any character whose category is not @var{c}.
	676	@end table
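
As a small illustration of the back-reference behavior described above
for @samp{\@var{digit}}, the following sketch tries the artificial
regexp with @code{string-match} (@pxref{Regexp Search}). The sample
strings are chosen only for illustration.

@example
@group
;; @r{The first alternative matches "foob"; then @samp{\2} matches "b".}
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")
     @result{} 0
(match-string 2 "foobb")
     @result{} "b"
@end group

@group
;; @r{Group 2 never matched, so @samp{\2} cannot match anything.}
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")
     @result{} nil
@end group
@end example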
	677
	678	The following regular expression constructs match the empty string---that is,
	679	they don't use up any characters---but whether they match depends on the
	680	context. For all, the beginning and end of the accessible portion of
	681	the buffer are treated as if they were the actual beginning and end of
	682	the buffer.
	683
	684	@table @samp
	685	@item \`
	686	@cindex @samp{\`} in regexp
	687	matches the empty string, but only at the beginning
	688	of the buffer or string being matched against.
	689
	690	@item \'
	691	@cindex @samp{\'} in regexp
	692	matches the empty string, but only at the end of
	693	the buffer or string being matched against.
	694
	695	@item \=
	696	@cindex @samp{\=} in regexp
	697	matches the empty string, but only at point.
	698	(This construct is not defined when matching against a string.)
	699
	700	@item \b
	701	@cindex @samp{\b} in regexp
	702	matches the empty string, but only at the beginning or
	703	end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
	704	@samp{foo} as a separate word. @samp{\bballs?\b} matches
	705	@samp{ball} or @samp{balls} as a separate word.@refill
	706
	707	@samp{\b} matches at the beginning or end of the buffer (or string)
	708	regardless of what text appears next to it.
	709
	710	@item \B
	711	@cindex @samp{\B} in regexp
	712	matches the empty string, but @emph{not} at the beginning or
	713	end of a word, nor at the beginning or end of the buffer (or string).
	714
	715	@item \<
	716	@cindex @samp{\<} in regexp
	717	matches the empty string, but only at the beginning of a word.
	718	@samp{\<} matches at the beginning of the buffer (or string) only if a
	719	word-constituent character follows.
	720
	721	@item \>
	722	@cindex @samp{\>} in regexp
	723	matches the empty string, but only at the end of a word. @samp{\>}
	724	matches at the end of the buffer (or string) only if the contents end
	725	with a word-constituent character.
	726
	727	@item \_<
	728	@cindex @samp{\_<} in regexp
	729	matches the empty string, but only at the beginning of a symbol. A
	730	symbol is a sequence of one or more word or symbol constituent
	731	characters. @samp{\_<} matches at the beginning of the buffer (or
	732	string) only if a symbol-constituent character follows.
	733
	734	@item \_>
	735	@cindex @samp{\_>} in regexp
	736	matches the empty string, but only at the end of a symbol. @samp{\_>}
	737	matches at the end of the buffer (or string) only if the contents end
	738	with a symbol-constituent character.
	739	@end table
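
The following sketch tries a few of these constructs with
@code{string-match}. What counts as a word depends on the syntax
table, so the example assumes the standard syntax table and makes that
assumption explicit with @code{with-syntax-table}. The sample strings
are made up for illustration.

@example
@group
(with-syntax-table (standard-syntax-table)
  (list (string-match "\\bfoo\\b" "a foo b")     ; @r{separate word}
        (string-match "\\bfoo\\b" "a food b")    ; @r{part of a longer word}
        (string-match "\\<bar" "foo bar")        ; @r{beginning of a word}
        (string-match "\\`[0-9]+\\'" "12345")    ; @r{whole string is digits}
        (string-match "\\`[0-9]+\\'" "12a45")))  ; @r{not all digits}
     @result{} (2 nil 4 0 nil)
@end group
@end example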
	740
	741	@kindex invalid-regexp
	742	Not every string is a valid regular expression. For example, a string
	743	with unbalanced square brackets is invalid (with a few exceptions, such
	744	as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
	745	an invalid regular expression is passed to any of the search functions,
	746	an @code{invalid-regexp} error is signaled.
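
A program that receives a regexp from elsewhere can guard against this
error with @code{condition-case}. The following is a minimal sketch;
the value @code{invalid} returned by the handler is an arbitrary
symbol chosen for illustration.

@example
@group
(condition-case nil
    (string-match "[a-z" "abc")      ; @r{unbalanced @samp{[}}
  (invalid-regexp 'invalid))
     @result{} invalid
@end group

@group
(condition-case nil
    (string-match "foo\\" "foo")     ; @r{regexp ends with a single @samp{\}}
  (invalid-regexp 'invalid))
     @result{} invalid
@end group
@end example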
	747
	748	@node Regexp Example
	749	@comment node-name, next, previous, up
	750	@subsection Complex Regexp Example
	751
	752	Here is a complicated regexp which was formerly used by Emacs to
	753	recognize the end of a sentence together with any whitespace that
	754	follows. (Nowadays Emacs uses a similar but more complex default
	755	regexp constructed by the function @code{sentence-end}.
	756	@xref{Standard Regexps}.)
	757
	758	First, we show the regexp as a string in Lisp syntax to distinguish
	759	spaces from tab characters. The string constant begins and ends with a
	760	double-quote. @samp{\"} stands for a double-quote as part of the
	761	string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
	762	tab and @samp{\n} for a newline.
	763
	764	@example
	765	"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
	766	@end example
	767
	768	@noindent
	769	In contrast, if you evaluate this string, you will see the following:
	770
	771	@example
	772	@group
	773	"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
	774	     @result{} "[.?!][]\"')@}]*\\($\\| $\\|	\\|@ @ \\)[ 	
	775	]*"
	776	@end group
	777	@end example
	778
	779	@noindent
	780	In this output, tab and newline appear as themselves.
	781
	782	This regular expression contains four parts in succession and can be
	783	deciphered as follows:
	784
	785	@table @code
	786	@item [.?!]
	787	The first part of the pattern is a character alternative that matches
	788	any one of three characters: period, question mark, and exclamation
	789	mark. The match must begin with one of these three characters. (This
	790	is one point where the new default regexp used by Emacs differs from
	791	the old. The new value also allows some non-@acronym{ASCII}
	792	characters that end a sentence without any following whitespace.)
	793
	794	@item []\"')@}]*
	795	The second part of the pattern matches any closing braces and quotation
	796	marks, zero or more of them, that may follow the period, question mark
	797	or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
	798	a string. The @samp{*} at the end indicates that the immediately
	799	preceding regular expression (a character alternative, in this case) may be
	800	repeated zero or more times.
	801
	802	@item \\($\\|@ $\\|\t\\|@ @ \\)
	803	The third part of the pattern matches the whitespace that follows the
	804	end of a sentence: the end of a line (optionally with a space), or a
	805	tab, or two spaces. The double backslashes mark the parentheses and
	806	vertical bars as regular expression syntax; the parentheses delimit a
	807	group and the vertical bars separate alternatives. The dollar sign is
	808	used to match the end of a line.
	809
	810	@item [ \t\n]*
	811	Finally, the last part of the pattern matches any additional whitespace
	812	beyond the minimum needed to end a sentence.
	813	@end table
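
As a rough check of this reading, the regexp can be tried on a short
string with @code{string-match} (@pxref{Regexp Search}). The sample
string is made up for illustration. The match starts at the period
(index 5) and extends over the two spaces that follow it; with only a
single space after the period, none of the four whitespace
alternatives would apply and there would be no match.

@example
@group
(let ((str "Hello.  World"))
  (list (string-match
         "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" str)
        (match-end 0)))
     @result{} (5 8)
@end group
@end example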
	814
	815	@node Regexp Functions
	816	@subsection Regular Expression Functions
	817
	818	These functions operate on regular expressions.
	819
	820	@defun regexp-quote string
	821	This function returns a regular expression whose only exact match is
	822	@var{string}. Using this regular expression in @code{looking-at} will
	823	succeed only if the next characters in the buffer are @var{string};
	824	using it in a search function will succeed if the text being searched
	825	contains @var{string}.
	826
	827	This allows you to request an exact string match or search when calling
	828	a function that wants a regular expression.
	829
	830	@example
	831	@group
	832	(regexp-quote "^The cat$")
	833	@result{} "\\^The cat\\$"
	834	@end group
	835	@end example
	836
	837	One use of @code{regexp-quote} is to combine an exact string match with
	838	context described as a regular expression. For example, this searches
	839	for the string that is the value of @var{string}, surrounded by
	840	whitespace:
	841
	842	@example
	843	@group
	844	(re-search-forward
	845	(concat "\\s-" (regexp-quote string) "\\s-"))
	846	@end group
	847	@end example
	848	@end defun
	849
	850	@defun regexp-opt strings &optional paren
	851	This function returns an efficient regular expression that will match
	852	any of the strings in the list @var{strings}. This is useful when you
	853	need to make matching or searching as fast as possible---for example,
	854	for Font Lock mode.
	855
	856	If the optional argument @var{paren} is non-@code{nil}, then the
	857	returned regular expression is always enclosed by at least one
	858	parentheses-grouping construct. If @var{paren} is @code{words}, then
	859	that construct is additionally surrounded by @samp{\<} and @samp{\>}.
	860
	861	This simplified definition of @code{regexp-opt} produces a
	862	regular expression which is equivalent to the actual value
	863	(but not as efficient):
	864
	865	@example
	866	(defun regexp-opt (strings &optional paren)
	867	  (let ((open-paren (if paren "\\(" ""))
	868	        (close-paren (if paren "\\)" "")))
	869	    (concat open-paren
	870	            (mapconcat 'regexp-quote strings "\\|")
	871	            close-paren)))
	872	@end example
	873	@end defun
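
The exact string returned by @code{regexp-opt} is an implementation
detail, so the following sketch only uses the result for matching. It
assumes the standard syntax table for deciding where words begin and
end; the word list is made up for illustration.

@example
@group
(let ((re (regexp-opt '("cat" "car" "dog") 'words)))
  (list (string-match re "my dog")     ; @r{@samp{dog} as a separate word}
        (string-match re "hotdog")))   ; @r{@samp{dog} inside another word}
     @result{} (3 nil)
@end group
@end example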
	874
	875	@defun regexp-opt-depth regexp
	876	This function returns the total number of grouping constructs
	877	(parenthesized expressions) in @var{regexp}. (This does not include
	878	shy groups.)
	879	@end defun
	880
	881	@node Regexp Search
	882	@section Regular Expression Searching
	883	@cindex regular expression searching
	884	@cindex regexp searching
	885	@cindex searching for regexp
	886
	887	In GNU Emacs, you can search for the next match for a regular
	888	expression either incrementally or not. For incremental search
	889	commands, see @ref{Regexp Search, , Regular Expression Search, emacs,
	890	The GNU Emacs Manual}. Here we describe only the search functions
	891	useful in programs. The principal one is @code{re-search-forward}.
	892
	893	These search functions convert the regular expression to multibyte if
	894	the buffer is multibyte; they convert the regular expression to unibyte
	895	if the buffer is unibyte. @xref{Text Representations}.
	896
	897	@deffn Command re-search-forward regexp &optional limit noerror repeat
	898	This function searches forward in the current buffer for a string of
	899	text that is matched by the regular expression @var{regexp}. The
	900	function skips over any amount of text that is not matched by
	901	@var{regexp}, and leaves point at the end of the first match found.
	902	It returns the new value of point.
	903
	904	If @var{limit} is non-@code{nil}, it must be a position in the current
	905	buffer. It specifies the upper bound to the search. No match
	906	extending after that position is accepted.
	907
	908	If @var{repeat} is supplied, it must be a positive number; the search
	909	is repeated that many times; each repetition starts at the end of the
	910	previous match. If all these successive searches succeed, the search
	911	succeeds, moving point and returning its new value. Otherwise the
	912	search fails. What @code{re-search-forward} does when the search
	913	fails depends on the value of @var{noerror}:
	914
	915	@table @asis
	916	@item @code{nil}
	917	Signal a @code{search-failed} error.
	918	@item @code{t}
	919	Do nothing and return @code{nil}.
	920	@item anything else
	921	Move point to @var{limit} (or the end of the accessible portion of the
	922	buffer) and return @code{nil}.
	923	@end table
	924
	925	In the following example, point is initially before the @samp{T}.
	926	Evaluating the search call moves point to the end of that line (between
	927	the @samp{t} of @samp{hat} and the newline).
	928
	929	@example
	930	@group
	931	---------- Buffer: foo ----------
	932	I read "@point{}The cat in the hat
	933	comes back" twice.
	934	---------- Buffer: foo ----------
	935	@end group
	936
	937	@group
	938	(re-search-forward "[a-z]+" nil t 5)
	939	@result{} 27
	940
	941	---------- Buffer: foo ----------
	942	I read "The cat in the hat@point{}
	943	comes back" twice.
	944	---------- Buffer: foo ----------
	945	@end group
	946	@end example
	947	@end deffn
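
The following sketch shows two common ways of calling
@code{re-search-forward} from a program. The first loops until the
search fails, using @code{t} for @var{noerror} so that failure returns
@code{nil} instead of signaling an error. The second passes a value
for @var{noerror} that is neither @code{nil} nor @code{t}, so a failed
search moves point to @var{limit}. The temporary buffers and their
contents are made up for illustration.

@example
@group
(with-temp-buffer
  (insert "one two three four")
  (goto-char (point-min))
  (let ((count 0))
    (while (re-search-forward "[a-z]+" nil t)
      (setq count (1+ count)))
    count))
     @result{} 4
@end group

@group
(with-temp-buffer
  (insert "abc def")
  (goto-char (point-min))
  (list (re-search-forward "xyz" 4 'move)
        (point)))
     @result{} (nil 4)
@end group
@end example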
	948
	949	@deffn Command re-search-backward regexp &optional limit noerror repeat
	950	This function searches backward in the current buffer for a string of
	951	text that is matched by the regular expression @var{regexp}, leaving
	952	point at the beginning of the first text found.
	953
	954	This function is analogous to @code{re-search-forward}, but they are not
	955	simple mirror images. @code{re-search-forward} finds the match whose
	956	beginning is as close as possible to the starting point. If
	957	@code{re-search-backward} were a perfect mirror image, it would find the
	958	match whose end is as close as possible. However, in fact it finds the
	959	match whose beginning is as close as possible (and yet ends before the
	960	starting point). The reason for this is that matching a regular
	961	expression at a given spot always works from beginning to end, and
	962	starts at a specified beginning position.
	963
	964	A true mirror-image of @code{re-search-forward} would require a special
	965	feature for matching regular expressions from end to beginning. It's
	966	not worth the trouble of implementing that.
	967	@end deffn
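
The asymmetry can be seen in a small sketch. Point starts at the end
of a temporary buffer containing @samp{aaaa} (the contents are made up
for illustration). Searching backward for @samp{a+} finds the match
whose beginning is closest to point, which is just the last @samp{a},
not the whole run of four.

@example
@group
(with-temp-buffer
  (insert "aaaa")
  (list (re-search-backward "a+")
        (match-beginning 0)
        (match-end 0)))
     @result{} (4 4 5)
@end group
@end example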
	968
	969	@defun string-match regexp string &optional start
	970	This function returns the index of the start of the first match for
	971	the regular expression @var{regexp} in @var{string}, or @code{nil} if
	972	there is no match. If @var{start} is non-@code{nil}, the search starts
	973	at that index in @var{string}.
	974
	975	For example,
	976
	977	@example
	978	@group
	979	(string-match
	980	"quick" "The quick brown fox jumped quickly.")
	981	@result{} 4
	982	@end group
	983	@group
	984	(string-match
	985	"quick" "The quick brown fox jumped quickly." 8)
	986	@result{} 27
	987	@end group
	988	@end example
	989
	990	@noindent
	991	The index of the first character of the
	992	string is 0, the index of the second character is 1, and so on.
	993
	994	After this function returns, the index of the first character beyond
	995	the match is available as @code{(match-end 0)}. @xref{Match Data}.
	996
	997	@example
	998	@group
	999	(string-match
	1000	"quick" "The quick brown fox jumped quickly." 8)
	1001	@result{} 27
	1002	@end group
	1003
	1004	@group
	1005	(match-end 0)
	1006	@result{} 32
	1007	@end group
	1008	@end example
	1009	@end defun
	1010
	1011	@defun looking-at regexp
	1012	This function determines whether the text in the current buffer directly
	1013	following point matches the regular expression @var{regexp}. ``Directly
	1014	following'' means precisely that: the search is ``anchored'' and it can
	1015	succeed only starting with the first character following point. The
	1016	result is @code{t} if so, @code{nil} otherwise.
	1017
	1018	This function does not move point, but it updates the match data, which
	1019	you can access using @code{match-beginning} and @code{match-end}.
	1020	@xref{Match Data}.
	1021
	1022	In this example, point is located directly before the @samp{T}. If it
	1023	were anywhere else, the result would be @code{nil}.
	1024
	1025	@example
	1026	@group
	1027	---------- Buffer: foo ----------
	1028	I read "@point{}The cat in the hat
	1029	comes back" twice.
	1030	---------- Buffer: foo ----------
	1031
	1032	(looking-at "The cat in the hat$")
	1033	@result{} t
	1034	@end group
	1035	@end example
	1036	@end defun
	1037
	1038	@defun looking-back regexp &optional limit
	1039	This function returns @code{t} if @var{regexp} matches text before
	1040	point, ending at point, and @code{nil} otherwise.
	1041
	1042	Because regular expression matching works only going forward, this is
	1043	implemented by searching backwards from point for a match that ends at
	1044	point. That can be quite slow if it has to search a long distance.
	1045	You can bound the time required by specifying @var{limit}, which says
	1046	not to search before @var{limit}. In this case, the match that is
	1047	found must begin at or after @var{limit}.
	1048
	1049	@example
	1050	@group
	1051	---------- Buffer: foo ----------
	1052	I read "@point{}The cat in the hat
	1053	comes back" twice.
	1054	---------- Buffer: foo ----------
	1055
	1056	(looking-back "read \"" 3)
	1057	@result{} t
	1058	(looking-back "read \"" 4)
	1059	@result{} nil
	1060	@end group
	1061	@end example
	1062	@end defun
	1063
	1064	@defvar search-spaces-regexp
	1065	If this variable is non-@code{nil}, it should be a regular expression
	1066	that says how to search for whitespace. In that case, any group of
	1067	spaces in a regular expression being searched for stands for use of
	1068	this regular expression. However, spaces inside of constructs such as
	1069	@samp{[@dots{}]} and @samp{*}, @samp{+}, @samp{?} are not affected by
	1070	@code{search-spaces-regexp}.
	1071
	1072	Since this variable affects all regular expression search and match
	1073	constructs, you should bind it temporarily around as small a part of
	1074	the code as possible.
	1075	@end defvar
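
A minimal sketch of such a temporary binding follows. The value given
to @code{search-spaces-regexp} and the sample strings are assumptions
chosen for illustration. With the binding in effect, the single space
in the regexp stands for any run of spaces, tabs and newlines.

@example
@group
(let ((search-spaces-regexp "[ \t\n]+"))
  (string-match "foo bar" "foo \n\t bar"))
     @result{} 0
@end group
@end example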
	1076
	1077	@node POSIX Regexps
	1078	@section POSIX Regular Expression Searching
	1079
	1080	The usual regular expression functions do backtracking when necessary
	1081	to handle the @samp{\|} and repetition constructs, but they continue
	1082	this only until they find @emph{some} match. Then they succeed and
	1083	report the first match found.
	1084
	1085	This section describes alternative search functions which perform the
	1086	full backtracking specified by the POSIX standard for regular expression
	1087	matching. They continue backtracking until they have tried all
	1088	possibilities and found all matches, so they can report the longest
	1089	match, as required by POSIX. This is much slower, so use these
	1090	functions only when you really need the longest match.
	1091
	1092	The POSIX search and match functions do not properly support the
	1093	non-greedy repetition operators. This is because POSIX backtracking
	1094	conflicts with the semantics of non-greedy repetition.
	1095
	1096	@defun posix-search-forward regexp &optional limit noerror repeat
	1097	This is like @code{re-search-forward} except that it performs the full
	1098	backtracking specified by the POSIX standard for regular expression
	1099	matching.
	1100	@end defun
	1101
	1102	@defun posix-search-backward regexp &optional limit noerror repeat
	1103	This is like @code{re-search-backward} except that it performs the full
	1104	backtracking specified by the POSIX standard for regular expression
	1105	matching.
	1106	@end defun
	1107
	1108	@defun posix-looking-at regexp
	1109	This is like @code{looking-at} except that it performs the full
	1110	backtracking specified by the POSIX standard for regular expression
	1111	matching.
	1112	@end defun
	1113
	1114	@defun posix-string-match regexp string &optional start
	1115	This is like @code{string-match} except that it performs the full
	1116	backtracking specified by the POSIX standard for regular expression
	1117	matching.
	1118	@end defun
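
The difference shows up when an earlier alternative matches a shorter
text than a later one. In the following sketch (the regexp and string
are made up for illustration), @code{string-match} stops at the first
alternative that succeeds, while @code{posix-string-match} reports the
longest match.

@example
@group
(list (string-match "a\\|ab" "ab")
      (match-end 0))
     @result{} (0 1)
@end group

@group
(list (posix-string-match "a\\|ab" "ab")
      (match-end 0))
     @result{} (0 2)
@end group
@end example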
	1119
	1120	@node Match Data
	1121	@section The Match Data
	1122	@cindex match data
	1123
	1124	Emacs keeps track of the start and end positions of the segments of
	1125	text found during a search; this is called the @dfn{match data}.
	1126	Thanks to the match data, you can search for a complex pattern, such
	1127	as a date in a mail message, and then extract parts of the match under
	1128	control of the pattern.
	1129
	1130	Because the match data normally describe the most recent search only,
	1131	you must be careful not to do another search inadvertently between the
	1132	search you wish to refer back to and the use of the match data. If you
	1133	can't avoid another intervening search, you must save and restore the
	1134	match data around it, to prevent it from being overwritten.
	1135
	1136	@menu
	1137	* Replacing Match:: Replacing a substring that was matched.
	1138	* Simple Match Data:: Accessing single items of match data,
	1139	such as where a particular subexpression started.
	1140	* Entire Match Data:: Accessing the entire match data at once, as a list.
	1141	* Saving Match Data:: Saving and restoring the match data.
	1142	@end menu
	1143
	1144	@node Replacing Match
	1145	@subsection Replacing the Text that Matched
	1146
	1147	This function replaces all or part of the text matched by the last
	1148	search. It works by means of the match data.
	1149
	1150	@cindex case in replacements
	1151	@defun replace-match replacement &optional fixedcase literal string subexp
	1152	This function replaces the text in the buffer (or in @var{string}) that
	1153	was matched by the last search. It replaces that text with
	1154	@var{replacement}.
	1155
	1156	If you did the last search in a buffer, you should specify @code{nil}
	1157	for @var{string} and make sure that the current buffer when you call
	1158	@code{replace-match} is the one in which you did the searching or
	1159	matching. Then @code{replace-match} does the replacement by editing
	1160	the buffer; it leaves point at the end of the replacement text, and
	1161	returns @code{t}.
	1162
	1163	If you did the search in a string, pass the same string as @var{string}.
	1164	Then @code{replace-match} does the replacement by constructing and
	1165	returning a new string.
	1166
	1167	If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
	1168	the replacement text without case conversion; otherwise, it converts
	1169	the replacement text depending upon the capitalization of the text to
	1170	be replaced. If the original text is all upper case, this converts
	1171	the replacement text to upper case. If all words of the original text
	1172	are capitalized, this capitalizes all the words of the replacement
	1173	text. If all the words are one-letter and they are all upper case,
	1174	they are treated as capitalized words rather than all-upper-case
	1175	words.
	1176
	1177	If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
	1178	exactly as it is, the only alterations being case changes as needed.
	1179	If it is @code{nil} (the default), then the character @samp{\} is treated
	1180	specially. If a @samp{\} appears in @var{replacement}, then it must be
	1181	part of one of the following sequences:
	1182
	1183	@table @asis
	1184	@item @samp{\&}
	1185	@cindex @samp{&} in replacement
	1186	@samp{\&} stands for the entire text being replaced.
	1187
	1188	@item @samp{\@var{n}}
	1189	@cindex @samp{\@var{n}} in replacement
	1190	@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
	1191	matched the @var{n}th subexpression in the original regexp.
	1192	Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
	1193	If the @var{n}th subexpression never matched, an empty string is substituted.
	1194
	1195	@item @samp{\\}
	1196	@cindex @samp{\} in replacement
	1197	@samp{\\} stands for a single @samp{\} in the replacement text.
	1198	@end table
	1199
	1200	These substitutions occur after case conversion, if any,
	1201	so the strings they substitute are never case-converted.
	1202
	1203	If @var{subexp} is non-@code{nil}, that says to replace just
	1204	subexpression number @var{subexp} of the regexp that was matched, not
	1205	the entire match. For example, after matching @samp{foo \(ba*r\)},
	1206	calling @code{replace-match} with 1 as @var{subexp} means to replace
	1207	just the text that matched @samp{\(ba*r\)}.
	1208	@end defun
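
The following sketch shows @code{replace-match} applied to a string
after @code{string-match}. The sample strings are made up for
illustration. The first form swaps two subexpressions using
@samp{\1} and @samp{\2}; the second replaces only subexpression 1,
with @var{literal} non-@code{nil}; the third leaves @var{fixedcase}
@code{nil}, so the replacement is converted to upper case to follow
the matched text.

@example
@group
(let ((str "The quick fox"))
  (list (progn (string-match "\\(quick\\) \\(fox\\)" str)
               (replace-match "\\2 \\1" t nil str))
        (progn (string-match "\\(quick\\) \\(fox\\)" str)
               (replace-match "slow" t t str 1))))
     @result{} ("The fox quick" "The slow fox")
@end group

@group
(progn (string-match "FOO" "say FOO")
       (replace-match "bar" nil nil "say FOO"))
     @result{} "say BAR"
@end group
@end example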
	1209
	1210	@node Simple Match Data
	1211	@subsection Simple Match Data Access
	1212
	1213	This section explains how to use the match data to find out what was
	1214	matched by the last search or match operation, if it succeeded.
	1215
	1216	You can ask about the entire matching text, or about a particular
	1217	parenthetical subexpression of a regular expression. The @var{count}
	1218	argument in the functions below specifies which. If @var{count} is
	1219	zero, you are asking about the entire match. If @var{count} is
	1220	positive, it specifies which subexpression you want.
	1221
	1222	Recall that the subexpressions of a regular expression are those
	1223	expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
	1224	@var{count}th subexpression is found by counting occurrences of
	1225	@samp{\(} from the beginning of the whole regular expression. The first
	1226	subexpression is numbered 1, the second 2, and so on. Only regular
	1227	expressions can have subexpressions---after a simple string search, the
	1228	only information available is about the entire match.
	1229
	1230	A search which fails may or may not alter the match data. In the
	1231	past, a failing search did not do this, but we may change it in the
	1232	future. So don't try to rely on the value of the match data after
	1233	a failing search.
	1234
	1235	@defun match-string count &optional in-string
	1236	This function returns, as a string, the text matched in the last search
	1237	or match operation. It returns the entire text if @var{count} is zero,
	1238	or just the portion corresponding to the @var{count}th parenthetical
	1239	subexpression, if @var{count} is positive.
	1240
	1241	If the last such operation was done against a string with
	1242	@code{string-match}, then you should pass the same string as the
	1243	argument @var{in-string}. After a buffer search or match,
	1244	you should omit @var{in-string} or pass @code{nil} for it; but you
	1245	should make sure that the current buffer when you call
	1246	@code{match-string} is the one in which you did the searching or
	1247	matching.
	1248
	1249	The value is @code{nil} if @var{count} is out of range, or for a
	1250	subexpression inside a @samp{\|} alternative that wasn't used or a
	1251	repetition that repeated zero times.
	1252	@end defun
	1253
	1254	@defun match-string-no-properties count &optional in-string
	1255	This function is like @code{match-string} except that the result
	1256	has no text properties.
	1257	@end defun
	1258
	1259	@defun match-beginning count
	1260	This function returns the position of the start of text matched by the
	1261	last regular expression searched for, or a subexpression of it.
	1262
	1263	If @var{count} is zero, then the value is the position of the start of
	1264	the entire match. Otherwise, @var{count} specifies a subexpression in
	1265	the regular expression, and the value of the function is the starting
	1266	position of the match for that subexpression.
	1267
	1268	The value is @code{nil} for a subexpression inside a @samp{\|}
	1269	alternative that wasn't used or a repetition that repeated zero times.
	1270	@end defun
	1271
	1272	@defun match-end count
	1273	This function is like @code{match-beginning} except that it returns the
	1274	position of the end of the match, rather than the position of the
	1275	beginning.
	1276	@end defun
	1277
	1278	Here is an example of using the match data, with a comment showing the
	1279	positions within the text:
	1280
	1281	@example
	1282	@group
	1283	(string-match "\\(qu\\)\\(ick\\)"
	1284	              "The quick fox jumped quickly.")
	1285	              ;0123456789
	1286	@result{} 4
	1287	@end group
	1288
	1289	@group
	1290	(match-string 0 "The quick fox jumped quickly.")
	1291	@result{} "quick"
	1292	(match-string 1 "The quick fox jumped quickly.")
	1293	@result{} "qu"
	1294	(match-string 2 "The quick fox jumped quickly.")
	1295	@result{} "ick"
	1296	@end group
	1297
	1298	@group
	1299	(match-beginning 1) ; @r{The beginning of the match}
	1300	@result{} 4 ; @r{with @samp{qu} is at index 4.}
	1301	@end group
	1302
	1303	@group
	1304	(match-beginning 2) ; @r{The beginning of the match}
	1305	@result{} 6 ; @r{with @samp{ick} is at index 6.}
	1306	@end group
	1307
	1308	@group
	1309	(match-end 1) ; @r{The end of the match}
	1310	@result{} 6 ; @r{with @samp{qu} is at index 6.}
	1311
	1312	(match-end 2) ; @r{The end of the match}
	1313	@result{} 9 ; @r{with @samp{ick} is at index 9.}
	1314	@end group
	1315	@end example
	1316
	1317	Here is another example. Point is initially located at the beginning
	1318	of the line. Searching moves point to between the space and the word
	1319	@samp{in}. The beginning of the entire match is at the 9th character of
	1320	the buffer (@samp{T}), and the beginning of the match for the first
	1321	subexpression is at the 13th character (@samp{c}).
	1322
	1323	@example
	1324	@group
	1325	(list
	1326	(re-search-forward "The \\(cat \\)")
	1327	(match-beginning 0)
	1328	(match-beginning 1))
	1329	@result{} (17 9 13)
	1330	@end group
	1331
	1332	@group
	1333	---------- Buffer: foo ----------
	1334	I read "The cat @point{}in the hat comes back" twice.
	1335	        ^   ^
	1336	        9  13
	1337	---------- Buffer: foo ----------
	1338	@end group
	1339	@end example
	1340
	1341	@noindent
	1342	(In this case, the index returned is a buffer position; the first
	1343	character of the buffer counts as 1.)
	1344
	1345	@node Entire Match Data
	1346	@subsection Accessing the Entire Match Data
	1347
	1348	The functions @code{match-data} and @code{set-match-data} read or
	1349	write the entire match data, all at once.
	1350
	1351	@defun match-data &optional integers reuse reseat
	1352	This function returns a list of positions (markers or integers) that
	1353	record all the information on what text the last search matched.
	1354	Element zero is the position of the beginning of the match for the
	1355	whole expression; element one is the position of the end of the match
	1356	for the expression. The next two elements are the positions of the
	1357	beginning and end of the match for the first subexpression, and so on.
	1358	In general, element
	1359	@ifnottex
	1360	number 2@var{n}
	1361	@end ifnottex
	1362	@tex
	1363	number {\mathsurround=0pt $2n$}
	1364	@end tex
	1365	corresponds to @code{(match-beginning @var{n})}; and
	1366	element
	1367	@ifnottex
	1368	number 2@var{n} + 1
	1369	@end ifnottex
	1370	@tex
	1371	number {\mathsurround=0pt $2n+1$}
	1372	@end tex
	1373	corresponds to @code{(match-end @var{n})}.
	1374
	1375	Normally all the elements are markers or @code{nil}, but if
	1376	@var{integers} is non-@code{nil}, that means to use integers instead
	1377	of markers. (In that case, the buffer itself is appended as an
	1378	additional element at the end of the list, to facilitate complete
	1379	restoration of the match data.) If the last match was done on a
	1380	string with @code{string-match}, then integers are always used,
	1381	since markers can't point into a string.
	1382
	1383	If @var{reuse} is non-@code{nil}, it should be a list. In that case,
	1384	@code{match-data} stores the match data in @var{reuse}. That is,
	1385	@var{reuse} is destructively modified. @var{reuse} does not need to
	1386	have the right length. If it is not long enough to contain the match
	1387	data, it is extended. If it is too long, the length of @var{reuse}
	1388	stays the same, but the elements that were not used are set to
	1389	@code{nil}. The purpose of this feature is to reduce the need for
	1390	garbage collection.
	1391
	1392	If @var{reseat} is non-@code{nil}, all markers on the @var{reuse} list
	1393	are reseated to point to nowhere.
	1394
	1395	As always, there must be no possibility of intervening searches between
	1396	the call to a search function and the call to @code{match-data} that is
	1397	intended to access the match data for that search.
	1398
	1399	@example
	1400	@group
	1401	(match-data)
	1402	@result{} (#<marker at 9 in foo>
	1403	#<marker at 17 in foo>
	1404	#<marker at 13 in foo>
	1405	#<marker at 17 in foo>)
	1406	@end group
	1407	@end example
	1408	@end defun
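
After a match against a string, the same layout applies but the
elements are integers. In the following sketch (the sample string is
made up for illustration), the list holds the beginning and end of the
whole match, then of the first subexpression, then of the second.

@example
@group
(progn
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox")
  (match-data))
     @result{} (4 9 4 6 6 9)
@end group
@end example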
	1409
	1410	@defun set-match-data match-list &optional reseat
	1411	This function sets the match data from the elements of @var{match-list},
	1412	which should be a list that was the value of a previous call to
	1413	@code{match-data}. (More precisely, anything that has the same format
	1414	will work.)
	1415
	1416	If @var{match-list} refers to a buffer that doesn't exist, you don't get
	1417	an error; that sets the match data in a meaningless but harmless way.
	1418
	1419	If @var{reseat} is non-@code{nil}, all markers on the @var{match-list} list
	1420	are reseated to point to nowhere.
	1421
	1422	@findex store-match-data
	1423	@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
	1424	@end defun
	1425
	1426	@node Saving Match Data
	1427	@subsection Saving and Restoring the Match Data
	1428
	1429	When you call a function that may do a search, you may need to save
	1430	and restore the match data around that call, if you want to preserve the
	1431	match data from an earlier search for later use. Here is an example
	1432	that shows the problem that arises if you fail to save the match data:
	1433
	1434	@example
	1435	@group
	1436	(re-search-forward "The \\(cat \\)")
	1437	@result{} 48
	1438	(foo) ; @r{Perhaps @code{foo} does}
	1439	; @r{more searching.}
	1440	(match-end 0)
	1441	@result{} 61 ; @r{Unexpected result---not 48!}
	1442	@end group
	1443	@end example
	1444
	1445	You can save and restore the match data with @code{save-match-data}:
	1446
	1447	@defmac save-match-data body@dots{}
	1448	This macro executes @var{body}, saving and restoring the match
	1449	data around it. The return value is the value of the last form in
	1450	@var{body}.
	1451	@end defmac
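
For instance, here is a small sketch that uses @code{string-match} on
strings rather than a buffer search. The strings and the regexp are
made up for illustration. The inner search runs inside
@code{save-match-data}, so the match data of the outer search is still
available afterward.

@example
@group
(let ((s "key=value"))
  (string-match "\\([a-z]+\\)=\\([a-z]+\\)" s)
  (save-match-data
    ;; @r{This inner search would otherwise clobber the match data.}
    (string-match "x" "xyz"))
  (match-string 2 s))
     @result{} "value"
@end group
@end example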
	1452
	1453	You could use @code{set-match-data} together with @code{match-data} to
	1454	imitate the effect of the macro @code{save-match-data}. Here is
	1455	how:
	1456
	1457	@example
	1458	@group
	1459	(let ((data (match-data)))
	1460	(unwind-protect
	1461	@dots{} ; @r{Ok to change the original match data.}
	1462	(set-match-data data)))
	1463	@end group
	1464	@end example
	1465
	1466	Emacs automatically saves and restores the match data when it runs
	1467	process filter functions (@pxref{Filter Functions}) and process
	1468	sentinels (@pxref{Sentinels}).
	1469
	1470	@ignore
	1471	Here is a function which restores the match data provided the buffer
	1472	associated with it still exists.
	1473
	1474	@smallexample
	1475	@group
	1476	(defun restore-match-data (data)
	1477	@c It is incorrect to split the first line of a doc string.
	1478	@c If there's a problem here, it should be solved in some other way.
	1479	"Restore the match data DATA unless the buffer is missing."
	1480	(catch 'foo
	1481	(let ((d data))
	1482	@end group
	1483	(while d
	1484	(and (car d)
	1485	(null (marker-buffer (car d)))
	1486	@group
	1487	;; @file{match-data} @r{buffer is deleted.}
	1488	(throw 'foo nil))
	1489	(setq d (cdr d)))
	1490	(set-match-data data))))
	1491	@end group
	1492	@end smallexample
	1493	@end ignore
	1494
	1495	@node Search and Replace
	1496	@section Search and Replace
	1497	@cindex replacement
	1498
	1499	If you want to find all matches for a regexp in part of the buffer,
	1500	and replace them, the best way is to write an explicit loop using
	1501	@code{re-search-forward} and @code{replace-match}, like this:
	1502
	1503	@example
	1504	(while (re-search-forward "foo[ \t]+bar" nil t)
	1505	(replace-match "foobar"))
	1506	@end example
	1507
	1508	@noindent
	1509	@xref{Replacing Match,, Replacing the Text that Matched}, for a
	1510	description of @code{replace-match}.
	1511
	1512	However, replacing matches in a string is more complex, especially
	1513	if you want to do it efficiently. So Emacs provides a function to do
	1514	this.
	1515
	1516	@defun replace-regexp-in-string regexp rep string &optional fixedcase literal subexp start
	1517	This function copies @var{string} and searches it for matches for
	1518	@var{regexp}, and replaces them with @var{rep}. It returns the
	1519	modified copy. If @var{start} is non-@code{nil}, the search for
	1520	matches starts at that index in @var{string}, so matches starting
	1521	before that index are not changed.
	1522
	1523	This function uses @code{replace-match} to do the replacement, and it
	1524	passes the optional arguments @var{fixedcase}, @var{literal} and
	1525	@var{subexp} along to @code{replace-match}.
	1526
	1527	Instead of a string, @var{rep} can be a function. In that case,
	1528	@code{replace-regexp-in-string} calls @var{rep} for each match,
	1529	passing the text of the match as its sole argument. It collects the
	1530	value @var{rep} returns and passes that to @code{replace-match} as the
	1531	replacement string. The match data at this point are the result
	1532	of matching @var{regexp} against a substring of @var{string}.
	1533	@end defun
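
The following sketch shows both kinds of @var{rep}. The input strings
are invented for illustration. The first call uses a replacement
string; the second passes the function @code{upcase}, which receives
the text of each match.

@example
@group
(replace-regexp-in-string "foo[ \t]+bar" "foobar"
                          "foo   bar and foo bar")
     @result{} "foobar and foobar"
@end group

@group
(replace-regexp-in-string "[a-z]+" #'upcase "abc 123 def")
     @result{} "ABC 123 DEF"
@end group
@end example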
	1534
	1535	If you want to write a command along the lines of @code{query-replace},
	1536	you can use @code{perform-replace} to do the work.
	1537
	1538	@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map start end
	1539	This function is the guts of @code{query-replace} and related
	1540	commands. It searches for occurrences of @var{from-string} in the
	1541	text between positions @var{start} and @var{end} and replaces some or
	1542	all of them. If @var{start} is @code{nil} (or omitted), point is used
	1543	instead, and the end of the buffer's accessible portion is used for
	1544	@var{end}.
	1545
	1546	If @var{query-flag} is @code{nil}, it replaces all
	1547	occurrences; otherwise, it asks the user what to do about each one.
	1548
	1549	If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
	1550	considered a regular expression; otherwise, it must match literally. If
	1551	@var{delimited-flag} is non-@code{nil}, then only occurrences
	1552	surrounded by word boundaries are considered.
	1553
	1554	The argument @var{replacements} specifies what to replace occurrences
	1555	with. If it is a string, that string is used. It can also be a list of
	1556	strings, to be used in cyclic order.
	1557
	1558	If @var{replacements} is a cons cell, @code{(@var{function}
	1559	. @var{data})}, this means to call @var{function} after each match to
	1560	get the replacement text. This function is called with two arguments:
	1561	@var{data}, and the number of replacements already made.
	1562
	1563	If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
	1564	it specifies how many times to use each of the strings in the
	1565	@var{replacements} list before advancing cyclically to the next one.
	1566
	1567	If @var{from-string} contains upper-case letters, then
	1568	@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
	1569	it uses the @var{replacements} without altering their case.
	1570
	1571	Normally, the keymap @code{query-replace-map} defines the possible
	1572	user responses for queries. The argument @var{map}, if
	1573	non-@code{nil}, specifies a keymap to use instead of
	1574	@code{query-replace-map}.
	1575	@end defun
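
Here is a minimal sketch of a non-interactive call, run in a temporary
buffer whose contents are invented for illustration. It passes
@code{nil} for @var{query-flag}, so every occurrence after point is
replaced without asking. A command modeled on @code{query-replace}
would pass @code{t} instead.

@example
@group
(with-temp-buffer
  (insert "foo foo")
  (goto-char (point-min))
  ;; @r{Literal search, no querying, no word-boundary restriction.}
  (perform-replace "foo" "bar" nil nil nil)
  (buffer-string))
     @result{} "bar bar"
@end group
@end example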
	1576
	1577	@defvar query-replace-map
	1578	This variable holds a special keymap that defines the valid user
	1579	responses for @code{perform-replace} and the commands that use it, as
	1580	well as @code{y-or-n-p} and @code{map-y-or-n-p}. This map is unusual
	1581	in two ways:
	1582
	1583	@itemize @bullet
	1584	@item
	1585	The ``key bindings'' are not commands, just symbols that are meaningful
	1586	to the functions that use this map.
	1587
	1588	@item
	1589	Prefix keys are not supported; each key binding must be for a
	1590	single-event key sequence. This is because the functions don't use
	1591	@code{read-key-sequence} to get the input; instead, they read a single
	1592	event and look it up ``by hand.''
	1593	@end itemize
	1594	@end defvar
	1595
	1596	Here are the meaningful ``bindings'' for @code{query-replace-map}.
	1597	Several of them are meaningful only for @code{query-replace} and
	1598	friends.
	1599
	1600	@table @code
	1601	@item act
	1602	Do take the action being considered---in other words, ``yes.''
	1603
	1604	@item skip
	1605	Do not take action for this question---in other words, ``no.''
	1606
	1607	@item exit
	1608	Answer this question ``no,'' and give up on the entire series of
	1609	questions, assuming that the answers will be ``no.''
	1610
	1611	@item act-and-exit
	1612	Answer this question ``yes,'' and give up on the entire series of
	1613	questions, assuming that subsequent answers will be ``no.''
	1614
	1615	@item act-and-show
	1616	Answer this question ``yes,'' but show the results---don't advance yet
	1617	to the next question.
	1618
	1619	@item automatic
	1620	Answer this question and all subsequent questions in the series with
	1621	``yes,'' without further user interaction.
	1622
	1623	@item backup
	1624	Move back to the previous place that a question was asked about.
	1625
	1626	@item edit
	1627	Enter a recursive edit to deal with this question---instead of any
	1628	other action that would normally be taken.
	1629
	1630	@item delete-and-edit
	1631	Delete the text being considered, then enter a recursive edit to replace
	1632	it.
	1633
	1634	@item recenter
	1635	Redisplay and center the window, then ask the same question again.
	1636
	1637	@item quit
	1638	Perform a quit right away. Only @code{y-or-n-p} and related functions
	1639	use this answer.
	1640
	1641	@item help
	1642	Display some help, then ask again.
	1643	@end table
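
Because the ``bindings'' are plain symbols, you can inspect them with
@code{lookup-key}. The sketch below assumes the standard key
assignments of @code{query-replace}, in which @kbd{y} means ``yes''
and @kbd{!} means ``replace all the rest''.

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
(lookup-key query-replace-map "!")
     @result{} automatic
@end group
@end example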
	1644
	1645	@node Standard Regexps
	1646	@section Standard Regular Expressions Used in Editing
	1647	@cindex regexps used standardly in editing
	1648	@cindex standard regexps used in editing
	1649
	1650	This section describes some variables that hold regular expressions
	1651	used for certain purposes in editing:
	1652
	1653	@defvar page-delimiter
	1654	This is the regular expression describing line-beginnings that separate
	1655	pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
	1656	@code{"^\C-l"}); this matches a line that starts with a formfeed
	1657	character.
	1658	@end defvar
	1659
	1660	The following two regular expressions should @emph{not} assume the
	1661	match always starts at the beginning of a line; they should not use
	1662	@samp{^} to anchor the match. Most often, the paragraph commands do
	1663	check for a match only at the beginning of a line, which means that
	1664	@samp{^} would be superfluous. When there is a nonzero left margin,
	1665	they accept matches that start after the left margin. In that case, a
	1666	@samp{^} would be incorrect. However, a @samp{^} is harmless in modes
	1667	where a left margin is never used.
	1668
	1669	@defvar paragraph-separate
	1670	This is the regular expression for recognizing the beginning of a line
	1671	that separates paragraphs. (If you change this, you may have to
	1672	change @code{paragraph-start} also.) The default value is
	1673	@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
	1674	spaces, tabs, and form feeds (after its left margin).
	1675	@end defvar
	1676
	1677	@defvar paragraph-start
	1678	This is the regular expression for recognizing the beginning of a line
	1679	that starts @emph{or} separates paragraphs. The default value is
	1680	@w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
	1681	whitespace or starting with a form feed (after its left margin).
	1682	@end defvar
	1683
	1684	@defvar sentence-end
	1685	If non-@code{nil}, the value should be a regular expression describing
	1686	the end of a sentence, including the whitespace following the
	1687	sentence. (All paragraph boundaries also end sentences, regardless.)
	1688
	1689	If the value is @code{nil}, the default, then the function
	1690	@code{sentence-end} has to construct the regexp. That is why you
	1691	should always call the function @code{sentence-end} to obtain the
	1692	regexp to be used to recognize the end of a sentence.
	1693	@end defvar
	1694
	1695	@defun sentence-end
	1696	This function returns the value of the variable @code{sentence-end},
	1697	if non-@code{nil}. Otherwise it returns a default value based on the
	1698	values of the variables @code{sentence-end-double-space}
	1699	(@pxref{Definition of sentence-end-double-space}),
	1700	@code{sentence-end-without-period} and
	1701	@code{sentence-end-without-space}.
	1702	@end defun
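
As a sketch of the recommended usage, the following example calls the
function @code{sentence-end} rather than reading the variable directly.
The buffer text is invented for illustration, and the example assumes
the default settings of the variables mentioned above. The search
moves point past the end of the first sentence and the whitespace that
follows it.

@example
@group
(with-temp-buffer
  (insert "First sentence.  Second one.")
  (goto-char (point-min))
  (re-search-forward (sentence-end) nil t)
  ;; @r{Return the text from point to the end of the buffer.}
  (buffer-substring (point) (point-max)))
@end group
@end example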
	1703
	1704	@ignore
	1705	arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
	1706	@end ignore