@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/searching
@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
@chapter Searching and Matching
@cindex searching

GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can examine the @dfn{match data} to
determine which text matched the whole regular expression or various
portions of it.

@menu
* String Search:: Search for an exact match.
* Regular Expressions:: Describing classes of strings.
* Regexp Search:: Searching for a match for a regexp.
* POSIX Regexps:: Searching POSIX-style for the longest match.
* Search and Replace:: Internals of @code{query-replace}.
* Match Data:: Finding out which part of the text matched
                  various parts of a regexp, after regexp search.
* Searching and Case:: Case-independent or case-significant searching.
* Standard Regexps:: Useful regexps for finding sentences, pages,...
@end menu

The @samp{skip-chars@dots{}} functions also perform a kind of searching.
@xref{Skipping Characters}.

@node String Search
@section Searching for Strings
@cindex string search

These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string; the
arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
is 1.

These search functions convert the search string to multibyte if the
buffer is multibyte; they convert the search string to unibyte if the
buffer is unibyte. @xref{Text Representations}.

@deffn Command search-forward string &optional limit noerror repeat
This function searches forward from point for an exact match for
@var{string}. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found, the
value and side effects depend on @var{noerror} (see below).
@c Emacs 19 feature

In the following example, point is initially at the beginning of the
line. Then @code{(search-forward "fox")} moves point after the last
letter of @samp{fox}:

@example
@group
---------- Buffer: foo ----------
@point{}The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------
@end group

@group
(search-forward "fox")
     @result{} 20

---------- Buffer: foo ----------
The quick brown fox@point{} jumped over the lazy dog.
---------- Buffer: foo ----------
@end group
@end example

The argument @var{limit} specifies the upper bound to the search. (It
must be a position in the current buffer.) No match extending after
that position is accepted. If @var{limit} is omitted or @code{nil}, it
defaults to the end of the accessible portion of the buffer.

@kindex search-failed
What happens when the search fails depends on the value of
@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
error is signaled. If @var{noerror} is @code{t}, @code{search-forward}
returns @code{nil} and does nothing. If @var{noerror} is neither
@code{nil} nor @code{t}, then @code{search-forward} moves point to the
upper bound and returns @code{nil}. (It would be more consistent now to
return the new position of point in that case, but some existing
programs may depend on a value of @code{nil}.)

If @var{repeat} is supplied (it must be a positive number), then the
search is repeated that many times (each time starting at the end of the
previous time's match). If these successive searches succeed, the
function succeeds, moving point and returning its new value. Otherwise
the search fails, leaving point where it started.
@end deffn
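
As a further sketch of @var{noerror} and @var{repeat}, here are some
calls that assume the same buffer @samp{foo} shown above, with point
initially at the beginning of the buffer. The calls are evaluated in
sequence, so each one starts wherever the previous one left point:

@example
@group
(search-forward "o" nil nil 2)    ; @r{Second @samp{o}, in @samp{fox}.}
     @result{} 19
(search-forward "cat" nil t)      ; @r{No match; point stays at 19.}
     @result{} nil
(condition-case nil
    (search-forward "cat")        ; @r{@var{noerror} is @code{nil}.}
  (search-failed 'failed))
     @result{} failed
(search-forward "cat" nil 'move)  ; @r{No match; point moves to the limit.}
     @result{} nil
(eobp)
     @result{} t
@end group
@end example

Because a failed search returns @code{nil} whether @var{noerror} is
@code{t} or some other non-@code{nil} value, a program that must
distinguish the two cases has to examine point afterward, as the call
to @code{eobp} does here.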

@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}. It is
just like @code{search-forward} except that it searches backwards and
leaves point at the beginning of the match.
@end deffn
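
For instance, assuming point starts at the end of the buffer
@samp{foo} used above for @code{search-forward}, a backward search for
@samp{fox} returns the position of the @samp{f} and leaves point
there:

@example
@group
(search-backward "fox")
     @result{} 17

---------- Buffer: foo ----------
The quick brown @point{}fox jumped over the lazy dog.
---------- Buffer: foo ----------
@end group
@end example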

@deffn Command word-search-forward string &optional limit noerror repeat
@cindex word search
This function searches forward from point for a ``word'' match for
@var{string}. If it finds a match, it sets point to the end of the
match found, and returns the new value of point.
@c Emacs 19 feature

Word matching regards @var{string} as a sequence of words, disregarding
punctuation that separates them. It searches the buffer for the same
sequence of words. Each word must match a whole word in the buffer
(searching for the word @samp{ball} does not match the word
@samp{balls}), but the details of punctuation and spacing are ignored
(searching for @samp{ball boy} does match @samp{ball. Boy!}).

In this example, point is initially at the beginning of the buffer; the
search leaves it between the @samp{y} and the @samp{!}.

@example
@group
---------- Buffer: foo ----------
@point{}He said "Please! Find
the ball boy!"
---------- Buffer: foo ----------
@end group

@group
(word-search-forward "Please find the ball, boy.")
     @result{} 35

---------- Buffer: foo ----------
He said "Please! Find
the ball boy@point{}!"
---------- Buffer: foo ----------
@end group
@end example

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search. The match found must
not extend after that position.

If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
an error if the search fails. If @var{noerror} is @code{t}, then it
returns @code{nil} instead of signaling an error. If @var{noerror} is
neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
end of the buffer) and returns @code{nil}.

If @var{repeat} is non-@code{nil}, then the search is repeated that many
times. Point is positioned at the end of the last match.
@end deffn
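
The following sketch uses a hypothetical buffer @samp{bar} to show
that a word search for @samp{ball} skips the word @samp{balls}, and
that punctuation in @var{string} is ignored. Point starts at the
beginning of the buffer:

@example
@group
---------- Buffer: bar ----------
@point{}balls and ball boy
---------- Buffer: bar ----------
@end group

@group
(word-search-forward "ball")       ; @r{Skips @samp{balls}.}
     @result{} 15
(goto-char (point-min))
     @result{} 1
(word-search-forward "ball, boy")  ; @r{The comma is ignored.}
     @result{} 19
@end group
@end example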

@deffn Command word-search-backward string &optional limit noerror repeat
This function searches backward from point for a word match to
@var{string}. This function is just like @code{word-search-forward}
except that it searches backward and normally leaves point at the
beginning of the match.
@end deffn

@node Regular Expressions
@section Regular Expressions
@cindex regular expression
@cindex regexp

A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to write
regexps; the following section says how to search for them.

@menu
* Syntax of Regexps:: Rules for writing regular expressions.
* Regexp Example:: Illustrates regular expression syntax.
* Regexp Functions:: Functions for operating on regular expressions.
@end menu

@node Syntax of Regexps
@subsection Syntax of Regular Expressions

Regular expressions have a syntax in which a few characters are
special constructs and the rest are @dfn{ordinary}. An ordinary
character is a simple regular expression that matches that character and
nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
special characters will be defined in the future. Any other character
appearing in a regular expression is ordinary, unless a @samp{\}
precedes it.

For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
@samp{o} is a regular expression that matches only @samp{o}.@refill

Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
the string.@refill

As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
the string @samp{fo}. Still trivial. To do something more powerful, you
need to use one of the special regular expression constructs.

@menu
* Regexp Special:: Special characters in regular expressions.
* Char Classes:: Character classes used in regular expressions.
* Regexp Backslash:: Backslash-sequences in regular expressions.
@end menu

@node Regexp Special
@subsubsection Special Characters in Regular Expressions

Here is a list of the characters that are special in a regular
expression.

@need 800
@table @asis
@item @samp{.}@: @r{(Period)}
@cindex @samp{.} in regexp
is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
@samp{b}.@refill

@item @samp{*}
@cindex @samp{*} in regexp
is not a construct by itself; it is a postfix operator that means to
match the preceding regular expression repetitively as many times as
possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
@samp{o}s).

@samp{*} always applies to the @emph{smallest} possible preceding
expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

The matcher processes a @samp{*} construct by matching, immediately, as
many repetitions as can be found. Then it continues with the rest of
the pattern. If that fails, backtracking occurs, discarding some of the
matches of the @samp{*}-modified construct in the hope that that will
make it possible to match the rest of the pattern. For example, in
matching @samp{ca*ar} against the string @samp{caaar}, the @samp{a*}
first tries to match all three @samp{a}s; but the rest of the pattern is
@samp{ar} and there is only @samp{r} left to match, so this try fails.
The next alternative is for @samp{a*} to match only two @samp{a}s. With
this choice, the rest of the regexp matches successfully.@refill

Nested repetition operators can be extremely slow if they specify
backtracking loops. For example, it could take hours for the regular
expression @samp{\(x+y*\)*a} to try to match the sequence
@samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}, before it ultimately fails.
The slowness is because Emacs must try each imaginable way of grouping
the 35 @samp{x}s before concluding that none of them can work. To make
sure your regular expressions run fast, check nested repetitions
carefully.

@item @samp{+}
@cindex @samp{+} in regexp
is a postfix operator, similar to @samp{*} except that it must match
the preceding expression at least once. So, for example, @samp{ca+r}
matches the strings @samp{car} and @samp{caaaar} but not the string
@samp{cr}, whereas @samp{ca*r} matches all three strings.

@item @samp{?}
@cindex @samp{?} in regexp
is a postfix operator, similar to @samp{*} except that it must match the
preceding expression either once or not at all. For example,
@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.

@item @samp{*?}, @samp{+?}, @samp{??}
These are ``non-greedy'' variants of the operators @samp{*}, @samp{+}
and @samp{?}. Where those operators match the largest possible
substring (consistent with matching the entire containing expression),
the non-greedy variants match the smallest possible substring
(consistent with matching the entire containing expression).

For example, the regular expression @samp{c[ad]*a} when applied to the
string @samp{cdaaada} matches the whole string; but the regular
expression @samp{c[ad]*?a}, applied to that same string, matches just
@samp{cda}. (The smallest possible match here for @samp{[ad]*?} that
permits the whole expression to match is @samp{d}.)

@item @samp{[ @dots{} ]}
@cindex character alternative (in regexp)
@cindex @samp{[} in regexp
@cindex @samp{]} in regexp
is a @dfn{character alternative}, which begins with @samp{[} and is
terminated by @samp{]}. In the simplest case, the characters between
the two brackets are what this character alternative can match.

Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
(including the empty string), from which it follows that @samp{c[ad]*r}
matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.

You can also include character ranges in a character alternative, by
writing the starting and ending characters with a @samp{-} between them.
Thus, @samp{[a-z]} matches any lower-case @sc{ascii} letter. Ranges may be
intermixed freely with individual characters, as in @samp{[a-z$%.]},
which matches any lower case @sc{ascii} letter or @samp{$}, @samp{%} or
period.

Note that the usual regexp special characters are not special inside a
character alternative. A completely different set of characters is
special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.

To include a @samp{]} in a character alternative, you must make it the
first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
To include a @samp{-}, write @samp{-} as the first or last character of
the character alternative, or put it after a range. Thus, @samp{[]-]}
matches both @samp{]} and @samp{-}.

To include @samp{^} in a character alternative, put it anywhere but at
the beginning.

The beginning and end of a range of multibyte characters must be in
the same character set (@pxref{Character Sets}). Thus,
@code{"[\x8e0-\x97c]"} is invalid because character 0x8e0 (@samp{a}
with grave accent) is in the Emacs character set for Latin-1 but the
character 0x97c (@samp{u} with diaeresis) is in the Emacs character
set for Latin-2. (We use Lisp string syntax to write that example,
and a few others in the next few paragraphs, in order to include hex
escape sequences in them.)

If a range starts with a unibyte character @var{c} and ends with a
multibyte character @var{c2}, the range is divided into two parts: one
is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
@var{c1} is the first character of the charset to which @var{c2}
belongs.

You cannot always match all non-@sc{ascii} characters with the regular
expression @code{"[\200-\377]"}. This works when searching a unibyte
buffer or string (@pxref{Text Representations}), but not in a multibyte
buffer or string, because many non-@sc{ascii} characters have codes
above octal 0377. However, the regular expression @code{"[^\000-\177]"}
does match all non-@sc{ascii} characters (see below regarding @samp{^}),
in both multibyte and unibyte representations, because only the
@sc{ascii} characters are excluded.

Starting in Emacs 21, a character alternative can also specify named
character classes (@pxref{Char Classes}). This is a POSIX feature whose
syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
to mentioning each of the characters in that class; but the latter is
not feasible in practice, since some classes include thousands of
different characters.

@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complemented character alternative}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
all characters @emph{except} letters and digits.

@samp{^} is not special in a character alternative unless it is the first
character. The character following the @samp{^} is treated as if it
were first (in other words, @samp{-} and @samp{]} are not special there).

A complemented character alternative can match a newline, unless newline is
mentioned as one of the characters not to match. This is in contrast to
the handling of regexps in programs such as @code{grep}.

@item @samp{^}
@cindex beginning of line in regexp
is a special character that matches the empty string, but only at the
beginning of a line in the text being matched. Otherwise it fails to
match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
the beginning of a line.

When matching a string instead of a buffer, @samp{^} matches at the
beginning of the string or after a newline character.

For historical compatibility reasons, @samp{^} can be used only at the
beginning of the regular expression, or after @samp{\(} or @samp{\|}.

@item @samp{$}
@cindex @samp{$} in regexp
@cindex end of line in regexp
is similar to @samp{^} but matches only at the end of a line. Thus,
@samp{x+$} matches a string of one @samp{x} or more at the end of a line.

When matching a string instead of a buffer, @samp{$} matches at the end
of the string or before a newline character.

For historical compatibility reasons, @samp{$} can be used only at the
end of the regular expression, or before @samp{\)} or @samp{\|}.

@item @samp{\}
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@samp{\}), and it introduces additional special constructs.

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression that matches only @samp{$}, and @samp{\[} is a regular
expression that matches only @samp{[}, and so on.

Note that @samp{\} also has special meaning in the read syntax of Lisp
strings (@pxref{String Type}), and must be quoted with @samp{\}. For
example, the regular expression that matches the @samp{\} character is
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
@samp{\} is @code{"\\\\"}.@refill
@end table
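
The following calls to @code{string-match} (@pxref{Regexp Search})
sketch several of these constructs; the sample strings are arbitrary.
@code{string-match} returns the index where the match starts, or
@code{nil}, and @code{match-end} returns the index just after the end
of the most recent match (@pxref{Match Data}). Because the regular
expressions are written as Lisp strings, each @samp{\} appears doubled.

@example
@group
(string-match "a.b" "xaxb")          ; @r{@samp{.} matches the @samp{x}.}
     @result{} 1
(string-match "ca+r" "cr")           ; @r{@samp{+} needs at least one @samp{a}.}
     @result{} nil
(string-match "ca?r" "car")
     @result{} 0
@end group

@group
(string-match "c[ad]*a" "cdaaada")   ; @r{Greedy: the whole string.}
     @result{} 0
(match-end 0)
     @result{} 7
(string-match "c[ad]*?a" "cdaaada")  ; @r{Non-greedy: just @samp{cda}.}
     @result{} 0
(match-end 0)
     @result{} 3
@end group

@group
(string-match "[]a]+" "x]a]")        ; @r{@samp{]} comes first.}
     @result{} 1
(string-match "[^a]+" "b\nc")        ; @r{Matches the newline too.}
     @result{} 0
(match-end 0)
     @result{} 3
(string-match "^b" "a\nb")           ; @r{@samp{^} after a newline.}
     @result{} 2
(string-match "\\$" "cost: $5")      ; @r{@samp{\$} is a literal @samp{$}.}
     @result{} 6
@end group
@end example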

@strong{Please note:} For historical compatibility, special characters
are treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
special character anyway, regardless of where it appears.@refill
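
For instance, both of these calls find the literal text @samp{*foo} in
an arbitrary sample string, but only the second one states that intent
explicitly:

@example
@group
(string-match "*foo" "a*foo")     ; @r{Relies on the historical rule.}
     @result{} 1
(string-match "\\*foo" "a*foo")   ; @r{Quotes the @samp{*}; preferred.}
     @result{} 1
@end group
@end example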

@node Char Classes
@subsubsection Character Classes
@cindex character classes in regexp

Here is a table of the classes you can use in a character alternative,
in Emacs 21, and what they mean:

@table @samp
@item [:ascii:]
This matches any @sc{ascii} (unibyte) character.
@item [:alnum:]
This matches any letter or digit. (At present, for multibyte
characters, it matches anything that has word syntax.)
@item [:alpha:]
This matches any letter. (At present, for multibyte characters, it
matches anything that has word syntax.)
@item [:blank:]
This matches space and tab only.
@item [:cntrl:]
This matches any @sc{ascii} control character.
@item [:digit:]
This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
matches any digit, as well as @samp{+} and @samp{-}.
@item [:graph:]
This matches graphic characters---everything except @sc{ascii} control
characters, space, and the delete character.
@item [:lower:]
This matches any lower-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:nonascii:]
This matches any non-@sc{ascii} (multibyte) character.
@item [:print:]
This matches printing characters---everything except @sc{ascii} control
characters and the delete character.
@item [:punct:]
This matches any punctuation character. (At present, for multibyte
characters, it matches anything that has non-word syntax.)
@item [:space:]
This matches any character that has whitespace syntax
(@pxref{Syntax Class Table}).
@item [:upper:]
This matches any upper-case letter, as determined by
the current case table (@pxref{Case Tables}).
@item [:word:]
This matches any character that has word syntax (@pxref{Syntax Class
Table}).
@item [:xdigit:]
This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
through @samp{f} and @samp{A} through @samp{F}.
@end table
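
As a brief sketch, these calls use the @samp{[:digit:]} and
@samp{[:xdigit:]} classes on arbitrary sample strings. A class name has
its special meaning only inside a character alternative, which is why
the brackets appear doubled:

@example
@group
(string-match "[[:digit:]]+" "abc123def")
     @result{} 3
(match-end 0)
     @result{} 6
(string-match "[-+[:digit:]]+" "x-12+3")
     @result{} 1
(match-end 0)
     @result{} 6
(string-match "[[:xdigit:]]+" "zz1Fg")
     @result{} 2
(match-end 0)
     @result{} 4
@end group
@end example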

@node Regexp Backslash
@subsubsection Backslash Constructs in Regular Expressions

For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
two-character sequences starting with @samp{\} that have special
meanings. (The character after the @samp{\} in such a sequence is
always ordinary when used on its own.) Here is a table of the special
@samp{\} constructs.

@table @samp
@item \|
@cindex @samp{\|} in regexp
@cindex regexp alternative
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} matches.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of
@samp{\|}, if you use the POSIX regular expression functions
(@pxref{POSIX Regexps}).

@item \@{@var{m}\@}
is a postfix operator that repeats the previous pattern exactly @var{m}
times. Thus, @samp{x\@{5\@}} matches the string @samp{xxxxx}
and nothing else. @samp{c[ad]\@{3\@}r} matches strings such as
@samp{caaar}, @samp{cdddr}, @samp{cadar}, and so on.

@item \@{@var{m},@var{n}\@}
is a more general postfix operator that specifies repetition with a
minimum of @var{m} repeats and a maximum of @var{n} repeats. If @var{m}
is omitted, the minimum is 0; if @var{n} is omitted, there is no
maximum.

For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
nothing else.@*
@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
@samp{\@{1,\@}} is equivalent to @samp{+}.

@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations. Thus,
the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
or @samp{barx}.

@item
To enclose a complicated expression for the postfix operators @samp{*},
@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
@samp{ba}, @samp{bana}, @samp{banana}, @samp{bananana}, etc., with any
number (zero or more) of @samp{na} strings.

@item
To record a matched substring for future reference with
@samp{\@var{digit}} (see below).
@end enumerate

This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that was assigned as a
second meaning to the same @samp{\( @dots{} \)} construct because, in
practice, there was usually no conflict between the two meanings. But
occasionally there is a conflict, and that led to the introduction of
shy groups.

@item \(?: @dots{} \)
is the @dfn{shy group} construct. A shy group serves the first two
purposes of an ordinary group (controlling the nesting of other
operators), but it does not get a number, so you cannot refer back to
its value with @samp{\@var{digit}}.

Shy groups are particularly useful for mechanically-constructed regular
expressions because they can be added automatically without altering the
numbering of any ordinary, non-shy groups.

@item \@var{digit}
matches the same text that matched the @var{digit}th occurrence of a
grouping (@samp{\( @dots{} \)}) construct.

In other words, after the end of a group, the matcher remembers the
beginning and end of the text matched by that group. Later on in the
regular expression you can use @samp{\} followed by @var{digit} to
match that same text, whatever it may have been.

The strings matching the first nine grouping constructs appearing in
the entire regular expression passed to a search or matching function
are assigned numbers 1 through 9 in the order that the open
parentheses appear in the regular expression. So you can use
@samp{\1} through @samp{\9} to refer to the text matched by the
corresponding grouping constructs.

For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves. The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

If a particular grouping construct in the regular expression was never
matched---for instance, if it appears inside of an alternative that
wasn't used, or inside of a repetition that repeated zero times---then
the corresponding @samp{\@var{digit}} construct never matches
anything. To use an artificial example, @samp{\(foo\(b*\)\|lose\)\2}
cannot match @samp{lose}: the second alternative inside the larger
group matches it, but then @samp{\2} is undefined and can't match
anything. But it can match @samp{foobb}, because the first
alternative matches @samp{foob} and @samp{\2} matches @samp{b}.

@item \w
@cindex @samp{\w} in regexp
matches any word-constituent character. The editor syntax table
determines which characters these are. @xref{Syntax Tables}.

@item \W
@cindex @samp{\W} in regexp
matches any character that is not a word constituent.

@item \s@var{code}
@cindex @samp{\s} in regexp
matches any character whose syntax is @var{code}. Here @var{code} is a
character that represents a syntax code: thus, @samp{w} for word
constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
etc. To represent whitespace syntax, use either @samp{-} or a space
character. @xref{Syntax Class Table}, for a list of syntax codes and
the characters that stand for them.

@item \S@var{code}
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.

@item \c@var{c}
matches any character whose category is @var{c}. Here @var{c} is a
character that represents a category: thus, @samp{c} for Chinese
characters or @samp{g} for Greek characters in the standard category
table.

@item \C@var{c}
matches any character whose category is not @var{c}.
@end table
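
The following sketch exercises several of these constructs with
@code{string-match} and @code{match-string} (@pxref{Match Data}) on
arbitrary sample strings. Remember that in Lisp string syntax each
@samp{\} of the regular expression is written as @samp{\\}:

@example
@group
(string-match "foo\\|bar" "xbar")
     @result{} 1
(string-match "c[ad]\\@{1,2\\@}r" "cadr")
     @result{} 0
(string-match "c[ad]\\@{1,2\\@}r" "caddr")  ; @r{Three letters between.}
     @result{} nil
(string-match "ba\\(na\\)*" "bananas")
     @result{} 0
(match-end 0)
     @result{} 6
@end group

@group
;; @r{The shy group gets no number, so @samp{\(c\)} is group 1.}
(string-match "\\(?:ab\\)+\\(c\\)" "ababc")
     @result{} 0
(match-string 1 "ababc")
     @result{} "c"
;; @r{A back reference to group 1, anchored to the whole string.}
(string-match "\\`\\(.*\\)\\1\\'" "abcabc")
     @result{} 0
(match-string 1 "abcabc")
     @result{} "abc"
@end group

@group
;; @r{The artificial example from above.}
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")
     @result{} nil
(string-match "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")
     @result{} 0
(match-string 2 "foobb")
     @result{} "b"
@end group
@end example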

The following regular expression constructs match the empty string---that is,
they don't use up any characters---but whether they match depends on the
context.

@table @samp
@item \`
@cindex @samp{\`} in regexp
matches the empty string, but only at the beginning
of the buffer or string being matched against.

@item \'
@cindex @samp{\'} in regexp
matches the empty string, but only at the end of
the buffer or string being matched against.

@item \=
@cindex @samp{\=} in regexp
matches the empty string, but only at point.
(This construct is not defined when matching against a string.)

@item \b
@cindex @samp{\b} in regexp
matches the empty string, but only at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bballs?\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@samp{\b} matches at the beginning or end of the buffer
regardless of what text appears next to it.

@item \B
@cindex @samp{\B} in regexp
matches the empty string, but @emph{not} at the beginning or
end of a word.

@item \<
@cindex @samp{\<} in regexp
matches the empty string, but only at the beginning of a word.
@samp{\<} matches at the beginning of the buffer only if a
word-constituent character follows.

@item \>
@cindex @samp{\>} in regexp
matches the empty string, but only at the end of a word. @samp{\>}
matches at the end of the buffer only if the contents end with a
word-constituent character.
@end table
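
For example, assuming a syntax table in which letters are word
constituents (as they are in the standard syntax table), these calls
show the word-boundary and string-boundary constructs on arbitrary
sample strings:

@example
@group
(string-match "\\bballs?\\b" "footballs ball")  ; @r{Skips @samp{footballs}.}
     @result{} 10
(string-match "\\<b" "ab b")
     @result{} 3
(string-match "\\`foo" "foofoo")
     @result{} 0
(string-match "foo\\'" "foofoo")
     @result{} 3
@end group
@end example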

@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
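
One way to observe this error, sketched here with
@code{condition-case} (@pxref{Handling Errors}), is to pass a regular
expression with an unbalanced @samp{[}; the handler returns the error
symbol:

@example
@group
(condition-case err
    (string-match "[abc" "abc")
  (invalid-regexp (car err)))
     @result{} invalid-regexp
@end group
@end example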
	666
	667	@node Regexp Example
	668	@comment node-name, next, previous, up
	669	@subsection Complex Regexp Example
	670
	671	Here is a complicated regexp, used by Emacs to recognize the end of a
	672	sentence together with any whitespace that follows. It is the value of
	673	the variable @code{sentence-end}.
	674
	675	First, we show the regexp as a string in Lisp syntax to distinguish
	676	spaces from tab characters. The string constant begins and ends with a
	677	double-quote. @samp{\"} stands for a double-quote as part of the
	678	string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
	679	tab and @samp{\n} for a newline.
	680
	681	@example
	682	"[.?!][]\"')@}]\$$\\\| $\\\|\t\\\| \$[ \t\n]"
	683	@end example
	684
	685	@noindent
	686	In contrast, if you evaluate the variable @code{sentence-end}, you
	687	will see the following:
	688
	689	@example
	690	@group
	691	sentence-end
	692	@result{} "[.?!][]\"')@}]*\$$\\\| $\\\| \\\| \$[
	693	]*"
	694	@end group
	695	@end example
	696
	697	@noindent
	698	In this output, tab and newline appear as themselves.
	699
	700	This regular expression contains four parts in succession and can be
	701	deciphered as follows:
	702
	703	@table @code
	704	@item [.?!]
	705	The first part of the pattern is a character alternative that matches
	706	any one of three characters: period, question mark, and exclamation
	707	mark. The match must begin with one of these three characters.
	708
	709	@item []\"')@}]*
The second part of the pattern matches any closing brackets, parentheses,
braces, and quotation marks, zero or more of them, that may follow the
period, question mark or exclamation mark. The @code{\"} is Lisp syntax
for a double-quote in a string. The @samp{*} at the end indicates that
the immediately preceding regular expression (a character alternative,
in this case) may be repeated zero or more times.
	716
@item \\($\\|@ $\\|\t\\|@ @ \\)
	718	The third part of the pattern matches the whitespace that follows the
	719	end of a sentence: the end of a line (optionally with a space), or a
	720	tab, or two spaces. The double backslashes mark the parentheses and
	721	vertical bars as regular expression syntax; the parentheses delimit a
	722	group and the vertical bars separate alternatives. The dollar sign is
	723	used to match the end of a line.
	724
	725	@item [ \t\n]*
	726	Finally, the last part of the pattern matches any additional whitespace
	727	beyond the minimum needed to end a sentence.
	728	@end table
	729
	730	@node Regexp Functions
	731	@subsection Regular Expression Functions
	732
	733	These functions operate on regular expressions.
	734
	735	@defun regexp-quote string
	736	This function returns a regular expression whose only exact match is
	737	@var{string}. Using this regular expression in @code{looking-at} will
	738	succeed only if the next characters in the buffer are @var{string};
	739	using it in a search function will succeed if the text being searched
	740	contains @var{string}.
	741
	742	This allows you to request an exact string match or search when calling
	743	a function that wants a regular expression.
	744
	745	@example
	746	@group
	747	(regexp-quote "^The cat$")
	748	@result{} "\\^The cat\\$"
	749	@end group
	750	@end example
	751
	752	One use of @code{regexp-quote} is to combine an exact string match with
	753	context described as a regular expression. For example, this searches
	754	for the string that is the value of @var{string}, surrounded by
	755	whitespace:
	756
	757	@example
	758	@group
(re-search-forward
 (concat "\\s-" (regexp-quote string) "\\s-"))
	761	@end group
	762	@end example
	763	@end defun
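
For instance, here is a sketch of counting literal occurrences of a string that may contain regexp special characters. The function @code{my-count-literal} is hypothetical; it steps from one match to the next by passing the previous @code{match-end} as the @var{start} argument of @code{string-match}:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-count-literal (string text)
  "Count non-overlapping occurrences of STRING, taken literally, in TEXT."
  (if (string= string "")
      0                        ; an empty regexp would match forever
    (let ((regexp (regexp-quote string))
          (start 0)
          (count 0))
      (while (string-match regexp text start)
        (setq count (1+ count)
              start (match-end 0)))
      count)))

(my-count-literal "1.5" "1.5 125 1x5 1.5")   ; => 2
@end example

Without @code{regexp-quote}, the @samp{.} in @samp{1.5} would match any character, so @samp{125} and @samp{1x5} would be counted as well.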
	764
	765	@defun regexp-opt strings &optional paren
	766	This function returns an efficient regular expression that will match
	767	any of the strings @var{strings}. This is useful when you need to make
	768	matching or searching as fast as possible---for example, for Font Lock
	769	mode.
	770
	771	If the optional argument @var{paren} is non-@code{nil}, then the
	772	returned regular expression is always enclosed by at least one
	773	parentheses-grouping construct.
	774
	775	This simplified definition of @code{regexp-opt} produces a
	776	regular expression which is equivalent to the actual value
	777	(but not as efficient):
	778
	779	@example
(defun regexp-opt (strings &optional paren)
  (let ((open-paren (if paren "\\(" ""))
        (close-paren (if paren "\\)" "")))
    (concat open-paren
            (mapconcat 'regexp-quote strings "\\|")
            close-paren)))
	786	@end example
	787	@end defun
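
One way to use @code{regexp-opt} is to test whether a string is exactly one of a fixed set of words. The sketch below assumes a case-sensitive comparison is wanted; @code{my-exact-member-p} is a hypothetical name. It anchors the optimized regexp with @samp{\`} and @samp{\'}, and passes a non-@code{nil} @var{paren} so that those anchors apply to the whole alternation rather than only to its first and last branches:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-exact-member-p (word strings)
  "Return t if WORD is exactly one of STRINGS, compared case-sensitively."
  (let ((case-fold-search nil)
        ;; PAREN is t so that the anchors below apply to every
        ;; alternative, not just the first and last ones.
        (regexp (concat "\\`" (regexp-opt strings t) "\\'")))
    (and (string-match regexp word) t)))

(my-exact-member-p "category" '("cat" "category" "dog"))  ; => t
(my-exact-member-p "cats" '("cat" "category" "dog"))      ; => nil
@end example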
	788
	789	@defun regexp-opt-depth regexp
	790	This function returns the total number of grouping constructs
	791	(parenthesized expressions) in @var{regexp}.
	792	@end defun
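
@code{regexp-opt-depth} is useful when you embed the output of @code{regexp-opt} in a larger regexp and then refer to a later group by number, because you should not assume how many groups @code{regexp-opt} produced. The following sketch illustrates this; @code{my-keyword-argument} and its keyword list are hypothetical:

@example
(regexp-opt-depth "\\(foo\\)-\\(bar\\(baz\\)?\\)")   ; => 3

;; Hypothetical helper, not part of Emacs.
(defun my-keyword-argument (text)
  "Return the word following if, else or while in TEXT, or nil."
  (let* ((keywords (regexp-opt '("if" "else" "while") t))
         (regexp (concat keywords "\\s-+\\(\\sw+\\)"))
         ;; The argument group is numbered after every group
         ;; that regexp-opt put inside KEYWORDS.
         (arg-group (1+ (regexp-opt-depth keywords))))
    (when (string-match regexp text)
      (match-string arg-group text))))

(my-keyword-argument "while running")   ; => "running"
@end example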
	793
	794	@node Regexp Search
	795	@section Regular Expression Searching
	796	@cindex regular expression searching
	797	@cindex regexp searching
	798	@cindex searching for regexp
	799
	800	In GNU Emacs, you can search for the next match for a regular
	801	expression either incrementally or not. For incremental search
	802	commands, see @ref{Regexp Search, , Regular Expression Search, emacs,
	803	The GNU Emacs Manual}. Here we describe only the search functions
	804	useful in programs. The principal one is @code{re-search-forward}.
	805
	806	These search functions convert the regular expression to multibyte if
	807	the buffer is multibyte; they convert the regular expression to unibyte
	808	if the buffer is unibyte. @xref{Text Representations}.
	809
	810	@deffn Command re-search-forward regexp &optional limit noerror repeat
	811	This function searches forward in the current buffer for a string of
	812	text that is matched by the regular expression @var{regexp}. The
	813	function skips over any amount of text that is not matched by
	814	@var{regexp}, and leaves point at the end of the first match found.
	815	It returns the new value of point.
	816
	817	If @var{limit} is non-@code{nil} (it must be a position in the current
	818	buffer), then it is the upper bound to the search. No match extending
	819	after that position is accepted.
	820
	821	If @var{repeat} is supplied (it must be a positive number), then the
	822	search is repeated that many times (each time starting at the end of the
	823	previous time's match). If all these successive searches succeed, the
	824	function succeeds, moving point and returning its new value. Otherwise
	825	the function fails.
	826
	827	What happens when the function fails depends on the value of
	828	@var{noerror}. If @var{noerror} is @code{nil}, a @code{search-failed}
	829	error is signaled. If @var{noerror} is @code{t},
	830	@code{re-search-forward} does nothing and returns @code{nil}. If
	831	@var{noerror} is neither @code{nil} nor @code{t}, then
	832	@code{re-search-forward} moves point to @var{limit} (or the end of the
	833	buffer) and returns @code{nil}.
	834
	835	In the following example, point is initially before the @samp{T}.
	836	Evaluating the search call moves point to the end of that line (between
	837	the @samp{t} of @samp{hat} and the newline).
	838
	839	@example
	840	@group
	841	---------- Buffer: foo ----------
	842	I read "@point{}The cat in the hat
	843	comes back" twice.
	844	---------- Buffer: foo ----------
	845	@end group
	846
	847	@group
	848	(re-search-forward "[a-z]+" nil t 5)
	849	@result{} 27
	850
	851	---------- Buffer: foo ----------
	852	I read "The cat in the hat@point{}
	853	comes back" twice.
	854	---------- Buffer: foo ----------
	855	@end group
	856	@end example
	857	@end deffn
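
Passing @code{t} for @var{noerror} makes @code{re-search-forward} convenient as the test of a @code{while} loop that visits every match. Here is a minimal sketch of that idiom; @code{my-collect-matches} is a hypothetical helper, and it assumes @var{regexp} cannot match the empty string:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-collect-matches (regexp)
  "Return a list of the texts matching REGEXP after point, in buffer order.
REGEXP should not be able to match the empty string, or the loop
would never advance.  Point is left unchanged."
  (let ((matches nil))
    (save-excursion
      ;; NOERROR is t, so the search returns nil instead of signaling
      ;; search-failed once there are no more matches.
      (while (re-search-forward regexp nil t)
        (push (match-string 0) matches)))
    (nreverse matches)))

(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char (point-min))
  (my-collect-matches "\\<[a-z]at\\>"))
;; => ("cat" "hat")
@end example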
	858
	859	@deffn Command re-search-backward regexp &optional limit noerror repeat
	860	This function searches backward in the current buffer for a string of
	861	text that is matched by the regular expression @var{regexp}, leaving
	862	point at the beginning of the first text found.
	863
	864	This function is analogous to @code{re-search-forward}, but they are not
	865	simple mirror images. @code{re-search-forward} finds the match whose
	866	beginning is as close as possible to the starting point. If
	867	@code{re-search-backward} were a perfect mirror image, it would find the
	868	match whose end is as close as possible. However, in fact it finds the
	869	match whose beginning is as close as possible. The reason for this is that
	870	matching a regular expression at a given spot always works from
	871	beginning to end, and starts at a specified beginning position.
	872
	873	A true mirror-image of @code{re-search-forward} would require a special
	874	feature for matching regular expressions from end to beginning. It's
	875	not worth the trouble of implementing that.
	876	@end deffn
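
The following sketch makes the difference concrete. It searches the text @samp{baaa} for @samp{a+} in both directions; @code{my-compare-directions} is a hypothetical helper written for this illustration. Searching backward from the end finds the match that begins at the last @samp{a}, so it is only one character long, whereas searching forward finds all three:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-compare-directions (text regexp)
  "Search TEXT for REGEXP backward from its end and forward from its start."
  (with-temp-buffer
    (insert text)
    (list
     (progn (goto-char (point-max))
            (re-search-backward regexp)
            (match-string 0))
     (progn (goto-char (point-min))
            (re-search-forward regexp)
            (match-string 0)))))

(my-compare-directions "baaa" "a+")   ; => ("a" "aaa")
@end example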
	877
	878	@defun string-match regexp string &optional start
	879	This function returns the index of the start of the first match for
	880	the regular expression @var{regexp} in @var{string}, or @code{nil} if
	881	there is no match. If @var{start} is non-@code{nil}, the search starts
	882	at that index in @var{string}.
	883
	884	For example,
	885
	886	@example
	887	@group
	888	(string-match
	889	"quick" "The quick brown fox jumped quickly.")
	890	@result{} 4
	891	@end group
	892	@group
	893	(string-match
	894	"quick" "The quick brown fox jumped quickly." 8)
	895	@result{} 27
	896	@end group
	897	@end example
	898
	899	@noindent
	900	The index of the first character of the
	901	string is 0, the index of the second character is 1, and so on.
	902
	903	After this function returns, the index of the first character beyond
	904	the match is available as @code{(match-end 0)}. @xref{Match Data}.
	905
	906	@example
	907	@group
	908	(string-match
	909	"quick" "The quick brown fox jumped quickly." 8)
	910	@result{} 27
	911	@end group
	912
	913	@group
	914	(match-end 0)
	915	@result{} 32
	916	@end group
	917	@end example
	918	@end defun
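
Because @var{start} lets you resume where the previous match ended, you can find every match in a string by looping on @code{string-match} and @code{match-end}. This is a minimal sketch; @code{my-string-match-positions} is a hypothetical name. It advances at least one character after each match, so a regexp that can match the empty string does not loop forever:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-string-match-positions (regexp string)
  "Return a list of (START . END) index pairs, one per match of REGEXP in STRING."
  (let ((start 0)
        (positions nil))
    (while (and (< start (length string))
                (string-match regexp string start))
      (push (cons (match-beginning 0) (match-end 0)) positions)
      ;; Resume after this match; step one character past an empty match.
      (setq start (max (match-end 0) (1+ (match-beginning 0)))))
    (nreverse positions)))

(my-string-match-positions "quick" "The quick brown fox jumped quickly.")
;; => ((4 . 9) (27 . 32))
@end example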
	919
	920	@defun looking-at regexp
	921	This function determines whether the text in the current buffer directly
	922	following point matches the regular expression @var{regexp}. ``Directly
	923	following'' means precisely that: the search is ``anchored'' and it can
	924	succeed only starting with the first character following point. The
	925	result is @code{t} if so, @code{nil} otherwise.
	926
	927	This function does not move point, but it updates the match data, which
	928	you can access using @code{match-beginning} and @code{match-end}.
	929	@xref{Match Data}.
	930
	931	In this example, point is located directly before the @samp{T}. If it
	932	were anywhere else, the result would be @code{nil}.
	933
	934	@example
	935	@group
	936	---------- Buffer: foo ----------
	937	I read "@point{}The cat in the hat
	938	comes back" twice.
	939	---------- Buffer: foo ----------
	940
	941	(looking-at "The cat in the hat$")
	942	@result{} t
	943	@end group
	944	@end example
	945	@end defun
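
A typical use of @code{looking-at} is to test what comes next without moving. In this sketch, the hypothetical helper @code{my-keyword-at-point-p} checks for a whole word at point, using @code{regexp-quote} so the word is taken literally and @samp{\>} to require the end of a word:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-keyword-at-point-p (keyword)
  "Return non-nil if the word KEYWORD begins exactly at point.
Point does not move, but the match data are updated."
  (looking-at (concat (regexp-quote keyword) "\\>")))

(with-temp-buffer
  (insert "I read \"The cat in the hat")
  (goto-char 9)                         ; just before "The"
  (list (my-keyword-at-point-p "The")   ; anchored at point: t
        (my-keyword-at-point-p "cat")   ; "cat" is later, not at point: nil
        (point)))                       ; point is still 9
;; => (t nil 9)
@end example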
	946
	947	@node POSIX Regexps
	948	@section POSIX Regular Expression Searching
	949
	950	The usual regular expression functions do backtracking when necessary
to handle the @samp{\|} and repetition constructs, but they continue
	952	this only until they find @emph{some} match. Then they succeed and
	953	report the first match found.
	954
	955	This section describes alternative search functions which perform the
	956	full backtracking specified by the POSIX standard for regular expression
	957	matching. They continue backtracking until they have tried all
	958	possibilities and found all matches, so they can report the longest
	959	match, as required by POSIX. This is much slower, so use these
	960	functions only when you really need the longest match.
	961
	962	@defun posix-search-forward regexp &optional limit noerror repeat
	963	This is like @code{re-search-forward} except that it performs the full
	964	backtracking specified by the POSIX standard for regular expression
	965	matching.
	966	@end defun
	967
	968	@defun posix-search-backward regexp &optional limit noerror repeat
	969	This is like @code{re-search-backward} except that it performs the full
	970	backtracking specified by the POSIX standard for regular expression
	971	matching.
	972	@end defun
	973
	974	@defun posix-looking-at regexp
	975	This is like @code{looking-at} except that it performs the full
	976	backtracking specified by the POSIX standard for regular expression
	977	matching.
	978	@end defun
	979
	980	@defun posix-string-match regexp string &optional start
	981	This is like @code{string-match} except that it performs the full
	982	backtracking specified by the POSIX standard for regular expression
	983	matching.
	984	@end defun
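
The difference shows up with alternatives whose first branch matches a prefix of what a later branch could match. In the following sketch (the helper name @code{my-first-and-longest} is hypothetical), @code{string-match} stops at the first alternative that succeeds, while @code{posix-string-match} keeps backtracking to find the longest match:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-first-and-longest (regexp string)
  "Compare string-match and posix-string-match for REGEXP in STRING."
  (list (and (string-match regexp string)
             (match-string 0 string))
        (and (posix-string-match regexp string)
             (match-string 0 string))))

(my-first-and-longest "a\\|ab" "ab")   ; => ("a" "ab")
@end example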
	985
	986	@ignore
	987	@deffn Command delete-matching-lines regexp
	988	This function is identical to @code{delete-non-matching-lines}, save
	989	that it deletes what @code{delete-non-matching-lines} keeps.
	990
	991	In the example below, point is located on the first line of text.
	992
	993	@example
	994	@group
	995	---------- Buffer: foo ----------
	996	We hold these truths
	997	to be self-evident,
	998	that all men are created
	999	equal, and that they are
	1000	---------- Buffer: foo ----------
	1001	@end group
	1002
	1003	@group
	1004	(delete-matching-lines "the")
	1005	@result{} nil
	1006
	1007	---------- Buffer: foo ----------
	1008	to be self-evident,
	1009	that all men are created
	1010	---------- Buffer: foo ----------
	1011	@end group
	1012	@end example
	1013	@end deffn
	1014
	1015	@deffn Command flush-lines regexp
	1016	This function is the same as @code{delete-matching-lines}.
	1017	@end deffn
	1018
	1019	@defun delete-non-matching-lines regexp
	1020	This function deletes all lines following point which don't
	1021	contain a match for the regular expression @var{regexp}.
	1022	@end defun
	1023
	1024	@deffn Command keep-lines regexp
	1025	This function is the same as @code{delete-non-matching-lines}.
	1026	@end deffn
	1027
	1028	@deffn Command how-many regexp
This function counts the number of matches for @var{regexp} in the
current buffer following point. It prints this number in the echo
area, returning the string printed.
	1032	@end deffn
	1033
	1034	@deffn Command count-matches regexp
	1035	This function is a synonym of @code{how-many}.
	1036	@end deffn
	1037
	1038	@deffn Command list-matching-lines regexp &optional nlines
	1039	This function is a synonym of @code{occur}.
	1040	Show all lines following point containing a match for @var{regexp}.
	1041	Display each line with @var{nlines} lines before and after,
	1042	or @code{-}@var{nlines} before if @var{nlines} is negative.
	1043	@var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
	1044	Interactively it is the prefix arg.
	1045
	1046	The lines are shown in a buffer named @samp{Occur}.
	1047	It serves as a menu to find any of the occurrences in this buffer.
	1048	@kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
	1049	@end deffn
	1050
	1051	@defopt list-matching-lines-default-context-lines
The default number of context lines to include around a
@code{list-matching-lines} match. A negative number means to include
that many lines before the match. A positive number means to include
that many lines both before and after. The default value is 0.
	1056	@end defopt
	1057	@end ignore
	1058
	1059	@node Search and Replace
	1060	@section Search and Replace
	1061	@cindex replacement
	1062
	1063	@defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map start end
	1064	This function is the guts of @code{query-replace} and related
	1065	commands. It searches for occurrences of @var{from-string} in the
	1066	text between positions @var{start} and @var{end} and replaces some or
	1067	all of them. If @var{start} is @code{nil} (or omitted), point is used
	1068	instead, and the buffer's end is used for @var{end}.
	1069
	1070	If @var{query-flag} is @code{nil}, it replaces all
	1071	occurrences; otherwise, it asks the user what to do about each one.
	1072
	1073	If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
	1074	considered a regular expression; otherwise, it must match literally. If
@var{delimited-flag} is non-@code{nil}, then only matches surrounded by
word boundaries are replaced.
	1077
	1078	The argument @var{replacements} specifies what to replace occurrences
	1079	with. If it is a string, that string is used. It can also be a list of
	1080	strings, to be used in cyclic order.
	1081
	1082	If @var{replacements} is a cons cell, @code{(@var{function}
	1083	. @var{data})}, this means to call @var{function} after each match to
	1084	get the replacement text. This function is called with two arguments:
	1085	@var{data}, and the number of replacements already made.
	1086
	1087	If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
	1088	it specifies how many times to use each of the strings in the
@var{replacements} list before advancing cyclically to the next one.
	1090
If @var{from-string} contains upper-case letters, then
@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
it uses the @var{replacements} without altering their case.
	1094
	1095	Normally, the keymap @code{query-replace-map} defines the possible user
	1096	responses for queries. The argument @var{map}, if non-@code{nil}, is a
	1097	keymap to use instead of @code{query-replace-map}.
	1098	@end defun
	1099
	1100	@defvar query-replace-map
	1101	This variable holds a special keymap that defines the valid user
	1102	responses for @code{query-replace} and related functions, as well as
	1103	@code{y-or-n-p} and @code{map-y-or-n-p}. It is unusual in two ways:
	1104
	1105	@itemize @bullet
	1106	@item
	1107	The ``key bindings'' are not commands, just symbols that are meaningful
	1108	to the functions that use this map.
	1109
	1110	@item
	1111	Prefix keys are not supported; each key binding must be for a
	1112	single-event key sequence. This is because the functions don't use
	1113	@code{read-key-sequence} to get the input; instead, they read a single
	1114	event and look it up ``by hand.''
	1115	@end itemize
	1116	@end defvar
	1117
	1118	Here are the meaningful ``bindings'' for @code{query-replace-map}.
	1119	Several of them are meaningful only for @code{query-replace} and
	1120	friends.
	1121
	1122	@table @code
	1123	@item act
	1124	Do take the action being considered---in other words, ``yes.''
	1125
	1126	@item skip
	1127	Do not take action for this question---in other words, ``no.''
	1128
	1129	@item exit
	1130	Answer this question ``no,'' and give up on the entire series of
	1131	questions, assuming that the answers will be ``no.''
	1132
	1133	@item act-and-exit
	1134	Answer this question ``yes,'' and give up on the entire series of
	1135	questions, assuming that subsequent answers will be ``no.''
	1136
	1137	@item act-and-show
	1138	Answer this question ``yes,'' but show the results---don't advance yet
	1139	to the next question.
	1140
	1141	@item automatic
	1142	Answer this question and all subsequent questions in the series with
	1143	``yes,'' without further user interaction.
	1144
	1145	@item backup
	1146	Move back to the previous place that a question was asked about.
	1147
	1148	@item edit
	1149	Enter a recursive edit to deal with this question---instead of any
	1150	other action that would normally be taken.
	1151
	1152	@item delete-and-edit
	1153	Delete the text being considered, then enter a recursive edit to replace
	1154	it.
	1155
	1156	@item recenter
	1157	Redisplay and center the window, then ask the same question again.
	1158
	1159	@item quit
	1160	Perform a quit right away. Only @code{y-or-n-p} and related functions
	1161	use this answer.
	1162
	1163	@item help
	1164	Display some help, then ask again.
	1165	@end table
	1166
	1167	@node Match Data
	1168	@section The Match Data
	1169	@cindex match data
	1170
	1171	Emacs keeps track of the start and end positions of the segments of
	1172	text found during a regular expression search. This means, for example,
	1173	that you can search for a complex pattern, such as a date in an Rmail
	1174	message, and then extract parts of the match under control of the
	1175	pattern.
	1176
	1177	Because the match data normally describe the most recent search only,
	1178	you must be careful not to do another search inadvertently between the
	1179	search you wish to refer back to and the use of the match data. If you
	1180	can't avoid another intervening search, you must save and restore the
	1181	match data around it, to prevent it from being overwritten.
	1182
	1183	@menu
	1184	* Replacing Match:: Replacing a substring that was matched.
	1185	* Simple Match Data:: Accessing single items of match data,
	1186	such as where a particular subexpression started.
	1187	* Entire Match Data:: Accessing the entire match data at once, as a list.
	1188	* Saving Match Data:: Saving and restoring the match data.
	1189	@end menu
	1190
	1191	@node Replacing Match
	1192	@subsection Replacing the Text that Matched
	1193
	1194	This function replaces the text matched by the last search with
	1195	@var{replacement}.
	1196
	1197	@cindex case in replacements
	1198	@defun replace-match replacement &optional fixedcase literal string subexp
	1199	This function replaces the text in the buffer (or in @var{string}) that
	1200	was matched by the last search. It replaces that text with
	1201	@var{replacement}.
	1202
	1203	If you did the last search in a buffer, you should specify @code{nil}
	1204	for @var{string}. Then @code{replace-match} does the replacement by
	1205	editing the buffer; it leaves point at the end of the replacement text,
	1206	and returns @code{t}.
	1207
	1208	If you did the search in a string, pass the same string as @var{string}.
	1209	Then @code{replace-match} does the replacement by constructing and
	1210	returning a new string.
	1211
	1212	If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
	1213	the replacement text without case conversion; otherwise, it converts
	1214	the replacement text depending upon the capitalization of the text to
	1215	be replaced. If the original text is all upper case, this converts
	1216	the replacement text to upper case. If all words of the original text
	1217	are capitalized, this capitalizes all the words of the replacement
	1218	text. If all the words are one-letter and they are all upper case,
	1219	they are treated as capitalized words rather than all-upper-case
	1220	words.
	1221
	1222	If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
	1223	exactly as it is, the only alterations being case changes as needed.
	1224	If it is @code{nil} (the default), then the character @samp{\} is treated
	1225	specially. If a @samp{\} appears in @var{replacement}, then it must be
	1226	part of one of the following sequences:
	1227
	1228	@table @asis
	1229	@item @samp{\&}
	1230	@cindex @samp{&} in replacement
	1231	@samp{\&} stands for the entire text being replaced.
	1232
	1233	@item @samp{\@var{n}}
	1234	@cindex @samp{\@var{n}} in replacement
	1235	@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
	1236	matched the @var{n}th subexpression in the original regexp.
Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
	1238
	1239	@item @samp{\\}
	1240	@cindex @samp{\} in replacement
	1241	@samp{\\} stands for a single @samp{\} in the replacement text.
	1242	@end table
	1243
	1244	These substitutions occur after case conversion, if any,
	1245	so the strings they substitute are never case-converted.
	1246
	1247	If @var{subexp} is non-@code{nil}, that says to replace just
	1248	subexpression number @var{subexp} of the regexp that was matched, not
the entire match. For example, after matching @samp{foo \(ba*r\)},
calling @code{replace-match} with 1 as @var{subexp} means to replace
just the text that matched @samp{\(ba*r\)}.
	1252	@end defun
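
Here is a sketch of @code{replace-match} applied to strings, covering @samp{\@var{n}} references, case conversion, and @var{subexp}. The helper names @code{my-swap-first-words} and @code{my-replace-keeping-case} are hypothetical, and the case-conversion results assume that @code{case-fold-search} is non-@code{nil}, which the second helper binds explicitly:

@example
;; Hypothetical helpers, not part of Emacs.
(defun my-swap-first-words (string)
  "Swap the first two words of STRING, returning a new string."
  (if (string-match "\\`\\(\\w+\\) \\(\\w+\\)" string)
      ;; FIXEDCASE is t; LITERAL is nil, so \2 and \1 are expanded.
      (replace-match "\\2 \\1" t nil string)
    string))

(defun my-replace-keeping-case (from to string)
  "Replace the first match of FROM in STRING with TO, adapting TO's case."
  (let ((case-fold-search t))
    (if (string-match (regexp-quote from) string)
        ;; FIXEDCASE is nil, so the case of TO follows the matched text.
        (replace-match to nil t string)
      string)))

(my-swap-first-words "hello world again")          ; => "world hello again"
(my-replace-keeping-case "hello" "bye" "HELLO!")   ; => "BYE!"
(my-replace-keeping-case "hello" "bye" "Hello!")   ; => "Bye!"

;; Replacing only subexpression 1, as in the foo \(ba*r\) example above:
(let ((s "foo bar"))
  (string-match "foo \\(ba*r\\)" s)
  (replace-match "quux" t t s 1))                  ; => "foo quux"
@end example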
	1253
	1254	@node Simple Match Data
	1255	@subsection Simple Match Data Access
	1256
	1257	This section explains how to use the match data to find out what was
	1258	matched by the last search or match operation.
	1259
	1260	You can ask about the entire matching text, or about a particular
	1261	parenthetical subexpression of a regular expression. The @var{count}
	1262	argument in the functions below specifies which. If @var{count} is
	1263	zero, you are asking about the entire match. If @var{count} is
	1264	positive, it specifies which subexpression you want.
	1265
	1266	Recall that the subexpressions of a regular expression are those
expressions grouped with escaped parentheses, @samp{\(@dots{}\)}. The
	1268	@var{count}th subexpression is found by counting occurrences of
	1269	@samp{\(} from the beginning of the whole regular expression. The first
	1270	subexpression is numbered 1, the second 2, and so on. Only regular
	1271	expressions can have subexpressions---after a simple string search, the
	1272	only information available is about the entire match.
	1273
	1274	A search which fails may or may not alter the match data. In the
	1275	past, a failing search did not do this, but we may change it in the
	1276	future.
	1277
	1278	@defun match-string count &optional in-string
	1279	This function returns, as a string, the text matched in the last search
	1280	or match operation. It returns the entire text if @var{count} is zero,
	1281	or just the portion corresponding to the @var{count}th parenthetical
	1282	subexpression, if @var{count} is positive.
	1283
	1284	If the last such operation was done against a string with
	1285	@code{string-match}, then you should pass the same string as the
	1286	argument @var{in-string}. After a buffer search or match,
	1287	you should omit @var{in-string} or pass @code{nil} for it; but you
	1288	should make sure that the current buffer when you call
	1289	@code{match-string} is the one in which you did the searching or
	1290	matching.
	1291
	1292	The value is @code{nil} if @var{count} is out of range, or for a
subexpression inside a @samp{\|} alternative that wasn't used or a
	1294	repetition that repeated zero times.
	1295	@end defun
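
The @code{nil} value for an unused alternative lets a program tell which branch of a @samp{\|} matched. This sketch assumes a token is either all digits or all lower-case letters; @code{my-classify-token} is a hypothetical name:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-classify-token (token)
  "Return (number . TEXT) or (word . TEXT) for TOKEN, or nil."
  (let ((case-fold-search nil))
    (when (string-match "\\`\\(\\([0-9]+\\)\\|\\([a-z]+\\)\\)\\'" token)
      ;; Only one alternative was used, so the group inside the other
      ;; alternative did not match and match-string returns nil for it.
      (cond ((match-string 2 token) (cons 'number (match-string 2 token)))
            ((match-string 3 token) (cons 'word (match-string 3 token)))))))

(my-classify-token "42")    ; => (number . "42")
(my-classify-token "abc")   ; => (word . "abc")
(my-classify-token "4a")    ; => nil
@end example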
	1296
	1297	@defun match-string-no-properties count &optional in-string
	1298	This function is like @code{match-string} except that the result
	1299	has no text properties.
	1300	@end defun
	1301
	1302	@defun match-beginning count
	1303	This function returns the position of the start of text matched by the
	1304	last regular expression searched for, or a subexpression of it.
	1305
	1306	If @var{count} is zero, then the value is the position of the start of
	1307	the entire match. Otherwise, @var{count} specifies a subexpression in
	1308	the regular expression, and the value of the function is the starting
	1309	position of the match for that subexpression.
	1310
The value is @code{nil} for a subexpression inside a @samp{\|}
	1312	alternative that wasn't used or a repetition that repeated zero times.
	1313	@end defun
	1314
	1315	@defun match-end count
	1316	This function is like @code{match-beginning} except that it returns the
	1317	position of the end of the match, rather than the position of the
	1318	beginning.
	1319	@end defun
	1320
	1321	Here is an example of using the match data, with a comment showing the
	1322	positions within the text:
	1323
	1324	@example
	1325	@group
(string-match "\\(qu\\)\\(ick\\)"
              "The quick fox jumped quickly.")
              ;0123456789
	1329	@result{} 4
	1330	@end group
	1331
	1332	@group
	1333	(match-string 0 "The quick fox jumped quickly.")
	1334	@result{} "quick"
	1335	(match-string 1 "The quick fox jumped quickly.")
	1336	@result{} "qu"
	1337	(match-string 2 "The quick fox jumped quickly.")
	1338	@result{} "ick"
	1339	@end group
	1340
	1341	@group
	1342	(match-beginning 1) ; @r{The beginning of the match}
	1343	@result{} 4 ; @r{with @samp{qu} is at index 4.}
	1344	@end group
	1345
	1346	@group
	1347	(match-beginning 2) ; @r{The beginning of the match}
	1348	@result{} 6 ; @r{with @samp{ick} is at index 6.}
	1349	@end group
	1350
	1351	@group
	1352	(match-end 1) ; @r{The end of the match}
	1353	@result{} 6 ; @r{with @samp{qu} is at index 6.}
	1354
	1355	(match-end 2) ; @r{The end of the match}
	1356	@result{} 9 ; @r{with @samp{ick} is at index 9.}
	1357	@end group
	1358	@end example
	1359
	1360	Here is another example. Point is initially located at the beginning
of the line. Searching moves point to between the space and the word
@samp{in}, which is position 17; that is the value @code{re-search-forward}
returns. The beginning of the entire match is at the 9th character of
the buffer (@samp{T}), and the beginning of the match for the first
subexpression is at the 13th character (@samp{c}).

@example
@group
(list
  (re-search-forward "The \\(cat \\)")
  (match-beginning 0)
  (match-beginning 1))
    @result{} (17 9 13)
	1373	@end group
	1374
	1375	@group
	1376	---------- Buffer: foo ----------
	1377	I read "The cat @point{}in the hat comes back" twice.
        ^   ^
        9  13
	1380	---------- Buffer: foo ----------
	1381	@end group
	1382	@end example
	1383
	1384	@noindent
	1385	(In this case, the index returned is a buffer position; the first
	1386	character of the buffer counts as 1.)
	1387
	1388	@node Entire Match Data
	1389	@subsection Accessing the Entire Match Data
	1390
	1391	The functions @code{match-data} and @code{set-match-data} read or
	1392	write the entire match data, all at once.
	1393
	1394	@defun match-data
	1395	This function returns a newly constructed list containing all the
	1396	information on what text the last search matched. Element zero is the
	1397	position of the beginning of the match for the whole expression; element
	1398	one is the position of the end of the match for the expression. The
	1399	next two elements are the positions of the beginning and end of the
	1400	match for the first subexpression, and so on. In general, element
	1401	@ifnottex
	1402	number 2@var{n}
	1403	@end ifnottex
	1404	@tex
	1405	number {\mathsurround=0pt $2n$}
	1406	@end tex
	1407	corresponds to @code{(match-beginning @var{n})}; and
	1408	element
	1409	@ifnottex
	1410	number 2@var{n} + 1
	1411	@end ifnottex
	1412	@tex
	1413	number {\mathsurround=0pt $2n+1$}
	1414	@end tex
	1415	corresponds to @code{(match-end @var{n})}.
	1416
	1417	All the elements are markers or @code{nil} if matching was done on a
	1418	buffer, and all are integers or @code{nil} if matching was done on a
	1419	string with @code{string-match}.
	1420
	1421	As always, there must be no possibility of intervening searches between
	1422	the call to a search function and the call to @code{match-data} that is
	1423	intended to access the match data for that search.
	1424
	1425	@example
	1426	@group
	1427	(match-data)
	1428	@result{} (#<marker at 9 in foo>
	1429	#<marker at 17 in foo>
	1430	#<marker at 13 in foo>
	1431	#<marker at 17 in foo>)
	1432	@end group
	1433	@end example
	1434	@end defun
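
After a string match, the elements are integers, so the list is easy to inspect. This sketch reuses the @samp{qu}/@samp{ick} example from the previous section and groups the elements into @code{(@var{beginning} . @var{end})} pairs with a hypothetical helper, @code{my-match-data-pairs}:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-match-data-pairs ()
  "Return the current match data as a list of (BEGINNING . END) pairs."
  (let ((data (match-data))
        (pairs nil))
    ;; Elements 2N and 2N+1 belong to subexpression N.
    (while data
      (push (cons (car data) (cadr data)) pairs)
      (setq data (cddr data)))
    (nreverse pairs)))

(progn
  (string-match "\\(qu\\)\\(ick\\)" "The quick fox jumped quickly.")
  (my-match-data-pairs))
;; => ((4 . 9) (4 . 6) (6 . 9))
@end example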
	1435
	1436	@defun set-match-data match-list
	1437	This function sets the match data from the elements of @var{match-list},
	1438	which should be a list that was the value of a previous call to
	1439	@code{match-data}. (More precisely, anything that has the same format
	1440	will work.)
	1441
	1442	If @var{match-list} refers to a buffer that doesn't exist, you don't get
	1443	an error; that sets the match data in a meaningless but harmless way.
	1444
	1445	@findex store-match-data
	1446	@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
	1447	@end defun
	1448
	1449	@node Saving Match Data
	1450	@subsection Saving and Restoring the Match Data
	1451
	1452	When you call a function that may do a search, you may need to save
	1453	and restore the match data around that call, if you want to preserve the
	1454	match data from an earlier search for later use. Here is an example
	1455	that shows the problem that arises if you fail to save the match data:
	1456
	1457	@example
	1458	@group
(re-search-forward "The \\(cat \\)")
	1460	@result{} 48
	1461	(foo) ; @r{Perhaps @code{foo} does}
	1462	; @r{more searching.}
	1463	(match-end 0)
	1464	@result{} 61 ; @r{Unexpected result---not 48!}
	1465	@end group
	1466	@end example
	1467
	1468	You can save and restore the match data with @code{save-match-data}:
	1469
	1470	@defmac save-match-data body@dots{}
	1471	This macro executes @var{body}, saving and restoring the match
	1472	data around it.
	1473	@end defmac
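
A function that searches only as an internal detail can wrap its body in @code{save-match-data}, so its callers may keep using their own match data afterward. Here is a minimal sketch; @code{my-count-words} is a hypothetical helper:

@example
;; Hypothetical helper, not part of Emacs.
(defun my-count-words (string)
  "Count the words in STRING without disturbing the caller's match data."
  (save-match-data
    (let ((count 0)
          (start 0))
      (while (string-match "\\w+" string start)
        (setq count (1+ count)
              start (match-end 0)))
      count)))

(let ((text "The quick fox"))
  (string-match "\\(qu\\)ick" text)
  (my-count-words "one two three")   ; searches, but restores match data
  (match-string 1 text))             ; => "qu"
@end example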
	1474
	1475	You could use @code{set-match-data} together with @code{match-data} to
imitate the effect of the macro @code{save-match-data}. Here is
	1477	how:
	1478
	1479	@example
	1480	@group
(let ((data (match-data)))
  (unwind-protect
      @dots{}          ; @r{Ok to change the original match data.}
    (set-match-data data)))
	1485	@end group
	1486	@end example
	1487
	1488	Emacs automatically saves and restores the match data when it runs
	1489	process filter functions (@pxref{Filter Functions}) and process
	1490	sentinels (@pxref{Sentinels}).

@ignore
Here is a function which restores the match data provided the buffer
associated with it still exists.

@smallexample
@group
(defun restore-match-data (data)
@c It is incorrect to split the first line of a doc string.
@c If there's a problem here, it should be solved in some other way.
  "Restore the match data DATA unless the buffer is missing."
  (catch 'foo
    (let ((d data))
@end group
      (while d
        (and (car d)
             (null (marker-buffer (car d)))
@group
             ;; @file{match-data} @r{buffer is deleted.}
             (throw 'foo nil))
        (setq d (cdr d)))
      (set-match-data data))))
@end group
@end smallexample
@end ignore

@node Searching and Case
@section Searching and Case
@cindex searching and case

By default, searches in Emacs ignore the case of the text they are
searching through; if you specify searching for @samp{FOO}, then
@samp{Foo} or @samp{foo} is also considered a match. This applies to
regular expressions, too; thus, @samp{[aB]} would match @samp{a} or
@samp{A} or @samp{b} or @samp{B}.
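
The following sketch illustrates case folding; the helper
@code{my-count-occurrences} is hypothetical. It binds
@code{case-fold-search} around calls to @code{search-forward}; the same
variable also controls regular expression functions such as
@code{string-match}:

@example
;; Hypothetical helper (not part of Emacs): count occurrences of
;; STRING in TEXT, folding case when FOLD is non-nil.
(defun my-count-occurrences (string text fold)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search fold)
          (n 0))
      (while (search-forward string nil t)
        (setq n (1+ n)))
      n)))
@end example

@example
@group
(my-count-occurrences "FOO" "foo Foo FOO" t)
     @result{} 3
(my-count-occurrences "FOO" "foo Foo FOO" nil)
     @result{} 1
(let ((case-fold-search t))
  (string-match "[aB]" "xxb"))
     @result{} 2
@end group
@end example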

If you do not want this feature, set the variable
@code{case-fold-search} to @code{nil}. Then all letters must match
exactly, including case. This is a buffer-local variable; setting it
affects only the current buffer. (@xref{Intro to
Buffer-Local}.) Alternatively, you may change the value of
@code{default-case-fold-search}, which is the default value of
@code{case-fold-search} for buffers that do not override it.
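
The following sketch shows this buffer-local behavior; the function
@code{my-case-fold-demo} is hypothetical. It sets
@code{case-fold-search} in one new buffer and then compares that value
with the value in a second new buffer and with the default value:

@example
;; Hypothetical demonstration (not part of Emacs).
(defun my-case-fold-demo ()
  "Set `case-fold-search' in one new buffer and report the results.
Return a list of the value in that buffer, the value in a second
new buffer, and the default value."
  (let ((a (generate-new-buffer "demo-a"))
        (b (generate-new-buffer "demo-b")))
    (unwind-protect
        (progn
          (with-current-buffer a
            (setq case-fold-search nil))
          (list (buffer-local-value 'case-fold-search a)
                (buffer-local-value 'case-fold-search b)
                (default-value 'case-fold-search)))
      (kill-buffer a)
      (kill-buffer b))))
@end example

Assuming the standard default value of @code{t},
@code{(my-case-fold-demo)} returns @code{(nil t t)}.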

Note that the user-level incremental search feature handles case
distinctions differently. When given a lower case letter, it looks for
a match of either case, but when given an upper case letter, it looks
for an upper case letter only. But this has nothing to do with the
searching functions used in Lisp code.
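
A Lisp program that wants the incremental search behavior has to
implement it itself. One way, sketched below, is to bind
@code{case-fold-search} to a non-@code{nil} value only when the search
string is unchanged by @code{downcase}, that is, when it contains no
upper case letters. The function name
@code{my-search-forward-smart-case} is hypothetical, and this is only an
approximation of the rule, not the code that incremental search uses:

@example
;; Hypothetical function (not part of Emacs): fold case only if
;; STRING contains no upper case letters, like incremental search.
(defun my-search-forward-smart-case (string &optional bound)
  "Search forward for STRING; return point after the match, or nil."
  (let ((case-fold-search (equal string (downcase string))))
    (search-forward string bound t)))
@end example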

@defopt case-replace
This variable determines whether the replacement functions should
preserve case. If the variable is @code{nil}, that means to use the
replacement text verbatim. A non-@code{nil} value means to convert the
case of the replacement text according to the text being replaced.

The replacement commands use this variable to decide what to pass as
the @var{fixedcase} argument of the function @code{replace-match}.
@xref{Replacing Match}.
@end defopt
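
The following sketch shows the effect of this variable. The function
@code{my-replace-first} is hypothetical; it passes the inverse of
@code{case-replace} as the @var{fixedcase} argument of
@code{replace-match}, so a non-@code{nil} @code{case-replace} lets the
replacement follow the case of the text it replaces:

@example
;; Hypothetical function (not part of Emacs): replace the first
;; match for REGEXP after point with TO, ignoring case when
;; searching.  FIXEDCASE is nil exactly when case-replace is
;; non-nil; the final t means TO is inserted literally.
(defun my-replace-first (regexp to)
  (when (let ((case-fold-search t))
          (re-search-forward regexp nil t))
    (replace-match to (not case-replace) t)
    t))
@end example

@example
@group
(with-temp-buffer
  (insert "Hello world")
  (goto-char (point-min))
  (let ((case-replace t))
    (my-replace-first "hello" "goodbye"))
  (buffer-string))
     @result{} "Goodbye world"
@end group
@end example

With @code{case-replace} bound to @code{nil} instead, the result is
@code{"goodbye world"}; replacing @samp{HELLO} with a non-@code{nil}
@code{case-replace} gives @samp{GOODBYE}.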

@defopt case-fold-search
This buffer-local variable determines whether searches should ignore
case. If the variable is @code{nil}, they do not ignore case; otherwise
they do ignore case.
@end defopt

@defvar default-case-fold-search
The value of this variable is the default value for
@code{case-fold-search} in buffers that do not override it. This is the
same as @code{(default-value 'case-fold-search)}.
@end defvar

@node Standard Regexps
@section Standard Regular Expressions Used in Editing
@cindex regexps used standardly in editing
@cindex standard regexps used in editing

This section describes some variables that hold regular expressions
used for certain purposes in editing:

@defvar page-delimiter
This is the regular expression describing line-beginnings that separate
pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
@code{"^\C-l"}); this matches a line that starts with a formfeed
character.
@end defvar
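
For example, a program can count the page boundaries in a buffer by
searching repeatedly for @code{page-delimiter}. This minimal sketch
assumes that the value never matches the empty string, which is true of
the default; the function name @code{my-count-page-delimiters} is
hypothetical:

@example
;; Hypothetical function (not part of Emacs).  It assumes that
;; page-delimiter never matches the empty string; otherwise the
;; loop below would not advance.
(defun my-count-page-delimiters ()
  "Return the number of matches for `page-delimiter' in the buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((n 0))
      (while (re-search-forward page-delimiter nil t)
        (setq n (1+ n)))
      n)))
@end example

For instance, in a buffer whose text is
@code{"one\n\ftwo\n\fthree\n"}, this function returns 2.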

The following two regular expressions should @emph{not} assume the
match always starts at the beginning of a line; they should not use
@samp{^} to anchor the match. Most often, the paragraph commands do
check for a match only at the beginning of a line, which means that
@samp{^} would be superfluous. When there is a nonzero left margin,
they accept matches that start after the left margin. In that case, a
@samp{^} would be incorrect. However, a @samp{^} is harmless in modes
where a left margin is never used.

@defvar paragraph-separate
This is the regular expression for recognizing the beginning of a line
that separates paragraphs. (If you change this, you may have to
change @code{paragraph-start} also.) The default value is
@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
spaces, tabs, and form feeds (after its left margin).
@end defvar

@defvar paragraph-start
This is the regular expression for recognizing the beginning of a line
that starts @emph{or} separates paragraphs. The default value is
@w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab,
newline, or form feed (after its left margin).
@end defvar
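
Because @code{paragraph-start} also matches lines that separate
paragraphs, a program that classifies lines should test
@code{paragraph-separate} first. The following minimal sketch does
that, testing each regexp with @code{looking-at} after
@code{move-to-left-margin}, in keeping with the left-margin convention
described above. The function name @code{my-paragraph-line-kind} is
hypothetical:

@example
;; Hypothetical function (not part of Emacs).  Check
;; paragraph-separate first, since paragraph-start matches
;; separator lines too.
(defun my-paragraph-line-kind ()
  "Return `separate', `start', or nil for the current line."
  (save-excursion
    (move-to-left-margin)
    (cond ((looking-at paragraph-separate) 'separate)
          ((looking-at paragraph-start) 'start))))
@end example

The usage example below binds both variables to the default values
documented here, in case your Emacs uses different ones:

@example
@group
(with-temp-buffer
  (let ((paragraph-start "[ \t\n\f]")
        (paragraph-separate "[ \t\f]*$")
        (kinds nil))
    (insert "Plain text\n  Indented\n\nMore\n")
    (goto-char (point-min))
    (dotimes (_ 4)
      (push (my-paragraph-line-kind) kinds)
      (forward-line 1))
    (nreverse kinds)))
     @result{} (nil start separate nil)
@end group
@end example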

@defvar sentence-end
This is the regular expression describing the end of a sentence. (All
paragraph boundaries also end sentences, regardless.) The default value
is:

@example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
@end example

This means a period, question mark or exclamation mark, followed
optionally by any number of closing quotes, parentheses, brackets or
braces, followed by the end of a line (possibly after a single space),
a tab, or two spaces, and then by any further tabs, spaces or newlines.

For a detailed explanation of this regular expression, see @ref{Regexp
Example}.
@end defvar
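
As an illustration, the following sketch counts sentence ends by
searching for this regular expression. It binds the regexp explicitly
rather than reading @code{sentence-end}, since the variable's value may
differ in your Emacs; the function name @code{my-count-sentence-ends} is
hypothetical. A period followed by a single space, as in
@samp{Mr.@: Smith}, does not count as a sentence end:

@example
;; Hypothetical function (not part of Emacs).  The regexp is the
;; default value of sentence-end documented above.
(defun my-count-sentence-ends ()
  "Return the number of sentence ends in the current buffer."
  (let ((re "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
        (n 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward re nil t)
        (setq n (1+ n))))
    n))
@end example

@example
@group
(with-temp-buffer
  (insert "Mr. Smith left.  He said \"Bye!\"  Then silence.")
  (my-count-sentence-ends))
     @result{} 3
@end group
@end example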