| 1 | @c This is part of the Emacs manual. |
| 2 | @c Copyright (C) 1985, 1986, 1987, 1993, 1994, 1995, 1997, 2000, 2001, 2002, |
| 3 | @c 2003, 2004, 2005, 2006 Free Software Foundation, Inc. |
| 4 | @c See file emacs.texi for copying conditions. |
| 5 | @node Search, Fixit, Display, Top |
| 6 | @chapter Searching and Replacement |
| 7 | @cindex searching |
| 8 | @cindex finding strings within text |
| 9 | |
| 10 | Like other editors, Emacs has commands for searching for occurrences of |
| 11 | a string. The principal search command is unusual in that it is |
| 12 | @dfn{incremental}; it begins to search before you have finished typing the |
| 13 | search string. There are also nonincremental search commands more like |
| 14 | those of other editors. |
| 15 | |
| 16 | Besides the usual @code{replace-string} command that finds all |
| 17 | occurrences of one string and replaces them with another, Emacs has a |
| 18 | more flexible replacement command called @code{query-replace}, which |
| 19 | asks interactively which occurrences to replace. There are also |
| 20 | commands to find and operate on all matches for a pattern. |
| 21 | |
| 22 | You can also search multiple files under control of a tags |
| 23 | table (@pxref{Tags Search}) or through the Dired @kbd{A} command |
| 24 | (@pxref{Operating on Files}), or ask the @code{grep} program to do it |
| 25 | (@pxref{Grep Searching}). |
| 26 | |
| 27 | |
| 28 | @menu |
| 29 | * Incremental Search:: Search happens as you type the string. |
| 30 | * Nonincremental Search:: Specify entire string and then search. |
| 31 | * Word Search:: Search for sequence of words. |
| 32 | * Regexp Search:: Search for match for a regexp. |
| 33 | * Regexps:: Syntax of regular expressions. |
| 34 | * Regexp Backslash:: Regular expression constructs starting with `\'. |
| 35 | * Regexp Example:: A complex regular expression explained. |
| 36 | * Search Case:: To ignore case while searching, or not. |
| 37 | * Replace:: Search, and replace some or all matches. |
| 38 | * Other Repeating Search:: Operating on all matches for some regexp. |
| 39 | @end menu |
| 40 | |
| 41 | @node Incremental Search |
| 42 | @section Incremental Search |
| 43 | @cindex incremental search |
| 44 | @cindex isearch |
| 45 | |
| 46 | An incremental search begins searching as soon as you type the first |
| 47 | character of the search string. As you type in the search string, Emacs |
| 48 | shows you where the string (as you have typed it so far) would be |
| 49 | found. When you have typed enough characters to identify the place you |
| 50 | want, you can stop. Depending on what you plan to do next, you may or |
| 51 | may not need to terminate the search explicitly with @key{RET}. |
| 52 | |
| 53 | @table @kbd |
| 54 | @item C-s |
| 55 | Incremental search forward (@code{isearch-forward}). |
| 56 | @item C-r |
| 57 | Incremental search backward (@code{isearch-backward}). |
| 58 | @end table |
| 59 | |
| 60 | @menu |
| 61 | * Basic Isearch:: Basic incremental search commands. |
| 62 | * Repeat Isearch:: Searching for the same string again. |
| 63 | * Error in Isearch:: When your string is not found. |
| 64 | * Special Isearch:: Special input in incremental search. |
| 65 | * Non-ASCII Isearch:: How to search for non-ASCII characters. |
| 66 | * Isearch Yank:: Commands that grab text into the search string |
| 67 | or else edit the search string. |
| 68 | * Highlight Isearch:: Isearch highlights the other possible matches. |
| 69 | * Isearch Scroll:: Scrolling during an incremental search. |
| 70 | * Slow Isearch:: Incremental search features for slow terminals. |
| 71 | @end menu |
| 72 | |
| 73 | @node Basic Isearch |
| 74 | @subsection Basics of Incremental Search |
| 75 | |
| 76 | @kindex C-s |
| 77 | @findex isearch-forward |
| 78 | @kbd{C-s} starts a forward incremental search. It reads characters |
| 79 | from the keyboard, and moves point past the next occurrence of those |
| 80 | characters. If you type @kbd{C-s} and then @kbd{F}, that puts the |
| 81 | cursor after the first @samp{F} (the first following the starting point, since |
| 82 | this is a forward search). Then if you type an @kbd{O}, you will see |
| 83 | the cursor move to just after the first @samp{FO} (the @samp{F} in that |
| 84 | @samp{FO} may or may not be the first @samp{F}). After another |
| 85 | @kbd{O}, the cursor moves to just after the first @samp{FOO} after the place |
| 86 | where you started the search. At each step, the buffer text that |
| 87 | matches the search string is highlighted, if the terminal can do that; |
| 88 | the current search string is always displayed in the echo area. |
| 89 | |
| 90 | If you make a mistake in typing the search string, you can cancel |
| 91 | characters with @key{DEL}. Each @key{DEL} cancels the last character of |
| 92 | the search string. This does not happen until Emacs is ready to read another |
| 93 | input character; first it must either find, or fail to find, the character |
| 94 | you want to erase. If you do not want to wait for this to happen, use |
| 95 | @kbd{C-g} as described below. |
| 96 | |
| 97 | When you are satisfied with the place you have reached, you can type |
| 98 | @key{RET}, which stops searching, leaving the cursor where the search |
| 99 | brought it. Also, any command not specially meaningful in searches |
| 100 | stops the searching and is then executed. Thus, typing @kbd{C-a} |
| 101 | would exit the search and then move to the beginning of the line. |
| 102 | @key{RET} is necessary only if the next command you want to type is a |
| 103 | printing character, @key{DEL}, @key{RET}, or another character that is |
| 104 | special within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s}, |
| 105 | @kbd{C-y}, @kbd{M-y}, @kbd{M-r}, @kbd{M-c}, @kbd{M-e}, and some other |
| 106 | meta-characters). |
| 107 | |
| 108 | When you exit the incremental search, it sets the mark where point |
| 109 | @emph{was} before the search. That is convenient for moving back |
| 110 | there. In Transient Mark mode, incremental search sets the mark |
| 111 | without activating it, and does so only if the mark is not already |
| 112 | active. |
| 113 | |
| 114 | @node Repeat Isearch |
| 115 | @subsection Repeating Incremental Search |
| 116 | |
| 117 | Sometimes you search for @samp{FOO} and find one, but not the one you |
| 118 | expected to find. There was a second @samp{FOO} that you forgot |
| 119 | about, before the one you were aiming for. In this event, type |
| 120 | another @kbd{C-s} to move to the next occurrence of the search string. |
| 121 | You can repeat this any number of times. If you overshoot, you can |
| 122 | cancel some @kbd{C-s} characters with @key{DEL}. |
| 123 | |
| 124 | After you exit a search, you can search for the same string again by |
| 125 | typing just @kbd{C-s C-s}: the first @kbd{C-s} is the key that invokes |
| 126 | incremental search, and the second @kbd{C-s} means ``search again.'' |
| 127 | |
| 128 | If a search is failing and you ask to repeat it by typing another |
| 129 | @kbd{C-s}, it starts again from the beginning of the buffer. |
| 130 | Repeating a failing reverse search with @kbd{C-r} starts again from |
| 131 | the end. This is called @dfn{wrapping around}, and @samp{Wrapped} |
| 132 | appears in the search prompt once this has happened. If you keep on |
| 133 | going past the original starting point of the search, it changes to |
| 134 | @samp{Overwrapped}, which means that you are revisiting matches that |
| 135 | you have already seen. |
| 136 | |
| 137 | To reuse earlier search strings, use the @dfn{search ring}. The |
| 138 | commands @kbd{M-p} and @kbd{M-n} move through the ring to pick a search |
| 139 | string to reuse. These commands leave the selected search ring element |
| 140 | in the minibuffer, where you can edit it. To edit the current search |
| 141 | string in the minibuffer without replacing it with items from the |
| 142 | search ring, type @kbd{M-e}. Type @kbd{C-s} or @kbd{C-r} |
| 143 | to terminate editing the string and search for it. |
| 144 | |
| 145 | You can change to searching backwards with @kbd{C-r}. For instance, |
| 146 | if you are searching forward but you realize you were looking for |
| 147 | something above the starting point, you can do this. Repeated |
| 148 | @kbd{C-r} keeps looking for more occurrences backwards. A @kbd{C-s} |
| 149 | starts going forwards again. @kbd{C-r} in a search can be canceled |
| 150 | with @key{DEL}. |
| 151 | |
| 152 | @kindex C-r |
| 153 | @findex isearch-backward |
| 154 | If you know initially that you want to search backwards, you can use |
| 155 | @kbd{C-r} instead of @kbd{C-s} to start the search, because @kbd{C-r} |
| 156 | as a key runs a command (@code{isearch-backward}) to search backward. |
| 157 | A backward search finds matches that end before the starting point, |
| 158 | just as a forward search finds matches that begin after it. |
| 159 | |
| 160 | @node Error in Isearch |
| 161 | @subsection Errors in Incremental Search |
| 162 | |
| 163 | If your string is not found at all, the echo area says @samp{Failing |
| 164 | I-Search}. The cursor is after the place where Emacs found as much of your |
| 165 | string as it could. Thus, if you search for @samp{FOOT}, and there is no |
| 166 | @samp{FOOT}, you might see the cursor after the @samp{FOO} in @samp{FOOL}. |
| 167 | At this point there are several things you can do. If your string was |
| 168 | mistyped, you can rub some of it out and correct it. If you like the place |
| 169 | you have found, you can type @key{RET} or some other Emacs command to |
| 170 | remain there. Or you can type @kbd{C-g}, which |
| 171 | removes from the search string the characters that could not be found (the |
| 172 | @samp{T} in @samp{FOOT}), leaving those that were found (the @samp{FOO} in |
| 173 | @samp{FOOT}). A second @kbd{C-g} at that point cancels the search |
| 174 | entirely, returning point to where it was when the search started. |
| 175 | |
| 176 | @cindex quitting (in search) |
| 177 | The @kbd{C-g} ``quit'' character does special things during searches; |
| 178 | just what it does depends on the status of the search. If the search has |
| 179 | found what you specified and is waiting for input, @kbd{C-g} cancels the |
| 180 | entire search. The cursor moves back to where you started the search. If |
| 181 | @kbd{C-g} is typed when there are characters in the search string that have |
| 182 | not been found---because Emacs is still searching for them, or because it |
| 183 | has failed to find them---then the search string characters which have not |
| 184 | been found are discarded from the search string. With them gone, the |
| 185 | search is now successful and waiting for more input, so a second @kbd{C-g} |
| 186 | will cancel the entire search. |
| 187 | |
| 188 | @node Special Isearch |
| 189 | @subsection Special Input for Incremental Search |
| 190 | |
| 191 | An upper-case letter in the search string makes the search |
| 192 | case-sensitive. If you delete the upper-case character from the search |
| 193 | string, it ceases to have this effect. @xref{Search Case}. |
| 194 | |
| 195 | To search for a newline, type @kbd{C-j}. To search for another |
| 196 | control character, such as control-S or carriage return, you must quote |
| 197 | it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous |
| 198 | to its use for insertion (@pxref{Inserting Text}): it causes the |
| 199 | following character to be treated the way any ``ordinary'' character is |
| 200 | treated in the same context. You can also specify a character by its |
| 201 | octal code: enter @kbd{C-q} followed by a sequence of octal digits. |
| 202 | |
| 203 | @kbd{M-%} typed in incremental search invokes @code{query-replace} |
| 204 | or @code{query-replace-regexp} (depending on search mode) with the |
| 205 | current search string used as the string to replace. @xref{Query |
| 206 | Replace}. |
| 207 | |
| 208 | Entering @key{RET} when the search string is empty launches |
| 209 | nonincremental search (@pxref{Nonincremental Search}). |
| 210 | |
| 211 | @vindex isearch-mode-map |
| 212 | To customize the special characters that incremental search understands, |
| 213 | alter their bindings in the keymap @code{isearch-mode-map}. For a list |
| 214 | of bindings, look at the documentation of @code{isearch-mode} with |
| 215 | @kbd{C-h f isearch-mode @key{RET}}. |
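| | |
| | As a minimal sketch of such a customization, the following line in |
| | your init file (@pxref{Init File}) would make a function key repeat the |
| | search forward, as @kbd{C-s} does inside a search. The choice of |
| | @key{F5} is only an illustration, not a recommended binding: |
| | |
| | @example |
| | ;; F5 is an arbitrary key picked for this example. |
| | (define-key isearch-mode-map [f5] 'isearch-repeat-forward) |
| | @end example |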
| 216 | |
| 217 | @node Non-ASCII Isearch |
| 218 | @subsection Isearch for Non-@acronym{ASCII} Characters |
| 219 | @cindex searching for non-@acronym{ASCII} characters |
| 220 | @cindex input method, during incremental search |
| 221 | |
| 222 | To enter non-@acronym{ASCII} characters in an incremental search, |
| 223 | you can use @kbd{C-q} (see the previous section), but it is easier to |
| 224 | use an input method (@pxref{Input Methods}). If an input method is |
| 225 | enabled in the current buffer when you start the search, you can use |
| 226 | it in the search string also. Emacs indicates that by including the |
| 227 | input method mnemonic in its prompt, like this: |
| 228 | |
| 229 | @example |
| 230 | I-search [@var{im}]: |
| 231 | @end example |
| 232 | |
| 233 | @noindent |
| 234 | @findex isearch-toggle-input-method |
| 235 | @findex isearch-toggle-specified-input-method |
| 236 | where @var{im} is the mnemonic of the active input method. |
| 237 | |
| 238 | You can toggle (enable or disable) the input method while you type |
| 239 | the search string with @kbd{C-\} (@code{isearch-toggle-input-method}). |
| 240 | You can turn on a certain (non-default) input method with @kbd{C-^} |
| 241 | (@code{isearch-toggle-specified-input-method}), which prompts for the |
| 242 | name of the input method. The input method you enable during |
| 243 | incremental search remains enabled in the current buffer afterwards. |
| 244 | |
| 245 | @node Isearch Yank |
| 246 | @subsection Isearch Yanking |
| 247 | |
| 248 | The characters @kbd{C-w} and @kbd{C-y} can be used in incremental |
| 249 | search to grab text from the buffer into the search string. This |
| 250 | makes it convenient to search for another occurrence of text at point. |
| 251 | @kbd{C-w} copies the character or word after point as part of the |
| 252 | search string, advancing point over it. (Whether it copies a |
| 253 | character or a word is decided heuristically.) Another @kbd{C-s} to |
| 254 | repeat the search will then search for a string including that |
| 255 | character or word. |
| 256 | |
| 257 | @kbd{C-y} is similar to @kbd{C-w} but copies all the rest of the |
| 258 | current line into the search string. If point is already at the end |
| 259 | of a line, it grabs the entire next line. Both @kbd{C-y} and |
| 260 | @kbd{C-w} convert the text they copy to lower case if the search is |
| 261 | currently not case-sensitive; this is so the search remains |
| 262 | case-insensitive. |
| 263 | |
| 264 | @kbd{C-M-w} and @kbd{C-M-y} modify the search string by only one |
| 265 | character at a time: @kbd{C-M-w} deletes the last character from the |
| 266 | search string and @kbd{C-M-y} copies the character after point to the |
| 267 | end of the search string. An alternative method to add the character |
| 268 | after point into the search string is to enter the minibuffer with |
| 269 | @kbd{M-e} and to type @kbd{C-f} at the end of the search string in the |
| 270 | minibuffer. |
| 271 | |
| 272 | The character @kbd{M-y} copies text from the kill ring into the search |
| 273 | string. It uses the same text that @kbd{C-y} as a command would yank. |
| 274 | @kbd{Mouse-2} in the echo area does the same. |
| 275 | @xref{Yanking}. |
| 276 | |
| 277 | @node Highlight Isearch |
| 278 | @subsection Lazy Search Highlighting |
| 279 | @cindex lazy search highlighting |
| 280 | @vindex isearch-lazy-highlight |
| 281 | |
| 282 | When you pause for a little while during incremental search, it |
| 283 | highlights all other possible matches for the search string. This |
| 284 | makes it easier to anticipate where you can get to by typing @kbd{C-s} |
| 285 | or @kbd{C-r} to repeat the search. The short delay before highlighting |
| 286 | other matches helps indicate which match is the current one. |
| 287 | If you don't like this feature, you can turn it off by setting |
| 288 | @code{isearch-lazy-highlight} to @code{nil}. |
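| | |
| | For instance, a line like the following in your init file |
| | (@pxref{Init File}) should turn the feature off in all future sessions. |
| | This is only a sketch; setting the variable through the customization |
| | buffer works equally well: |
| | |
| | @example |
| | ;; Do not highlight the other matches during incremental search. |
| | (setq isearch-lazy-highlight nil) |
| | @end example |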
| 289 | |
| 290 | @cindex faces for highlighting search matches |
| 291 | You can control how this highlighting looks by customizing the faces |
| 292 | @code{isearch} (used for the current match) and @code{lazy-highlight} |
| 293 | (for all the other matches). |
| 294 | |
| 295 | @node Isearch Scroll |
| 296 | @subsection Scrolling During Incremental Search |
| 297 | |
| 298 | You can enable the use of vertical scrolling during incremental |
| 299 | search (without exiting the search) by setting the customizable |
| 300 | variable @code{isearch-allow-scroll} to a non-@code{nil} value. This |
| 301 | applies to using the vertical scroll-bar and to certain keyboard |
| 302 | commands such as @kbd{@key{PRIOR}} (@code{scroll-down}), |
| 303 | @kbd{@key{NEXT}} (@code{scroll-up}) and @kbd{C-l} (@code{recenter}). |
| 304 | You must run these commands via their key sequences to stay in the |
| 305 | search---typing @kbd{M-x} will terminate the search. You can give |
| 306 | prefix arguments to these commands in the usual way. |
| 307 | |
| 308 | This feature won't let you scroll the current match out of visibility, |
| 309 | however. |
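| | |
| | As a sketch, assuming you prefer to set the variable from your init |
| | file rather than through the customization buffer, this line enables |
| | scrolling during incremental search: |
| | |
| | @example |
| | ;; Any non-nil value enables scrolling without exiting the search. |
| | (setq isearch-allow-scroll t) |
| | @end example |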
| 310 | |
| 311 | The feature also affects some other commands, such as @kbd{C-x 2} |
| 312 | (@code{split-window-vertically}) and @kbd{C-x ^} |
| 313 | (@code{enlarge-window}) which don't exactly scroll but do affect where |
| 314 | the text appears on the screen. In general, it applies to any command |
| 315 | whose name has a non-@code{nil} @code{isearch-scroll} property. So you |
| 316 | can control which commands are affected by changing these properties. |
| 317 | |
| 318 | For example, to make @kbd{C-h l} usable within an incremental search |
| 319 | in all future Emacs sessions, use @kbd{C-h c} to find what command it |
| 320 | runs. (You type @kbd{C-h c C-h l}; it says @code{view-lossage}.) |
| 321 | Then you can put the following line in your @file{.emacs} file |
| 322 | (@pxref{Init File}): |
| 323 | |
| 324 | @example |
| 325 | (put 'view-lossage 'isearch-scroll t) |
| 326 | @end example |
| 327 | |
| 328 | @noindent |
| 329 | This feature can be applied to any command that doesn't permanently |
| 330 | change point, the buffer contents, the match data, the current buffer, |
| 331 | or the selected window and frame. The command must not itself attempt |
| 332 | an incremental search. |
| 333 | |
| 334 | @node Slow Isearch |
| 335 | @subsection Slow Terminal Incremental Search |
| 336 | |
| 337 | Incremental search on a slow terminal uses a modified style of display |
| 338 | that is designed to take less time. Instead of redisplaying the buffer at |
| 339 | each place the search gets to, it creates a new single-line window and uses |
| 340 | that to display the line that the search has found. The single-line window |
| 341 | comes into play as soon as point moves outside of the text that is already |
| 342 | on the screen. |
| 343 | |
| 344 | When you terminate the search, the single-line window is removed. |
| 345 | Emacs then redisplays the window in which the search was done, to show |
| 346 | the new position of point. |
| 347 | |
| 348 | @vindex search-slow-speed |
| 349 | The slow terminal style of display is used when the terminal baud rate is |
| 350 | less than or equal to the value of the variable @code{search-slow-speed}, |
| 351 | initially 1200. See also the discussion of the variable @code{baud-rate} |
| 352 | (@pxref{baud-rate,, Customization of Display}). |
| 353 | |
| 354 | @vindex search-slow-window-lines |
| 355 | The number of lines to use in slow terminal search display is controlled |
| 356 | by the variable @code{search-slow-window-lines}. Its normal value is 1. |
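| | |
| | As an illustration, the following init file lines would use the slow |
| | terminal display at 2400 baud or less, and give it three lines instead |
| | of one. Both numbers are arbitrary values chosen for this sketch, not |
| | recommendations: |
| | |
| | @example |
| | ;; Example values only; the defaults are 1200 and 1. |
| | (setq search-slow-speed 2400) |
| | (setq search-slow-window-lines 3) |
| | @end example |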
| 357 | |
| 358 | @node Nonincremental Search |
| 359 | @section Nonincremental Search |
| 360 | @cindex nonincremental search |
| 361 | |
| 362 | Emacs also has conventional nonincremental search commands, which require |
| 363 | you to type the entire search string before searching begins. |
| 364 | |
| 365 | @table @kbd |
| 366 | @item C-s @key{RET} @var{string} @key{RET} |
| 367 | Search for @var{string}. |
| 368 | @item C-r @key{RET} @var{string} @key{RET} |
| 369 | Search backward for @var{string}. |
| 370 | @end table |
| 371 | |
| 372 | To do a nonincremental search, first type @kbd{C-s @key{RET}}. This |
| 373 | enters the minibuffer to read the search string; terminate the string |
| 374 | with @key{RET}, and then the search takes place. If the string is not |
| 375 | found, the search command signals an error. |
| 376 | |
| 377 | When you type @kbd{C-s @key{RET}}, the @kbd{C-s} invokes incremental |
| 378 | search as usual. That command is specially programmed to invoke |
| 379 | nonincremental search, @code{search-forward}, if the string you |
| 380 | specify is empty. (Such an empty argument would otherwise be |
| 381 | useless.) But it does not call @code{search-forward} right away. First |
| 382 | it checks the next input character to see if it is @kbd{C-w}, |
| 383 | which specifies a word search. |
| 384 | @ifnottex |
| 385 | @xref{Word Search}. |
| 386 | @end ifnottex |
| 387 | @kbd{C-r @key{RET}} does likewise, for a reverse incremental search. |
| 388 | |
| 389 | @findex search-forward |
| 390 | @findex search-backward |
| 391 | Forward and backward nonincremental searches are implemented by the |
| 392 | commands @code{search-forward} and @code{search-backward}. These |
| 393 | commands may be bound to keys in the usual manner. The feature that you |
| 394 | can get to them via the incremental search commands exists for |
| 395 | historical reasons, and to avoid the need to find separate key sequences |
| 396 | for them. |
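| | |
| | If you do want direct key bindings, something like the following in |
| | your init file should work. The keys @kbd{C-c s} and @kbd{C-c r} are |
| | hypothetical choices made for this sketch; any keys that are free in |
| | your setup would do: |
| | |
| | @example |
| | ;; C-c s and C-c r are example keys, not standard bindings. |
| | (global-set-key (kbd "C-c s") 'search-forward) |
| | (global-set-key (kbd "C-c r") 'search-backward) |
| | @end example |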
| 397 | |
| 398 | @node Word Search |
| 399 | @section Word Search |
| 400 | @cindex word search |
| 401 | |
| 402 | Word search searches for a sequence of words without regard to how the |
| 403 | words are separated. More precisely, you type a string of many words, |
| 404 | using single spaces to separate them, and the string can be found even |
| 405 | if there are multiple spaces, newlines, or other punctuation characters |
| 406 | between these words. |
| 407 | |
| 408 | Word search is useful for editing a printed document made with a text |
| 409 | formatter. If you edit while looking at the printed, formatted version, |
| 410 | you can't tell where the line breaks are in the source file. With word |
| 411 | search, you can search without having to know them. |
| 412 | |
| 413 | @table @kbd |
| 414 | @item C-s @key{RET} C-w @var{words} @key{RET} |
| 415 | Search for @var{words}, ignoring details of punctuation. |
| 416 | @item C-r @key{RET} C-w @var{words} @key{RET} |
| 417 | Search backward for @var{words}, ignoring details of punctuation. |
| 418 | @end table |
| 419 | |
| 420 | Word search as a special case of nonincremental search is invoked |
| 421 | with @kbd{C-s @key{RET} C-w}. This is followed by the search string, |
| 422 | which must always be terminated with @key{RET}. Being nonincremental, |
| 423 | this search does not start until the argument is terminated. It works |
| 424 | by constructing a regular expression and searching for that; see |
| 425 | @ref{Regexp Search}. |
| 426 | |
| 427 | Use @kbd{C-r @key{RET} C-w} to do backward word search. |
| 428 | |
| 429 | You can also invoke word search with @kbd{C-s M-e C-w} or @kbd{C-r |
| 430 | M-e C-w} followed by the search string and terminated with @key{RET}, |
| 431 | @kbd{C-s} or @kbd{C-r}. This puts word search into incremental mode |
| 432 | where you can use all keys available for incremental search. However, |
| 433 | when you type more words in incremental word search, it will fail |
| 434 | until you type complete words. |
| 435 | |
| 436 | @findex word-search-forward |
| 437 | @findex word-search-backward |
| 438 | Forward and backward word searches are implemented by the commands |
| 439 | @code{word-search-forward} and @code{word-search-backward}. These |
| 440 | commands may be bound to keys in the usual manner. They are available |
| 441 | via the incremental search commands both for historical reasons and |
| 442 | to avoid the need to find separate key sequences for them. |
| 443 | |
| 444 | @node Regexp Search |
| 445 | @section Regular Expression Search |
| 446 | @cindex regular expression |
| 447 | @cindex regexp |
| 448 | |
| 449 | A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern |
| 450 | that denotes a class of alternative strings to match, possibly |
| 451 | infinitely many. GNU Emacs provides both incremental and |
| 452 | nonincremental ways to search for a match for a regexp. The syntax of |
| 453 | regular expressions is explained in the following section. |
| 454 | |
| 455 | @kindex C-M-s |
| 456 | @findex isearch-forward-regexp |
| 457 | @kindex C-M-r |
| 458 | @findex isearch-backward-regexp |
| 459 | Incremental search for a regexp is done by typing @kbd{C-M-s} |
| 460 | (@code{isearch-forward-regexp}), by invoking @kbd{C-s} with a |
| 461 | prefix argument (whose value does not matter), or by typing @kbd{M-r} |
| 462 | within a forward incremental search. This command reads a |
| 463 | search string incrementally just like @kbd{C-s}, but it treats the |
| 464 | search string as a regexp rather than looking for an exact match |
| 465 | against the text in the buffer. Each time you add text to the search |
| 466 | string, you make the regexp longer, and the new regexp is searched |
| 467 | for. To search backward for a regexp, use @kbd{C-M-r} |
| 468 | (@code{isearch-backward-regexp}), @kbd{C-r} with a prefix argument, |
| 469 | or @kbd{M-r} within a backward incremental search. |
| 470 | |
| 471 | All of the control characters that do special things within an |
| 472 | ordinary incremental search have the same function in incremental regexp |
| 473 | search. Typing @kbd{C-s} or @kbd{C-r} immediately after starting the |
| 474 | search retrieves the last incremental search regexp used; that is to |
| 475 | say, incremental regexp and non-regexp searches have independent |
| 476 | defaults. They also have separate search rings that you can access with |
| 477 | @kbd{M-p} and @kbd{M-n}. |
| 478 | |
| 479 | @vindex search-whitespace-regexp |
| 480 | If you type @key{SPC} in incremental regexp search, it matches any |
| 481 | sequence of whitespace characters, including newlines. If you want to |
| 482 | match just a space, type @kbd{C-q @key{SPC}}. You can control what a |
| 483 | bare space matches by setting the variable |
| 484 | @code{search-whitespace-regexp} to the desired regexp. |
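| | |
| | For example, if you would rather have a bare space match only runs of |
| | spaces and tabs, and never newlines, a setting along these lines in |
| | your init file is one way to do it. The regexp shown is just one |
| | possible choice: |
| | |
| | @example |
| | ;; In a Lisp string, \t stands for a tab character. |
| | (setq search-whitespace-regexp "[ \t]+") |
| | @end example |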
| 485 | |
| 486 | In some cases, adding characters to the regexp in an incremental regexp |
| 487 | search can make the cursor move back and start again. For example, if |
| 488 | you have searched for @samp{foo} and you add @samp{\|bar}, the cursor |
| 489 | backs up in case the first @samp{bar} precedes the first @samp{foo}. |
| 490 | |
| 491 | @findex re-search-forward |
| 492 | @findex re-search-backward |
| 493 | Nonincremental search for a regexp is done by the functions |
| 494 | @code{re-search-forward} and @code{re-search-backward}. You can invoke |
| 495 | these with @kbd{M-x}, or bind them to keys, or invoke them by way of |
| 496 | incremental regexp search with @kbd{C-M-s @key{RET}} and @kbd{C-M-r |
| 497 | @key{RET}}. |
| 498 | |
| 499 | If you use the incremental regexp search commands with a prefix |
| 500 | argument, they perform ordinary string search, like |
| 501 | @code{isearch-forward} and @code{isearch-backward}. @xref{Incremental |
| 502 | Search}. |
| 503 | |
| 504 | @node Regexps |
| 505 | @section Syntax of Regular Expressions |
| 506 | @cindex syntax of regexps |
| 507 | |
| 508 | This manual describes regular expression features that users |
| 509 | typically want to use. There are additional features that are |
| 510 | mainly used in Lisp programs; see @ref{Regular Expressions,,, |
| 511 | elisp, The Emacs Lisp Reference Manual}. |
| 512 | |
| 513 | Regular expressions have a syntax in which a few characters are |
| 514 | special constructs and the rest are @dfn{ordinary}. An ordinary |
| 515 | character is a simple regular expression which matches that same |
| 516 | character and nothing else. The special characters are @samp{$}, |
| 517 | @samp{^}, @samp{.}, @samp{*}, @samp{+}, @samp{?}, @samp{[}, and |
| 518 | @samp{\}. The character @samp{]} is special if it ends a character |
| 519 | alternative (see later). The character @samp{-} is special inside a |
| 520 | character alternative. Any other character appearing in a regular |
| 521 | expression is ordinary, unless a @samp{\} precedes it. (When you use |
| 522 | regular expressions in a Lisp program, each @samp{\} must be doubled; |
| 523 | see the example near the end of this section.) |
| 524 | |
| 525 | For example, @samp{f} is not a special character, so it is ordinary, and |
| 526 | therefore @samp{f} is a regular expression that matches the string |
| 527 | @samp{f} and no other string. (It does @emph{not} match the string |
| 528 | @samp{ff}.) Likewise, @samp{o} is a regular expression that matches |
| 529 | only @samp{o}. (When case distinctions are being ignored, these regexps |
| 530 | also match @samp{F} and @samp{O}, but we consider this a generalization |
| 531 | of ``the same string,'' rather than an exception.) |
| 532 | |
| 533 | Any two regular expressions @var{a} and @var{b} can be concatenated. The |
| 534 | result is a regular expression which matches a string if @var{a} matches |
| 535 | some amount of the beginning of that string and @var{b} matches the rest of |
| 536 | the string.@refill |
| 537 | |
| 538 | As a simple example, we can concatenate the regular expressions @samp{f} |
| 539 | and @samp{o} to get the regular expression @samp{fo}, which matches only |
| 540 | the string @samp{fo}. Still trivial. To do something nontrivial, you |
| 541 | need to use one of the special characters. Here is a list of them. |
| 542 | |
| 543 | @table @asis |
| 544 | @item @kbd{.}@: @r{(Period)} |
| 545 | is a special character that matches any single character except a newline. |
| 546 | Using concatenation, we can make regular expressions like @samp{a.b}, which |
| 547 | matches any three-character string that begins with @samp{a} and ends with |
| 548 | @samp{b}.@refill |
| 549 | |
| 550 | @item @kbd{*} |
| 551 | is not a construct by itself; it is a postfix operator that means to |
| 552 | match the preceding regular expression repetitively as many times as |
| 553 | possible. Thus, @samp{o*} matches any number of @samp{o}s (including no |
| 554 | @samp{o}s). |
| 555 | |
| 556 | @samp{*} always applies to the @emph{smallest} possible preceding |
| 557 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
| 558 | @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. |
| 559 | |
| 560 | The matcher processes a @samp{*} construct by matching, immediately, |
| 561 | as many repetitions as can be found. Then it continues with the rest |
| 562 | of the pattern. If that fails, backtracking occurs, discarding some |
| 563 | of the matches of the @samp{*}-modified construct in case that makes |
| 564 | it possible to match the rest of the pattern. For example, in matching |
| 565 | @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first |
| 566 | tries to match all three @samp{a}s; but the rest of the pattern is |
| 567 | @samp{ar} and there is only @samp{r} left to match, so this try fails. |
| 568 | The next alternative is for @samp{a*} to match only two @samp{a}s. |
| 569 | With this choice, the rest of the regexp matches successfully.@refill |
| 570 | |
| 571 | @item @kbd{+} |
| 572 | is a postfix operator, similar to @samp{*} except that it must match |
| 573 | the preceding expression at least once. So, for example, @samp{ca+r} |
| 574 | matches the strings @samp{car} and @samp{caaaar} but not the string |
| 575 | @samp{cr}, whereas @samp{ca*r} matches all three strings. |
| 576 | |
| 577 | @item @kbd{?} |
| 578 | is a postfix operator, similar to @samp{*} except that it can match the |
| 579 | preceding expression either once or not at all. For example, |
| 580 | @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. |
| 581 | |
| 582 | @item @kbd{*?}, @kbd{+?}, @kbd{??} |
| 583 | @cindex non-greedy regexp matching |
| 584 | are non-greedy variants of the operators above. The normal operators |
| 585 | @samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as |
| 586 | much as they can, as long as the overall regexp can still match. With |
| 587 | a following @samp{?}, they are non-greedy: they will match as little |
| 588 | as possible. |
| 589 | |
| 590 | Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a} |
| 591 | and the string @samp{abbbb}; but if you try to match them both against |
| 592 | the text @samp{abbb}, @samp{ab*} will match it all (the longest valid |
| 593 | match), while @samp{ab*?} will match just @samp{a} (the shortest |
| 594 | valid match). |
| 595 | |
| 596 | Non-greedy operators match the shortest possible string starting at a |
| 597 | given starting point; in a forward search, though, the earliest |
| 598 | possible starting point for a match is always the one chosen. Thus, if |
| 599 | you search for @samp{a.*?$} against the text @samp{abbab} followed by |
| 600 | a newline, it matches the whole string. Since it @emph{can} match |
| 601 | starting at the first @samp{a}, it does. |
| 602 | |
| 603 | @item @kbd{\@{@var{n}\@}} |
| 604 | is a postfix operator that specifies repetition @var{n} times---that |
| 605 | is, the preceding regular expression must match exactly @var{n} times |
| 606 | in a row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx} |
| 607 | and nothing else. |
| 608 | |
| 609 | @item @kbd{\@{@var{n},@var{m}\@}} |
| 610 | is a postfix operator that specifies repetition between @var{n} and |
| 611 | @var{m} times---that is, the preceding regular expression must match |
| 612 | at least @var{n} times, but no more than @var{m} times. If @var{m} is |
| 613 | omitted, then there is no upper limit, but the preceding regular |
| 614 | expression must match at least @var{n} times.@* @samp{\@{0,1\@}} is |
| 615 | equivalent to @samp{?}. @* @samp{\@{0,\@}} is equivalent to |
| 616 | @samp{*}. @* @samp{\@{1,\@}} is equivalent to @samp{+}. |
| 617 | |
| 618 | @item @kbd{[ @dots{} ]} |
| 619 | is a @dfn{character set}, which begins with @samp{[} and is terminated |
| 620 | by @samp{]}. In the simplest case, the characters between the two |
| 621 | brackets are what this set can match. |
| 622 | |
| 623 | Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and |
| 624 | @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s |
| 625 | (including the empty string), from which it follows that @samp{c[ad]*r} |
| 626 | matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. |
| 627 | |
| 628 | You can also include character ranges in a character set, by writing the |
| 629 | starting and ending characters with a @samp{-} between them. Thus, |
| 630 | @samp{[a-z]} matches any lower-case @acronym{ASCII} letter. Ranges may be |
| 631 | intermixed freely with individual characters, as in @samp{[a-z$%.]}, |
| 632 | which matches any lower-case @acronym{ASCII} letter or @samp{$}, @samp{%} or |
| 633 | period. |
| 634 | |
| 635 | Note that the usual regexp special characters are not special inside a |
| 636 | character set. A completely different set of special characters exists |
| 637 | inside character sets: @samp{]}, @samp{-} and @samp{^}. |
| 638 | |
| 639 | To include a @samp{]} in a character set, you must make it the first |
| 640 | character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To |
| 641 | include a @samp{-}, write @samp{-} as the first or last character of the |
| 642 | set, or put it after a range. Thus, @samp{[]-]} matches both @samp{]} |
| 643 | and @samp{-}. |
| 644 | |
| 645 | To include @samp{^} in a set, put it anywhere but at the beginning of |
| 646 | the set. (At the beginning, it complements the set---see below.) |
| 647 | |
| 648 | When you use a range in case-insensitive search, you should write both |
| 649 | ends of the range in upper case, or both in lower case, or both should |
| 650 | be non-letters. The behavior of a mixed-case range such as @samp{A-z} |
| 651 | is somewhat ill-defined, and it may change in future Emacs versions. |
| 652 | |
| 653 | @item @kbd{[^ @dots{} ]} |
| 654 | @samp{[^} begins a @dfn{complemented character set}, which matches any |
| 655 | character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches |
| 656 | all characters @emph{except} @acronym{ASCII} letters and digits. |
| 657 | |
| 658 | @samp{^} is not special in a character set unless it is the first |
| 659 | character. The character following the @samp{^} is treated as if it |
| 660 | were first (in other words, @samp{-} and @samp{]} are not special there). |
| 661 | |
| 662 | A complemented character set can match a newline, unless newline is |
| 663 | mentioned as one of the characters not to match. This is in contrast to |
| 664 | the handling of regexps in programs such as @code{grep}. |
| 665 | |
| 666 | @item @kbd{^} |
| 667 | is a special character that matches the empty string, but only at the |
| 668 | beginning of a line in the text being matched. Otherwise it fails to |
| 669 | match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at |
| 670 | the beginning of a line. |
| 671 | |
| 672 | For historical compatibility reasons, @samp{^} can be used with this |
| 673 | meaning only at the beginning of the regular expression, or after |
| 674 | @samp{\(} or @samp{\|}. |
| 675 | |
| 676 | @item @kbd{$} |
| 677 | is similar to @samp{^} but matches only at the end of a line. Thus, |
| 678 | @samp{x+$} matches a string of one @samp{x} or more at the end of a line. |
| 679 | |
| 680 | For historical compatibility reasons, @samp{$} can be used with this |
| 681 | meaning only at the end of the regular expression, or before @samp{\)} |
| 682 | or @samp{\|}. |
| 683 | |
| 684 | @item @kbd{\} |
| 685 | has two functions: it quotes the special characters (including |
| 686 | @samp{\}), and it introduces additional special constructs. |
| 687 | |
| 688 | Because @samp{\} quotes special characters, @samp{\$} is a regular |
| 689 | expression that matches only @samp{$}, and @samp{\[} is a regular |
| 690 | expression that matches only @samp{[}, and so on. |
| 691 | |
| 692 | See the following section for the special constructs that begin |
| 693 | with @samp{\}. |
| 694 | @end table |
| 695 | |
| 696 | Note: for historical compatibility, special characters are treated as |
| 697 | ordinary ones if they are in contexts where their special meanings make no |
| 698 | sense. For example, @samp{*foo} treats @samp{*} as ordinary since there is |
| 699 | no preceding expression on which the @samp{*} can act. It is poor practice |
| 700 | to depend on this behavior; it is better to quote the special character anyway, |
| 701 | regardless of where it appears. |
| 702 | |
| 703 | Because @samp{\} is not special inside a character set, it can |
| 704 | never remove the special meaning of @samp{-} or @samp{]}. For the |
| 705 | same reason, do not put a backslash before these characters where they |
| 706 | have no special meaning: it would not clarify anything, because a |
| 707 | backslash can legitimately precede them where they @emph{do} have special |
| 708 | meaning, as in @samp{[^\]} (@code{"[^\\]"} in Lisp string syntax), |
| 709 | which matches any single character except a backslash. |
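| | |
| | As a rough illustration of the operators described above, the |
| | following Emacs Lisp forms can be evaluated one at a time with |
| | @kbd{M-:}. This is only a sketch: the test strings are made up for |
| | illustration, and it assumes the Lisp functions @code{string-match} |
| | (which returns the position where the match starts, or @code{nil}) |
| | and @code{match-end}, which accept the regexp syntax described here |
| | written as Lisp strings, so each backslash is doubled. |
| | |
| | @example |
| | ;; Greedy `*' backs off from three `a's to two so `ar' can match. |
| | (string-match "ca*ar" "caaar") |
| |      @result{} 0 |
| | ;; Greedy `ab*' consumes the whole text ... |
| | (progn (string-match "ab*" "abbb") (match-end 0)) |
| |      @result{} 4 |
| | ;; ... while non-greedy `ab*?' stops after the `a'. |
| | (progn (string-match "ab*?" "abbb") (match-end 0)) |
| |      @result{} 1 |
| | ;; Exactly four repetitions are required. |
| | (string-match "x\\@{4\\@}" "xxx") |
| |      @result{} nil |
| | ;; `]' placed first in a character set is an ordinary member. |
| | (string-match "[]a]" "]") |
| |      @result{} 0 |
| | ;; A complemented character set can match a newline. |
| | (string-match "[^a-z]" "abc\n") |
| |      @result{} 3 |
| | @end example |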
| 710 | |
| 711 | @node Regexp Backslash |
| 712 | @section Backslash in Regular Expressions |
| 713 | |
| 714 | For the most part, @samp{\} followed by any character matches only |
| 715 | that character. However, there are several exceptions: two-character |
| 716 | sequences starting with @samp{\} that have special meanings. The |
| 717 | second character in the sequence is always an ordinary character when |
| 718 | used on its own. Here is a table of @samp{\} constructs. |
| 719 | |
| 720 | @table @kbd |
| 721 | @item \| |
| 722 | specifies an alternative. Two regular expressions @var{a} and @var{b} |
| 723 | with @samp{\|} in between form an expression that matches some text if |
| 724 | either @var{a} matches it or @var{b} matches it. It works by trying to |
| 725 | match @var{a}, and if that fails, by trying to match @var{b}. |
| 726 | |
| 727 | Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar} |
| 728 | but no other string.@refill |
| 729 | |
| 730 | @samp{\|} applies to the largest possible surrounding expressions. Only a |
| 731 | surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of |
| 732 | @samp{\|}.@refill |
| 733 | |
| 734 | Full backtracking capability exists to handle multiple uses of @samp{\|}. |
| 735 | |
| 736 | @item \( @dots{} \) |
| 737 | is a grouping construct that serves three purposes: |
| 738 | |
| 739 | @enumerate |
| 740 | @item |
| 741 | To enclose a set of @samp{\|} alternatives for other operations. |
| 742 | Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}. |
| 743 | |
| 744 | @item |
| 745 | To enclose a complicated expression for the postfix operators @samp{*}, |
| 746 | @samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches |
| 747 | @samp{bananana}, etc., with any (zero or more) number of @samp{na} |
| 748 | strings.@refill |
| 749 | |
| 750 | @item |
| 751 | To record a matched substring for future reference. |
| 752 | @end enumerate |
| 753 | |
| 754 | This last application is not a consequence of the idea of a |
| 755 | parenthetical grouping; it is a separate feature that is assigned as a |
| 756 | second meaning to the same @samp{\( @dots{} \)} construct. In practice |
| 757 | there is usually no conflict between the two meanings; when there is |
| 758 | a conflict, you can use a ``shy'' group. |
| 759 | |
| 760 | @item \(?: @dots{} \) |
| 761 | @cindex shy group, in regexp |
| 762 | specifies a ``shy'' group that does not record the matched substring; |
| 763 | you can't refer back to it with @samp{\@var{d}}. This is useful |
| 764 | in mechanically combining regular expressions, so that you |
| 765 | can add groups for syntactic purposes without interfering with |
| 766 | the numbering of the groups that are meant to be referred to. |
| 767 | |
| 768 | @item \@var{d} |
| 769 | @cindex back reference, in regexp |
| 770 | matches the same text that matched the @var{d}th occurrence of a |
| 771 | @samp{\( @dots{} \)} construct. This is called a @dfn{back |
| 772 | reference}. |
| 773 | |
| 774 | After the end of a @samp{\( @dots{} \)} construct, the matcher remembers |
| 775 | the beginning and end of the text matched by that construct. Then, |
| 776 | later on in the regular expression, you can use @samp{\} followed by the |
| 777 | digit @var{d} to mean ``match the same text that was matched by the |
| 778 | @var{d}th @samp{\( @dots{} \)} construct.'' |
| 779 | |
| 780 | The strings matching the first nine @samp{\( @dots{} \)} constructs |
| 781 | appearing in a regular expression are assigned numbers 1 through 9 in |
| 782 | the order that the open-parentheses appear in the regular expression. |
| 783 | So you can use @samp{\1} through @samp{\9} to refer to the text matched |
| 784 | by the corresponding @samp{\( @dots{} \)} constructs. |
| 785 | |
| 786 | For example, @samp{\(.*\)\1} matches any newline-free string that is |
| 787 | composed of two identical halves. The @samp{\(.*\)} matches the first |
| 788 | half, which may be anything, but the @samp{\1} that follows must match |
| 789 | the same exact text. |
| 790 | |
| 791 | If a particular @samp{\( @dots{} \)} construct matches more than once |
| 792 | (which can easily happen if it is followed by @samp{*}), only the last |
| 793 | match is recorded. |
| 794 | |
| 795 | @item \` |
| 796 | matches the empty string, but only at the beginning of the string or |
| 797 | buffer (or its accessible portion) being matched against. |
| 798 | |
| 799 | @item \' |
| 800 | matches the empty string, but only at the end of the string or buffer |
| 801 | (or its accessible portion) being matched against. |
| 802 | |
| 803 | @item \= |
| 804 | matches the empty string, but only at point. |
| 805 | |
| 806 | @item \b |
| 807 | matches the empty string, but only at the beginning or |
| 808 | end of a word. Thus, @samp{\bfoo\b} matches any occurrence of |
| 809 | @samp{foo} as a separate word. @samp{\bballs?\b} matches |
| 810 | @samp{ball} or @samp{balls} as a separate word.@refill |
| 811 | |
| 812 | @samp{\b} matches at the beginning or end of the buffer |
| 813 | regardless of what text appears next to it. |
| 814 | |
| 815 | @item \B |
| 816 | matches the empty string, but @emph{not} at the beginning or |
| 817 | end of a word. |
| 818 | |
| 819 | @item \< |
| 820 | matches the empty string, but only at the beginning of a word. |
| 821 | @samp{\<} matches at the beginning of the buffer only if a |
| 822 | word-constituent character follows. |
| 823 | |
| 824 | @item \> |
| 825 | matches the empty string, but only at the end of a word. @samp{\>} |
| 826 | matches at the end of the buffer only if the contents end with a |
| 827 | word-constituent character. |
| 828 | |
| 829 | @item \w |
| 830 | matches any word-constituent character. The syntax table |
| 831 | determines which characters these are. @xref{Syntax}. |
| 832 | |
| 833 | @item \W |
| 834 | matches any character that is not a word-constituent. |
| 835 | |
| 836 | @item \_< |
| 837 | matches the empty string, but only at the beginning of a symbol. |
| 838 | A symbol is a sequence of one or more symbol-constituent characters. |
| 839 | A symbol-constituent character is a character whose syntax is either |
| 840 | @samp{w} or @samp{_}. @samp{\_<} matches at the beginning of the |
| 841 | buffer only if a symbol-constituent character follows. |
| 842 | |
| 843 | @item \_> |
| 844 | matches the empty string, but only at the end of a symbol. @samp{\_>} |
| 845 | matches at the end of the buffer only if the contents end with a |
| 846 | symbol-constituent character. |
| 847 | |
| 848 | @item \s@var{c} |
| 849 | matches any character whose syntax is @var{c}. Here @var{c} is a |
| 850 | character that designates a particular syntax class: thus, @samp{w} |
| 851 | for word constituent, @samp{-} or @samp{ } for whitespace, @samp{.} |
| 852 | for ordinary punctuation, etc. @xref{Syntax}. |
| 853 | |
| 854 | @item \S@var{c} |
| 855 | matches any character whose syntax is not @var{c}. |
| 856 | |
| 857 | @cindex categories of characters |
| 858 | @cindex characters which belong to a specific language |
| 859 | @findex describe-categories |
| 860 | @item \c@var{c} |
| 861 | matches any character that belongs to the category @var{c}. For |
| 862 | example, @samp{\cc} matches Chinese characters, @samp{\cg} matches |
| 863 | Greek characters, etc. For the description of the known categories, |
| 864 | type @kbd{M-x describe-categories @key{RET}}. |
| 865 | |
| 866 | @item \C@var{c} |
| 867 | matches any character that does @emph{not} belong to category |
| 868 | @var{c}. |
| 869 | @end table |
| 870 | |
| 871 | The constructs that pertain to words and syntax are controlled by the |
| 872 | setting of the syntax table (@pxref{Syntax}). |
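| | |
| | Some of the backslash constructs above can be tried out from Lisp in |
| | the same way. The forms below are an illustrative sketch with made-up |
| | test strings; they assume the Lisp functions @code{string-match} and |
| | @code{match-string}, and a syntax table in which letters are word |
| | constituents and the space character is not. Backslashes are doubled |
| | because the regexps are written as Lisp strings. |
| | |
| | @example |
| | ;; Grouping limits the scope of `\|'. |
| | (string-match "\\(foo\\|bar\\)x" "barx") |
| |      @result{} 0 |
| | ;; A back reference must repeat the text matched by group 1. |
| | (progn (string-match "\\(.*\\)\\1" "abcabc") |
| |        (match-string 1 "abcabc")) |
| |      @result{} "abc" |
| | ;; A shy group is not numbered, so `x' is group 1 here. |
| | (progn (string-match "\\(?:foo\\|bar\\)\\(x\\)" "barx") |
| |        (match-string 1 "barx")) |
| |      @result{} "x" |
| | ;; `\b' requires a word boundary on each side of `foo'. |
| | (string-match "\\bfoo\\b" "a foo b") |
| |      @result{} 2 |
| | (string-match "\\bfoo\\b" "afoo b") |
| |      @result{} nil |
| | ;; `^' matches after a newline; `\`' only at the very beginning. |
| | (string-match "^foo" "a\nfoo") |
| |      @result{} 2 |
| | (string-match "\\`foo" "a\nfoo") |
| |      @result{} nil |
| | @end example |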
| 873 | |
| 874 | @node Regexp Example |
| 875 | @section Regular Expression Example |
| 876 | |
| 877 | Here is a complicated regexp---a simplified version of the regexp |
| 878 | that Emacs uses, by default, to recognize the end of a sentence |
| 879 | together with any whitespace that follows. We show its Lisp syntax to |
| 880 | distinguish the spaces from the tab characters. In Lisp syntax, the |
| 881 | string constant begins and ends with a double-quote. @samp{\"} stands |
| 882 | for a double-quote as part of the regexp, @samp{\\} for a backslash as |
| 883 | part of the regexp, @samp{\t} for a tab, and @samp{\n} for a newline. |
| 884 | |
| 885 | @example |
| 886 | "[.?!][]\"')]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" |
| 887 | @end example |
| 888 | |
| 889 | @noindent |
| 890 | This contains four parts in succession: a character set matching |
| 891 | period, @samp{?}, or @samp{!}; a character set matching |
| 892 | close-brackets, quotes, or parentheses, repeated zero or more times; a |
| 893 | set of alternatives within backslash-parentheses that matches either |
| 894 | end-of-line, a space at the end of a line, a tab, or two spaces; and a |
| 895 | character set matching whitespace characters, repeated any number of |
| 896 | times. |
| 897 | |
| 898 | To enter the same regexp in incremental search, you would type |
| 899 | @key{TAB} to enter a tab, and @kbd{C-j} to enter a newline. You would |
| 900 | also type single backslashes as themselves, instead of doubling them |
| 901 | for Lisp syntax. In commands that use ordinary minibuffer input to |
| 902 | read a regexp, you would quote the @kbd{C-j} by preceding it with a |
| 903 | @kbd{C-q} to prevent @kbd{C-j} from exiting the minibuffer. |
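| | |
| | One way to check a regexp like this is to try it on sample strings |
| | from Lisp. The following sketch is not part of how Emacs itself |
| | finds sentence ends; the sample strings are invented, and it assumes |
| | the Lisp functions @code{string-match} and @code{match-end}. The |
| | first sample has two spaces after the period, the second only one. |
| | |
| | @example |
| | (let ((re "[.?!][]\"')]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*")) |
| |   (list (string-match re "Done.@ @ Next")  ; match starts at the period |
| |         (match-end 0)                     ; and ends after both spaces |
| |         (string-match re "Done. Next")))  ; one space is not enough |
| |      @result{} (4 7 nil) |
| | @end example |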
| 904 | |
| 905 | @node Search Case |
| 906 | @section Searching and Case |
| 907 | |
| 908 | Incremental searches in Emacs normally ignore the case of the text |
| 909 | they are searching through, if you specify the text in lower case. |
| 910 | Thus, if you specify searching for @samp{foo}, then @samp{Foo} and |
| 911 | @samp{FOO} are also considered matches. Regexps, and in particular |
| 912 | character sets, are included: @samp{[ab]} would match @samp{a} or |
| 913 | @samp{A} or @samp{b} or @samp{B}.@refill |
| 914 | |
| 915 | An upper-case letter anywhere in the incremental search string makes |
| 916 | the search case-sensitive. Thus, searching for @samp{Foo} does not find |
| 917 | @samp{foo} or @samp{FOO}. This applies to regular expression search as |
| 918 | well as to string search. The effect ceases if you delete the |
| 919 | upper-case letter from the search string. |
| 920 | |
| 921 | Typing @kbd{M-c} within an incremental search toggles the case |
| 922 | sensitivity of that search. The effect does not extend beyond the |
| 923 | current incremental search to the next one, but it does override the |
| 924 | effect of including an upper-case letter in the current search. |
| 925 | |
| 926 | @vindex case-fold-search |
| 927 | @vindex default-case-fold-search |
| 928 | If you set the variable @code{case-fold-search} to @code{nil}, then |
| 929 | all letters must match exactly, including case. This is a per-buffer |
| 930 | variable; altering the variable affects only the current buffer, but |
| 931 | there is a default value in @code{default-case-fold-search} that you |
| 932 | can also set. @xref{Locals}. This variable applies to nonincremental |
| 933 | searches also, including those performed by the replace commands |
| 934 | (@pxref{Replace}) and the minibuffer history matching commands |
| 935 | (@pxref{Minibuffer History}). |
| 936 | |
| 937 | Several related variables control case-sensitivity of searching and |
| 938 | matching for specific commands or activities. For instance, |
| 939 | @code{tags-case-fold-search} controls case sensitivity for |
| 940 | @code{find-tag}. To find these variables, do @kbd{M-x |
| 941 | apropos-variable @key{RET} case-fold-search @key{RET}}. |
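| | |
| | The effect of @code{case-fold-search} can also be observed from Lisp. |
| | The forms below are a small sketch using the Lisp function |
| | @code{string-match} on made-up strings. Note one difference from the |
| | interactive commands: the rule that an upper-case letter in the search |
| | string makes the search case-sensitive belongs to those commands, |
| | whereas @code{string-match} consults only @code{case-fold-search}. |
| | |
| | @example |
| | (let ((case-fold-search t)) |
| |   (list (string-match "foo" "Foo") |
| |         (string-match "[ab]" "B") |
| |         (string-match "Foo" "foo"))) |
| |      @result{} (0 0 0) |
| | (let ((case-fold-search nil)) |
| |   (string-match "foo" "Foo")) |
| |      @result{} nil |
| | @end example |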
| 942 | |
| 943 | @node Replace |
| 944 | @section Replacement Commands |
| 945 | @cindex replacement |
| 946 | @cindex search-and-replace commands |
| 947 | @cindex string substitution |
| 948 | @cindex global substitution |
| 949 | |
| 950 | Global search-and-replace operations are not needed often in Emacs, |
| 951 | but they are available. In addition to the simple @kbd{M-x |
| 952 | replace-string} command which replaces all occurrences, |
| 953 | there is @kbd{M-%} (@code{query-replace}), which presents each occurrence |
| 954 | of the pattern and asks you whether to replace it. |
| 955 | |
| 956 | The replace commands normally operate on the text from point to the |
| 957 | end of the buffer; however, in Transient Mark mode (@pxref{Transient |
| 958 | Mark}), when the mark is active, they operate on the region. The |
| 959 | basic replace commands replace one string (or regexp) with one |
| 960 | replacement string. It is possible to perform several replacements in |
| 961 | parallel using the command @code{expand-region-abbrevs} |
| 962 | (@pxref{Expanding Abbrevs}). |
| 963 | |
| 964 | @menu |
| 965 | * Unconditional Replace:: Replacing all matches for a string. |
| 966 | * Regexp Replace:: Replacing all matches for a regexp. |
| 967 | * Replacement and Case:: How replacements preserve case of letters. |
| 968 | * Query Replace:: How to use querying. |
| 969 | @end menu |
| 970 | |
| 971 | @node Unconditional Replace, Regexp Replace, Replace, Replace |
| 972 | @subsection Unconditional Replacement |
| 973 | @findex replace-string |
| 974 | |
| 975 | @table @kbd |
| 976 | @item M-x replace-string @key{RET} @var{string} @key{RET} @var{newstring} @key{RET} |
| 977 | Replace every occurrence of @var{string} with @var{newstring}. |
| 978 | @end table |
| 979 | |
| 980 | To replace every instance of @samp{foo} after point with @samp{bar}, |
| 981 | use the command @kbd{M-x replace-string} with the two arguments |
| 982 | @samp{foo} and @samp{bar}. Replacement happens only in the text after |
| 983 | point, so if you want to cover the whole buffer you must go to the |
| 984 | beginning first. All occurrences up to the end of the buffer are |
| 985 | replaced; to limit replacement to part of the buffer, narrow to that |
| 986 | part of the buffer before doing the replacement (@pxref{Narrowing}). |
| 987 | In Transient Mark mode, when the region is active, replacement is |
| 988 | limited to the region (@pxref{Transient Mark}). |
| 989 | |
| 990 | When @code{replace-string} exits, it leaves point at the last |
| 991 | occurrence replaced. It sets the mark to the prior position of point |
| 992 | (where the @code{replace-string} command was issued); use @kbd{C-u |
| 993 | C-@key{SPC}} to move back there. |
| 994 | |
| 995 | A numeric argument restricts replacement to matches that are surrounded |
| 996 | by word boundaries. The argument's value doesn't matter. |
| 997 | |
| 998 | What if you want to exchange @samp{x} and @samp{y}: replace every @samp{x} with a @samp{y} and vice versa? You can do it this way: |
| 999 | |
| 1000 | @example |
| 1001 | M-x replace-string @key{RET} x @key{RET} @@TEMP@@ @key{RET} |
| 1002 | M-< M-x replace-string @key{RET} y @key{RET} x @key{RET} |
| 1003 | M-< M-x replace-string @key{RET} @@TEMP@@ @key{RET} y @key{RET} |
| 1004 | @end example |
| 1005 | |
| 1006 | @noindent |
| 1007 | This works provided the string @samp{@@TEMP@@} does not appear |
| 1008 | in your text. |
| 1009 | |
| 1010 | @node Regexp Replace, Replacement and Case, Unconditional Replace, Replace |
| 1011 | @subsection Regexp Replacement |
| 1012 | @findex replace-regexp |
| 1013 | |
| 1014 | The @kbd{M-x replace-string} command replaces exact matches for a |
| 1015 | single string. The similar command @kbd{M-x replace-regexp} replaces |
| 1016 | any match for a specified pattern. |
| 1017 | |
| 1018 | @table @kbd |
| 1019 | @item M-x replace-regexp @key{RET} @var{regexp} @key{RET} @var{newstring} @key{RET} |
| 1020 | Replace every match for @var{regexp} with @var{newstring}. |
| 1021 | @end table |
| 1022 | |
| 1023 | @cindex back reference, in regexp replacement |
| 1024 | In @code{replace-regexp}, the @var{newstring} need not be constant: |
| 1025 | it can refer to all or part of what is matched by the @var{regexp}. |
| 1026 | @samp{\&} in @var{newstring} stands for the entire match being |
| 1027 | replaced. @samp{\@var{d}} in @var{newstring}, where @var{d} is a |
| 1028 | digit, stands for whatever matched the @var{d}th parenthesized |
| 1029 | grouping in @var{regexp}. (This is called a ``back reference.'') |
| 1030 | @samp{\#} refers to the count of replacements already made in this |
| 1031 | command, as a decimal number. In the first replacement, @samp{\#} |
| 1032 | stands for @samp{0}; in the second, for @samp{1}; and so on. For |
| 1033 | example, |
| 1034 | |
| 1035 | @example |
| 1036 | M-x replace-regexp @key{RET} c[ad]+r @key{RET} \&-safe @key{RET} |
| 1037 | @end example |
| 1038 | |
| 1039 | @noindent |
| 1040 | replaces (for example) @samp{cadr} with @samp{cadr-safe} and @samp{cddr} |
| 1041 | with @samp{cddr-safe}. |
| 1042 | |
| 1043 | @example |
| 1044 | M-x replace-regexp @key{RET} \(c[ad]+r\)-safe @key{RET} \1 @key{RET} |
| 1045 | @end example |
| 1046 | |
| 1047 | @noindent |
| 1048 | performs the inverse transformation. To include a @samp{\} in the |
| 1049 | text to replace with, you must enter @samp{\\}. |
| 1050 | |
| 1051 | If you want to enter part of the replacement string by hand each |
| 1052 | time, use @samp{\?} in the replacement string. Each replacement will |
| 1053 | ask you to edit the replacement string in the minibuffer, putting |
| 1054 | point where the @samp{\?} was. |
| 1055 | |
| 1056 | The remainder of this subsection is intended for specialized tasks |
| 1057 | and requires knowledge of Lisp. Most readers can skip it. |
| 1058 | |
| 1059 | You can use Lisp expressions to calculate parts of the |
| 1060 | replacement string. To do this, write @samp{\,} followed by the |
| 1061 | expression in the replacement string. Each replacement calculates the |
| 1062 | value of the expression and converts it to text without quoting (if |
| 1063 | it's a string, this means using the string's contents), and uses it in |
| 1064 | the replacement string in place of the expression itself. If the |
| 1065 | expression is a symbol, one space in the replacement string after the |
| 1066 | symbol name goes with the symbol name, so the value replaces them |
| 1067 | both. |
| 1068 | |
| 1069 | Inside such an expression, you can use some special sequences. |
| 1070 | @samp{\&} and @samp{\@var{n}} refer here, as usual, to the entire |
| 1071 | match as a string, and to a submatch as a string. @var{n} may be |
| 1072 | multiple digits, and the value of @samp{\@var{n}} is @code{nil} if |
| 1073 | subexpression @var{n} did not match. You can also use @samp{\#&} and |
| 1074 | @samp{\#@var{n}} to refer to those matches as numbers (this is valid |
| 1075 | when the match or submatch has the form of a numeral). @samp{\#} here |
| 1076 | too stands for the number of already-completed replacements. |
| 1077 | |
| 1078 | Repeating our example to exchange @samp{x} and @samp{y}, we can thus |
| 1079 | do it also this way: |
| 1080 | |
| 1081 | @example |
| 1082 | M-x replace-regexp @key{RET} \(x\)\|y @key{RET} |
| 1083 | \,(if \1 "y" "x") @key{RET} |
| 1084 | @end example |
| 1085 | |
| 1086 | For computing replacement strings for @samp{\,}, the @code{format} |
| 1087 | function is often useful (@pxref{Formatting Strings,,, elisp, The Emacs |
| 1088 | Lisp Reference Manual}). For example, to add consecutively numbered |
| 1089 | strings like @samp{ABC00042} to columns 73 @w{to 80} (unless they are |
| 1090 | already occupied), you can use |
| 1091 | |
| 1092 | @example |
| 1093 | M-x replace-regexp @key{RET} ^.\@{0,72\@}$ @key{RET} |
| 1094 | \,(format "%-72sABC%05d" \& \#) @key{RET} |
| 1095 | @end example |
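| | |
| | The @samp{\&} and @samp{\@var{d}} substitutions can also be tried on |
| | strings from Lisp. The sketch below uses the Lisp function |
| | @code{replace-regexp-in-string} rather than the @code{replace-regexp} |
| | command, with invented sample strings. It leaves out @samp{\#}, |
| | @samp{\,} and @samp{\?}, which are handled by the interactive commands |
| | themselves; a Lisp function given as the replacement plays the role |
| | of @samp{\,} here. |
| | |
| | @example |
| | (replace-regexp-in-string "c[ad]+r" "\\&-safe" "cadr cddr") |
| |      @result{} "cadr-safe cddr-safe" |
| | (replace-regexp-in-string "\\(c[ad]+r\\)-safe" "\\1" "cadr-safe") |
| |      @result{} "cadr" |
| | ;; Exchange `x' and `y' by computing each replacement. |
| | (replace-regexp-in-string |
| |  "x\\|y" (lambda (m) (if (string= m "x") "y" "x")) "xyxx") |
| |      @result{} "yxyy" |
| | @end example |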
| 1096 | |
| 1097 | @node Replacement and Case, Query Replace, Regexp Replace, Replace |
| 1098 | @subsection Replace Commands and Case |
| 1099 | |
| 1100 | If the first argument of a replace command is all lower case, the |
| 1101 | command ignores case while searching for occurrences to |
| 1102 | replace---provided @code{case-fold-search} is non-@code{nil}. If |
| 1103 | @code{case-fold-search} is set to @code{nil}, case is always significant |
| 1104 | in all searches. |
| 1105 | |
| 1106 | @vindex case-replace |
| 1107 | In addition, when the @var{newstring} argument is all or partly lower |
| 1108 | case, replacement commands try to preserve the case pattern of each |
| 1109 | occurrence. Thus, the command |
| 1110 | |
| 1111 | @example |
| 1112 | M-x replace-string @key{RET} foo @key{RET} bar @key{RET} |
| 1113 | @end example |
| 1114 | |
| 1115 | @noindent |
| 1116 | replaces a lower case @samp{foo} with a lower case @samp{bar}, an |
| 1117 | all-caps @samp{FOO} with @samp{BAR}, and a capitalized @samp{Foo} with |
| 1118 | @samp{Bar}. (These three alternatives---lower case, all caps, and |
| 1119 | capitalized---are the only ones that @code{replace-string} can |
| 1120 | distinguish.) |
| 1121 | |
| 1122 | If upper-case letters are used in the replacement string, they remain |
| 1123 | upper case every time that text is inserted. If upper-case letters are |
| 1124 | used in the first argument, the second argument is always substituted |
| 1125 | exactly as given, with no case conversion. Likewise, if either |
| 1126 | @code{case-replace} or @code{case-fold-search} is set to @code{nil}, |
| 1127 | replacement is done without case conversion. |
| 1128 | |
| 1129 | @node Query Replace,, Replacement and Case, Replace |
| 1130 | @subsection Query Replace |
| 1131 | @cindex query replace |
| 1132 | |
| 1133 | @table @kbd |
| 1134 | @item M-% @var{string} @key{RET} @var{newstring} @key{RET} |
| 1135 | @itemx M-x query-replace @key{RET} @var{string} @key{RET} @var{newstring} @key{RET} |
| 1136 | Replace some occurrences of @var{string} with @var{newstring}. |
| 1137 | @item C-M-% @var{regexp} @key{RET} @var{newstring} @key{RET} |
| 1138 | @itemx M-x query-replace-regexp @key{RET} @var{regexp} @key{RET} @var{newstring} @key{RET} |
| 1139 | Replace some matches for @var{regexp} with @var{newstring}. |
| 1140 | @end table |
| 1141 | |
| 1142 | @kindex M-% |
| 1143 | @findex query-replace |
| 1144 | If you want to change only some of the occurrences of @samp{foo} to |
| 1145 | @samp{bar}, not all of them, then you cannot use an ordinary |
| 1146 | @code{replace-string}. Instead, use @kbd{M-%} (@code{query-replace}). |
| 1147 | This command finds occurrences of @samp{foo} one by one, displays each |
| 1148 | occurrence and asks you whether to replace it. Aside from querying, |
| 1149 | @code{query-replace} works just like @code{replace-string}. It |
| 1150 | preserves case, like @code{replace-string}, provided |
| 1151 | @code{case-replace} is non-@code{nil}, as it normally is. A numeric |
| 1152 | argument means consider only occurrences that are bounded by |
| 1153 | word-delimiter characters. |
| 1154 | |
| 1155 | @kindex C-M-% |
| 1156 | @findex query-replace-regexp |
| 1157 | @kbd{C-M-%} performs regexp search and replace (@code{query-replace-regexp}). |
| 1158 | It works like @code{replace-regexp} except that it queries |
| 1159 | like @code{query-replace}. |
| 1160 | |
| 1161 | @cindex faces for highlighting query replace |
| 1162 | These commands highlight the current match using the face |
| 1163 | @code{query-replace}. They highlight other matches using |
| 1164 | @code{lazy-highlight} just like incremental search (@pxref{Incremental |
| 1165 | Search}). |
| 1166 | |
| 1167 | The characters you can type when you are shown a match for the string |
| 1168 | or regexp are: |
| 1169 | |
| 1170 | @ignore @c Not worth it. |
| 1171 | @kindex SPC @r{(query-replace)} |
| 1172 | @kindex DEL @r{(query-replace)} |
| 1173 | @kindex , @r{(query-replace)} |
| 1174 | @kindex RET @r{(query-replace)} |
| 1175 | @kindex . @r{(query-replace)} |
| 1176 | @kindex ! @r{(query-replace)} |
| 1177 | @kindex ^ @r{(query-replace)} |
| 1178 | @kindex C-r @r{(query-replace)} |
| 1179 | @kindex C-w @r{(query-replace)} |
| 1180 | @kindex C-l @r{(query-replace)} |
| 1181 | @end ignore |
| 1182 | |
| 1183 | @c WideCommands |
| 1184 | @table @kbd |
| 1185 | @item @key{SPC} |
| 1186 | to replace the occurrence with @var{newstring}. |
| 1187 | |
| 1188 | @item @key{DEL} |
| 1189 | to skip to the next occurrence without replacing this one. |
| 1190 | |
| 1191 | @item , @r{(Comma)} |
| 1192 | to replace this occurrence and display the result. You are then asked |
| 1193 | for another input character to say what to do next. Since the |
| 1194 | replacement has already been made, @key{DEL} and @key{SPC} are |
| 1195 | equivalent in this situation; both move to the next occurrence. |
| 1196 | |
| 1197 | You can type @kbd{C-r} at this point (see below) to alter the replaced |
| 1198 | text. You can also type @kbd{C-x u} to undo the replacement; this exits |
| 1199 | the @code{query-replace}, so if you want to do further replacement you |
| 1200 | must use @kbd{C-x @key{ESC} @key{ESC} @key{RET}} to restart |
| 1201 | (@pxref{Repetition}). |
| 1202 | |
| 1203 | @item @key{RET} |
| 1204 | to exit without doing any more replacements. |
| 1205 | |
| 1206 | @item .@: @r{(Period)} |
| 1207 | to replace this occurrence and then exit without searching for more |
| 1208 | occurrences. |
| 1209 | |
| 1210 | @item ! |
| 1211 | to replace all remaining occurrences without asking again. |
| 1212 | |
| 1213 | @item ^ |
| 1214 | to go back to the position of the previous occurrence (or what used to |
| 1215 | be an occurrence), in case you changed it by mistake or want to |
| 1216 | reexamine it. |
| 1217 | |
| 1218 | @item C-r |
| 1219 | to enter a recursive editing level, in case the occurrence needs to be |
| 1220 | edited rather than just replaced with @var{newstring}. When you are |
| 1221 | done, exit the recursive editing level with @kbd{C-M-c} to proceed to |
| 1222 | the next occurrence. @xref{Recursive Edit}. |
| 1223 | |
| 1224 | @item C-w |
| 1225 | to delete the occurrence, and then enter a recursive editing level as in |
| 1226 | @kbd{C-r}. Use the recursive edit to insert text to replace the deleted |
| 1227 | occurrence of @var{string}. When done, exit the recursive editing level |
| 1228 | with @kbd{C-M-c} to proceed to the next occurrence. |
| 1229 | |
| 1230 | @item e |
| 1231 | to edit the replacement string in the minibuffer. When you exit the |
| 1232 | minibuffer by typing @key{RET}, the minibuffer contents replace the |
| 1233 | current occurrence of the pattern. They also become the new |
| 1234 | replacement string for any further occurrences. |
| 1235 | |
| 1236 | @item C-l |
| 1237 | to redisplay the screen. Then you must type another character to |
| 1238 | specify what to do with this occurrence. |
| 1239 | |
| 1240 | @item C-h |
| 1241 | to display a message summarizing these options. Then you must type |
| 1242 | another character to specify what to do with this occurrence. |
| 1243 | @end table |
| 1244 | |
| 1245 | Some other characters are aliases for the ones listed above: @kbd{y}, |
| 1246 | @kbd{n} and @kbd{q} are equivalent to @key{SPC}, @key{DEL} and |
| 1247 | @key{RET}, respectively. |
| 1248 | |
| 1249 | Aside from this, any other character exits the @code{query-replace}, |
| 1250 | and is then reread as part of a key sequence. Thus, if you type |
| 1251 | @kbd{C-k}, it exits the @code{query-replace} and then kills to end of |
| 1252 | line. |
| 1253 | |
| 1254 | To restart a @code{query-replace} once it is exited, use @kbd{C-x |
| 1255 | @key{ESC} @key{ESC}}, which repeats the @code{query-replace} because it |
| 1256 | used the minibuffer to read its arguments. @xref{Repetition, C-x ESC |
| 1257 | ESC}. |
| 1258 | |
| 1259 | @xref{Operating on Files}, for the Dired @kbd{Q} command which |
| 1260 | performs query replace on selected files. See also @ref{Transforming |
| 1261 | File Names}, for Dired commands to rename, copy, or link files by |
| 1262 | replacing regexp matches in file names. |
| 1263 | |
| 1264 | @node Other Repeating Search |
| 1265 | @section Other Search-and-Loop Commands |
| 1266 | |
| 1267 | Here are some other commands that find matches for a regular |
| 1268 | expression. They all ignore case in matching, if the pattern contains |
| 1269 | no upper-case letters and @code{case-fold-search} is non-@code{nil}. |
| 1270 | Aside from @code{occur} and its variants, all operate on the text from |
| 1271 | point to the end of the buffer, or on the active region in Transient |
| 1272 | Mark mode. |
| 1273 | |
| 1274 | @findex list-matching-lines |
| 1275 | @findex occur |
| 1276 | @findex multi-occur |
| 1277 | @findex multi-occur-in-matching-buffers |
| 1278 | @findex how-many |
| 1279 | @findex delete-non-matching-lines |
| 1280 | @findex delete-matching-lines |
| 1281 | @findex flush-lines |
| 1282 | @findex keep-lines |
| 1283 | |
| 1284 | @table @kbd |
| 1285 | @item M-x occur @key{RET} @var{regexp} @key{RET} |
| 1286 | Display a list showing each line in the buffer that contains a match |
| 1287 | for @var{regexp}. To limit the search to part of the buffer, narrow |
| 1288 | to that part (@pxref{Narrowing}). A numeric argument @var{n} |
| 1289 | specifies that @var{n} lines of context are to be displayed before and |
| 1290 | after each matching line. Currently, @code{occur} cannot correctly |
| 1291 | handle multiline matches. |
| 1292 | |
| 1293 | @kindex RET @r{(Occur mode)} |
| 1294 | @kindex o @r{(Occur mode)} |
| 1295 | @kindex C-o @r{(Occur mode)} |
| 1296 | The buffer @samp{*Occur*} containing the output serves as a menu for |
| 1297 | finding the occurrences in their original context. Click |
| 1298 | @kbd{Mouse-2} on an occurrence listed in @samp{*Occur*}, or position |
| 1299 | point there and type @key{RET}; this switches to the buffer that was |
| 1300 | searched and moves point to the original of the chosen occurrence. |
| 1301 | @kbd{o} and @kbd{C-o} display the match in another window; @kbd{C-o} |
| 1302 | does not select it. |
| 1303 | |
| 1304 | After using @kbd{M-x occur}, you can use @code{next-error} to visit |
| 1305 | the occurrences found, one by one. @xref{Compilation Mode}. |
| 1306 | |
| 1307 | @item M-x list-matching-lines |
| 1308 | Synonym for @kbd{M-x occur}. |
| 1309 | |
| 1310 | @item M-x multi-occur @key{RET} @var{buffers} @key{RET} @var{regexp} @key{RET} |
| 1311 | This command is just like @code{occur}, except it is able to search |
| 1312 | through multiple buffers. It asks you to specify the buffer names one by one. |
| 1313 | |
| 1314 | @item M-x multi-occur-in-matching-buffers @key{RET} @var{bufregexp} @key{RET} @var{regexp} @key{RET} |
| 1315 | This command is similar to @code{multi-occur}, except the buffers to |
| 1316 | search are specified by a regular expression that matches visited |
| 1317 | file names. With a prefix argument, it uses the regular expression to match |
| 1318 | buffer names instead. |
| 1319 | |
| 1320 | @item M-x how-many @key{RET} @var{regexp} @key{RET} |
| 1321 | Print the number of matches for @var{regexp} that exist in the buffer |
| 1322 | after point. In Transient Mark mode, if the region is active, the |
| 1323 | command operates on the region instead. |
| 1324 | |
| 1325 | @item M-x flush-lines @key{RET} @var{regexp} @key{RET} |
| 1326 | This command deletes each line that contains a match for @var{regexp}, |
| 1327 | operating on the text after point; it deletes the current line |
| 1328 | if it contains a match starting after point. In Transient Mark mode, |
| 1329 | if the region is active, the command operates on the region instead; |
| 1330 | it deletes a line partially contained in the region if it contains a |
| 1331 | match entirely contained in the region. |
| 1332 | |
| 1333 | If a match is split across lines, @code{flush-lines} deletes all those |
| 1334 | lines. It deletes the lines before starting to look for the next |
| 1335 | match; hence, it ignores a match starting on the same line at which |
| 1336 | another match ended. |
| 1337 | |
| 1338 | @item M-x keep-lines @key{RET} @var{regexp} @key{RET} |
| 1339 | This command deletes each line that @emph{does not} contain a match for |
| 1340 | @var{regexp}, operating on the text after point; if point is not at the |
| 1341 | beginning of a line, it always keeps the current line. In Transient |
| 1342 | Mark mode, if the region is active, the command operates on the region |
| 1343 | instead; it never deletes lines that are only partially contained in |
| 1344 | the region (a newline that ends a line counts as part of that line). |
| 1345 | |
| 1346 | If a match is split across lines, this command keeps all those lines. |
| 1347 | @end table |
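| | |
| | The commands in the table above can also be called from Lisp. The |
| | following is a sketch only, not a recommended idiom: the function |
| | name @code{my-filter-demo} and the sample text are hypothetical, and |
| | the code assumes that @code{how-many} returns its count when called |
| | noninteractively and that all three commands work from point to the |
| | end of the buffer. |
| | |
| | @example |
| | (defun my-filter-demo () |
| |   "Hypothetical demo of `how-many', `flush-lines' and `keep-lines'." |
| |   (with-temp-buffer |
| |     (insert "alpha\nbeta\ngamma\nalpha beta\n") |
| |     ;; All three commands operate on the text after point. |
| |     (goto-char (point-min)) |
| |     (let ((count (how-many "alpha")))  ; count matches after point |
| |       (flush-lines "gamma")            ; delete lines that match |
| |       (keep-lines "beta")              ; delete lines that do not |
| |       (list count (buffer-string))))) |
| | @end example |
| | |
| | Evaluating @code{(my-filter-demo)} returns the match count together |
| | with the text that remains, which consists only of the lines |
| | containing @samp{beta}. |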
| 1348 | |
| 1349 | @ignore |
| 1350 | arch-tag: fd9d8e77-66af-491c-b212-d80999613e3e |
| 1351 | @end ignore |