| 1 | \input texinfo @c -*- texinfo -*- |
| 2 | @c %**start of header |
| 3 | @setfilename ../../info/nxml-mode |
| 4 | @settitle nXML Mode |
| 5 | @c %**end of header |
| 6 | |
| 7 | @copying |
| 8 | |
| 9 | This manual documents nxml-mode, an Emacs major mode for editing |
| 10 | XML with RELAX NG support. |
| 11 | |
| 12 | Copyright @copyright{} 2007, 2008, 2009 Free Software Foundation, Inc. |
| 13 | |
| 14 | @quotation |
| 15 | Permission is granted to copy, distribute and/or modify this document |
| 16 | under the terms of the GNU Free Documentation License, Version 1.3 or |
| 17 | any later version published by the Free Software Foundation; with no |
| 18 | Invariant Sections, with the Front-Cover texts being ``A GNU |
| 19 | Manual,'' and with the Back-Cover Texts as in (a) below. A copy of the |
| 20 | license is included in the section entitled ``GNU Free Documentation |
| 21 | License'' in the Emacs manual. |
| 22 | |
| 23 | (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and |
| 24 | modify this GNU manual. Buying copies from the FSF supports it in |
| 25 | developing GNU and promoting software freedom.'' |
| 26 | |
| 27 | This document is part of a collection distributed under the GNU Free |
| 28 | Documentation License. If you want to distribute this document |
| 29 | separately from the collection, you can do so by adding a copy of the |
| 30 | license to the document, as described in section 6 of the license. |
| 31 | @end quotation |
| 32 | @end copying |
| 33 | |
| 34 | @dircategory Emacs |
| 35 | @direntry |
| 36 | * nXML Mode: (nxml-mode). XML editing mode with RELAX NG support. |
| 37 | @end direntry |
| 38 | |
| 39 | @node Top |
| 40 | @top nXML Mode |
| 41 | |
| 42 | This manual documents nxml-mode, an Emacs major mode for editing |
| 43 | XML with RELAX NG support. This manual is not yet complete. |
| 44 | |
| 45 | @menu |
| 46 | * Completion:: |
| 47 | * Inserting end-tags:: |
| 48 | * Paragraphs:: |
| 49 | * Outlining:: |
| 50 | * Locating a schema:: |
| 51 | * DTDs:: |
| 52 | * Limitations:: |
| 53 | @end menu |
| 54 | |
| 55 | @node Completion |
| 56 | @chapter Completion |
| 57 | |
| 58 | Apart from real-time validation, the most important feature that |
| 59 | nxml-mode provides for assisting in document creation is @dfn{completion}. |
| 60 | Completion assists the user in inserting characters at point, based on |
| 61 | knowledge of the schema and on the contents of the buffer before |
| 62 | point. |
| 63 | |
| 64 | The traditional GNU Emacs key combination for completion in a |
| 65 | buffer is @kbd{M-@key{TAB}}. However, many window systems |
| 66 | and window managers use this key combination themselves (typically for |
| 67 | switching between windows) and do not pass it to applications. It's |
| 68 | hard to find key combinations in GNU Emacs that are both easy to type |
| 69 | and not taken by something else. @kbd{C-@key{RET}} (i.e. |
| 70 | pressing the Enter or Return key, while the Ctrl key is held down) is |
| 71 | available. It won't be available on a traditional terminal (because |
| 72 | it is indistinguishable from Return), but it will work with a window |
| 73 | system. Therefore we adopt the following solution by default: use |
| 74 | @kbd{C-@key{RET}} when there's a window system and |
| 75 | @kbd{M-@key{TAB}} when there's not. In the following, I |
| 76 | will assume that a window system is being used and will therefore |
| 77 | refer to @kbd{C-@key{RET}}. |
| 78 | |
| 79 | Completion works by examining the symbol preceding point. This |
| 80 | is the symbol to be completed. The symbol to be completed may be |
| 81 | empty. Completion considers what symbols starting with the symbol to |
| 82 | be completed would be valid replacements for the symbol to be |
| 83 | completed, given the schema and the contents of the buffer before |
| 84 | point. These symbols are the possible completions. An example may |
| 85 | make this clearer. Suppose the buffer looks like this (where @point{} |
| 86 | indicates point): |
| 87 | |
| 88 | @example |
| 89 | <html xmlns="http://www.w3.org/1999/xhtml"> |
| 90 | <h@point{} |
| 91 | @end example |
| 92 | |
| 93 | @noindent |
| 94 | and the schema is XHTML. In this context, the symbol to be completed |
| 95 | is @samp{h}. The possible completions consist of just |
| 96 | @samp{head}. Another example is: |
| 97 | |
| 98 | @example |
| 99 | <html xmlns="http://www.w3.org/1999/xhtml"> |
| 100 | <head> |
| 101 | <@point{} |
| 102 | @end example |
| 103 | |
| 104 | @noindent |
| 105 | In this case, the symbol to be completed is empty, and the possible |
| 106 | completions are @samp{base}, @samp{isindex}, |
| 107 | @samp{link}, @samp{meta}, @samp{script}, |
| 108 | @samp{style}, @samp{title}. Another example is: |
| 109 | |
| 110 | @example |
| 111 | <html xmlns="@point{} |
| 112 | @end example |
| 113 | |
| 114 | @noindent |
| 115 | In this case, the symbol to be completed is empty, and the possible |
| 116 | completions are just @samp{http://www.w3.org/1999/xhtml}. |
| 117 | |
| 118 | When you type @kbd{C-@key{RET}}, what happens depends |
| 119 | on what the set of possible completions is. |
| 120 | |
| 121 | @itemize @bullet |
| 122 | @item |
| 123 | If the set of completions is empty, nothing |
| 124 | happens. |
| 125 | @item |
| 126 | If there is one possible completion, then that completion is |
| 127 | inserted, together with any following characters that are |
| 128 | required. For example, in this case: |
| 129 | |
| 130 | @example |
| 131 | <html xmlns="http://www.w3.org/1999/xhtml"> |
| 132 | <@point{} |
| 133 | @end example |
| 134 | |
| 135 | @noindent |
| 136 | @kbd{C-@key{RET}} will yield |
| 137 | |
| 138 | @example |
| 139 | <html xmlns="http://www.w3.org/1999/xhtml"> |
| 140 | <head@point{} |
| 141 | @end example |
| 142 | @item |
| 143 | If there is more than one possible completion, but all |
| 144 | possible completions share a common non-empty prefix, then that prefix |
| 145 | is inserted. For example, suppose the buffer is: |
| 146 | |
| 147 | @example |
| 148 | <html x@point{} |
| 149 | @end example |
| 150 | |
| 151 | @noindent |
| 152 | The symbol to be completed is @samp{x}. The possible completions |
| 153 | are @samp{xmlns} and @samp{xml:lang}. These share a |
| 154 | common prefix of @samp{xml}. Thus, @kbd{C-@key{RET}} |
| 155 | will yield: |
| 156 | |
| 157 | @example |
| 158 | <html xml@point{} |
| 159 | @end example |
| 160 | |
| 161 | @noindent |
| 162 | Typically, you would do @kbd{C-@key{RET}} again, which would |
| 163 | have the result described in the next item. |
| 164 | @item |
| 165 | If there is more than one possible completion, but the |
| 166 | possible completions do not share a non-empty prefix, then Emacs will |
| 167 | prompt you to input the symbol in the minibuffer, initializing the |
| 168 | minibuffer with the symbol to be completed, and popping up a buffer |
| 169 | showing the possible completions. You can now input the symbol to be |
| 170 | inserted. The symbol you input will be inserted in the buffer instead |
| 171 | of the symbol to be completed. Emacs will then insert any required |
| 172 | characters after the symbol. For example, if the buffer contains: |
| 173 | |
| 174 | @example |
| 175 | <html xml@point{} |
| 176 | @end example |
| 177 | |
| 178 | @noindent |
| 179 | Emacs will prompt you in the minibuffer with |
| 180 | |
| 181 | @example |
| 182 | Attribute: xml@point{} |
| 183 | @end example |
| 184 | |
| 185 | @noindent |
| 186 | and the buffer showing possible completions will contain |
| 187 | |
| 188 | @example |
| 189 | Possible completions are: |
| 190 | xml:lang xmlns |
| 191 | @end example |
| 192 | |
| 193 | @noindent |
| 194 | If you input @kbd{xmlns}, the result will be: |
| 195 | |
| 196 | @example |
| 197 | <html xmlns="@point{} |
| 198 | @end example |
| 199 | |
| 200 | @noindent |
| 201 | (If you do @kbd{C-@key{RET}} again, the namespace URI will |
| 202 | be inserted. Should that happen automatically?) |
| 203 | @end itemize |
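|  | |
|  | The prefix rule described above matches ordinary Emacs completion |
|  | behavior, and it can be reproduced in a few lines of Emacs Lisp. The |
|  | following is only an illustrative sketch: it calls the standard function |
|  | @code{try-completion} on a hand-written candidate list, and it does not |
|  | claim to show how nxml-mode derives its candidates from the schema. |
|  | |
|  | @example |
|  | ;; Candidates are typed in by hand here, as an assumption for the example. |
|  | (try-completion "x" '("xmlns" "xml:lang"))  ; longest common prefix: "xml" |
|  | (try-completion "h" '("head"))              ; the single candidate: "head" |
|  | (try-completion "q" '("xmlns" "xml:lang"))  ; no candidate: nil |
|  | @end example |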
| 204 | |
| 205 | @node Inserting end-tags |
| 206 | @chapter Inserting end-tags |
| 207 | |
| 208 | The main redundancy in XML syntax is end-tags. nxml-mode provides |
| 209 | several ways to make it easier to enter end-tags. You can use all of |
| 210 | these without a schema. |
| 211 | |
| 212 | You can use @kbd{C-@key{RET}} after @samp{</} |
| 213 | to complete the rest of the end-tag. |
| 214 | |
| 215 | @kbd{C-c C-f} inserts an end-tag for the element containing |
| 216 | point. This command is useful when you want to input the start-tag, |
| 217 | then input the content and finally input the end-tag. The @samp{f} |
| 218 | is mnemonic for finish. |
| 219 | |
| 220 | If you want to keep tags balanced and input the end-tag at the |
| 221 | same time as the start-tag, before inputting the content, then you can |
| 222 | use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts |
| 223 | the end-tag and leaves point before the end-tag. @kbd{C-c C-b} |
| 224 | is similar but more convenient for block-level elements: it puts the |
| 225 | start-tag, point and the end-tag on successive lines, appropriately |
| 226 | indented. The @samp{i} is mnemonic for inline and the |
| 227 | @samp{b} is mnemonic for block. |
| 228 | |
| 229 | Finally, you can customize nxml-mode so that @kbd{/} |
| 230 | automatically inserts the rest of the end-tag when it occurs after |
| 231 | @samp{<}, by doing |
| 232 | |
| 233 | @display |
| 234 | @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}} |
| 235 | @end display |
| 236 | |
| 237 | @noindent |
| 238 | and then following the instructions in the displayed buffer. |
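|  | |
|  | The same setting can also be made from an init file. A minimal sketch, |
|  | assuming you prefer plain Emacs Lisp to the Customize interface: |
|  | |
|  | @example |
|  | ;; Make "/" typed after "<" insert the rest of the end-tag. |
|  | (setq nxml-slash-auto-complete-flag t) |
|  | @end example |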
| 239 | |
| 240 | @node Paragraphs |
| 241 | @chapter Paragraphs |
| 242 | |
| 243 | Emacs has several commands that operate on paragraphs, most |
| 244 | notably @kbd{M-q}. nXML mode redefines these to work in a way |
| 245 | that is useful for XML. The exact rules that are used to find the |
| 246 | beginning and end of a paragraph are complicated; they are designed |
| 247 | mainly to ensure that @kbd{M-q} does the right thing. |
| 248 | |
| 249 | A paragraph consists of one or more complete, consecutive lines. |
| 250 | A group of lines is not considered a paragraph unless it contains some |
| 251 | non-whitespace characters between tags or inside comments. A blank |
| 252 | line separates paragraphs. A single tag on a line by itself also |
| 253 | separates paragraphs. More precisely, if one tag together with any |
| 254 | leading and trailing whitespace completely occupy one or more lines, |
| 255 | then those lines will not be included in any paragraph. |
| 256 | |
| 257 | A start-tag at the beginning of the line (possibly indented) may |
| 258 | be treated as starting a paragraph. Similarly, an end-tag at the end |
| 259 | of the line may be treated as ending a paragraph. The following rules |
| 260 | are used to determine whether such a tag is in fact treated as a |
| 261 | paragraph boundary: |
| 262 | |
| 263 | @itemize @bullet |
| 264 | @item |
| 265 | If the schema does not allow text at that point, then it |
| 266 | is a paragraph boundary. |
| 267 | @item |
| 268 | If the end-tag corresponding to the start-tag is not at |
| 269 | the end of its line, or the start-tag corresponding to the end-tag is |
| 270 | not at the beginning of its line, then it is not a paragraph |
| 271 | boundary. For example, in |
| 272 | |
| 273 | @example |
| 274 | <p>This is a paragraph with an |
| 275 | <emph>emphasized</emph> phrase. |
| 276 | @end example |
| 277 | |
| 278 | @noindent |
| 279 | the @samp{<emph>} start-tag would not be considered as |
| 280 | starting a paragraph, because its corresponding end-tag is not at the |
| 281 | end of the line. |
| 282 | @item |
| 283 | If there is text that is a sibling in the element tree, then |
| 284 | it is not a paragraph boundary. For example, in |
| 285 | |
| 286 | @example |
| 287 | <p>This is a paragraph with an |
| 288 | <emph>emphasized phrase that takes one source line</emph> |
| 289 | @end example |
| 290 | |
| 291 | @noindent |
| 292 | the @samp{<emph>} start-tag would not be considered as |
| 293 | starting a paragraph, even though its end-tag is at the end of its |
| 294 | line, because the text @samp{This is a paragraph with an} |
| 295 | is a sibling of the @samp{emph} element. |
| 296 | @item |
| 297 | Otherwise, it is a paragraph boundary. |
| 298 | @end itemize |
| 299 | |
| 300 | @node Outlining |
| 301 | @chapter Outlining |
| 302 | |
| 303 | nXML mode allows you to display all or part of a buffer as an |
| 304 | outline, in a similar way to Emacs' outline mode. An outline in nXML |
| 305 | mode is based on recognizing two kinds of element: sections and |
| 306 | headings. There is one heading for every section and one section for |
| 307 | every heading. A section contains its heading as or within its first |
| 308 | child element. A section also contains its subordinate sections (its |
| 309 | subsections). The text content of a section consists of anything in a |
| 310 | section that is neither a subsection nor a heading. |
| 311 | |
| 312 | Note that this is a different model from that used by XHTML. |
| 313 | nXML mode's outline support will not be useful for XHTML unless you |
| 314 | adopt a convention of adding a @code{div} to enclose each |
| 315 | section, rather than having sections implicitly delimited by different |
| 316 | @code{h@var{n}} elements. This limitation may be removed |
| 317 | in a future version. |
| 318 | |
| 319 | The variable @code{nxml-section-element-name-regexp} gives |
| 320 | a regexp for the local names (i.e. the part of the name following any |
| 321 | prefix) of section elements. The variable |
| 322 | @code{nxml-heading-element-name-regexp} gives a regexp for the |
| 323 | local names of heading elements. For an element to be recognized |
| 324 | as a section |
| 325 | |
| 326 | @itemize @bullet |
| 327 | @item |
| 328 | its start-tag must occur at the beginning of a line |
| 329 | (possibly indented); |
| 330 | @item |
| 331 | its local name must match |
| 332 | @code{nxml-section-element-name-regexp}; |
| 333 | @item |
| 334 | either its first child element or a descendant of that |
| 335 | first child element must have a local name that matches |
| 336 | @code{nxml-heading-element-name-regexp}; the first such element |
| 337 | is treated as the section's heading. |
| 338 | @end itemize |
| 339 | |
| 340 | @noindent |
| 341 | You can customize these variables using @kbd{M-x |
| 342 | customize-variable}. |
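|  | |
|  | As a rough illustration, the two variables could also be set from an |
|  | init file. The element names below are assumptions chosen for the |
|  | example, not the default values: |
|  | |
|  | @example |
|  | ;; Hypothetical values: treat chapter, section and div elements as |
|  | ;; sections, and title elements as their headings. |
|  | (setq nxml-section-element-name-regexp |
|  |       (regexp-opt '("chapter" "section" "div"))) |
|  | (setq nxml-heading-element-name-regexp "title") |
|  | @end example |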
| 343 | |
| 344 | There are three possible outline states for a section: |
| 345 | |
| 346 | @itemize @bullet |
| 347 | @item |
| 348 | normal, showing everything, including its heading, text |
| 349 | content and subsections; each subsection is displayed according to the |
| 350 | state of that subsection; |
| 351 | @item |
| 352 | showing just its heading, with both its text content and |
| 353 | its subsections hidden; all subsections are hidden regardless of their |
| 354 | state; |
| 355 | @item |
| 356 | showing its heading and its subsections, with its text |
| 357 | content hidden; each subsection is displayed according to the state of |
| 358 | that subsection. |
| 359 | @end itemize |
| 360 | |
| 361 | In the last two states, where the text content is hidden, the |
| 362 | heading is displayed specially, in an abbreviated form. An element |
| 363 | like this: |
| 364 | |
| 365 | @example |
| 366 | <section> |
| 367 | <title>Food</title> |
| 368 | <para>There are many kinds of food.</para> |
| 369 | </section> |
| 370 | @end example |
| 371 | |
| 372 | @noindent |
| 373 | would be displayed on a single line like this: |
| 374 | |
| 375 | @example |
| 376 | <-section>Food...</> |
| 377 | @end example |
| 378 | |
| 379 | @noindent |
| 380 | If there are hidden subsections, then a @code{+} will be used |
| 381 | instead of a @code{-} like this: |
| 382 | |
| 383 | @example |
| 384 | <+section>Food...</> |
| 385 | @end example |
| 386 | |
| 387 | @noindent |
| 388 | If there are non-hidden subsections, then the section will instead be |
| 389 | displayed like this: |
| 390 | |
| 391 | @example |
| 392 | <-section>Food... |
| 393 | <-section>Delicious Food...</> |
| 394 | <-section>Distasteful Food...</> |
| 395 | </-section> |
| 396 | @end example |
| 397 | |
| 398 | @noindent |
| 399 | The heading is always displayed with an indent that corresponds to its |
| 400 | depth in the outline, even if it is not actually indented in the buffer. |
| 401 | The variable @code{nxml-outline-child-indent} controls how much |
| 402 | a subheading is indented with respect to its parent heading when the |
| 403 | heading is being displayed specially. |
| 404 | |
| 405 | Commands to change the outline state of sections are bound to |
| 406 | key sequences that start with @kbd{C-c C-o} (@kbd{o} is |
| 407 | mnemonic for outline). The third and final key has been chosen to be |
| 408 | consistent with outline mode. In the following descriptions, |
| 409 | @dfn{current section} means the section containing point, or, more precisely, |
| 410 | the innermost section containing the character immediately following |
| 411 | point. |
| 412 | |
| 413 | @itemize @bullet |
| 414 | @item |
| 415 | @kbd{C-c C-o C-a} shows all sections in the buffer |
| 416 | normally. |
| 417 | @item |
| 418 | @kbd{C-c C-o C-t} hides the text content |
| 419 | of all sections in the buffer. |
| 420 | @item |
| 421 | @kbd{C-c C-o C-c} hides the text content |
| 422 | of the current section. |
| 423 | @item |
| 424 | @kbd{C-c C-o C-e} shows the text content |
| 425 | of the current section. |
| 426 | @item |
| 427 | @kbd{C-c C-o C-d} hides the text content |
| 428 | and subsections of the current section. |
| 429 | @item |
| 430 | @kbd{C-c C-o C-s} shows the current section |
| 431 | and all its direct and indirect subsections normally. |
| 432 | @item |
| 433 | @kbd{C-c C-o C-k} shows the headings of the |
| 434 | direct and indirect subsections of the current section. |
| 435 | @item |
| 436 | @kbd{C-c C-o C-l} hides the text content of the |
| 437 | current section and of its direct and indirect |
| 438 | subsections. |
| 439 | @item |
| 440 | @kbd{C-c C-o C-i} shows the headings of the |
| 441 | direct subsections of the current section. |
| 442 | @item |
| 443 | @kbd{C-c C-o C-o} hides as much as possible without |
| 444 | hiding the current section's text content; the headings of ancestor |
| 445 | sections of the current section and their child sections will |
| 446 | not be hidden. |
| 447 | @end itemize |
| 448 | |
| 449 | When a heading is displayed specially, you can use |
| 450 | @key{RET} in that heading to show the text content of the section |
| 451 | in the same way as @kbd{C-c C-o C-e}. |
| 452 | |
| 453 | You can also use the mouse to change the outline state: |
| 454 | @kbd{S-mouse-2} hides the text content of a section in the same |
| 455 | way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially |
| 456 | displayed heading shows the text content of the section in the same |
| 457 | way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially |
| 458 | displayed start-tag toggles the display of subheadings on and |
| 459 | off. |
| 460 | |
| 461 | The outline state for each section is stored with the first |
| 462 | character of the section (as a text property). Every command that |
| 463 | changes the outline state of any section updates the display of the |
| 464 | buffer so that each section is displayed correctly according to its |
| 465 | outline state. If the section structure is subsequently changed, then |
| 466 | it is possible for the display to no longer correctly reflect the |
| 467 | stored outline state. @kbd{C-c C-o C-r} can be used to refresh |
| 468 | the display so it is correct again. |
| 469 | |
| 470 | @node Locating a schema |
| 471 | @chapter Locating a schema |
| 472 | |
| 473 | nXML mode has a configurable set of rules to locate a schema for |
| 474 | the file being edited. The rules are contained in one or more schema |
| 475 | locating files, which are XML documents. |
| 476 | |
| 477 | The variable @samp{rng-schema-locating-files} specifies |
| 478 | the list of the file-names of schema locating files that nXML mode |
| 479 | should use. The order of the list is significant: when file |
| 480 | @var{x} occurs in the list before file @var{y} then rules |
| 481 | from file @var{x} have precedence over rules from file |
| 482 | @var{y}. A filename specified in |
| 483 | @samp{rng-schema-locating-files} may be relative. If so, it will |
| 484 | be resolved relative to the document for which a schema is being |
| 485 | located. It is not an error if relative file-names in |
| 486 | @samp{rng-schema-locating-files} do not exist. You can use |
| 487 | @kbd{M-x customize-variable @key{RET} rng-schema-locating-files |
| 488 | @key{RET}} to customize the list of schema locating |
| 489 | files. |
| 490 | |
| 491 | By default, the @samp{rng-schema-locating-files} list has two |
| 492 | members: @samp{schemas.xml}, and |
| 493 | @samp{@var{dist-dir}/schema/schemas.xml} where |
| 494 | @samp{@var{dist-dir}} is the directory containing the nXML |
| 495 | distribution. The first member will cause nXML mode to use a file |
| 496 | @samp{schemas.xml} in the same directory as the document being |
| 497 | edited if such a file exists. The second member contains rules for the |
| 498 | schemas that are included with the nXML distribution. |
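|  | |
|  | To add a further schema locating file from an init file, something like |
|  | the following sketch could be used. The path is hypothetical, and the |
|  | sketch assumes that the variable is defined by the @samp{rng-loc} |
|  | library: |
|  | |
|  | @example |
|  | ;; Hypothetical path.  add-to-list puts the new file at the front of the |
|  | ;; list, so its rules take precedence over those in later files. |
|  | (require 'rng-loc) |
|  | (add-to-list 'rng-schema-locating-files "~/schemas/schemas.xml") |
|  | @end example |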
| 499 | |
| 500 | @menu |
| 501 | * Commands for locating a schema:: |
| 502 | * Schema locating files:: |
| 503 | @end menu |
| 504 | |
| 505 | @node Commands for locating a schema |
| 506 | @section Commands for locating a schema |
| 507 | |
| 508 | The command @kbd{C-c C-s C-w} will tell you what schema |
| 509 | is currently being used. |
| 510 | |
| 511 | The rules for locating a schema are applied automatically when |
| 512 | you visit a file in nXML mode. However, if you have just created a new |
| 513 | file and the schema cannot be inferred from the file-name, then this |
| 514 | will not locate the right schema. In this case, you should insert the |
| 515 | start-tag of the root element and then use the command @kbd{C-c C-s |
| 516 | C-a}, which reapplies the rules based on the current content of |
| 517 | the document. It is usually not necessary to insert the complete |
| 518 | start-tag; often just @samp{<@var{name}} is |
| 519 | enough. |
| 520 | |
| 521 | If you want to use a schema that has not yet been added to the |
| 522 | schema locating files, you can use the command @kbd{C-c C-s C-f} |
| 523 | to manually select the file containing the schema for the document in |
| 524 | the current buffer. Emacs will read the file-name of the schema from the |
| 525 | minibuffer. After reading the file-name, Emacs will ask whether you |
| 526 | wish to add a rule to a schema locating file that persistently |
| 527 | associates the document with the selected schema. The rule will be |
| 528 | added to the first file in the list specified by |
| 529 | @samp{rng-schema-locating-files}; it will create the file if |
| 530 | necessary, but will not create a directory. If the variable |
| 531 | @samp{rng-schema-locating-files} has not been customized, this |
| 532 | means that the rule will be added to the file @samp{schemas.xml} |
| 533 | in the same directory as the document being edited. |
| 534 | |
| 535 | The command @kbd{C-c C-s C-t} allows you to select a schema by |
| 536 | specifying an identifier for the type of the document. The schema |
| 537 | locating files determine the available type identifiers and what |
| 538 | schema is used for each type identifier. This is useful when it is |
| 539 | impossible to infer the right schema from either the file-name or the |
| 540 | content of the document, even though the schema is already in the |
| 541 | schema locating file. A situation in which this can occur is when |
| 542 | there are multiple variants of a schema where all valid documents have |
| 543 | the same document element. For example, XHTML has Strict and |
| 544 | Transitional variants. In a situation like this, a schema locating file |
| 545 | can define a type identifier for each variant. As with @kbd{C-c |
| 546 | C-s C-f}, Emacs will ask whether you wish to add a rule to a schema |
| 547 | locating file that persistently associates the document with the |
| 548 | specified type identifier. |
| 549 | |
| 550 | The command @kbd{C-c C-s C-l} adds a rule to a schema |
| 551 | locating file that persistently associates the document with |
| 552 | the schema that is currently being used. |
| 553 | |
| 554 | @node Schema locating files |
| 555 | @section Schema locating files |
| 556 | |
| 557 | Each schema locating file specifies a list of rules. The rules |
| 558 | from each file are appended in order. To locate a schema, each rule is |
| 559 | applied in turn until a rule matches. The first matching rule is then |
| 560 | used to determine the schema. |
| 561 | |
| 562 | Schema locating files are designed to be useful for other |
| 563 | applications that need to locate a schema for a document. In fact, |
| 564 | there is nothing specific to locating schemas in the design; it could |
| 565 | equally well be used for locating a stylesheet. |
| 566 | |
| 567 | @menu |
| 568 | * Schema locating file syntax basics:: |
| 569 | * Using the document's URI to locate a schema:: |
| 570 | * Using the document element to locate a schema:: |
| 571 | * Using type identifiers in schema locating files:: |
| 572 | * Using multiple schema locating files:: |
| 573 | @end menu |
| 574 | |
| 575 | @node Schema locating file syntax basics |
| 576 | @subsection Schema locating file syntax basics |
| 577 | |
| 578 | There is a schema for schema locating files in the file |
| 579 | @samp{locate.rnc} in the schema directory. Schema locating |
| 580 | files must be valid with respect to this schema. |
| 581 | |
| 582 | The document element of a schema locating file must be |
| 583 | @samp{locatingRules} and the namespace URI must be |
| 584 | @samp{http://thaiopensource.com/ns/locating-rules/1.0}. The |
| 585 | children of the document element specify rules. The order of the |
| 586 | children is the same as the order of the rules. Here's a complete |
| 587 | example of a schema locating file: |
| 588 | |
| 589 | @example |
| 590 | <?xml version="1.0"?> |
| 591 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 592 | <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/> |
| 593 | <documentElement localName="book" uri="docbook.rnc"/> |
| 594 | </locatingRules> |
| 595 | @end example |
| 596 | |
| 597 | @noindent |
| 598 | This says to use the schema @samp{xhtml.rnc} for a document whose |
| 599 | document element has the namespace URI @samp{http://www.w3.org/1999/xhtml}, |
| 600 | and to use the schema @samp{docbook.rnc} for a document whose document |
| 601 | element has the local name @samp{book}. If the document element had both |
| 602 | a namespace URI of @samp{http://www.w3.org/1999/xhtml} and a local name of |
| 603 | @samp{book}, then the matching rule that comes first would be |
| 604 | used, and so the schema @samp{xhtml.rnc} would be used. There is |
| 605 | no precedence between different types of rule; the first matching rule |
| 606 | of any type is used. |
| 607 | |
| 608 | As usual with XML-related technologies, resources are identified |
| 609 | by URIs. The @samp{uri} attribute identifies the schema by |
| 610 | specifying the URI. The URI may be relative. If so, it is resolved |
| 611 | relative to the URI of the schema locating file that contains the |
| 612 | attribute. This means that if the value of the @samp{uri} attribute |
| 613 | does not contain a @samp{/}, then it will refer to a filename in |
| 614 | the same directory as the schema locating file. |
| 615 | |
| 616 | @node Using the document's URI to locate a schema |
| 617 | @subsection Using the document's URI to locate a schema |
| 618 | |
| 619 | A @samp{uri} rule locates a schema based on the URI of the |
| 620 | document. The @samp{uri} attribute specifies the URI of the |
| 621 | schema. The @samp{resource} attribute can be used to specify |
| 622 | the schema for a particular document. For example, |
| 623 | |
| 624 | @example |
| 625 | <uri resource="spec.xml" uri="docbook.rnc"/> |
| 626 | @end example |
| 627 | |
| 628 | @noindent |
| 629 | specifies that the schema for @samp{spec.xml} is |
| 630 | @samp{docbook.rnc}. |
| 631 | |
| 632 | The @samp{pattern} attribute can be used instead of the |
| 633 | @samp{resource} attribute to specify the schema for any document |
| 634 | whose URI matches a pattern. The pattern has the same syntax as an |
| 635 | absolute or relative URI except that the path component of the URI can |
| 636 | use a @samp{*} character to stand for zero or more characters |
| 637 | within a path segment (i.e. any character other than @samp{/}). |
| 638 | Typically, the URI pattern looks like a relative URI, but, whereas a |
| 639 | relative URI in the @samp{resource} attribute is resolved into a |
| 640 | particular absolute URI using the base URI of the schema locating |
| 641 | file, a relative URI pattern matches if it matches some number of |
| 642 | complete path segments of the document's URI ending with the last path |
| 643 | segment of the document's URI. For example, |
| 644 | |
| 645 | @example |
| 646 | <uri pattern="*.xsl" uri="xslt.rnc"/> |
| 647 | @end example |
| 648 | |
| 649 | @noindent |
| 650 | specifies that the schema for documents with a URI whose path ends |
| 651 | with @samp{.xsl} is @samp{xslt.rnc}. |
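|  | |
|  | Because a relative pattern is matched against complete trailing path |
|  | segments, a pattern may also name a directory. The following is a |
|  | sketch in which the directory name @samp{manual} and the document names |
|  | are hypothetical: |
|  | |
|  | @example |
|  | <uri pattern="manual/*.xml" uri="docbook.rnc"/> |
|  | @end example |
|  | |
|  | @noindent |
|  | Under the matching rule described above, this would apply to a document |
|  | such as @samp{file:///home/jjc/manual/intro.xml}, but not to |
|  | @samp{file:///home/jjc/old-manual/intro.xml}, since @samp{manual} has to |
|  | match a complete path segment. |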
| 652 | |
| 653 | A @samp{transformURI} rule locates a schema by |
| 654 | transforming the URI of the document. The @samp{fromPattern} |
| 655 | attribute specifies a URI pattern with the same meaning as the |
| 656 | @samp{pattern} attribute of the @samp{uri} element. The |
| 657 | @samp{toPattern} attribute is a URI pattern that is used to |
| 658 | generate the URI of the schema. Each @samp{*} in the |
| 659 | @samp{toPattern} is replaced by the string that matched the |
| 660 | corresponding @samp{*} in the @samp{fromPattern}. The |
| 661 | resulting string is appended to the initial part of the document's URI |
| 662 | that was not explicitly matched by the @samp{fromPattern}. The |
| 663 | rule matches only if the transformed URI identifies an existing |
| 664 | resource. For example, the rule |
| 665 | |
| 666 | @example |
| 667 | <transformURI fromPattern="*.xml" toPattern="*.rnc"/> |
| 668 | @end example |
| 669 | |
| 670 | @noindent |
| 671 | would transform the URI @samp{file:///home/jjc/docs/spec.xml} |
| 672 | into the URI @samp{file:///home/jjc/docs/spec.rnc}. Thus, this |
| 673 | rule specifies that to locate a schema for a document |
| 674 | @samp{@var{foo}.xml}, Emacs should test whether a file |
| 675 | @samp{@var{foo}.rnc} exists in the same directory as |
| 676 | @samp{@var{foo}.xml}, and, if so, should use it as the |
| 677 | schema. |
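|  | |
|  | The same mechanism should also allow schemas to be kept in a |
|  | subdirectory. The following is a sketch that simply applies the |
|  | substitution rule described above; the directory name @samp{schemas} is |
|  | hypothetical: |
|  | |
|  | @example |
|  | <transformURI fromPattern="*.xml" toPattern="schemas/*.rnc"/> |
|  | @end example |
|  | |
|  | @noindent |
|  | By that rule, the URI @samp{file:///home/jjc/docs/spec.xml} would be |
|  | transformed into @samp{file:///home/jjc/docs/schemas/spec.rnc}, and the |
|  | rule would match only if that file exists. |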
| 678 | |
| 679 | @node Using the document element to locate a schema |
| 680 | @subsection Using the document element to locate a schema |
| 681 | |
| 682 | A @samp{documentElement} rule locates a schema based on |
| 683 | the local name and prefix of the document element. For example, a rule |
| 684 | |
| 685 | @example |
| 686 | <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/> |
| 687 | @end example |
| 688 | |
| 689 | @noindent |
| 690 | specifies that when the name of the document element is |
| 691 | @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used |
| 692 | as the schema. Either the @samp{prefix} or |
| 693 | @samp{localName} attribute may be omitted to allow any prefix or |
| 694 | local name. |
| 695 | |
| 696 | A @samp{namespace} rule locates a schema based on the |
| 697 | namespace URI of the document element. For example, a rule |
| 698 | |
| 699 | @example |
| 700 | <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/> |
| 701 | @end example |
| 702 | |
| 703 | @noindent |
| 704 | specifies that when the namespace URI of the document element is |
| 705 | @samp{http://www.w3.org/1999/XSL/Transform}, then |
| 706 | @samp{xslt.rnc} should be used as the schema. |
| 707 | |
| 708 | @node Using type identifiers in schema locating files |
| 709 | @subsection Using type identifiers in schema locating files |
| 710 | |
| 711 | Type identifiers allow a level of indirection in locating the |
| 712 | schema for a document. Instead of associating the document directly |
| 713 | with a schema URI, the document is associated with a type identifier, |
| 714 | which is in turn associated with a schema URI. nXML mode does not |
| 715 | constrain the format of type identifiers. They can simply be strings |
| 716 | without any formal structure or they can be public identifiers or |
| 717 | URIs. Note that these type identifiers have nothing to do with the |
| 718 | DOCTYPE declaration. When comparing type identifiers, whitespace is |
| 719 | normalized in the same way as with the @samp{xsd:token} |
| 720 | datatype: leading and trailing whitespace is stripped; other sequences |
| 721 | of whitespace are normalized to a single space character. |
| 722 | |
| 723 | Each of the rules described in previous sections that uses a |
| 724 | @samp{uri} attribute to specify a schema can instead use a |
| 725 | @samp{typeId} attribute to specify a type identifier. The type |
| 726 | identifier can be associated with a URI using a @samp{typeId} |
| 727 | element. For example, |
| 728 | |
| 729 | @example |
| 730 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 731 | <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/> |
| 732 | <typeId id="XHTML" typeId="XHTML Strict"/> |
| 733 | <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/> |
| 734 | <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/> |
| 735 | </locatingRules> |
| 736 | @end example |
| 737 | |
| 738 | @noindent |
| 739 | declares three type identifiers: @samp{XHTML} (representing the |
| 740 | default variant of XHTML to be used), @samp{XHTML Strict} and |
| 741 | @samp{XHTML Transitional}. Such a schema locating file would |
| 742 | use @samp{xhtml-strict.rnc} for a document whose namespace is |
| 743 | @samp{http://www.w3.org/1999/xhtml}. But it is considerably |
| 744 | more flexible than a schema locating file that simply specified |
| 745 | |
| 746 | @example |
| 747 | <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/> |
| 748 | @end example |
| 749 | |
| 750 | @noindent |
| 751 | With type identifiers, a user can use @kbd{C-c C-s C-t} to select between |
| 752 | XHTML Strict and XHTML Transitional. A user can also add a schema locating file |
| 753 | |
| 754 | @example |
| 755 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 756 | <typeId id="XHTML" typeId="XHTML Transitional"/> |
| 757 | </locatingRules> |
| 758 | @end example |
| 759 | |
| 760 | @noindent |
| 761 | that makes the default variant of XHTML be XHTML Transitional. |
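|  | |
|  | Because whitespace in type identifiers is normalized before comparison, |
|  | the two rules in the following sketch should refer to the same type |
|  | identifier, even though the @samp{typeId} attribute value contains |
|  | extra spaces: |
|  | |
|  | @example |
|  | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
|  | <namespace ns="http://www.w3.org/1999/xhtml" typeId="  XHTML   Strict "/> |
|  | <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/> |
|  | </locatingRules> |
|  | @end example |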
| 762 | |
| 763 | @node Using multiple schema locating files |
| 764 | @subsection Using multiple schema locating files |
| 765 | |
| 766 | The @samp{include} element includes rules from another |
| 767 | schema locating file. The behavior is exactly as if the rules from |
| 768 | that file were included in place of the @samp{include} element. |
| 769 | Relative URIs are resolved into absolute URIs before the inclusion is |
| 770 | performed. For example, |
| 771 | |
| 772 | @example |
| 773 | <include rules="../rules.xml"/> |
| 774 | @end example |
| 775 | |
| 776 | @noindent |
| 777 | includes the rules from @samp{../rules.xml}. |
| 778 | |
| 779 | The process of locating a schema takes as input a list of schema |
| 780 | locating files. The rules in all these files and in the files they |
| 781 | include are resolved into a single list of rules, which are applied |
| 782 | strictly in order. Sometimes this order is not what is needed. |
| 783 | For example, suppose you have two schema locating files, a private |
| 784 | file |
| 785 | |
| 786 | @example |
| 787 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 788 | <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/> |
| 789 | </locatingRules> |
| 790 | @end example |
| 791 | |
| 792 | @noindent |
| 793 | followed by a public file |
| 794 | |
| 795 | @example |
| 796 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 797 | <transformURI fromPattern="*.xml" toPattern="*.rnc"/> |
| 798 | <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/> |
| 799 | </locatingRules> |
| 800 | @end example |
| 801 | |
| 802 | @noindent |
| 803 | The effect of these two files is that the XHTML @samp{namespace} rule |
| 804 | takes precedence over the @samp{transformURI} rule, which is almost certainly |
| 805 | not what is needed. This can be solved by adding an @samp{applyFollowingRules} |
| 806 | element to the private file, so that @samp{transformURI} rules in later files are tried first: |
| 807 | |
| 808 | @example |
| 809 | <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"> |
| 810 | <applyFollowingRules ruleType="transformURI"/> |
| 811 | <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/> |
| 812 | </locatingRules> |
| 813 | @end example |
| 814 | |
| 815 | @node DTDs |
| 816 | @chapter DTDs |
| 817 | |
| 818 | nxml-mode is designed to support the creation of standalone XML |
| 819 | documents that do not depend on a DTD. Although it is common practice |
| 820 | to insert a DOCTYPE declaration referencing an external DTD, this has |
| 821 | undesirable side-effects. It means that the document is no longer |
| 822 | self-contained. It also means that different XML parsers may interpret |
| 823 | the document in different ways, since the XML Recommendation does not |
| 824 | require XML parsers to read the DTD. With DTDs, it was impractical to |
| 825 | get validation without using an external DTD or reference to a |
| 826 | parameter entity. With RELAX NG and other schema languages, you can |
| 827 | simultaneously get the benefits of validation and standalone XML |
| 828 | documents. Therefore, I recommend that you do not reference an |
| 829 | external DTD in your XML documents. |
| 830 | |
| 831 | One problem is entities for characters. Typically, as well as |
| 832 | providing validation, DTDs also provide a set of character entities |
| 833 | for documents to use. Schemas cannot provide this functionality, |
| 834 | because schema validation happens after XML parsing. The recommended |
| 835 | solution is to either use the Unicode characters directly, or, if this |
| 836 | is impractical, use character references. nXML mode supports this by |
| 837 | providing commands for entering characters and character references |
| 838 | using the Unicode names, and can display the glyph corresponding to a |
| 839 | character reference. |
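|  | |
|  | For instance, where a DTD-based document might use an entity reference |
|  | such as @samp{&eacute;}, a standalone document can use a hexadecimal |
|  | character reference instead. In this sketch the element name @samp{p} |
|  | is purely illustrative: |
|  | |
|  | @example |
|  | <p>caf&#xE9;</p> |
|  | @end example |
|  | |
|  | @noindent |
|  | Here @samp{&#xE9;} refers to U+00E9, LATIN SMALL LETTER E WITH ACUTE. |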
| 840 | |
| 841 | @node Limitations |
| 842 | @chapter Limitations |
| 843 | |
| 844 | nXML mode has some limitations: |
| 845 | |
| 846 | @itemize @bullet |
| 847 | @item |
| 848 | DTD support is limited. Internal parsed general entities declared |
| 849 | in the internal subset are supported provided they do not contain |
| 850 | elements. Other usage of DTDs is ignored. |
| 851 | @item |
| 852 | The restrictions on RELAX NG schemas in section 7 of the RELAX NG |
| 853 | specification are not enforced. |
| 854 | @item |
| 855 | Unicode support has problems. This stems mostly from the fact that |
| 856 | the XML (and RELAX NG) character model is based squarely on Unicode, |
| 857 | whereas the Emacs character model is not. Emacs 22 is slated to have |
| 858 | full Unicode support, which should improve the situation here. |
| 859 | @end itemize |
| 860 | |
| 861 | @bye |
| 862 | |
| 863 | @ignore |
| 864 | arch-tag: 3b6e8ac2-ae8d-4f38-bd43-ce9f80be04d6 |
| 865 | @end ignore |