doc/misc/nxml-mode.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename ../../info/nxml-mode
   4 @settitle nXML Mode
   5 @c %**end of header
   6
   7 @copying
   8 This manual documents nxml-mode, an Emacs major mode for editing
   9 XML with RELAX NG support.
  10
  11 Copyright @copyright{} 2007-2012 Free Software Foundation, Inc.
  12
  13 @quotation
  14 Permission is granted to copy, distribute and/or modify this document
  15 under the terms of the GNU Free Documentation License, Version 1.3 or
  16 any later version published by the Free Software Foundation; with no
  17 Invariant Sections, with the Front-Cover texts being ``A GNU
  18 Manual,'' and with the Back-Cover Texts as in (a) below.  A copy of the
  19 license is included in the section entitled ``GNU Free Documentation
  20 License'' in the Emacs manual.
  21
  22 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  23 modify this GNU manual.  Buying copies from the FSF supports it in
  24 developing GNU and promoting software freedom.''
  25
  26 This document is part of a collection distributed under the GNU Free
  27 Documentation License.  If you want to distribute this document
  28 separately from the collection, you can do so by adding a copy of the
  29 license to the document, as described in section 6 of the license.
  30 @end quotation
  31 @end copying
  32
  33 @dircategory Emacs editing modes
  34 @direntry
  35 * nXML Mode: (nxml-mode).       XML editing mode with RELAX NG support.
  36 @end direntry
  37
  38 @node Top
  39 @top nXML Mode
  40
  41 @insertcopying
  42
  43 This manual is not yet complete.
  44
  45 @menu
  46 * Introduction::
  47 * Completion::
  48 * Inserting end-tags::
  49 * Paragraphs::
  50 * Outlining::
  51 * Locating a schema::
  52 * DTDs::
  53 * Limitations::
  54 @end menu
  55
  56 @node Introduction
  57 @chapter Introduction
  58
  59 nXML mode is an Emacs major mode for editing XML documents.  It supports
  60 editing well-formed XML documents, and provides schema-sensitive editing
  61 using RELAX NG Compact Syntax.  To get started, visit a file containing an
  62 XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
  63 mode.  By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
  64 put buffers in nXML mode if they have recognizable XML content or file
  65 extensions.  You may wish to customize the settings, for example to
  66 recognize different file extensions.
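
As a rough sketch of such a customization, the following init-file
fragment associates an additional file extension with nXML mode.  The
@samp{.myxml} extension is hypothetical and used only for illustration;
substitute whatever extensions your documents use.

@example
;; Illustrative only: open files ending in .myxml in nXML mode.
(add-to-list 'auto-mode-alist '("\\.myxml\\'" . nxml-mode))
@end example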
  67
  68 Once in nXML mode, you can type @kbd{C-h m} for basic information on the
  69 mode.
  70
  71 The @file{etc/nxml} directory in the Emacs distribution contains some data
  72 files used by nXML mode, and includes two files (@file{test-valid.xml} and
  73 @file{test-invalid.xml}) that provide examples of valid and invalid XML
  74 documents.
  75
  76 To get validation and schema-sensitive editing, you need a RELAX NG Compact
  77 Syntax (RNC) schema for your document (@pxref{Locating a schema}).  The
  78 @file{etc/schema} directory includes some schemas for popular document
  79 types.  See @url{http://relaxng.org/} for more information on RELAX NG.
  80 You can use the @samp{Trang} program from
  81 @url{http://www.thaiopensource.com/relaxng/trang.html} to
  82 automatically create RNC schemas.  This program can:
  83
  84 @itemize @bullet
  85 @item
  86 infer an RNC schema from an instance document;
  87 @item
  88 convert a DTD to an RNC schema;
  89 @item
  90 convert a RELAX NG XML syntax schema to an RNC schema.
  91 @end itemize
  92
  93 @noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
  94 one, you can also use the XSLT stylesheet from
  95 @url{http://www.pantor.com/download.html}.
  96
  97 To convert a W3C XML Schema to an RNC schema, you need first to convert it
  98 to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
  99 (built on top of MSV).  See @url{https://github.com/kohsuke/msv}
 100 and @url{https://msv.dev.java.net/}.
 101
 102 For historical discussions only, see the mailing list archives at
 103 @url{http://groups.yahoo.com/group/emacs-nxml-mode/}.  Please make all new
 104 discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
 105 lists.  Report any bugs with @kbd{M-x report-emacs-bug}.
 106
 107
 108 @node Completion
 109 @chapter Completion
 110
 111 Apart from real-time validation, the most important feature that
 112 nxml-mode provides for assisting in document creation is ``completion''.
 113 Completion assists the user in inserting characters at point, based on
 114 knowledge of the schema and on the contents of the buffer before
 115 point.
 116
 117 The traditional GNU Emacs key combination for completion in a
 118 buffer is @kbd{M-@key{TAB}}. However, many window systems
 119 and window managers use this key combination themselves (typically for
 120 switching between windows) and do not pass it to applications. It's
 121 hard to find key combinations in GNU Emacs that are both easy to type
 122 and not taken by something else.  @kbd{C-@key{RET}} (i.e.
 123 pressing the Enter or Return key, while the Ctrl key is held down) is
 124 available.  It won't be available on a traditional terminal (because
 125 it is indistinguishable from Return), but it will work with a window
 126 system.  Therefore we adopt the following solution by default: use
 127 @kbd{C-@key{RET}} when there's a window system and
 128 @kbd{M-@key{TAB}} when there's not.  The rest of this chapter
 129 assumes that a window system is being used and therefore
 130 refers to @kbd{C-@key{RET}}.
 131
 132 Completion works by examining the symbol preceding point.  This
 133 is the symbol to be completed.  The symbol to be completed may be
 134 empty.  Completion considers which symbols starting with the symbol to
 135 be completed would be valid replacements for the symbol to be
 136 completed, given the schema and the contents of the buffer before
 137 point.  These symbols are the possible completions.  An example may
 138 make this clearer.  Suppose the buffer looks like this (where @point{}
 139 indicates point):
 140
 141 @example
 142 <html xmlns="http://www.w3.org/1999/xhtml">
 143 <h@point{}
 144 @end example
 145
 146 @noindent
 147 and the schema is XHTML.  In this context, the symbol to be completed
 148 is @samp{h}.  The possible completions consist of just
 149 @samp{head}.  Another example is:
 150
 151 @example
 152 <html xmlns="http://www.w3.org/1999/xhtml">
 153 <head>
 154 <@point{}
 155 @end example
 156
 157 @noindent
 158 In this case, the symbol to be completed is empty, and the possible
 159 completions are @samp{base}, @samp{isindex},
 160 @samp{link}, @samp{meta}, @samp{script},
 161 @samp{style} and @samp{title}.  Another example is:
 162
 163 @example
 164 <html xmlns="@point{}
 165 @end example
 166
 167 @noindent
 168 In this case, the symbol to be completed is empty, and the possible
 169 completions are just @samp{http://www.w3.org/1999/xhtml}.
 170
 171 When you type @kbd{C-@key{RET}}, what happens depends
 172 on what the set of possible completions is.
 173
 174 @itemize @bullet
 175 @item
 176 If the set of completions is empty, nothing
 177 happens.
 178 @item
 179 If there is one possible completion, then that completion is
 180 inserted, together with any following characters that are
 181 required. For example, in this case:
 182
 183 @example
 184 <html xmlns="http://www.w3.org/1999/xhtml">
 185 <@point{}
 186 @end example
 187
 188 @noindent
 189 @kbd{C-@key{RET}} will yield
 190
 191 @example
 192 <html xmlns="http://www.w3.org/1999/xhtml">
 193 <head@point{}
 194 @end example
 195 @item
 196 If there is more than one possible completion, but all
 197 possible completions share a common non-empty prefix, then that prefix
 198 is inserted. For example, suppose the buffer is:
 199
 200 @example
 201 <html x@point{}
 202 @end example
 203
 204 @noindent
 205 The symbol to be completed is @samp{x}. The possible completions
 206 are @samp{xmlns} and @samp{xml:lang}.  These share a
 207 common prefix of @samp{xml}.  Thus, @kbd{C-@key{RET}}
 208 will yield:
 209
 210 @example
 211 <html xml@point{}
 212 @end example
 213
 214 @noindent
 215 Typically, you would do @kbd{C-@key{RET}} again, which would
 216 have the result described in the next item.
 217 @item
 218 If there is more than one possible completion, but the
 219 possible completions do not share a non-empty prefix, then Emacs will
 220 prompt you to input the symbol in the minibuffer, initializing the
 221 minibuffer with the symbol to be completed, and popping up a buffer
 222 showing the possible completions.  You can now input the symbol to be
 223 inserted.  The symbol you input will be inserted in the buffer instead
 224 of the symbol to be completed.  Emacs will then insert any required
 225 characters after the symbol.  For example, if the buffer contains:
 226
 227 @example
 228 <html xml@point{}
 229 @end example
 230
 231 @noindent
 232 Emacs will prompt you in the minibuffer with
 233
 234 @example
 235 Attribute: xml@point{}
 236 @end example
 237
 238 @noindent
 239 and the buffer showing possible completions will contain
 240
 241 @example
 242 Possible completions are:
 243 xml:lang                           xmlns
 244 @end example
 245
 246 @noindent
 247 If you input @kbd{xmlns}, the result will be:
 248
 249 @example
 250 <html xmlns="@point{}
 251 @end example
 252
 253 @noindent
 254 (If you do @kbd{C-@key{RET}} again, the namespace URI will
 255 be inserted. Should that happen automatically?)
 256 @end itemize
 257
 258 @node Inserting end-tags
 259 @chapter Inserting end-tags
 260
 261 The main redundancy in XML syntax is end-tags.  nxml-mode provides
 262 several ways to make it easier to enter end-tags.  You can use all of
 263 these without a schema.
 264
 265 You can use @kbd{C-@key{RET}} after @samp{</}
 266 to complete the rest of the end-tag.
 267
 268 @kbd{C-c C-f} inserts an end-tag for the element containing
 269 point. This command is useful when you want to input the start-tag,
 270 then input the content and finally input the end-tag. The @samp{f}
 271 is mnemonic for finish.
 272
 273 If you want to keep tags balanced and input the end-tag at the
 274 same time as the start-tag, before inputting the content, then you can
 275 use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
 276 the end-tag and leaves point before the end-tag.  @kbd{C-c C-b}
 277 is similar but more convenient for block-level elements: it puts the
 278 start-tag, point and the end-tag on successive lines, appropriately
 279 indented. The @samp{i} is mnemonic for inline and the
 280 @samp{b} is mnemonic for block.
 281
 282 Finally, you can customize nxml-mode so that @kbd{/}
 283 automatically inserts the rest of the end-tag when it occurs after
 284 @samp{<}, by doing
 285
 286 @display
 287 @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
 288 @end display
 289
 290 @noindent
 291 and then following the instructions in the displayed buffer.
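
Alternatively, the same option can be set directly in an init file.
This is a minimal sketch that assumes you prefer not to use the
Customize interface:

@example
;; Make `/' typed after `<' insert the rest of the end-tag.
(setq nxml-slash-auto-complete-flag t)
@end example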
 292
 293 @node Paragraphs
 294 @chapter Paragraphs
 295
 296 Emacs has several commands that operate on paragraphs, most
 297 notably @kbd{M-q}. nXML mode redefines these to work in a way
 298 that is useful for XML.  The exact rules that are used to find the
 299 beginning and end of a paragraph are complicated; they are designed
 300 mainly to ensure that @kbd{M-q} does the right thing.
 301
 302 A paragraph consists of one or more complete, consecutive lines.
 303 A group of lines is not considered a paragraph unless it contains some
 304 non-whitespace characters between tags or inside comments.  A blank
 305 line separates paragraphs.  A single tag on a line by itself also
 306 separates paragraphs.  More precisely, if one tag together with any
 307 leading and trailing whitespace completely occupy one or more lines,
 308 then those lines will not be included in any paragraph.
 309
 310 A start-tag at the beginning of the line (possibly indented) may
 311 be treated as starting a paragraph.  Similarly, an end-tag at the end
 312 of the line may be treated as ending a paragraph. The following rules
 313 are used to determine whether such a tag is in fact treated as a
 314 paragraph boundary:
 315
 316 @itemize @bullet
 317 @item
 318 If the schema does not allow text at that point, then it
 319 is a paragraph boundary.
 320 @item
 321 If the end-tag corresponding to the start-tag is not at
 322 the end of its line, or the start-tag corresponding to the end-tag is
 323 not at the beginning of its line, then it is not a paragraph
 324 boundary. For example, in
 325
 326 @example
 327 <p>This is a paragraph with an
 328 <emph>emphasized</emph> phrase.
 329 @end example
 330
 331 @noindent
 332 the @samp{<emph>} start-tag would not be considered as
 333 starting a paragraph, because its corresponding end-tag is not at the
 334 end of the line.
 335 @item
 336 If there is text that is a sibling in the element tree, then
 337 it is not a paragraph boundary.  For example, in
 338
 339 @example
 340 <p>This is a paragraph with an
 341 <emph>emphasized phrase that takes one source line</emph>
 342 @end example
 343
 344 @noindent
 345 the @samp{<emph>} start-tag would not be considered as
 346 starting a paragraph, even though its end-tag is at the end of its
 347 line, because the text @samp{This is a paragraph with an}
 348 is a sibling of the @samp{emph} element.
 349 @item
 350 Otherwise, it is a paragraph boundary.
 351 @end itemize
 352
 353 @node Outlining
 354 @chapter Outlining
 355
 356 nXML mode allows you to display all or part of a buffer as an
 357 outline, in a similar way to Emacs's outline mode.  An outline in nXML
 358 mode is based on recognizing two kinds of element: sections and
 359 headings.  There is one heading for every section and one section for
 360 every heading.  A section contains its heading as or within its first
 361 child element.  A section also contains its subordinate sections (its
 362 subsections).  The text content of a section consists of anything in a
 363 section that is neither a subsection nor a heading.
 364
 365 Note that this is a different model from that used by XHTML.
 366 nXML mode's outline support will not be useful for XHTML unless you
 367 adopt a convention of adding a @code{div} to enclose each
 368 section, rather than having sections implicitly delimited by different
 369 @code{h@var{n}} elements.  This limitation may be removed
 370 in a future version.
 371
 372 The variable @code{nxml-section-element-name-regexp} gives
 373 a regexp for the local names (i.e. the part of the name following any
 374 prefix) of section elements. The variable
 375 @code{nxml-heading-element-name-regexp} gives a regexp for the
 376 local names of heading elements. For an element to be recognized
 377 as a section:
 378
 379 @itemize @bullet
 380 @item
 381 its start-tag must occur at the beginning of a line
 382 (possibly indented);
 383 @item
 384 its local name must match
 385 @code{nxml-section-element-name-regexp};
 386 @item
 387 either its first child element or a descendant of that
 388 first child element must have a local name that matches
 389 @code{nxml-heading-element-name-regexp}; the first such element
 390 is treated as the section's heading.
 391 @end itemize
 392
 393 @noindent
 394 You can customize these variables using @kbd{M-x
 395 customize-variable}.
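
As an illustration, a vocabulary that marks sections with
@samp{topic} elements and headings with @samp{caption} elements (both
names are hypothetical) might be handled with init-file settings along
these lines.  Note that this sketch replaces the default values rather
than extending them.

@example
;; Hypothetical element names; adjust to your own vocabulary.
(setq nxml-section-element-name-regexp "topic\\|chapter")
(setq nxml-heading-element-name-regexp "caption\\|title")
@end example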
 396
 397 There are three possible outline states for a section:
 398
 399 @itemize @bullet
 400 @item
 401 normal, showing everything, including its heading, text
 402 content and subsections; each subsection is displayed according to the
 403 state of that subsection;
 404 @item
 405 showing just its heading, with both its text content and
 406 its subsections hidden; all subsections are hidden regardless of their
 407 state;
 408 @item
 409 showing its heading and its subsections, with its text
 410 content hidden; each subsection is displayed according to the state of
 411 that subsection.
 412 @end itemize
 413
 414 In the last two states, where the text content is hidden, the
 415 heading is displayed specially, in an abbreviated form. An element
 416 like this:
 417
 418 @example
 419 <section>
 420 <title>Food</title>
 421 <para>There are many kinds of food.</para>
 422 </section>
 423 @end example
 424
 425 @noindent
 426 would be displayed on a single line like this:
 427
 428 @example
 429 <-section>Food...</>
 430 @end example
 431
 432 @noindent
 433 If there are hidden subsections, then a @code{+} will be used
 434 instead of a @code{-} like this:
 435
 436 @example
 437 <+section>Food...</>
 438 @end example
 439
 440 @noindent
 441 If there are non-hidden subsections, then the section will instead be
 442 displayed like this:
 443
 444 @example
 445 <-section>Food...
 446   <-section>Delicious Food...</>
 447   <-section>Distasteful Food...</>
 448 </-section>
 449 @end example
 450
 451 @noindent
 452 The heading is always displayed with an indent that corresponds to its
 453 depth in the outline, even if it is not actually indented in the buffer.
 454 The variable @code{nxml-outline-child-indent} controls how much
 455 a subheading is indented with respect to its parent heading when the
 456 heading is being displayed specially.
 457
 458 Commands to change the outline state of sections are bound to
 459 key sequences that start with @kbd{C-c C-o} (@kbd{o} is
 460 mnemonic for outline).  The third and final key has been chosen to be
 461 consistent with outline mode.  In the following descriptions
 462 current section means the section containing point, or, more precisely,
 463 the innermost section containing the character immediately following
 464 point.
 465
 466 @itemize @bullet
 467 @item
 468 @kbd{C-c C-o C-a} shows all sections in the buffer
 469 normally.
 470 @item
 471 @kbd{C-c C-o C-t} hides the text content
 472 of all sections in the buffer.
 473 @item
 474 @kbd{C-c C-o C-c} hides the text content
 475 of the current section.
 476 @item
 477 @kbd{C-c C-o C-e} shows the text content
 478 of the current section.
 479 @item
 480 @kbd{C-c C-o C-d} hides the text content
 481 and subsections of the current section.
 482 @item
 483 @kbd{C-c C-o C-s} shows the current section
 484 and all its direct and indirect subsections normally.
 485 @item
 486 @kbd{C-c C-o C-k} shows the headings of the
 487 direct and indirect subsections of the current section.
 488 @item
 489 @kbd{C-c C-o C-l} hides the text content of the
 490 current section and of its direct and indirect
 491 subsections.
 492 @item
 493 @kbd{C-c C-o C-i} shows the headings of the
 494 direct subsections of the current section.
 495 @item
 496 @kbd{C-c C-o C-o} hides as much as possible without
 497 hiding the current section's text content; the headings of ancestor
 498 sections of the current section and their child sections will
 499 not be hidden.
 500 @end itemize
 501
 502 When a heading is displayed specially, you can use
 503 @key{RET} in that heading to show the text content of the section
 504 in the same way as @kbd{C-c C-o C-e}.
 505
 506 You can also use the mouse to change the outline state:
 507 @kbd{S-mouse-2} hides the text content of a section in the same
 508 way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
 509 displayed heading shows the text content of the section in the same
 510 way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
 511 displayed start-tag toggles the display of subheadings on and
 512 off.
 513
 514 The outline state for each section is stored with the first
 515 character of the section (as a text property). Every command that
 516 changes the outline state of any section updates the display of the
 517 buffer so that each section is displayed correctly according to its
 518 outline state.  If the section structure is subsequently changed, then
 519 it is possible for the display to no longer correctly reflect the
 520 stored outline state. @kbd{C-c C-o C-r} can be used to refresh
 521 the display so it is correct again.
 522
 523 @node Locating a schema
 524 @chapter Locating a schema
 525
 526 nXML mode has a configurable set of rules to locate a schema for
 527 the file being edited.  The rules are contained in one or more schema
 528 locating files, which are XML documents.
 529
 530 The variable @samp{rng-schema-locating-files} specifies
 531 the list of the file-names of schema locating files that nXML mode
 532 should use.  The order of the list is significant: when file
 533 @var{x} occurs in the list before file @var{y} then rules
 534 from file @var{x} have precedence over rules from file
 535 @var{y}.  A filename specified in
 536 @samp{rng-schema-locating-files} may be relative. If so, it will
 537 be resolved relative to the document for which a schema is being
 538 located. It is not an error if relative file-names in
 539 @samp{rng-schema-locating-files} do not exist. You can use
 540 @kbd{M-x customize-variable @key{RET} rng-schema-locating-files
 541 @key{RET}} to customize the list of schema locating
 542 files.
 543
 544 By default, the @samp{rng-schema-locating-files} list has two
 545 members: @samp{schemas.xml}, and
 546 @samp{@var{dist-dir}/schema/schemas.xml} where
 547 @samp{@var{dist-dir}} is the directory containing the nXML
 548 distribution. The first member will cause nXML mode to use a file
 549 @samp{schemas.xml} in the same directory as the document being
 550 edited if such a file exists.  The second member contains rules for the
 551 schemas that are included with the nXML distribution.
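
For example, the following init-file sketch appends a personal schema
locating file to the list, so that its rules have the lowest
precedence.  The path @file{~/schemas/schemas.xml} is hypothetical, and
the fragment assumes that the variable is defined in the
@code{rng-loc} library.

@example
;; Append (rather than prepend) so that the default files keep
;; precedence; the file name is only an illustration.
(eval-after-load 'rng-loc
  '(add-to-list 'rng-schema-locating-files
                (expand-file-name "~/schemas/schemas.xml")
                t))
@end example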
 552
 553 @menu
 554 * Commands for locating a schema::
 555 * Schema locating files::
 556 @end menu
 557
 558 @node Commands for locating a schema
 559 @section Commands for locating a schema
 560
 561 The command @kbd{C-c C-s C-w} will tell you what schema
 562 is currently being used.
 563
 564 The rules for locating a schema are applied automatically when
 565 you visit a file in nXML mode. However, if you have just created a new
 566 file and the schema cannot be inferred from the file-name, then this
 567 will not locate the right schema.  In this case, you should insert the
 568 start-tag of the root element and then use the command @kbd{C-c C-s
 569 C-a}, which reapplies the rules based on the current content of
 570 the document.  It is usually not necessary to insert the complete
 571 start-tag; often just @samp{<@var{name}} is
 572 enough.
 573
 574 If you want to use a schema that has not yet been added to the
 575 schema locating files, you can use the command @kbd{C-c C-s C-f}
 576 to manually select the file containing the schema for the document in
 577 the current buffer.  Emacs will read the file-name of the schema from the
 578 minibuffer. After reading the file-name, Emacs will ask whether you
 579 wish to add a rule to a schema locating file that persistently
 580 associates the document with the selected schema.  The rule will be
 581 added to the first file in the list specified by
 582 @samp{rng-schema-locating-files}; it will create the file if
 583 necessary, but will not create a directory. If the variable
 584 @samp{rng-schema-locating-files} has not been customized, this
 585 means that the rule will be added to the file @samp{schemas.xml}
 586 in the same directory as the document being edited.
 587
 588 The command @kbd{C-c C-s C-t} allows you to select a schema by
 589 specifying an identifier for the type of the document.  The schema
 590 locating files determine the available type identifiers and what
 591 schema is used for each type identifier. This is useful when it is
 592 impossible to infer the right schema from either the file-name or the
 593 content of the document, even though the schema is already in the
 594 schema locating file.  A situation in which this can occur is when
 595 there are multiple variants of a schema where all valid documents have
 596 the same document element.  For example, XHTML has Strict and
 597 Transitional variants.  In a situation like this, a schema locating file
 598 can define a type identifier for each variant. As with @kbd{C-c
 599 C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
 600 locating file that persistently associates the document with the
 601 specified type identifier.
 602
 603 The command @kbd{C-c C-s C-l} adds a rule to a schema
 604 locating file that persistently associates the document with
 605 the schema that is currently being used.
 606
 607 @node Schema locating files
 608 @section Schema locating files
 609
 610 Each schema locating file specifies a list of rules.  The rules
 611 from each file are appended in order. To locate a schema each rule is
 612 applied in turn until a rule matches.  The first matching rule is then
 613 used to determine the schema.
 614
 615 Schema locating files are designed to be useful for other
 616 applications that need to locate a schema for a document. In fact,
 617 there is nothing specific to locating schemas in the design; it could
 618 equally well be used for locating a stylesheet.
 619
 620 @menu
 621 * Schema locating file syntax basics::
 622 * Using the document's URI to locate a schema::
 623 * Using the document element to locate a schema::
 624 * Using type identifiers in schema locating files::
 625 * Using multiple schema locating files::
 626 @end menu
 627
 628 @node Schema locating file syntax basics
 629 @subsection Schema locating file syntax basics
 630
 631 There is a schema for schema locating files in the file
 632 @samp{locate.rnc} in the schema directory.  Schema locating
 633 files must be valid with respect to this schema.
 634
 635 The document element of a schema locating file must be
 636 @samp{locatingRules} and the namespace URI must be
 637 @samp{http://thaiopensource.com/ns/locating-rules/1.0}.  The
 638 children of the document element specify rules. The order of the
 639 children is the same as the order of the rules.  Here's a complete
 640 example of a schema locating file:
 641
 642 @example
 643 <?xml version="1.0"?>
 644 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 645   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 646   <documentElement localName="book" uri="docbook.rnc"/>
 647 </locatingRules>
 648 @end example
 649
 650 @noindent
 651 This says to use the schema @samp{xhtml.rnc} for a document whose
 652 document element has the namespace URI
 653 @samp{http://www.w3.org/1999/xhtml}, and to use the schema
 654 @samp{docbook.rnc} for a document whose document element has the local
 655 name @samp{book}.  If the document element had both a namespace URI of
 656 @samp{http://www.w3.org/1999/xhtml} and a local name of @samp{book},
 657 then the matching rule that comes first would be used, and so the
 658 schema @samp{xhtml.rnc} would be used.  There is no precedence between
 659 different types of rule; the first matching rule of any type is used.
 660
 661 As usual with XML-related technologies, resources are identified
 662 by URIs.  The @samp{uri} attribute identifies the schema by
 663 specifying the URI.  The URI may be relative.  If so, it is resolved
 664 relative to the URI of the schema locating file that contains the
 665 attribute.  This means that if the value of the @samp{uri} attribute
 666 does not contain a @samp{/}, then it will refer to a filename in
 667 the same directory as the schema locating file.
 668
 669 @node Using the document's URI to locate a schema
 670 @subsection Using the document's URI to locate a schema
 671
 672 A @samp{uri} rule locates a schema based on the URI of the
 673 document.  The @samp{uri} attribute specifies the URI of the
 674 schema.  The @samp{resource} attribute can be used to specify
 675 the schema for a particular document.  For example,
 676
 677 @example
 678 <uri resource="spec.xml" uri="docbook.rnc"/>
 679 @end example
 680
 681 @noindent
 682 specifies that the schema for @samp{spec.xml} is
 683 @samp{docbook.rnc}.
 684
 685 The @samp{pattern} attribute can be used instead of the
 686 @samp{resource} attribute to specify the schema for any document
 687 whose URI matches a pattern.  The pattern has the same syntax as an
 688 absolute or relative URI except that the path component of the URI can
 689 use a @samp{*} character to stand for zero or more characters
 690 within a path segment (i.e., any character other than @samp{/}).
 691 Typically, the URI pattern looks like a relative URI, but, whereas a
 692 relative URI in the @samp{resource} attribute is resolved into a
 693 particular absolute URI using the base URI of the schema locating
 694 file, a relative URI pattern matches if it matches some number of
 695 complete path segments of the document's URI ending with the last path
 696 segment of the document's URI. For example,
 697
 698 @example
 699 <uri pattern="*.xsl" uri="xslt.rnc"/>
 700 @end example
 701
 702 @noindent
 703 specifies that the schema for documents with a URI whose path ends
 704 with @samp{.xsl} is @samp{xslt.rnc}.
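
Because a relative pattern is matched against complete trailing path
segments, a pattern may also include a directory name.  As a sketch
(the directory name @samp{stylesheets} is hypothetical), the rule

@example
<uri pattern="stylesheets/*.xml" uri="xslt.rnc"/>
@end example

@noindent
would be expected to match
@samp{file:///home/jjc/stylesheets/page.xml}, but not
@samp{file:///home/jjc/docs/page.xml} or
@samp{file:///home/jjc/mystylesheets/page.xml}, since in the last case
@samp{stylesheets} is not a complete path segment.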
 705
 706 A @samp{transformURI} rule locates a schema by
 707 transforming the URI of the document. The @samp{fromPattern}
 708 attribute specifies a URI pattern with the same meaning as the
 709 @samp{pattern} attribute of the @samp{uri} element.  The
 710 @samp{toPattern} attribute is a URI pattern that is used to
 711 generate the URI of the schema.  Each @samp{*} in the
 712 @samp{toPattern} is replaced by the string that matched the
 713 corresponding @samp{*} in the @samp{fromPattern}.  The
 714 resulting string is appended to the initial part of the document's URI
 715 that was not explicitly matched by the @samp{fromPattern}.  The
 716 rule matches only if the transformed URI identifies an existing
 717 resource.  For example, the rule
 718
 719 @example
 720 <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
 721 @end example
 722
 723 @noindent
 724 would transform the URI @samp{file:///home/jjc/docs/spec.xml}
 725 into the URI @samp{file:///home/jjc/docs/spec.rnc}.  Thus, this
 726 rule specifies that to locate a schema for a document
 727 @samp{@var{foo}.xml}, Emacs should test whether a file
 728 @samp{@var{foo}.rnc} exists in the same directory as
 729 @samp{@var{foo}.xml}, and, if so, should use it as the
 730 schema.
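
Following the same substitution rule, a @samp{toPattern} can also
place the schema in a subdirectory.  In this sketch the directory name
@samp{schema} is an assumption made for illustration:

@example
<transformURI fromPattern="*.xml" toPattern="schema/*.rnc"/>
@end example

@noindent
Given the rule described above, this would transform
@samp{file:///home/jjc/docs/spec.xml} into
@samp{file:///home/jjc/docs/schema/spec.rnc}, and the rule would match
only if that file exists.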
 731
 732 @node Using the document element to locate a schema
 733 @subsection Using the document element to locate a schema
 734
 735 A @samp{documentElement} rule locates a schema based on
 736 the local name and prefix of the document element. For example, a rule
 737
 738 @example
 739 <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
 740 @end example
 741
 742 @noindent
 743 specifies that when the name of the document element is
 744 @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
 745 as the schema. Either the @samp{prefix} or
 746 @samp{localName} attribute may be omitted to allow any prefix or
 747 local name.
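
As a small sketch of the omitted-attribute case, the rule

@example
<documentElement localName="stylesheet" uri="xslt.rnc"/>
@end example

@noindent
leaves out @samp{prefix}, so it would apply to a document element with
the local name @samp{stylesheet} regardless of its prefix.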
 748
 749 A @samp{namespace} rule locates a schema based on the
 750 namespace URI of the document element. For example, a rule
 751
 752 @example
 753 <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
 754 @end example
 755
 756 @noindent
 757 specifies that when the namespace URI of the document element is
 758 @samp{http://www.w3.org/1999/XSL/Transform}, then
 759 @samp{xslt.rnc} should be used as the schema.
 760
 761 @node Using type identifiers in schema locating files
 762 @subsection Using type identifiers in schema locating files
 763
 764 Type identifiers allow a level of indirection in locating the
 765 schema for a document.  Instead of associating the document directly
 766 with a schema URI, the document is associated with a type identifier,
 767 which is in turn associated with a schema URI. nXML mode does not
 768 constrain the format of type identifiers.  They can be simply strings
 769 without any formal structure or they can be public identifiers or
 770 URIs.  Note that these type identifiers have nothing to do with the
 771 DOCTYPE declaration.  When comparing type identifiers, whitespace is
 772 normalized in the same way as with the @samp{xsd:token}
 773 datatype: leading and trailing whitespace is stripped; other sequences
 774 of whitespace are normalized to a single space character.
 775
Each of the rules described in previous sections that uses a
@samp{uri} attribute to specify a schema can instead use a
@samp{typeId} attribute to specify a type identifier.  The type
identifier can be associated with a URI using a @samp{typeId}
element.  For example,
 781
 782 @example
 783 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 784   <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
 785   <typeId id="XHTML" typeId="XHTML Strict"/>
 786   <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
 787   <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
 788 </locatingRules>
 789 @end example
 790
 791 @noindent
declares three type identifiers: @samp{XHTML} (representing the
 793 default variant of XHTML to be used), @samp{XHTML Strict} and
 794 @samp{XHTML Transitional}.  Such a schema locating file would
 795 use @samp{xhtml-strict.rnc} for a document whose namespace is
 796 @samp{http://www.w3.org/1999/xhtml}.  But it is considerably
 797 more flexible than a schema locating file that simply specified
 798
 799 @example
 800 <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
 801 @end example
 802
 803 @noindent
A user can use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional.  Also, a user can easily add a schema
locating file
 806
 807 @example
 808 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 809   <typeId id="XHTML" typeId="XHTML Transitional"/>
 810 </locatingRules>
 811 @end example
 812
 813 @noindent
 814 that makes the default variant of XHTML be XHTML Transitional.
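
Because whitespace is normalized when type identifiers are compared,
the padded @samp{typeId} value in the following sketch is expected to
compare equal to @samp{XHTML Strict}, so an XHTML document should
still get @samp{xhtml-strict.rnc}.  The extra spaces are there only to
illustrate the normalization.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="  XHTML   Strict "/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
</locatingRules>
@end example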
 815
 816 @node Using multiple schema locating files
 817 @subsection Using multiple schema locating files
 818
 819 The @samp{include} element includes rules from another
 820 schema locating file.  The behavior is exactly as if the rules from
 821 that file were included in place of the @samp{include} element.
 822 Relative URIs are resolved into absolute URIs before the inclusion is
 823 performed. For example,
 824
 825 @example
 826 <include rules="../rules.xml"/>
 827 @end example
 828
 829 @noindent
 830 includes the rules from @samp{rules.xml}.
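
Since the included rules take the place of the @samp{include}
element, its position among the other rules matters.  In the following
sketch, which reuses rules and file names from earlier examples purely
for illustration, the @samp{namespace} rule comes before every rule in
@samp{../rules.xml}, and the @samp{documentElement} rule comes after
them.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <include rules="../rules.xml"/>
  <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
</locatingRules>
@end example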
 831
 832 The process of locating a schema takes as input a list of schema
 833 locating files.  The rules in all these files and in the files they
 834 include are resolved into a single list of rules, which are applied
 835 strictly in order.  Sometimes this order is not what is needed.
 836 For example, suppose you have two schema locating files, a private
 837 file
 838
 839 @example
 840 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 841   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 842 </locatingRules>
 843 @end example
 844
 845 @noindent
 846 followed by a public file
 847
 848 @example
 849 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 850   <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
 851   <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
 852 </locatingRules>
 853 @end example
 854
 855 @noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed.  This can be solved by adding
an @samp{applyFollowingRules} element to the private file, so that the
@samp{transformURI} rules from the files that follow are applied at
that point, before the @samp{namespace} rule.
 860
 861 @example
 862 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 863   <applyFollowingRules ruleType="transformURI"/>
 864   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 865 </locatingRules>
 866 @end example
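
Assuming @samp{applyFollowingRules} makes the @samp{transformURI}
rules of the following files apply at the point where it appears, the
private and public files together would act roughly like the single
file below.  This is only a sketch of the resulting order, not a file
that nXML mode generates.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
@end example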
 867
 868 @node DTDs
 869 @chapter DTDs
 870
 871 nxml-mode is designed to support the creation of standalone XML
 872 documents that do not depend on a DTD.  Although it is common practice
 873 to insert a DOCTYPE declaration referencing an external DTD, this has
 874 undesirable side-effects.  It means that the document is no longer
 875 self-contained. It also means that different XML parsers may interpret
 876 the document in different ways, since the XML Recommendation does not
 877 require XML parsers to read the DTD.  With DTDs, it was impractical to
get validation without using an external DTD or reference to a
parameter entity.  With RELAX NG and other schema languages, you can
 880 simultaneously get the benefits of validation and standalone XML
 881 documents.  Therefore, I recommend that you do not reference an
 882 external DOCTYPE in your XML documents.
 883
 884 One problem is entities for characters. Typically, as well as
 885 providing validation, DTDs also provide a set of character entities
 886 for documents to use. Schemas cannot provide this functionality,
 887 because schema validation happens after XML parsing.  The recommended
 888 solution is to either use the Unicode characters directly, or, if this
 889 is impractical, use character references.  nXML mode supports this by
 890 providing commands for entering characters and character references
 891 using the Unicode names, and can display the glyph corresponding to a
 892 character reference.
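
As a brief illustration, a document that would otherwise rely on a
DTD-defined entity such as @samp{&eacute;} can use the equivalent
character reference instead.  The element name below is arbitrary.

@example
<p>caf&#xE9;</p>
@end example

@noindent
Here @samp{&#xE9;} refers to U+00E9, LATIN SMALL LETTER E WITH ACUTE,
so the document needs no DOCTYPE declaration for this character.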
 893
 894 @node Limitations
 895 @chapter Limitations
 896
 897 nXML mode has some limitations:
 898
 899 @itemize @bullet
 900 @item
 901 DTD support is limited.  Internal parsed general entities declared
 902 in the internal subset are supported provided they do not contain
 903 elements. Other usage of DTDs is ignored.
 904 @item
 905 The restrictions on RELAX NG schemas in section 7 of the RELAX NG
 906 specification are not enforced.
 907 @end itemize
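
As a sketch of the first limitation, the entity in the following
document is declared in the internal subset and contains only text,
so it falls within what nXML mode supports.  The entity and element
names are made up for this example.

@example
<!DOCTYPE doc [
  <!ENTITY product "nXML mode">
]>
<doc>Edited with &product;.</doc>
@end example

@noindent
An entity whose replacement text contains an element, such as
@samp{<!ENTITY note "<em>important</em>">}, would not be supported.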
 908
 909 @bye