doc/misc/nxml-mode.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename ../../info/nxml-mode
   4 @settitle nXML Mode
   5 @c %**end of header
   6
   7 @copying
   8 This manual documents nxml-mode, an Emacs major mode for editing
   9 XML with RELAX NG support.
  10
  11 Copyright @copyright{} 2007-2011
  12 Free Software Foundation, Inc.
  13
  14 @quotation
  15 Permission is granted to copy, distribute and/or modify this document
  16 under the terms of the GNU Free Documentation License, Version 1.3 or
  17 any later version published by the Free Software Foundation; with no
  18 Invariant Sections, with the Front-Cover texts being ``A GNU
  19 Manual,'' and with the Back-Cover Texts as in (a) below.  A copy of the
  20 license is included in the section entitled ``GNU Free Documentation
  21 License'' in the Emacs manual.
  22
  23 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  24 modify this GNU manual.  Buying copies from the FSF supports it in
  25 developing GNU and promoting software freedom.''
  26
  27 This document is part of a collection distributed under the GNU Free
  28 Documentation License.  If you want to distribute this document
  29 separately from the collection, you can do so by adding a copy of the
  30 license to the document, as described in section 6 of the license.
  31 @end quotation
  32 @end copying
  33
  34 @dircategory Emacs editing modes
  35 @direntry
  36 * nXML Mode: (nxml-mode).       XML editing mode with RELAX NG support.
  37 @end direntry
  38
  39 @node Top
  40 @top nXML Mode
  41
  42 @insertcopying
  43
  44 This manual is not yet complete.
  45
  46 @menu
  47 * Introduction::
  48 * Completion::
  49 * Inserting end-tags::
  50 * Paragraphs::
  51 * Outlining::
  52 * Locating a schema::
  53 * DTDs::
  54 * Limitations::
  55 @end menu
  56
  57 @node Introduction
  58 @chapter Introduction
  59
  60 nXML mode is an Emacs major mode for editing XML documents.  It supports
  61 editing well-formed XML documents, and provides schema-sensitive editing
  62 using RELAX NG Compact Syntax.  To get started, visit a file containing an
  63 XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
  64 mode.  By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
  65 put buffers in nXML mode if they have recognizable XML content or file
  66 extensions.  You may wish to customize the settings, for example to
  67 recognize different file extensions.
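
A minimal sketch of such a customization, for an init file.  The
extension @samp{.myxml} is a made-up example, not something nXML mode
recognizes by default:

@example
;; Hypothetical extension: visit *.myxml files in nXML mode.
(add-to-list 'auto-mode-alist '("\\.myxml\\'" . nxml-mode))
@end example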
  68
  69 Once in nXML mode, you can type @kbd{C-h m} for basic information on the
  70 mode.
  71
  72 The @file{etc/nxml} directory in the Emacs distribution contains some data
  73 files used by nXML mode, and includes two files (@file{test-valid.xml} and
  74 @file{test-invalid.xml}) that provide examples of valid and invalid XML
  75 documents.
  76
  77 To get validation and schema-sensitive editing, you need a RELAX NG Compact
  78 Syntax (RNC) schema for your document (@pxref{Locating a schema}).  The
  79 @file{etc/schema} directory includes some schemas for popular document
  80 types.  See @url{http://relaxng.org/} for more information on RELAX NG.
  81 You can use the @samp{Trang} program from
  82 @url{http://www.thaiopensource.com/relaxng/trang.html} to
  83 automatically create RNC schemas.  This program can:
  84
  85 @itemize @bullet
  86 @item
  87 infer an RNC schema from an instance document;
  88 @item
  89 convert a DTD to an RNC schema;
  90 @item
  91 convert a RELAX NG XML syntax schema to an RNC schema.
  92 @end itemize
  93
  94 @noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
  95 one, you can also use the XSLT stylesheet from
  96 @url{http://www.pantor.com/download.html}.
  97
  98 To convert a W3C XML Schema to an RNC schema, you need first to convert it
  99 to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
 100 (built on top of MSV).  See @url{https://github.com/kohsuke/msv}
 101 and @url{https://msv.dev.java.net/}.
 102
 103 For historical discussions only, see the mailing list archives at
 104 @url{http://groups.yahoo.com/group/emacs-nxml-mode/}.  Please make all new
 105 discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
 106 lists.  Report any bugs with @kbd{M-x report-emacs-bug}.
 107
 108
 109 @node Completion
 110 @chapter Completion
 111
 112 Apart from real-time validation, the most important feature that
 113 nxml-mode provides for assisting in document creation is @dfn{completion}.
 114 Completion assists the user in inserting characters at point, based on
 115 knowledge of the schema and on the contents of the buffer before
 116 point.
 117
 118 The traditional GNU Emacs key combination for completion in a
 119 buffer is @kbd{M-@key{TAB}}. However, many window systems
 120 and window managers use this key combination themselves (typically for
 121 switching between windows) and do not pass it to applications. It's
 122 hard to find key combinations in GNU Emacs that are both easy to type
 123 and not taken by something else.  @kbd{C-@key{RET}} (i.e.
 124 pressing the Enter or Return key, while the Ctrl key is held down) is
 125 available.  It won't be available on a traditional terminal (because
 126 it is indistinguishable from Return), but it will work with a window
 127 system.  Therefore we adopt the following solution by default: use
 128 @kbd{C-@key{RET}} when there's a window system and
 129 @kbd{M-@key{TAB}} when there's not.  In the following, I
 130 will assume that a window system is being used and will therefore
 131 refer to @kbd{C-@key{RET}}.
 132
 133 Completion works by examining the symbol preceding point.  This
 134 is the symbol to be completed.  The symbol to be completed may be
 135 empty.  Completion considers what symbols starting with the symbol to
 136 be completed would be valid replacements for the symbol to be
 137 completed, given the schema and the contents of the buffer before
 138 point.  These symbols are the possible completions.  An example may
 139 make this clearer.  Suppose the buffer looks like this (where @point{}
 140 indicates point):
 141
 142 @example
 143 <html xmlns="http://www.w3.org/1999/xhtml">
 144 <h@point{}
 145 @end example
 146
 147 @noindent
 148 and the schema is XHTML.  In this context, the symbol to be completed
 149 is @samp{h}.  The possible completions consist of just
 150 @samp{head}.  Another example is:
 151
 152 @example
 153 <html xmlns="http://www.w3.org/1999/xhtml">
 154 <head>
 155 <@point{}
 156 @end example
 157
 158 @noindent
 159 In this case, the symbol to be completed is empty, and the possible
 160 completions are @samp{base}, @samp{isindex},
 161 @samp{link}, @samp{meta}, @samp{script},
 162 @samp{style}, @samp{title}.  Another example is:
 163
 164 @example
 165 <html xmlns="@point{}
 166 @end example
 167
 168 @noindent
 169 In this case, the symbol to be completed is empty, and the possible
 170 completions are just @samp{http://www.w3.org/1999/xhtml}.
 171
 172 When you type @kbd{C-@key{RET}}, what happens depends
 173 on what the set of possible completions is.
 174
 175 @itemize @bullet
 176 @item
 177 If the set of completions is empty, nothing
 178 happens.
 179 @item
 180 If there is one possible completion, then that completion is
 181 inserted, together with any following characters that are
 182 required. For example, in this case:
 183
 184 @example
 185 <html xmlns="http://www.w3.org/1999/xhtml">
 186 <@point{}
 187 @end example
 188
 189 @noindent
 190 @kbd{C-@key{RET}} will yield
 191
 192 @example
 193 <html xmlns="http://www.w3.org/1999/xhtml">
 194 <head@point{}
 195 @end example
 196 @item
 197 If there is more than one possible completion, but all
 198 possible completions share a common non-empty prefix, then that prefix
 199 is inserted. For example, suppose the buffer is:
 200
 201 @example
 202 <html x@point{}
 203 @end example
 204
 205 @noindent
 206 The symbol to be completed is @samp{x}. The possible completions
 207 are @samp{xmlns} and @samp{xml:lang}.  These share a
 208 common prefix of @samp{xml}.  Thus, @kbd{C-@key{RET}}
 209 will yield:
 210
 211 @example
 212 <html xml@point{}
 213 @end example
 214
 215 @noindent
 216 Typically, you would do @kbd{C-@key{RET}} again, which would
 217 have the result described in the next item.
 218 @item
 219 If there is more than one possible completion, but the
 220 possible completions do not share a non-empty prefix, then Emacs will
 221 prompt you to input the symbol in the minibuffer, initializing the
 222 minibuffer with the symbol to be completed, and popping up a buffer
 223 showing the possible completions.  You can now input the symbol to be
 224 inserted.  The symbol you input will be inserted in the buffer instead
 225 of the symbol to be completed.  Emacs will then insert any required
 226 characters after the symbol.  For example, if the buffer contains:
 227
 228 @example
 229 <html xml@point{}
 230 @end example
 231
 232 @noindent
 233 Emacs will prompt you in the minibuffer with
 234
 235 @example
 236 Attribute: xml@point{}
 237 @end example
 238
 239 @noindent
 240 and the buffer showing possible completions will contain
 241
 242 @example
 243 Possible completions are:
 244 xml:lang                           xmlns
 245 @end example
 246
 247 @noindent
 248 If you input @kbd{xmlns}, the result will be:
 249
 250 @example
 251 <html xmlns="@point{}
 252 @end example
 253
 254 @noindent
 255 (If you do @kbd{C-@key{RET}} again, the namespace URI will
 256 be inserted. Should that happen automatically?)
 257 @end itemize
 258
 259 @node Inserting end-tags
 260 @chapter Inserting end-tags
 261
 262 The main redundancy in XML syntax is end-tags.  nxml-mode provides
 263 several ways to make it easier to enter end-tags.  You can use all of
 264 these without a schema.
 265
 266 You can use @kbd{C-@key{RET}} after @samp{</}
 267 to complete the rest of the end-tag.
 268
 269 @kbd{C-c C-f} inserts an end-tag for the element containing
 270 point. This command is useful when you want to input the start-tag,
 271 then input the content and finally input the end-tag. The @samp{f}
 272 is mnemonic for finish.
 273
 274 If you want to keep tags balanced and input the end-tag at the
 275 same time as the start-tag, before inputting the content, then you can
 276 use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
 277 the end-tag and leaves point before the end-tag.  @kbd{C-c C-b}
 278 is similar but more convenient for block-level elements: it puts the
 279 start-tag, point and the end-tag on successive lines, appropriately
 280 indented. The @samp{i} is mnemonic for inline and the
 281 @samp{b} is mnemonic for block.
 282
 283 Finally, you can customize nxml-mode so that @kbd{/}
 284 automatically inserts the rest of the end-tag when it occurs after
 285 @samp{<}, by doing
 286
 287 @display
 288 @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
 289 @end display
 290
 291 @noindent
 292 and then following the instructions in the displayed buffer.
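
If you prefer to set this from your init file rather than through the
Customize interface, something like the following should have the
same effect:

@example
;; Let `/' typed after `<' insert the rest of the end-tag.
(setq nxml-slash-auto-complete-flag t)
@end example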
 293
 294 @node Paragraphs
 295 @chapter Paragraphs
 296
 297 Emacs has several commands that operate on paragraphs, most
 298 notably @kbd{M-q}. nXML mode redefines these to work in a way
 299 that is useful for XML.  The exact rules that are used to find the
 300 beginning and end of a paragraph are complicated; they are designed
 301 mainly to ensure that @kbd{M-q} does the right thing.
 302
 303 A paragraph consists of one or more complete, consecutive lines.
 304 A group of lines is not considered a paragraph unless it contains some
 305 non-whitespace characters between tags or inside comments.  A blank
 306 line separates paragraphs.  A single tag on a line by itself also
 307 separates paragraphs.  More precisely, if one tag together with any
 308 leading and trailing whitespace completely occupy one or more lines,
 309 then those lines will not be included in any paragraph.
 310
 311 A start-tag at the beginning of the line (possibly indented) may
 312 be treated as starting a paragraph.  Similarly, an end-tag at the end
 313 of the line may be treated as ending a paragraph. The following rules
 314 are used to determine whether such a tag is in fact treated as a
 315 paragraph boundary:
 316
 317 @itemize @bullet
 318 @item
 319 If the schema does not allow text at that point, then it
 320 is a paragraph boundary.
 321 @item
 322 If the end-tag corresponding to the start-tag is not at
 323 the end of its line, or the start-tag corresponding to the end-tag is
 324 not at the beginning of its line, then it is not a paragraph
 325 boundary. For example, in
 326
 327 @example
 328 <p>This is a paragraph with an
 329 <emph>emphasized</emph> phrase.
 330 @end example
 331
 332 @noindent
 333 the @samp{<emph>} start-tag would not be considered as
 334 starting a paragraph, because its corresponding end-tag is not at the
 335 end of the line.
 336 @item
 337 If there is text that is a sibling in the element tree, then
 338 it is not a paragraph boundary.  For example, in
 339
 340 @example
 341 <p>This is a paragraph with an
 342 <emph>emphasized phrase that takes one source line</emph>
 343 @end example
 344
 345 @noindent
 346 the @samp{<emph>} start-tag would not be considered as
 347 starting a paragraph, even though its end-tag is at the end of its
 348 line, because the text @samp{This is a paragraph with an}
 349 is a sibling of the @samp{emph} element.
 350 @item
 351 Otherwise, it is a paragraph boundary.
 352 @end itemize
 353
 354 @node Outlining
 355 @chapter Outlining
 356
 357 nXML mode allows you to display all or part of a buffer as an
 358 outline, in a similar way to Emacs' outline mode.  An outline in nXML
 359 mode is based on recognizing two kinds of element: sections and
 360 headings.  There is one heading for every section and one section for
 361 every heading.  A section contains its heading as or within its first
 362 child element.  A section also contains its subordinate sections (its
 363 subsections).  The text content of a section consists of anything in a
 364 section that is neither a subsection nor a heading.
 365
 366 Note that this is a different model from that used by XHTML.
 367 nXML mode's outline support will not be useful for XHTML unless you
 368 adopt a convention of adding a @code{div} to enclose each
 369 section, rather than having sections implicitly delimited by different
 370 @code{h@var{n}} elements.  This limitation may be removed
 371 in a future version.
 372
 373 The variable @code{nxml-section-element-name-regexp} gives
 374 a regexp for the local names (i.e. the part of the name following any
 375 prefix) of section elements. The variable
 376 @code{nxml-heading-element-name-regexp} gives a regexp for the
 377 local names of heading elements. For an element to be recognized
 378 as a section
 379
 380 @itemize @bullet
 381 @item
 382 its start-tag must occur at the beginning of a line
 383 (possibly indented);
 384 @item
 385 its local name must match
 386 @code{nxml-section-element-name-regexp};
 387 @item
 388 either its first child element or a descendant of that
 389 first child element must have a local name that matches
 390 @code{nxml-heading-element-name-regexp}; the first such element
 391 is treated as the section's heading.
 392 @end itemize
 393
 394 @noindent
 395 You can customize these variables using @kbd{M-x
 396 customize-variable}.
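
As an illustration, the following init-file sketch treats
@samp{section} and @samp{topic} elements as sections, and
@samp{title} and @samp{heading} elements as headings.  The element
names are assumptions chosen for the example; adjust them to your own
vocabulary.  Note that this replaces the default values rather than
extending them.

@example
;; Assumed vocabulary: <section>/<topic> with <title>/<heading>.
(setq nxml-section-element-name-regexp "section\\|topic")
(setq nxml-heading-element-name-regexp "title\\|heading")
@end example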
 397
 398 There are three possible outline states for a section:
 399
 400 @itemize @bullet
 401 @item
 402 normal, showing everything, including its heading, text
 403 content and subsections; each subsection is displayed according to the
 404 state of that subsection;
 405 @item
 406 showing just its heading, with both its text content and
 407 its subsections hidden; all subsections are hidden regardless of their
 408 state;
 409 @item
 410 showing its heading and its subsections, with its text
 411 content hidden; each subsection is displayed according to the state of
 412 that subsection.
 413 @end itemize
 414
 415 In the last two states, where the text content is hidden, the
 416 heading is displayed specially, in an abbreviated form. An element
 417 like this:
 418
 419 @example
 420 <section>
 421 <title>Food</title>
 422 <para>There are many kinds of food.</para>
 423 </section>
 424 @end example
 425
 426 @noindent
 427 would be displayed on a single line like this:
 428
 429 @example
 430 <-section>Food...</>
 431 @end example
 432
 433 @noindent
 434 If there are hidden subsections, then a @code{+} will be used
 435 instead of a @code{-} like this:
 436
 437 @example
 438 <+section>Food...</>
 439 @end example
 440
 441 @noindent
 442 If there are non-hidden subsections, then the section will instead be
 443 displayed like this:
 444
 445 @example
 446 <-section>Food...
 447   <-section>Delicious Food...</>
 448   <-section>Distasteful Food...</>
 449 </-section>
 450 @end example
 451
 452 @noindent
 453 The heading is always displayed with an indent that corresponds to its
 454 depth in the outline, even if it is not actually indented in the buffer.
 455 The variable @code{nxml-outline-child-indent} controls how much
 456 a subheading is indented with respect to its parent heading when the
 457 heading is being displayed specially.
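
For example, to indent each subheading by four columns relative to
its parent heading (the value 4 is arbitrary and only for
illustration), you could put this in your init file:

@example
(setq nxml-outline-child-indent 4)
@end example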
 458
 459 Commands to change the outline state of sections are bound to
 460 key sequences that start with @kbd{C-c C-o} (@kbd{o} is
 461 mnemonic for outline).  The third and final key has been chosen to be
 462 consistent with outline mode.  In the following descriptions, the
 463 @dfn{current section} is the section containing point or, more precisely,
 464 the innermost section containing the character immediately following
 465 point.
 466
 467 @itemize @bullet
 468 @item
 469 @kbd{C-c C-o C-a} shows all sections in the buffer
 470 normally.
 471 @item
 472 @kbd{C-c C-o C-t} hides the text content
 473 of all sections in the buffer.
 474 @item
 475 @kbd{C-c C-o C-c} hides the text content
 476 of the current section.
 477 @item
 478 @kbd{C-c C-o C-e} shows the text content
 479 of the current section.
 480 @item
 481 @kbd{C-c C-o C-d} hides the text content
 482 and subsections of the current section.
 483 @item
 484 @kbd{C-c C-o C-s} shows the current section
 485 and all its direct and indirect subsections normally.
 486 @item
 487 @kbd{C-c C-o C-k} shows the headings of the
 488 direct and indirect subsections of the current section.
 489 @item
 490 @kbd{C-c C-o C-l} hides the text content of the
 491 current section and of its direct and indirect
 492 subsections.
 493 @item
 494 @kbd{C-c C-o C-i} shows the headings of the
 495 direct subsections of the current section.
 496 @item
 497 @kbd{C-c C-o C-o} hides as much as possible without
 498 hiding the current section's text content; the headings of ancestor
 499 sections of the current section and their child sections will
 500 not be hidden.
 501 @end itemize
 502
 503 When a heading is displayed specially, you can use
 504 @key{RET} in that heading to show the text content of the section
 505 in the same way as @kbd{C-c C-o C-e}.
 506
 507 You can also use the mouse to change the outline state:
 508 @kbd{S-mouse-2} hides the text content of a section in the same
 509 way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
 510 displayed heading shows the text content of the section in the same
 511 way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
 512 displayed start-tag toggles the display of subheadings on and
 513 off.
 514
 515 The outline state for each section is stored with the first
 516 character of the section (as a text property). Every command that
 517 changes the outline state of any section updates the display of the
 518 buffer so that each section is displayed correctly according to its
 519 outline state.  If the section structure is subsequently changed, then
 520 it is possible for the display to no longer correctly reflect the
 521 stored outline state. @kbd{C-c C-o C-r} can be used to refresh
 522 the display so it is correct again.
 523
 524 @node Locating a schema
 525 @chapter Locating a schema
 526
 527 nXML mode has a configurable set of rules to locate a schema for
 528 the file being edited.  The rules are contained in one or more schema
 529 locating files, which are XML documents.
 530
 531 The variable @samp{rng-schema-locating-files} specifies
 532 the list of the file-names of schema locating files that nXML mode
 533 should use.  The order of the list is significant: when file
 534 @var{x} occurs in the list before file @var{y} then rules
 535 from file @var{x} have precedence over rules from file
 536 @var{y}.  A filename specified in
 537 @samp{rng-schema-locating-files} may be relative. If so, it will
 538 be resolved relative to the document for which a schema is being
 539 located. It is not an error if relative file-names in
 540 @samp{rng-schema-locating-files} do not exist. You can use
 541 @kbd{M-x customize-variable @key{RET} rng-schema-locating-files
 542 @key{RET}} to customize the list of schema locating
 543 files.
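
The same list can also be changed from an init file.  The sketch
below assumes a personal schema locating file at the hypothetical
path @file{~/.emacs.d/schemas/schemas.xml}.  Because
@code{add-to-list} puts the new entry at the front of the list, rules
from that file would take precedence over the others, and it would
also become the first file in the list.

@example
;; The variable is defined in rng-loc.el, so wait until it is loaded.
(with-eval-after-load 'rng-loc
  ;; Hypothetical path to a personal schema locating file.
  (add-to-list 'rng-schema-locating-files
               "~/.emacs.d/schemas/schemas.xml"))
@end example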
 544
 545 By default, the @samp{rng-schema-locating-files} list has two
 546 members: @samp{schemas.xml}, and
 547 @samp{@var{dist-dir}/schema/schemas.xml} where
 548 @samp{@var{dist-dir}} is the directory containing the nXML
 549 distribution. The first member will cause nXML mode to use a file
 550 @samp{schemas.xml} in the same directory as the document being
 551 edited if such a file exists.  The second member contains rules for the
 552 schemas that are included with the nXML distribution.
 553
 554 @menu
 555 * Commands for locating a schema::
 556 * Schema locating files::
 557 @end menu
 558
 559 @node Commands for locating a schema
 560 @section Commands for locating a schema
 561
 562 The command @kbd{C-c C-s C-w} will tell you what schema
 563 is currently being used.
 564
 565 The rules for locating a schema are applied automatically when
 566 you visit a file in nXML mode. However, if you have just created a new
 567 file and the schema cannot be inferred from the file-name, then this
 568 will not locate the right schema.  In this case, you should insert the
 569 start-tag of the root element and then use the command @kbd{C-c C-s
 570 C-a}, which reapplies the rules based on the current content of
 571 the document.  It is usually not necessary to insert the complete
 572 start-tag; often just @samp{<@var{name}} is
 573 enough.
 574
 575 If you want to use a schema that has not yet been added to the
 576 schema locating files, you can use the command @kbd{C-c C-s C-f}
 577 to manually select the file containing the schema for the document in
 578 the current buffer.  Emacs will read the file-name of the schema from the
 579 minibuffer. After reading the file-name, Emacs will ask whether you
 580 wish to add a rule to a schema locating file that persistently
 581 associates the document with the selected schema.  The rule will be
 582 added to the first file in the list specified by
 583 @samp{rng-schema-locating-files}; it will create the file if
 584 necessary, but will not create a directory. If the variable
 585 @samp{rng-schema-locating-files} has not been customized, this
 586 means that the rule will be added to the file @samp{schemas.xml}
 587 in the same directory as the document being edited.
 588
 589 The command @kbd{C-c C-s C-t} allows you to select a schema by
 590 specifying an identifier for the type of the document.  The schema
 591 locating files determine the available type identifiers and what
 592 schema is used for each type identifier. This is useful when it is
 593 impossible to infer the right schema from either the file-name or the
 594 content of the document, even though the schema is already in the
 595 schema locating file.  A situation in which this can occur is when
 596 there are multiple variants of a schema where all valid documents have
 597 the same document element.  For example, XHTML has Strict and
 598 Transitional variants.  In a situation like this, a schema locating file
 599 can define a type identifier for each variant. As with @kbd{C-c
 600 C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
 601 locating file that persistently associates the document with the
 602 specified type identifier.
 603
 604 The command @kbd{C-c C-s C-l} adds a rule to a schema
 605 locating file that persistently associates the document with
 606 the schema that is currently being used.
 607
 608 @node Schema locating files
 609 @section Schema locating files
 610
 611 Each schema locating file specifies a list of rules.  The rules
 612 from each file are appended in order. To locate a schema each rule is
 613 applied in turn until a rule matches.  The first matching rule is then
 614 used to determine the schema.
 615
 616 Schema locating files are designed to be useful for other
 617 applications that need to locate a schema for a document. In fact,
 618 there is nothing specific to locating schemas in the design; it could
 619 equally well be used for locating a stylesheet.
 620
 621 @menu
 622 * Schema locating file syntax basics::
 623 * Using the document's URI to locate a schema::
 624 * Using the document element to locate a schema::
 625 * Using type identifiers in schema locating files::
 626 * Using multiple schema locating files::
 627 @end menu
 628
 629 @node Schema locating file syntax basics
 630 @subsection Schema locating file syntax basics
 631
 632 There is a schema for schema locating files in the file
 633 @samp{locate.rnc} in the schema directory.  Schema locating
 634 files must be valid with respect to this schema.
 635
 636 The document element of a schema locating file must be
 637 @samp{locatingRules} and the namespace URI must be
 638 @samp{http://thaiopensource.com/ns/locating-rules/1.0}.  The
 639 children of the document element specify rules. The order of the
 640 children is the same as the order of the rules.  Here's a complete
 641 example of a schema locating file:
 642
 643 @example
 644 <?xml version="1.0"?>
 645 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 646   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 647   <documentElement localName="book" uri="docbook.rnc"/>
 648 </locatingRules>
 649 @end example
 650
 651 @noindent
 652 This says to use the schema @samp{xhtml.rnc} for a document whose
 653 document element has the namespace URI @samp{http://www.w3.org/1999/xhtml},
 654 and to use @samp{docbook.rnc} for one whose document element has local name
 655 @samp{book}.  If the document element had both a namespace URI
 656 of @samp{http://www.w3.org/1999/xhtml} and a local name of
 657 @samp{book}, then the matching rule that comes first will be
 658 used and so the schema @samp{xhtml.rnc} would be used.  There is
 659 no precedence between different types of rule; the first matching rule
 660 of any type is used.
 661
 662 As usual with XML-related technologies, resources are identified
 663 by URIs.  The @samp{uri} attribute identifies the schema by
 664 specifying the URI.  The URI may be relative.  If so, it is resolved
 665 relative to the URI of the schema locating file that contains the
 666 attribute.  This means that if the value of the @samp{uri} attribute
 667 does not contain a @samp{/}, then it will refer to a filename in
 668 the same directory as the schema locating file.
 669
 670 @node Using the document's URI to locate a schema
 671 @subsection Using the document's URI to locate a schema
 672
 673 A @samp{uri} rule locates a schema based on the URI of the
 674 document.  The @samp{uri} attribute specifies the URI of the
 675 schema.  The @samp{resource} attribute can be used to specify
 676 the schema for a particular document.  For example,
 677
 678 @example
 679 <uri resource="spec.xml" uri="docbook.rnc"/>
 680 @end example
 681
 682 @noindent
 683 specifies that the schema for @samp{spec.xml} is
 684 @samp{docbook.rnc}.
 685
 686 The @samp{pattern} attribute can be used instead of the
 687 @samp{resource} attribute to specify the schema for any document
 688 whose URI matches a pattern.  The pattern has the same syntax as an
 689 absolute or relative URI except that the path component of the URI can
 690 use a @samp{*} character to stand for zero or more characters
 691 within a path segment (i.e., any character other than @samp{/}).
 692 Typically, the URI pattern looks like a relative URI, but, whereas a
 693 relative URI in the @samp{resource} attribute is resolved into a
 694 particular absolute URI using the base URI of the schema locating
 695 file, a relative URI pattern matches if it matches some number of
 696 complete path segments of the document's URI ending with the last path
 697 segment of the document's URI. For example,
 698
 699 @example
 700 <uri pattern="*.xsl" uri="xslt.rnc"/>
 701 @end example
 702
 703 @noindent
 704 specifies that the schema for documents with a URI whose path ends
 705 with @samp{.xsl} is @samp{xslt.rnc}.
 706
 707 A @samp{transformURI} rule locates a schema by
 708 transforming the URI of the document. The @samp{fromPattern}
 709 attribute specifies a URI pattern with the same meaning as the
 710 @samp{pattern} attribute of the @samp{uri} element.  The
 711 @samp{toPattern} attribute is a URI pattern that is used to
 712 generate the URI of the schema.  Each @samp{*} in the
 713 @samp{toPattern} is replaced by the string that matched the
 714 corresponding @samp{*} in the @samp{fromPattern}.  The
 715 resulting string is appended to the initial part of the document's URI
 716 that was not explicitly matched by the @samp{fromPattern}.  The
 717 rule matches only if the transformed URI identifies an existing
 718 resource.  For example, the rule
 719
 720 @example
 721 <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
 722 @end example
 723
 724 @noindent
 725 would transform the URI @samp{file:///home/jjc/docs/spec.xml}
 726 into the URI @samp{file:///home/jjc/docs/spec.rnc}.  Thus, this
 727 rule specifies that to locate a schema for a document
 728 @samp{@var{foo}.xml}, Emacs should test whether a file
 729 @samp{@var{foo}.rnc} exists in the same directory as
 730 @samp{@var{foo}.xml}, and, if so, should use it as the
 731 schema.
 732
 733 @node Using the document element to locate a schema
 734 @subsection Using the document element to locate a schema
 735
 736 A @samp{documentElement} rule locates a schema based on
 737 the local name and prefix of the document element. For example, a rule
 738
 739 @example
 740 <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
 741 @end example
 742
 743 @noindent
 744 specifies that when the name of the document element is
 745 @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
 746 as the schema. Either the @samp{prefix} or
 747 @samp{localName} attribute may be omitted to allow any prefix or
 748 local name.
 749
 750 A @samp{namespace} rule locates a schema based on the
 751 namespace URI of the document element. For example, a rule
 752
 753 @example
 754 <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
 755 @end example
 756
 757 @noindent
 758 specifies that when the namespace URI of the document element is
 759 @samp{http://www.w3.org/1999/XSL/Transform}, then
 760 @samp{xslt.rnc} should be used as the schema.

@node Using type identifiers in schema locating files
@subsection Using type identifiers in schema locating files

Type identifiers allow a level of indirection in locating the
schema for a document.  Instead of associating the document directly
with a schema URI, the document is associated with a type identifier,
which is in turn associated with a schema URI.  nXML mode does not
constrain the format of type identifiers.  They can simply be strings
without any formal structure or they can be public identifiers or
URIs.  Note that these type identifiers have nothing to do with the
DOCTYPE declaration.  When comparing type identifiers, whitespace is
normalized in the same way as with the @samp{xsd:token}
datatype: leading and trailing whitespace is stripped; other sequences
of whitespace are normalized to a single space character.

Each of the rules described in previous sections that uses a
@samp{uri} attribute to specify a schema can instead use a
@samp{typeId} attribute to specify a type identifier.  The type
identifier can be associated with a URI using a @samp{typeId}
element.  For example,

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
  <typeId id="XHTML" typeId="XHTML Strict"/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
  <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
</locatingRules>
@end example

@noindent
declares three type identifiers: @samp{XHTML} (representing the
default variant of XHTML to be used), @samp{XHTML Strict} and
@samp{XHTML Transitional}.  Such a schema locating file would
use @samp{xhtml-strict.rnc} for a document whose namespace is
@samp{http://www.w3.org/1999/xhtml}.  But it is considerably
more flexible than a schema locating file that simply specified

@example
<namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
@end example

@noindent
A user can easily use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional.  Also, a user can easily add a schema
locating file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="XHTML" typeId="XHTML Transitional"/>
</locatingRules>
@end example

@noindent
that makes the default variant of XHTML be XHTML Transitional.
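
The same indirection is available to the other rule types.  As a
sketch, a @samp{documentElement} rule could refer to the
@samp{XHTML} type identifier instead of naming a schema file
directly.  The choice of @samp{html} as the local name is an
assumption made for this example.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Any prefix is accepted because no prefix attribute is given. -->
  <documentElement localName="html" typeId="XHTML"/>
  <typeId id="XHTML" typeId="XHTML Strict"/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
</locatingRules>
@end example

@noindent
With these rules a document whose document element is @samp{html}
resolves first to @samp{XHTML}, then to @samp{XHTML Strict}, and
finally to @samp{xhtml-strict.rnc}.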

@node Using multiple schema locating files
@subsection Using multiple schema locating files

The @samp{include} element includes rules from another
schema locating file.  The behavior is exactly as if the rules from
that file were included in place of the @samp{include} element.
Relative URIs are resolved into absolute URIs before the inclusion is
performed.  For example,

@example
<include rules="../rules.xml"/>
@end example

@noindent
includes the rules from @samp{rules.xml}.
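
As a sketch of how this fits into a complete file, the following
schema locating file puts one rule of its own ahead of the
@samp{include} element.  Because included rules behave as if they
appeared in place of that element, the local rule comes before them.
The namespace URI @samp{http://example.org/ns/report} and the schema
name @samp{report.rnc} are placeholders invented for this example.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Local rule; the namespace URI and schema name are hypothetical. -->
  <namespace ns="http://example.org/ns/report" uri="report.rnc"/>
  <!-- The rules from ../rules.xml take effect at this point. -->
  <include rules="../rules.xml"/>
</locatingRules>
@end example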

The process of locating a schema takes as input a list of schema
locating files.  The rules in all these files and in the files they
include are resolved into a single list of rules, which are applied
strictly in order.  Sometimes this order is not what is needed.
For example, suppose you have two schema locating files, a private
file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example

@noindent
followed by a public file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
@end example

@noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed.  This can be solved by adding
an @samp{applyFollowingRules} element to the private file.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <applyFollowingRules ruleType="transformURI"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example
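
As a rough sketch of the effect, assuming the private file above is
still followed by the public file, a schema is located as though the
rules had been written in the following order in a single file.  The
@samp{transformURI} rule is written here with the @samp{fromPattern}
and @samp{toPattern} attributes.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Pulled forward by applyFollowingRules. -->
  <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
@end example

@noindent
A document @samp{@var{foo}.xml} with a file @samp{@var{foo}.rnc} next
to it therefore uses that file, and an XHTML document falls back to
@samp{xhtml.rnc} only when no such file exists.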

@node DTDs
@chapter DTDs

nXML mode is designed to support the creation of standalone XML
documents that do not depend on a DTD.  Although it is common practice
to insert a DOCTYPE declaration referencing an external DTD, this has
undesirable side-effects.  It means that the document is no longer
self-contained.  It also means that different XML parsers may interpret
the document in different ways, since the XML Recommendation does not
require XML parsers to read the DTD.  With DTDs, it was impractical to
get validation without using an external DTD or a reference to a
parameter entity.  With RELAX NG and other schema languages, you can
simultaneously get the benefits of validation and standalone XML
documents.  Therefore, I recommend that you do not reference an
external DTD in your XML documents.

One problem is entities for characters.  Typically, as well as
providing validation, DTDs also provide a set of character entities
for documents to use.  Schemas cannot provide this functionality,
because schema validation happens after XML parsing.  The recommended
solution is to either use the Unicode characters directly, or, if this
is impractical, use character references.  nXML mode supports this by
providing commands for entering characters and character references
using the Unicode names, and can display the glyph corresponding to a
character reference.
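
As a small illustration, the fragment below writes the character
U+00E9 (Latin small letter e with acute) as a character reference,
first in hexadecimal and then in decimal.  Neither form needs a DTD,
whereas an entity reference such as @samp{&eacute;} is defined only
when a DTD declares it.  The element name @samp{para} is an
assumption made for this example.

@example
<!-- Both references denote U+00E9; no DOCTYPE declaration is needed. -->
<para>caf&#xE9;</para>
<para>caf&#233;</para>
@end example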

@node Limitations
@chapter Limitations

nXML mode has some limitations:

@itemize @bullet
@item
DTD support is limited.  Internal parsed general entities declared
in the internal subset are supported provided they do not contain
elements.  Other usage of DTDs is ignored.
@item
The restrictions on RELAX NG schemas in section 7 of the RELAX NG
specification are not enforced.
@end itemize
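
As a sketch of the first limitation, the document below declares an
internal parsed general entity in the internal subset.  Its
replacement text is plain character data, so it is the kind of entity
described above as supported.  The names @samp{doc} and
@samp{product} are invented for this example.

@example
<!DOCTYPE doc [
  <!-- Supported: the replacement text contains no elements. -->
  <!ENTITY product "nXML mode">
]>
<doc>This document was written with &product;.</doc>
@end example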

@bye