Don't say "buying copies from the FSF" for manuals they do not publish
[bpt/emacs.git] / doc / misc / nxml-mode.texi
\input texinfo @c -*- texinfo -*-
@c %**start of header
@setfilename ../../info/nxml-mode
@settitle nXML Mode
@c %**end of header

@copying
This manual documents nXML mode, an Emacs major mode for editing
XML with RELAX NG support.

Copyright @copyright{} 2007-2012 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU
Manual,'' and with the Back-Cover Texts as in (a) below. A copy of the
license is included in the section entitled ``GNU Free Documentation
License'' in the Emacs manual.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''

This document is part of a collection distributed under the GNU Free
Documentation License. If you want to distribute this document
separately from the collection, you can do so by adding a copy of the
license to the document, as described in section 6 of the license.
@end quotation
@end copying

@dircategory Emacs editing modes
@direntry
* nXML Mode: (nxml-mode). XML editing mode with RELAX NG support.
@end direntry

@node Top
@top nXML Mode

@insertcopying

This manual is not yet complete.

@menu
* Introduction::
* Completion::
* Inserting end-tags::
* Paragraphs::
* Outlining::
* Locating a schema::
* DTDs::
* Limitations::
@end menu

@node Introduction
@chapter Introduction

nXML mode is an Emacs major mode for editing XML documents. It supports
editing well-formed XML documents, and provides schema-sensitive editing
using RELAX NG Compact Syntax. To get started, visit a file containing an
XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
mode. By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
put buffers in nXML mode if they have recognizable XML content or file
extensions. You may wish to customize the settings, for example to
recognize different file extensions.
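
For instance, the following is a minimal sketch of an init-file
setting that associates an additional extension with nXML mode. The
@file{.xconf} extension is only an illustration; substitute whatever
extension your documents use.

@example
;; Hypothetical extension; adjust the regexp to your own files.
(add-to-list 'auto-mode-alist '("\\.xconf\\'" . nxml-mode))
@end example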

Once in nXML mode, you can type @kbd{C-h m} for basic information on the
mode.

The @file{etc/nxml} directory in the Emacs distribution contains some data
files used by nXML mode, and includes two files (@file{test-valid.xml} and
@file{test-invalid.xml}) that provide examples of valid and invalid XML
documents.

To get validation and schema-sensitive editing, you need a RELAX NG Compact
Syntax (RNC) schema for your document (@pxref{Locating a schema}). The
@file{etc/schema} directory includes some schemas for popular document
types. See @url{http://relaxng.org/} for more information on RELAX NG@.
You can use the @samp{Trang} program from
@url{http://www.thaiopensource.com/relaxng/trang.html} to
automatically create RNC schemas. This program can:

@itemize @bullet
@item
infer an RNC schema from an instance document;
@item
convert a DTD to an RNC schema;
@item
convert a RELAX NG XML syntax schema to an RNC schema.
@end itemize

@noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
one, you can also use the XSLT stylesheet from
@url{http://www.pantor.com/download.html}.

To convert a W3C XML Schema to an RNC schema, you need first to convert it
to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
(built on top of MSV). See @url{https://github.com/kohsuke/msv}
and @url{https://msv.dev.java.net/}.

For historical discussions only, see the mailing list archives at
@url{http://groups.yahoo.com/group/emacs-nxml-mode/}. Please hold all new
discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
lists. Report any bugs with @kbd{M-x report-emacs-bug}.


@node Completion
@chapter Completion

Apart from real-time validation, the most important feature that nXML
mode provides for assisting in document creation is ``completion''.
Completion assists the user in inserting characters at point, based on
knowledge of the schema and on the contents of the buffer before
point.

nXML mode adapts the standard GNU Emacs command for completion in a
buffer: @code{completion-at-point}, which is bound to @kbd{C-M-i} and
@kbd{M-@key{TAB}}. Note that many window systems and window managers
use @kbd{M-@key{TAB}} themselves (typically for switching between
windows) and do not pass it to applications. In that case, you should
type @kbd{C-M-i} or @kbd{@key{ESC} @key{TAB}} for completion, or bind
@code{completion-at-point} to a key that is convenient for you. In
the following, I will assume that you type @kbd{C-M-i}.
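
As a sketch of such a binding, the following init-file fragment binds
the command to @key{F6} in nXML mode. The choice of @key{F6} is
arbitrary and only for illustration.

@example
;; Bind completion to F6 in nXML mode (F6 is an arbitrary choice).
(add-hook 'nxml-mode-hook
          (lambda ()
            (local-set-key (kbd "<f6>") 'completion-at-point)))
@end example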

nXML mode completion works by examining the symbol preceding point.
This is the symbol to be completed. The symbol to be completed may be
empty. Completion considers what symbols starting with the symbol
to be completed would be valid replacements for the symbol to be
completed, given the schema and the contents of the buffer before
point. These symbols are the possible completions. An example may
make this clearer. Suppose the buffer looks like this (where @point{}
indicates point):

@example
<html xmlns="http://www.w3.org/1999/xhtml">
<h@point{}
@end example

@noindent
and the schema is XHTML@. In this context, the symbol to be completed
is @samp{h}. The possible completions consist of just
@samp{head}. Another example is:

@example
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<@point{}
@end example

@noindent
In this case, the symbol to be completed is empty, and the possible
completions are @samp{base}, @samp{isindex},
@samp{link}, @samp{meta}, @samp{script},
@samp{style}, @samp{title}. Another example is:

@example
<html xmlns="@point{}
@end example

@noindent
In this case, the symbol to be completed is empty, and the possible
completions are just @samp{http://www.w3.org/1999/xhtml}.

When you type @kbd{C-M-i}, what happens depends
on what the set of possible completions is.

@itemize @bullet
@item
If the set of completions is empty, nothing
happens.
@item
If there is one possible completion, then that completion is
inserted, together with any following characters that are
required. For example, in this case:

@example
<html xmlns="http://www.w3.org/1999/xhtml">
<@point{}
@end example

@noindent
@kbd{C-M-i} will yield

@example
<html xmlns="http://www.w3.org/1999/xhtml">
<head@point{}
@end example
@item
If there is more than one possible completion, but all
possible completions share a common non-empty prefix, then that prefix
is inserted. For example, suppose the buffer is:

@example
<html x@point{}
@end example

@noindent
The symbol to be completed is @samp{x}. The possible completions are
@samp{xmlns} and @samp{xml:lang}. These share a common prefix of
@samp{xml}. Thus, @kbd{C-M-i} will yield:

@example
<html xml@point{}
@end example

@noindent
Typically, you would do @kbd{C-M-i} again, which would have the result
described in the next item.
@item
If there is more than one possible completion, but the
possible completions do not share a non-empty prefix, then Emacs will
prompt you to input the symbol in the minibuffer, initializing the
minibuffer with the symbol to be completed, and popping up a buffer
showing the possible completions. You can now input the symbol to be
inserted. The symbol you input will be inserted in the buffer instead
of the symbol to be completed. Emacs will then insert any required
characters after the symbol. For example, if the buffer contains:

@example
<html xml@point{}
@end example

@noindent
Emacs will prompt you in the minibuffer with

@example
Attribute: xml@point{}
@end example

@noindent
and the buffer showing possible completions will contain

@example
Possible completions are:
xml:lang xmlns
@end example

@noindent
If you input @kbd{xmlns}, the result will be:

@example
<html xmlns="@point{}
@end example

@noindent
(If you do @kbd{C-M-i} again, the namespace URI will be
inserted. Should that happen automatically?)
@end itemize

@node Inserting end-tags
@chapter Inserting end-tags

The main redundancy in XML syntax is end-tags. nXML mode provides
several ways to make it easier to enter end-tags. You can use all of
these without a schema.

You can use @kbd{C-M-i} after @samp{</} to complete the rest of the
end-tag.

@kbd{C-c C-f} inserts an end-tag for the element containing
point. This command is useful when you want to input the start-tag,
then input the content and finally input the end-tag. The @samp{f}
is mnemonic for finish.

If you want to keep tags balanced and input the end-tag at the
same time as the start-tag, before inputting the content, then you can
use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
the end-tag and leaves point before the end-tag. @kbd{C-c C-b}
is similar but more convenient for block-level elements: it puts the
start-tag, point and the end-tag on successive lines, appropriately
indented. The @samp{i} is mnemonic for inline and the
@samp{b} is mnemonic for block.

Finally, you can customize nXML mode so that @kbd{/} automatically
inserts the rest of the end-tag when it occurs after @samp{<}, by
doing

@display
@kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
@end display

@noindent
and then following the instructions in the displayed buffer.
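
Alternatively, here is a minimal sketch of the equivalent init-file
setting, assuming you prefer not to use the customization buffer:

@example
;; Make `/' typed after `<' insert the rest of the end-tag.
(setq nxml-slash-auto-complete-flag t)
@end example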

@node Paragraphs
@chapter Paragraphs

Emacs has several commands that operate on paragraphs, most
notably @kbd{M-q}. nXML mode redefines these to work in a way
that is useful for XML@. The exact rules that are used to find the
beginning and end of a paragraph are complicated; they are designed
mainly to ensure that @kbd{M-q} does the right thing.

A paragraph consists of one or more complete, consecutive lines.
A group of lines is not considered a paragraph unless it contains some
non-whitespace characters between tags or inside comments. A blank
line separates paragraphs. A single tag on a line by itself also
separates paragraphs. More precisely, if one tag together with any
leading and trailing whitespace completely occupies one or more lines,
then those lines will not be included in any paragraph.

A start-tag at the beginning of the line (possibly indented) may
be treated as starting a paragraph. Similarly, an end-tag at the end
of the line may be treated as ending a paragraph. The following rules
are used to determine whether such a tag is in fact treated as a
paragraph boundary:

@itemize @bullet
@item
If the schema does not allow text at that point, then it
is a paragraph boundary.
@item
If the end-tag corresponding to the start-tag is not at
the end of its line, or the start-tag corresponding to the end-tag is
not at the beginning of its line, then it is not a paragraph
boundary. For example, in

@example
<p>This is a paragraph with an
<emph>emphasized</emph> phrase.
@end example

@noindent
the @samp{<emph>} start-tag would not be considered as
starting a paragraph, because its corresponding end-tag is not at the
end of the line.
@item
If there is text that is a sibling in the element tree, then
it is not a paragraph boundary. For example, in

@example
<p>This is a paragraph with an
<emph>emphasized phrase that takes one source line</emph>
@end example

@noindent
the @samp{<emph>} start-tag would not be considered as
starting a paragraph, even though its end-tag is at the end of its
line, because the text @samp{This is a paragraph with an}
is a sibling of the @samp{emph} element.
@item
Otherwise, it is a paragraph boundary.
@end itemize

@node Outlining
@chapter Outlining

nXML mode allows you to display all or part of a buffer as an
outline, in a similar way to Emacs's outline mode. An outline in nXML
mode is based on recognizing two kinds of element: sections and
headings. There is one heading for every section and one section for
every heading. A section contains its heading as or within its first
child element. A section also contains its subordinate sections (its
subsections). The text content of a section consists of anything in a
section that is neither a subsection nor a heading.

Note that this is a different model from that used by XHTML@.
nXML mode's outline support will not be useful for XHTML unless you
adopt a convention of adding a @code{div} to enclose each
section, rather than having sections implicitly delimited by different
@code{h@var{n}} elements. This limitation may be removed
in a future version.

The variable @code{nxml-section-element-name-regexp} gives
a regexp for the local names (i.e., the part of the name following any
prefix) of section elements. The variable
@code{nxml-heading-element-name-regexp} gives a regexp for the
local names of heading elements. For an element to be recognized
as a section

@itemize @bullet
@item
its start-tag must occur at the beginning of a line
(possibly indented);
@item
its local name must match
@code{nxml-section-element-name-regexp};
@item
either its first child element or a descendant of that
first child element must have a local name that matches
@code{nxml-heading-element-name-regexp}; the first such element
is treated as the section's heading.
@end itemize

@noindent
You can customize these variables using @kbd{M-x
customize-variable}.
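
For example, a minimal sketch of setting these variables from an init
file might look as follows. The element names @samp{topic} and
@samp{caption} are hypothetical; use the names from your own document
type. Setting the variables this way replaces their default values.

@example
;; Hypothetical names: treat <topic> and <section> elements as
;; sections, and <caption> and <title> elements as their headings.
(setq nxml-section-element-name-regexp "topic\\|section")
(setq nxml-heading-element-name-regexp "caption\\|title")
@end example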

There are three possible outline states for a section:

@itemize @bullet
@item
normal, showing everything, including its heading, text
content and subsections; each subsection is displayed according to the
state of that subsection;
@item
showing just its heading, with both its text content and
its subsections hidden; all subsections are hidden regardless of their
state;
@item
showing its heading and its subsections, with its text
content hidden; each subsection is displayed according to the state of
that subsection.
@end itemize

In the last two states, where the text content is hidden, the
heading is displayed specially, in an abbreviated form. An element
like this:

@example
<section>
<title>Food</title>
<para>There are many kinds of food.</para>
</section>
@end example

@noindent
would be displayed on a single line like this:

@example
<-section>Food...</>
@end example

@noindent
If there are hidden subsections, then a @code{+} will be used
instead of a @code{-} like this:

@example
<+section>Food...</>
@end example

@noindent
If there are non-hidden subsections, then the section will instead be
displayed like this:

@example
<-section>Food...
 <-section>Delicious Food...</>
 <-section>Distasteful Food...</>
</-section>
@end example

@noindent
The heading is always displayed with an indent that corresponds to its
depth in the outline, even if it is not actually indented in the buffer.
The variable @code{nxml-outline-child-indent} controls how much
a subheading is indented with respect to its parent heading when the
heading is being displayed specially.
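
As a sketch, the indentation could be changed from an init file like
this; the value 4 is an arbitrary illustration.

@example
;; Indent each subheading by 4 columns relative to its parent heading.
(setq nxml-outline-child-indent 4)
@end example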

Commands to change the outline state of sections are bound to
key sequences that start with @kbd{C-c C-o} (@kbd{o} is
mnemonic for outline). The third and final key has been chosen to be
consistent with outline mode. In the following descriptions,
@dfn{current section} means the section containing point, or, more precisely,
the innermost section containing the character immediately following
point.

@itemize @bullet
@item
@kbd{C-c C-o C-a} shows all sections in the buffer
normally.
@item
@kbd{C-c C-o C-t} hides the text content
of all sections in the buffer.
@item
@kbd{C-c C-o C-c} hides the text content
of the current section.
@item
@kbd{C-c C-o C-e} shows the text content
of the current section.
@item
@kbd{C-c C-o C-d} hides the text content
and subsections of the current section.
@item
@kbd{C-c C-o C-s} shows the current section
and all its direct and indirect subsections normally.
@item
@kbd{C-c C-o C-k} shows the headings of the
direct and indirect subsections of the current section.
@item
@kbd{C-c C-o C-l} hides the text content of the
current section and of its direct and indirect
subsections.
@item
@kbd{C-c C-o C-i} shows the headings of the
direct subsections of the current section.
@item
@kbd{C-c C-o C-o} hides as much as possible without
hiding the current section's text content; the headings of ancestor
sections of the current section and their child sections will
not be hidden.
@end itemize

When a heading is displayed specially, you can use
@key{RET} in that heading to show the text content of the section
in the same way as @kbd{C-c C-o C-e}.

You can also use the mouse to change the outline state:
@kbd{S-mouse-2} hides the text content of a section in the same
way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
displayed heading shows the text content of the section in the same
way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
displayed start-tag toggles the display of subheadings on and
off.

The outline state for each section is stored with the first
character of the section (as a text property). Every command that
changes the outline state of any section updates the display of the
buffer so that each section is displayed correctly according to its
outline state. If the section structure is subsequently changed, then
it is possible for the display to no longer correctly reflect the
stored outline state. @kbd{C-c C-o C-r} can be used to refresh
the display so it is correct again.

@node Locating a schema
@chapter Locating a schema

nXML mode has a configurable set of rules to locate a schema for
the file being edited. The rules are contained in one or more schema
locating files, which are XML documents.

The variable @samp{rng-schema-locating-files} specifies
the list of file-names of schema locating files that nXML mode
should use. The order of the list is significant: when file
@var{x} occurs in the list before file @var{y} then rules
from file @var{x} have precedence over rules from file
@var{y}. A filename specified in
@samp{rng-schema-locating-files} may be relative. If so, it will
be resolved relative to the document for which a schema is being
located. It is not an error if files named by relative file-names in
@samp{rng-schema-locating-files} do not exist. You can use
@kbd{M-x customize-variable @key{RET} rng-schema-locating-files
@key{RET}} to customize the list of schema locating
files.

By default, the @samp{rng-schema-locating-files} list has two
members: @samp{schemas.xml}, and
@samp{@var{dist-dir}/schema/schemas.xml} where
@samp{@var{dist-dir}} is the directory containing the nXML
distribution. The first member will cause nXML mode to use a file
@samp{schemas.xml} in the same directory as the document being
edited if such a file exists. The second member contains rules for the
schemas that are included with the nXML distribution.
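
As a sketch of extending the list from an init file, the following adds
a personal schema locating file at the front of the list, so that its
rules take precedence over those of both default members. The path
@file{~/.emacs.d/schemas.xml} is hypothetical.

@example
;; The variable is defined in the rng-loc library, so defer the
;; change until that library has been loaded.
(eval-after-load 'rng-loc
  '(add-to-list 'rng-schema-locating-files
                (expand-file-name "~/.emacs.d/schemas.xml")))
@end example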

@menu
* Commands for locating a schema::
* Schema locating files::
@end menu

@node Commands for locating a schema
@section Commands for locating a schema

The command @kbd{C-c C-s C-w} will tell you what schema
is currently being used.

The rules for locating a schema are applied automatically when
you visit a file in nXML mode. However, if you have just created a new
file and the schema cannot be inferred from the file-name, then this
will not locate the right schema. In this case, you should insert the
start-tag of the root element and then use the command @kbd{C-c C-s
C-a}, which reapplies the rules based on the current content of
the document. It is usually not necessary to insert the complete
start-tag; often just @samp{<@var{name}} is
enough.

If you want to use a schema that has not yet been added to the
schema locating files, you can use the command @kbd{C-c C-s C-f}
to manually select the file containing the schema for the document in
the current buffer. Emacs will read the file-name of the schema from the
minibuffer. After reading the file-name, Emacs will ask whether you
wish to add a rule to a schema locating file that persistently
associates the document with the selected schema. The rule will be
added to the first file in the list specified by
@samp{rng-schema-locating-files}; Emacs will create the file if
necessary, but will not create a directory. If the variable
@samp{rng-schema-locating-files} has not been customized, this
means that the rule will be added to the file @samp{schemas.xml}
in the same directory as the document being edited.

The command @kbd{C-c C-s C-t} allows you to select a schema by
specifying an identifier for the type of the document. The schema
locating files determine the available type identifiers and what
schema is used for each type identifier. This is useful when it is
impossible to infer the right schema from either the file-name or the
content of the document, even though the schema is already in the
schema locating file. A situation in which this can occur is when
there are multiple variants of a schema where all valid documents have
the same document element. For example, XHTML has Strict and
Transitional variants. In a situation like this, a schema locating file
can define a type identifier for each variant. As with @kbd{C-c
C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
locating file that persistently associates the document with the
specified type identifier.

The command @kbd{C-c C-s C-l} adds a rule to a schema
locating file that persistently associates the document with
the schema that is currently being used.

@node Schema locating files
@section Schema locating files

Each schema locating file specifies a list of rules. The rules
from each file are appended in order. To locate a schema, each rule is
applied in turn until a rule matches. The first matching rule is then
used to determine the schema.

Schema locating files are designed to be useful for other
applications that need to locate a schema for a document. In fact,
there is nothing specific to locating schemas in the design; it could
equally well be used for locating a stylesheet.

@menu
* Schema locating file syntax basics::
* Using the document's URI to locate a schema::
* Using the document element to locate a schema::
* Using type identifiers in schema locating files::
* Using multiple schema locating files::
@end menu

@node Schema locating file syntax basics
@subsection Schema locating file syntax basics

There is a schema for schema locating files in the file
@samp{locate.rnc} in the schema directory. Schema locating
files must be valid with respect to this schema.

The document element of a schema locating file must be
@samp{locatingRules} and the namespace URI must be
@samp{http://thaiopensource.com/ns/locating-rules/1.0}. The
children of the document element specify rules. The order of the
children is the same as the order of the rules. Here's a complete
example of a schema locating file:

@example
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 <documentElement localName="book" uri="docbook.rnc"/>
</locatingRules>
@end example

@noindent
This says to use the schema @samp{xhtml.rnc} for a document with
namespace @samp{http://www.w3.org/1999/xhtml}, and to use the
schema @samp{docbook.rnc} for a document whose document element has
the local name @samp{book}. If the document element had both a namespace URI
of @samp{http://www.w3.org/1999/xhtml} and a local name of
@samp{book}, then the matching rule that comes first would be
used, and so the schema @samp{xhtml.rnc} would be used. There is
no precedence between different types of rule; the first matching rule
of any type is used.

As usual with XML-related technologies, resources are identified
by URIs. The @samp{uri} attribute identifies the schema by
specifying the URI@. The URI may be relative. If so, it is resolved
relative to the URI of the schema locating file that contains
the attribute. This means that if the value of the @samp{uri} attribute
does not contain a @samp{/}, then it will refer to a filename in
the same directory as the schema locating file.

@node Using the document's URI to locate a schema
@subsection Using the document's URI to locate a schema

A @samp{uri} rule locates a schema based on the URI of the
document. The @samp{uri} attribute specifies the URI of the
schema. The @samp{resource} attribute can be used to specify
the schema for a particular document. For example,

@example
<uri resource="spec.xml" uri="docbook.rnc"/>
@end example

@noindent
specifies that the schema for @samp{spec.xml} is
@samp{docbook.rnc}.

The @samp{pattern} attribute can be used instead of the
@samp{resource} attribute to specify the schema for any document
whose URI matches a pattern. The pattern has the same syntax as an
absolute or relative URI except that the path component of the URI can
use a @samp{*} character to stand for zero or more characters
within a path segment (i.e., any character other than @samp{/}).
Typically, the URI pattern looks like a relative URI, but, whereas a
relative URI in the @samp{resource} attribute is resolved into a
particular absolute URI using the base URI of the schema locating
file, a relative URI pattern matches if it matches some number of
complete path segments of the document's URI ending with the last path
segment of the document's URI@. For example,

@example
<uri pattern="*.xsl" uri="xslt.rnc"/>
@end example

@noindent
specifies that the schema for documents with a URI whose path ends
with @samp{.xsl} is @samp{xslt.rnc}.

A @samp{transformURI} rule locates a schema by
transforming the URI of the document. The @samp{fromPattern}
attribute specifies a URI pattern with the same meaning as the
@samp{pattern} attribute of the @samp{uri} element. The
@samp{toPattern} attribute is a URI pattern that is used to
generate the URI of the schema. Each @samp{*} in the
@samp{toPattern} is replaced by the string that matched the
corresponding @samp{*} in the @samp{fromPattern}. The
resulting string is appended to the initial part of the document's URI
that was not explicitly matched by the @samp{fromPattern}. The
rule matches only if the transformed URI identifies an existing
resource. For example, the rule

@example
<transformURI fromPattern="*.xml" toPattern="*.rnc"/>
@end example

@noindent
would transform the URI @samp{file:///home/jjc/docs/spec.xml}
into the URI @samp{file:///home/jjc/docs/spec.rnc}. Thus, this
rule specifies that to locate a schema for a document
@samp{@var{foo}.xml}, Emacs should test whether a file
@samp{@var{foo}.rnc} exists in the same directory as
@samp{@var{foo}.xml}, and, if so, should use it as the
schema.

@node Using the document element to locate a schema
@subsection Using the document element to locate a schema

A @samp{documentElement} rule locates a schema based on
the local name and prefix of the document element. For example, a rule

@example
<documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
@end example

@noindent
specifies that when the name of the document element is
@samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
as the schema. Either the @samp{prefix} or
@samp{localName} attribute may be omitted to allow any prefix or
local name.

A @samp{namespace} rule locates a schema based on the
namespace URI of the document element. For example, a rule

@example
<namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
@end example

@noindent
specifies that when the namespace URI of the document is
@samp{http://www.w3.org/1999/XSL/Transform}, then
@samp{xslt.rnc} should be used as the schema.
752
753@node Using type identifiers in schema locating files
754@subsection Using type identifiers in schema locating files
755
756Type identifiers allow a level of indirection in locating the
757schema for a document. Instead of associating the document directly
758with a schema URI, the document is associated with a type identifier,
1df7defd 759which is in turn associated with a schema URI@. nXML mode does not
8cd39fb3
MH
760constrain the format of type identifiers. They can be simply strings
761without any formal structure or they can be public identifiers or
762URIs. Note that these type identifiers have nothing to do with the
763DOCTYPE declaration. When comparing type identifiers, whitespace is
764normalized in the same way as with the @samp{xsd:token}
765datatype: leading and trailing whitespace is stripped; other sequences
766of whitespace are normalized to a single space character.
767
768Each of the rules described in previous sections that uses a
769@samp{uri} attribute to specify a schema, can instead use a
770@samp{typeId} attribute to specify a type identifier. The type
771identifier can be associated with a URI using a @samp{typeId}
772element. For example,
773
774@example
775<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
776 <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
777 <typeId id="XHTML" typeId="XHTML Strict"/>
778 <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
779 <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
780</locatingRules>
781@end example
782
783@noindent
declares three type identifiers: @samp{XHTML} (representing the
default variant of XHTML to be used), @samp{XHTML Strict} and
@samp{XHTML Transitional}. Such a schema locating file would
use @samp{xhtml-strict.rnc} for a document whose namespace is
@samp{http://www.w3.org/1999/xhtml}. But it is considerably
more flexible than a schema locating file that simply specified

@example
<namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
@end example

@noindent
A user can easily use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional. Also, a user can easily add a
schema locating file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="XHTML" typeId="XHTML Transitional"/>
</locatingRules>
@end example

@noindent
that makes the default variant of XHTML be XHTML Transitional.
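
The same indirection should work with the other rule types. As a
sketch, the following hypothetical schema locating file maps both a
file-name pattern and a document element name to one type identifier,
so that the schema URI is written only once. It assumes the
@samp{uri} rule (with a @samp{pattern} attribute) and the
@samp{documentElement} rule (with a @samp{localName} attribute)
described in the previous sections; the identifier @samp{XSLT} and
the file name @samp{xslt.rnc} are illustrative.

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <uri pattern="*.xsl" typeId="XSLT"/>
  <documentElement localName="stylesheet" typeId="XSLT"/>
  <typeId id="XSLT" uri="xslt.rnc"/>
</locatingRules>
@end example

@noindent
Changing the schema for all such documents then means editing only
the @samp{typeId} element.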

@node Using multiple schema locating files
@subsection Using multiple schema locating files

The @samp{include} element includes rules from another
schema locating file. The behavior is exactly as if the rules from
that file were included in place of the @samp{include} element.
Relative URIs are resolved into absolute URIs before the inclusion is
performed. For example,

@example
<include rules="../rules.xml"/>
@end example

@noindent
includes the rules from @samp{rules.xml}.
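
As a sketch of how this fits into a complete file, a hypothetical
project-level schema locating file might add one rule of its own and
then pull in a shared file from the parent directory (the file names
are assumptions made for this illustration):

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <include rules="../rules.xml"/>
</locatingRules>
@end example

@noindent
Because relative URIs are resolved before inclusion, a relative
@samp{uri} attribute inside the included file would presumably still
refer to a schema next to that file, not next to the including file.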

The process of locating a schema takes as input a list of schema
locating files. The rules in all these files and in the files they
include are resolved into a single list of rules, which are applied
strictly in order. Sometimes this order is not what is needed.
For example, suppose you have two schema locating files, a private
file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example

@noindent
followed by a public file

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
@end example

@noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed. This can be solved by adding
an @samp{applyFollowingRules} element to the private file:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <applyFollowingRules ruleType="transformURI"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example

@node DTDs
@chapter DTDs

nXML mode is designed to support the creation of standalone XML
documents that do not depend on a DTD@. Although it is common practice
to insert a DOCTYPE declaration referencing an external DTD, this has
undesirable side effects. It means that the document is no longer
self-contained. It also means that different XML parsers may interpret
the document in different ways, since the XML Recommendation does not
require XML parsers to read the DTD@. With DTDs, it was impractical to
get validation without using an external DTD or a reference to a
parameter entity. With RELAX NG and other schema languages, you can
simultaneously get the benefits of validation and standalone XML
documents. Therefore, I recommend that you do not reference an
external DTD in your XML documents.
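
As a minimal sketch, a standalone XHTML document in this style has no
DOCTYPE declaration at all; the schema would instead be found through
a schema locating rule, such as a @samp{namespace} rule keyed on the
namespace of the document element (the document content is invented
for this illustration):

@example
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>No DOCTYPE declaration is needed here.</p></body>
</html>
@end example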

One problem is entities for characters. Typically, as well as
providing validation, DTDs also provide a set of character entities
for documents to use. Schemas cannot provide this functionality,
because schema validation happens after XML parsing. The recommended
solution is to either use the Unicode characters directly, or, if this
is impractical, use character references. nXML mode supports this by
providing commands for entering characters and character references
using the Unicode names, and can display the glyph corresponding to a
character reference.
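
For example, where a DTD-based document might use an entity such as
@samp{&eacute;}, a standalone document can use a numeric character
reference instead. The fragment below is only an illustration; both
lines refer to U+00E9, LATIN SMALL LETTER E WITH ACUTE, first in
hexadecimal and then in decimal:

@example
<p>caf&#xE9;</p>
<p>caf&#233;</p>
@end example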

@node Limitations
@chapter Limitations

nXML mode has some limitations:

@itemize @bullet
@item
DTD support is limited. Internal parsed general entities declared
in the internal subset are supported provided they do not contain
elements. Other usage of DTDs is ignored.
@item
The restrictions on RELAX NG schemas in section 7 of the RELAX NG
specification are not enforced.
@end itemize
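
To illustrate the first limitation, consider this hypothetical
document (all names are invented). Going by the description above,
the @samp{product} entity should be supported because its replacement
text contains no elements, whereas @samp{note} contains an element
and so falls outside what is supported:

@example
<!DOCTYPE doc [
  <!ENTITY product "nXML mode">
  <!ENTITY note "<emph>important</emph>">
]>
<doc>&product; &note;</doc>
@end example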

@bye