2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2013 Free Software Foundation, Inc.
4 @c See the file guile.texi for copying conditions.
9 SXML is a native representation of XML in terms of standard Scheme data
10 types: lists, symbols, and strings. For example, the simple XML
14 <parrot type="African Grey"><name>Alfie</name></parrot>
17 may be represented with the following SXML:
20 (parrot (@@ (type "African Grey")) (name "Alfie"))
23 SXML is very general, and is capable of representing all of XML.
24 Formally, this means that SXML is a conforming implementation of the
25 @uref{http://www.w3.org/TR/xml-infoset/,XML Information Set} standard.
27 Guile includes several facilities for working with XML and SXML:
28 parsers, serializers, and transformers.
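As a minimal sketch of how the parser and the serializer fit together, the parrot document above can be read from a string port with @code{xml->sxml} and written back out with @code{sxml->xml}, both from @code{(sxml simple)}. Parsing the serialized text again should give back an equal tree. The variable names @code{parrot-sxml} and @code{parrot-xml} are illustrative only.

@example
(use-modules (sxml simple))

;; Parse XML text from a string port into SXML.
(define parrot-sxml
  (call-with-input-string
   "<parrot type=\"African Grey\"><name>Alfie</name></parrot>"
   xml->sxml))

;; Serialize the SXML back to XML text, capturing it in a string.
(define parrot-xml
  (call-with-output-string
   (lambda (port) (sxml->xml parrot-sxml port))))

;; Re-parsing the serialized text yields an equal SXML tree.
(equal? (xml->sxml parrot-xml) parrot-sxml)
;; @result{} #t
@end example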
31 * SXML Overview:: XML, as it was meant to be
32 * Reading and Writing XML:: Convenient XML parsing and serializing
33 * SSAX:: Custom functional-style XML parsers
34 * Transforming SXML:: Munging SXML with @code{pre-post-order}
35 * SXML Tree Fold:: Fold-based SXML transformations
36 * SXPath:: XPath for SXML
37 * sxml apply-templates:: A more XSLT-like approach to SXML transformations
38 * sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
42 @subsection SXML Overview
44 (This section needs to be written; volunteers welcome.)
47 @node Reading and Writing XML
48 @subsection Reading and Writing XML
50 The @code{(sxml simple)} module presents a basic interface for parsing
51 XML from a port into the Scheme SXML format, and for serializing it back
55 (use-modules (sxml simple))
58 @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
59 [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
60 [#:entities='()] [#:default-entity-handler=#f] @
61 [#:doctype-handler=#f]
62 Use SSAX to parse an XML document into SXML. Takes one optional
63 argument, @var{string-or-port}, which defaults to the current input
64 port. Returns the resulting SXML document. If @var{string-or-port} is
65 a port, it will be left pointing at the next available character in the
69 As is normal in SXML, XML elements parse as tagged lists. Attributes,
70 if any, are placed after the tag, within an @code{@@} element. The root
71 of the resulting SXML tree will be contained in a special tag, @code{*TOP*}.
72 This tag will contain the root element of the XML, but also any prior
73 processing instructions.
76 (xml->sxml "<foo/>")
77 @result{} (*TOP* (foo))
78 (xml->sxml "<foo>text</foo>")
79 @result{} (*TOP* (foo "text"))
80 (xml->sxml "<foo kind=\"bar\">text</foo>")
81 @result{} (*TOP* (foo (@@ (kind "bar")) "text"))
82 (xml->sxml "<?xml version=\"1.0\"?><foo/>")
83 @result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
86 All namespaces in the XML document must be declared, via @code{xmlns}
87 attributes. SXML elements built from non-default namespaces will have
88 their tags prefixed with their URI. Users can specify custom prefixes
89 for certain namespaces with the @code{#:namespaces} keyword argument to
93 (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
94 @result{} (*TOP* (http://example.org/ns1:foo "text"))
95 (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
96 #:namespaces '((ns1 . "http://example.org/ns1")))
97 @result{} (*TOP* (ns1:foo "text"))
98 (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
99 #:namespaces '((ns2 . "http://example.org/ns2")))
100 @result{} (*TOP* (foo (ns2:baz)))
103 By default, namespaces passed to @code{xml->sxml} are treated as if they
104 were declared on the root element. Passing a false
105 @code{#:declare-namespaces?} argument will disable this behavior,
106 requiring in-document declarations of namespaces before use.
109 (xml->sxml "<foo><ns2:baz/></foo>"
110 #:namespaces '((ns2 . "http://example.org/ns2")))
111 @result{} (*TOP* (foo (ns2:baz)))
112 (xml->sxml "<foo><ns2:baz/></foo>"
113 #:namespaces '((ns2 . "http://example.org/ns2"))
114 #:declare-namespaces? #f)
115 @result{} error: undeclared namespace: `ns2'
118 By default, all whitespace in XML is significant. Passing the
119 @code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
120 whitespace in front, behind and between elements, treating it as
121 ``insignificant''. Whitespace in text fragments is left alone.
124 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
125 @result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
126 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
127 #:trim-whitespace? #t)
128 @result{} (*TOP* (foo (bar " Alfie the parrot! ")))
131 Parsed entities may be declared with the @code{#:entities} keyword
132 argument, or handled with the @code{#:default-entity-handler}. By
133 default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
134 @code{&apos;} and @code{&quot;} entities are defined, as well as the
135 @code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
136 numeric character entities.
139 (xml->sxml "<foo>&amp;</foo>")
140 @result{} (*TOP* (foo "&"))
141 (xml->sxml "<foo>&nbsp;</foo>")
142 @result{} error: undefined entity: nbsp
143 (xml->sxml "<foo>&#xA0;</foo>")
144 @result{} (*TOP* (foo "\xa0"))
145 (xml->sxml "<foo>&nbsp;</foo>"
146 #:entities '((nbsp . "\xa0")))
147 @result{} (*TOP* (foo "\xa0"))
148 (xml->sxml "<foo>&nbsp; &foo;</foo>"
149 #:default-entity-handler
154 (format (current-warning-port)
155 "~a:~a:~a: undefined entity: ~a\n"
156 (or (port-filename port) "<unknown file>")
157 (port-line port) (port-column port)
159 (symbol->string name)))))
160 @print{} <unknown file>:0:17: undefined entity: foo
161 @result{} (*TOP* (foo "\xa0 foo"))
164 By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
165 declaration, if any. This behavior can be overridden with the
166 @code{#:doctype-handler} argument, which should be a procedure of three
167 arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
168 the internal doctype subset (as a string or @code{#f} if not present).
170 The handler should return keyword arguments as multiple values, as if it
171 were calling its continuation with keyword arguments. The continuation
172 accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
173 in the same format that @code{xml->sxml} itself takes. These entities
174 and namespaces will be prepended to those given to the @code{xml->sxml}
178 (define (handle-foo docname systemid internal-subset)
181 (values #:entities '((greets . "<i>Hello, world!</i>"))))
185 (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
186 #:doctype-handler handle-foo)
187 @result{} (*TOP* (p (i "Hello, world!")))
190 If the document has no doctype declaration, the @var{doctype-handler} is
191 invoked with @code{#f} for the three arguments.
193 In the future, the continuation may accept other keyword arguments, for
194 example to validate the parsed SXML against the doctype.
196 @deffn {Scheme Procedure} sxml->xml tree [port]
197 Serialize the SXML tree @var{tree} as XML. The output will be written to
198 the current output port, unless the optional argument @var{port} is
202 @deffn {Scheme Procedure} sxml->string sxml
203 Detag an sxml tree @var{sxml} into a string. Does not perform any
208 @subsection SSAX: A Functional XML Parsing Toolkit
210 Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
213 @subsubsection History
215 Back in the 1990s, when the world was young again and XML was the
216 solution to all of its problems, there were basically two kinds of XML
217 parsers out there: DOM parsers and SAX parsers.
219 A DOM parser reads through an entire XML document, building up a tree of
220 ``DOM objects'' representing the document structure. DOM parsers are very easy
221 to use, but sometimes you don't actually want all of the information in
222 a document; building an object tree is not necessary if all you want to
223 do is to count word frequencies in a document, for example.
225 SAX parsers were created to give the programmer more control over the
226 parsing process. A programmer gives the SAX parser a number of
227 ``callbacks'': functions that will be called on various features of the
228 XML stream as they are encountered. SAX parsers are more efficient, but
229 much harder to use, as users typically have to manually maintain a
230 stack of open elements.
232 Kiselyov realized that the SAX programming model could be made much
233 simpler if the callbacks were formulated not as a linear fold across the
234 features of the XML stream, but as a @emph{tree fold} over the structure
235 implicit in the XML. In this way, the user has a very convenient,
236 functional-style interface that can still generate optimal parsers.
238 The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
239 DOM-style parser built using SSAX, though it returns SXML instead of DOM
242 @subsubsection Implementation
244 @code{(sxml ssax)} is a package of low-to-high level lexing and parsing
245 procedures that can be combined to yield a SAX, a DOM, a validating
246 parser, or a parser intended for a particular document type. The
247 procedures in the package can be used separately to tokenize or parse
248 various pieces of XML documents. The package supports XML Namespaces,
249 internal and external parsed entities, user-controlled handling of
250 whitespace, and validation. This module therefore is intended to be a
251 framework, a set of ``Lego blocks'' you can use to build a parser
252 following any discipline and performing validation to any degree. As an
253 example of the parser construction, this file includes a semi-validating
256 SSAX has a ``sequential'' feel of SAX yet a ``functional style'' of DOM.
257 Like a SAX parser, the framework scans the document only once and
258 permits incremental processing. An application that handles document
259 elements in order can run as efficiently as possible. @emph{Unlike} a
260 SAX parser, the framework does not require an application to register
261 stateful callbacks and surrender control to the parser. Rather, it is
262 the application that can drive the framework -- calling its functions to
263 get the current lexical or syntax element. These functions do not
264 maintain or mutate any state save the input port. Therefore, the
265 framework permits parsing of XML in a pure functional style, with the
266 input port being a monad (or a linear, read-once parameter).
268 Besides the @var{port}, there is another monad -- @var{seed}. Most of
269 the middle- and high-level parsers are single-threaded through the
270 @var{seed}. The functions of this framework do not process or affect
271 the @var{seed} in any way: they simply pass it around as an instance of
272 an opaque datatype. User functions, on the other hand, can use the seed
273 to maintain user's state, to accumulate parsing results, etc. A user
274 can freely mix their own functions with those of the framework. On the
275 other hand, the user may wish to instantiate a high-level parser:
276 @code{ssax:make-elem-parser} or @code{ssax:make-parser}. In the latter
277 case, the user must provide functions of specific signatures, which are
278 called at predictable moments during the parsing: to handle character
279 data, element data, or processing instructions (PI). The functions are
280 always given the @var{seed}, among other parameters, and must return the
283 From a functional point of view, XML parsing is a combined
284 pre-post-order traversal of a ``tree'' that is the XML document itself.
285 This down-and-up traversal tells the user about an element when its
286 start tag is encountered. The user is notified about the element once
287 more, after all element's children have been handled. The process of
288 XML parsing therefore is a fold over the raw XML document. Unlike a
289 fold over trees defined in [1], the parser is necessarily
290 single-threaded -- obviously as elements in a text XML document are laid
291 down sequentially. The parser therefore is a tree fold that has been
292 transformed to accept an accumulating parameter [1,2].
294 Formally, the denotational semantics of the parser can be expressed as
297 parser:: (Start-tag -> Seed -> Seed) ->
298 (Start-tag -> Seed -> Seed -> Seed) ->
299 (Char-Data -> Seed -> Seed) ->
300 XML-text-fragment -> Seed -> Seed
301 parser fdown fup fchar "<elem attrs> content </elem>" seed
302 = fup "<elem attrs>" seed
303 (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))
305 parser fdown fup fchar "char-data content" seed
306 = parser fdown fup fchar "content" (fchar "char-data" seed)
308 parser fdown fup fchar "elem-content content" seed
309 = parser fdown fup fchar "content" (
310 parser fdown fup fchar "elem-content" seed)
313 Compare the last two equations with the left fold
316 fold-left kons elem:list seed = fold-left kons list (kons elem seed)
319 The real parser created by @code{ssax:make-parser} is slightly more
320 complicated, to account for processing instructions, entity references,
321 namespaces, processing of document type declaration, etc.
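To make the fold concrete, here is a small sketch of a parser built with @code{ssax:make-parser} whose seed is simply the number of elements seen so far. It supplies only the @code{NEW-LEVEL-SEED}, @code{FINISH-ELEMENT} and @code{CHAR-DATA-HANDLER} handlers and assumes the remaining handlers keep their defaults; the name @code{count-elements} is hypothetical. The resulting parser is applied to a port and an initial seed.

@example
(use-modules (sxml ssax))

(define count-elements
  (ssax:make-parser
   NEW-LEVEL-SEED      ; entering an element: children start from the current count
   (lambda (elem-gi attributes namespaces expected-content seed)
     seed)
   FINISH-ELEMENT      ; leaving an element: the children's count plus this element
   (lambda (elem-gi attributes namespaces parent-seed seed)
     (+ seed 1))
   CHAR-DATA-HANDLER   ; character data leaves the count unchanged
   (lambda (string1 string2 seed)
     seed)))

(call-with-input-string "<flock><parrot/><parrot>Alfie</parrot></flock>"
  (lambda (port) (count-elements port 0)))
;; @result{} 3
@end example

Because @code{NEW-LEVEL-SEED} passes the running count down and @code{FINISH-ELEMENT} returns the value to the parent, the seed is threaded through the document exactly as in the equations above.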
323 The XML standard document referred to in this module is
324 @uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
326 The present file also defines a procedure that parses the text of an XML
327 document or of a separate element into SXML, an S-expression-based model
328 of an XML Information Set. SXML is also an Abstract Syntax Tree of an
329 XML document. SXML is similar but not identical to DOM; SXML is
330 particularly suitable for Scheme-based XML/HTML authoring, SXPath
331 queries, and tree transformations. See SXML.html for more details.
332 SXML is a term implementation of evaluation of the XML document [3].
333 The other implementation is context-passing.
335 The present framework fully supports the XML Namespaces Recommendation:
336 @uref{http://www.w3.org/TR/REC-xml-names/}.
342 Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
343 ICFP'98, 1998, pp. 273-279.
346 Richard S. Bird, "The promotion and accumulation strategies in
347 transformational programming," ACM Trans. Progr. Lang. Systems,
348 6(4):487-504, October 1984.
351 Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
352 Pearl, Proc. ICFP'00, 2000, pp. 186-197.
357 @deffn {Scheme Procedure} current-ssax-error-port
360 @deffn {Scheme Procedure} with-ssax-error-to-port port thunk
363 @deffn {Scheme Procedure} xml-token? x
365 Return @code{#t} if @var{x} is a pair; otherwise return @code{#f}.
372 @deffn {Scheme Syntax} xml-token-kind token
375 @deffn {Scheme Syntax} xml-token-head token
378 @deffn {Scheme Procedure} make-empty-attlist
381 @deffn {Scheme Procedure} attlist-add attlist name-value
384 @deffn {Scheme Procedure} attlist-null? x
385 Return @code{#t} if @var{x} is the empty list, else @code{#f}.
388 @deffn {Scheme Procedure} attlist-remove-top attlist
391 @deffn {Scheme Procedure} attlist->alist attlist
394 @deffn {Scheme Procedure} attlist-fold kons knil lis1
397 @deffn {Scheme Procedure} define-parsed-entity! entity str
398 Define a new parsed entity. @var{entity} should be a symbol.
400 Instances of &@var{entity}; in XML text will be replaced with the string
401 @var{str}, which will then be parsed.
404 @deffn {Scheme Procedure} reset-parsed-entity-definitions!
405 Restore the set of parsed entity definitions to its initial state.
408 @deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
411 @deffn {Scheme Procedure} ssax:skip-internal-dtd port
414 @deffn {Scheme Procedure} ssax:read-pi-body-as-string port
417 @deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
420 @deffn {Scheme Procedure} ssax:read-markup-token port
423 @deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
426 @deffn {Scheme Procedure} ssax:read-char-ref port
429 @deffn {Scheme Procedure} ssax:read-attributes port entities
432 @deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
435 @deffn {Scheme Procedure} ssax:read-external-id port
438 @deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
441 @deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
444 @deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
447 @deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
450 @deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
453 @node Transforming SXML
454 @subsection Transforming SXML
455 @subsubsection Overview
456 @heading SXML expression tree transformers
457 @subheading Pre-Post-order traversal of a tree and creation of a new tree
459 pre-post-order:: <tree> x <bindings> -> <new-tree>
465 <bindings> ::= (<binding> ...)
466 <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
467 (<trigger-symbol> *macro* . <handler>) |
468 (<trigger-symbol> <new-bindings> . <handler>) |
469 (<trigger-symbol> . <handler>)
470 <trigger-symbol> ::= XMLname | *text* | *default*
471 <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
474 The pre-post-order function visits the nodes and nodelists
475 pre-post-order (depth-first). For each @code{<Node>} of the form
476 @code{(@var{name} <Node> ...)}, it looks up an association with the
477 given @var{name} among its @var{<bindings>}. If that fails,
478 @code{pre-post-order} tries to locate a @code{*default*} binding. It's
479 an error if the latter attempt fails as well. Having found a binding,
480 the @code{pre-post-order} function first checks to see if the binding is
484 (<trigger-symbol> *preorder* . <handler>)
487 If it is, the handler is 'applied' to the current node. Otherwise, the
488 pre-post-order function first calls itself recursively for each child of
489 the current node, with @var{<new-bindings>} prepended to the
490 @var{<bindings>} in effect. The result of these calls is passed to the
491 @var{<handler>} (along with the head of the current @var{<Node>}). To be
492 more precise, the handler is @emph{applied} to the head of the current node
493 and its processed children. The result of the handler, which should also
494 be a @code{<tree>}, replaces the current @var{<Node>}. If the current
495 @var{<Node>} is a text string or other atom, a special binding with a
496 symbol @code{*text*} is looked up.
498 A binding can also be of the form
501 (<trigger-symbol> *macro* . <handler>)
504 This is equivalent to @code{*preorder*} described above. However, the
505 result is then processed again with the current stylesheet.
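A minimal sketch of ordinary (post-order) bindings, assuming @code{pre-post-order} is loaded from the @code{(sxml transform)} module: the bindings rewrite a hypothetical @code{doc}/@code{para}/@code{emph} vocabulary into HTML-like SXML. Each handler receives the tag followed by its already-transformed children, and the @code{*text*} handler passes strings through unchanged. The name @code{doc->html} is illustrative only.

@example
(use-modules (sxml transform))

(define doc->html
  (list
   ;; Plain handlers: called with the tag and the processed children.
   (cons 'doc  (lambda (tag . kids) (cons 'body kids)))
   (cons 'para (lambda (tag . kids) (cons 'p kids)))
   (cons 'emph (lambda (tag . kids) (cons 'em kids)))
   ;; Text nodes: called with *text* and the string itself.
   (cons '*text* (lambda (tag text) text))))

(pre-post-order '(doc (para "Hello, " (emph "world")) (para "Bye"))
                doc->html)
;; @result{} (body (p "Hello, " (em "world")) (p "Bye"))
@end example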
508 @deffn {Scheme Procedure} SRV:send-reply . fragments
509 Output the @var{fragments} to the current output port.
511 The fragments are a list of strings, characters, numbers, thunks,
512 @code{#f}, @code{#t} -- and other fragments. The function traverses the
513 tree depth-first, writes out strings and characters, executes thunks,
514 and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
515 anything was written at all; otherwise the result is @code{#f}. If
516 @code{#t} occurs among the fragments, it is not written out but causes
517 the result of @code{SRV:send-reply} to be @code{#t}.
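The traversal rules can be seen in a short sketch; @code{with-output-to-string} is used only to capture what is written, and the fragments themselves are arbitrary.

@example
(use-modules (sxml transform))

;; Capture everything SRV:send-reply writes to the current output port.
(define written
  (with-output-to-string
    (lambda ()
      (SRV:send-reply "Alfie" #\space
                      (list "is" #f '() (list " a"))  ; nested; #f and '() are skipped
                      " parrot"
                      (lambda () (display "!"))))))  ; thunks are called
written
;; @result{} "Alfie is a parrot!"

;; Nothing is written here, so the result is #f.
(SRV:send-reply #f '())
;; @result{} #f
@end example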
520 @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
523 @deffn {Scheme Procedure} post-order tree bindings
526 @deffn {Scheme Procedure} pre-post-order tree bindings
529 @deffn {Scheme Procedure} replace-range beg-pred end-pred forest
533 @subsection SXML Tree Fold
534 @subsubsection Overview
535 @code{(sxml fold)} defines a number of variants of the @dfn{fold}
536 algorithm for use in transforming SXML trees. Additionally it defines
537 the layout operator, @code{fold-layout}, which might be described as a
538 context-passing variant of SSAX's @code{pre-post-order}.
541 @deffn {Scheme Procedure} foldt fup fhere tree
542 The standard multithreaded tree fold.
544 @var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
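As a sketch, assuming @code{(sxml fold)} is loaded, @code{foldt} can count the text strings in a tree: @var{fhere} maps each leaf to 1 or 0, and @var{fup} sums the results computed for the parts of each element.

@example
(use-modules (sxml fold))

(foldt (lambda (results) (apply + results))      ; fup: [a] -> a
       (lambda (leaf) (if (string? leaf) 1 0))  ; fhere: object -> a
       '(p "Hello, " (b "world") "!"))
;; @result{} 3
@end example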
547 @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
548 The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
549 for more information.
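A single-threaded counterpart, sketched under the same assumption: the seed is a list of the strings seen so far, threaded through the tree in document order, so reversing and concatenating it recovers the text content. The helper name @code{tree-text} is hypothetical.

@example
(use-modules (sxml fold))

(define (tree-text tree)
  (string-concatenate
   (reverse
    (foldts (lambda (seed node) seed)               ; fdown: pass the seed to the kids
            (lambda (seed kid-seed node) kid-seed)  ; fup: keep what the kids added
            (lambda (seed leaf)                     ; fhere: collect strings only
              (if (string? leaf) (cons leaf seed) seed))
            '()
            tree))))

(tree-text '(p "Hello, " (b "world") "!"))
;; @result{} "Hello, world!"
@end example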
552 @deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
553 A variant of @code{foldts} that allows pre-order tree
554 rewrites. Originally defined in Andy Wingo's 2007 paper,
555 @emph{Applications of fold to XML transformation}.
558 @deffn {Scheme Procedure} fold-values proc list . seeds
559 A variant of @code{fold} that allows multi-valued seeds. Note that the
560 order of the arguments differs from that of @code{fold}. @xref{SRFI-1
564 @deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
565 A variant of @code{foldts*} that allows multi-valued
566 seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
567 of fold to XML transformation}.
570 @deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
571 A traversal combinator in the spirit of @code{pre-post-order}.
572 @xref{Transforming SXML}.
574 @code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
575 @emph{Applications of fold to XML transformation}.
578 bindings := (<binding>...)
579 binding := (<tag> <handler-pair>...)
580 | (*default* . <post-handler>)
581 | (*text* . <text-handler>)
583 handler-pair := (pre-layout . <pre-layout-handler>)
584 | (post . <post-handler>)
585 | (bindings . <bindings>)
586 | (pre . <pre-handler>)
587 | (macro . <macro-handler>)
591 @item pre-layout-handler
592 A function of three arguments:
596 the kids of the current node, before traversal
599 the params of the current node
602 the layout coming into this node
606 @var{pre-layout-handler} is expected to use this information to return a
607 layout to pass to the kids. The default implementation returns the
608 layout given in the arguments.
611 A function of five arguments:
615 the current tag being processed
618 the params of the current node
621 the layout coming into the current node, before any kids were processed
624 the layout after processing all of the children
627 the already-processed child nodes
631 @var{post-handler} should return two values, the layout to pass to the
632 next node and the final tree.
635 @var{text-handler} is a function of three arguments:
649 @var{text-handler} should return two values, the layout to pass to the
650 next node and the value to which the string should transform.
657 @subsubsection Overview
658 @heading SXPath: SXML Query Language
659 SXPath is a query language for SXML, an instance of XML Information set
660 (Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
661 definition of SXML and more details. SXPath is also a translation into
662 Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
663 XPath and SXPath describe means of selecting a set of Infoset's items or
666 To facilitate queries, XPath maps the XML Infoset into an explicit tree,
667 and introduces important notions of a location path and a current,
668 context node. A location path denotes a selection of a set of nodes
669 relative to a context node. Any XPath tree has a distinguished, root
670 node -- which serves as the context node for absolute location paths.
671 A location path is recursively defined as a location step joined with a
672 location path. A location step is a simple query of the database
673 relative to a context node. A step may include expressions that further
674 filter the selected set. Each node in the resulting set is used as a
675 context node for the adjoining location path. The result of the step is
676 a union of the sets returned by the latter location paths.
678 The SXML representation of the XML Infoset (see SSAX.scm) is rather
679 suitable for querying as it is. Bowing to the XPath specification, we
680 will refer to SXML information items as 'Nodes':
683 <Node> ::= <Element> | <attributes-coll> | <attrib>
684 | "text string" | <PI>
687 This production can also be described as
690 <Node> ::= (name . <Nodeset>) | "text string"
693 An (ordered) set of nodes is just a list of the constituent nodes:
696 <Nodeset> ::= (<Node> ...)
699 Nodesets and Nodes other than text strings are both lists. A <Nodeset>
700 however is either an empty list, or a list whose head is not a symbol. A
701 symbol at the head of a node is either an XML name (in which case it's a
702 tag of an XML element), or an administrative name such as '@@'. This
703 uniform list representation makes processing rather simple and elegant,
704 while avoiding confusion. The multi-branch tree structure formed by the
705 mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
706 processing by functional languages.
708 A location path is in fact a composite query over an XPath tree or its
709 branch. A single step is a combination of a projection, selection or a
710 transitive closure. Multiple steps are combined via join and union
711 operations. This insight allows us to @emph{elegantly} implement XPath
712 as a sequence of projection and filtering primitives -- converters --
713 joined by @dfn{combinators}. Each converter takes a node and returns a
714 nodeset which is the result of the corresponding query relative to that
715 node. A converter can also be called on a set of nodes. In that case it
716 returns a union of the corresponding queries over each node in the set.
717 The union is easily implemented as a list append operation as all nodes
718 in a SXML tree are considered distinct, by XPath conventions. We also
719 preserve the order of the members in the union. Query combinators are
720 higher-order functions: they take converter(s) (which is a Node|Nodeset ->
721 Nodeset function) and compose or otherwise combine them. We will be
722 concerned with only relative location paths [XPath]: an absolute
723 location path is a relative path applied to the root node.
725 Similarly to XPath, SXPath defines full and abbreviated notations for
726 location paths. In both cases, the abbreviated notation can be
727 mechanically expanded into the full form by simple rewriting rules. In
728 case of SXPath the corresponding rules are given as comments to a sxpath
729 function, below. The regression test suite at the end of this file shows
730 a representative sample of SXPaths in both notations, juxtaposed with
731 the corresponding XPath expressions. Most of the samples are borrowed
732 literally from the XPath specification, while the others are adjusted
733 for our running example, tree1.
736 @deffn {Scheme Procedure} nodeset? x
739 @deffn {Scheme Procedure} node-typeof? crit
742 @deffn {Scheme Procedure} node-eq? other
745 @deffn {Scheme Procedure} node-equal? other
748 @deffn {Scheme Procedure} node-pos n
751 @deffn {Scheme Procedure} filter pred?
753 A filter applicator, which introduces a filtering context. Returns a
754 converter that, given a nodeset, returns the members of that nodeset
755 (in their original order) for which @var{pred?} returns neither
756 @code{#f} nor the empty list. A single node is treated as a one-element
757 nodeset. For example, @code{((filter even?) '(0 7 8 8 43 -4))} returns
758 @code{(0 8 8 -4)}.
767 @deffn {Scheme Procedure} take-until pred?
770 @deffn {Scheme Procedure} take-after pred?
773 @deffn {Scheme Procedure} map-union proc lst
776 @deffn {Scheme Procedure} node-reverse node-or-nodeset
779 @deffn {Scheme Procedure} node-trace title
782 @deffn {Scheme Procedure} select-kids test-pred?
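A minimal sketch, assuming the SXPath procedures are loaded from @code{(sxml xpath)}: combining @code{select-kids} with @code{node-typeof?} yields a converter that picks out the children of a node with a given tag. The names @code{parrot} and @code{select-names} are illustrative only.

@example
(use-modules (sxml xpath))

(define parrot '(parrot (name "Alfie") (age "30") (name "Polly")))

;; A converter selecting the children of a node that are `name' elements.
(define select-names (select-kids (node-typeof? 'name)))

(select-names parrot)
;; @result{} ((name "Alfie") (name "Polly"))
@end example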
785 @deffn {Scheme Procedure} node-self pred?
787 Similar to @code{select-kids}, but applies @var{pred?} to the node
788 itself rather than to its children. The resulting nodeset contains
789 either the node itself, if it satisfies @var{pred?}, or nothing.
801 @deffn {Scheme Procedure} node-join . selectors
804 @deffn {Scheme Procedure} node-reduce . converters
807 @deffn {Scheme Procedure} node-or . converters
810 @deffn {Scheme Procedure} node-closure test-pred?
813 @deffn {Scheme Procedure} node-parent rootnode
816 @deffn {Scheme Procedure} sxpath path
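A short sketch of the abbreviated notation, assuming @code{(sxml xpath)}: a path given as a list of symbols selects, step by step, the children with each name, and @code{*text*} selects text nodes. The document bound to @code{flock} is made up for illustration.

@example
(use-modules (sxml xpath))

(define flock
  '(*TOP* (flock (parrot (name "Alfie")) (parrot (name "Polly")))))

;; Roughly the XPath flock/parrot/name/text()
((sxpath '(flock parrot name *text*)) flock)
;; @result{} ("Alfie" "Polly")
@end example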
819 @node sxml ssax input-parse
820 @subsection (sxml ssax input-parse)
821 @subsubsection Overview
824 The procedures in this module surprisingly often suffice to parse an
825 input stream. They either skip, or build and return tokens, according to
826 inclusion or delimiting semantics. The list of characters to expect,
827 include, or to break at may vary from one invocation of a function to
828 another. This allows the functions to easily parse even
829 context-sensitive languages.
831 EOF is generally frowned on, and thrown up upon if encountered.
832 Exceptions are mentioned specifically. The list of expected characters
833 (characters to skip until, or break-characters) may include an EOF
834 "character", which is to be coded as the symbol, @code{*eof*}.
836 The input stream to parse is specified as a @dfn{port}, which is usually
837 the last (and optional) argument. It defaults to the current input port
840 If the parser encounters an error, it will throw an exception to the key
841 @code{parser-error}. The arguments will be of the form @code{(@var{port}
842 @var{message} @var{specialising-msg}*)}.
844 The first argument is a port, which typically points to the offending
845 character or its neighborhood. You can then use @code{port-column} and
846 @code{port-line} to query the current position. @var{message} is the
847 description of the error. Other arguments supply more details about the
851 @deffn {Scheme Procedure} peek-next-char [port]
854 @deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
857 @deffn {Scheme Procedure} skip-until arg [port]
860 @deffn {Scheme Procedure} skip-while skip-chars [port]
863 @deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
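A minimal sketch of combining these procedures, assuming a hypothetical @samp{key = value} input format and the module @code{(sxml ssax input-parse)}. @code{next-token} reads the key, @code{skip-while} steps over the separator without consuming the following character, and a second @code{next-token} reads up to the end of input, which is permitted because @code{*eof*} is listed among the break characters. The name @code{parse-setting} is illustrative only.

@example
(use-modules (sxml ssax input-parse))

(define (parse-setting port)
  ;; Skip leading spaces, then read up to a space or `='.
  (let ((key (next-token '(#\space) '(#\space #\=) "reading a key" port)))
    ;; Skip the separator; the next character stays on the port.
    (skip-while '(#\space #\=) port)
    ;; Read everything up to the end of input.
    (cons key (next-token '() '(*eof*) "reading a value" port))))

(call-with-input-string "  color = grey" parse-setting)
;; @result{} ("color" . "grey")
@end example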
866 @deffn {Scheme Procedure} next-token-of incl-list/pred [port]
869 @deffn {Scheme Procedure} read-text-line [port]
872 @deffn {Scheme Procedure} read-string n [port]
875 @deffn {Scheme Procedure} find-string-from-port? str <input-port> [max-no-char]
876 Looks for @var{str} in @var{<input-port>}, optionally within the first
877 @var{max-no-char} characters.
880 @node sxml apply-templates
881 @subsection (sxml apply-templates)
882 @subsubsection Overview
883 Pre-order traversal of a tree and creation of a new tree:
886 apply-templates:: tree x <templates> -> <new-tree>
892 <templates> ::= (<template> ...)
893 <template> ::= (<node-test> <node-test> ... <node-test> . <handler>)
894 <node-test> ::= an argument to node-typeof? above
895 <handler> ::= <tree> -> <new-tree>
898 This procedure does a @emph{normal}, pre-order traversal of an SXML
899 tree. It walks the tree, checking at each node against the list of
902 If a match is found (which must be unique, i.e., unambiguous), the
903 corresponding handler is invoked and given the current node as an
904 argument. The result from the handler, which must be a @code{<tree>},
905 takes the place of the current node in the resulting tree. The name of the
906 function is not accidental: it resembles rather closely an
907 @code{apply-templates} function of XSLT.
910 @deffn {Scheme Procedure} apply-templates tree templates