Merge commit '58147d67806e1f54c447d7eabac35b1a5086c3a6'
[bpt/guile.git] / doc / ref / sxml.texi
CommitLineData
58c4a39d
AW
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
3@c Copyright (C) 2013 Free Software Foundation, Inc.
4@c See the file guile.texi for copying conditions.
5
6@node SXML
7@section SXML
8
3e31e75a
AW
9SXML is a native representation of XML in terms of standard Scheme data
10types: lists, symbols, and strings. For example, the simple XML
11fragment:
58c4a39d 12
3e31e75a
AW
13@example
14<parrot type="African Grey"><name>Alfie</name></parrot>
58c4a39d
AW
15@end example
16
3e31e75a 17may be represented with the following SXML:
58c4a39d 18
3e31e75a
AW
19@example
20(parrot (@@ (type "African Grey)) (name "Alfie"))
21@end example
58c4a39d 22
3e31e75a
AW
23SXML is very general, and is capable of representing all of XML.
24Formally, this means that SXML is a conforming implementation of the
25@uref{XML Information Set,http://www.w3.org/TR/xml-infoset/} standard.
58c4a39d 26
3e31e75a
AW
27Guile includes several facilities for working with XML and SXML:
28parsers, serializers, and transformers.
58c4a39d 29
3e31e75a
AW
30@menu
31* SXML Overview:: XML, as it was meant to be
32* Reading and Writing XML:: Convenient XML parsing and serializing
33* SSAX:: Custom functional-style XML parsers
34* Transforming SXML:: Munging SXML with @code{pre-post-order}
35* SXML Tree Fold:: Fold-based SXML transformations
36* SXPath:: XPath for SXML
37* sxml apply-templates:: A more XSLT-like approach to SXML transformations
38* sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
39@end menu
58c4a39d 40
3e31e75a
AW
41@node SXML Overview
42@subsection SXML Overview
58c4a39d 43
3e31e75a 44(This section needs to be written; volunteers welcome.)
58c4a39d 45
58c4a39d 46
3e31e75a
AW
47@node Reading and Writing XML
48@subsection Reading and Writing XML
58c4a39d 49
3e31e75a
AW
50The @code{(sxml simple)} module presents a basic interface for parsing
51XML from a port into the Scheme SXML format, and for serializing it back
52to text.
58c4a39d 53
3e31e75a
AW
54@example
55(use-modules (sxml simple))
56@end example
58c4a39d 57
a14b6e18 58@deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
1488753a 59 [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
e10c2509
AW
60 [#:entities='()] [#:default-entity-handler=#f] @
61 [#:doctype-handler=#f]
58c4a39d 62Use SSAX to parse an XML document into SXML. Takes one optional
a14b6e18
AW
63argument, @var{string-or-port}, which defaults to the current input
64port. Returns the resulting SXML document. If @var{string-or-port} is
65a port, it will be left pointing at the next available character in the
66port.
3e31e75a 67@end deffn
58c4a39d 68
1488753a
AW
69As is normal in SXML, XML elements parse as tagged lists. Attributes,
70if any, are placed after the tag, within an @code{@@} element. The root
71of the resulting XML will be contained in a special tag, @code{*TOP*}.
72This tag will contain the root element of the XML, but also any prior
73processing instructions.
74
75@example
a14b6e18 76(xml->sxml "<foo/>")
1488753a 77@result{} (*TOP* (foo))
a14b6e18 78(xml->sxml "<foo>text</foo>")
1488753a 79@result{} (*TOP* (foo "text"))
a14b6e18 80(xml->sxml "<foo kind=\"bar\">text</foo>")
1488753a 81@result{} (*TOP* (foo (@@ (kind "bar")) "text"))
a14b6e18 82(xml->sxml "<?xml version=\"1.0\"?><foo/>")
1488753a
AW
83@result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
84@end example
85
86All namespaces in the XML document must be declared, via @code{xmlns}
87attributes. SXML elements built from non-default namespaces will have
88their tags prefixed with their URI. Users can specify custom prefixes
89for certain namespaces with the @code{#:namespaces} keyword argument to
90@code{xml->sxml}.
91
92@example
a14b6e18 93(xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
1488753a 94@result{} (*TOP* (http://example.org/ns1:foo "text"))
a14b6e18
AW
95(xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
96 #:namespaces '((ns1 . "http://example.org/ns1")))
1488753a 97@result{} (*TOP* (ns1:foo "text"))
a14b6e18
AW
98(xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
99 #:namespaces '((ns2 . "http://example.org/ns2")))
1488753a
AW
100@result{} (*TOP* (foo (ns2:baz)))
101@end example
102
e10c2509
AW
103By default, namespaces passed to @code{xml->sxml} are treated as if they
104were declared on the root element. Passing a false
105@code{#:declare-namespaces?} argument will disable this behavior,
106requiring in-document declarations of namespaces before use..
1488753a
AW
107
108@example
a14b6e18 109(xml->sxml "<foo><ns2:baz/></foo>"
1488753a 110 #:namespaces '((ns2 . "http://example.org/ns2")))
e10c2509 111@result{} (*TOP* (foo (ns2:baz)))
a14b6e18 112(xml->sxml "<foo><ns2:baz/></foo>"
1488753a 113 #:namespaces '((ns2 . "http://example.org/ns2"))
e10c2509
AW
114 #:declare-namespaces? #f)
115@result{} error: undeclared namespace: `bar'
1488753a
AW
116@end example
117
118By default, all whitespace in XML is significant. Passing the
119@code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
120whitespace in front, behind and between elements, treating it as
121``unsignificant''. Whitespace in text fragments is left alone.
122
123@example
a14b6e18 124(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
e10c2509 125@result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
a14b6e18 126(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
1488753a 127 #:trim-whitespace? #t)
e10c2509 128@result{} (*TOP* (foo (bar " Alfie the parrot! ")))
1488753a
AW
129@end example
130
131Parsed entities may be declared with the @code{#:entities} keyword
132argument, or handled with the @code{#:default-entity-handler}. By
133default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
134@code{&apos;} and @code{&quot;} entities are defined, as well as the
135@code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
136numeric character entities.
137
138@example
a14b6e18 139(xml->sxml "<foo>&amp;</foo>")
1488753a 140@result{} (*TOP* (foo "&"))
a14b6e18 141(xml->sxml "<foo>&nbsp;</foo>")
1488753a 142@result{} error: undefined entity: nbsp
a14b6e18 143(xml->sxml "<foo>&#xA0;</foo>")
1488753a 144@result{} (*TOP* (foo "\xa0"))
a14b6e18 145(xml->sxml "<foo>&nbsp;</foo>"
1488753a
AW
146 #:entities '((nbsp . "\xa0")))
147@result{} (*TOP* (foo "\xa0"))
a14b6e18 148(xml->sxml "<foo>&nbsp; &foo;</foo>"
1488753a
AW
149 #:default-entity-handler
150 (lambda (port name)
151 (case name
152 ((nbsp) "\xa0")
153 (else
154 (format (current-warning-port)
155 "~a:~a:~a: undefined entitity: ~a\n"
156 (or (port-filename port) "<unknown file>")
157 (port-line port) (port-column port)
158 name)
159 (symbol->string name)))))
160@print{} <unknown file>:0:17: undefined entitity: foo
161@result{} (*TOP* (foo "\xa0 foo"))
162@end example
163
e10c2509
AW
164By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
165declaration, if any. This behavior can be overridden with the
166@code{#:doctype-handler} argument, which should be a procedure of three
167arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
168the internal doctype subset (as a string or @code{#f} if not present).
169
170The handler should return keyword arguments as multiple values, as if it
171were calling its continuation with keyword arguments. The continuation
172accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
173in the same format that @code{xml->sxml} itself takes. These entities
174and namespaces will be prepended to those given to the @code{xml->sxml}
175invocation.
176
177@example
178(define (handle-foo docname systemid internal-subset)
179 (case docname
180 ((foo)
181 (values #:entities '((greets . "<i>Hello, world!</i>"))))
182 (else
183 (values))))
184
185(xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
186 #:doctype-handler handle-foo)
187@result{} (*TOP* (p (i "Hello, world!")))
188@end example
189
190If the document has no doctype declaration, the @var{doctype-handler} is
191invoked with @code{#f} for the three arguments.
192
193In the future, the continuation may accept other keyword arguments, for
194example to validate the parsed SXML against the doctype.
195
3e31e75a
AW
196@deffn {Scheme Procedure} sxml->xml tree [port]
197Serialize the SXML tree @var{tree} as XML. The output will be written to
58c4a39d
AW
198the current output port, unless the optional argument @var{port} is
199present.
3e31e75a 200@end deffn
58c4a39d 201
3e31e75a 202@deffn {Scheme Procedure} sxml->string sxml
58c4a39d
AW
203Detag an sxml tree @var{sxml} into a string. Does not perform any
204formatting.
3e31e75a
AW
205@end deffn
206
207@node SSAX
208@subsection SSAX: A Functional XML Parsing Toolkit
209
210Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
211toolkit, SSAX.
212
213@subsubsection History
214
215Back in the 1990s, when the world was young again and XML was the
216solution to all of its problems, there were basically two kinds of XML
217parsers out there: DOM parsers and SAX parsers.
218
219A DOM parser reads through an entire XML document, building up a tree of
220``DOM objects'' representing the document structure. They are very easy
221to use, but sometimes you don't actually want all of the information in
222a document; building an object tree is not necessary if all you want to
223do is to count word frequencies in a document, for example.
224
225SAX parsers were created to give the programmer more control on the
226parsing process. A programmer gives the SAX parser a number of
227``callbacks'': functions that will be called on various features of the
228XML stream as they are encountered. SAX parsers are more efficient, but
229much harder to user, as users typically have to manually maintain a
230stack of open elements.
231
232Kiselyov realized that the SAX programming model could be made much
233simpler if the callbacks were formulated not as a linear fold across the
234features of the XML stream, but as a @emph{tree fold} over the structure
235implicit in the XML. In this way, the user has a very convenient,
236functional-style interface that can still generate optimal parsers.
237
238The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
239DOM-style parser built using SSAX, though it returns SXML instead of DOM
240objects.
241
242@subsubsection Implementation
243
244@code{(sxml ssax)} is a package of low-to-high level lexing and parsing
245procedures that can be combined to yield a SAX, a DOM, a validating
246parser, or a parser intended for a particular document type. The
247procedures in the package can be used separately to tokenize or parse
248various pieces of XML documents. The package supports XML Namespaces,
249internal and external parsed entities, user-controlled handling of
250whitespace, and validation. This module therefore is intended to be a
251framework, a set of ``Lego blocks'' you can use to build a parser
252following any discipline and performing validation to any degree. As an
253example of the parser construction, this file includes a semi-validating
254SXML parser.
255
256SSAX has a ``sequential'' feel of SAX yet a ``functional style'' of DOM.
257Like a SAX parser, the framework scans the document only once and
258permits incremental processing. An application that handles document
259elements in order can run as efficiently as possible. @emph{Unlike} a
260SAX parser, the framework does not require an application register
261stateful callbacks and surrender control to the parser. Rather, it is
262the application that can drive the framework -- calling its functions to
263get the current lexical or syntax element. These functions do not
264maintain or mutate any state save the input port. Therefore, the
265framework permits parsing of XML in a pure functional style, with the
266input port being a monad (or a linear, read-once parameter).
267
268Besides the @var{port}, there is another monad -- @var{seed}. Most of
58c4a39d 269the middle- and high-level parsers are single-threaded through the
3e31e75a
AW
270@var{seed}. The functions of this framework do not process or affect
271the @var{seed} in any way: they simply pass it around as an instance of
272an opaque datatype. User functions, on the other hand, can use the seed
273to maintain user's state, to accumulate parsing results, etc. A user
274can freely mix his own functions with those of the framework. On the
275other hand, the user may wish to instantiate a high-level parser:
276@code{SSAX:make-elem-parser} or @code{SSAX:make-parser}. In the latter
58c4a39d
AW
277case, the user must provide functions of specific signatures, which are
278called at predictable moments during the parsing: to handle character
3e31e75a 279data, element data, or processing instructions (PI). The functions are
58c4a39d
AW
280always given the @var{seed}, among other parameters, and must return the
281new @var{seed}.
282
283From a functional point of view, XML parsing is a combined
3e31e75a 284pre-post-order traversal of a ``tree'' that is the XML document itself.
58c4a39d 285This down-and-up traversal tells the user about an element when its
3e31e75a
AW
286start tag is encountered. The user is notified about the element once
287more, after all element's children have been handled. The process of
288XML parsing therefore is a fold over the raw XML document. Unlike a
289fold over trees defined in [1], the parser is necessarily
290single-threaded -- obviously as elements in a text XML document are laid
291down sequentially. The parser therefore is a tree fold that has been
292transformed to accept an accumulating parameter [1,2].
58c4a39d
AW
293
294Formally, the denotational semantics of the parser can be expressed as
295
296@smallexample
297 parser:: (Start-tag -> Seed -> Seed) ->
298 (Start-tag -> Seed -> Seed -> Seed) ->
299 (Char-Data -> Seed -> Seed) ->
300 XML-text-fragment -> Seed -> Seed
301 parser fdown fup fchar "<elem attrs> content </elem>" seed
302 = fup "<elem attrs>" seed
303 (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))
304
305 parser fdown fup fchar "char-data content" seed
306 = parser fdown fup fchar "content" (fchar "char-data" seed)
307
308 parser fdown fup fchar "elem-content content" seed
309 = parser fdown fup fchar "content" (
310 parser fdown fup fchar "elem-content" seed)
311@end smallexample
312
313Compare the last two equations with the left fold
314
315@smallexample
316 fold-left kons elem:list seed = fold-left kons list (kons elem seed)
317@end smallexample
318
319The real parser created by @code{SSAX:make-parser} is slightly more
320complicated, to account for processing instructions, entity references,
321namespaces, processing of document type declaration, etc.
322
3e31e75a
AW
323The XML standard document referred to in this module is
324@uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
58c4a39d
AW
325
326The present file also defines a procedure that parses the text of an XML
327document or of a separate element into SXML, an S-expression-based model
3e31e75a
AW
328of an XML Information Set. SXML is also an Abstract Syntax Tree of an
329XML document. SXML is similar but not identical to DOM; SXML is
58c4a39d 330particularly suitable for Scheme-based XML/HTML authoring, SXPath
3e31e75a
AW
331queries, and tree transformations. See SXML.html for more details.
332SXML is a term implementation of evaluation of the XML document [3].
333The other implementation is context-passing.
334
335The present frameworks fully supports the XML Namespaces Recommendation:
336@uref{http://www.w3.org/TR/REC-xml-names/}.
58c4a39d 337
3e31e75a 338Other links:
58c4a39d
AW
339
340@table @asis
341@item [1]
342Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
343ICFP'98, 1998, pp. 273-279.
344
345@item [2]
346Richard S. Bird, The promotion and accumulation strategies in
347transformational programming, ACM Trans. Progr. Lang. Systems,
3486(4):487-504, October 1984.
349
350@item [3]
351Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
352Pearl. Proc ICFP'00, pp. 186-197.
353
354@end table
355
356@subsubsection Usage
3e31e75a
AW
357@deffn {Scheme Procedure} current-ssax-error-port
358@end deffn
58c4a39d 359
3e31e75a
AW
360@deffn {Scheme Procedure} with-ssax-error-to-port port thunk
361@end deffn
58c4a39d 362
3e31e75a 363@deffn {Scheme Procedure} xml-token? _
58c4a39d
AW
364@verbatim
365 -- Scheme Procedure: pair? x
366 Return `#t' if X is a pair; otherwise return `#f'.
367
368
369@end verbatim
3e31e75a 370@end deffn
58c4a39d 371
3e31e75a
AW
372@deffn {Scheme Syntax} xml-token-kind token
373@end deffn
58c4a39d 374
3e31e75a
AW
375@deffn {Scheme Syntax} xml-token-head token
376@end deffn
58c4a39d 377
3e31e75a
AW
378@deffn {Scheme Procedure} make-empty-attlist
379@end deffn
58c4a39d 380
3e31e75a
AW
381@deffn {Scheme Procedure} attlist-add attlist name-value
382@end deffn
58c4a39d 383
a4b4fbbd
JE
384@deffn {Scheme Procedure} attlist-null? x
385Return @code{#t} if @var{x} is the empty list, else @code{#f}.
3e31e75a 386@end deffn
58c4a39d 387
3e31e75a
AW
388@deffn {Scheme Procedure} attlist-remove-top attlist
389@end deffn
58c4a39d 390
3e31e75a
AW
391@deffn {Scheme Procedure} attlist->alist attlist
392@end deffn
58c4a39d 393
3e31e75a
AW
394@deffn {Scheme Procedure} attlist-fold kons knil lis1
395@end deffn
58c4a39d 396
3e31e75a
AW
397@deffn {Scheme Procedure} define-parsed-entity! entity str
398Define a new parsed entity. @var{entity} should be a symbol.
58c4a39d
AW
399
400Instances of &@var{entity}; in XML text will be replaced with the string
401@var{str}, which will then be parsed.
3e31e75a 402@end deffn
58c4a39d 403
3e31e75a 404@deffn {Scheme Procedure} reset-parsed-entity-definitions!
58c4a39d 405Restore the set of parsed entity definitions to its initial state.
3e31e75a 406@end deffn
58c4a39d 407
3e31e75a
AW
408@deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
409@end deffn
58c4a39d 410
3e31e75a
AW
411@deffn {Scheme Procedure} ssax:skip-internal-dtd port
412@end deffn
58c4a39d 413
3e31e75a
AW
414@deffn {Scheme Procedure} ssax:read-pi-body-as-string port
415@end deffn
58c4a39d 416
3e31e75a
AW
417@deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
418@end deffn
58c4a39d 419
3e31e75a
AW
420@deffn {Scheme Procedure} ssax:read-markup-token port
421@end deffn
58c4a39d 422
3e31e75a
AW
423@deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
424@end deffn
58c4a39d 425
3e31e75a
AW
426@deffn {Scheme Procedure} ssax:read-char-ref port
427@end deffn
58c4a39d 428
3e31e75a
AW
429@deffn {Scheme Procedure} ssax:read-attributes port entities
430@end deffn
58c4a39d 431
3e31e75a
AW
432@deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
433@end deffn
58c4a39d 434
3e31e75a
AW
435@deffn {Scheme Procedure} ssax:read-external-id port
436@end deffn
58c4a39d 437
3e31e75a
AW
438@deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
439@end deffn
58c4a39d 440
3e31e75a
AW
441@deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
442@end deffn
58c4a39d 443
3e31e75a
AW
444@deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
445@end deffn
58c4a39d 446
3e31e75a
AW
447@deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
448@end deffn
58c4a39d 449
3e31e75a
AW
450@deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
451@end deffn
58c4a39d 452
3e31e75a
AW
453@node Transforming SXML
454@subsection Transforming SXML
58c4a39d
AW
455@subsubsection Overview
456@heading SXML expression tree transformers
457@subheading Pre-Post-order traversal of a tree and creation of a new tree
458@smallexample
459pre-post-order:: <tree> x <bindings> -> <new-tree>
460@end smallexample
461
462where
463
464@smallexample
465 <bindings> ::= (<binding> ...)
466 <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
467 (<trigger-symbol> *macro* . <handler>) |
468 (<trigger-symbol> <new-bindings> . <handler>) |
469 (<trigger-symbol> . <handler>)
470 <trigger-symbol> ::= XMLname | *text* | *default*
471 <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
472@end smallexample
473
474The pre-post-order function visits the nodes and nodelists
3e31e75a 475pre-post-order (depth-first). For each @code{<Node>} of the form
58c4a39d 476@code{(@var{name} <Node> ...)}, it looks up an association with the
3e31e75a
AW
477given @var{name} among its @var{<bindings>}. If failed,
478@code{pre-post-order} tries to locate a @code{*default*} binding. It's
479an error if the latter attempt fails as well. Having found a binding,
58c4a39d
AW
480the @code{pre-post-order} function first checks to see if the binding is
481of the form
482
483@smallexample
484 (<trigger-symbol> *preorder* . <handler>)
485@end smallexample
486
3e31e75a 487If it is, the handler is 'applied' to the current node. Otherwise, the
58c4a39d
AW
488pre-post-order function first calls itself recursively for each child of
489the current node, with @var{<new-bindings>} prepended to the
3e31e75a
AW
490@var{<bindings>} in effect. The result of these calls is passed to the
491@var{<handler>} (along with the head of the current @var{<Node>}). To be
58c4a39d 492more precise, the handler is _applied_ to the head of the current node
3e31e75a
AW
493and its processed children. The result of the handler, which should also
494be a @code{<tree>}, replaces the current @var{<Node>}. If the current
58c4a39d
AW
495@var{<Node>} is a text string or other atom, a special binding with a
496symbol @code{*text*} is looked up.
497
498A binding can also be of a form
499
500@smallexample
501 (<trigger-symbol> *macro* . <handler>)
502@end smallexample
503
3e31e75a 504This is equivalent to @code{*preorder*} described above. However, the
58c4a39d
AW
505result is re-processed again, with the current stylesheet.
506
507@subsubsection Usage
3e31e75a 508@deffn {Scheme Procedure} SRV:send-reply . fragments
58c4a39d
AW
509Output the @var{fragments} to the current output port.
510
511The fragments are a list of strings, characters, numbers, thunks,
3e31e75a 512@code{#f}, @code{#t} -- and other fragments. The function traverses the
58c4a39d 513tree depth-first, writes out strings and characters, executes thunks,
3e31e75a 514and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
58c4a39d
AW
515anything was written at all; otherwise the result is @code{#f} If
516@code{#t} occurs among the fragments, it is not written out but causes
517the result of @code{SRV:send-reply} to be @code{#t}.
3e31e75a
AW
518@end deffn
519
520@deffn {Scheme Procedure} foldts fdown fup fhere seed tree
521@end deffn
522
523@deffn {Scheme Procedure} post-order tree bindings
524@end deffn
525
526@deffn {Scheme Procedure} pre-post-order tree bindings
527@end deffn
528
529@deffn {Scheme Procedure} replace-range beg-pred end-pred forest
530@end deffn
531
532@node SXML Tree Fold
533@subsection SXML Tree Fold
534@subsubsection Overview
535@code{(sxml fold)} defines a number of variants of the @dfn{fold}
536algorithm for use in transforming SXML trees. Additionally it defines
537the layout operator, @code{fold-layout}, which might be described as a
538context-passing variant of SSAX's @code{pre-post-order}.
539
540@subsubsection Usage
541@deffn {Scheme Procedure} foldt fup fhere tree
542The standard multithreaded tree fold.
543
544@var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
545@end deffn
546
547@deffn {Scheme Procedure} foldts fdown fup fhere seed tree
548The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
549for more information.
550@end deffn
551
552@deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
553A variant of @code{foldts} that allows pre-order tree
554rewrites. Originally defined in Andy Wingo's 2007 paper,
555@emph{Applications of fold to XML transformation}.
556@end deffn
557
558@deffn {Scheme Procedure} fold-values proc list . seeds
559A variant of @code{fold} that allows multi-valued seeds. Note that the
560order of the arguments differs from that of @code{fold}. @xref{SRFI-1
561Fold and Map}.
562@end deffn
563
564@deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
565A variant of @code{foldts*} that allows multi-valued
566seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
567of fold to XML transformation}.
568@end deffn
569
570@deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
571A traversal combinator in the spirit of @code{pre-post-order}.
572@xref{Transforming SXML}.
573
574@code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
575@emph{Applications of fold to XML transformation}.
576
577@example
578bindings := (<binding>...)
579binding := (<tag> <bandler-pair>...)
580 | (*default* . <post-handler>)
581 | (*text* . <text-handler>)
582tag := <symbol>
583handler-pair := (pre-layout . <pre-layout-handler>)
584 | (post . <post-handler>)
585 | (bindings . <bindings>)
586 | (pre . <pre-handler>)
587 | (macro . <macro-handler>)
588@end example
589
590@table @var
591@item pre-layout-handler
592A function of three arguments:
593
594@table @var
595@item kids
596the kids of the current node, before traversal
597
598@item params
599the params of the current node
600
601@item layout
602the layout coming into this node
603
604@end table
605
606@var{pre-layout-handler} is expected to use this information to return a
607layout to pass to the kids. The default implementation returns the
608layout given in the arguments.
609
610@item post-handler
611A function of five arguments:
612
613@table @var
614@item tag
615the current tag being processed
616
617@item params
618the params of the current node
619
620@item layout
621the layout coming into the current node, before any kids were processed
622
623@item klayout
624the layout after processing all of the children
625
626@item kids
627the already-processed child nodes
628
629@end table
58c4a39d 630
3e31e75a
AW
631@var{post-handler} should return two values, the layout to pass to the
632next node and the final tree.
58c4a39d 633
3e31e75a
AW
634@item text-handler
635@var{text-handler} is a function of three arguments:
58c4a39d 636
3e31e75a
AW
637@table @var
638@item text
639the string
58c4a39d 640
3e31e75a
AW
641@item params
642the current params
58c4a39d 643
3e31e75a
AW
644@item layout
645the current layout
646
647@end table
648
649@var{text-handler} should return two values, the layout to pass to the
650next node and the value to which the string should transform.
651
652@end table
653@end deffn
58c4a39d 654
3e31e75a
AW
655@node SXPath
656@subsection SXPath
58c4a39d
AW
657@subsubsection Overview
658@heading SXPath: SXML Query Language
659SXPath is a query language for SXML, an instance of XML Information set
3e31e75a
AW
660(Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
661definition of SXML and more details. SXPath is also a translation into
58c4a39d
AW
662Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
663XPath and SXPath describe means of selecting a set of Infoset's items or
664their properties.
665
666To facilitate queries, XPath maps the XML Infoset into an explicit tree,
667and introduces important notions of a location path and a current,
3e31e75a
AW
668context node. A location path denotes a selection of a set of nodes
669relative to a context node. Any XPath tree has a distinguished, root
58c4a39d
AW
670node -- which serves as the context node for absolute location paths.
671Location path is recursively defined as a location step joined with a
3e31e75a
AW
672location path. A location step is a simple query of the database
673relative to a context node. A step may include expressions that further
674filter the selected set. Each node in the resulting set is used as a
675context node for the adjoining location path. The result of the step is
58c4a39d
AW
676a union of the sets returned by the latter location paths.
677
678The SXML representation of the XML Infoset (see SSAX.scm) is rather
3e31e75a 679suitable for querying as it is. Bowing to the XPath specification, we
58c4a39d
AW
680will refer to SXML information items as 'Nodes':
681
682@example
683 <Node> ::= <Element> | <attributes-coll> | <attrib>
684 | "text string" | <PI>
685@end example
686
687This production can also be described as
688
689@example
690 <Node> ::= (name . <Nodeset>) | "text string"
691@end example
692
693An (ordered) set of nodes is just a list of the constituent nodes:
694
695@example
696 <Nodeset> ::= (<Node> ...)
697@end example
698
3e31e75a
AW
699Nodesets, and Nodes other than text strings are both lists. A <Nodeset>
700however is either an empty list, or a list whose head is not a symbol. A
58c4a39d 701symbol at the head of a node is either an XML name (in which case it's a
3e31e75a 702tag of an XML element), or an administrative name such as '@@'. This
58c4a39d 703uniform list representation makes processing rather simple and elegant,
3e31e75a 704while avoiding confusion. The multi-branch tree structure formed by the
58c4a39d
AW
705mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
706processing by functional languages.
707
708A location path is in fact a composite query over an XPath tree or its
3e31e75a
AW
709branch. A singe step is a combination of a projection, selection or a
710transitive closure. Multiple steps are combined via join and union
711operations. This insight allows us to @emph{elegantly} implement XPath
58c4a39d 712as a sequence of projection and filtering primitives -- converters --
3e31e75a 713joined by @dfn{combinators}. Each converter takes a node and returns a
58c4a39d 714nodeset which is the result of the corresponding query relative to that
3e31e75a 715node. A converter can also be called on a set of nodes. In that case it
58c4a39d
AW
716returns a union of the corresponding queries over each node in the set.
717The union is easily implemented as a list append operation as all nodes
3e31e75a
AW
718in a SXML tree are considered distinct, by XPath conventions. We also
719preserve the order of the members in the union. Query combinators are
58c4a39d 720high-order functions: they take converter(s) (which is a Node|Nodeset ->
3e31e75a 721Nodeset function) and compose or otherwise combine them. We will be
58c4a39d
AW
722concerned with only relative location paths [XPath]: an absolute
723location path is a relative path applied to the root node.
724
725Similarly to XPath, SXPath defines full and abbreviated notations for
3e31e75a
AW
726location paths. In both cases, the abbreviated notation can be
727mechanically expanded into the full form by simple rewriting rules. In
58c4a39d 728case of SXPath the corresponding rules are given as comments to a sxpath
3e31e75a 729function, below. The regression test suite at the end of this file shows
58c4a39d 730a representative sample of SXPaths in both notations, juxtaposed with
3e31e75a 731the corresponding XPath expressions. Most of the samples are borrowed
58c4a39d
AW
732literally from the XPath specification, while the others are adjusted
733for our running example, tree1.
734
735@subsubsection Usage
3e31e75a
AW
736@deffn {Scheme Procedure} nodeset? x
737@end deffn
58c4a39d 738
3e31e75a
AW
739@deffn {Scheme Procedure} node-typeof? crit
740@end deffn
58c4a39d 741
3e31e75a
AW
742@deffn {Scheme Procedure} node-eq? other
743@end deffn
58c4a39d 744
3e31e75a
AW
745@deffn {Scheme Procedure} node-equal? other
746@end deffn
58c4a39d 747
3e31e75a
AW
748@deffn {Scheme Procedure} node-pos n
749@end deffn
58c4a39d 750
3e31e75a 751@deffn {Scheme Procedure} filter pred?
58c4a39d
AW
752@verbatim
753 -- Scheme Procedure: filter pred list
754 Return all the elements of 2nd arg LIST that satisfy predicate
755 PRED. The list is not disordered - elements that appear in the
756 result list occur in the same order as they occur in the argument
3e31e75a
AW
757 list. The returned list may share a common tail with the argument
758 list. The dynamic order in which the various applications of pred
58c4a39d
AW
759 are made is not specified.
760
761 (filter even? '(0 7 8 8 43 -4)) => (0 8 8 -4)
762
763
764@end verbatim
3e31e75a 765@end deffn
58c4a39d 766
3e31e75a
AW
767@deffn {Scheme Procedure} take-until pred?
768@end deffn
58c4a39d 769
3e31e75a
AW
770@deffn {Scheme Procedure} take-after pred?
771@end deffn
58c4a39d 772
3e31e75a
AW
773@deffn {Scheme Procedure} map-union proc lst
774@end deffn
58c4a39d 775
3e31e75a
AW
776@deffn {Scheme Procedure} node-reverse node-or-nodeset
777@end deffn
58c4a39d 778
3e31e75a
AW
779@deffn {Scheme Procedure} node-trace title
780@end deffn
58c4a39d 781
3e31e75a
AW
782@deffn {Scheme Procedure} select-kids test-pred?
783@end deffn
58c4a39d 784
3e31e75a 785@deffn {Scheme Procedure} node-self pred?
58c4a39d
AW
786@verbatim
787 -- Scheme Procedure: filter pred list
788 Return all the elements of 2nd arg LIST that satisfy predicate
789 PRED. The list is not disordered - elements that appear in the
790 result list occur in the same order as they occur in the argument
3e31e75a
AW
791 list. The returned list may share a common tail with the argument
792 list. The dynamic order in which the various applications of pred
58c4a39d
AW
793 are made is not specified.
794
795 (filter even? '(0 7 8 8 43 -4)) => (0 8 8 -4)
796
797
798@end verbatim
3e31e75a
AW
799@end deffn
800
801@deffn {Scheme Procedure} node-join . selectors
802@end deffn
803
804@deffn {Scheme Procedure} node-reduce . converters
805@end deffn
806
807@deffn {Scheme Procedure} node-or . converters
808@end deffn
809
810@deffn {Scheme Procedure} node-closure test-pred?
811@end deffn
812
813@deffn {Scheme Procedure} node-parent rootnode
814@end deffn
815
816@deffn {Scheme Procedure} sxpath path
817@end deffn
818
819@node sxml ssax input-parse
820@subsection (sxml ssax input-parse)
821@subsubsection Overview
822A simple lexer.
823
824The procedures in this module surprisingly often suffice to parse an
825input stream. They either skip, or build and return tokens, according to
826inclusion or delimiting semantics. The list of characters to expect,
827include, or to break at may vary from one invocation of a function to
828another. This allows the functions to easily parse even
829context-sensitive languages.
830
831EOF is generally frowned on, and thrown up upon if encountered.
832Exceptions are mentioned specifically. The list of expected characters
833(characters to skip until, or break-characters) may include an EOF
834"character", which is to be coded as the symbol, @code{*eof*}.
835
836The input stream to parse is specified as a @dfn{port}, which is usually
837the last (and optional) argument. It defaults to the current input port
838if omitted.
839
840If the parser encounters an error, it will throw an exception to the key
841@code{parser-error}. The arguments will be of the form @code{(@var{port}
842@var{message} @var{specialising-msg}*)}.
843
844The first argument is a port, which typically points to the offending
845character or its neighborhood. You can then use @code{port-column} and
846@code{port-line} to query the current position. @var{message} is the
847description of the error. Other arguments supply more details about the
848problem.
849
850@subsubsection Usage
851@deffn {Scheme Procedure} peek-next-char [port]
852@end deffn
853
854@deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
855@end deffn
856
857@deffn {Scheme Procedure} skip-until arg [port]
858@end deffn
58c4a39d 859
3e31e75a
AW
860@deffn {Scheme Procedure} skip-while skip-chars [port]
861@end deffn
58c4a39d 862
3e31e75a
AW
863@deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
864@end deffn
58c4a39d 865
3e31e75a
AW
866@deffn {Scheme Procedure} next-token-of incl-list/pred [port]
867@end deffn
58c4a39d 868
3e31e75a
AW
869@deffn {Scheme Procedure} read-text-line [port]
870@end deffn
58c4a39d 871
3e31e75a
AW
872@deffn {Scheme Procedure} read-string n [port]
873@end deffn
58c4a39d 874
3e31e75a
AW
875@deffn {Scheme Procedure} find-string-from-port? _ _ . _
876Looks for @var{str} in @var{<input-port>}, optionally within the first
877@var{max-no-char} characters.
878@end deffn
879
880@node sxml apply-templates
881@subsection (sxml apply-templates)
882@subsubsection Overview
883Pre-order traversal of a tree and creation of a new tree:
884
885@smallexample
886 apply-templates:: tree x <templates> -> <new-tree>
887@end smallexample
888
889where
890
891@smallexample
892 <templates> ::= (<template> ...)
893 <template> ::= (<node-test> <node-test> ... <node-test> . <handler>)
894 <node-test> ::= an argument to node-typeof? above
895 <handler> ::= <tree> -> <new-tree>
896@end smallexample
897
898This procedure does a @emph{normal}, pre-order traversal of an SXML
899tree. It walks the tree, checking at each node against the list of
900matching templates.
901
902If the match is found (which must be unique, i.e., unambiguous), the
903corresponding handler is invoked and given the current node as an
904argument. The result from the handler, which must be a @code{<tree>},
905takes place of the current node in the resulting tree. The name of the
906function is not accidental: it resembles rather closely an
907@code{apply-templates} function of XSLT.
908
909@subsubsection Usage
910@deffn {Scheme Procedure} apply-templates tree templates
911@end deffn
58c4a39d 912