1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2013 Free Software Foundation, Inc.
4 @c See the file guile.texi for copying conditions.
5
6 @node SXML
7 @section SXML
8
9 SXML is a native representation of XML in terms of standard Scheme data
10 types: lists, symbols, and strings. For example, the simple XML
11 fragment:
12
13 @example
14 <parrot type="African Grey"><name>Alfie</name></parrot>
15 @end example
16
17 may be represented with the following SXML:
18
19 @example
(parrot (@@ (type "African Grey")) (name "Alfie"))
21 @end example
22
23 SXML is very general, and is capable of representing all of XML.
24 Formally, this means that SXML is a conforming implementation of the
@uref{http://www.w3.org/TR/xml-infoset/,XML Information Set} standard.
26
27 Guile includes several facilities for working with XML and SXML:
28 parsers, serializers, and transformers.
29
30 @menu
31 * SXML Overview:: XML, as it was meant to be
32 * Reading and Writing XML:: Convenient XML parsing and serializing
33 * SSAX:: Custom functional-style XML parsers
34 * Transforming SXML:: Munging SXML with @code{pre-post-order}
35 * SXML Tree Fold:: Fold-based SXML transformations
36 * SXPath:: XPath for SXML
37 * sxml apply-templates:: A more XSLT-like approach to SXML transformations
38 * sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
39 @end menu
40
41 @node SXML Overview
42 @subsection SXML Overview
43
44 (This section needs to be written; volunteers welcome.)
45
46
47 @node Reading and Writing XML
48 @subsection Reading and Writing XML
49
50 The @code{(sxml simple)} module presents a basic interface for parsing
51 XML from a port into the Scheme SXML format, and for serializing it back
52 to text.
53
54 @example
55 (use-modules (sxml simple))
56 @end example
57
58 @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
59 [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
60 [#:entities='()] [#:default-entity-handler=#f] @
61 [#:doctype-handler=#f]
62 Use SSAX to parse an XML document into SXML. Takes one optional
63 argument, @var{string-or-port}, which defaults to the current input
64 port. Returns the resulting SXML document. If @var{string-or-port} is
65 a port, it will be left pointing at the next available character in the
66 port.
67 @end deffn
68
69 As is normal in SXML, XML elements parse as tagged lists. Attributes,
70 if any, are placed after the tag, within an @code{@@} element. The root
of the resulting SXML will be contained in a special tag, @code{*TOP*}.
72 This tag will contain the root element of the XML, but also any prior
73 processing instructions.
74
75 @example
76 (xml->sxml "<foo/>")
77 @result{} (*TOP* (foo))
78 (xml->sxml "<foo>text</foo>")
79 @result{} (*TOP* (foo "text"))
80 (xml->sxml "<foo kind=\"bar\">text</foo>")
81 @result{} (*TOP* (foo (@@ (kind "bar")) "text"))
82 (xml->sxml "<?xml version=\"1.0\"?><foo/>")
83 @result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
84 @end example
85
86 All namespaces in the XML document must be declared, via @code{xmlns}
87 attributes. SXML elements built from non-default namespaces will have
88 their tags prefixed with their URI. Users can specify custom prefixes
89 for certain namespaces with the @code{#:namespaces} keyword argument to
90 @code{xml->sxml}.
91
92 @example
93 (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
94 @result{} (*TOP* (http://example.org/ns1:foo "text"))
95 (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
96 #:namespaces '((ns1 . "http://example.org/ns1")))
97 @result{} (*TOP* (ns1:foo "text"))
98 (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
99 #:namespaces '((ns2 . "http://example.org/ns2")))
100 @result{} (*TOP* (foo (ns2:baz)))
101 @end example
102
103 By default, namespaces passed to @code{xml->sxml} are treated as if they
104 were declared on the root element. Passing a false
105 @code{#:declare-namespaces?} argument will disable this behavior,
requiring in-document declarations of namespaces before use.
107
108 @example
109 (xml->sxml "<foo><ns2:baz/></foo>"
110 #:namespaces '((ns2 . "http://example.org/ns2")))
111 @result{} (*TOP* (foo (ns2:baz)))
112 (xml->sxml "<foo><ns2:baz/></foo>"
113 #:namespaces '((ns2 . "http://example.org/ns2"))
114 #:declare-namespaces? #f)
@result{} error: undeclared namespace: `ns2'
116 @end example
117
By default, all whitespace in XML is significant. Passing a true
@code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
whitespace before, after, and between elements, treating it as
``insignificant''. Whitespace in text fragments is left alone.
122
123 @example
124 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
125 @result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
126 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
127 #:trim-whitespace? #t)
128 @result{} (*TOP* (foo (bar " Alfie the parrot! ")))
129 @end example
130
131 Parsed entities may be declared with the @code{#:entities} keyword
132 argument, or handled with the @code{#:default-entity-handler}. By
133 default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
134 @code{&apos;} and @code{&quot;} entities are defined, as well as the
135 @code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
136 numeric character entities.
137
138 @example
139 (xml->sxml "<foo>&amp;</foo>")
140 @result{} (*TOP* (foo "&"))
141 (xml->sxml "<foo>&nbsp;</foo>")
142 @result{} error: undefined entity: nbsp
143 (xml->sxml "<foo>&#xA0;</foo>")
144 @result{} (*TOP* (foo "\xa0"))
145 (xml->sxml "<foo>&nbsp;</foo>"
146 #:entities '((nbsp . "\xa0")))
147 @result{} (*TOP* (foo "\xa0"))
(xml->sxml "<foo>&nbsp; &foo;</foo>"
           #:default-entity-handler
           (lambda (port name)
             (case name
               ((nbsp) "\xa0")
               (else
                (format (current-warning-port)
                        "~a:~a:~a: undefined entity: ~a\n"
                        (or (port-filename port) "<unknown file>")
                        (port-line port) (port-column port)
                        name)
                (symbol->string name)))))
@print{} <unknown file>:0:17: undefined entity: foo
@result{} (*TOP* (foo "\xa0 foo"))
162 @end example
163
164 By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
165 declaration, if any. This behavior can be overridden with the
166 @code{#:doctype-handler} argument, which should be a procedure of three
167 arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
168 the internal doctype subset (as a string or @code{#f} if not present).
169
170 The handler should return keyword arguments as multiple values, as if it
171 were calling its continuation with keyword arguments. The continuation
172 accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
173 in the same format that @code{xml->sxml} itself takes. These entities
174 and namespaces will be prepended to those given to the @code{xml->sxml}
175 invocation.
176
177 @example
178 (define (handle-foo docname systemid internal-subset)
179 (case docname
180 ((foo)
181 (values #:entities '((greets . "<i>Hello, world!</i>"))))
182 (else
183 (values))))
184
185 (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
186 #:doctype-handler handle-foo)
187 @result{} (*TOP* (p (i "Hello, world!")))
188 @end example
189
190 If the document has no doctype declaration, the @var{doctype-handler} is
191 invoked with @code{#f} for the three arguments.
192
193 In the future, the continuation may accept other keyword arguments, for
194 example to validate the parsed SXML against the doctype.
195
196 @deffn {Scheme Procedure} sxml->xml tree [port]
197 Serialize the SXML tree @var{tree} as XML. The output will be written to
198 the current output port, unless the optional argument @var{port} is
199 present.
200 @end deffn
201
202 @deffn {Scheme Procedure} sxml->string sxml
203 Detag an sxml tree @var{sxml} into a string. Does not perform any
204 formatting.
205 @end deffn
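
As a minimal sketch of the two serializers, the following captures the
output of @code{sxml->xml} in a string and strips the markup with
@code{sxml->string}. The sample tree and the names @code{parrot-tree},
@code{sxml->xml-string}, @code{parrot-xml} and @code{parrot-text} are
made up for illustration and are not part of Guile.

```scheme
(use-modules (sxml simple))

;; Hypothetical sample tree, used only for this illustration.
(define parrot-tree
  '(parrot (name "Alfie") (says "Hello, " (b "world"))))

;; sxml->xml writes to the current output port by default,
;; so redirect that port into a string.
(define (sxml->xml-string tree)
  (with-output-to-string
    (lambda () (sxml->xml tree))))

;; The serialized markup.
(define parrot-xml (sxml->xml-string parrot-tree))

;; Only the text content, with all tags dropped.
(define parrot-text (sxml->string parrot-tree))
```

Passing an explicit port as the second argument to @code{sxml->xml}
would avoid the redirection; the wrapper above is only a convenience.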
206
207 @node SSAX
208 @subsection SSAX: A Functional XML Parsing Toolkit
209
210 Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
211 toolkit, SSAX.
212
213 @subsubsection History
214
215 Back in the 1990s, when the world was young again and XML was the
216 solution to all of its problems, there were basically two kinds of XML
217 parsers out there: DOM parsers and SAX parsers.
218
219 A DOM parser reads through an entire XML document, building up a tree of
220 ``DOM objects'' representing the document structure. They are very easy
221 to use, but sometimes you don't actually want all of the information in
222 a document; building an object tree is not necessary if all you want to
223 do is to count word frequencies in a document, for example.
224
SAX parsers were created to give the programmer more control over the
parsing process. A programmer gives the SAX parser a number of
``callbacks'': functions that will be called on various features of the
XML stream as they are encountered. SAX parsers are more efficient, but
much harder to use, as users typically have to manually maintain a
stack of open elements.
231
232 Kiselyov realized that the SAX programming model could be made much
233 simpler if the callbacks were formulated not as a linear fold across the
234 features of the XML stream, but as a @emph{tree fold} over the structure
235 implicit in the XML. In this way, the user has a very convenient,
236 functional-style interface that can still generate optimal parsers.
237
238 The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
239 DOM-style parser built using SSAX, though it returns SXML instead of DOM
240 objects.
241
242 @subsubsection Implementation
243
244 @code{(sxml ssax)} is a package of low-to-high level lexing and parsing
245 procedures that can be combined to yield a SAX, a DOM, a validating
246 parser, or a parser intended for a particular document type. The
247 procedures in the package can be used separately to tokenize or parse
248 various pieces of XML documents. The package supports XML Namespaces,
249 internal and external parsed entities, user-controlled handling of
250 whitespace, and validation. This module therefore is intended to be a
251 framework, a set of ``Lego blocks'' you can use to build a parser
252 following any discipline and performing validation to any degree. As an
253 example of the parser construction, this file includes a semi-validating
254 SXML parser.
255
256 SSAX has a ``sequential'' feel of SAX yet a ``functional style'' of DOM.
257 Like a SAX parser, the framework scans the document only once and
258 permits incremental processing. An application that handles document
elements in order can run as efficiently as possible. @emph{Unlike} a
SAX parser, the framework does not require an application to register
stateful callbacks and surrender control to the parser. Rather, it is
262 the application that can drive the framework -- calling its functions to
263 get the current lexical or syntax element. These functions do not
264 maintain or mutate any state save the input port. Therefore, the
265 framework permits parsing of XML in a pure functional style, with the
266 input port being a monad (or a linear, read-once parameter).
267
268 Besides the @var{port}, there is another monad -- @var{seed}. Most of
269 the middle- and high-level parsers are single-threaded through the
270 @var{seed}. The functions of this framework do not process or affect
271 the @var{seed} in any way: they simply pass it around as an instance of
an opaque datatype. User functions, on the other hand, can use the seed
to maintain the user's state, to accumulate parsing results, etc. A user
can freely mix their own functions with those of the framework. On the
other hand, the user may wish to instantiate a high-level parser:
@code{ssax:make-elem-parser} or @code{ssax:make-parser}. In the latter
277 case, the user must provide functions of specific signatures, which are
278 called at predictable moments during the parsing: to handle character
279 data, element data, or processing instructions (PI). The functions are
280 always given the @var{seed}, among other parameters, and must return the
281 new @var{seed}.
282
283 From a functional point of view, XML parsing is a combined
284 pre-post-order traversal of a ``tree'' that is the XML document itself.
285 This down-and-up traversal tells the user about an element when its
start tag is encountered. The user is notified about the element once
more, after all of the element's children have been handled. The process of
288 XML parsing therefore is a fold over the raw XML document. Unlike a
289 fold over trees defined in [1], the parser is necessarily
290 single-threaded -- obviously as elements in a text XML document are laid
291 down sequentially. The parser therefore is a tree fold that has been
292 transformed to accept an accumulating parameter [1,2].
293
294 Formally, the denotational semantics of the parser can be expressed as
295
296 @smallexample
parser:: (Start-tag -> Seed -> Seed) ->
         (Start-tag -> Seed -> Seed -> Seed) ->
         (Char-Data -> Seed -> Seed) ->
         XML-text-fragment -> Seed -> Seed
parser fdown fup fchar "<elem attrs> content </elem>" seed
  = fup "<elem attrs>" seed
        (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))

parser fdown fup fchar "char-data content" seed
  = parser fdown fup fchar "content" (fchar "char-data" seed)

parser fdown fup fchar "elem-content content" seed
  = parser fdown fup fchar "content" (
        parser fdown fup fchar "elem-content" seed)
311 @end smallexample
312
313 Compare the last two equations with the left fold
314
315 @smallexample
316 fold-left kons elem:list seed = fold-left kons list (kons elem seed)
317 @end smallexample
318
The real parser created by @code{ssax:make-parser} is slightly more
320 complicated, to account for processing instructions, entity references,
321 namespaces, processing of document type declaration, etc.
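
To make the roles of the three handlers concrete, here is a small
sketch that uses @code{ssax:make-parser} to count the elements in a
document. The seed is simply the running count. The names
@code{count-elements} and @code{element-count} are illustrative, and the
sketch assumes the default handlers for doctypes and processing
instructions are acceptable for the input.

```scheme
(use-modules (sxml ssax))

;; A parser whose seed is the number of elements finished so far.
(define count-elements
  (ssax:make-parser
   ;; Plays the role of fdown: called at a start tag.  The children
   ;; start counting from the current seed.
   NEW-LEVEL-SEED
   (lambda (elem-gi attributes namespaces expected-content seed)
     seed)
   ;; Plays the role of fup: called once all children are handled.
   ;; SEED is the count accumulated by the children.
   FINISH-ELEMENT
   (lambda (elem-gi attributes namespaces parent-seed seed)
     (+ seed 1))
   ;; Plays the role of fchar: character data leaves the count alone.
   CHAR-DATA-HANDLER
   (lambda (string1 string2 seed)
     seed)))

;; The generated parser takes a port and an initial seed,
;; and returns the final seed.
(define element-count
  (call-with-input-string "<a><b/><c>text</c></a>"
    (lambda (port) (count-elements port 0))))
```

No tree is ever built here: the document is scanned once and only the
integer seed is threaded through the handlers.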
322
323 The XML standard document referred to in this module is
324 @uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
325
326 The present file also defines a procedure that parses the text of an XML
327 document or of a separate element into SXML, an S-expression-based model
328 of an XML Information Set. SXML is also an Abstract Syntax Tree of an
329 XML document. SXML is similar but not identical to DOM; SXML is
330 particularly suitable for Scheme-based XML/HTML authoring, SXPath
331 queries, and tree transformations. See SXML.html for more details.
332 SXML is a term implementation of evaluation of the XML document [3].
333 The other implementation is context-passing.
334
The present framework fully supports the XML Namespaces Recommendation:
336 @uref{http://www.w3.org/TR/REC-xml-names/}.
337
338 Other links:
339
340 @table @asis
341 @item [1]
342 Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
343 ICFP'98, 1998, pp. 273-279.
344
345 @item [2]
346 Richard S. Bird, The promotion and accumulation strategies in
347 transformational programming, ACM Trans. Progr. Lang. Systems,
348 6(4):487-504, October 1984.
349
350 @item [3]
351 Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
352 Pearl. Proc ICFP'00, pp. 186-197.
353
354 @end table
355
356 @subsubsection Usage
357 @deffn {Scheme Procedure} current-ssax-error-port
358 @end deffn
359
360 @deffn {Scheme Procedure} with-ssax-error-to-port port thunk
361 @end deffn
362
@deffn {Scheme Procedure} xml-token? x
Return @code{#t} if @var{x} is a pair, which is how XML tokens are
represented; otherwise return @code{#f}.
@end deffn
371
372 @deffn {Scheme Syntax} xml-token-kind token
373 @end deffn
374
375 @deffn {Scheme Syntax} xml-token-head token
376 @end deffn
377
378 @deffn {Scheme Procedure} make-empty-attlist
379 @end deffn
380
381 @deffn {Scheme Procedure} attlist-add attlist name-value
382 @end deffn
383
384 @deffn {Scheme Procedure} attlist-null? x
385 Return @code{#t} if @var{x} is the empty list, else @code{#f}.
386 @end deffn
387
388 @deffn {Scheme Procedure} attlist-remove-top attlist
389 @end deffn
390
391 @deffn {Scheme Procedure} attlist->alist attlist
392 @end deffn
393
394 @deffn {Scheme Procedure} attlist-fold kons knil lis1
395 @end deffn
396
397 @deffn {Scheme Procedure} define-parsed-entity! entity str
398 Define a new parsed entity. @var{entity} should be a symbol.
399
400 Instances of &@var{entity}; in XML text will be replaced with the string
401 @var{str}, which will then be parsed.
402 @end deffn
403
404 @deffn {Scheme Procedure} reset-parsed-entity-definitions!
405 Restore the set of parsed entity definitions to its initial state.
406 @end deffn
407
408 @deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
409 @end deffn
410
411 @deffn {Scheme Procedure} ssax:skip-internal-dtd port
412 @end deffn
413
414 @deffn {Scheme Procedure} ssax:read-pi-body-as-string port
415 @end deffn
416
417 @deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
418 @end deffn
419
420 @deffn {Scheme Procedure} ssax:read-markup-token port
421 @end deffn
422
423 @deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
424 @end deffn
425
426 @deffn {Scheme Procedure} ssax:read-char-ref port
427 @end deffn
428
429 @deffn {Scheme Procedure} ssax:read-attributes port entities
430 @end deffn
431
432 @deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
433 @end deffn
434
435 @deffn {Scheme Procedure} ssax:read-external-id port
436 @end deffn
437
438 @deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
439 @end deffn
440
441 @deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
442 @end deffn
443
444 @deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
445 @end deffn
446
447 @deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
448 @end deffn
449
450 @deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
451 @end deffn
452
453 @node Transforming SXML
454 @subsection Transforming SXML
455 @subsubsection Overview
456 @heading SXML expression tree transformers
457 @subheading Pre-Post-order traversal of a tree and creation of a new tree
458 @smallexample
459 pre-post-order:: <tree> x <bindings> -> <new-tree>
460 @end smallexample
461
462 where
463
464 @smallexample
465 <bindings> ::= (<binding> ...)
466 <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
467 (<trigger-symbol> *macro* . <handler>) |
468 (<trigger-symbol> <new-bindings> . <handler>) |
469 (<trigger-symbol> . <handler>)
470 <trigger-symbol> ::= XMLname | *text* | *default*
471 <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
472 @end smallexample
473
474 The pre-post-order function visits the nodes and nodelists
475 pre-post-order (depth-first). For each @code{<Node>} of the form
476 @code{(@var{name} <Node> ...)}, it looks up an association with the
given @var{name} among its @var{<bindings>}. If that lookup fails,
@code{pre-post-order} tries to locate a @code{*default*} binding. It's
479 an error if the latter attempt fails as well. Having found a binding,
480 the @code{pre-post-order} function first checks to see if the binding is
481 of the form
482
483 @smallexample
484 (<trigger-symbol> *preorder* . <handler>)
485 @end smallexample
486
If it is, the handler is @emph{applied} to the current node. Otherwise, the
pre-post-order function first calls itself recursively for each child of
the current node, with @var{<new-bindings>} prepended to the
@var{<bindings>} in effect. The result of these calls is passed to the
@var{<handler>} (along with the head of the current @var{<Node>}). To be
more precise, the handler is @emph{applied} to the head of the current node
493 and its processed children. The result of the handler, which should also
494 be a @code{<tree>}, replaces the current @var{<Node>}. If the current
495 @var{<Node>} is a text string or other atom, a special binding with a
496 symbol @code{*text*} is looked up.
497
A binding can also be of the form
499
500 @smallexample
501 (<trigger-symbol> *macro* . <handler>)
502 @end smallexample
503
This is equivalent to @code{*preorder*} described above, except that the
result is processed again with the current stylesheet.
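
The following sketch shows the three most common kinds of binding at
work: a plain handler for one tag, a @code{*text*} handler, and a
@code{*default*} handler that rebuilds every other node unchanged. The
procedure name @code{shout} and the sample tree are made up for
illustration.

```scheme
(use-modules (sxml transform))

;; Rewrite (em ...) as (strong ...) and upcase all text.
(define (shout tree)
  (pre-post-order
   tree
   `(;; Plain binding: called with the tag and the processed children.
     (em . ,(lambda (tag . kids) `(strong ,@kids)))
     ;; Atoms are passed to the *text* handler along with the trigger.
     (*text* . ,(lambda (trigger text)
                  (if (string? text) (string-upcase text) text)))
     ;; Anything else is rebuilt from its processed children.
     (*default* . ,(lambda (tag . kids) (cons tag kids))))))

(define shouted (shout '(p "hello " (em "world"))))
;; shouted is (p "HELLO " (strong "WORLD"))
```

Because the @code{em} handler runs after its children were processed,
it already sees the upcased string.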
506
507 @subsubsection Usage
508 @deffn {Scheme Procedure} SRV:send-reply . fragments
509 Output the @var{fragments} to the current output port.
510
511 The fragments are a list of strings, characters, numbers, thunks,
512 @code{#f}, @code{#t} -- and other fragments. The function traverses the
513 tree depth-first, writes out strings and characters, executes thunks,
514 and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
anything was written at all; otherwise the result is @code{#f}. If
516 @code{#t} occurs among the fragments, it is not written out but causes
517 the result of @code{SRV:send-reply} to be @code{#t}.
518 @end deffn
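
As a brief illustration, the sketch below sends a nested list of
fragments to a string port. The names @code{reply-result} and
@code{reply-text} are only illustrative.

```scheme
(use-modules (sxml transform))

(define reply-result #f)

;; Capture what SRV:send-reply writes to the current output port.
(define reply-text
  (with-output-to-string
    (lambda ()
      (set! reply-result
            ;; Nested lists are traversed; #f and '() are skipped.
            (SRV:send-reply "<p>" (list "a" #f '() "b") "</p>")))))
;; reply-text is "<p>ab</p>" and reply-result is #t
```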
519
520 @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
521 @end deffn
522
523 @deffn {Scheme Procedure} post-order tree bindings
524 @end deffn
525
526 @deffn {Scheme Procedure} pre-post-order tree bindings
527 @end deffn
528
529 @deffn {Scheme Procedure} replace-range beg-pred end-pred forest
530 @end deffn
531
532 @node SXML Tree Fold
533 @subsection SXML Tree Fold
534 @subsubsection Overview
535 @code{(sxml fold)} defines a number of variants of the @dfn{fold}
536 algorithm for use in transforming SXML trees. Additionally it defines
537 the layout operator, @code{fold-layout}, which might be described as a
538 context-passing variant of SSAX's @code{pre-post-order}.
539
540 @subsubsection Usage
541 @deffn {Scheme Procedure} foldt fup fhere tree
542 The standard multithreaded tree fold.
543
544 @var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
545 @end deffn
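
For instance, this sketch uses @code{foldt} to count the strings in a
tree: @var{fhere} maps each atom to 1 or 0, and @var{fup} adds up the
results for a node's children. The name @code{string-count} and the
sample tree are illustrative.

```scheme
(use-modules (sxml fold))

(define string-count
  (foldt
   ;; fup: combine the list of results computed for the children.
   (lambda (kid-results) (apply + kid-results))
   ;; fhere: strings count as 1, any other atom as 0.
   (lambda (atom) (if (string? atom) 1 0))
   '(p "Hello, " (b "world") "!")))
;; string-count is 3
```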
546
547 @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
548 The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
549 for more information.
550 @end deffn
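
A small sketch of @code{foldts}, collecting the text of a tree in
document order. The seed is a list of strings accumulated in reverse.
The names @code{collect-text} and @code{collected} are made up for this
example.

```scheme
(use-modules (sxml fold))

(define (collect-text tree)
  (reverse
   (foldts
    ;; fdown: children continue with the seed seen so far.
    (lambda (seed node) seed)
    ;; fup: the seed produced by the children becomes the result.
    (lambda (parent-seed kid-seed node) kid-seed)
    ;; fhere: push strings onto the seed, ignore other atoms.
    (lambda (seed atom)
      (if (string? atom) (cons atom seed) seed))
    '()
    tree)))

(define collected (collect-text '(p "Hello, " (b "world") "!")))
;; collected is ("Hello, " "world" "!")
```

Unlike @code{foldt}, a single seed is threaded through the whole
traversal, so the order of the strings is preserved without appending
lists.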
551
552 @deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
553 A variant of @code{foldts} that allows pre-order tree
554 rewrites. Originally defined in Andy Wingo's 2007 paper,
555 @emph{Applications of fold to XML transformation}.
556 @end deffn
557
558 @deffn {Scheme Procedure} fold-values proc list . seeds
559 A variant of @code{fold} that allows multi-valued seeds. Note that the
560 order of the arguments differs from that of @code{fold}. @xref{SRFI-1
561 Fold and Map}.
562 @end deffn
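
As an illustration, the sketch below threads two seeds, a running sum
and a count, through a list in a single pass. The name
@code{sum-and-count} is illustrative.

```scheme
(use-modules (sxml fold))

;; Return a pair (sum . count) for a list of numbers.
(define (sum-and-count numbers)
  (call-with-values
      (lambda ()
        ;; PROC receives the element followed by every seed and
        ;; must return the same number of new seeds.
        (fold-values (lambda (x sum count)
                       (values (+ sum x) (+ count 1)))
                     numbers
                     0 0))
    cons))

(define totals (sum-and-count '(1 2 3 4)))
;; totals is (10 . 4)
```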
563
564 @deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
565 A variant of @code{foldts*} that allows multi-valued
566 seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
567 of fold to XML transformation}.
568 @end deffn
569
570 @deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
571 A traversal combinator in the spirit of @code{pre-post-order}.
572 @xref{Transforming SXML}.
573
574 @code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
575 @emph{Applications of fold to XML transformation}.
576
577 @example
578 bindings := (<binding>...)
binding := (<tag> <handler-pair>...)
580 | (*default* . <post-handler>)
581 | (*text* . <text-handler>)
582 tag := <symbol>
583 handler-pair := (pre-layout . <pre-layout-handler>)
584 | (post . <post-handler>)
585 | (bindings . <bindings>)
586 | (pre . <pre-handler>)
587 | (macro . <macro-handler>)
588 @end example
589
590 @table @var
591 @item pre-layout-handler
592 A function of three arguments:
593
594 @table @var
595 @item kids
596 the kids of the current node, before traversal
597
598 @item params
599 the params of the current node
600
601 @item layout
602 the layout coming into this node
603
604 @end table
605
606 @var{pre-layout-handler} is expected to use this information to return a
607 layout to pass to the kids. The default implementation returns the
608 layout given in the arguments.
609
610 @item post-handler
611 A function of five arguments:
612
613 @table @var
614 @item tag
615 the current tag being processed
616
617 @item params
618 the params of the current node
619
620 @item layout
621 the layout coming into the current node, before any kids were processed
622
623 @item klayout
624 the layout after processing all of the children
625
626 @item kids
627 the already-processed child nodes
628
629 @end table
630
631 @var{post-handler} should return two values, the layout to pass to the
632 next node and the final tree.
633
634 @item text-handler
635 @var{text-handler} is a function of three arguments:
636
637 @table @var
638 @item text
639 the string
640
641 @item params
642 the current params
643
644 @item layout
645 the current layout
646
647 @end table
648
649 @var{text-handler} should return two values, the layout to pass to the
650 next node and the value to which the string should transform.
651
652 @end table
653 @end deffn
654
655 @node SXPath
656 @subsection SXPath
657 @subsubsection Overview
658 @heading SXPath: SXML Query Language
659 SXPath is a query language for SXML, an instance of XML Information set
660 (Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
661 definition of SXML and more details. SXPath is also a translation into
662 Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
663 XPath and SXPath describe means of selecting a set of Infoset's items or
664 their properties.
665
666 To facilitate queries, XPath maps the XML Infoset into an explicit tree,
667 and introduces important notions of a location path and a current,
668 context node. A location path denotes a selection of a set of nodes
669 relative to a context node. Any XPath tree has a distinguished, root
670 node -- which serves as the context node for absolute location paths.
A location path is recursively defined as a location step joined with a
672 location path. A location step is a simple query of the database
673 relative to a context node. A step may include expressions that further
674 filter the selected set. Each node in the resulting set is used as a
675 context node for the adjoining location path. The result of the step is
676 a union of the sets returned by the latter location paths.
677
678 The SXML representation of the XML Infoset (see SSAX.scm) is rather
679 suitable for querying as it is. Bowing to the XPath specification, we
680 will refer to SXML information items as 'Nodes':
681
682 @example
683 <Node> ::= <Element> | <attributes-coll> | <attrib>
684 | "text string" | <PI>
685 @end example
686
687 This production can also be described as
688
689 @example
690 <Node> ::= (name . <Nodeset>) | "text string"
691 @end example
692
693 An (ordered) set of nodes is just a list of the constituent nodes:
694
695 @example
696 <Nodeset> ::= (<Node> ...)
697 @end example
698
699 Nodesets, and Nodes other than text strings are both lists. A <Nodeset>
700 however is either an empty list, or a list whose head is not a symbol. A
701 symbol at the head of a node is either an XML name (in which case it's a
702 tag of an XML element), or an administrative name such as '@@'. This
703 uniform list representation makes processing rather simple and elegant,
704 while avoiding confusion. The multi-branch tree structure formed by the
705 mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
706 processing by functional languages.
707
708 A location path is in fact a composite query over an XPath tree or its
branch. A single step is a combination of a projection, selection or a
710 transitive closure. Multiple steps are combined via join and union
711 operations. This insight allows us to @emph{elegantly} implement XPath
712 as a sequence of projection and filtering primitives -- converters --
713 joined by @dfn{combinators}. Each converter takes a node and returns a
714 nodeset which is the result of the corresponding query relative to that
715 node. A converter can also be called on a set of nodes. In that case it
716 returns a union of the corresponding queries over each node in the set.
717 The union is easily implemented as a list append operation as all nodes
in an SXML tree are considered distinct, by XPath conventions. We also
preserve the order of the members in the union. Query combinators are
higher-order functions: they take converter(s) (which is a Node|Nodeset ->
721 Nodeset function) and compose or otherwise combine them. We will be
722 concerned with only relative location paths [XPath]: an absolute
723 location path is a relative path applied to the root node.
724
725 Similarly to XPath, SXPath defines full and abbreviated notations for
726 location paths. In both cases, the abbreviated notation can be
mechanically expanded into the full form by simple rewriting rules. In
the case of SXPath, the corresponding rules are given as comments to the
@code{sxpath} function, below. The regression test suite at the end of this file shows
730 a representative sample of SXPaths in both notations, juxtaposed with
731 the corresponding XPath expressions. Most of the samples are borrowed
732 literally from the XPath specification, while the others are adjusted
733 for our running example, tree1.
734
735 @subsubsection Usage
736 @deffn {Scheme Procedure} nodeset? x
737 @end deffn
738
739 @deffn {Scheme Procedure} node-typeof? crit
740 @end deffn
741
742 @deffn {Scheme Procedure} node-eq? other
743 @end deffn
744
745 @deffn {Scheme Procedure} node-equal? other
746 @end deffn
747
748 @deffn {Scheme Procedure} node-pos n
749 @end deffn
750
@deffn {Scheme Procedure} filter pred?
Return a converter that, given a node or nodeset, returns the nodes that
satisfy @var{pred?}, in their original order. Unlike the SRFI-1
@code{filter}, this procedure takes only the predicate; the nodes are
passed to the converter it returns.
@end deffn
766
767 @deffn {Scheme Procedure} take-until pred?
768 @end deffn
769
770 @deffn {Scheme Procedure} take-after pred?
771 @end deffn
772
773 @deffn {Scheme Procedure} map-union proc lst
774 @end deffn
775
776 @deffn {Scheme Procedure} node-reverse node-or-nodeset
777 @end deffn
778
779 @deffn {Scheme Procedure} node-trace title
780 @end deffn
781
782 @deffn {Scheme Procedure} select-kids test-pred?
783 @end deffn
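
The sketch below shows how a predicate built by @code{node-typeof?} is
turned into a converter by @code{select-kids}, and how two converters
are chained with @code{node-join}. The sample tree and the names
@code{b-kids} and @code{kid-texts} are made up for illustration.

```scheme
(use-modules (sxml xpath))

(define tree '(a (b "1") (c "2") (b "3")))

;; Converter: the children of a node whose tag is b.
(define b-kids
  ((select-kids (node-typeof? 'b)) tree))
;; b-kids is ((b "1") (b "3"))

;; Two steps joined together: every element child, then its text.
(define kid-texts
  ((node-join (select-kids (node-typeof? '*))
              (select-kids (node-typeof? '*text*)))
   tree))
;; kid-texts is ("1" "2" "3")
```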
784
@deffn {Scheme Procedure} node-self pred?
Like @code{select-kids}, but apply @var{pred?} to the node itself rather
than to its children.
@end deffn
800
801 @deffn {Scheme Procedure} node-join . selectors
802 @end deffn
803
804 @deffn {Scheme Procedure} node-reduce . converters
805 @end deffn
806
807 @deffn {Scheme Procedure} node-or . converters
808 @end deffn
809
810 @deffn {Scheme Procedure} node-closure test-pred?
811 @end deffn
812
813 @deffn {Scheme Procedure} node-parent rootnode
814 @end deffn
815
816 @deffn {Scheme Procedure} sxpath path
817 @end deffn
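
As a short sketch of the abbreviated notation, the following queries
select elements by a path of tag names and then collect text from
anywhere in the tree with @code{//}. The sample document and the names
@code{aviary}, @code{names} and @code{ages} are illustrative.

```scheme
(use-modules (sxml xpath))

;; Hypothetical document, in the form returned by xml->sxml.
(define aviary
  '(*TOP* (aviary (parrot (name "Alfie") (age "3"))
                  (parrot (name "Rio") (age "5")))))

;; Roughly the XPath query /aviary/parrot/name
(define names ((sxpath '(aviary parrot name)) aviary))
;; names is ((name "Alfie") (name "Rio"))

;; Roughly the XPath query //age/text()
(define ages ((sxpath '(// age *text*)) aviary))
;; ages is ("3" "5")
```

@code{sxpath} returns a converter, so a compiled path can be bound to a
name once and applied to many documents.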
818
819 @node sxml ssax input-parse
820 @subsection (sxml ssax input-parse)
821 @subsubsection Overview
822 A simple lexer.
823
824 The procedures in this module surprisingly often suffice to parse an
825 input stream. They either skip, or build and return tokens, according to
826 inclusion or delimiting semantics. The list of characters to expect,
827 include, or to break at may vary from one invocation of a function to
828 another. This allows the functions to easily parse even
829 context-sensitive languages.
830
831 EOF is generally frowned on, and thrown up upon if encountered.
832 Exceptions are mentioned specifically. The list of expected characters
833 (characters to skip until, or break-characters) may include an EOF
834 "character", which is to be coded as the symbol, @code{*eof*}.
835
836 The input stream to parse is specified as a @dfn{port}, which is usually
837 the last (and optional) argument. It defaults to the current input port
838 if omitted.
839
840 If the parser encounters an error, it will throw an exception to the key
841 @code{parser-error}. The arguments will be of the form @code{(@var{port}
842 @var{message} @var{specialising-msg}*)}.
843
844 The first argument is a port, which typically points to the offending
845 character or its neighborhood. You can then use @code{port-column} and
846 @code{port-line} to query the current position. @var{message} is the
847 description of the error. Other arguments supply more details about the
848 problem.
849
850 @subsubsection Usage
851 @deffn {Scheme Procedure} peek-next-char [port]
852 @end deffn
853
854 @deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
855 @end deffn
856
857 @deffn {Scheme Procedure} skip-until arg [port]
858 @end deffn
859
860 @deffn {Scheme Procedure} skip-while skip-chars [port]
861 @end deffn
862
863 @deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
864 @end deffn
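
As a rough sketch of how these procedures combine, the following splits
a line of the form @code{key = value} using @code{next-token} and
@code{skip-while}. The procedure name @code{parse-assignment} and the
input format are assumptions made for this example.

```scheme
(use-modules (sxml ssax input-parse))

;; Parse "key = value" into a pair of strings.
(define (parse-assignment str)
  (call-with-input-string str
    (lambda (port)
      (let* (;; Skip leading spaces, then read up to a space or `='.
             (key (next-token '(#\space) '(#\space #\=)
                              "reading key" port))
             ;; Discard the separator and surrounding spaces.
             (_ (skip-while '(#\space #\=) port))
             ;; Read the rest; *eof* is listed as a valid terminator.
             (value (next-token '() '(*eof*) "reading value" port)))
        (cons key value)))))

(define assignment (parse-assignment "  name = Alfie"))
;; assignment is ("name" . "Alfie")
```

Listing @code{*eof*} among the break characters is what keeps the
second @code{next-token} call from signalling a parser error at the end
of the input.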
865
866 @deffn {Scheme Procedure} next-token-of incl-list/pred [port]
867 @end deffn
868
869 @deffn {Scheme Procedure} read-text-line [port]
870 @end deffn
871
872 @deffn {Scheme Procedure} read-string n [port]
873 @end deffn
874
@deffn {Scheme Procedure} find-string-from-port? str <input-port> . max-no-char
876 Looks for @var{str} in @var{<input-port>}, optionally within the first
877 @var{max-no-char} characters.
878 @end deffn
879
880 @node sxml apply-templates
881 @subsection (sxml apply-templates)
882 @subsubsection Overview
883 Pre-order traversal of a tree and creation of a new tree:
884
885 @smallexample
886 apply-templates:: tree x <templates> -> <new-tree>
887 @end smallexample
888
889 where
890
891 @smallexample
892 <templates> ::= (<template> ...)
893 <template> ::= (<node-test> <node-test> ... <node-test> . <handler>)
894 <node-test> ::= an argument to node-typeof? above
895 <handler> ::= <tree> -> <new-tree>
896 @end smallexample
897
898 This procedure does a @emph{normal}, pre-order traversal of an SXML
899 tree. It walks the tree, checking at each node against the list of
900 matching templates.
901
902 If the match is found (which must be unique, i.e., unambiguous), the
903 corresponding handler is invoked and given the current node as an
904 argument. The result from the handler, which must be a @code{<tree>},
takes the place of the current node in the resulting tree. The name of the
906 function is not accidental: it resembles rather closely an
907 @code{apply-templates} function of XSLT.
908
909 @subsubsection Usage
910 @deffn {Scheme Procedure} apply-templates tree templates
911 @end deffn
912