quick fix to ssax.scm
[bpt/guile.git] / module / sxml / upstream / SXPath-old.scm
CommitLineData
47f3ce52
AW
1; XML processing in Scheme
2; SXPath -- SXML Query Language
3;
4; SXPath is a query language for SXML, an instance of XML Information
5; set (Infoset) in the form of s-expressions. See SSAX.scm for the
6; definition of SXML and more details. SXPath is also a translation into
7; Scheme of an XML Path Language, XPath:
8; http://www.w3.org/TR/xpath
9; XPath and SXPath describe means of selecting a set of Infoset's items
10; or their properties.
11;
12; To facilitate queries, XPath maps the XML Infoset into an explicit
13; tree, and introduces important notions of a location path and a
14; current, context node. A location path denotes a selection of a set of
15; nodes relative to a context node. Any XPath tree has a distinguished,
16; root node -- which serves as the context node for absolute location
17; paths. Location path is recursively defined as a location step joined
18; with a location path. A location step is a simple query of the
19; database relative to a context node. A step may include expressions
20; that further filter the selected set. Each node in the resulting set
21; is used as a context node for the adjoining location path. The result
22; of the step is a union of the sets returned by the latter location
23; paths.
24;
25; The SXML representation of the XML Infoset (see SSAX.scm) is rather
26; suitable for querying as it is. Bowing to the XPath specification,
27; we will refer to SXML information items as 'Nodes':
28; <Node> ::= <Element> | <attributes-coll> | <attrib>
29; | "text string" | <PI>
30; This production can also be described as
31; <Node> ::= (name . <Nodeset>) | "text string"
32; An (ordered) set of nodes is just a list of the constituent nodes:
33; <Nodeset> ::= (<Node> ...)
34; Nodesets, and Nodes other than text strings are both lists. A
35; <Nodeset> however is either an empty list, or a list whose head is not
36; a symbol. A symbol at the head of a node is either an XML name (in
37; which case it's a tag of an XML element), or an administrative name
38; such as '@'. This uniform list representation makes processing rather
39; simple and elegant, while avoiding confusion. The multi-branch tree
40; structure formed by the mutually-recursive datatypes <Node> and
41; <Nodeset> lends itself well to processing by functional languages.
42;
43; A location path is in fact a composite query over an XPath tree or
44; its branch. A singe step is a combination of a projection, selection
45; or a transitive closure. Multiple steps are combined via join and
46; union operations. This insight allows us to _elegantly_ implement
47; XPath as a sequence of projection and filtering primitives --
48; converters -- joined by _combinators_. Each converter takes a node
49; and returns a nodeset which is the result of the corresponding query
50; relative to that node. A converter can also be called on a set of
51; nodes. In that case it returns a union of the corresponding queries over
52; each node in the set. The union is easily implemented as a list
53; append operation as all nodes in a SXML tree are considered
54; distinct, by XPath conventions. We also preserve the order of the
55; members in the union. Query combinators are high-order functions:
56; they take converter(s) (which is a Node|Nodeset -> Nodeset function)
57; and compose or otherwise combine them. We will be concerned with
58; only relative location paths [XPath]: an absolute location path is a
59; relative path applied to the root node.
60;
61; Similarly to XPath, SXPath defines full and abbreviated notations
62; for location paths. In both cases, the abbreviated notation can be
63; mechanically expanded into the full form by simple rewriting
64; rules. In case of SXPath the corresponding rules are given as
65; comments to a sxpath function, below. The regression test suite at
66; the end of this file shows a representative sample of SXPaths in
67; both notations, juxtaposed with the corresponding XPath
68; expressions. Most of the samples are borrowed literally from the
69; XPath specification, while the others are adjusted for our running
70; example, tree1.
71;
72; To do:
73; Rename filter to node-filter or ns-filter
74; Use ;=== for chapters, ;--- for sections, and ;^^^ for end sections
75;
76; $Id: SXPath-old.scm,v 1.4 2004/07/07 16:02:31 sperber Exp $
77
78
79 ; See http://pobox.com/~oleg/ftp/Scheme/myenv.scm
80 ; See http://pobox.com/~oleg/ftp/Scheme/myenv-scm.scm
81 ; See http://pobox.com/~oleg/ftp/Scheme/myenv-bigloo.scm
82;(module SXPath
83; (include "myenv-bigloo.scm")) ; For use with Bigloo 2.2b
84;(load "myenv-scm.scm") ; For use with SCM v5d2
85;(include "myenv.scm") ; For use with Gambit-C 3.0
86
87
88
89(define (nodeset? x)
90 (or (and (pair? x) (not (symbol? (car x)))) (null? x)))
91
92;-------------------------
93; Basic converters and applicators
94; A converter is a function
95; type Converter = Node|Nodeset -> Nodeset
96; A converter can also play a role of a predicate: in that case, if a
97; converter, applied to a node or a nodeset, yields a non-empty
98; nodeset, the converter-predicate is deemed satisfied. Throughout
99; this file a nil nodeset is equivalent to #f in denoting a failure.
100
101; The following function implements a 'Node test' as defined in
102; Sec. 2.3 of XPath document. A node test is one of the components of a
103; location step. It is also a converter-predicate in SXPath.
104;
105; The function node-typeof? takes a type criterion and returns a function,
106; which, when applied to a node, will tell if the node satisfies
107; the test.
108; node-typeof? :: Crit -> Node -> Boolean
109;
110; The criterion 'crit' is a symbol, one of the following:
111; id - tests if the Node has the right name (id)
112; @ - tests if the Node is an <attributes-coll>
113; * - tests if the Node is an <Element>
114; *text* - tests if the Node is a text node
115; *PI* - tests if the Node is a PI node
116; *any* - #t for any type of Node
117
118(define (node-typeof? crit)
119 (lambda (node)
120 (case crit
121 ((*) (and (pair? node) (not (memq (car node) '(@ *PI*)))))
122 ((*any*) #t)
123 ((*text*) (string? node))
124 (else
125 (and (pair? node) (eq? crit (car node))))
126)))
127
128
129; Curried equivalence converter-predicates
130(define (node-eq? other)
131 (lambda (node)
132 (eq? other node)))
133
134(define (node-equal? other)
135 (lambda (node)
136 (equal? other node)))
137
138; node-pos:: N -> Nodeset -> Nodeset, or
139; node-pos:: N -> Converter
140; Select the N'th element of a Nodeset and return as a singular Nodeset;
141; Return an empty nodeset if the Nth element does not exist.
142; ((node-pos 1) Nodeset) selects the node at the head of the Nodeset,
143; if exists; ((node-pos 2) Nodeset) selects the Node after that, if
144; exists.
145; N can also be a negative number: in that case the node is picked from
146; the tail of the list.
147; ((node-pos -1) Nodeset) selects the last node of a non-empty nodeset;
148; ((node-pos -2) Nodeset) selects the last but one node, if exists.
149
150(define (node-pos n)
151 (lambda (nodeset)
152 (cond
153 ((not (nodeset? nodeset)) '())
154 ((null? nodeset) nodeset)
155 ((eqv? n 1) (list (car nodeset)))
156 ((negative? n) ((node-pos (+ n 1 (length nodeset))) nodeset))
157 (else
158 (assert (positive? n))
159 ((node-pos (dec n)) (cdr nodeset))))))
160
161; filter:: Converter -> Converter
162; A filter applicator, which introduces a filtering context. The argument
163; converter is considered a predicate, with either #f or nil result meaning
164; failure.
165(define (filter pred?)
166 (lambda (lst) ; a nodeset or a node (will be converted to a singleton nset)
167 (let loop ((lst (if (nodeset? lst) lst (list lst))) (res '()))
168 (if (null? lst)
169 (reverse res)
170 (let ((pred-result (pred? (car lst))))
171 (loop (cdr lst)
172 (if (and pred-result (not (null? pred-result)))
173 (cons (car lst) res)
174 res)))))))
175
176; take-until:: Converter -> Converter, or
177; take-until:: Pred -> Node|Nodeset -> Nodeset
178; Given a converter-predicate and a nodeset, apply the predicate to
179; each element of the nodeset, until the predicate yields anything but #f or
180; nil. Return the elements of the input nodeset that have been processed
181; till that moment (that is, which fail the predicate).
182; take-until is a variation of the filter above: take-until passes
183; elements of an ordered input set till (but not including) the first
184; element that satisfies the predicate.
185; The nodeset returned by ((take-until (not pred)) nset) is a subset --
186; to be more precise, a prefix -- of the nodeset returned by
187; ((filter pred) nset)
188
189(define (take-until pred?)
190 (lambda (lst) ; a nodeset or a node (will be converted to a singleton nset)
191 (let loop ((lst (if (nodeset? lst) lst (list lst))))
192 (if (null? lst) lst
193 (let ((pred-result (pred? (car lst))))
194 (if (and pred-result (not (null? pred-result)))
195 '()
196 (cons (car lst) (loop (cdr lst)))))
197 ))))
198
199
200; take-after:: Converter -> Converter, or
201; take-after:: Pred -> Node|Nodeset -> Nodeset
202; Given a converter-predicate and a nodeset, apply the predicate to
203; each element of the nodeset, until the predicate yields anything but #f or
204; nil. Return the elements of the input nodeset that have not been processed:
205; that is, return the elements of the input nodeset that follow the first
206; element that satisfied the predicate.
207; take-after along with take-until partition an input nodeset into three
208; parts: the first element that satisfies a predicate, all preceding
209; elements and all following elements.
210
211(define (take-after pred?)
212 (lambda (lst) ; a nodeset or a node (will be converted to a singleton nset)
213 (let loop ((lst (if (nodeset? lst) lst (list lst))))
214 (if (null? lst) lst
215 (let ((pred-result (pred? (car lst))))
216 (if (and pred-result (not (null? pred-result)))
217 (cdr lst)
218 (loop (cdr lst))))
219 ))))
220
221; Apply proc to each element of lst and return the list of results.
222; if proc returns a nodeset, splice it into the result
223;
224; From another point of view, map-union is a function Converter->Converter,
225; which places an argument-converter in a joining context.
226
227(define (map-union proc lst)
228 (if (null? lst) lst
229 (let ((proc-res (proc (car lst))))
230 ((if (nodeset? proc-res) append cons)
231 proc-res (map-union proc (cdr lst))))))
232
233; node-reverse :: Converter, or
234; node-reverse:: Node|Nodeset -> Nodeset
235; Reverses the order of nodes in the nodeset
236; This basic converter is needed to implement a reverse document order
237; (see the XPath Recommendation).
238(define node-reverse
239 (lambda (node-or-nodeset)
240 (if (not (nodeset? node-or-nodeset)) (list node-or-nodeset)
241 (reverse node-or-nodeset))))
242
243; node-trace:: String -> Converter
244; (node-trace title) is an identity converter. In addition it prints out
245; a node or nodeset it is applied to, prefixed with the 'title'.
246; This converter is very useful for debugging.
247
248(define (node-trace title)
249 (lambda (node-or-nodeset)
250 (cout nl "-->")
251 (display title)
252 (display " :")
253 (pretty-print node-or-nodeset)
254 node-or-nodeset))
255
256
257;-------------------------
258; Converter combinators
259;
260; Combinators are higher-order functions that transmogrify a converter
261; or glue a sequence of converters into a single, non-trivial
262; converter. The goal is to arrive at converters that correspond to
263; XPath location paths.
264;
265; From a different point of view, a combinator is a fixed, named
266; _pattern_ of applying converters. Given below is a complete set of
267; such patterns that together implement XPath location path
268; specification. As it turns out, all these combinators can be built
269; from a small number of basic blocks: regular functional composition,
270; map-union and filter applicators, and the nodeset union.
271
272
273
274; select-kids:: Pred -> Node -> Nodeset
275; Given a Node, return an (ordered) subset its children that satisfy
276; the Pred (a converter, actually)
277; select-kids:: Pred -> Nodeset -> Nodeset
278; The same as above, but select among children of all the nodes in
279; the Nodeset
280;
281; More succinctly, the signature of this function is
282; select-kids:: Converter -> Converter
283
284(define (select-kids test-pred?)
285 (lambda (node) ; node or node-set
286 (cond
287 ((null? node) node)
288 ((not (pair? node)) '()) ; No children
289 ((symbol? (car node))
290 ((filter test-pred?) (cdr node))) ; it's a single node
291 (else (map-union (select-kids test-pred?) node)))))
292
293
294; node-self:: Pred -> Node -> Nodeset, or
295; node-self:: Converter -> Converter
296; Similar to select-kids but apply to the Node itself rather
297; than to its children. The resulting Nodeset will contain either one
298; component, or will be empty (if the Node failed the Pred).
299(define node-self filter)
300
301
302; node-join:: [LocPath] -> Node|Nodeset -> Nodeset, or
303; node-join:: [Converter] -> Converter
304; join the sequence of location steps or paths as described
305; in the title comments above.
306(define (node-join . selectors)
307 (lambda (nodeset) ; Nodeset or node
308 (let loop ((nodeset nodeset) (selectors selectors))
309 (if (null? selectors) nodeset
310 (loop
311 (if (nodeset? nodeset)
312 (map-union (car selectors) nodeset)
313 ((car selectors) nodeset))
314 (cdr selectors))))))
315
316
317; node-reduce:: [LocPath] -> Node|Nodeset -> Nodeset, or
318; node-reduce:: [Converter] -> Converter
319; A regular functional composition of converters.
320; From a different point of view,
321; ((apply node-reduce converters) nodeset)
322; is equivalent to
323; (foldl apply nodeset converters)
324; i.e., folding, or reducing, a list of converters with the nodeset
325; as a seed.
326(define (node-reduce . converters)
327 (lambda (nodeset) ; Nodeset or node
328 (let loop ((nodeset nodeset) (converters converters))
329 (if (null? converters) nodeset
330 (loop ((car converters) nodeset) (cdr converters))))))
331
332
333; node-or:: [Converter] -> Converter
334; This combinator applies all converters to a given node and
335; produces the union of their results.
336; This combinator corresponds to a union, '|' operation for XPath
337; location paths.
338; (define (node-or . converters)
339; (lambda (node-or-nodeset)
340; (if (null? converters) node-or-nodeset
341; (append
342; ((car converters) node-or-nodeset)
343; ((apply node-or (cdr converters)) node-or-nodeset)))))
344; More optimal implementation follows
345(define (node-or . converters)
346 (lambda (node-or-nodeset)
347 (let loop ((result '()) (converters converters))
348 (if (null? converters) result
349 (loop (append result (or ((car converters) node-or-nodeset) '()))
350 (cdr converters))))))
351
352
353; node-closure:: Converter -> Converter
354; Select all _descendants_ of a node that satisfy a converter-predicate.
355; This combinator is similar to select-kids but applies to
356; grand... children as well.
357; This combinator implements the "descendant::" XPath axis
358; Conceptually, this combinator can be expressed as
359; (define (node-closure f)
360; (node-or
361; (select-kids f)
362; (node-reduce (select-kids (node-typeof? '*)) (node-closure f))))
363; This definition, as written, looks somewhat like a fixpoint, and it
364; will run forever. It is obvious however that sooner or later
365; (select-kids (node-typeof? '*)) will return an empty nodeset. At
366; this point further iterations will no longer affect the result and
367; can be stopped.
368
369(define (node-closure test-pred?)
370 (lambda (node) ; Nodeset or node
371 (let loop ((parent node) (result '()))
372 (if (null? parent) result
373 (loop ((select-kids (node-typeof? '*)) parent)
374 (append result
375 ((select-kids test-pred?) parent)))
376 ))))
377
378; node-parent:: RootNode -> Converter
379; (node-parent rootnode) yields a converter that returns a parent of a
380; node it is applied to. If applied to a nodeset, it returns the list
381; of parents of nodes in the nodeset. The rootnode does not have
382; to be the root node of the whole SXML tree -- it may be a root node
383; of a branch of interest.
384; Given the notation of Philip Wadler's paper on semantics of XSLT,
385; parent(x) = { y | y=subnode*(root), x=subnode(y) }
386; Therefore, node-parent is not the fundamental converter: it can be
387; expressed through the existing ones. Yet node-parent is a rather
388; convenient converter. It corresponds to a parent:: axis of SXPath.
389; Note that the parent:: axis can be used with an attribute node as well!
390
391(define (node-parent rootnode)
392 (lambda (node) ; Nodeset or node
393 (if (nodeset? node) (map-union (node-parent rootnode) node)
394 (let ((pred
395 (node-or
396 (node-reduce
397 (node-self (node-typeof? '*))
398 (select-kids (node-eq? node)))
399 (node-join
400 (select-kids (node-typeof? '@))
401 (select-kids (node-eq? node))))))
402 ((node-or
403 (node-self pred)
404 (node-closure pred))
405 rootnode)))))
406
407;-------------------------
408; Evaluate an abbreviated SXPath
409; sxpath:: AbbrPath -> Converter, or
410; sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
411; AbbrPath is a list. It is translated to the full SXPath according
412; to the following rewriting rules
413; (sxpath '()) -> (node-join)
414; (sxpath '(path-component ...)) ->
415; (node-join (sxpath1 path-component) (sxpath '(...)))
416; (sxpath1 '//) -> (node-or
417; (node-self (node-typeof? '*any*))
418; (node-closure (node-typeof? '*any*)))
419; (sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
420; (sxpath1 '(eq? x)) -> (select-kids (node-eq? x))
421; (sxpath1 ?symbol) -> (select-kids (node-typeof? ?symbol)
422; (sxpath1 procedure) -> procedure
423; (sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
424; (sxpath1 '(path reducer ...)) ->
425; (node-reduce (sxpath path) (sxpathr reducer) ...)
426; (sxpathr number) -> (node-pos number)
427; (sxpathr path-filter) -> (filter (sxpath path-filter))
428
429(define (sxpath path)
430 (lambda (nodeset)
431 (let loop ((nodeset nodeset) (path path))
432 (cond
433 ((null? path) nodeset)
434 ((nodeset? nodeset)
435 (map-union (sxpath path) nodeset))
436 ((procedure? (car path))
437 (loop ((car path) nodeset) (cdr path)))
438 ((eq? '// (car path))
439 (loop
440 ((if (nodeset? nodeset) append cons) nodeset
441 ((node-closure (node-typeof? '*any*)) nodeset))
442 (cdr path)))
443 ((symbol? (car path))
444 (loop ((select-kids (node-typeof? (car path))) nodeset)
445 (cdr path)))
446 ((and (pair? (car path)) (eq? 'equal? (caar path)))
447 (loop ((select-kids (apply node-equal? (cdar path))) nodeset)
448 (cdr path)))
449 ((and (pair? (car path)) (eq? 'eq? (caar path)))
450 (loop ((select-kids (apply node-eq? (cdar path))) nodeset)
451 (cdr path)))
452 ((pair? (car path))
453 (let reducer ((nodeset
454 (if (symbol? (caar path))
455 ((select-kids (node-typeof? (caar path))) nodeset)
456 (loop nodeset (caar path))))
457 (reducing-path (cdar path)))
458 (cond
459 ((null? reducing-path) (loop nodeset (cdr path)))
460 ((number? (car reducing-path))
461 (reducer ((node-pos (car reducing-path)) nodeset)
462 (cdr reducing-path)))
463 (else
464 (reducer ((filter (sxpath (car reducing-path))) nodeset)
465 (cdr reducing-path))))))
466 (else
467 (error "Invalid path step: " (car path)))
468))))
469
470;------------------------------------------------------------------------
471; Sample XPath/SXPath expressions: regression test suite for the
472; implementation above.
473
474; A running example
475
476(define tree1
477 '(html
478 (head (title "Slides"))
479 (body
480 (p (@ (align "center"))
481 (table (@ (style "font-size: x-large"))
482 (tr
483 (td (@ (align "right")) "Talks ")
484 (td (@ (align "center")) " = ")
485 (td " slides + transition"))
486 (tr (td)
487 (td (@ (align "center")) " = ")
488 (td " data + control"))
489 (tr (td)
490 (td (@ (align "center")) " = ")
491 (td " programs"))))
492 (ul
493 (li (a (@ (href "slides/slide0001.gif")) "Introduction"))
494 (li (a (@ (href "slides/slide0010.gif")) "Summary")))
495 )))
496
497
498; Example from a posting "Re: DrScheme and XML",
499; Shriram Krishnamurthi, comp.lang.scheme, Nov. 26. 1999.
500; http://www.deja.com/getdoc.xp?AN=553507805
501(define tree3
502 '(poem (@ (title "The Lovesong of J. Alfred Prufrock")
503 (poet "T. S. Eliot"))
504 (stanza
505 (line "Let us go then, you and I,")
506 (line "When the evening is spread out against the sky")
507 (line "Like a patient etherized upon a table:"))
508 (stanza
509 (line "In the room the women come and go")
510 (line "Talking of Michaelangelo."))))
511
512; Validation Test harness
513
514(define-syntax run-test
515 (syntax-rules (define)
516 ((run-test "scan-exp" (define vars body))
517 (define vars (run-test "scan-exp" body)))
518 ((run-test "scan-exp" ?body)
519 (letrec-syntax
520 ((scan-exp ; (scan-exp body k)
521 (syntax-rules (quote quasiquote !)
522 ((scan-exp '() (k-head ! . args))
523 (k-head '() . args))
524 ((scan-exp (quote (hd . tl)) k)
525 (scan-lit-lst (hd . tl) (do-wrap ! quasiquote k)))
526 ((scan-exp (quasiquote (hd . tl)) k)
527 (scan-lit-lst (hd . tl) (do-wrap ! quasiquote k)))
528 ((scan-exp (quote x) (k-head ! . args))
529 (k-head
530 (if (string? (quote x)) (string->symbol (quote x)) (quote x))
531 . args))
532 ((scan-exp (hd . tl) k)
533 (scan-exp hd (do-tl ! scan-exp tl k)))
534 ((scan-exp x (k-head ! . args))
535 (k-head x . args))))
536 (do-tl
537 (syntax-rules (!)
538 ((do-tl processed-hd fn () (k-head ! . args))
539 (k-head (processed-hd) . args))
540 ((do-tl processed-hd fn old-tl k)
541 (fn old-tl (do-cons ! processed-hd k)))))
542 (do-cons
543 (syntax-rules (!)
544 ((do-cons processed-tl processed-hd (k-head ! . args))
545 (k-head (processed-hd . processed-tl) . args))))
546 (do-wrap
547 (syntax-rules (!)
548 ((do-wrap val fn (k-head ! . args))
549 (k-head (fn val) . args))))
550 (do-finish
551 (syntax-rules ()
552 ((do-finish new-body) new-body)))
553
554 (scan-lit-lst ; scan literal list
555 (syntax-rules (quote unquote unquote-splicing !)
556 ((scan-lit-lst '() (k-head ! . args))
557 (k-head '() . args))
558 ((scan-lit-lst (quote (hd . tl)) k)
559 (do-tl quote scan-lit-lst ((hd . tl)) k))
560 ((scan-lit-lst (unquote x) k)
561 (scan-exp x (do-wrap ! unquote k)))
562 ((scan-lit-lst (unquote-splicing x) k)
563 (scan-exp x (do-wrap ! unquote-splicing k)))
564 ((scan-lit-lst (quote x) (k-head ! . args))
565 (k-head
566 ,(if (string? (quote x)) (string->symbol (quote x)) (quote x))
567 . args))
568 ((scan-lit-lst (hd . tl) k)
569 (scan-lit-lst hd (do-tl ! scan-lit-lst tl k)))
570 ((scan-lit-lst x (k-head ! . args))
571 (k-head x . args))))
572 )
573 (scan-exp ?body (do-finish !))))
574 ((run-test body ...)
575 (begin
576 (run-test "scan-exp" body) ...))
577))
578
579; Overwrite the above macro to switch the tests off
580; (define-macro (run-test selector node expected-result) #f)
581
582; Location path, full form: child::para
583; Location path, abbreviated form: para
584; selects the para element children of the context node
585
586(let ((tree
587 '(elem (@) (para (@) "para") (br (@)) "cdata" (para (@) "second par"))
588 )
589 (expected '((para (@) "para") (para (@) "second par")))
590 )
591 (run-test (select-kids (node-typeof? 'para)) tree expected)
592 (run-test (sxpath '(para)) tree expected)
593)
594
595; Location path, full form: child::*
596; Location path, abbreviated form: *
597; selects all element children of the context node
598
599(let ((tree
600 '(elem (@) (para (@) "para") (br (@)) "cdata" (para "second par"))
601 )
602 (expected
603 '((para (@) "para") (br (@)) (para "second par")))
604 )
605 (run-test (select-kids (node-typeof? '*)) tree expected)
606 (run-test (sxpath '(*)) tree expected)
607)
608
609
610
611; Location path, full form: child::text()
612; Location path, abbreviated form: text()
613; selects all text node children of the context node
614(let ((tree
615 '(elem (@) (para (@) "para") (br (@)) "cdata" (para "second par"))
616 )
617 (expected
618 '("cdata"))
619 )
620 (run-test (select-kids (node-typeof? '*text*)) tree expected)
621 (run-test (sxpath '(*text*)) tree expected)
622)
623
624
625; Location path, full form: child::node()
626; Location path, abbreviated form: node()
627; selects all the children of the context node, whatever their node type
628(let* ((tree
629 '(elem (@) (para (@) "para") (br (@)) "cdata" (para "second par"))
630 )
631 (expected (cdr tree))
632 )
633 (run-test (select-kids (node-typeof? '*any*)) tree expected)
634 (run-test (sxpath '(*any*)) tree expected)
635)
636
637; Location path, full form: child::*/child::para
638; Location path, abbreviated form: */para
639; selects all para grandchildren of the context node
640
641(let ((tree
642 '(elem (@) (para (@) "para") (br (@)) "cdata" (para "second par")
643 (div (@ (name "aa")) (para "third para")))
644 )
645 (expected
646 '((para "third para")))
647 )
648 (run-test
649 (node-join (select-kids (node-typeof? '*))
650 (select-kids (node-typeof? 'para)))
651 tree expected)
652 (run-test (sxpath '(* para)) tree expected)
653)
654
655
656; Location path, full form: attribute::name
657; Location path, abbreviated form: @name
658; selects the 'name' attribute of the context node
659
660(let ((tree
661 '(elem (@ (name "elem") (id "idz"))
662 (para (@) "para") (br (@)) "cdata" (para (@) "second par")
663 (div (@ (name "aa")) (para (@) "third para")))
664 )
665 (expected
666 '((name "elem")))
667 )
668 (run-test
669 (node-join (select-kids (node-typeof? '@))
670 (select-kids (node-typeof? 'name)))
671 tree expected)
672 (run-test (sxpath '(@ name)) tree expected)
673)
674
675; Location path, full form: attribute::*
676; Location path, abbreviated form: @*
677; selects all the attributes of the context node
678(let ((tree
679 '(elem (@ (name "elem") (id "idz"))
680 (para (@) "para") (br (@)) "cdata" (para "second par")
681 (div (@ (name "aa")) (para (@) "third para")))
682 )
683 (expected
684 '((name "elem") (id "idz")))
685 )
686 (run-test
687 (node-join (select-kids (node-typeof? '@))
688 (select-kids (node-typeof? '*)))
689 tree expected)
690 (run-test (sxpath '(@ *)) tree expected)
691)
692
693
694; Location path, full form: descendant::para
695; Location path, abbreviated form: .//para
696; selects the para element descendants of the context node
697
698(let ((tree
699 '(elem (@ (name "elem") (id "idz"))
700 (para (@) "para") (br (@)) "cdata" (para "second par")
701 (div (@ (name "aa")) (para (@) "third para")))
702 )
703 (expected
704 '((para (@) "para") (para "second par") (para (@) "third para")))
705 )
706 (run-test
707 (node-closure (node-typeof? 'para))
708 tree expected)
709 (run-test (sxpath '(// para)) tree expected)
710)
711
712; Location path, full form: self::para
713; Location path, abbreviated form: _none_
714; selects the context node if it is a para element; otherwise selects nothing
715
716(let ((tree
717 '(elem (@ (name "elem") (id "idz"))
718 (para (@) "para") (br (@)) "cdata" (para "second par")
719 (div (@ (name "aa")) (para (@) "third para")))
720 )
721 )
722 (run-test (node-self (node-typeof? 'para)) tree '())
723 (run-test (node-self (node-typeof? 'elem)) tree (list tree))
724)
725
726; Location path, full form: descendant-or-self::node()
727; Location path, abbreviated form: //
728; selects the context node, all the children (including attribute nodes)
729; of the context node, and all the children of all the (element)
730; descendants of the context node.
731; This is _almost_ a powerset of the context node.
732(let* ((tree
733 '(para (@ (name "elem") (id "idz"))
734 (para (@) "para") (br (@)) "cdata" (para "second par")
735 (div (@ (name "aa")) (para (@) "third para")))
736 )
737 (expected
738 (cons tree
739 (append (cdr tree)
740 '((@) "para" (@) "second par"
741 (@ (name "aa")) (para (@) "third para")
742 (@) "third para"))))
743 )
744 (run-test
745 (node-or
746 (node-self (node-typeof? '*any*))
747 (node-closure (node-typeof? '*any*)))
748 tree expected)
749 (run-test (sxpath '(//)) tree expected)
750)
751
752; Location path, full form: ancestor::div
753; Location path, abbreviated form: _none_
754; selects all div ancestors of the context node
755; This Location expression is equivalent to the following:
756; /descendant-or-self::div[descendant::node() = curr_node]
757; This shows that the ancestor:: axis is actually redundant. Still,
758; it can be emulated as the following SXPath expression demonstrates.
759
760; The insight behind "ancestor::div" -- selecting all "div" ancestors
761; of the current node -- is
762; S[ancestor::div] context_node =
763; { y | y=subnode*(root), context_node=subnode(subnode*(y)),
764; isElement(y), name(y) = "div" }
765; We observe that
766; { y | y=subnode*(root), pred(y) }
767; can be expressed in SXPath as
768; ((node-or (node-self pred) (node-closure pred)) root-node)
769; The composite predicate 'isElement(y) & name(y) = "div"' corresponds to
770; (node-self (node-typeof? 'div)) in SXPath. Finally, filter
771; context_node=subnode(subnode*(y)) is tantamount to
772; (node-closure (node-eq? context-node)), whereas node-reduce denotes the
773; the composition of converters-predicates in the filtering context.
774
775(let*
776 ((root
777 '(div (@ (name "elem") (id "idz"))
778 (para (@) "para") (br (@)) "cdata" (para (@) "second par")
779 (div (@ (name "aa")) (para (@) "third para"))))
780 (context-node ; /descendant::any()[child::text() == "third para"]
781 (car
782 ((node-closure
783 (select-kids
784 (node-equal? "third para")))
785 root)))
786 (pred
787 (node-reduce (node-self (node-typeof? 'div))
788 (node-closure (node-eq? context-node))
789 ))
790 )
791 (run-test
792 (node-or
793 (node-self pred)
794 (node-closure pred))
795 root
796 (cons root
797 '((div (@ (name "aa")) (para (@) "third para")))))
798)
799
800
801
802; Location path, full form: child::div/descendant::para
803; Location path, abbreviated form: div//para
804; selects the para element descendants of the div element
805; children of the context node
806
807(let ((tree
808 '(elem (@ (name "elem") (id "idz"))
809 (para (@) "para") (br (@)) "cdata" (para "second par")
810 (div (@ (name "aa")) (para (@) "third para")
811 (div (para "fourth para"))))
812 )
813 (expected
814 '((para (@) "third para") (para "fourth para")))
815 )
816 (run-test
817 (node-join
818 (select-kids (node-typeof? 'div))
819 (node-closure (node-typeof? 'para)))
820 tree expected)
821 (run-test (sxpath '(div // para)) tree expected)
822)
823
824
825; Location path, full form: /descendant::olist/child::item
826; Location path, abbreviated form: //olist/item
827; selects all the item elements that have an olist parent (which is not root)
828; and that are in the same document as the context node
829; See the following test.
830
831; Location path, full form: /descendant::td/attribute::align
832; Location path, abbreviated form: //td/@align
833; Selects 'align' attributes of all 'td' elements in tree1
834(let ((tree tree1)
835 (expected
836 '((align "right") (align "center") (align "center") (align "center"))
837 ))
838 (run-test
839 (node-join
840 (node-closure (node-typeof? 'td))
841 (select-kids (node-typeof? '@))
842 (select-kids (node-typeof? 'align)))
843 tree expected)
844 (run-test (sxpath '(// td @ align)) tree expected)
845)
846
847
848; Location path, full form: /descendant::td[attribute::align]
849; Location path, abbreviated form: //td[@align]
850; Selects all td elements that have an attribute 'align' in tree1
851(let ((tree tree1)
852 (expected
853 '((td (@ (align "right")) "Talks ") (td (@ (align "center")) " = ")
854 (td (@ (align "center")) " = ") (td (@ (align "center")) " = "))
855 ))
856 (run-test
857 (node-reduce
858 (node-closure (node-typeof? 'td))
859 (filter
860 (node-join
861 (select-kids (node-typeof? '@))
862 (select-kids (node-typeof? 'align)))))
863 tree expected)
864 (run-test (sxpath `(// td ,(node-self (sxpath '(@ align))))) tree expected)
865 (run-test (sxpath '(// (td (@ align)))) tree expected)
866 (run-test (sxpath '(// ((td) (@ align)))) tree expected)
867 ; note! (sxpath ...) is a converter. Therefore, it can be used
868 ; as any other converter, for example, in the full-form SXPath.
869 ; Thus we can mix the full and abbreviated form SXPath's freely.
870 (run-test
871 (node-reduce
872 (node-closure (node-typeof? 'td))
873 (filter
874 (sxpath '(@ align))))
875 tree expected)
876)
877
878
879; Location path, full form: /descendant::td[attribute::align = "right"]
880; Location path, abbreviated form: //td[@align = "right"]
881; Selects all td elements that have an attribute align = "right" in tree1
882(let ((tree tree1)
883 (expected
884 '((td (@ (align "right")) "Talks "))
885 ))
886 (run-test
887 (node-reduce
888 (node-closure (node-typeof? 'td))
889 (filter
890 (node-join
891 (select-kids (node-typeof? '@))
892 (select-kids (node-equal? '(align "right"))))))
893 tree expected)
894 (run-test (sxpath '(// (td (@ (equal? (align "right")))))) tree expected)
895)
896
897; Location path, full form: child::para[position()=1]
898; Location path, abbreviated form: para[1]
899; selects the first para child of the context node
900(let ((tree
901 '(elem (@ (name "elem") (id "idz"))
902 (para (@) "para") (br (@)) "cdata" (para "second par")
903 (div (@ (name "aa")) (para (@) "third para")))
904 )
905 (expected
906 '((para (@) "para"))
907 ))
908 (run-test
909 (node-reduce
910 (select-kids (node-typeof? 'para))
911 (node-pos 1))
912 tree expected)
913 (run-test (sxpath '((para 1))) tree expected)
914)
915
916; Location path, full form: child::para[position()=last()]
917; Location path, abbreviated form: para[last()]
918; selects the last para child of the context node
919(let ((tree
920 '(elem (@ (name "elem") (id "idz"))
921 (para (@) "para") (br (@)) "cdata" (para "second par")
922 (div (@ (name "aa")) (para (@) "third para")))
923 )
924 (expected
925 '((para "second par"))
926 ))
927 (run-test
928 (node-reduce
929 (select-kids (node-typeof? 'para))
930 (node-pos -1))
931 tree expected)
932 (run-test (sxpath '((para -1))) tree expected)
933)
934
935; Illustrating the following Note of Sec 2.5 of XPath:
936; "NOTE: The location path //para[1] does not mean the same as the
937; location path /descendant::para[1]. The latter selects the first
938; descendant para element; the former selects all descendant para
939; elements that are the first para children of their parents."
940
941(let ((tree
942 '(elem (@ (name "elem") (id "idz"))
943 (para (@) "para") (br (@)) "cdata" (para "second par")
944 (div (@ (name "aa")) (para (@) "third para")))
945 )
946 )
947 (run-test
948 (node-reduce ; /descendant::para[1] in SXPath
949 (node-closure (node-typeof? 'para))
950 (node-pos 1))
951 tree '((para (@) "para")))
952 (run-test (sxpath '(// (para 1))) tree
953 '((para (@) "para") (para (@) "third para")))
954)
955
956; Location path, full form: parent::node()
957; Location path, abbreviated form: ..
958; selects the parent of the context node. The context node may be
959; an attribute node!
960; For the last test:
961; Location path, full form: parent::*/attribute::name
962; Location path, abbreviated form: ../@name
963; Selects the name attribute of the parent of the context node
964
965(let* ((tree
966 '(elem (@ (name "elem") (id "idz"))
967 (para (@) "para") (br (@)) "cdata" (para "second par")
968 (div (@ (name "aa")) (para (@) "third para")))
969 )
970 (para1 ; the first para node
971 (car ((sxpath '(para)) tree)))
972 (para3 ; the third para node
973 (car ((sxpath '(div para)) tree)))
974 (div ; div node
975 (car ((sxpath '(// div)) tree)))
976 )
977 (run-test
978 (node-parent tree)
979 para1 (list tree))
980 (run-test
981 (node-parent tree)
982 para3 (list div))
983 (run-test ; checking the parent of an attribute node
984 (node-parent tree)
985 ((sxpath '(@ name)) div) (list div))
986 (run-test
987 (node-join
988 (node-parent tree)
989 (select-kids (node-typeof? '@))
990 (select-kids (node-typeof? 'name)))
991 para3 '((name "aa")))
992 (run-test
993 (sxpath `(,(node-parent tree) @ name))
994 para3 '((name "aa")))
995)
996
997; Location path, full form: following-sibling::chapter[position()=1]
998; Location path, abbreviated form: none
999; selects the next chapter sibling of the context node
1000; The path is equivalent to
1001; let cnode = context-node
1002; in
1003; parent::* / child::chapter [take-after node_eq(self::*,cnode)]
1004; [position()=1]
1005(let* ((tree
1006 '(document
1007 (preface "preface")
1008 (chapter (@ (id "one")) "Chap 1 text")
1009 (chapter (@ (id "two")) "Chap 2 text")
1010 (chapter (@ (id "three")) "Chap 3 text")
1011 (chapter (@ (id "four")) "Chap 4 text")
1012 (epilogue "Epilogue text")
1013 (appendix (@ (id "A")) "App A text")
1014 (References "References"))
1015 )
1016 (a-node ; to be used as a context node
1017 (car ((sxpath '(// (chapter (@ (equal? (id "two")))))) tree)))
1018 (expected
1019 '((chapter (@ (id "three")) "Chap 3 text")))
1020 )
1021 (run-test
1022 (node-reduce
1023 (node-join
1024 (node-parent tree)
1025 (select-kids (node-typeof? 'chapter)))
1026 (take-after (node-eq? a-node))
1027 (node-pos 1)
1028 )
1029 a-node expected)
1030)
1031
1032; preceding-sibling::chapter[position()=1]
1033; selects the previous chapter sibling of the context node
1034; The path is equivalent to
1035; let cnode = context-node
1036; in
1037; parent::* / child::chapter [take-until node_eq(self::*,cnode)]
1038; [position()=-1]
1039(let* ((tree
1040 '(document
1041 (preface "preface")
1042 (chapter (@ (id "one")) "Chap 1 text")
1043 (chapter (@ (id "two")) "Chap 2 text")
1044 (chapter (@ (id "three")) "Chap 3 text")
1045 (chapter (@ (id "four")) "Chap 4 text")
1046 (epilogue "Epilogue text")
1047 (appendix (@ (id "A")) "App A text")
1048 (References "References"))
1049 )
1050 (a-node ; to be used as a context node
1051 (car ((sxpath '(// (chapter (@ (equal? (id "three")))))) tree)))
1052 (expected
1053 '((chapter (@ (id "two")) "Chap 2 text")))
1054 )
1055 (run-test
1056 (node-reduce
1057 (node-join
1058 (node-parent tree)
1059 (select-kids (node-typeof? 'chapter)))
1060 (take-until (node-eq? a-node))
1061 (node-pos -1)
1062 )
1063 a-node expected)
1064)
1065
1066
1067; /descendant::figure[position()=42]
1068; selects the forty-second figure element in the document
1069; See the next example, which is more general.
1070
1071; Location path, full form:
1072; child::table/child::tr[position()=2]/child::td[position()=3]
1073; Location path, abbreviated form: table/tr[2]/td[3]
1074; selects the third td of the second tr of the table
1075(let ((tree ((node-closure (node-typeof? 'p)) tree1))
1076 (expected
1077 '((td " data + control"))
1078 ))
1079 (run-test
1080 (node-join
1081 (select-kids (node-typeof? 'table))
1082 (node-reduce (select-kids (node-typeof? 'tr))
1083 (node-pos 2))
1084 (node-reduce (select-kids (node-typeof? 'td))
1085 (node-pos 3)))
1086 tree expected)
1087 (run-test (sxpath '(table (tr 2) (td 3))) tree expected)
1088)
1089
1090
1091; Location path, full form:
1092; child::para[attribute::type='warning'][position()=5]
1093; Location path, abbreviated form: para[@type='warning'][5]
1094; selects the fifth para child of the context node that has a type
1095; attribute with value warning
1096(let ((tree
1097 '(chapter
1098 (para "para1")
1099 (para (@ (type "warning")) "para 2")
1100 (para (@ (type "warning")) "para 3")
1101 (para (@ (type "warning")) "para 4")
1102 (para (@ (type "warning")) "para 5")
1103 (para (@ (type "warning")) "para 6"))
1104 )
1105 (expected
1106 '((para (@ (type "warning")) "para 6"))
1107 ))
1108 (run-test
1109 (node-reduce
1110 (select-kids (node-typeof? 'para))
1111 (filter
1112 (node-join
1113 (select-kids (node-typeof? '@))
1114 (select-kids (node-equal? '(type "warning")))))
1115 (node-pos 5))
1116 tree expected)
1117 (run-test (sxpath '( (((para (@ (equal? (type "warning"))))) 5 ) ))
1118 tree expected)
1119 (run-test (sxpath '( (para (@ (equal? (type "warning"))) 5 ) ))
1120 tree expected)
1121)
1122
1123
1124; Location path, full form:
1125; child::para[position()=5][attribute::type='warning']
1126; Location path, abbreviated form: para[5][@type='warning']
1127; selects the fifth para child of the context node if that child has a 'type'
1128; attribute with value warning
1129(let ((tree
1130 '(chapter
1131 (para "para1")
1132 (para (@ (type "warning")) "para 2")
1133 (para (@ (type "warning")) "para 3")
1134 (para (@ (type "warning")) "para 4")
1135 (para (@ (type "warning")) "para 5")
1136 (para (@ (type "warning")) "para 6"))
1137 )
1138 (expected
1139 '((para (@ (type "warning")) "para 5"))
1140 ))
1141 (run-test
1142 (node-reduce
1143 (select-kids (node-typeof? 'para))
1144 (node-pos 5)
1145 (filter
1146 (node-join
1147 (select-kids (node-typeof? '@))
1148 (select-kids (node-equal? '(type "warning"))))))
1149 tree expected)
1150 (run-test (sxpath '( (( (para 5)) (@ (equal? (type "warning"))))))
1151 tree expected)
1152 (run-test (sxpath '( (para 5 (@ (equal? (type "warning")))) ))
1153 tree expected)
1154)
1155
1156; Location path, full form:
1157; child::*[self::chapter or self::appendix]
1158; Location path, semi-abbreviated form: *[self::chapter or self::appendix]
1159; selects the chapter and appendix children of the context node
1160(let ((tree
1161 '(document
1162 (preface "preface")
1163 (chapter (@ (id "one")) "Chap 1 text")
1164 (chapter (@ (id "two")) "Chap 2 text")
1165 (chapter (@ (id "three")) "Chap 3 text")
1166 (epilogue "Epilogue text")
1167 (appendix (@ (id "A")) "App A text")
1168 (References "References"))
1169 )
1170 (expected
1171 '((chapter (@ (id "one")) "Chap 1 text")
1172 (chapter (@ (id "two")) "Chap 2 text")
1173 (chapter (@ (id "three")) "Chap 3 text")
1174 (appendix (@ (id "A")) "App A text"))
1175 ))
1176 (run-test
1177 (node-join
1178 (select-kids (node-typeof? '*))
1179 (filter
1180 (node-or
1181 (node-self (node-typeof? 'chapter))
1182 (node-self (node-typeof? 'appendix)))))
1183 tree expected)
1184 (run-test (sxpath `(* ,(node-or (node-self (node-typeof? 'chapter))
1185 (node-self (node-typeof? 'appendix)))))
1186 tree expected)
1187)
1188
1189
1190; Location path, full form: child::chapter[child::title='Introduction']
1191; Location path, abbreviated form: chapter[title = 'Introduction']
1192; selects the chapter children of the context node that have one or more
1193; title children with string-value equal to Introduction
1194; See a similar example: //td[@align = "right"] above.
1195
1196; Location path, full form: child::chapter[child::title]
1197; Location path, abbreviated form: chapter[title]
1198; selects the chapter children of the context node that have one or
1199; more title children
1200; See a similar example //td[@align] above.
1201
1202(cerr nl "Example with tree3: extracting the first lines of every stanza" nl)
1203(let ((tree tree3)
1204 (expected
1205 '("Let us go then, you and I," "In the room the women come and go")
1206 ))
1207 (run-test
1208 (node-join
1209 (node-closure (node-typeof? 'stanza))
1210 (node-reduce
1211 (select-kids (node-typeof? 'line)) (node-pos 1))
1212 (select-kids (node-typeof? '*text*)))
1213 tree expected)
1214 (run-test (sxpath '(// stanza (line 1) *text*)) tree expected)
1215)
1216