2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
4 @c See the file guile.texi for copying conditions.
6 @c Based on the documentation at
7 @c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
8 @c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
9 @c rest of `sxml-match'.)
11 @c Converted to Texinfo and modified by Ludovic Courtès, 2010.
14 @section @code{sxml-match}: Pattern Matching of SXML
16 @cindex pattern matching (SXML)
17 @cindex SXML pattern matching
19 The @code{(sxml match)} module provides syntactic forms for pattern
20 matching of SXML trees, in a ``by example'' style reminiscent of the
21 pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
22 systems. @xref{SXML}, for more information on SXML.
24 The following example@footnote{This example is taken from a paper by
25 Krishnamurthi et al. Their paper was the first to show the usefulness of the
26 @code{syntax-rules} style of pattern matching for transformation of XML, though
27 the language described, XT3D, is an XML language.} provides a brief
28 illustration, transforming a music album catalog language into HTML.
31 (define (album->html x)
33 [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
35 (li (b ,n) (i ,f)) ...)]))
38 Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
39 @code{sxml-match-let*}.
41 Compared to a standard s-expression pattern matcher (@pxref{Pattern
42 Matching}), @code{sxml-match} provides the following benefits:
46 matching of SXML elements does not depend on any degree of normalization of the
49 matching of SXML attributes (within an element) is under-ordered; the order of
50 the attributes specified within the pattern need not match the ordering with the
51 element being matched;
53 all attributes specified in the pattern must be present in the element being
54 matched; in the spirit that XML is 'extensible', the element being matched may
55 include additional attributes not specified in the pattern.
58 The present module is a descendant of WebIt!, and was inspired by an
59 s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
60 Dybvig at Indiana University.
62 @unnumberedsubsec Syntax
64 @code{sxml-match} provides @code{case}-like form for pattern matching of XML
67 @deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
68 Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
69 (one or more), each consisting of a pattern and one or more expressions to be
70 evaluated if the pattern match succeeds. Optionally, each @var{clause} within
71 @code{sxml-match} may include a @dfn{guard expression}.
74 The pattern notation is based on that of Scheme's @code{syntax-rules} and
75 @code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
79 match-form ::= (sxml-match input-expression
82 clause ::= [node-pattern action-expression+]
83 | [node-pattern (guard expression*) action-expression+]
85 node-pattern ::= literal-pattern
90 literal-pattern ::= string
96 attr-list-pattern ::= (@ attribute-pattern*)
97 | (@ attribute-pattern* . pat-var-or-cata)
99 attribute-pattern ::= (tag-symbol attr-val-pattern)
101 attr-val-pattern ::= literal-pattern
103 | (pat-var-or-cata default-value-expr)
105 element-pattern ::= (tag-symbol attr-list-pattern?)
106 | (tag-symbol attr-list-pattern? nodeset-pattern)
107 | (tag-symbol attr-list-pattern?
108 nodeset-pattern? . pat-var-or-cata)
110 list-pattern ::= (list nodeset-pattern)
111 | (list nodeset-pattern? . pat-var-or-cata)
114 nodeset-pattern ::= node-pattern
116 | node-pattern nodeset-pattern
117 | node-pattern ... nodeset-pattern
119 pat-var-or-cata ::= (unquote var-symbol)
120 | (unquote [var-symbol*])
121 | (unquote [cata-expression -> var-symbol*])
124 Within a list or element body pattern, ellipses may appear only once, but may be
125 followed by zero or more node patterns.
127 Guard expressions cannot refer to the return values of catamorphisms.
129 Ellipses in the output expressions must appear only in an expression context;
130 ellipses are not allowed in a syntactic form.
132 The sections below illustrate specific aspects of the @code{sxml-match} pattern
135 @unnumberedsubsec Matching XML Elements
137 The example below illustrates the pattern matching of an XML element:
140 (sxml-match '(e (@@ (i 1)) 3 4 5)
141 [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
145 Each clause in @code{sxml-match} contains two parts: a pattern and one or more
146 expressions which are evaluated if the pattern is successfully match. The
147 example above matches an element @code{e} with an attribute @code{i} and three
150 Pattern variables are must be ``unquoted'' in the pattern. The above expression
151 binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
154 @unnumberedsubsec Ellipses in Patterns
156 As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
157 Note that the pattern @code{item ...} specifies zero-or-more matches of the
160 The use of ellipses in a pattern is illustrated in the code fragment below,
161 where nested ellipses are used to match the children of repeated instances of an
162 @code{a} element, within an element @code{d}.
165 (define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
169 (list (list b ...) ...)])
172 The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
174 @unnumberedsubsec Ellipses in Quasiquote'd Output
176 Within the body of an @code{sxml-match} form, a slightly extended version of
177 quasiquote is provided, which allows the use of ellipses. This is illustrated
178 in the example below.
181 (sxml-match '(e 3 4 5 6 7)
182 [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
186 The general pattern is that @code{`(something ,i ...)} is rewritten as
187 @code{`(something ,@@i)}.
189 @unnumberedsubsec Matching Nodesets
191 A nodeset pattern is designated by a list in the pattern, beginning the
192 identifier list. The example below illustrates matching a nodeset.
195 (sxml-match '("i" "j" "k" "l" "m")
196 [(list ,a ,b ,c ,d ,e)
197 `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))])
200 This example wraps each nodeset item in an HTML paragraph element. This example
201 can be rewritten and simplified through using ellipsis:
204 (sxml-match '("i" "j" "k" "l" "m")
209 This version will match nodesets of any length, and wrap each item in the
210 nodeset in an HTML paragraph element.
212 @unnumberedsubsec Matching the ``Rest'' of a Nodeset
214 Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
215 at the end of an element or nodeset pattern.
217 This is illustrated in the example below:
220 (sxml-match '(e 3 (f 4 5 6) 7)
225 The above expression returns @code{(3 (4 5 6) 7)}.
227 @unnumberedsubsec Matching the Unmatched Attributes
229 Sometimes it is useful to bind a list of attributes present in the element being
230 matched, but which do not appear in the pattern. This is achieved by using a
231 @code{. rest)} pattern at the end of the attribute list pattern. This is
232 illustrated in the example below:
235 (sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
236 [(a (@@ (y ,www) . ,qqq) ,t ,u ,v)
237 (list www qqq t u v)])
240 The above expression matches the attribute @code{y} and binds a list of the
241 remaining attributes to the variable @var{qqq}. The result of the above
242 expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
244 This type of pattern also allows the binding of all attributes:
247 (sxml-match '(a (@@ (z 1) (y 2) (x 3)))
252 @unnumberedsubsec Default Values in Attribute Patterns
254 It is possible to specify a default value for an attribute which is used if the
255 attribute is not present in the element being matched. This is illustrated in
256 the following example:
259 (sxml-match '(e 3 4 5)
260 [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)])
263 The value @code{1} is used when the attribute @code{z} is absent from the
266 @unnumberedsubsec Guards in Patterns
268 Guards may be added to a pattern clause via the @code{guard} keyword. A guard
269 expression may include zero or more expressions which are evaluated only if the
270 pattern is matched. The body of the clause is only evaluated if the guard
271 expressions evaluate to @code{#t}.
273 The use of guard expressions is illustrated below:
277 ((a ,n) (guard (number? n)) n)
278 ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
281 @unnumberedsubsec Catamorphisms
283 The example below illustrates the use of explicit recursion within an
284 @code{sxml-match} form. This example implements a simple calculator for the
285 basic arithmetic operations, which are represented by the XML elements
286 @code{plus}, @code{minus}, @code{times}, and @code{div}.
292 [,i (guard (integer? i)) i]
293 [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
294 [(times ,x ,y) (* (simple-eval x) (simple-eval y))]
295 [(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
296 [(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
297 [,otherwise (error "simple-eval: invalid expression" x)])))
300 Using the catamorphism feature of @code{sxml-match}, a more concise version of
301 @code{simple-eval} can be written. The pattern @code{,[x]} recursively invokes
302 the pattern matcher on the value bound in this position.
308 [,i (guard (integer? i)) i]
309 [(plus ,[x] ,[y]) (+ x y)]
310 [(times ,[x] ,[y]) (* x y)]
311 [(minus ,[x] ,[y]) (- x y)]
312 [(div ,[x] ,[y]) (/ x y)]
313 [,otherwise (error "simple-eval: invalid expression" x)])))
316 @unnumberedsubsec Named-Catamorphisms
318 It is also possible to explicitly name the operator in the ``cata'' position.
319 Where @code{,[id*]} recurs to the top of the current @code{sxml-match},
320 @code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a
321 procedure which takes one argument, and returns as many values as there are
322 identifiers following @code{->}.
324 Named catamorphism patterns allow processing to be split into multiple, mutually
325 recursive procedures. This is illustrated in the example below: a
326 transformation that formats a ``TV Guide'' into HTML.
329 (define (tv-guide->html g)
330 (define (cast-list cl)
332 [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
333 `(div (ul (li ,ch ": " ,a) ...))]))
336 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
337 (Description ,desc ...))
341 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
342 (Description ,desc ...)
349 [(TVGuide (@@ (start ,start-date)
351 (Channel (Name ,nm) ,[prog -> p] ...) ...)
352 `(html (head (title "TV Guide"))
353 (body (h1 "TV Guide")
354 (div (h2 ,nm) ,p ...) ...))]))
357 @unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
359 @deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
360 @deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
361 These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
362 an XML pattern in the binding position, rather than a simple variable.
365 For example, the expression below:
368 (sxml-match-let ([(a ,i ,j) '(a 1 2)])
372 binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML