elisp @@ macro
[bpt/guile.git] / doc / ref / sxml-match.texi
1 @c -*-texinfo-*-
2 @c This is part of the GNU Guile Reference Manual.
3 @c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
4 @c See the file guile.texi for copying conditions.
5 @c
6 @c Based on the documentation at
7 @c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
8 @c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
9 @c rest of `sxml-match'.)
10 @c
11 @c Converted to Texinfo and modified by Ludovic Courtès, 2010.
12
13 @node sxml-match
14 @section @code{sxml-match}: Pattern Matching of SXML
15
16 @cindex pattern matching (SXML)
17 @cindex SXML pattern matching
18
19 The @code{(sxml match)} module provides syntactic forms for pattern
20 matching of SXML trees, in a ``by example'' style reminiscent of the
21 pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
22 systems. @xref{SXML}, for more information on SXML.
23
24 The following example@footnote{This example is taken from a paper by
25 Krishnamurthi et al. Their paper was the first to show the usefulness of the
26 @code{syntax-rules} style of pattern matching for transformation of XML, though
27 the language described, XT3D, is an XML language.} provides a brief
28 illustration, transforming a music album catalog language into HTML.
29
30 @lisp
31 (define (album->html x)
32 (sxml-match x
33 [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
34 `(ul (li ,t)
35 (li (b ,n) (i ,f)) ...)]))
36 @end lisp
37
38 Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
39 @code{sxml-match-let*}.
40
41 Compared to a standard s-expression pattern matcher (@pxref{Pattern
42 Matching}), @code{sxml-match} provides the following benefits:
43
44 @itemize
45 @item
46 matching of SXML elements does not depend on any degree of normalization of the
47 SXML;
48 @item
49 matching of SXML attributes (within an element) is under-ordered; the order of
50 the attributes specified within the pattern need not match the ordering with the
51 element being matched;
52 @item
53 all attributes specified in the pattern must be present in the element being
54 matched; in the spirit that XML is 'extensible', the element being matched may
55 include additional attributes not specified in the pattern.
56 @end itemize
57
58 The present module is a descendant of WebIt!, and was inspired by an
59 s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
60 Dybvig at Indiana University.
61
62 @unnumberedsubsec Syntax
63
64 @code{sxml-match} provides @code{case}-like form for pattern matching of XML
65 nodes.
66
67 @deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
68 Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
69 (one or more), each consisting of a pattern and one or more expressions to be
70 evaluated if the pattern match succeeds. Optionally, each @var{clause} within
71 @code{sxml-match} may include a @dfn{guard expression}.
72 @end deffn
73
74 The pattern notation is based on that of Scheme's @code{syntax-rules} and
75 @code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
76 is given below:
77
78 @verbatim
79 match-form ::= (sxml-match input-expression
80 clause+)
81
82 clause ::= [node-pattern action-expression+]
83 | [node-pattern (guard expression*) action-expression+]
84
85 node-pattern ::= literal-pattern
86 | pat-var-or-cata
87 | element-pattern
88 | list-pattern
89
90 literal-pattern ::= string
91 | character
92 | number
93 | #t
94 | #f
95
96 attr-list-pattern ::= (@ attribute-pattern*)
97 | (@ attribute-pattern* . pat-var-or-cata)
98
99 attribute-pattern ::= (tag-symbol attr-val-pattern)
100
101 attr-val-pattern ::= literal-pattern
102 | pat-var-or-cata
103 | (pat-var-or-cata default-value-expr)
104
105 element-pattern ::= (tag-symbol attr-list-pattern?)
106 | (tag-symbol attr-list-pattern? nodeset-pattern)
107 | (tag-symbol attr-list-pattern?
108 nodeset-pattern? . pat-var-or-cata)
109
110 list-pattern ::= (list nodeset-pattern)
111 | (list nodeset-pattern? . pat-var-or-cata)
112 | (list)
113
114 nodeset-pattern ::= node-pattern
115 | node-pattern ...
116 | node-pattern nodeset-pattern
117 | node-pattern ... nodeset-pattern
118
119 pat-var-or-cata ::= (unquote var-symbol)
120 | (unquote [var-symbol*])
121 | (unquote [cata-expression -> var-symbol*])
122 @end verbatim
123
124 Within a list or element body pattern, ellipses may appear only once, but may be
125 followed by zero or more node patterns.
126
127 Guard expressions cannot refer to the return values of catamorphisms.
128
129 Ellipses in the output expressions must appear only in an expression context;
130 ellipses are not allowed in a syntactic form.
131
132 The sections below illustrate specific aspects of the @code{sxml-match} pattern
133 matcher.
134
135 @unnumberedsubsec Matching XML Elements
136
137 The example below illustrates the pattern matching of an XML element:
138
139 @lisp
140 (sxml-match '(e (@@ (i 1)) 3 4 5)
141 [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
142 [,otherwise #f])
143 @end lisp
144
145 Each clause in @code{sxml-match} contains two parts: a pattern and one or more
146 expressions which are evaluated if the pattern is successfully match. The
147 example above matches an element @code{e} with an attribute @code{i} and three
148 children.
149
150 Pattern variables are must be ``unquoted'' in the pattern. The above expression
151 binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
152 to @code{5}.
153
154 @unnumberedsubsec Ellipses in Patterns
155
156 As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
157 Note that the pattern @code{item ...} specifies zero-or-more matches of the
158 pattern @code{item}.
159
160 The use of ellipses in a pattern is illustrated in the code fragment below,
161 where nested ellipses are used to match the children of repeated instances of an
162 @code{a} element, within an element @code{d}.
163
164 @lisp
165 (define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
166
167 (sxml-match x
168 [(d (a ,b ...) ...)
169 (list (list b ...) ...)])
170 @end lisp
171
172 The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
173
174 @unnumberedsubsec Ellipses in Quasiquote'd Output
175
176 Within the body of an @code{sxml-match} form, a slightly extended version of
177 quasiquote is provided, which allows the use of ellipses. This is illustrated
178 in the example below.
179
180 @lisp
181 (sxml-match '(e 3 4 5 6 7)
182 [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
183 [,otherwise #f])
184 @end lisp
185
186 The general pattern is that @code{`(something ,i ...)} is rewritten as
187 @code{`(something ,@@i)}.
188
189 @unnumberedsubsec Matching Nodesets
190
191 A nodeset pattern is designated by a list in the pattern, beginning the
192 identifier list. The example below illustrates matching a nodeset.
193
194 @lisp
195 (sxml-match '("i" "j" "k" "l" "m")
196 [(list ,a ,b ,c ,d ,e)
197 `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))])
198 @end lisp
199
200 This example wraps each nodeset item in an HTML paragraph element. This example
201 can be rewritten and simplified through using ellipsis:
202
203 @lisp
204 (sxml-match '("i" "j" "k" "l" "m")
205 [(list ,i ...)
206 `((p ,i) ...)])
207 @end lisp
208
209 This version will match nodesets of any length, and wrap each item in the
210 nodeset in an HTML paragraph element.
211
212 @unnumberedsubsec Matching the ``Rest'' of a Nodeset
213
214 Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
215 at the end of an element or nodeset pattern.
216
217 This is illustrated in the example below:
218
219 @lisp
220 (sxml-match '(e 3 (f 4 5 6) 7)
221 [(e ,a (f . ,y) ,d)
222 (list a y d)])
223 @end lisp
224
225 The above expression returns @code{(3 (4 5 6) 7)}.
226
227 @unnumberedsubsec Matching the Unmatched Attributes
228
229 Sometimes it is useful to bind a list of attributes present in the element being
230 matched, but which do not appear in the pattern. This is achieved by using a
231 @code{. rest)} pattern at the end of the attribute list pattern. This is
232 illustrated in the example below:
233
234 @lisp
235 (sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
236 [(a (@@ (y ,www) . ,qqq) ,t ,u ,v)
237 (list www qqq t u v)])
238 @end lisp
239
240 The above expression matches the attribute @code{y} and binds a list of the
241 remaining attributes to the variable @var{qqq}. The result of the above
242 expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
243
244 This type of pattern also allows the binding of all attributes:
245
246 @lisp
247 (sxml-match '(a (@@ (z 1) (y 2) (x 3)))
248 [(a (@@ . ,qqq))
249 qqq])
250 @end lisp
251
252 @unnumberedsubsec Default Values in Attribute Patterns
253
254 It is possible to specify a default value for an attribute which is used if the
255 attribute is not present in the element being matched. This is illustrated in
256 the following example:
257
258 @lisp
259 (sxml-match '(e 3 4 5)
260 [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)])
261 @end lisp
262
263 The value @code{1} is used when the attribute @code{z} is absent from the
264 element @code{e}.
265
266 @unnumberedsubsec Guards in Patterns
267
268 Guards may be added to a pattern clause via the @code{guard} keyword. A guard
269 expression may include zero or more expressions which are evaluated only if the
270 pattern is matched. The body of the clause is only evaluated if the guard
271 expressions evaluate to @code{#t}.
272
273 The use of guard expressions is illustrated below:
274
275 @lisp
276 (sxml-match '(a 2 3)
277 ((a ,n) (guard (number? n)) n)
278 ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
279 @end lisp
280
281 @unnumberedsubsec Catamorphisms
282
283 The example below illustrates the use of explicit recursion within an
284 @code{sxml-match} form. This example implements a simple calculator for the
285 basic arithmetic operations, which are represented by the XML elements
286 @code{plus}, @code{minus}, @code{times}, and @code{div}.
287
288 @lisp
289 (define simple-eval
290 (lambda (x)
291 (sxml-match x
292 [,i (guard (integer? i)) i]
293 [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
294 [(times ,x ,y) (* (simple-eval x) (simple-eval y))]
295 [(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
296 [(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
297 [,otherwise (error "simple-eval: invalid expression" x)])))
298 @end lisp
299
300 Using the catamorphism feature of @code{sxml-match}, a more concise version of
301 @code{simple-eval} can be written. The pattern @code{,[x]} recursively invokes
302 the pattern matcher on the value bound in this position.
303
304 @lisp
305 (define simple-eval
306 (lambda (x)
307 (sxml-match x
308 [,i (guard (integer? i)) i]
309 [(plus ,[x] ,[y]) (+ x y)]
310 [(times ,[x] ,[y]) (* x y)]
311 [(minus ,[x] ,[y]) (- x y)]
312 [(div ,[x] ,[y]) (/ x y)]
313 [,otherwise (error "simple-eval: invalid expression" x)])))
314 @end lisp
315
316 @unnumberedsubsec Named-Catamorphisms
317
318 It is also possible to explicitly name the operator in the ``cata'' position.
319 Where @code{,[id*]} recurs to the top of the current @code{sxml-match},
320 @code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a
321 procedure which takes one argument, and returns as many values as there are
322 identifiers following @code{->}.
323
324 Named catamorphism patterns allow processing to be split into multiple, mutually
325 recursive procedures. This is illustrated in the example below: a
326 transformation that formats a ``TV Guide'' into HTML.
327
328 @lisp
329 (define (tv-guide->html g)
330 (define (cast-list cl)
331 (sxml-match cl
332 [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
333 `(div (ul (li ,ch ": " ,a) ...))]))
334 (define (prog p)
335 (sxml-match p
336 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
337 (Description ,desc ...))
338 `(div (p ,start-time
339 (br) ,series-title
340 (br) ,desc ...))]
341 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
342 (Description ,desc ...)
343 ,[cast-list -> cl])
344 `(div (p ,start-time
345 (br) ,series-title
346 (br) ,desc ...)
347 ,cl)]))
348 (sxml-match g
349 [(TVGuide (@@ (start ,start-date)
350 (end ,end-date))
351 (Channel (Name ,nm) ,[prog -> p] ...) ...)
352 `(html (head (title "TV Guide"))
353 (body (h1 "TV Guide")
354 (div (h2 ,nm) ,p ...) ...))]))
355 @end lisp
356
357 @unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
358
359 @deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
360 @deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
361 These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
362 an XML pattern in the binding position, rather than a simple variable.
363 @end deffn
364
365 For example, the expression below:
366
367 @lisp
368 (sxml-match-let ([(a ,i ,j) '(a 1 2)])
369 (+ i j))
370 @end lisp
371
372 binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
373 value given.
374
375 @c Local Variables:
376 @c coding: utf-8
377 @c End: