Merge commit '58147d67806e1f54c447d7eabac35b1a5086c3a6'
[bpt/guile.git] / doc / ref / sxml-match.texi
CommitLineData
400a5dcb
LC
1@c -*-texinfo-*-
2@c This is part of the GNU Guile Reference Manual.
3e31e75a 3@c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
400a5dcb
LC
4@c See the file guile.texi for copying conditions.
5@c
6@c Based on the documentation at
7@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
8@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
9@c rest of `sxml-match'.)
10@c
11@c Converted to Texinfo and modified by Ludovic Courtès, 2010.
12
13@node sxml-match
14@section @code{sxml-match}: Pattern Matching of SXML
15
16@cindex pattern matching (SXML)
17@cindex SXML pattern matching
18
3e31e75a
AW
19The @code{(sxml match)} module provides syntactic forms for pattern
20matching of SXML trees, in a ``by example'' style reminiscent of the
21pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
22systems. @xref{SXML}, for more information on SXML.
400a5dcb
LC
23
24The following example@footnote{This example is taken from a paper by
25Krishnamurthi et al. Their paper was the first to show the usefulness of the
26@code{syntax-rules} style of pattern matching for transformation of XML, though
27the language described, XT3D, is an XML language.} provides a brief
28illustration, transforming a music album catalog language into HTML.
29
30@lisp
31(define (album->html x)
32 (sxml-match x
bdfcabfe 33 [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
400a5dcb
LC
34 `(ul (li ,t)
35 (li (b ,n) (i ,f)) ...)]))
36@end lisp
37
38Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
39@code{sxml-match-let*}.
40
358663ca
LC
41Compared to a standard s-expression pattern matcher (@pxref{Pattern
42Matching}), @code{sxml-match} provides the following benefits:
400a5dcb
LC
43
44@itemize
45@item
46matching of SXML elements does not depend on any degree of normalization of the
47SXML;
48@item
49matching of SXML attributes (within an element) is under-ordered; the order of
50the attributes specified within the pattern need not match the ordering with the
51element being matched;
52@item
53all attributes specified in the pattern must be present in the element being
54matched; in the spirit that XML is 'extensible', the element being matched may
55include additional attributes not specified in the pattern.
56@end itemize
57
58The present module is a descendant of WebIt!, and was inspired by an
59s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
60Dybvig at Indiana University.
61
62@unnumberedsubsec Syntax
63
64@code{sxml-match} provides @code{case}-like form for pattern matching of XML
65nodes.
66
df0a1002 67@deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
400a5dcb
LC
68Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
69(one or more), each consisting of a pattern and one or more expressions to be
70evaluated if the pattern match succeeds. Optionally, each @var{clause} within
71@code{sxml-match} may include a @dfn{guard expression}.
72@end deffn
73
74The pattern notation is based on that of Scheme's @code{syntax-rules} and
75@code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
76is given below:
77
78@verbatim
79match-form ::= (sxml-match input-expression
80 clause+)
81
82clause ::= [node-pattern action-expression+]
83 | [node-pattern (guard expression*) action-expression+]
84
85node-pattern ::= literal-pattern
86 | pat-var-or-cata
87 | element-pattern
88 | list-pattern
89
90literal-pattern ::= string
91 | character
92 | number
93 | #t
94 | #f
95
96attr-list-pattern ::= (@ attribute-pattern*)
97 | (@ attribute-pattern* . pat-var-or-cata)
98
99attribute-pattern ::= (tag-symbol attr-val-pattern)
100
101attr-val-pattern ::= literal-pattern
102 | pat-var-or-cata
103 | (pat-var-or-cata default-value-expr)
104
105element-pattern ::= (tag-symbol attr-list-pattern?)
106 | (tag-symbol attr-list-pattern? nodeset-pattern)
107 | (tag-symbol attr-list-pattern?
108 nodeset-pattern? . pat-var-or-cata)
109
110list-pattern ::= (list nodeset-pattern)
111 | (list nodeset-pattern? . pat-var-or-cata)
112 | (list)
113
114nodeset-pattern ::= node-pattern
115 | node-pattern ...
116 | node-pattern nodeset-pattern
117 | node-pattern ... nodeset-pattern
118
119pat-var-or-cata ::= (unquote var-symbol)
120 | (unquote [var-symbol*])
121 | (unquote [cata-expression -> var-symbol*])
122@end verbatim
123
124Within a list or element body pattern, ellipses may appear only once, but may be
125followed by zero or more node patterns.
126
127Guard expressions cannot refer to the return values of catamorphisms.
128
129Ellipses in the output expressions must appear only in an expression context;
130ellipses are not allowed in a syntactic form.
131
132The sections below illustrate specific aspects of the @code{sxml-match} pattern
133matcher.
134
135@unnumberedsubsec Matching XML Elements
136
137The example below illustrates the pattern matching of an XML element:
138
139@lisp
bdfcabfe
LC
140(sxml-match '(e (@@ (i 1)) 3 4 5)
141 [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
400a5dcb
LC
142 [,otherwise #f])
143@end lisp
144
145Each clause in @code{sxml-match} contains two parts: a pattern and one or more
146expressions which are evaluated if the pattern is successfully match. The
147example above matches an element @code{e} with an attribute @code{i} and three
148children.
149
150Pattern variables are must be ``unquoted'' in the pattern. The above expression
151binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
152to @code{5}.
153
154@unnumberedsubsec Ellipses in Patterns
155
156As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
157Note that the pattern @code{item ...} specifies zero-or-more matches of the
158pattern @code{item}.
159
160The use of ellipses in a pattern is illustrated in the code fragment below,
161where nested ellipses are used to match the children of repeated instances of an
162@code{a} element, within an element @code{d}.
163
164@lisp
165(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
166
167(sxml-match x
168 [(d (a ,b ...) ...)
169 (list (list b ...) ...)])
170@end lisp
171
172The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
173
174@unnumberedsubsec Ellipses in Quasiquote'd Output
175
176Within the body of an @code{sxml-match} form, a slightly extended version of
177quasiquote is provided, which allows the use of ellipses. This is illustrated
178in the example below.
179
180@lisp
181(sxml-match '(e 3 4 5 6 7)
182 [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
183 [,otherwise #f])
184@end lisp
185
186The general pattern is that @code{`(something ,i ...)} is rewritten as
187@code{`(something ,@@i)}.
188
189@unnumberedsubsec Matching Nodesets
190
191A nodeset pattern is designated by a list in the pattern, beginning the
192identifier list. The example below illustrates matching a nodeset.
193
194@lisp
195(sxml-match '("i" "j" "k" "l" "m")
196 [(list ,a ,b ,c ,d ,e)
197 `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))])
198@end lisp
199
200This example wraps each nodeset item in an HTML paragraph element. This example
201can be rewritten and simplified through using ellipsis:
202
203@lisp
204(sxml-match '("i" "j" "k" "l" "m")
205 [(list ,i ...)
206 `((p ,i) ...)])
207@end lisp
208
209This version will match nodesets of any length, and wrap each item in the
210nodeset in an HTML paragraph element.
211
212@unnumberedsubsec Matching the ``Rest'' of a Nodeset
213
214Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
215at the end of an element or nodeset pattern.
216
217This is illustrated in the example below:
218
219@lisp
220(sxml-match '(e 3 (f 4 5 6) 7)
221 [(e ,a (f . ,y) ,d)
222 (list a y d)])
223@end lisp
224
225The above expression returns @code{(3 (4 5 6) 7)}.
226
227@unnumberedsubsec Matching the Unmatched Attributes
228
229Sometimes it is useful to bind a list of attributes present in the element being
230matched, but which do not appear in the pattern. This is achieved by using a
231@code{. rest)} pattern at the end of the attribute list pattern. This is
232illustrated in the example below:
233
234@lisp
bdfcabfe
LC
235(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
236 [(a (@@ (y ,www) . ,qqq) ,t ,u ,v)
400a5dcb
LC
237 (list www qqq t u v)])
238@end lisp
239
240The above expression matches the attribute @code{y} and binds a list of the
241remaining attributes to the variable @var{qqq}. The result of the above
242expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
243
244This type of pattern also allows the binding of all attributes:
245
246@lisp
bdfcabfe
LC
247(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
248 [(a (@@ . ,qqq))
400a5dcb
LC
249 qqq])
250@end lisp
251
252@unnumberedsubsec Default Values in Attribute Patterns
253
254It is possible to specify a default value for an attribute which is used if the
255attribute is not present in the element being matched. This is illustrated in
256the following example:
257
258@lisp
259(sxml-match '(e 3 4 5)
bdfcabfe 260 [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)])
400a5dcb
LC
261@end lisp
262
263The value @code{1} is used when the attribute @code{z} is absent from the
264element @code{e}.
265
266@unnumberedsubsec Guards in Patterns
267
268Guards may be added to a pattern clause via the @code{guard} keyword. A guard
269expression may include zero or more expressions which are evaluated only if the
270pattern is matched. The body of the clause is only evaluated if the guard
271expressions evaluate to @code{#t}.
272
273The use of guard expressions is illustrated below:
274
275@lisp
276(sxml-match '(a 2 3)
277 ((a ,n) (guard (number? n)) n)
278 ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
279@end lisp
280
281@unnumberedsubsec Catamorphisms
282
283The example below illustrates the use of explicit recursion within an
284@code{sxml-match} form. This example implements a simple calculator for the
285basic arithmetic operations, which are represented by the XML elements
286@code{plus}, @code{minus}, @code{times}, and @code{div}.
287
288@lisp
289(define simple-eval
290 (lambda (x)
291 (sxml-match x
292 [,i (guard (integer? i)) i]
293 [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
294 [(times ,x ,y) (* (simple-eval x) (simple-eval y))]
295 [(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
296 [(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
297 [,otherwise (error "simple-eval: invalid expression" x)])))
298@end lisp
299
300Using the catamorphism feature of @code{sxml-match}, a more concise version of
ecb87335 301@code{simple-eval} can be written. The pattern @code{,[x]} recursively invokes
400a5dcb
LC
302the pattern matcher on the value bound in this position.
303
304@lisp
305(define simple-eval
306 (lambda (x)
307 (sxml-match x
308 [,i (guard (integer? i)) i]
309 [(plus ,[x] ,[y]) (+ x y)]
310 [(times ,[x] ,[y]) (* x y)]
311 [(minus ,[x] ,[y]) (- x y)]
312 [(div ,[x] ,[y]) (/ x y)]
313 [,otherwise (error "simple-eval: invalid expression" x)])))
314@end lisp
315
316@unnumberedsubsec Named-Catamorphisms
317
318It is also possible to explicitly name the operator in the ``cata'' position.
319Where @code{,[id*]} recurs to the top of the current @code{sxml-match},
320@code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a
321procedure which takes one argument, and returns as many values as there are
322identifiers following @code{->}.
323
324Named catamorphism patterns allow processing to be split into multiple, mutually
325recursive procedures. This is illustrated in the example below: a
bdfcabfe 326transformation that formats a ``TV Guide'' into HTML.
400a5dcb
LC
327
328@lisp
329(define (tv-guide->html g)
330 (define (cast-list cl)
331 (sxml-match cl
332 [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
333 `(div (ul (li ,ch ": " ,a) ...))]))
334 (define (prog p)
335 (sxml-match p
336 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
337 (Description ,desc ...))
338 `(div (p ,start-time
339 (br) ,series-title
340 (br) ,desc ...))]
341 [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
342 (Description ,desc ...)
343 ,[cast-list -> cl])
344 `(div (p ,start-time
345 (br) ,series-title
346 (br) ,desc ...)
347 ,cl)]))
348 (sxml-match g
bdfcabfe 349 [(TVGuide (@@ (start ,start-date)
400a5dcb
LC
350 (end ,end-date))
351 (Channel (Name ,nm) ,[prog -> p] ...) ...)
352 `(html (head (title "TV Guide"))
353 (body (h1 "TV Guide")
354 (div (h2 ,nm) ,p ...) ...))]))
355@end lisp
356
357@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
358
9cbcc73f
BT
359@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
360@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
400a5dcb
LC
361These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
362an XML pattern in the binding position, rather than a simple variable.
363@end deffn
364
365For example, the expression below:
366
367@lisp
368(sxml-match-let ([(a ,i ,j) '(a 1 2)])
369 (+ i j))
370@end lisp
371
372binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
373value given.
374
375@c Local Variables:
376@c coding: utf-8
377@c End: