Commit | Line | Data |
---|---|---|
400a5dcb LC |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
3e31e75a | 3 | @c Copyright (C) 2010, 2013 Free Software Foundation, Inc. |
400a5dcb LC |
4 | @c See the file guile.texi for copying conditions. |
5 | @c | |
6 | @c Based on the documentation at | |
7 | @c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>, | |
8 | @c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the | |
9 | @c rest of `sxml-match'.) | |
10 | @c | |
11 | @c Converted to Texinfo and modified by Ludovic Courtès, 2010. | |
12 | ||
13 | @node sxml-match | |
14 | @section @code{sxml-match}: Pattern Matching of SXML | |
15 | ||
16 | @cindex pattern matching (SXML) | |
17 | @cindex SXML pattern matching | |
18 | ||
3e31e75a AW |
19 | The @code{(sxml match)} module provides syntactic forms for pattern |
20 | matching of SXML trees, in a ``by example'' style reminiscent of the | |
21 | pattern matching of the @code{syntax-rules} and @code{syntax-case} macro | |
22 | systems. @xref{SXML}, for more information on SXML. | |
400a5dcb LC |
23 | |
24 | The following example@footnote{This example is taken from a paper by | |
25 | Krishnamurthi et al. Their paper was the first to show the usefulness of the | |
26 | @code{syntax-rules} style of pattern matching for transformation of XML, though | |
27 | the language described, XT3D, is an XML language.} provides a brief | |
28 | illustration, transforming a music album catalog language into HTML. | |
29 | ||
30 | @lisp | |
31 | (define (album->html x) | |
32 | (sxml-match x | |
bdfcabfe | 33 | [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...) |
400a5dcb LC |
34 | `(ul (li ,t) |
35 | (li (b ,n) (i ,f)) ...)])) | |
36 | @end lisp | |
37 | ||
38 | Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and | |
39 | @code{sxml-match-let*}. | |
40 | ||
358663ca LC |
41 | Compared to a standard s-expression pattern matcher (@pxref{Pattern |
42 | Matching}), @code{sxml-match} provides the following benefits: | |
400a5dcb LC |
43 | |
44 | @itemize | |
45 | @item | |
46 | matching of SXML elements does not depend on any degree of normalization of the | |
47 | SXML; | |
48 | @item | |
49 | matching of SXML attributes (within an element) is under-ordered; the order of | |
50 | the attributes specified within the pattern need not match the ordering with the | |
51 | element being matched; | |
52 | @item | |
53 | all attributes specified in the pattern must be present in the element being | |
54 | matched; in the spirit that XML is 'extensible', the element being matched may | |
55 | include additional attributes not specified in the pattern. | |
56 | @end itemize | |
57 | ||
58 | The present module is a descendant of WebIt!, and was inspired by an | |
59 | s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent | |
60 | Dybvig at Indiana University. | |
61 | ||
62 | @unnumberedsubsec Syntax | |
63 | ||
64 | @code{sxml-match} provides @code{case}-like form for pattern matching of XML | |
65 | nodes. | |
66 | ||
df0a1002 | 67 | @deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{} |
400a5dcb LC |
68 | Match @var{input-expression}, an SXML tree, according to the given @var{clause}s |
69 | (one or more), each consisting of a pattern and one or more expressions to be | |
70 | evaluated if the pattern match succeeds. Optionally, each @var{clause} within | |
71 | @code{sxml-match} may include a @dfn{guard expression}. | |
72 | @end deffn | |
73 | ||
74 | The pattern notation is based on that of Scheme's @code{syntax-rules} and | |
75 | @code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax | |
76 | is given below: | |
77 | ||
78 | @verbatim | |
79 | match-form ::= (sxml-match input-expression | |
80 | clause+) | |
81 | ||
82 | clause ::= [node-pattern action-expression+] | |
83 | | [node-pattern (guard expression*) action-expression+] | |
84 | ||
85 | node-pattern ::= literal-pattern | |
86 | | pat-var-or-cata | |
87 | | element-pattern | |
88 | | list-pattern | |
89 | ||
90 | literal-pattern ::= string | |
91 | | character | |
92 | | number | |
93 | | #t | |
94 | | #f | |
95 | ||
96 | attr-list-pattern ::= (@ attribute-pattern*) | |
97 | | (@ attribute-pattern* . pat-var-or-cata) | |
98 | ||
99 | attribute-pattern ::= (tag-symbol attr-val-pattern) | |
100 | ||
101 | attr-val-pattern ::= literal-pattern | |
102 | | pat-var-or-cata | |
103 | | (pat-var-or-cata default-value-expr) | |
104 | ||
105 | element-pattern ::= (tag-symbol attr-list-pattern?) | |
106 | | (tag-symbol attr-list-pattern? nodeset-pattern) | |
107 | | (tag-symbol attr-list-pattern? | |
108 | nodeset-pattern? . pat-var-or-cata) | |
109 | ||
110 | list-pattern ::= (list nodeset-pattern) | |
111 | | (list nodeset-pattern? . pat-var-or-cata) | |
112 | | (list) | |
113 | ||
114 | nodeset-pattern ::= node-pattern | |
115 | | node-pattern ... | |
116 | | node-pattern nodeset-pattern | |
117 | | node-pattern ... nodeset-pattern | |
118 | ||
119 | pat-var-or-cata ::= (unquote var-symbol) | |
120 | | (unquote [var-symbol*]) | |
121 | | (unquote [cata-expression -> var-symbol*]) | |
122 | @end verbatim | |
123 | ||
124 | Within a list or element body pattern, ellipses may appear only once, but may be | |
125 | followed by zero or more node patterns. | |
126 | ||
127 | Guard expressions cannot refer to the return values of catamorphisms. | |
128 | ||
129 | Ellipses in the output expressions must appear only in an expression context; | |
130 | ellipses are not allowed in a syntactic form. | |
131 | ||
132 | The sections below illustrate specific aspects of the @code{sxml-match} pattern | |
133 | matcher. | |
134 | ||
135 | @unnumberedsubsec Matching XML Elements | |
136 | ||
137 | The example below illustrates the pattern matching of an XML element: | |
138 | ||
139 | @lisp | |
bdfcabfe LC |
140 | (sxml-match '(e (@@ (i 1)) 3 4 5) |
141 | [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)] | |
400a5dcb LC |
142 | [,otherwise #f]) |
143 | @end lisp | |
144 | ||
145 | Each clause in @code{sxml-match} contains two parts: a pattern and one or more | |
146 | expressions which are evaluated if the pattern is successfully match. The | |
147 | example above matches an element @code{e} with an attribute @code{i} and three | |
148 | children. | |
149 | ||
150 | Pattern variables are must be ``unquoted'' in the pattern. The above expression | |
151 | binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c} | |
152 | to @code{5}. | |
153 | ||
154 | @unnumberedsubsec Ellipses in Patterns | |
155 | ||
156 | As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern. | |
157 | Note that the pattern @code{item ...} specifies zero-or-more matches of the | |
158 | pattern @code{item}. | |
159 | ||
160 | The use of ellipses in a pattern is illustrated in the code fragment below, | |
161 | where nested ellipses are used to match the children of repeated instances of an | |
162 | @code{a} element, within an element @code{d}. | |
163 | ||
164 | @lisp | |
165 | (define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10))) | |
166 | ||
167 | (sxml-match x | |
168 | [(d (a ,b ...) ...) | |
169 | (list (list b ...) ...)]) | |
170 | @end lisp | |
171 | ||
172 | The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}. | |
173 | ||
174 | @unnumberedsubsec Ellipses in Quasiquote'd Output | |
175 | ||
176 | Within the body of an @code{sxml-match} form, a slightly extended version of | |
177 | quasiquote is provided, which allows the use of ellipses. This is illustrated | |
178 | in the example below. | |
179 | ||
180 | @lisp | |
181 | (sxml-match '(e 3 4 5 6 7) | |
182 | [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")] | |
183 | [,otherwise #f]) | |
184 | @end lisp | |
185 | ||
186 | The general pattern is that @code{`(something ,i ...)} is rewritten as | |
187 | @code{`(something ,@@i)}. | |
188 | ||
189 | @unnumberedsubsec Matching Nodesets | |
190 | ||
191 | A nodeset pattern is designated by a list in the pattern, beginning the | |
192 | identifier list. The example below illustrates matching a nodeset. | |
193 | ||
194 | @lisp | |
195 | (sxml-match '("i" "j" "k" "l" "m") | |
196 | [(list ,a ,b ,c ,d ,e) | |
197 | `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))]) | |
198 | @end lisp | |
199 | ||
200 | This example wraps each nodeset item in an HTML paragraph element. This example | |
201 | can be rewritten and simplified through using ellipsis: | |
202 | ||
203 | @lisp | |
204 | (sxml-match '("i" "j" "k" "l" "m") | |
205 | [(list ,i ...) | |
206 | `((p ,i) ...)]) | |
207 | @end lisp | |
208 | ||
209 | This version will match nodesets of any length, and wrap each item in the | |
210 | nodeset in an HTML paragraph element. | |
211 | ||
212 | @unnumberedsubsec Matching the ``Rest'' of a Nodeset | |
213 | ||
214 | Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern | |
215 | at the end of an element or nodeset pattern. | |
216 | ||
217 | This is illustrated in the example below: | |
218 | ||
219 | @lisp | |
220 | (sxml-match '(e 3 (f 4 5 6) 7) | |
221 | [(e ,a (f . ,y) ,d) | |
222 | (list a y d)]) | |
223 | @end lisp | |
224 | ||
225 | The above expression returns @code{(3 (4 5 6) 7)}. | |
226 | ||
227 | @unnumberedsubsec Matching the Unmatched Attributes | |
228 | ||
229 | Sometimes it is useful to bind a list of attributes present in the element being | |
230 | matched, but which do not appear in the pattern. This is achieved by using a | |
231 | @code{. rest)} pattern at the end of the attribute list pattern. This is | |
232 | illustrated in the example below: | |
233 | ||
234 | @lisp | |
bdfcabfe LC |
235 | (sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6) |
236 | [(a (@@ (y ,www) . ,qqq) ,t ,u ,v) | |
400a5dcb LC |
237 | (list www qqq t u v)]) |
238 | @end lisp | |
239 | ||
240 | The above expression matches the attribute @code{y} and binds a list of the | |
241 | remaining attributes to the variable @var{qqq}. The result of the above | |
242 | expression is @code{(2 ((z 1) (x 3)) 4 5 6)}. | |
243 | ||
244 | This type of pattern also allows the binding of all attributes: | |
245 | ||
246 | @lisp | |
bdfcabfe LC |
247 | (sxml-match '(a (@@ (z 1) (y 2) (x 3))) |
248 | [(a (@@ . ,qqq)) | |
400a5dcb LC |
249 | qqq]) |
250 | @end lisp | |
251 | ||
252 | @unnumberedsubsec Default Values in Attribute Patterns | |
253 | ||
254 | It is possible to specify a default value for an attribute which is used if the | |
255 | attribute is not present in the element being matched. This is illustrated in | |
256 | the following example: | |
257 | ||
258 | @lisp | |
259 | (sxml-match '(e 3 4 5) | |
bdfcabfe | 260 | [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)]) |
400a5dcb LC |
261 | @end lisp |
262 | ||
263 | The value @code{1} is used when the attribute @code{z} is absent from the | |
264 | element @code{e}. | |
265 | ||
266 | @unnumberedsubsec Guards in Patterns | |
267 | ||
268 | Guards may be added to a pattern clause via the @code{guard} keyword. A guard | |
269 | expression may include zero or more expressions which are evaluated only if the | |
270 | pattern is matched. The body of the clause is only evaluated if the guard | |
271 | expressions evaluate to @code{#t}. | |
272 | ||
273 | The use of guard expressions is illustrated below: | |
274 | ||
275 | @lisp | |
276 | (sxml-match '(a 2 3) | |
277 | ((a ,n) (guard (number? n)) n) | |
278 | ((a ,m ,n) (guard (number? m) (number? n)) (+ m n))) | |
279 | @end lisp | |
280 | ||
281 | @unnumberedsubsec Catamorphisms | |
282 | ||
283 | The example below illustrates the use of explicit recursion within an | |
284 | @code{sxml-match} form. This example implements a simple calculator for the | |
285 | basic arithmetic operations, which are represented by the XML elements | |
286 | @code{plus}, @code{minus}, @code{times}, and @code{div}. | |
287 | ||
288 | @lisp | |
289 | (define simple-eval | |
290 | (lambda (x) | |
291 | (sxml-match x | |
292 | [,i (guard (integer? i)) i] | |
293 | [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))] | |
294 | [(times ,x ,y) (* (simple-eval x) (simple-eval y))] | |
295 | [(minus ,x ,y) (- (simple-eval x) (simple-eval y))] | |
296 | [(div ,x ,y) (/ (simple-eval x) (simple-eval y))] | |
297 | [,otherwise (error "simple-eval: invalid expression" x)]))) | |
298 | @end lisp | |
299 | ||
300 | Using the catamorphism feature of @code{sxml-match}, a more concise version of | |
ecb87335 | 301 | @code{simple-eval} can be written. The pattern @code{,[x]} recursively invokes |
400a5dcb LC |
302 | the pattern matcher on the value bound in this position. |
303 | ||
304 | @lisp | |
305 | (define simple-eval | |
306 | (lambda (x) | |
307 | (sxml-match x | |
308 | [,i (guard (integer? i)) i] | |
309 | [(plus ,[x] ,[y]) (+ x y)] | |
310 | [(times ,[x] ,[y]) (* x y)] | |
311 | [(minus ,[x] ,[y]) (- x y)] | |
312 | [(div ,[x] ,[y]) (/ x y)] | |
313 | [,otherwise (error "simple-eval: invalid expression" x)]))) | |
314 | @end lisp | |
315 | ||
316 | @unnumberedsubsec Named-Catamorphisms | |
317 | ||
318 | It is also possible to explicitly name the operator in the ``cata'' position. | |
319 | Where @code{,[id*]} recurs to the top of the current @code{sxml-match}, | |
320 | @code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a | |
321 | procedure which takes one argument, and returns as many values as there are | |
322 | identifiers following @code{->}. | |
323 | ||
324 | Named catamorphism patterns allow processing to be split into multiple, mutually | |
325 | recursive procedures. This is illustrated in the example below: a | |
bdfcabfe | 326 | transformation that formats a ``TV Guide'' into HTML. |
400a5dcb LC |
327 | |
328 | @lisp | |
329 | (define (tv-guide->html g) | |
330 | (define (cast-list cl) | |
331 | (sxml-match cl | |
332 | [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...) | |
333 | `(div (ul (li ,ch ": " ,a) ...))])) | |
334 | (define (prog p) | |
335 | (sxml-match p | |
336 | [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title) | |
337 | (Description ,desc ...)) | |
338 | `(div (p ,start-time | |
339 | (br) ,series-title | |
340 | (br) ,desc ...))] | |
341 | [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title) | |
342 | (Description ,desc ...) | |
343 | ,[cast-list -> cl]) | |
344 | `(div (p ,start-time | |
345 | (br) ,series-title | |
346 | (br) ,desc ...) | |
347 | ,cl)])) | |
348 | (sxml-match g | |
bdfcabfe | 349 | [(TVGuide (@@ (start ,start-date) |
400a5dcb LC |
350 | (end ,end-date)) |
351 | (Channel (Name ,nm) ,[prog -> p] ...) ...) | |
352 | `(html (head (title "TV Guide")) | |
353 | (body (h1 "TV Guide") | |
354 | (div (h2 ,nm) ,p ...) ...))])) | |
355 | @end lisp | |
356 | ||
357 | @unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*} | |
358 | ||
9cbcc73f BT |
359 | @deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ... |
360 | @deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ... | |
400a5dcb LC |
361 | These forms generalize the @code{let} and @code{let*} forms of Scheme to allow |
362 | an XML pattern in the binding position, rather than a simple variable. | |
363 | @end deffn | |
364 | ||
365 | For example, the expression below: | |
366 | ||
367 | @lisp | |
368 | (sxml-match-let ([(a ,i ,j) '(a 1 2)]) | |
369 | (+ i j)) | |
370 | @end lisp | |
371 | ||
372 | binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML | |
373 | value given. | |
374 | ||
375 | @c Local Variables: | |
376 | @c coding: utf-8 | |
377 | @c End: |