Commit | Line | Data |
---|---|---|
eee0877c AW |
1 | @c -*-texinfo-*- |
2 | @c This is part of the GNU Guile Reference Manual. | |
718bc349 | 3 | @c Copyright (C) 2006, 2010, 2011 |
eee0877c AW |
4 | @c Free Software Foundation, Inc. |
5 | @c See the file guile.texi for copying conditions. | |
6 | ||
7 | @node PEG Parsing | |
8 | @section PEG Parsing | |
9 | ||
718bc349 AW |
10 | Parsing Expression Grammars (PEGs) are a way of specifying formal |
11 | languages for text processing. They can be used either for matching | |
12 | (like regular expressions) or for building recursive descent parsers | |
13 | (like lex/yacc). Guile uses a superset of PEG syntax that allows more | |
14 | control over what information is preserved during parsing. | |
eee0877c | 15 | |
718bc349 AW |
16 | Wikipedia has a clear and concise introduction to PEGs if you want to |
17 | familiarize yourself with the syntax: | |
18 | @url{http://en.wikipedia.org/wiki/Parsing_expression_grammar}. | |
eee0877c | 19 | |
718bc349 AW |
20 | The module works by compiling PEGs down to lambda expressions. These |
21 | can either be stored in variables at compile-time by the define macros | |
22 | (@code{define-nonterm} and @code{define-grammar}) or calculated | |
23 | explicitly at runtime with the compile functions | |
24 | (@code{peg-sexp-compile} and @code{peg-string-compile}). | |
eee0877c | 25 | |
718bc349 AW |
26 | They can then be used for either parsing (@code{peg-parse}) or matching |
27 | (@code{peg-match}). For convenience, @code{peg-match} also takes | |
28 | pattern literals in case you want to inline a simple search (people | |
29 | often use regular expressions this way). | |
eee0877c | 30 | |
718bc349 AW |
31 | The rest of this documentation consists of a syntax reference, an API |
32 | reference, and a tutorial. | |
eee0877c AW |
33 | |
34 | @menu | |
35 | * PEG Syntax Reference:: | |
36 | * PEG API Reference:: | |
37 | * PEG Tutorial:: | |
97c84694 | 38 | * PEG Internals:: |
eee0877c AW |
39 | @end menu |
40 | ||
41 | @node PEG Syntax Reference | |
42 | @subsection PEG Syntax Reference | |
43 | ||
44 | @subsubheading Normal PEG Syntax: | |
45 | ||
718bc349 AW |
46 | @deftp {PEG Pattern} sequence a b |
47 | Parses @var{a}. If this succeeds, continues to parse @var{b} from the | |
48 | end of the text parsed as @var{a}. Succeeds if both @var{a} and | |
49 | @var{b} succeed. | |
50 | ||
51 | @code{"a b"} | |
52 | ||
53 | @code{(and a b)} | |
54 | @end deftp | |
55 | ||
56 | @deftp {PEG Pattern} {ordered choice} a b | |
57 | Parses @var{a}. If this fails, backtracks and parses @var{b}. | |
58 | Succeeds if either @var{a} or @var{b} succeeds. | |
59 | ||
60 | @code{"a/b"} | |
61 | ||
62 | @code{(or a b)} | |
63 | @end deftp | |
64 | ||
65 | @deftp {PEG Pattern} {zero or more} a | |
66 | Parses @var{a} as many times in a row as it can, starting each @var{a} | |
67 | at the end of the text parsed by the previous @var{a}. Always | |
68 | succeeds. | |
69 | ||
70 | @code{"a*"} | |
71 | ||
f310a111 | 72 | @code{(* a)} |
718bc349 AW |
73 | @end deftp |
74 | ||
75 | @deftp {PEG Pattern} {one or more} a | |
76 | Parses @var{a} as many times in a row as it can, starting each @var{a} | |
77 | at the end of the text parsed by the previous @var{a}. Succeeds if at | |
78 | least one @var{a} was parsed. | |
79 | ||
80 | @code{"a+"} | |
81 | ||
3d19969d | 82 | @code{(+ a)} |
718bc349 AW |
83 | @end deftp |
84 | ||
85 | @deftp {PEG Pattern} optional a | |
86 | Tries to parse @var{a}. Succeeds if @var{a} succeeds. | |
87 | ||
88 | @code{"a?"} | |
89 | ||
8e97edd5 | 90 | @code{(? a)} |
718bc349 AW |
91 | @end deftp |
92 | ||
93 | @deftp {PEG Pattern} {and predicate} a | |
94 | Makes sure it is possible to parse @var{a}, but does not actually parse | |
95 | it. Succeeds if @var{a} would succeed. | |
96 | ||
97 | @code{"&a"} | |
98 | ||
99 | @code{(body & a 1)} | |
100 | @end deftp | |
101 | ||
102 | @deftp {PEG Pattern} {not predicate} a | |
103 | Makes sure it is impossible to parse @var{a}, but does not actually | |
104 | parse it. Succeeds if @var{a} would fail. | |
105 | ||
106 | @code{"!a"} | |
107 | ||
108 | @code{(body ! a 1)} | |
109 | @end deftp | |
110 | ||
111 | @deftp {PEG Pattern} {string literal} ``abc'' | |
112 | Parses the string @var{"abc"}. Succeeds if that parsing succeeds. | |
113 | ||
114 | @code{"'abc'"} | |
115 | ||
116 | @code{"abc"} | |
117 | @end deftp | |
118 | ||
119 | @deftp {PEG Pattern} {any character} | |
120 | Parses any single character. Succeeds unless there is no more text to | |
121 | be parsed. | |
122 | ||
123 | @code{"."} | |
124 | ||
125 | @code{peg-any} | |
126 | @end deftp | |
127 | ||
128 | @deftp {PEG Pattern} {character class} a b | |
129 | Alternative syntax for ``Ordered Choice @var{a} @var{b}'' if @var{a} and | |
130 | @var{b} are characters. | |
131 | ||
132 | @code{"[ab]"} | |
133 | ||
134 | @code{(or "a" "b")} | |
135 | @end deftp | |
136 | ||
137 | @deftp {PEG Pattern} {range of characters} a z | |
138 | Parses any character falling between @var{a} and @var{z}. | |
139 | ||
140 | @code{"[a-z]"} | |
141 | ||
142 | @code{(range #\a #\z)} | |
143 | @end deftp | |
144 | ||
145 | Example: | |
146 | ||
147 | @example | |
148 | "(a !b / c &d*) 'e'+" | |
149 | @end example | |
150 | ||
eee0877c | 151 | Would be: |
718bc349 | 152 | |
eee0877c AW |
153 | @lisp |
154 | (and | |
155 | (or | |
156 | (and a (body ! b 1)) | |
157 | (and c (body & d *))) | |
158 | (body lit "e" +)) | |
159 | @end lisp | |
160 | ||
718bc349 AW |
161 | @subsubheading Extended Syntax |
162 | ||
eee0877c AW |
163 | There is some extra syntax for S-expressions. |
164 | ||
718bc349 AW |
165 | @deftp {PEG Pattern} ignore a |
166 | Ignore the text matching @var{a} | |
167 | @end deftp | |
eee0877c | 168 | |
718bc349 AW |
169 | @deftp {PEG Pattern} capture a |
170 | Capture the text matching @var{a}. | |
171 | @end deftp | |
eee0877c | 172 | |
718bc349 AW |
173 | @deftp {PEG Pattern} peg a |
174 | Embed the PEG pattern @var{a} using string syntax. | |
175 | @end deftp | |
176 | ||
177 | Example: | |
eee0877c | 178 | |
718bc349 AW |
179 | @example |
180 | "!a / 'b'" | |
181 | @end example | |
eee0877c | 182 | |
eee0877c | 183 | Would be: |
718bc349 | 184 | |
eee0877c AW |
185 | @lisp |
186 | (or (peg "!a") "b") | |
187 | @end lisp | |
188 | ||
189 | @node PEG API Reference | |
190 | @subsection PEG API Reference | |
191 | ||
192 | @subsubheading Define Macros | |
193 | ||
718bc349 AW |
194 | The most straightforward way to define a PEG is by using one of the |
195 | define macros (both of these macroexpand into @code{define} | |
196 | expressions). These macros bind parsing functions to variables. These | |
197 | parsing functions may be invoked by @code{peg-parse} or | |
198 | @code{peg-match}, which return a PEG match record. Raw data can be | |
199 | retrieved from this record with the PEG match deconstructor functions. | |
200 | More complicated (and perhaps enlightening) examples can be found in the | |
201 | tutorial. | |
eee0877c AW |
202 | |
203 | @deffn {Scheme Macro} define-grammar peg-string | |
718bc349 AW |
204 | Defines all the nonterminals in the PEG @var{peg-string}. More |
205 | precisely, @code{define-grammar} takes a superset of PEGs. A normal PEG | |
206 | has a @code{<-} between the nonterminal and the pattern. | |
207 | @code{define-grammar} uses this symbol to determine what information it | |
208 | should propagate up the parse tree. The normal @code{<-} propagates the | |
209 | matched text up the parse tree, @code{<--} propagates the matched text | |
210 | up the parse tree tagged with the name of the nonterminal, and @code{<} | |
211 | discards that matched text and propagates nothing up the parse tree. | |
212 | Also, nonterminals may consist of any alphanumeric character or a ``-'' | |
213 | character (in normal PEGs nonterminals can only be alphabetic). | |
eee0877c AW |
214 | |
215 | For example, if we: | |
216 | @lisp | |
217 | (define-grammar | |
218 | "as <- 'a'+ | |
219 | bs <- 'b'+ | |
220 | as-or-bs <- as/bs") | |
221 | (define-grammar | |
222 | "as-tag <-- 'a'+ | |
223 | bs-tag <-- 'b'+ | |
224 | as-or-bs-tag <-- as-tag/bs-tag") | |
225 | @end lisp | |
226 | Then: | |
227 | @lisp | |
228 | (peg-parse as-or-bs "aabbcc") @result{} | |
229 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
230 | (peg-parse as-or-bs-tag "aabbcc") @result{} | |
231 | #<peg start: 0 end: 2 string: aabbcc tree: (as-or-bs-tag (as-tag aa))> | |
232 | @end lisp | |
233 | ||
718bc349 AW |
234 | Note that in doing this, we have bound 6 variables at the toplevel |
235 | (@var{as}, @var{bs}, @var{as-or-bs}, @var{as-tag}, @var{bs-tag}, and | |
236 | @var{as-or-bs-tag}). | |
eee0877c AW |
237 | @end deffn |
238 | ||
239 | @deffn {Scheme Macro} define-nonterm name capture-type peg-sexp | |
718bc349 AW |
240 | Defines a single nonterminal @var{name}. @var{capture-type} determines |
241 | how much information is passed up the parse tree. @var{peg-sexp} is a | |
242 | PEG in S-expression form. | |
243 | ||
244 | Possible values for capture-type: | |
245 | ||
246 | @table @code | |
247 | @item all | |
248 | passes the matched text up the parse tree tagged with the name of the | |
249 | nonterminal. | |
250 | @item body | |
251 | passes the matched text up the parse tree. | |
252 | @item none | |
253 | passes nothing up the parse tree. | |
254 | @end table | |
eee0877c AW |
255 | |
256 | For Example, if we: | |
257 | @lisp | |
258 | (define-nonterm as body (body lit "a" +)) | |
259 | (define-nonterm bs body (body lit "b" +)) | |
260 | (define-nonterm as-or-bs body (or as bs)) | |
261 | (define-nonterm as-tag all (body lit "a" +)) | |
262 | (define-nonterm bs-tag all (body lit "b" +)) | |
263 | (define-nonterm as-or-bs-tag all (or as-tag bs-tag)) | |
264 | @end lisp | |
265 | Then: | |
266 | @lisp | |
267 | (peg-parse as-or-bs "aabbcc") @result{} | |
268 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
269 | (peg-parse as-or-bs-tag "aabbcc") @result{} | |
270 | #<peg start: 0 end: 2 string: aabbcc tree: (as-or-bs-tag (as-tag aa))> | |
271 | @end lisp | |
272 | ||
718bc349 AW |
273 | Note that in doing this, we have bound 6 variables at the toplevel |
274 | (@var{as}, @var{bs}, @var{as-or-bs}, @var{as-tag}, @var{bs-tag}, and | |
275 | @var{as-or-bs-tag}). | |
eee0877c AW |
276 | @end deffn |
277 | ||
718bc349 AW |
278 | These are macros, with all that entails. If you've built up a list at |
279 | runtime and want to define a new PEG from it, you should e.g.: | |
eee0877c AW |
280 | @lisp |
281 | (define exp '(body lit "a" +)) | |
282 | (eval `(define-nonterm as body ,exp) (interaction-environment)) | |
283 | @end lisp | |
718bc349 AW |
284 | The @code{eval} function has a bad reputation with regard to efficiency, |
285 | but this is mostly because of the extra work that has to be done | |
286 | compiling the expressions, which has to be done anyway when compiling | |
287 | the PEGs at runtime. | |
eee0877c AW |
288 | |
289 | @subsubheading Compile Functions | |
718bc349 AW |
290 | It is sometimes useful to be able to compile anonymous PEG patterns at |
291 | runtime. These functions let you do that using either syntax. | |
eee0877c AW |
292 | |
293 | @deffn {Scheme Procedure} peg-string-compile peg-string capture-type | |
718bc349 AW |
294 | Compiles the PEG pattern in @var{peg-string} propagating according to |
295 | @var{capture-type} (capture-type can be any of the values from | |
296 | @code{define-nonterm}). | |
eee0877c AW |
297 | @end deffn |
298 | ||
299 | ||
300 | @deffn {Scheme Procedure} peg-sexp-compile peg-sexp capture-type | |
718bc349 AW |
301 | Compiles the PEG pattern in @var{peg-sexp} propagating according to |
302 | @var{capture-type} (capture-type can be any of the values from | |
303 | @code{define-nonterm}). | |
eee0877c AW |
304 | @end deffn |
305 | ||
306 | ||
307 | @subsubheading Parsing & Matching Functions | |
308 | ||
718bc349 AW |
309 | For our purposes, ``parsing'' means parsing a string into a tree |
310 | starting from the first character, while ``matching'' means searching | |
311 | through the string for a substring. In practice, the only difference | |
312 | between the two functions is that @code{peg-parse} gives up if it can't | |
313 | find a valid substring starting at index 0 and @code{peg-match} keeps | |
314 | looking. They are both equally capable of ``parsing'' and ``matching'' | |
315 | given those constraints. | |
eee0877c AW |
316 | |
317 | @deffn {Scheme Procedure} peg-parse nonterm string | |
718bc349 AW |
318 | Parses @var{string} using the PEG stored in @var{nonterm}. If no match |
319 | was found, @code{peg-parse} returns false. If a match was found, a PEG | |
320 | match record is returned. | |
321 | ||
322 | The @code{capture-type} argument to @code{define-nonterm} allows you to | |
323 | choose what information to hold on to while parsing. The options are: | |
324 | ||
325 | @table @code | |
326 | @item all | |
327 | tag the matched text with the nonterminal | |
328 | @item body | |
329 | just the matched text | |
330 | @item none | |
331 | nothing | |
332 | @end table | |
eee0877c AW |
333 | |
334 | @lisp | |
335 | (define-nonterm as all (body lit "a" +)) | |
336 | (peg-parse as "aabbcc") @result{} | |
337 | #<peg start: 0 end: 2 string: aabbcc tree: (as aa)> | |
338 | ||
339 | (define-nonterm as body (body lit "a" +)) | |
340 | (peg-parse as "aabbcc") @result{} | |
341 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
342 | ||
343 | (define-nonterm as none (body lit "a" +)) | |
344 | (peg-parse as "aabbcc") @result{} | |
345 | #<peg start: 0 end: 2 string: aabbcc tree: ()> | |
346 | ||
347 | (define-nonterm bs body (body lit "b" +)) | |
348 | (peg-parse bs "aabbcc") @result{} | |
349 | #f | |
350 | @end lisp | |
351 | @end deffn | |
352 | ||
353 | @deffn {Scheme Macro} peg-match nonterm-or-peg string | |
718bc349 AW |
354 | Searches through @var{string} looking for a matching subexpression. |
355 | @var{nonterm-or-peg} can either be a nonterminal or a literal PEG | |
356 | pattern. When a literal PEG pattern is provided, @code{peg-match} works | |
357 | very similarly to the regular expression searches many hackers are used | |
358 | to. If no match was found, @code{peg-match} returns false. If a match | |
359 | was found, a PEG match record is returned. | |
eee0877c AW |
360 | |
361 | @lisp | |
362 | (define-nonterm as body (body lit "a" +)) | |
363 | (peg-match as "aabbcc") @result{} | |
364 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
365 | (peg-match (body lit "a" +) "aabbcc") @result{} | |
366 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
367 | (peg-match "'a'+" "aabbcc") @result{} | |
368 | #<peg start: 0 end: 2 string: aabbcc tree: aa> | |
369 | ||
370 | (define-nonterm as all (body lit "a" +)) | |
371 | (peg-match as "aabbcc") @result{} | |
372 | #<peg start: 0 end: 2 string: aabbcc tree: (as aa)> | |
373 | ||
374 | (define-nonterm bs body (body lit "b" +)) | |
375 | (peg-match bs "aabbcc") @result{} | |
376 | #<peg start: 2 end: 4 string: aabbcc tree: bb> | |
377 | (peg-match (body lit "b" +) "aabbcc") @result{} | |
378 | #<peg start: 2 end: 4 string: aabbcc tree: bb> | |
379 | (peg-match "'b'+" "aabbcc") @result{} | |
380 | #<peg start: 2 end: 4 string: aabbcc tree: bb> | |
381 | ||
382 | (define-nonterm zs body (body lit "z" +)) | |
383 | (peg-match zs "aabbcc") @result{} | |
384 | #f | |
385 | (peg-match (body lit "z" +) "aabbcc") @result{} | |
386 | #f | |
387 | (peg-match "'z'+" "aabbcc") @result{} | |
388 | #f | |
389 | @end lisp | |
390 | @end deffn | |
391 | ||
392 | @subsubheading PEG Match Records | |
718bc349 AW |
393 | The @code{peg-parse} and @code{peg-match} functions both return PEG |
394 | match records. Actual information can be extracted from these with the | |
395 | following functions. | |
eee0877c AW |
396 | |
397 | @deffn {Scheme Procedure} peg:string peg-match | |
718bc349 AW |
398 | Returns the original string that was parsed in the creation of |
399 | @code{peg-match}. | |
eee0877c AW |
400 | @end deffn |
401 | ||
402 | @deffn {Scheme Procedure} peg:start peg-match | |
718bc349 AW |
403 | Returns the index of the first parsed character in the original string |
404 | (from @code{peg:string}). If this is the same as @code{peg:end}, | |
405 | nothing was parsed. | |
eee0877c AW |
406 | @end deffn |
407 | ||
408 | @deffn {Scheme Procedure} peg:end peg-match | |
718bc349 AW |
409 | Returns one more than the index of the last parsed character in the |
410 | original string (from @code{peg:string}). If this is the same as | |
411 | @code{peg:start}, nothing was parsed. | |
eee0877c AW |
412 | @end deffn |
413 | ||
414 | @deffn {Scheme Procedure} peg:substring peg-match | |
718bc349 AW |
415 | Returns the substring parsed by @code{peg-match}. This is equivalent to |
416 | @code{(substring (peg:string peg-match) (peg:start peg-match) (peg:end | |
417 | peg-match))}. | |
eee0877c AW |
418 | @end deffn |
419 | ||
420 | @deffn {Scheme Procedure} peg:tree peg-match | |
421 | Returns the tree parsed by @code{peg-match}. | |
422 | @end deffn | |
423 | ||
424 | @deffn {Scheme Procedure} peg-record? peg-match | |
718bc349 AW |
425 | Returns true if @code{peg-match} is a PEG match record, or false |
426 | otherwise. | |
eee0877c AW |
427 | @end deffn |
428 | ||
429 | Example: | |
430 | @lisp | |
431 | (define-nonterm bs all (peg "'b'+")) | |
432 | ||
433 | (peg-match bs "aabbcc") @result{} | |
434 | #<peg start: 2 end: 4 string: aabbcc tree: (bs bb)> | |
435 | ||
436 | (let ((pm (peg-match bs "aabbcc"))) | |
437 | `((string ,(peg:string pm)) | |
438 | (start ,(peg:start pm)) | |
439 | (end ,(peg:end pm)) | |
440 | (substring ,(peg:substring pm)) | |
441 | (tree ,(peg:tree pm)) | |
442 | (record? ,(peg-record? pm)))) @result{} | |
443 | ((string "aabbcc") | |
444 | (start 2) | |
445 | (end 4) | |
446 | (substring "bb") | |
447 | (tree (bs "bb")) | |
448 | (record? #t)) | |
449 | @end lisp | |
450 | ||
451 | @subsubheading Miscellaneous | |
452 | ||
453 | @deffn {Scheme Procedure} context-flatten tst lst | |
718bc349 AW |
454 | Takes a predicate @var{tst} and a list @var{lst}. Flattens @var{lst} |
455 | until all elements are either atoms or satisfy @var{tst}. If @var{lst} | |
456 | itself satisfies @var{tst}, @code{(list lst)} is returned (this is a | |
457 | flat list whose only element satisfies @var{tst}). | |
eee0877c AW |
458 | |
459 | @lisp | |
460 | (context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{} | |
461 | (2 2 (1 1 (2 2)) 2 2 (1 1)) | |
462 | (context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{} | |
463 | ((1 1 (1 1 (2 2)) (2 2 (1 1)))) | |
464 | @end lisp | |
465 | ||
466 | If you're wondering why this is here, take a look at the tutorial. | |
467 | @end deffn | |
468 | ||
469 | @deffn {Scheme Procedure} keyword-flatten terms lst | |
718bc349 AW |
470 | A less general form of @code{context-flatten}. Takes a list of terminal |
471 | atoms @code{terms} and flattens @var{lst} until all elements are either | |
472 | atoms, or lists which have an atom from @code{terms} as their first | |
473 | element. | |
eee0877c AW |
474 | @lisp |
475 | (keyword-flatten '(a b) '(c a b (a c) (b c) (c (b a) (c a)))) @result{} | |
476 | (c a b (a c) (b c) c (b a) c a) | |
477 | @end lisp | |
478 | ||
479 | If you're wondering why this is here, take a look at the tutorial. | |
480 | @end deffn | |
481 | ||
482 | @node PEG Tutorial | |
483 | @subsection PEG Tutorial | |
484 | ||
485 | @subsubheading Parsing /etc/passwd | |
486 | This example will show how to parse /etc/passwd using PEGs. | |
487 | ||
488 | First we define an example /etc/passwd file: | |
489 | ||
490 | @lisp | |
491 | (define *etc-passwd* | |
492 | "root:x:0:0:root:/root:/bin/bash | |
493 | daemon:x:1:1:daemon:/usr/sbin:/bin/sh | |
494 | bin:x:2:2:bin:/bin:/bin/sh | |
495 | sys:x:3:3:sys:/dev:/bin/sh | |
496 | nobody:x:65534:65534:nobody:/nonexistent:/bin/sh | |
497 | messagebus:x:103:107::/var/run/dbus:/bin/false | |
498 | ") | |
499 | @end lisp | |
500 | ||
718bc349 AW |
501 | As a first pass at this, we might want to have all the entries in |
502 | /etc/passwd in a list. | |
eee0877c AW |
503 | |
504 | Doing this with string-based PEG syntax would look like this: | |
505 | @lisp | |
506 | (define-grammar | |
507 | "passwd <- entry* !. | |
508 | entry <-- (! NL .)* NL* | |
509 | NL < '\n'") | |
510 | @end lisp | |
718bc349 AW |
511 | |
512 | A @code{passwd} file is 0 or more entries (@code{entry*}) until the end | |
513 | of the file (@code{!.} (@code{.} is any character, so @code{!.} means | |
514 | ``not anything'')). We want to capture the data in the nonterminal | |
515 | @code{passwd}, but not tag it with the name, so we use @code{<-}. | |
516 | ||
517 | An entry is a series of 0 or more characters that aren't newlines | |
518 | (@code{(! NL .)*}) followed by 0 or more newlines (@code{NL*}). We want | |
519 | to tag all the entries with @code{entry}, so we use @code{<--}. | |
520 | ||
521 | A newline is just a literal newline (@code{'\n'}). We don't want a | |
522 | bunch of newlines cluttering up the output, so we use @code{<} to throw | |
523 | away the captured data. | |
eee0877c AW |
524 | |
525 | Here is the same PEG defined using S-expressions: | |
526 | @lisp | |
527 | (define-nonterm passwd body (and (body lit entry *) (body ! peg-any 1))) | |
528 | (define-nonterm entry all (and (body lit (and (body ! NL 1) peg-any) *) | |
529 | (body lit NL *))) | |
530 | (define-nonterm NL none "\n") | |
531 | @end lisp | |
532 | ||
718bc349 AW |
533 | Obviously this is much more verbose. On the other hand, it's more |
534 | explicit, and thus easier to build automatically. However, there are | |
535 | some tricks that make S-expressions easier to use in some cases. One is | |
536 | the @code{ignore} keyword; the string syntax has no way to say ``throw | |
537 | away this text'' except breaking it out into a separate nonterminal. | |
538 | For instance, to throw away the newlines we had to define @code{NL}. In | |
539 | the S-expression syntax, we could have simply written @code{(ignore | |
540 | "\n")}. Also, for the cases where string syntax is really much cleaner, | |
541 | the @code{peg} keyword can be used to embed string syntax in | |
542 | S-expression syntax. For instance, we could have written: | |
543 | ||
eee0877c AW |
544 | @lisp |
545 | (define-nonterm passwd body (peg "entry* !.")) | |
546 | @end lisp | |
547 | ||
718bc349 AW |
548 | However we define it, parsing @code{*etc-passwd*} with the @code{passwd} |
549 | nonterminal yields the same results: | |
550 | ||
eee0877c AW |
551 | @lisp |
552 | (peg:tree (peg-parse passwd *etc-passwd*)) @result{} | |
553 | ((entry "root:x:0:0:root:/root:/bin/bash") | |
554 | (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh") | |
555 | (entry "bin:x:2:2:bin:/bin:/bin/sh") | |
556 | (entry "sys:x:3:3:sys:/dev:/bin/sh") | |
557 | (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh") | |
558 | (entry "messagebus:x:103:107::/var/run/dbus:/bin/false")) | |
559 | @end lisp | |
560 | ||
561 | However, here is something to be wary of: | |
718bc349 | 562 | |
eee0877c AW |
563 | @lisp |
564 | (peg:tree (peg-parse passwd "one entry")) @result{} | |
565 | (entry "one entry") | |
566 | @end lisp | |
567 | ||
718bc349 AW |
568 | By default, the parse trees generated by PEGs are compressed as much as |
569 | possible without losing information. It may not look like this is what | |
570 | you want at first, but uncompressed parse trees are an enormous headache | |
571 | (there's no easy way to predict how deep particular lists will nest, | |
572 | there are empty lists littered everywhere, etc. etc.). One side-effect | |
573 | of this, however, is that sometimes the compressor is too aggressive. | |
574 | No information is discarded when @code{((entry "one entry"))} is | |
575 | compressed to @code{(entry "one entry")}, but in this particular case it | |
576 | probably isn't what we want. | |
577 | ||
578 | There are two functions for easily dealing with this: | |
579 | @code{keyword-flatten} and @code{context-flatten}. The | |
580 | @code{keyword-flatten} function takes a list of keywords and a list to | |
581 | flatten, then tries to coerce the list such that the first element of | |
582 | all sublists is one of the keywords. The @code{context-flatten} | |
583 | function is similar, but instead of a list of keywords it takes a | |
584 | predicate that should indicate whether a given sublist is good enough | |
585 | (refer to the API reference for more details). | |
eee0877c AW |
586 | |
587 | What we want here is @code{keyword-flatten}. | |
588 | @lisp | |
589 | (keyword-flatten '(entry) (peg:tree (peg-parse passwd *etc-passwd*))) @result{} | |
590 | ((entry "root:x:0:0:root:/root:/bin/bash") | |
591 | (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh") | |
592 | (entry "bin:x:2:2:bin:/bin:/bin/sh") | |
593 | (entry "sys:x:3:3:sys:/dev:/bin/sh") | |
594 | (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh") | |
595 | (entry "messagebus:x:103:107::/var/run/dbus:/bin/false")) | |
596 | (keyword-flatten '(entry) (peg:tree (peg-parse passwd "one entry"))) @result{} | |
597 | ((entry "one entry")) | |
598 | @end lisp | |
599 | ||
718bc349 AW |
600 | Of course, this is a somewhat contrived example. In practice we would |
601 | probably just tag the @code{passwd} nonterminal to remove the ambiguity | |
602 | (using either the @code{all} keyword for S-expressions or the @code{<--} | |
603 | symbol for strings).. | |
eee0877c AW |
604 | |
605 | @lisp | |
606 | (define-nonterm tag-passwd all (peg "entry* !.")) | |
607 | (peg:tree (peg-parse tag-passwd *etc-passwd*)) @result{} | |
608 | (tag-passwd | |
609 | (entry "root:x:0:0:root:/root:/bin/bash") | |
610 | (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh") | |
611 | (entry "bin:x:2:2:bin:/bin:/bin/sh") | |
612 | (entry "sys:x:3:3:sys:/dev:/bin/sh") | |
613 | (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh") | |
614 | (entry "messagebus:x:103:107::/var/run/dbus:/bin/false")) | |
615 | (peg:tree (peg-parse tag-passwd "one entry")) | |
616 | (tag-passwd | |
617 | (entry "one entry")) | |
618 | @end lisp | |
619 | ||
718bc349 AW |
620 | If you're ever uncertain about the potential results of parsing |
621 | something, remember the two absolute rules: | |
622 | @enumerate | |
623 | @item | |
624 | No parsing information will ever be discarded. | |
625 | @item | |
626 | There will never be any lists with fewer than 2 elements. | |
627 | @end enumerate | |
628 | ||
629 | For the purposes of (1), "parsing information" means things tagged with | |
630 | the @code{any} keyword or the @code{<--} symbol. Plain strings will be | |
631 | concatenated. | |
eee0877c | 632 | |
718bc349 AW |
633 | Let's extend this example a bit more and actually pull some useful |
634 | information out of the passwd file: | |
eee0877c | 635 | |
eee0877c AW |
636 | @lisp |
637 | (define-grammar | |
638 | "passwd <-- entry* !. | |
639 | entry <-- login C pass C uid C gid C nameORcomment C homedir C shell NL* | |
640 | login <-- text | |
641 | pass <-- text | |
642 | uid <-- [0-9]* | |
643 | gid <-- [0-9]* | |
644 | nameORcomment <-- text | |
645 | homedir <-- path | |
646 | shell <-- path | |
647 | path <-- (SLASH pathELEMENT)* | |
648 | pathELEMENT <-- (!NL !C !'/' .)* | |
649 | text <- (!NL !C .)* | |
650 | C < ':' | |
651 | NL < '\n' | |
652 | SLASH < '/'") | |
653 | @end lisp | |
654 | ||
655 | This produces rather pretty parse trees: | |
656 | @lisp | |
657 | (passwd | |
658 | (entry (login "root") | |
659 | (pass "x") | |
660 | (uid "0") | |
661 | (gid "0") | |
662 | (nameORcomment "root") | |
663 | (homedir (path (pathELEMENT "root"))) | |
664 | (shell (path (pathELEMENT "bin") (pathELEMENT "bash")))) | |
665 | (entry (login "daemon") | |
666 | (pass "x") | |
667 | (uid "1") | |
668 | (gid "1") | |
669 | (nameORcomment "daemon") | |
670 | (homedir | |
671 | (path (pathELEMENT "usr") (pathELEMENT "sbin"))) | |
672 | (shell (path (pathELEMENT "bin") (pathELEMENT "sh")))) | |
673 | (entry (login "bin") | |
674 | (pass "x") | |
675 | (uid "2") | |
676 | (gid "2") | |
677 | (nameORcomment "bin") | |
678 | (homedir (path (pathELEMENT "bin"))) | |
679 | (shell (path (pathELEMENT "bin") (pathELEMENT "sh")))) | |
680 | (entry (login "sys") | |
681 | (pass "x") | |
682 | (uid "3") | |
683 | (gid "3") | |
684 | (nameORcomment "sys") | |
685 | (homedir (path (pathELEMENT "dev"))) | |
686 | (shell (path (pathELEMENT "bin") (pathELEMENT "sh")))) | |
687 | (entry (login "nobody") | |
688 | (pass "x") | |
689 | (uid "65534") | |
690 | (gid "65534") | |
691 | (nameORcomment "nobody") | |
692 | (homedir (path (pathELEMENT "nonexistent"))) | |
693 | (shell (path (pathELEMENT "bin") (pathELEMENT "sh")))) | |
694 | (entry (login "messagebus") | |
695 | (pass "x") | |
696 | (uid "103") | |
697 | (gid "107") | |
698 | nameORcomment | |
699 | (homedir | |
700 | (path (pathELEMENT "var") | |
701 | (pathELEMENT "run") | |
702 | (pathELEMENT "dbus"))) | |
703 | (shell (path (pathELEMENT "bin") (pathELEMENT "false"))))) | |
704 | @end lisp | |
705 | ||
718bc349 AW |
706 | Notice that when there's no entry in a field (e.g. @code{nameORcomment} |
707 | for messagebus) the symbol is inserted. This is the ``don't throw away | |
708 | any information'' rule---we succesfully matched a @code{nameORcomment} | |
709 | of 0 characters (since we used @code{*} when defining it). This is | |
710 | usually what you want, because it allows you to e.g. use @code{list-ref} | |
711 | to pull out elements (since they all have known offsets). | |
eee0877c | 712 | |
718bc349 AW |
713 | If you'd prefer not to have symbols for empty matches, you can replace |
714 | the @code{*} with a @code{+} and add a @code{?} after the | |
715 | @code{nameORcomment} in @code{entry}. Then it will try to parse 1 or | |
716 | more characters, fail (inserting nothing into the parse tree), but | |
717 | continue because it didn't have to match the nameORcomment to continue. | |
eee0877c AW |
718 | |
719 | ||
720 | @subsubheading Embedding Arithmetic Expressions | |
721 | ||
722 | We can parse simple mathematical expressions with the following PEG: | |
723 | ||
724 | @lisp | |
725 | (define-grammar | |
726 | "expr <- sum | |
727 | sum <-- (product ('+' / '-') sum) / product | |
728 | product <-- (value ('*' / '/') product) / value | |
729 | value <-- number / '(' expr ')' | |
730 | number <-- [0-9]+") | |
731 | @end lisp | |
732 | ||
733 | Then: | |
734 | @lisp | |
735 | (peg:tree (peg-parse expr "1+1/2*3+(1+1)/2")) @result{} | |
736 | (sum (product (value (number "1"))) | |
737 | "+" | |
738 | (sum (product | |
739 | (value (number "1")) | |
740 | "/" | |
741 | (product | |
742 | (value (number "2")) | |
743 | "*" | |
744 | (product (value (number "3"))))) | |
745 | "+" | |
746 | (sum (product | |
747 | (value "(" | |
748 | (sum (product (value (number "1"))) | |
749 | "+" | |
750 | (sum (product (value (number "1"))))) | |
751 | ")") | |
752 | "/" | |
753 | (product (value (number "2"))))))) | |
754 | @end lisp | |
755 | ||
718bc349 AW |
756 | There is very little wasted effort in this PEG. The @code{number} |
757 | nonterminal has to be tagged because otherwise the numbers might run | |
758 | together with the arithmetic expressions during the string concatenation | |
759 | stage of parse-tree compression (the parser will see ``1'' followed by | |
760 | ``/'' and decide to call it ``1/''). When in doubt, tag. | |
eee0877c AW |
761 | |
762 | It is very easy to turn these parse trees into lisp expressions: | |
718bc349 | 763 | |
eee0877c AW |
764 | @lisp |
765 | (define (parse-sum sum left . rest) | |
766 | (if (null? rest) | |
767 | (apply parse-product left) | |
768 | (list (string->symbol (car rest)) | |
769 | (apply parse-product left) | |
770 | (apply parse-sum (cadr rest))))) | |
771 | ||
772 | (define (parse-product product left . rest) | |
773 | (if (null? rest) | |
774 | (apply parse-value left) | |
775 | (list (string->symbol (car rest)) | |
776 | (apply parse-value left) | |
777 | (apply parse-product (cadr rest))))) | |
778 | ||
779 | (define (parse-value value first . rest) | |
780 | (if (null? rest) | |
781 | (string->number (cadr first)) | |
782 | (apply parse-sum (car rest)))) | |
783 | ||
784 | (define parse-expr parse-sum) | |
785 | @end lisp | |
718bc349 AW |
786 | |
787 | (Notice all these functions look very similar; for a more complicated | |
788 | PEG, it would be worth abstracting.) | |
eee0877c AW |
789 | |
790 | Then: | |
791 | @lisp | |
792 | (apply parse-expr (peg:tree (peg-parse expr "1+1/2*3+(1+1)/2"))) @result{} | |
793 | (+ 1 (+ (/ 1 (* 2 3)) (/ (+ 1 1) 2))) | |
794 | @end lisp | |
795 | ||
718bc349 AW |
796 | But wait! The associativity is wrong! Where it says @code{(/ 1 (* 2 |
797 | 3))}, it should say @code{(* (/ 1 2) 3)}. | |
eee0877c | 798 | |
718bc349 AW |
799 | It's tempting to try replacing e.g. @code{"sum <-- (product ('+' / '-') |
800 | sum) / product"} with @code{"sum <-- (sum ('+' / '-') product) / | |
801 | product"}, but this is a Bad Idea. PEGs don't support left recursion. | |
802 | To see why, imagine what the parser will do here. When it tries to | |
803 | parse @code{sum}, it first has to try and parse @code{sum}. But to do | |
804 | that, it first has to try and parse @code{sum}. This will continue | |
805 | until the stack gets blown off. | |
eee0877c | 806 | |
718bc349 AW |
807 | So how does one parse left-associative binary operators with PEGs? |
808 | Honestly, this is one of their major shortcomings. There's no | |
809 | general-purpose way of doing this, but here the repetition operators are | |
810 | a good choice: | |
eee0877c AW |
811 | |
812 | @lisp | |
813 | (use-modules (srfi srfi-1)) | |
814 | ||
815 | (define-grammar | |
816 | "expr <- sum | |
817 | sum <-- (product ('+' / '-'))* product | |
818 | product <-- (value ('*' / '/'))* value | |
819 | value <-- number / '(' expr ')' | |
820 | number <-- [0-9]+") | |
821 | ||
822 | ;; take a deep breath... | |
823 | (define (make-left-parser next-func) | |
824 | (lambda (sum first . rest) ;; general form, comments below assume | |
825 | ;; that we're dealing with a sum expression | |
826 | (if (null? rest) ;; form (sum (product ...)) | |
827 | (apply next-func first) | |
828 | (if (string? (cadr first));; form (sum ((product ...) "+") (product ...)) | |
829 | (list (string->symbol (cadr first)) | |
830 | (apply next-func (car first)) | |
831 | (apply next-func (car rest))) | |
832 | ;; form (sum (((product ...) "+") ((product ...) "+")) (product ...)) | |
833 | (car | |
834 | (reduce ;; walk through the list and build a left-associative tree | |
835 | (lambda (l r) | |
836 | (list (list (cadr r) (car r) (apply next-func (car l))) | |
837 | (string->symbol (cadr l)))) | |
838 | 'ignore | |
839 | (append ;; make a list of all the products | |
840 | ;; the first one should be pre-parsed | |
841 | (list (list (apply next-func (caar first)) | |
842 | (string->symbol (cadar first)))) | |
843 | (cdr first) | |
844 | ;; the last one has to be added in | |
845 | (list (append rest '("done")))))))))) | |
846 | ||
847 | (define (parse-value value first . rest) | |
848 | (if (null? rest) | |
849 | (string->number (cadr first)) | |
850 | (apply parse-sum (car rest)))) | |
851 | (define parse-product (make-left-parser parse-value)) | |
852 | (define parse-sum (make-left-parser parse-product)) | |
853 | (define parse-expr parse-sum) | |
854 | @end lisp | |
855 | ||
856 | Then: | |
857 | @lisp | |
858 | (apply parse-expr (peg:tree (peg-parse expr "1+1/2*3+(1+1)/2"))) @result{} | |
859 | (+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)) | |
860 | @end lisp | |
861 | ||
718bc349 AW |
862 | As you can see, this is much uglier (it could be made prettier by using |
863 | @code{context-flatten}, but the way it's written above makes it clear | |
864 | how we deal with the three ways the zero-or-more @code{*} expression can | |
865 | parse). Fortunately, most of the time we can get away with only using | |
866 | right-associativity. | |
eee0877c AW |
867 | |
868 | @subsubheading Simplified Functions | |
869 | ||
718bc349 AW |
870 | For a more tantalizing example, consider the following grammar that |
871 | parses (highly) simplified C functions: | |
872 | ||
eee0877c AW |
873 | @lisp |
874 | (define-grammar | |
875 | "cfunc <-- cSP ctype cSP cname cSP cargs cLB cSP cbody cRB | |
876 | ctype <-- cidentifier | |
877 | cname <-- cidentifier | |
878 | cargs <-- cLP (! (cSP cRP) carg cSP (cCOMMA / cRP) cSP)* cSP | |
879 | carg <-- cSP ctype cSP cname | |
880 | cbody <-- cstatement * | |
881 | cidentifier <- [a-zA-z][a-zA-Z0-9_]* | |
882 | cstatement <-- (!';'.)*cSC cSP | |
883 | cSC < ';' | |
884 | cCOMMA < ',' | |
885 | cLP < '(' | |
886 | cRP < ')' | |
887 | cLB < '@{' | |
888 | cRB < '@}' | |
889 | cSP < [ \t\n]*") | |
890 | @end lisp | |
891 | ||
892 | Then: | |
893 | @lisp | |
894 | (peg-parse cfunc "int square(int a) @{ return a*a;@}") @result{} | |
895 | (32 | |
896 | (cfunc (ctype "int") | |
897 | (cname "square") | |
898 | (cargs (carg (ctype "int") (cname "a"))) | |
899 | (cbody (cstatement "return a*a")))) | |
900 | @end lisp | |
901 | ||
902 | And: | |
903 | @lisp | |
904 | (peg-parse cfunc "int mod(int a, int b) @{ int c = a/b;return a-b*c; @}") @result{} | |
905 | (52 | |
906 | (cfunc (ctype "int") | |
907 | (cname "mod") | |
908 | (cargs (carg (ctype "int") (cname "a")) | |
909 | (carg (ctype "int") (cname "b"))) | |
910 | (cbody (cstatement "int c = a/b") | |
911 | (cstatement "return a- b*c")))) | |
912 | @end lisp | |
913 | ||
718bc349 AW |
914 | By wrapping all the @code{carg} nonterminals in a @code{cargs} |
915 | nonterminal, we were able to remove any ambiguity in the parsing | |
916 | structure and avoid having to call @code{context-flatten} on the output | |
917 | of @code{peg-parse}. We used the same trick with the @code{cstatement} | |
918 | nonterminals, wrapping them in a @code{cbody} nonterminal. | |
919 | ||
920 | The whitespace nonterminal @code{cSP} used here is a (very) useful | |
921 | instantiation of a common pattern for matching syntactically irrelevant | |
922 | information. Since it's tagged with @code{<} and ends with @code{*} it | |
923 | won't clutter up the parse trees (all the empty lists will be discarded | |
924 | during the compression step) and it will never cause parsing to fail. | |
97c84694 NL |
925 | |
926 | @node PEG Internals | |
927 | @subsection PEG Internals | |
928 | ||
929 | A PEG parser takes a string as input and attempts to parse it as a given | |
930 | nonterminal. The key idea of the PEG implementation is that every | |
931 | nonterminal is just a function that takes a string as an argument and | |
932 | attempts to parse that string as its nonterminal. The functions always | |
933 | start from the beginning, but a parse is considered successful if there | |
934 | is material left over at the end. | |
935 | ||
936 | This makes it easy to model different PEG parsing operations. For | |
937 | instance, consider the PEG grammar @code{"ab"}, which could also be | |
938 | written @code{(and "a" "b")}. It matches the string ``ab''. Here's how | |
939 | that might be implemented in the PEG style: | |
940 | ||
941 | @lisp | |
942 | (define (match-and-a-b str) | |
943 | (match-a str) | |
944 | (match-b str)) | |
945 | @end lisp | |
946 | ||
947 | As you can see, the use of functions provides an easy way to model | |
948 | sequencing. In a similar way, one could model @code{(or a b)} with | |
949 | something like the following: | |
950 | ||
951 | @lisp | |
952 | (define (match-or-a-b str) | |
953 | (or (match-a str) (match-b str))) | |
954 | @end lisp | |
955 | ||
956 | Here the semantics of a PEG @code{or} expression map naturally onto | |
957 | Scheme's @code{or} operator. This function will attempt to run | |
958 | @code{(match-a str)}, and return its result if it succeeds. Otherwise it | |
959 | will run @code{(match-b str)}. | |
960 | ||
961 | Of course, the code above wouldn't quite work. We need some way for the | |
962 | parsing functions to communicate. The actual interface used is below. | |
963 | ||
964 | @subsubheading Parsing Function Interface | |
965 | ||
966 | A parsing function takes three arguments - a string, the length of that | |
967 | string, and the position in that string it should start parsing at. In | |
968 | effect, the parsing functions pass around substrings in pieces - the | |
969 | first argument is a buffer of characters, and the second two give a | |
970 | range within that buffer that the parsing function should look at. | |
971 | ||
972 | Parsing functions return either #f, if they failed to match their | |
973 | nonterminal, or a list whose first element must be an integer | |
974 | representing the final position in the string they matched and whose cdr | |
975 | can be any other data the function wishes to return, or '() if it | |
976 | doesn't have any more data. | |
977 | ||
978 | The one caveat is that if the extra data it returns is a list, any | |
979 | adjacent strings in that list will be appended by @code{peg-parse}. For | |
980 | instance, if a parsing function returns @code{(13 ("a" "b" "c"))}, | |
981 | @code{peg-parse} will take @code{(13 ("abc"))} as its value. | |
982 | ||
983 | For example, here is a function to match ``ab'' using the actual | |
984 | interface. | |
985 | ||
986 | @lisp | |
987 | (define (match-a-b str len pos) | |
988 | (and (<= (+ pos 2) len) | |
989 | (string= str "ab" pos (+ pos 2)) | |
990 | (list (+ pos 2) '()))) ; we return no extra information | |
991 | @end lisp | |
992 | ||
993 | The above function can be used to match a string by running | |
994 | @code{(peg-parse match-a-b "ab")}. | |
94e8517c NL |
995 | |
996 | @subsubheading Code Generators and Extensible Syntax | |
997 | ||
998 | PEG expressions, such as those in a @code{define-nonterm} form, are | |
999 | interpreted internally in two steps. | |
1000 | ||
1001 | First, any string PEG is expanded into an s-expression PEG by the code | |
1002 | in the @code{(ice-9 peg string-peg)} module. | |
1003 | ||
1004 | Then, then s-expression PEG that results is compiled into a parsing | |
1005 | function by the @code{(ice-9 peg codegen)} module. In particular, the | |
1006 | function @code{peg-sexp-compile} is called on the s-expression. It then | |
1007 | decides what to do based on the form it is passed. | |
1008 | ||
1009 | The PEG syntax can be expanded by providing @code{peg-sexp-compile} more | |
1010 | options for what to do with its forms. The extended syntax will be | |
1011 | associated with a symbol, for instance @code{my-parsing-form}, and will | |
1012 | be called on all PEG expressions of the form | |
1013 | @lisp | |
1014 | (my-parsing-form ...) | |
1015 | @end lisp | |
1016 | ||
1017 | The parsing function should take two arguments. The first will be a | |
1018 | syntax object containing a list with all of the arguments to the form | |
1019 | (but not the form's name), and the second will be the | |
1020 | @code{capture-type} argument that is passed to @code{define-nonterm}. | |
1021 | ||
1022 | New functions can be registered by calling @code{(add-peg-compiler! | |
1023 | symbol function)}, where @code{symbol} is the symbol that will indicate | |
1024 | a form of this type and @code{function} is the code generating function | |
1025 | described above. The function @code{add-peg-compiler!} is exported from | |
1026 | the @code{(ice-9 peg codegen)} module. |