\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999-2004, 2012 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below. A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine).             Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser. It is good for simple
languages. It has many conveniences that make grammar writing easy,
but these conveniences also make it less powerful than a Bison-like
@acronym{LALR} parser. @inforef{top, the Wisent Parser Manual,
wisent}, for more information.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension. When compiled, the contents are converted into a file of
the form @file{NAME-by.el}, which in turn is byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License::  The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol. In @semantic{}, one or more nonterminals can be designated as
``start'' symbols. They are listed after the @code{%start} keyword,
separated by spaces. @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal in the grammar is used. Internally, the first start
nonterminal is targeted by the reserved symbol @code{bovine-toplevel},
so it can be found by the parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code. The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context. @inforef{scopestart Decl, ,grammar-fw}. Internally, the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.
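
For illustration only, a grammar prologue might contain declarations
like the following. This is a minimal sketch. The nonterminal names
@code{declaration}, @code{arg-list}, and @code{codeblock} are
hypothetical, and each one would still need its own rule in the grammar.

@example
;; Hypothetical nonterminal names, for illustration only.
;; Two start nonterminals, separated by spaces. The first one,
;; declaration, is the one targeted by bovine-toplevel.
%start declaration arg-list
;; Goal used to parse a local context. It is targeted by
;; bovine-inner-scope.
%scopestart codeblock
@end example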

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file. Once the setup is done in the prologue, you can start writing
rules. @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your
grammar. @var{components} is a list of elements that must be matched
for @var{result} to be made. @var{optional-semantic-action} is an
optional sequence of simplified Emacs Lisp expressions for concocting
the parse tree.
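
The rule below is a hedged illustration of this form. It is hypothetical
and has two alternatives, each with its own semantic action. The names
@code{argument} and @code{EQUAL} are assumptions made for this sketch.
@code{EQUAL} is declared as a punctuation token, as in
@ref{Bovine Examples}.

@example
%token <punctuation> EQUAL "="

;; Hypothetical rule: an argument is either "name=value"
;; or a bare "name".
argument : symbol EQUAL symbol
           ( $1 $3 )
         | symbol
           ( $1 )
         ;
@end example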

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all of the @var{components}' elements have been matched, they are
@dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.

A particular @var{result} written into your grammar becomes
the parser's goal. It is designated by a @code{%start} statement
(@pxref{Starting Rules}). The value returned by the associated
@var{optional-semantic-action} is the parser's result. It should be
a tree of @semantic{} @dfn{tags}. @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols. A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

means that @code{FOO} is a reserved language keyword, matched as such
by looking it up in a keyword table. This is because @code{"foo"} will
be converted to @code{FOO} in the lexical analysis stage. Thus, the
symbol @code{FOO} won't be available any other way.
@inforef{keyword Decl, ,grammar-fw}.
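
The following is a minimal sketch of such a keyword in use. The keyword
@code{IMPORT} and the nonterminal @code{import-statement} are
hypothetical. Inside the rule, the keyword is written by its token
name, not by its text.

@example
%keyword IMPORT "import"

;; Hypothetical rule matching text such as "import foo".
;; $2 is the symbol that follows the keyword.
import-statement : IMPORT symbol
                   ( $2 )
                 ;
@end example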

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token. To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the @code{%token}'s match-value string, is a regular expression!
@end table
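
Because the match is made with @code{string-match}, an unanchored
pattern such as @code{"foo"} also accepts any symbol that merely
contains @samp{foo}, such as @samp{foobar}. As a sketch, anchoring the
regular expression restricts the match to the exact text. The token
name @code{EXACT_FOO} is hypothetical.

@example
;; Unanchored: matches "foo", but also "foobar" and "afoo".
%token <symbol> FOO "foo"
;; Anchored (hypothetical token name): matches only the
;; symbol "foo".
%token <symbol> EXACT_FOO "^foo$"
@end example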

Non-symbol tokens are also allowed. For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it substitutes every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of a
rule is made up of symbols and strings. A string in the mix means
that the previous symbol must have the additional constraint of
matching it (as a regular expression), as described in
@ref{How Lexical Tokens Match}.
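
The same substitution can be applied to the @code{PERIOD} token declared
in @ref{How Lexical Tokens Match}. The following is a sketch of what the
converter conceptually produces, not literal converter output:

@example
;; As written in the grammar:
filename : symbol PERIOD symbol
         ;
;; Pseudo-equivalent rule after inlining (again, invalid syntax!):
filename : symbol punctuation "[.]" symbol
         ;
@end example

Here the string @code{"[.]"} constrains the @code{punctuation} token
that precedes it, not the @code{symbol} that follows it.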

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a rule has multiple alternatives, their order is important. For
example,

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form. If they were in reverse order, then the long form
would never be tested.
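
For contrast, here is a sketch of the same rule with its alternatives
in reverse order. Given the behavior just described, the short form
would already succeed on the @samp{foo} of @samp{foo.h}, so the long
form would never be tried:

@example
;; Problematic ordering: the bare symbol alternative comes first.
headerfile : symbol
           | symbol PERIOD symbol
           ;
@end example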

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda. This lambda has special shortcuts to simplify
reading the semantic action definition. An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda that returns a list containing only the string or
object found by matching the first (zeroth) element of the match. An
@acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

calls @code{foo} on the first matched object, and then splices its
return value into the returned list, whereas:

@example
( (foo $1) )
@end example

calls @code{foo} and places its return value in the returned list.
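
The two rules below are a hedged sketch of the difference. They call the
standard Lisp function @code{list} on the two symbols of a file name
such as @samp{foo.h}. Following the rules just described, the first
should return @code{( ("foo" "h") )} and the second
@code{( "foo" "h" )}. The nonterminal names are made up for this
example, and @code{PERIOD} is the punctuation token from
@ref{How Lexical Tokens Match}.

@example
;; Hypothetical rule: the return value of (list $1 $3)
;; is placed in the list as one element.
file-parts : symbol PERIOD symbol
             ( (list $1 $3) )
           ;

;; Hypothetical rule: the return value of (list $1 $3)
;; is spliced into the list.
file-parts-spliced : symbol PERIOD symbol
                     ( ,(list $1 $3) )
                   ;
@end example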

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list, i.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A call to the function @code{foo}, whose return value is placed in the
return list.

@item ,(foo)
A call to the function @code{foo}, whose return value is spliced into
the return list.

@item '(foo)
A call to the function @code{foo}, whose return value is wrapped in a
list and placed in the return list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above). The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list. The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches, the same way the
parser iterates over the starting rule (@pxref{Starting Rules}). This
lets you have much simpler rules in this specific case, and also
provides positional information in the returned tokens, as well as
error skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function} or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
"parenttype"}}.
@end table

@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag named @var{name} of class @code{variable},
@code{function}, @code{type}, @code{include}, @code{package}, or
@code{code}, respectively.
@inforef{Creating Tags, , semantic-appdev}, for the Lisp
functions these translate into.
@end table

If @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"} and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

435 | ||
436 | To get a better result with nonterminals, use @asis{,} to splice lists | |
437 | in like this: | |
438 | ||
439 | @example | |
440 | %token <punctuation> EQUAL "=" | |
441 | @dots{} | |
442 | assign: any-symbol EQUAL any-symbol | |
443 | ( ,$1 ,$3 ) | |
444 | ; | |
445 | @end example | |
446 | ||
447 | which would return | |
448 | ||
449 | @example | |
450 | ( "A" "B" ) | |
451 | @end example | |
452 | ||
453 | @node GNU Free Documentation License | |
454 | @appendix GNU Free Documentation License | |
455 | ||
98c94021 | 456 | @include doclicense.texi |
cfa49c1e | 457 | |
@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c  LocalWords:  bovinator inlined