Commit | Line | Data |
---|---|---|
cfa49c1e EL |
1 | \input texinfo @c -*-texinfo-*- |
2 | @c %**start of header | |
98c94021 | 3 | @setfilename ../../info/bovine |
cfa49c1e EL |
4 | @set TITLE Bovine parser development |
5 | @set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim | |
6 | @settitle @value{TITLE} | |
c6ab4664 | 7 | @documentencoding UTF-8 |
cfa49c1e EL |
8 | |
9 | @c ************************************************************************* | |
10 | @c @ Header | |
11 | @c ************************************************************************* | |
12 | ||
13 | @c Merge all indexes into a single index for now. | |
14 | @c We can always separate them later into two or more as needed. | |
15 | @syncodeindex vr cp | |
16 | @syncodeindex fn cp | |
17 | @syncodeindex ky cp | |
18 | @syncodeindex pg cp | |
19 | @syncodeindex tp cp | |
20 | ||
21 | @c @footnotestyle separate | |
22 | @c @paragraphindent 2 | |
23 | @c @@smallbook | |
24 | @c %**end of header | |
25 | ||
26 | @copying | |
6bc383b1 | 27 | Copyright @copyright{} 1999--2004, 2012--2014 Free Software Foundation, Inc. |
cfa49c1e EL |
28 | |
29 | @quotation | |
30 | Permission is granted to copy, distribute and/or modify this document | |
98c94021 GM |
31 | under the terms of the GNU Free Documentation License, Version 1.3 or |
32 | any later version published by the Free Software Foundation; with no | |
33 | Invariant Sections, with the Front-Cover texts being ``A GNU Manual,'' | |
34 | and with the Back-Cover Texts as in (a) below. A copy of the license | |
35 | is included in the section entitled ``GNU Free Documentation License''. | |
36 | ||
37 | (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and | |
6bf430d1 | 38 | modify this GNU manual.'' |
cfa49c1e EL |
39 | @end quotation |
40 | @end copying | |
41 | ||
98c94021 | 42 | @dircategory Emacs misc features |
cfa49c1e | 43 | @direntry |
98c94021 | 44 | * Bovine: (bovine). Semantic bovine parser development. |
cfa49c1e | 45 | @end direntry |
cfa49c1e EL |
46 | |
47 | @iftex | |
48 | @finalout | |
49 | @end iftex | |
50 | ||
51 | @c @setchapternewpage odd | |
52 | @c @setchapternewpage off | |
53 | ||
cfa49c1e EL |
54 | @titlepage |
55 | @sp 10 | |
56 | @title @value{TITLE} | |
57 | @author by @value{AUTHOR} | |
cfa49c1e EL |
58 | @page |
59 | @vskip 0pt plus 1 fill | |
60 | @insertcopying | |
61 | @end titlepage | |
62 | @page | |
63 | ||
98c94021 GM |
64 | @macro semantic{} |
65 | @i{Semantic} | |
66 | @end macro | |
cfa49c1e EL |
67 | |
68 | @c ************************************************************************* | |
69 | @c @ Document | |
70 | @c ************************************************************************* | |
71 | @contents | |
72 | ||
73 | @node top | |
74 | @top @value{TITLE} | |
75 | ||
76 | The @dfn{bovine} parser is the original @semantic{} parser, and is an | |
77 | implementation of an @acronym{LL} parser. It is good for simple | |
78 | languages. It has many conveniences making grammar writing easy. The | |
79 | conveniences make it less powerful than a Bison-like @acronym{LALR} | |
a944db14 | 80 | parser. For more information, @inforef{Top, The Wisent Parser Manual, |
cfa49c1e EL |
81 | wisent}. |
82 | ||
83 | Bovine @acronym{LL} grammars are stored in files with a @file{.by} | |
84 | extension. When compiled, the contents are converted into a file of | |
85 | the form @file{NAME-by.el}. This, in turn, is byte compiled. | |
86 | @inforef{top, Grammar Framework Manual, grammar-fw}. | |
87 | ||
98c94021 GM |
88 | @ifnottex |
89 | @insertcopying | |
90 | @end ifnottex | |
91 | ||
cfa49c1e EL |
92 | @menu |
93 | * Starting Rules:: The starting rules for the grammar. | |
98c94021 GM |
94 | * Bovine Grammar Rules:: Rules used to parse a language. |
95 | * Optional Lambda Expression:: Actions to take when a rule is matched. | |
96 | * Bovine Examples:: Simple Samples. | |
97 | * GNU Free Documentation License:: The license for this documentation. | |
98 | @c * Index:: | |
cfa49c1e EL |
99 | @end menu |
100 | ||
101 | @node Starting Rules | |
102 | @chapter Starting Rules | |
103 | ||
104 | In Bison, one and only one nonterminal is designated as the ``start'' | |
105 | symbol. In @semantic{}, one or more nonterminals can be designated as | |
106 | the ``start'' symbol. They are declared following the @code{%start} | |
107 | keyword separated by spaces. @inforef{start Decl, ,grammar-fw}. | |
108 | ||
109 | If no @code{%start} keyword is used in a grammar, then the very first | |
110 | nonterminal is used. Internally the first start nonterminal is targeted by the | |
111 | reserved symbol @code{bovine-toplevel}, so it can be found by the | |
112 | parser harness. | |
113 | ||
114 | To find locally defined variables, the local context handler needs to | |
115 | parse the body of functional code. The @code{scopestart} declaration | |
116 | specifies the name of a nonterminal used as the goal to parse a local | |
117 | context, @inforef{scopestart Decl, ,grammar-fw}. Internally the | |
118 | scopestart nonterminal is targeted by the reserved symbol | |
119 | @code{bovine-inner-scope}, so it can be found by the parser harness. | |
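 | ||
 | As a brief sketch of these declarations, a grammar prologue might | |
 | contain the lines below. The nonterminal names @code{declaration} and | |
 | @code{codeblock} are hypothetical and are not taken from this manual. | |
 | ||
 | @example | |
 | %start declaration | |
 | %scopestart codeblock | |
 | @end example | |
 | ||
 | Under that assumption, @code{declaration} would be the goal reached | |
 | through @code{bovine-toplevel}, and @code{codeblock} the goal reached | |
 | through @code{bovine-inner-scope}. | |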
120 | ||
121 | @node Bovine Grammar Rules | |
122 | @chapter Bovine Grammar Rules | |
123 | ||
124 | The rules are what allow the compiler to create tags from a language | |
125 | file. Once the setup is done in the prologue, you can start writing | |
126 | rules. @inforef{Grammar Rules, ,grammar-fw}. | |
127 | ||
128 | @example | |
129 | @var{result} : @var{components1} @var{optional-semantic-action1} | |
130 | | @var{components2} @var{optional-semantic-action2} | |
131 | ; | |
132 | @end example | |
133 | ||
134 | @var{result} is a nonterminal, that is a symbol synthesized in your grammar. | |
135 | @var{components} is a list of elements that are to be matched if @var{result} | |
136 | is to be made. @var{optional-semantic-action} is an optional sequence | |
137 | of simplified Emacs Lisp expressions for concocting the parse tree. | |
138 | ||
139 | In Bison, each time an element of @var{components} is found, it is | |
140 | @dfn{shifted} onto the parser stack (the stack of matched elements). | |
141 | When all @var{components}' elements have been matched, it is | |
88edc57f | 142 | @dfn{reduced} to @var{result}. @xref{Algorithm,,, bison, The GNU Bison Manual}. |
cfa49c1e EL |
143 | |
144 | A particular @var{result} written into your grammar becomes | |
145 | the parser's goal. It is designated by a @code{%start} statement | |
146 | (@pxref{Starting Rules}). The value returned by the associated | |
147 | @var{optional-semantic-action} is the parser's result. It should be | |
148 | a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, , | |
149 | semantic-appdev}. | |
150 | ||
151 | @var{components} is made up of symbols. A symbol such as @code{FOO} | |
152 | means that a syntactic token of class @code{FOO} must be matched. | |
153 | ||
154 | @menu | |
98c94021 GM |
155 | * How Lexical Tokens Match:: |
156 | * Grammar-to-Lisp Details:: | |
157 | * Order of components in rules:: | |
cfa49c1e EL |
158 | @end menu |
159 | ||
160 | @node How Lexical Tokens Match | |
161 | @section How Lexical Tokens Match | |
162 | ||
163 | A lexical rule must be used to define how to match a lexical token. | |
164 | ||
165 | For instance: | |
166 | ||
167 | @example | |
168 | %keyword FOO "foo" | |
169 | @end example | |
170 | ||
171 | means that @code{FOO} is a reserved language keyword, matched as such | |
172 | by looking it up in a keyword table, @inforef{keyword Decl, | |
173 | ,grammar-fw}. This is because @code{"foo"} will be converted to | |
174 | @code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO} | |
175 | won't be available any other way. | |
176 | ||
177 | If we specify our token in this way: | |
178 | ||
179 | @example | |
180 | %token <symbol> FOO "foo" | |
181 | @end example | |
182 | ||
183 | then @code{FOO} will match the string @code{"foo"} explicitly, but it | |
184 | won't do so at the lexical level, allowing use of the text | |
185 | @code{"foo"} in other forms of regular expressions. | |
186 | ||
187 | In that case, @code{FOO} is a @code{symbol}-type token. To match, a | |
188 | @code{symbol} must first be encountered, and then it must | |
189 | @code{string-match "foo"}. | |
190 | ||
191 | @table @strong | |
192 | @item Caution: | |
193 | Be especially careful to remember that @code{"foo"}, and more | |
194 | generally the %token's match-value string, is a regular expression! | |
195 | @end table | |
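 | ||
 | To illustrate the caution above, here is a hedged sketch. Because the | |
 | match is made with @code{string-match}, an unanchored pattern such as | |
 | @code{"foo"} should also match a symbol like @samp{foobar}. Assuming | |
 | the usual Emacs regular expression anchors apply to the match-value | |
 | string, the pattern can be tied to the whole token text: | |
 | ||
 | @example | |
 | %token <symbol> FOO "\\`foo\\'" | |
 | @end example | |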
196 | ||
197 | Non-symbol tokens are also allowed. For example: | |
198 | ||
199 | @example | |
200 | %token <punctuation> PERIOD "[.]" | |
201 | ||
202 | filename : symbol PERIOD symbol | |
203 | ; | |
204 | @end example | |
205 | ||
206 | @code{PERIOD} is a @code{punctuation}-type token that will explicitly | |
207 | match one period when used in the above rule. | |
208 | ||
209 | @table @strong | |
210 | @item Please Note: | |
211 | @code{symbol}, @code{punctuation}, etc., are predefined lexical token | |
212 | types, based on the @dfn{syntax class}-character associations | |
213 | currently in effect. | |
214 | @end table | |
215 | ||
216 | @node Grammar-to-Lisp Details | |
217 | @section Grammar-to-Lisp Details | |
218 | ||
219 | For the bovinator, lexical token matching patterns are @emph{inlined}. | |
220 | When the grammar-to-lisp converter encounters a lexical token | |
221 | declaration of the form: | |
222 | ||
223 | @example | |
224 | %token <@var{type}> @var{token-name} @var{match-value} | |
225 | @end example | |
226 | ||
227 | it substitutes every occurrence of @var{token-name} in rules with its | |
228 | expanded form: | |
229 | ||
230 | @example | |
231 | @var{type} @var{match-value} | |
232 | @end example | |
233 | ||
234 | For example: | |
235 | ||
236 | @example | |
237 | %token <symbol> MOOSE "moose" | |
238 | ||
239 | find_a_moose: MOOSE | |
240 | ; | |
241 | @end example | |
242 | ||
243 | will generate this pseudo-equivalent rule: | |
244 | ||
245 | @example | |
246 | find_a_moose: symbol "moose" ;; invalid syntax! | |
247 | ; | |
248 | @end example | |
249 | ||
250 | Thus, from the bovinator's point of view, the @var{components} part of a | |
251 | rule is made up of symbols and strings. A string in the mix means | |
252 | that the previous symbol must have the additional constraint of | |
253 | exactly matching it, as described in @ref{How Lexical Tokens Match}. | |
254 | ||
255 | @table @strong | |
256 | @item Please Note: | |
257 | For the bovinator, this task was mixed into the language definition to | |
258 | simplify implementation, though Bison's technique is more efficient. | |
259 | @end table | |
260 | ||
261 | @node Order of components in rules | |
262 | @section Order of components in rules | |
263 | ||
264 | If a rule has multiple components, order is important. For example, | |
265 | ||
266 | @example | |
267 | headerfile : symbol PERIOD symbol | |
268 | | symbol | |
269 | ; | |
270 | @end example | |
271 | ||
272 | would match @samp{foo.h} or the @acronym{C++} header @samp{foo}. | |
273 | The bovine parser will first attempt to match the long form, and then | |
274 | the short form. If they were in reverse order, then the long form | |
275 | would never be tested. | |
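 | ||
 | As an illustrative sketch of the ordering to avoid, here is the same | |
 | rule with its alternatives swapped: | |
 | ||
 | @example | |
 | headerfile : symbol | |
 | | symbol PERIOD symbol ;; never tested | |
 | ; | |
 | @end example | |
 | ||
 | Given @samp{foo.h}, the first alternative would already succeed on | |
 | @samp{foo}, so the longer form is never tried. | |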
276 | ||
277 | @c @xref{Default syntactic tokens}. | |
278 | ||
279 | @node Optional Lambda Expression | |
280 | @chapter Optional Lambda Expressions | |
281 | ||
282 | The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into | |
283 | a bovine lambda. This lambda has special short-cuts to simplify | |
284 | reading the semantic action definition. An @acronym{OLE} like this: | |
285 | ||
286 | @example | |
287 | ( $1 ) | |
288 | @end example | |
289 | ||
290 | results in a lambda whose return value consists entirely of the string | |
291 | or object found by matching the first element of the match (index zero). | |
292 | An @acronym{OLE} like this: | |
293 | ||
294 | @example | |
295 | ( ,(foo $1) ) | |
296 | @end example | |
297 | ||
298 | executes @code{foo} on the first argument, and then splices its return | |
299 | into the return list whereas: | |
300 | ||
301 | @example | |
302 | ( (foo $1) ) | |
303 | @end example | |
304 | ||
305 | executes @code{foo}, and that is placed in the return list. | |
306 | ||
307 | Here are other things that can appear inline: | |
308 | ||
309 | @table @code | |
310 | @item $1 | |
311 | The first object matched. | |
312 | ||
313 | @item ,$1 | |
314 | The first object spliced into the list (assuming it is a list from a | |
315 | non-terminal). | |
316 | ||
317 | @item '$1 | |
65e7ca35 | 318 | The first object matched, placed in a list. I.e., @code{( $1 )}. |
cfa49c1e EL |
319 | |
320 | @item foo | |
321 | The symbol @code{foo} (exactly as displayed). | |
322 | ||
323 | @item (foo) | |
324 | A function call to foo which is stuck into the return list. | |
325 | ||
326 | @item ,(foo) | |
327 | A function call to foo which is spliced into the return list. | |
328 | ||
329 | @item '(foo) | |
330 | A function call to foo which is stuck into the return list in a list. | |
331 | ||
332 | @item (EXPAND @var{$1} @var{nonterminal} @var{depth}) | |
333 | A list starting with @code{EXPAND} performs a recursive parse on the | |
334 | token passed to it (represented by @samp{$1} above). The | |
335 | @dfn{semantic list} is a common token to expand, as there are often | |
336 | interesting things in the list. The @var{nonterminal} is a symbol in | |
337 | your table which the bovinator will start with when parsing. | |
338 | @var{nonterminal}'s definition is the same as any other nonterminal. | |
339 | @var{depth} should be at least @samp{1} when descending into a | |
340 | semantic list. | |
341 | ||
342 | @item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth}) | |
343 | Is like @code{EXPAND}, except that the parser will iterate over | |
344 | @var{nonterminal} until there are no more matches, the same way the | |
345 | parser iterates over the starting rule (@pxref{Starting Rules}). This | |
346 | lets you have much simpler rules in this specific case, and also lets | |
347 | you have positional information in the returned tokens, and error | |
348 | skipping. | |
349 | ||
350 | @item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{}) | |
351 | This is used for creating an association list. Each @var{symbol} is | |
352 | included in the list if the associated @var{value} is non-@code{nil}. | |
353 | While the items are all listed explicitly, the created structure is an | |
354 | association list of the form: | |
355 | ||
356 | @example | |
357 | ((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{}) | |
358 | @end example | |
359 | ||
360 | @item (TAG @var{name} @var{class} [@var{attributes}]) | |
361 | This creates one tag in the current buffer. | |
362 | ||
363 | @table @var | |
364 | @item name | |
365 | Is a string that represents the tag in the language. | |
366 | ||
367 | @item class | |
368 | Is the kind of tag being created, such as @code{function} or | |
369 | @code{variable}, though any symbol will work. | |
370 | ||
371 | @item attributes | |
fd762011 GM |
372 | Is an optional set of labeled values such as @code{:constant-flag t :parent |
373 | "parenttype"}. | |
cfa49c1e EL |
374 | @end table |
375 | ||
376 | @item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}]) | |
377 | @itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}]) | |
378 | @itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}]) | |
379 | @itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}]) | |
380 | @itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}]) | |
381 | @itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}]) | |
382 | Create a tag with @var{name} of respectively the class | |
383 | @code{variable}, @code{function}, @code{type}, @code{include}, | |
384 | @code{package}, and @code{code}. | |
385 | See @inforef{Creating Tags, , semantic-appdev} for the Lisp | |
386 | functions these translate into. | |
387 | @end table | |
388 | ||
389 | If the declaration @code{%quotemode backquote} is specified, then use | |
390 | @code{,@@} to splice a list in, and @code{,} to evaluate the expression. | |
391 | This lets you send @code{$1} as a symbol into a list instead of having | |
392 | it expanded inline. | |
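 | ||
 | As a hedged sketch of this mode, the @code{assign} rule from | |
 | @ref{Bovine Examples} might be written as follows. This variant is an | |
 | assumption based only on the description above and has not been taken | |
 | from an existing grammar: | |
 | ||
 | @example | |
 | %quotemode backquote | |
 | %token <punctuation> EQUAL "=" | |
 | @dots{} | |
 | assign: any-symbol EQUAL any-symbol | |
 | ( ,@@$1 ,@@$3 ) | |
 | ; | |
 | @end example | |
 | ||
 | Here @code{,@@} is meant to splice the list returned by each | |
 | @code{any-symbol} into the result. | |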
393 | ||
394 | @node Bovine Examples | |
395 | @chapter Examples | |
396 | ||
397 | The rule: | |
398 | ||
399 | @example | |
400 | any-symbol: symbol | |
401 | ; | |
402 | @end example | |
403 | ||
404 | is equivalent to | |
405 | ||
406 | @example | |
407 | any-symbol: symbol | |
408 | ( $1 ) | |
409 | ; | |
410 | @end example | |
411 | ||
412 | which, if it matched the string @samp{"A"}, would return | |
413 | ||
414 | @example | |
415 | ( "A" ) | |
416 | @end example | |
417 | ||
418 | If this rule were used like this: | |
419 | ||
420 | @example | |
421 | %token <punctuation> EQUAL "=" | |
422 | @dots{} | |
423 | assign: any-symbol EQUAL any-symbol | |
424 | ( $1 $3 ) | |
425 | ; | |
426 | @end example | |
427 | ||
428 | it would match @samp{"A=B"}, and return | |
429 | ||
430 | @example | |
431 | ( ("A") ("B") ) | |
432 | @end example | |
433 | ||
434 | The letters @samp{A} and @samp{B} come back in lists because | |
435 | @samp{any-symbol} is a nonterminal, not an actual lexical element. | |
436 | ||
437 | To get a better result with nonterminals, use @asis{,} to splice lists | |
438 | in like this: | |
439 | ||
440 | @example | |
441 | %token <punctuation> EQUAL "=" | |
442 | @dots{} | |
443 | assign: any-symbol EQUAL any-symbol | |
444 | ( ,$1 ,$3 ) | |
445 | ; | |
446 | @end example | |
447 | ||
448 | which would return | |
449 | ||
450 | @example | |
451 | ( "A" "B" ) | |
452 | @end example | |
453 | ||
454 | @node GNU Free Documentation License | |
455 | @appendix GNU Free Documentation License | |
456 | ||
98c94021 | 457 | @include doclicense.texi |
cfa49c1e | 458 | |
98c94021 GM |
459 | @c There is nothing to index at the moment. |
460 | @ignore | |
cfa49c1e EL |
461 | @node Index |
462 | @unnumbered Index | |
463 | @printindex cp | |
98c94021 | 464 | @end ignore |
cfa49c1e EL |
465 | |
466 | @iftex | |
467 | @contents | |
468 | @summarycontents | |
469 | @end iftex | |
470 | ||
471 | @bye | |
472 | ||
473 | @c Following comments are for the benefit of ispell. | |
474 | ||
475 | @c LocalWords: bovinator inlined |