1 \input texinfo @c -*-texinfo-*-
2 @c %**start of header
3 @setfilename ../../info/bovine
4 @set TITLE Bovine parser development
5 @set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
6 @settitle @value{TITLE}
7 @documentencoding UTF-8
8 @documentlanguage en
9
10 @c *************************************************************************
11 @c @ Header
12 @c *************************************************************************
13
14 @c Merge all indexes into a single index for now.
15 @c We can always separate them later into two or more as needed.
16 @syncodeindex vr cp
17 @syncodeindex fn cp
18 @syncodeindex ky cp
19 @syncodeindex pg cp
20 @syncodeindex tp cp
21
22 @c @footnotestyle separate
23 @c @paragraphindent 2
24 @c @@smallbook
25 @c %**end of header
26
27 @copying
28 Copyright @copyright{} 1999--2004, 2012--2013 Free Software Foundation, Inc.
29
30 @quotation
31 Permission is granted to copy, distribute and/or modify this document
32 under the terms of the GNU Free Documentation License, Version 1.3 or
33 any later version published by the Free Software Foundation; with no
34 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
35 and with the Back-Cover Texts as in (a) below. A copy of the license
36 is included in the section entitled ``GNU Free Documentation License''.
37
38 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
39 modify this GNU manual.''
40 @end quotation
41 @end copying
42
43 @dircategory Emacs misc features
44 @direntry
45 * Bovine: (bovine). Semantic bovine parser development.
46 @end direntry
47
48 @iftex
49 @finalout
50 @end iftex
51
52 @c @setchapternewpage odd
53 @c @setchapternewpage off
54
55 @titlepage
56 @sp 10
57 @title @value{TITLE}
58 @author by @value{AUTHOR}
59 @page
60 @vskip 0pt plus 1 fill
61 @insertcopying
62 @end titlepage
63 @page
64
65 @macro semantic{}
66 @i{Semantic}
67 @end macro
68
69 @c *************************************************************************
70 @c @ Document
71 @c *************************************************************************
72 @contents
73
74 @node top
75 @top @value{TITLE}
76
77 The @dfn{bovine} parser is the original @semantic{} parser, and is an
78 implementation of an @acronym{LL} parser. It is good for simple
79 languages. It has many conveniences making grammar writing easy. The
80 conveniences make it less powerful than a Bison-like @acronym{LALR}
81 parser. For more information, @inforef{Top, The Wisent Parser Manual,
82 wisent}.
83
84 Bovine @acronym{LL} grammars are stored in files with a @file{.by}
85 extension. When compiled, the contents are converted into a file of
86 the form @file{NAME-by.el}. This, in turn, is byte compiled.
87 @inforef{top, Grammar Framework Manual, grammar-fw}.
88
89 @ifnottex
90 @insertcopying
91 @end ifnottex
92
93 @menu
94 * Starting Rules:: The starting rules for the grammar.
95 * Bovine Grammar Rules:: Rules used to parse a language.
96 * Optional Lambda Expression:: Actions to take when a rule is matched.
97 * Bovine Examples:: Simple Samples.
98 * GNU Free Documentation License:: The license for this documentation.
99 @c * Index::
100 @end menu
101
102 @node Starting Rules
103 @chapter Starting Rules
104
105 In Bison, one and only one nonterminal is designated as the ``start''
106 symbol. In @semantic{}, one or more nonterminals can be designated as
107 the ``start'' symbol. They are declared following the @code{%start}
108 keyword separated by spaces. @inforef{start Decl, ,grammar-fw}.
109
110 If no @code{%start} keyword is used in a grammar, then the very first
111 nonterminal is used. Internally the first start nonterminal is targeted by the
112 reserved symbol @code{bovine-toplevel}, so it can be found by the
113 parser harness.
114
115 To find locally defined variables, the local context handler needs to
116 parse the body of functional code. The @code{scopestart} declaration
117 specifies the name of a nonterminal used as the goal to parse a local
118 context, @inforef{scopestart Decl, ,grammar-fw}. Internally the
119 scopestart nonterminal is targeted by the reserved symbol
120 @code{bovine-inner-scope}, so it can be found by the parser harness.
121
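For instance, a grammar might declare its goals as in the following
sketch.  The nonterminal names @code{declaration}, @code{statement},
and @code{codeblock} are placeholders chosen for illustration, not
names required by the parser.

@example
;; Two start nonterminals, separated by a space.
%start declaration statement

;; Nonterminal used as the goal when parsing a local context.
%scopestart codeblock
@end example
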
122 @node Bovine Grammar Rules
123 @chapter Bovine Grammar Rules
124
125 The rules are what allow the compiler to create tags from a language
126 file. Once the setup is done in the prologue, you can start writing
127 rules. @inforef{Grammar Rules, ,grammar-fw}.
128
129 @example
130 @var{result} : @var{components1} @var{optional-semantic-action1}
131 | @var{components2} @var{optional-semantic-action2}
132 ;
133 @end example
134
135 @var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
136 @var{components} is a list of elements that are to be matched if @var{result}
137 is to be made. @var{optional-semantic-action} is an optional sequence
138 of simplified Emacs Lisp expressions for concocting the parse tree.
139
140 In Bison, each time an element of @var{components} is found, it is
141 @dfn{shifted} onto the parser stack (the stack of matched elements).
142 When all @var{components}' elements have been matched, it is
143 @dfn{reduced} to @var{result}. @xref{Algorithm,,, bison, The GNU Bison Manual}.
144
145 A particular @var{result} written into your grammar becomes
146 the parser's goal. It is designated by a @code{%start} statement
147 (@pxref{Starting Rules}). The value returned by the associated
148 @var{optional-semantic-action} is the parser's result. It should be
149 a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
150 semantic-appdev}.
151
152 @var{components} is made up of symbols. A symbol such as @code{FOO}
153 means that a syntactic token of class @code{FOO} must be matched.
154
155 @menu
156 * How Lexical Tokens Match::
157 * Grammar-to-Lisp Details::
158 * Order of components in rules::
159 @end menu
160
161 @node How Lexical Tokens Match
162 @section How Lexical Tokens Match
163
164 A lexical rule must be used to define how to match a lexical token.
165
166 For instance:
167
168 @example
169 %keyword FOO "foo"
170 @end example
171
172 means that @code{FOO} is a reserved language keyword, matched as such
173 by looking it up in a keyword table, @inforef{keyword Decl,
174 ,grammar-fw}. This is because @code{"foo"} will be converted to
175 @code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
176 won't be available any other way.
177
178 If we specify our token in this way:
179
180 @example
181 %token <symbol> FOO "foo"
182 @end example
183
184 then @code{FOO} will match the string @code{"foo"} explicitly, but it
185 won't do so at the lexical level, allowing use of the text
186 @code{"foo"} in other forms of regular expressions.
187
188 In that case, @code{FOO} is a @code{symbol}-type token. To match, a
189 @code{symbol} must first be encountered, and then it must
190 @code{string-match "foo"}.
191
192 @table @strong
193 @item Caution:
194 Be especially careful to remember that @code{"foo"}, and more
195 generally the %token's match-value string, is a regular expression!
196 @end table
197
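Because the match-value is a regular expression, an unanchored pattern
also matches longer symbols that merely contain it.  The following is
a minimal sketch, assuming the pattern is applied with
@code{string-match} as described above: anchoring the pattern with the
Emacs regexp constructs @samp{\`} and @samp{\'} restricts the match to
the whole symbol.  The token name @code{FOO-EXACT} is hypothetical.

@example
;; Unanchored: also matches symbols such as "foobar" or "buffoon".
%token <symbol> FOO       "foo"

;; Anchored: matches only the symbol "foo".
%token <symbol> FOO-EXACT "\\`foo\\'"
@end example
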
198 Non-symbol tokens are also allowed. For example:
199
200 @example
201 %token <punctuation> PERIOD "[.]"
202
203 filename : symbol PERIOD symbol
204 ;
205 @end example
206
207 @code{PERIOD} is a @code{punctuation}-type token that will explicitly
208 match one period when used in the above rule.
209
210 @table @strong
211 @item Please Note:
212 @code{symbol}, @code{punctuation}, etc., are predefined lexical token
213 types, based on the @dfn{syntax class}-character associations
214 currently in effect.
215 @end table
216
217 @node Grammar-to-Lisp Details
218 @section Grammar-to-Lisp Details
219
220 For the bovinator, lexical token matching patterns are @emph{inlined}.
221 When the grammar-to-lisp converter encounters a lexical token
222 declaration of the form:
223
224 @example
225 %token <@var{type}> @var{token-name} @var{match-value}
226 @end example
227
228 it substitutes every occurrence of @var{token-name} in rules with its
229 expanded form:
230
231 @example
232 @var{type} @var{match-value}
233 @end example
234
235 For example:
236
237 @example
238 %token <symbol> MOOSE "moose"
239
240 find_a_moose: MOOSE
241 ;
242 @end example
243
244 will generate this pseudo-equivalent rule:
245
246 @example
247 find_a_moose: symbol "moose" ;; invalid syntax!
248 ;
249 @end example
250
251 Thus, from the bovinator's point of view, the @var{components} part of a
252 rule is made up of symbols and strings. A string in the mix means
253 that the previous symbol must have the additional constraint of
254 exactly matching it, as described in @ref{How Lexical Tokens Match}.
255
256 @table @strong
257 @item Please Note:
258 For the bovinator, this task was mixed into the language definition to
259 simplify implementation, though Bison's technique is more efficient.
260 @end table
261
262 @node Order of components in rules
263 @section Order of components in rules
264
265 If a rule has multiple components, order is important. For example:
266
267 @example
268 headerfile : symbol PERIOD symbol
269 | symbol
270 ;
271 @end example
272
273 would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
274 The bovine parser will first attempt to match the long form, and then
275 the short form. If they were in reverse order, then the long form
276 would never be tested.
277
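As an illustration of the ordering rule above, the following sketch
shows the reversed layout.  Going by the description above, the first
alternative would succeed on any symbol, so the long form would never
be tried:

@example
headerfile : symbol                  ;; tried first
           | symbol PERIOD symbol    ;; never tested
           ;
@end example
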
278 @c @xref{Default syntactic tokens}.
279
280 @node Optional Lambda Expression
281 @chapter Optional Lambda Expressions
282
283 The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
284 a bovine lambda. This lambda has special short-cuts to simplify
285 reading the semantic action definition. An @acronym{OLE} like this:
286
287 @example
288 ( $1 )
289 @end example
290
291 results in a lambda return which consists entirely of the string
292 or object found by matching the first (zeroth) element of the match.
293 An @acronym{OLE} like this:
294
295 @example
296 ( ,(foo $1) )
297 @end example
298
299 executes @code{foo} on the first argument, and then splices its return
300 value into the return list, whereas:
301
302 @example
303 ( (foo $1) )
304 @end example
305
306 executes @code{foo}, and places its return value in the return list.
307
308 Here are other things that can appear inline:
309
310 @table @code
311 @item $1
312 The first object matched.
313
314 @item ,$1
315 The first object spliced into the list (assuming it is a list from a
316 non-terminal).
317
318 @item '$1
319 The first object matched, placed in a list. I.e., @code{( $1 )}.
320
321 @item foo
322 The symbol @code{foo} (exactly as displayed).
323
324 @item (foo)
325 A function call to @code{foo} whose result is placed in the return list.
326
327 @item ,(foo)
328 A function call to @code{foo} whose result is spliced into the return list.
329
330 @item '(foo)
331 A function call to @code{foo} whose result is placed in the return list in a list.
332
333 @item (EXPAND @var{$1} @var{nonterminal} @var{depth})
334 A list starting with @code{EXPAND} performs a recursive parse on the
335 token passed to it (represented by @samp{$1} above). The
336 @dfn{semantic list} is a common token to expand, as there are often
337 interesting things in the list. The @var{nonterminal} is a symbol in
338 your table which the bovinator will start with when parsing.
339 @var{nonterminal}'s definition is the same as any other nonterminal.
340 @var{depth} should be at least @samp{1} when descending into a
341 semantic list.
342
343 @item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
344 Is like @code{EXPAND}, except that the parser will iterate over
345 @var{nonterminal} until there are no more matches (the same way the
346 parser iterates over the starting rule; @pxref{Starting Rules}). This
347 lets you have much simpler rules in this specific case, and also lets
348 you have positional information in the returned tokens, and error
349 skipping.
350
351 @item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
352 This is used for creating an association list. Each @var{symbol} is
353 included in the list if the associated @var{value} is non-@code{nil}.
354 While the items are all listed explicitly, the created structure is an
355 association list of the form:
356
357 @example
358 ((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
359 @end example
360
361 @item (TAG @var{name} @var{class} [@var{attributes}])
362 This creates one tag in the current buffer.
363
364 @table @var
365 @item name
366 Is a string that represents the tag in the language.
367
368 @item class
369 Is the kind of tag being created, such as @code{function}, or
370 @code{variable}, though any symbol will work.
371
372 @item attributes
373 Is an optional set of labeled values such as @code{:constant-flag t :parent
374 "parenttype"}.
375 @end table
376
377 @item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
378 @itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
379 @itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
380 @itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
381 @itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
382 @itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
383 Create a tag with @var{name} of respectively the class
384 @code{variable}, @code{function}, @code{type}, @code{include},
385 @code{package}, and @code{code}.
386 See @inforef{Creating Tags, , semantic-appdev} for the Lisp
387 functions these translate into.
388 @end table
389
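The tag-building forms in the table above might be combined as in the
following sketch.  The nonterminal names @code{variable-def},
@code{arg-list}, and @code{arg-sub-list} are hypothetical, and
@code{arg-sub-list} is assumed to be defined elsewhere in the grammar.

@example
%token <punctuation> EQUAL "="

;; Build a variable tag named by $1, with no type, defaulting to $3.
variable-def : symbol EQUAL symbol
               (TAG-VARIABLE $1 nil $3)
             ;

;; Re-parse the contents of a semantic list using arg-sub-list.
arg-list : semantic-list
           (EXPANDFULL $1 arg-sub-list 1)
         ;
@end example
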
390 If the declaration @code{%quotemode backquote} is specified, then use
391 @code{,@@} to splice a list in, and @code{,} to evaluate the expression.
392 This lets you send @code{$1} as a symbol into a list instead of having
393 it expanded inline.
394
395 @node Bovine Examples
396 @chapter Examples
397
398 The rule:
399
400 @example
401 any-symbol: symbol
402 ;
403 @end example
404
405 is equivalent to
406
407 @example
408 any-symbol: symbol
409 ( $1 )
410 ;
411 @end example
412
413 which, if it matched the string @samp{"A"}, would return
414
415 @example
416 ( "A" )
417 @end example
418
419 If this rule were used like this:
420
421 @example
422 %token <punctuation> EQUAL "="
423 @dots{}
424 assign: any-symbol EQUAL any-symbol
425 ( $1 $3 )
426 ;
427 @end example
428
429 it would match @samp{"A=B"}, and return
430
431 @example
432 ( ("A") ("B") )
433 @end example
434
435 The letters @samp{A} and @samp{B} come back in lists because
436 @samp{any-symbol} is a nonterminal, not an actual lexical element.
437
438 To get a better result with nonterminals, use @code{,} to splice lists
439 in like this:
440
441 @example
442 %token <punctuation> EQUAL "="
443 @dots{}
444 assign: any-symbol EQUAL any-symbol
445 ( ,$1 ,$3 )
446 ;
447 @end example
448
449 which would return
450
451 @example
452 ( "A" "B" )
453 @end example
454
455 @node GNU Free Documentation License
456 @appendix GNU Free Documentation License
457
458 @include doclicense.texi
459
460 @c There is nothing to index at the moment.
461 @ignore
462 @node Index
463 @unnumbered Index
464 @printindex cp
465 @end ignore
466
467 @iftex
468 @contents
469 @summarycontents
470 @end iftex
471
472 @bye
473
474 @c Following comments are for the benefit of ispell.
475
476 @c LocalWords: bovinator inlined