\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999-2004, 2012 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below. A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine). Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser. It is good for simple
languages. It has many conveniences that make grammar writing easy. The
same conveniences make it less powerful than a Bison-like @acronym{LALR}
parser. For more information, @inforef{top, the Wisent Parser Manual,
wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension. When compiled, the contents are converted into a file of
the form @file{NAME-by.el}. This file is, in turn, byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License:: The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol. In @semantic{}, one or more nonterminals can be designated as
``start'' symbols. They are declared following the @code{%start}
keyword, separated by spaces. @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal is used. Internally, the first start nonterminal is targeted
by the reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code. The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @inforef{scopestart Decl, ,grammar-fw}. Internally, the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file. Once the setup is done in the prologue, you can start writing
rules. @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made. @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all of the elements of @var{components} have been matched, they are
@dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.

A particular @var{result} written into your grammar becomes
the parser's goal. It is designated by a @code{%start} statement
(@pxref{Starting Rules}). The value returned by the associated
@var{optional-semantic-action} is the parser's result. It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols. A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

This means that @code{FOO} is a reserved language keyword, matched as such
by looking it up in a keyword table, @inforef{keyword Decl,
,grammar-fw}. This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token. To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the @code{%token}'s match-value string, is a regular expression!
@end table
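
To make the caution concrete, here is a small Emacs Lisp sketch of how
@code{string-match} treats such match-value strings. It only exercises
@code{string-match} itself; the anchored pattern is one possible way to
restrict a match to the whole token text, not something this manual
prescribes.

@example
;; An unanchored regexp matches anywhere in the text.
(string-match "foo" "foobar")         ; => 0
(string-match "foo" "buffoon")        ; => 3

;; Anchoring with \` and \' requires the whole text to match.
(string-match "\\`foo\\'" "foobar")   ; => nil
(string-match "\\`foo\\'" "foo")      ; => 0

;; An unescaped dot matches any character; "[.]" matches only a period.
(string-match "." "x")                ; => 0
(string-match "[.]" "x")              ; => nil
@end example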

Non-symbol tokens are also allowed. For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it substitutes every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of a
rule is made up of symbols and strings. A string in the mix means
that the previous symbol has the additional constraint of
matching that string, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table
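
The matching of a symbol followed by a string can be sketched in Emacs
Lisp. This is only an illustration: @code{my-token-matches-p} is a
hypothetical helper, and the @code{(@var{class} . @var{text})} token shape
is a simplification assumed for the example, not the bovinator's actual
data structure.

@example
(defun my-token-matches-p (token class regexp)
  "Return t if TOKEN has lexical CLASS and its text matches REGEXP.
TOKEN is assumed to be a cons (CLASS . TEXT).  A nil REGEXP means
that only the class is checked."
  (and (eq (car token) class)
       (or (null regexp)
           (string-match regexp (cdr token)))
       t))

;; The expanded rule `symbol "moose"' accepts only matching symbols.
(my-token-matches-p '(symbol . "moose") 'symbol "moose")    ; => t
(my-token-matches-p '(symbol . "goose") 'symbol "moose")    ; => nil
(my-token-matches-p '(punctuation . ".") 'symbol "moose")   ; => nil
;; A bare `symbol' component carries no string constraint.
(my-token-matches-p '(symbol . "goose") 'symbol nil)        ; => t
@end example

Keeping the class test and the string test separate mirrors the expanded
form @code{@var{type} @var{match-value}}: the type selects the token
class, and the string narrows it.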

@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important. For example, the
rule

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form. If they were in reverse order, then the long form
would never be tested.
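
The effect of ordering can be modeled with a short Emacs Lisp sketch.
It assumes a simplified view in which each alternative is a list of token
classes and the first alternative that matches the front of the input
wins; @code{my-first-matching-alternative} is a hypothetical name, and the
real parser does considerably more than this.

@example
(require 'cl-lib)

(defun my-first-matching-alternative (alternatives token-classes)
  "Return the first of ALTERNATIVES that is a prefix of TOKEN-CLASSES."
  (cl-find-if
   (lambda (alt)
     (and (<= (length alt) (length token-classes))
          (equal alt (cl-subseq token-classes 0 (length alt)))))
   alternatives))

;; Token classes for the input "foo.h".
(defvar my-input '(symbol punctuation symbol))

;; Long form first: the whole input is consumed.
(my-first-matching-alternative
 '((symbol punctuation symbol) (symbol)) my-input)
;; => (symbol punctuation symbol)

;; Short form first: the long form is never tried.
(my-first-matching-alternative
 '((symbol) (symbol punctuation symbol)) my-input)
;; => (symbol)
@end example

With the short form listed first, matching stops after one symbol, which
is why the longer alternative should come first in the grammar.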

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda. This lambda has special short-cuts to simplify
reading the semantic action definition. An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda whose return value consists entirely of the string
or object found by matching the first (zeroth) element of the match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
value into the return list, whereas:

@example
( (foo $1) )
@end example

executes @code{foo}, and its return value is placed in the return list.

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
nonterminal).

@item '$1
The first object matched, placed in a list, i.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo} whose result is placed in the return list.

@item ,(foo)
A function call to @code{foo} whose result is spliced into the return list.

@item '(foo)
A function call to @code{foo} whose result is placed in the return list,
wrapped in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above). The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list. The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches, the same way the
parser iterates over the starting rule (@pxref{Starting Rules}). This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as
@w{@code{:constant-flag t :parent "parenttype"}}.
@end table

@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @inforef{Creating Tags, , semantic-appdev} for the Lisp
functions these translate into.
@end table
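
As a rough illustration of two of the forms above, the following Emacs
Lisp sketch models what @code{ASSOC} is described as producing, and then
builds a variable tag directly. @code{my-assoc-sketch} is a hypothetical
stand-in written for this example, not the function the grammar compiler
emits. The second part assumes that @code{TAG-VARIABLE} corresponds to
@code{semantic-tag-new-variable} from the @file{semantic/tag} library;
the names and values passed to it are made up.

@example
(require 'cl-lib)
(require 'semantic/tag)

(defun my-assoc-sketch (&rest args)
  "Build an alist from SYMBOL VALUE pairs in ARGS, skipping nil values."
  (cl-loop for (symbol value) on args by #'cddr
           when value collect (cons symbol value)))

(my-assoc-sketch 'const t 'pointer nil 'parent "base")
;; => ((const . t) (parent . "base"))

;; A variable tag named "count" of type "int" with default value "0".
(let ((tag (semantic-tag-new-variable "count" "int" "0"
                                      :constant-flag t)))
  (list (semantic-tag-name tag)
        (semantic-tag-class tag)
        (semantic-tag-get-attribute tag :constant-flag)))
;; => ("count" variable t)
@end example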

If the declaration @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.
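
The difference between evaluating and splicing is the same one that
ordinary Emacs Lisp backquote makes. The sketch below uses plain
backquote outside of any grammar, on the assumption that backquote mode
follows the same conventions; the variables @code{a} and @code{b} are
stand-ins for matched values.

@example
(let ((a '("A"))
      (b '("B")))
  (list `(,a ,b)        ; evaluate: each value stays a nested list
        `(,@@a ,@@b)      ; splice: elements are merged into the list
        `(a ,@@b)))      ; unquoted `a' stays a literal symbol
;; => ((("A") ("B")) ("A" "B") (a "B"))
@end example

The last form shows why this mode is useful: a name written without a
comma is kept as a symbol rather than being replaced by a value.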

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include doclicense.texi

@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c LocalWords: bovinator inlined