\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@documentencoding UTF-8

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999--2004, 2012--2014 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below. A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine). Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser. It is well suited to simple
languages and offers many conveniences that make grammar writing
easy; those same conveniences make it less powerful than a Bison-like
@acronym{LALR} parser. For more information, @inforef{Top, The Wisent Parser Manual,
wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension. When compiled, the contents are converted into a file
named @file{NAME-by.el}, which in turn is byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules:: The starting rules for the grammar.
* Bovine Grammar Rules:: Rules used to parse a language.
* Optional Lambda Expression:: Actions to take when a rule is matched.
* Bovine Examples:: Simple Samples.
* GNU Free Documentation License:: The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol. In @semantic{}, one or more nonterminals can be designated as
``start'' symbols. They are listed after the @code{%start} keyword,
separated by spaces. @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal defined in the grammar is used. Internally, the first
start nonterminal is targeted by the reserved symbol
@code{bovine-toplevel}, so it can be found by the parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code. The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context; @inforef{scopestart Decl, ,grammar-fw}. Internally, the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.
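
The selection rules just described can be modeled in a few lines. The
following Python sketch is only an illustration, not @semantic{}'s
Emacs Lisp implementation: the grammar's declarations are assumed to be
available as plain lists, and the function
@code{resolve_reserved_targets} and the nonterminal names are
hypothetical.

@example
```python
# Illustrative model only: how the reserved symbols bovine-toplevel and
# bovine-inner-scope could be resolved from a grammar's declarations.
# All names here are hypothetical.

def resolve_reserved_targets(start_decl, scopestart_decl, nonterminals):
    """Return a list of (reserved-symbol, nonterminal) pairs.

    start_decl:      names listed after %start, in order (may be empty)
    scopestart_decl: the scopestart nonterminal, or None
    nonterminals:    all nonterminals, in the order they are defined
    """
    if start_decl:
        toplevel = start_decl[0]      # first start nonterminal
    else:
        toplevel = nonterminals[0]    # no %start: the very first one
    pairs = [("bovine-toplevel", toplevel)]
    if scopestart_decl is not None:
        pairs.append(("bovine-inner-scope", scopestart_decl))
    return pairs


# A grammar with no %start declaration but with a scopestart one.
print(resolve_reserved_targets([], "codeblock",
                               ["declaration", "function", "codeblock"]))
```
@end example

Here no @code{%start} is given, so @code{bovine-toplevel} falls back to
@code{declaration}, the first nonterminal defined.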

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file. Once the setup is done in the prologue, you can start writing
rules. @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized by your
grammar. @var{components} is a list of elements that must be matched
for @var{result} to be produced. @var{optional-semantic-action} is an
optional sequence of simplified Emacs Lisp expressions used to build
the parse tree.

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all of the @var{components}' elements have been matched, they are
@dfn{reduced} to @var{result}. @xref{Algorithm,,, bison, The GNU Bison Manual}.

A particular @var{result} written into your grammar becomes
the parser's goal. It is designated by a @code{%start} statement
(@pxref{Starting Rules}). The value returned by the associated
@var{optional-semantic-action} is the parser's result. It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols. A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

means that @code{FOO} is a reserved language keyword, matched as such
by looking it up in a keyword table; @inforef{keyword Decl,
,grammar-fw}. This is because the text @code{"foo"} is converted to
the token @code{FOO} during the lexical analysis stage. Thus the text
@code{"foo"} can only be matched as the keyword @code{FOO}, never in
any other way, for example as an ordinary @code{symbol}.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but the
match is not made at the lexical level, so the text @code{"foo"} can
still be used in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token. To match, a
@code{symbol} must first be encountered, and then its text must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the @code{%token}'s match-value string, is a regular expression!
@end table
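
Because the check is a @code{string-match}, the match value succeeds if
it matches anywhere in the token's text, unless the regular expression
is anchored. The following Python sketch models that two-step check; it
assumes a token is a (class, text) pair, uses @code{re.search}, which
like @code{string-match} is unanchored, and the function
@code{token_matches} is hypothetical rather than part of the bovinator.

@example
```python
import re

# Python model (an assumption, not Semantic's code) of how a symbol-type
# token declared as  %token <symbol> FOO "foo"  is checked: the token's
# class must be "symbol", then its text must match the regexp.  Like
# Emacs string-match, re.search succeeds if the regexp matches anywhere.

def token_matches(token, wanted_class, regexp):
    """token is a (class, text) pair produced by the lexer."""
    token_class, text = token
    if token_class != wanted_class:
        return False
    return re.search(regexp, text) is not None


print(token_matches(("symbol", "foo"), "symbol", "foo"))       # True
print(token_matches(("symbol", "foobar"), "symbol", "foo"))    # True: unanchored
print(token_matches(("symbol", "foobar"), "symbol", "^foo$"))  # False: anchored
print(token_matches(("string", "foo"), "symbol", "foo"))       # False: wrong class
```
@end example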

Non-symbol tokens are also allowed. For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.
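
The brackets in @code{"[.]"} matter because the match value is a
regular expression, and a bare @code{"."} matches any single character.
This small Python check illustrates the difference; Python's @code{re}
module treats these two patterns the same way Emacs regular expressions
do, and the helper @code{matches} is hypothetical.

@example
```python
import re

# The match value of a token is a regular expression, so an unescaped
# "." matches any single character, while "[.]" matches only a period.

def matches(regexp, text):
    """Return True if regexp matches somewhere in text (unanchored)."""
    return re.search(regexp, text) is not None

for text in [".", ",", ";"]:
    print(repr(text), "bare:", matches(".", text),
          "bracketed:", matches("[.]", text))
```
@end example

With a bare @code{"."}, the @code{filename} rule would also accept
@samp{foo,h} or @samp{foo;h}.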

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it substitutes every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose" ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of
a rule is made up of symbols and strings. A string in the mix means
that the text of the previous symbol must also match that string,
which is a regular expression (@pxref{How Lexical Tokens Match}).
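
The inlining step can be pictured as a simple substitution over each
rule's components. This Python sketch is a model only, not the actual
grammar-to-lisp converter; the table layout and the function
@code{expand_components} are hypothetical, and each match value is kept
as a plain string placed right after its token type.

@example
```python
# Python model (not the real grammar-to-lisp converter) of the inlining
# step: each token name in a rule is replaced by its declared type and
# match value.  expand_components is a hypothetical name.

TOKEN_DECLS = [
    # (token-name, type, match-value), as in: %token <symbol> MOOSE "moose"
    ("MOOSE", "symbol", "moose"),
    ("PERIOD", "punctuation", "[.]"),
]

def expand_components(components, token_decls):
    table = dict((name, (typ, value)) for name, typ, value in token_decls)
    expanded = []
    for item in components:
        if item in table:
            typ, value = table[item]
            expanded.extend([typ, value])   # token type, then its regexp
        else:
            expanded.append(item)           # nonterminal or raw token class
    return expanded


print(expand_components(["MOOSE"], TOKEN_DECLS))
print(expand_components(["symbol", "PERIOD", "symbol"], TOKEN_DECLS))
```
@end example

The first call yields @code{['symbol', 'moose']}, the Python analog of
the pseudo-rule @code{symbol "moose"} shown above.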

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a nonterminal has several alternative lists of components, their
order is important. For example,

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form. If they were in reverse order, the long form would
never be tested: the short form would already match the leading
@samp{foo} of @samp{foo.h}, and the parser uses the first alternative
that matches.
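
The effect of ordering can be reproduced with a tiny ordered-choice
matcher. This Python sketch assumes, as described above, that
alternatives are tried top to bottom and the first one that matches is
kept; tokens are modeled as (class, text) pairs written out by hand, and
@code{match_sequence} and @code{match_alternatives} are hypothetical
helpers, not bovinator functions.

@example
```python
import re

# Ordered alternatives: try each pattern in turn, keep the first match.
# A pattern item is (token-class, regexp-or-None); a token is
# (class, text).  Returns (alternative index, tokens consumed).

def match_sequence(pattern, tokens):
    """Match pattern against the token prefix; return count or None."""
    if len(pattern) > len(tokens):
        return None
    for (want_class, regexp), (tok_class, text) in zip(pattern, tokens):
        if want_class != tok_class:
            return None
        if regexp is not None and re.search(regexp, text) is None:
            return None
    return len(pattern)

def match_alternatives(alternatives, tokens):
    for index, pattern in enumerate(alternatives):
        consumed = match_sequence(pattern, tokens)
        if consumed is not None:
            return index, consumed      # first match wins
    return None

SYMBOL = ("symbol", None)
PERIOD = ("punctuation", "[.]")
long_form = [SYMBOL, PERIOD, SYMBOL]
short_form = [SYMBOL]

tokens = [("symbol", "foo"), ("punctuation", "."), ("symbol", "h")]
print(match_alternatives([long_form, short_form], tokens))  # (0, 3)
print(match_alternatives([short_form, long_form], tokens))  # (0, 1)
```
@end example

In the second call the short form consumes only @samp{foo}, and the
long form is never tried.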

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda. This lambda has special shortcuts to simplify
reading the semantic action definition. An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda whose return value consists entirely of the string
or object found by matching the first (zeroth) element of the match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

calls @code{foo} on the first matched element, and then splices its
return value into the return list, whereas:

@example
( (foo $1) )
@end example

calls @code{foo} and places its return value in the return list as a
single element.
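
The difference is that between concatenating a list's items into the
result and adding the list as one element. Modeling Lisp lists as
Python lists, the following sketch shows both; @code{foo} is a
hypothetical function returning a list, and @code{build_result} is an
illustrative helper, not part of the bovinator.

@example
```python
# Python model of ( ,(foo $1) ) versus ( (foo $1) ).  Splicing merges
# the returned list's items into the result; a plain call adds the
# returned list as a single element.  foo is hypothetical.

def foo(x):
    return [x, x.upper()]

def build_result(parts):
    """parts is a list of ("splice", value) or ("insert", value) pairs."""
    result = []
    for mode, value in parts:
        if mode == "splice":
            result.extend(value)
        else:
            result.append(value)
    return result

dollar1 = "a"                                    # the first matched object
print(build_result([("splice", foo(dollar1))]))  # ['a', 'A']
print(build_result([("insert", foo(dollar1))]))  # [['a', 'A']]
```
@end example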

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
nonterminal).

@item '$1
The first object matched, placed in a list. I.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo}, whose result is placed in the return list.

@item ,(foo)
A function call to @code{foo}, whose result is spliced into the return list.

@item '(foo)
A function call to @code{foo}, whose result is placed in the return
list wrapped in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above). The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list. The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches, the same way the
parser iterates over the starting rule (@pxref{Starting Rules}). This
lets you write much simpler rules in this specific case, and also
gives you positional information in the returned tokens, as well as
error skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function} or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @code{:constant-flag t :parent
"parenttype"}.
@end table

@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag named @var{name} of class @code{variable}, @code{function},
@code{type}, @code{include}, @code{package}, or @code{code},
respectively. @inforef{Creating Tags, , semantic-appdev}, for the
Lisp functions these translate into.
@end table
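
The filtering rule of @code{ASSOC} can be sketched in Python as
follows. This is a model, not the Lisp implementation: Lisp @code{nil}
is represented by @code{None} (the sketch does not model the empty list,
which is also @code{nil} in Lisp), an alist entry by a (symbol, value)
tuple, and the symbol names in the usage line are hypothetical.

@example
```python
# Python model of (ASSOC sym1 val1 sym2 val2 ...): pairs whose value is
# nil (modeled as None) are dropped; the rest form an association list.

def assoc(*items):
    """assoc(sym1, val1, sym2, val2, ...) -> list of (sym, val) pairs."""
    if len(items) % 2 != 0:
        raise ValueError("ASSOC needs symbol/value pairs")
    pairs = []
    for i in range(0, len(items), 2):
        symbol, value = items[i], items[i + 1]
        if value is not None:          # only non-nil values are kept
            pairs.append((symbol, value))
    return pairs

print(assoc("const", True, "typemodifiers", None, "parent", "Base"))
```
@end example

The @code{typemodifiers} entry is omitted because its value is
@code{None}, mirroring a @code{nil} value in @code{ASSOC}.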

If the declaration @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"} and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example
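
The two results above can be reproduced with a short Python model of
the @code{assign} rule. The sketch assumes the lexer has already
produced the tokens for @samp{A=B} (written out by hand) and represents
each nonterminal's value as a Python list; the function names mirror
the grammar but are otherwise hypothetical.

@example
```python
# End-to-end Python model of the assign example (not Semantic code).
# any-symbol returns a one-element list, so $1 adds a nested list,
# while ,$1 splices that list's items into the result.

tokens = [("symbol", "A"), ("punctuation", "="), ("symbol", "B")]

def any_symbol(tok):
    """any-symbol: symbol ( $1 )  ->  a one-element list, or None."""
    tok_class, text = tok
    return [text] if tok_class == "symbol" else None

def assign(toks, splice):
    """assign: any-symbol EQUAL any-symbol"""
    if len(toks) != 3 or toks[1] != ("punctuation", "="):
        return None
    d1, d3 = any_symbol(toks[0]), any_symbol(toks[2])
    if d1 is None or d3 is None:
        return None
    if splice:
        return d1 + d3        # ( ,$1 ,$3 )
    return [d1, d3]           # ( $1 $3 )

print(assign(tokens, splice=False))  # [['A'], ['B']]
print(assign(tokens, splice=True))   # ['A', 'B']
```
@end example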

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include doclicense.texi

@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c LocalWords: bovinator inlined