Specify info encoding and language.
[bpt/emacs.git] / doc / misc / wisent.texi
CommitLineData
9e7abd17
EL
1\input texinfo @c -*-texinfo-*-
2@c %**start of header
bfe6ffb6 3@setfilename ../../info/wisent
9e7abd17
EL
4@set TITLE Wisent Parser Development
5@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
6@settitle @value{TITLE}
7
8@c *************************************************************************
9@c @ Header
10@c *************************************************************************
11
12@c Merge all indexes into a single index for now.
13@c We can always separate them later into two or more as needed.
14@syncodeindex vr cp
15@syncodeindex fn cp
16@syncodeindex ky cp
17@syncodeindex pg cp
18@syncodeindex tp cp
19
20@c @footnotestyle separate
21@c @paragraphindent 2
22@c @@smallbook
dcd812be
PE
23@documentencoding UTF-8
24@documentlanguage en
9e7abd17
EL
25@c %**end of header
26
27@copying
f0ddbf7b 28Copyright @copyright{} 1988--1993, 1995, 1998--2004, 2007, 2012--2013
bfe6ffb6 29Free Software Foundation, Inc.
9e7abd17 30
bfe6ffb6
GM
31@c Since we are both GNU manuals, we do not need to ack each other here.
32@ignore
9e7abd17
EL
33Some texts are borrowed or adapted from the manual of Bison version
341.35. The text in section entitled ``Understanding the automaton'' is
35adapted from the section ``Understanding Your Parser'' in the manual
36of Bison version 1.49.
bfe6ffb6 37@end ignore
9e7abd17
EL
38
39@quotation
40Permission is granted to copy, distribute and/or modify this document
bfe6ffb6
GM
41under the terms of the GNU Free Documentation License, Version 1.3 or
42any later version published by the Free Software Foundation; with no
43Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
44and with the Back-Cover Texts as in (a) below. A copy of the license
45is included in the section entitled ``GNU Free Documentation License''.
46
47(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
6bf430d1 48modify this GNU manual.''
9e7abd17
EL
49@end quotation
50@end copying
51
bfe6ffb6 52@dircategory Emacs misc features
9e7abd17 53@direntry
bfe6ffb6 54* Wisent: (wisent). Semantic Wisent parser development.
9e7abd17 55@end direntry
9e7abd17
EL
56
57@iftex
58@finalout
59@end iftex
60
61@c @setchapternewpage odd
62@c @setchapternewpage off
63
9e7abd17
EL
64@titlepage
65@sp 10
66@title @value{TITLE}
67@author by @value{AUTHOR}
9e7abd17
EL
68@page
69@vskip 0pt plus 1 fill
70@insertcopying
71@end titlepage
72@page
73
bfe6ffb6
GM
74@macro semantic{}
75@i{Semantic}
76@end macro
9e7abd17
EL
77
78@c *************************************************************************
79@c @ Document
80@c *************************************************************************
81@contents
82
83@node top
84@top @value{TITLE}
85
86Wisent (the European Bison ;-) is an Emacs Lisp implementation of the
87GNU Compiler Compiler Bison.
88
89This manual describes how to use Wisent to develop grammars for
90programming languages, and how to use grammars to parse language
91source in Emacs buffers.
92
93It also describes how Wisent is used with the @semantic{} tool set
94described in the @ref{Top, Semantic Manual, Semantic Manual, semantic}.
95
bfe6ffb6
GM
96@ifnottex
97@insertcopying
98@end ifnottex
99
9e7abd17 100@menu
bfe6ffb6
GM
101* Wisent Overview::
102* Wisent Grammar::
103* Wisent Parsing::
104* Wisent Semantic::
105* GNU Free Documentation License::
106* Index::
9e7abd17
EL
107@end menu
108
109@node Wisent Overview
110@chapter Wisent Overview
111
112@dfn{Wisent} (the European Bison) is an implementation in Emacs Lisp
113of the GNU Compiler Compiler Bison. Its code is a port of the C code
114of GNU Bison 1.28 & 1.31.
115
116For more details on the basic concepts for understanding Wisent, it is
2ecfdaa6 117worthwhile to read the @ref{Top, Bison Manual, , bison}.
9e7abd17
EL
118
119Wisent can generate compilers compatible with the @semantic{} tool set.
120See the @ref{Top, Semantic Manual, , semantic}.
121
122It benefits from these Bison features:
123
124@itemize @bullet
bfe6ffb6 125@item
9e7abd17
EL
126It uses a fast but not so space-efficient encoding for the parse
127tables, described in Corbett's PhD thesis from Berkeley:
128@quotation
129@cite{Static Semantics in Compiler Error Recovery}@*
130June 1985, Report No. UCB/CSD 85/251.
131@end quotation
132
bfe6ffb6 133@item
9e7abd17 134For generating the lookahead sets, Wisent uses the well-known
cccaebd2 135technique of F. DeRemer and A. Pennello described in:
9e7abd17 136@quotation
cccaebd2
PE
137@cite{Efficient Computation of LALR(1) Look-Ahead Sets}@*
138October 1982, ACM TOPLAS Vol 4 No 4, 615--49,
139@uref{http://dx.doi.org/10.1145/69622.357187}.
9e7abd17
EL
140@end quotation
141
bfe6ffb6 142@item
9e7abd17
EL
143Wisent resolves shift/reduce conflicts using operator precedence and
144associativity.
145
bfe6ffb6 146@item
9e7abd17
EL
147Parser error recovery is accomplished using rules which match the
148special token @code{error}.
149@end itemize
150
151Nevertheless there are some fundamental differences between Bison and
152Wisent.
153
154@itemize
155@item
156Wisent is intended to be used in Emacs. It reads and produces Emacs
157Lisp data structures. All the additional code used in grammars is
158Emacs Lisp code.
159
160@item
161Contrary to Bison, Wisent does not generate a parser which combines
162Emacs Lisp code and grammar constructs. They exist separately.
163Wisent reads the grammar from a Lisp data structure and then generates
164grammar constructs as tables. Afterward, the derived tables can be
165included and byte-compiled in separate Emacs Lisp files, and be used
166at a later time by the Wisent's parser engine.
167
168@item
169Wisent allows multiple start nonterminals and allows a call to the
170parsing function to be made for a particular start nonterminal. For
171example, this is particularly useful to parse a region of an Emacs
172buffer. @semantic{} heavily depends on the availability of this feature.
173@end itemize
174
175@node Wisent Grammar
176@chapter Wisent Grammar
177
178@cindex context-free grammar
179@cindex rule
180In order for Wisent to parse a language, it must be described by a
181@dfn{context-free grammar}. That is a grammar specified as rules that
182can be applied regardless of context. For more information, see
183@ref{Language and Grammar, , , bison}, in the Bison manual.
184
185@cindex terminal
186@cindex nonterminal
187The formal grammar is formulated using @dfn{terminal} and
188@dfn{nonterminal} items. Terminals can be Emacs Lisp symbols or
189characters, and nonterminals are symbols only.
190
191@cindex token
192Terminals (also known as @dfn{tokens}) represent the lexical
193elements of the language like numbers, strings, etc..
194
195For example @samp{PLUS} can represent the operator @samp{+}.
196
197Nonterminal symbols are described by rules:
198
199@example
200@group
201RESULT @equiv{} COMPONENTS@dots{}
202@end group
203@end example
204
205@samp{RESULT} is a nonterminal that this rule describes and
206@samp{COMPONENTS} are various terminals and nonterminals that are put
207together by this rule.
208
209For example, this rule:
210
211@example
212@group
213exp @equiv{} exp PLUS exp
214@end group
215@end example
216
217Says that two groupings of type @samp{exp}, with a @samp{PLUS} token
218in between, can be combined into a larger grouping of type @samp{exp}.
bfe6ffb6 219
9e7abd17 220@menu
bfe6ffb6
GM
221* Grammar format::
222* Example::
223* Compiling a grammar::
224* Conflicts::
9e7abd17
EL
225@end menu
226
faeea569 227@node Grammar format
9e7abd17
EL
228@section Grammar format
229
230@cindex grammar format
231To be acceptable by Wisent a context-free grammar must respect a
232particular format. That is, must be represented as an Emacs Lisp list
233of the form:
234
235@code{(@var{terminals} @var{assocs} . @var{non-terminals})}
236
237@table @var
238@item terminals
239Is the list of terminal symbols used in the grammar.
240
241@cindex associativity
242@item assocs
243Specify the associativity of @var{terminals}. It is @code{nil} when
244there is no associativity defined, or an alist of
245@w{@code{(@var{assoc-type} . @var{assoc-value})}} elements.
246
247@var{assoc-type} must be one of the @code{default-prec},
248@code{nonassoc}, @code{left} or @code{right} symbols. When
249@var{assoc-type} is @code{default-prec}, @var{assoc-value} must be
250@code{nil} or @code{t} (the default). Otherwise it is a list of
251tokens which must have been previously declared in @var{terminals}.
252
253For details, see @ref{Contextual Precedence, , , bison}, in the
254Bison manual.
255
256@item non-terminals
257Is the list of nonterminal definitions. Each definition has the form:
258
259@code{(@var{nonterm} . @var{rules})}
260
261Where @var{nonterm} is the nonterminal symbol defined and
262@var{rules} the list of rules that describe this nonterminal. Each
263rule is a list:
264
265@code{(@var{components} [@var{precedence}] [@var{action}])}
266
267Where:
268
269@table @var
270@item components
271Is a list of various terminals and nonterminals that are put together
272by this rule.
273
274For example,
275
276@example
277@group
278(exp ((exp ?+ exp)) ;; exp: exp '+' exp
279 ) ;; ;
280@end group
281@end example
282
283Says that two groupings of type @samp{exp}, with a @samp{+} token in
284between, can be combined into a larger grouping of type @samp{exp}.
cccaebd2 285
9e7abd17
EL
286@cindex grammar coding conventions
287By convention, a nonterminal symbol should be in lower case, such as
288@samp{exp}, @samp{stmt} or @samp{declaration}. Terminal symbols
289should be upper case to distinguish them from nonterminals: for
290example, @samp{INTEGER}, @samp{IDENTIFIER}, @samp{IF} or
291@samp{RETURN}. A terminal symbol that represents a particular keyword
292in the language is conventionally the same as that keyword converted
293to upper case. The terminal symbol @code{error} is reserved for error
294recovery.
295
296@cindex middle-rule actions
297Scattered among the components can be @dfn{middle-rule} actions.
298Usually only @var{action} is provided (@pxref{action}).
299
300If @var{components} in a rule is @code{nil}, it means that the rule
301can match the empty string. For example, here is how to define a
302comma-separated sequence of zero or more @samp{exp} groupings:
303
2ecfdaa6 304@smallexample
9e7abd17
EL
305@group
306(expseq (nil) ;; expseq: ;; empty
307 ((expseq1)) ;; | expseq1
308 ) ;; ;
309
310(expseq1 ((exp)) ;; expseq1: exp
311 ((expseq1 ?, exp)) ;; | expseq1 ',' exp
312 ) ;; ;
313@end group
2ecfdaa6 314@end smallexample
9e7abd17
EL
315
316@cindex precedence level
317@item precedence
318Assign the rule the precedence of the given terminal item, overriding
319the precedence that would be deduced for it, that is the one of the
320last terminal in it. Notice that only terminals declared in
321@var{assocs} have a precedence level. The altered rule precedence
322then affects how conflicts involving that rule are resolved.
323
324@var{precedence} is an optional vector of one terminal item.
325
326Here is how @var{precedence} solves the problem of unary minus.
327First, declare a precedence for a fictitious terminal symbol named
328@code{UMINUS}. There are no tokens of this type, but the symbol
329serves to stand for its precedence:
330
331@example
332@dots{}
333((default-prec t) ;; This is the default
334 (left '+' '-')
335 (left '*')
336 (left UMINUS))
337@end example
338
339Now the precedence of @code{UMINUS} can be used in specific rules:
340
2ecfdaa6 341@smallexample
9e7abd17
EL
342@group
343(exp @dots{} ;; exp: @dots{}
344 ((exp ?- exp)) ;; | exp '-' exp
345 @dots{} ;; @dots{}
346 ((?- exp) [UMINUS]) ;; | '-' exp %prec UMINUS
347 @dots{} ;; @dots{}
348 ) ;; ;
349@end group
2ecfdaa6 350@end smallexample
9e7abd17
EL
351
352If you forget to append @code{[UMINUS]} to the rule for unary minus,
353Wisent silently assumes that minus has its usual precedence. This
354kind of problem can be tricky to debug, since one typically discovers
355the mistake only by testing the code.
356
357Using @code{(default-prec nil)} declaration makes it easier to
358discover this kind of problem systematically. It causes rules that
359lack a @var{precedence} modifier to have no precedence, even if the
360last terminal symbol mentioned in their components has a declared
361precedence.
362
363If @code{(default-prec nil)} is in effect, you must specify
364@var{precedence} for all rules that participate in precedence conflict
365resolution. Then you will see any shift/reduce conflict until you
366tell Wisent how to resolve it, either by changing your grammar or by
367adding an explicit precedence. This will probably add declarations to
368the grammar, but it helps to protect against incorrect rule
369precedences.
370
371The effect of @code{(default-prec nil)} can be reversed by giving
372@code{(default-prec t)}, which is the default.
373
374For more details, see @ref{Contextual Precedence, , , bison}, in the
375Bison manual.
376
377It is important to understand that @var{assocs} declarations defines
378associativity but also assign a precedence level to terminals. All
379terminals declared in the same @code{left}, @code{right} or
380@code{nonassoc} association get the same precedence level. The
381precedence level is increased at each new association.
382
383On the other hand, @var{precedence} explicitly assign the precedence
384level of the given terminal to a rule.
385
386@cindex semantic actions
387@item @anchor{action}action
388An action is an optional Emacs Lisp function call, like this:
389
390@code{(identity $1)}
391
392The result of an action determines the semantic value of a rule.
393
394From an implementation standpoint, the function call will be embedded
395in a lambda expression, and several useful local variables will be
396defined:
397
398@table @code
399@vindex $N
400@item $@var{n}
401Where @var{n} is a positive integer. Like in Bison, the value of
402@code{$@var{n}} is the semantic value of the @var{n}th element of
403@var{components}, starting from 1. It can be of any Lisp data
404type.
405
406@vindex $region@var{n}
407@item $regionN
408Where @var{n} is a positive integer. For each @code{$@var{n}}
409variable defined there is a corresponding @code{$region@var{n}}
410variable. Its value is a pair @code{(@var{start-pos} .
411@var{end-pos})} that represent the start and end positions (in the
412lexical input stream) of the @code{$@var{n}} value. It can be
413@code{nil} when the component positions are not available, like for an
414empty string component for example.
415
416@vindex $region
417@item $region
418Its value is the leftmost and rightmost positions of input data
419matched by all @var{components} in the rule. This is a pair
420@code{(@var{leftmost-pos} . @var{rightmost-pos})}. It can be
421@code{nil} when components positions are not available.
422
423@vindex $nterm
424@item $nterm
425This variable is initialized with the nonterminal symbol
426(@var{nonterm}) the rule belongs to. It could be useful to improve
427error reporting or debugging. It is also used to automatically
428provide incremental re-parse entry points for @semantic{} tags
429(@pxref{Wisent Semantic}).
430
431@vindex $action
432@item $action
433The value of @code{$action} is the symbolic name of the current
434semantic action (@pxref{Debugging actions}).
435@end table
436
437When an action is not specified a default value is supplied, it is
438@code{(identity $1)}. This means that the default semantic value of a
439rule is the value of its first component. Excepted for a rule
440matching the empty string, for which the default action is to return
441@code{nil}.
442@end table
443@end table
444
faeea569 445@node Example
9e7abd17
EL
446@section Example
447
448@cindex grammar example
449Here is an example to parse simple infix arithmetic expressions. See
450@ref{Infix Calc, , , bison}, in the Bison manual for details.
451
452@lisp
453@group
454'(
455 ;; Terminals
456 (NUM)
bfe6ffb6 457
9e7abd17
EL
458 ;; Terminal associativity & precedence
459 ((nonassoc ?=)
460 (left ?- ?+)
461 (left ?* ?/)
462 (left NEG)
463 (right ?^))
bfe6ffb6 464
9e7abd17
EL
465 ;; Rules
466 (input
467 ((line))
468 ((input line)
469 (format "%s %s" $1 $2))
470 )
471
472 (line
473 ((?;)
474 (progn ";"))
475 ((exp ?;)
476 (format "%s;" $1))
477 ((error ?;)
478 (progn "Error;")))
479 )
480
481 (exp
482 ((NUM)
483 (string-to-number $1))
484 ((exp ?= exp)
485 (= $1 $3))
486 ((exp ?+ exp)
487 (+ $1 $3))
488 ((exp ?- exp)
489 (- $1 $3))
490 ((exp ?* exp)
491 (* $1 $3))
492 ((exp ?/ exp)
493 (/ $1 $3))
494 ((?- exp) [NEG]
495 (- $2))
496 ((exp ?^ exp)
497 (expt $1 $3))
498 ((?\( exp ?\))
499 (progn $2))
500 )
501 )
502@end group
503@end lisp
504
505In the bison-like @dfn{WY} format (@pxref{Wisent Semantic}) the
506grammar looks like this:
507
508@example
509@group
510%token <number> NUM
511
512%nonassoc '=' ;; comparison
513%left '-' '+'
514%left '*' '/'
515%left NEG ;; negation--unary minus
516%right '^' ;; exponentiation
517
518%%
519
520input:
521 line
522 | input line
523 (format "%s %s" $1 $2)
524 ;
525
526line:
527 ';'
528 @{";"@}
529 | exp ';'
530 (format "%s;" $1)
531 | error ';'
532 @{"Error;"@}
533 ;
534
535exp:
536 NUM
537 (string-to-number $1)
538 | exp '=' exp
539 (= $1 $3)
540 | exp '+' exp
541 (+ $1 $3)
542 | exp '-' exp
543 (- $1 $3)
544 | exp '*' exp
545 (* $1 $3)
546 | exp '/' exp
547 (/ $1 $3)
548 | '-' exp %prec NEG
549 (- $2)
550 | exp '^' exp
551 (expt $1 $3)
552 | '(' exp ')'
553 @{$2@}
554 ;
555
556%%
557@end group
558@end example
559
faeea569 560@node Compiling a grammar
9e7abd17
EL
561@section Compiling a grammar
562
563@cindex automaton
564After providing a context-free grammar in a suitable format, it must
565be translated into a set of tables (an @dfn{automaton}) that will be
566used to derive the parser. Like Bison, Wisent translates grammars that
567must be @dfn{LALR(1)}.
568
569@cindex LALR(1) grammar
570@cindex look-ahead token
571A grammar is @acronym{LALR(1)} if it is possible to tell how to parse
572any portion of an input string with just a single token of look-ahead:
573the @dfn{look-ahead token}. See @ref{Language and Grammar, , ,
574bison}, in the Bison manual for more information.
575
576@cindex grammar compilation
577Grammar translation (compilation) is achieved by the function:
578
579@cindex compiling a grammar
580@vindex wisent-single-start-flag
581@findex wisent-compile-grammar
582@defun wisent-compile-grammar grammar &optional start-list
583Compile @var{grammar} and return an @acronym{LALR(1)} automaton.
584
585Optional argument @var{start-list} is a list of start symbols
586(nonterminals). If @code{nil} the first nonterminal defined in the
587grammar is the default start symbol. If @var{start-list} contains
588only one element, it defines the start symbol. If @var{start-list}
589contains more than one element, all are defined as potential start
590symbols, unless @code{wisent-single-start-flag} is non-@code{nil}. In
591that case the first element of @var{start-list} defines the start
592symbol and others are ignored.
593
594The @acronym{LALR(1)} automaton is a vector of the form:
595
596@code{[@var{actions gotos starts functions}]}
597
598@table @var
599@item actions
600A state/token matrix telling the parser what to do at every state
601based on the current look-ahead token. That is shift, reduce, accept
602or error. See also @ref{Wisent Parsing}.
603
604@item gotos
605A state/nonterminal matrix telling the parser the next state to go to
606after reducing with each rule.
607
608@item starts
609An alist which maps the allowed start symbols (nonterminals) to
610lexical tokens that will be first shifted into the parser stack.
611
612@item functions
613An obarray of semantic action symbols. A semantic action is actually
614an Emacs Lisp function (lambda expression).
615@end table
616@end defun
617
faeea569 618@node Conflicts
9e7abd17
EL
619@section Conflicts
620
621Normally, a grammar should produce an automaton where at each state
622the parser has only one action to do (@pxref{Wisent Parsing}).
623
624@cindex ambiguous grammar
625In certain cases, a grammar can produce an automaton where, at some
626states, there are more than one action possible. Such a grammar is
627@dfn{ambiguous}, and generates @dfn{conflicts}.
628
629@cindex deterministic automaton
630The parser can't be driven by an automaton which isn't completely
631@dfn{deterministic}, that is which contains conflicts. It is
632necessary to resolve the conflicts to eliminate them. Wisent resolves
633conflicts like Bison does.
634
635@cindex grammar conflicts
636@cindex conflicts resolution
637There are two sorts of conflicts:
638
639@table @dfn
640@cindex shift/reduce conflicts
641@item shift/reduce conflicts
642When either a shift or a reduction would be valid at the same state.
643
644Such conflicts are resolved by choosing to shift, unless otherwise
645directed by operator precedence declarations.
646See @ref{Shift/Reduce , , , bison}, in the Bison manual for more
647information.
648
649@cindex reduce/reduce conflicts
650@item reduce/reduce conflicts
651That occurs if there are two or more rules that apply to the same
652sequence of input. This usually indicates a serious error in the
653grammar.
654
655Such conflicts are resolved by choosing to use the rule that appears
656first in the grammar, but it is very risky to rely on this. Every
657reduce/reduce conflict must be studied and usually eliminated. See
658@ref{Reduce/Reduce , , , bison}, in the Bison manual for more
659information.
660@end table
661
662@menu
bfe6ffb6
GM
663* Grammar Debugging::
664* Understanding the automaton::
9e7abd17
EL
665@end menu
666
667@node Grammar Debugging
668@subsection Grammar debugging
669
670@cindex grammar debugging
671@cindex grammar verbose description
672To help writing a new grammar, @code{wisent-compile-grammar} can
673produce a verbose report containing a detailed description of the
674grammar and parser (equivalent to what Bison reports with the
675@option{--verbose} option).
676
677To enable the verbose report you can set to non-@code{nil} the
678variable:
679
680@vindex wisent-verbose-flag
681@deffn Option wisent-verbose-flag
682non-@code{nil} means to report verbose information on generated parser.
683@end deffn
684
685Or interactively use the command:
686
687@findex wisent-toggle-verbose-flag
688@deffn Command wisent-toggle-verbose-flag
689Toggle whether to report verbose information on generated parser.
690@end deffn
691
692The verbose report is printed in the temporary buffer
693@code{*wisent-log*} when running interactively, or in file
694@file{wisent.output} when running in batch mode. Different
695reports are separated from each other by a line like this:
696
697@example
698@group
699*** Wisent @var{source-file} - 2002-06-27 17:33
700@end group
701@end example
702
703where @var{source-file} is the name of the Emacs Lisp file from which
704the grammar was read. See @ref{Understanding the automaton}, for
705details on the verbose report.
706
707@table @strong
708@item Please Note
709To help debugging the grammar compiler itself, you can set this
710variable to print the content of some internal data structures:
711
712@vindex wisent-debug-flag
713@defvar wisent-debug-flag
714non-@code{nil} means enable some debug stuff.
715@end defvar
716@end table
717
718@node Understanding the automaton
719@subsection Understanding the automaton
720
721@cindex understanding the automaton
722This section (took from the manual of Bison 1.49) describes how to use
723the verbose report printed by @code{wisent-compile-grammar} to
724understand the generated automaton, to tune or fix a grammar.
725
726We will use the following example:
727
728@example
729@group
730(let ((wisent-verbose-flag t)) ;; Print a verbose report!
731 (wisent-compile-grammar
732 '((NUM STR) ; %token NUM STR
733
bfe6ffb6 734 ((left ?+ ?-) ; %left '+' '-';
9e7abd17
EL
735 (left ?*)) ; %left '*'
736
737 (exp ; exp:
738 ((exp ?+ exp)) ; exp '+' exp
739 ((exp ?- exp)) ; | exp '-' exp
740 ((exp ?* exp)) ; | exp '*' exp
741 ((exp ?/ exp)) ; | exp '/' exp
742 ((NUM)) ; | NUM
743 ) ; ;
744
745 (useless ; useless:
746 ((STR)) ; STR
747 ) ; ;
748 )
749 'nil) ; no %start declarations
750 )
751@end group
752@end example
753
754When evaluating the above expression, grammar compilation first issues
755the following two clear messages:
756
757@example
758@group
759Grammar contains 1 useless nonterminals and 1 useless rules
760Grammar contains 7 shift/reduce conflicts
761@end group
762@end example
763
764The @samp{*wisent-log*} buffer details things!
765
766The first section reports conflicts that were solved using precedence
767and/or associativity:
768
769@example
770@group
771Conflict in state 7 between rule 1 and token '+' resolved as reduce.
772Conflict in state 7 between rule 1 and token '-' resolved as reduce.
773Conflict in state 7 between rule 1 and token '*' resolved as shift.
774Conflict in state 8 between rule 2 and token '+' resolved as reduce.
775Conflict in state 8 between rule 2 and token '-' resolved as reduce.
776Conflict in state 8 between rule 2 and token '*' resolved as shift.
777Conflict in state 9 between rule 3 and token '+' resolved as reduce.
778Conflict in state 9 between rule 3 and token '-' resolved as reduce.
779Conflict in state 9 between rule 3 and token '*' resolved as reduce.
780@end group
781@end example
782
783The next section reports useless tokens, nonterminal and rules (note
784that useless tokens might be used by the scanner):
785
786@example
787@group
788Useless nonterminals:
789
790 useless
791
792
793Terminals which are not used:
794
795 STR
796
797
798Useless rules:
799
800#6 useless: STR;
801@end group
802@end example
803
804The next section lists states that still have conflicts:
805
806@example
807@group
808State 7 contains 1 shift/reduce conflict.
809State 8 contains 1 shift/reduce conflict.
810State 9 contains 1 shift/reduce conflict.
811State 10 contains 4 shift/reduce conflicts.
812@end group
813@end example
814
815The next section reproduces the grammar used:
816
817@example
818@group
819Grammar
820
821 Number, Rule
822 1 exp -> exp '+' exp
823 2 exp -> exp '-' exp
824 3 exp -> exp '*' exp
825 4 exp -> exp '/' exp
826 5 exp -> NUM
827@end group
828@end example
829
830And reports the uses of the symbols:
831
832@example
833@group
834Terminals, with rules where they appear
835
836$EOI (-1)
837error (1)
838NUM (2) 5
839STR (3) 6
840'+' (4) 1
841'-' (5) 2
842'*' (6) 3
843'/' (7) 4
844
845
846Nonterminals, with rules where they appear
847
848exp (8)
849 on left: 1 2 3 4 5, on right: 1 2 3 4
850@end group
851@end example
852
853The report then details the automaton itself, describing each state
854with it set of @dfn{items}, also known as @dfn{pointed rules}. Each
855item is a production rule together with a point (marked by @samp{.})
856that the input cursor.
857
858@example
859@group
860state 0
861
862 NUM shift, and go to state 1
863
864 exp go to state 2
865@end group
866@end example
867
868State 0 corresponds to being at the very beginning of the parsing, in
869the initial rule, right before the start symbol (@samp{exp}). When
870the parser returns to this state right after having reduced a rule
871that produced an @samp{exp}, it jumps to state 2. If there is no such
872transition on a nonterminal symbol, and the lookahead is a @samp{NUM},
873then this token is shifted on the parse stack, and the control flow
874jumps to state 1. Any other lookahead triggers a parse error.
875
876In the state 1...
877
878@example
879@group
880state 1
881
882 exp -> NUM . (rule 5)
883
884 $default reduce using rule 5 (exp)
885@end group
886@end example
887
888the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead
889(@samp{$default}), the parser will reduce it. If it was coming from
890state 0, then, after this reduction it will return to state 0, and
891will jump to state 2 (@samp{exp: go to state 2}).
892
893@example
894@group
895state 2
896
897 exp -> exp . '+' exp (rule 1)
898 exp -> exp . '-' exp (rule 2)
899 exp -> exp . '*' exp (rule 3)
900 exp -> exp . '/' exp (rule 4)
901
902 $EOI shift, and go to state 11
903 '+' shift, and go to state 3
904 '-' shift, and go to state 4
905 '*' shift, and go to state 5
906 '/' shift, and go to state 6
907@end group
908@end example
909
910In state 2, the automaton can only shift a symbol. For instance,
911because of the item @samp{exp -> exp . '+' exp}, if the lookahead if
912@samp{+}, it will be shifted on the parse stack, and the automaton
913control will jump to state 3, corresponding to the item
914@samp{exp -> exp . '+' exp}:
915
916@example
917@group
918state 3
919
920 exp -> exp '+' . exp (rule 1)
921
922 NUM shift, and go to state 1
923
924 exp go to state 7
925@end group
926@end example
927
928Since there is no default action, any other token than those listed
929above will trigger a parse error.
930
931The interpretation of states 4 to 6 is straightforward:
932
933@example
934@group
935state 4
936
937 exp -> exp '-' . exp (rule 2)
938
939 NUM shift, and go to state 1
940
941 exp go to state 8
942
943
944
945state 5
946
947 exp -> exp '*' . exp (rule 3)
948
949 NUM shift, and go to state 1
950
951 exp go to state 9
952
953
954
955state 6
956
957 exp -> exp '/' . exp (rule 4)
958
959 NUM shift, and go to state 1
960
961 exp go to state 10
962@end group
963@end example
964
965As was announced in beginning of the report, @samp{State 7 contains 1
966shift/reduce conflict.}:
967
968@example
969@group
970state 7
971
972 exp -> exp . '+' exp (rule 1)
973 exp -> exp '+' exp . (rule 1)
974 exp -> exp . '-' exp (rule 2)
975 exp -> exp . '*' exp (rule 3)
976 exp -> exp . '/' exp (rule 4)
977
978 '*' shift, and go to state 5
979 '/' shift, and go to state 6
980
981 '/' [reduce using rule 1 (exp)]
982 $default reduce using rule 1 (exp)
983@end group
984@end example
985
986Indeed, there are two actions associated to the lookahead @samp{/}:
987either shifting (and going to state 6), or reducing rule 1. The
988conflict means that either the grammar is ambiguous, or the parser
989lacks information to make the right decision. Indeed the grammar is
990ambiguous, as, since we did not specify the precedence of @samp{/},
991the sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM
992/ NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM +
993NUM) / NUM}, which corresponds to reducing rule 1.
994
995Because in @acronym{LALR(1)} parsing a single decision can be made,
996Wisent arbitrarily chose to disable the reduction, see
997@ref{Conflicts}. Discarded actions are reported in between square
998brackets.
999
1000Note that all the previous states had a single possible action: either
1001shifting the next token and going to the corresponding state, or
1002reducing a single rule. In the other cases, i.e., when shifting
1003@emph{and} reducing is possible or when @emph{several} reductions are
1004possible, the lookahead is required to select the action. State 7 is
1005one such state: if the lookahead is @samp{*} or @samp{/} then the
1006action is shifting, otherwise the action is reducing rule 1. In other
1007words, the first two items, corresponding to rule 1, are not eligible
1008when the lookahead is @samp{*}, since we specified that @samp{*} has
1009higher precedence that @samp{+}. More generally, some items are
1010eligible only with some set of possible lookaheads.
1011
1012States 8 to 10 are similar:
1013
1014@example
1015@group
1016state 8
1017
1018 exp -> exp . '+' exp (rule 1)
1019 exp -> exp . '-' exp (rule 2)
1020 exp -> exp '-' exp . (rule 2)
1021 exp -> exp . '*' exp (rule 3)
1022 exp -> exp . '/' exp (rule 4)
1023
1024 '*' shift, and go to state 5
1025 '/' shift, and go to state 6
1026
1027 '/' [reduce using rule 2 (exp)]
1028 $default reduce using rule 2 (exp)
1029
1030
9e7abd17
EL
1031state 9
1032
1033 exp -> exp . '+' exp (rule 1)
1034 exp -> exp . '-' exp (rule 2)
1035 exp -> exp . '*' exp (rule 3)
1036 exp -> exp '*' exp . (rule 3)
1037 exp -> exp . '/' exp (rule 4)
1038
1039 '/' shift, and go to state 6
1040
1041 '/' [reduce using rule 3 (exp)]
1042 $default reduce using rule 3 (exp)
1043
1044
9e7abd17
EL
1045state 10
1046
1047 exp -> exp . '+' exp (rule 1)
1048 exp -> exp . '-' exp (rule 2)
1049 exp -> exp . '*' exp (rule 3)
1050 exp -> exp . '/' exp (rule 4)
1051 exp -> exp '/' exp . (rule 4)
1052
1053 '+' shift, and go to state 3
1054 '-' shift, and go to state 4
1055 '*' shift, and go to state 5
1056 '/' shift, and go to state 6
1057
1058 '+' [reduce using rule 4 (exp)]
1059 '-' [reduce using rule 4 (exp)]
1060 '*' [reduce using rule 4 (exp)]
1061 '/' [reduce using rule 4 (exp)]
1062 $default reduce using rule 4 (exp)
1063@end group
1064@end example
1065
1066Observe that state 10 contains conflicts due to the lack of precedence
1067of @samp{/} wrt @samp{+}, @samp{-}, and @samp{*}, but also because the
1068associativity of @samp{/} is not specified.
1069
1070Finally, the state 11 (plus 12) is named the @dfn{final state}, or the
1071@dfn{accepting state}:
1072
1073@example
1074@group
1075state 11
1076
1077 $EOI shift, and go to state 12
1078
1079
1080
1081state 12
1082
1083 $default accept
1084@end group
1085@end example
1086
1087The end of input is shifted @samp{$EOI shift,} and the parser exits
1088successfully (@samp{go to state 12}, that terminates).
1089
1090@node Wisent Parsing
1091@chapter Wisent Parsing
1092
1093@cindex bottom-up parser
1094@cindex shift-reduce parser
1095The Wisent's parser is what is called a @dfn{bottom-up} or
1096@dfn{shift-reduce} parser which repeatedly:
1097
1098@table @dfn
1099@cindex shift
1100@item shift
1101That is pushes the value of the last lexical token read (the
1102look-ahead token) into a value stack, and reads a new one.
1103
1104@cindex reduce
1105@item reduce
1106That is replaces a nonterminal by its semantic value. The values of
1107the components which form the right hand side of a rule are popped
1108from the value stack and reduced by the semantic action of this rule.
1109The result is pushed back on top of value stack.
1110@end table
1111
1112The parser will stop on:
1113
1114@table @dfn
1115@cindex accept
1116@item accept
1117When all input has been successfully parsed. The semantic value of
1118the start nonterminal is on top of the value stack.
1119
1120@cindex syntax error
1121@item error
1122When a syntax error (an unexpected token in input) has been detected.
1123At this point the parser issues an error message and either stops or
1124calls a recovery routine to try to resume parsing.
1125@end table
1126
1127@cindex table-driven parser
1128The above elementary actions are driven by the @acronym{LALR(1)}
1129automaton built by @code{wisent-compile-grammar} from a context-free
1130grammar.
1131
1132The Wisent's parser is entered by calling the function:
1133
1134@findex wisent-parse
1135@defun wisent-parse automaton lexer &optional error start
1136Parse input using the automaton specified in @var{automaton}.
1137
1138@table @var
1139@item automaton
1140Is an @acronym{LALR(1)} automaton generated by
1141@code{wisent-compile-grammar} (@pxref{Wisent Grammar}).
1142
1143@item lexer
1144Is a function with no argument called by the parser to obtain the next
1145terminal (token) in input (@pxref{Writing a lexer}).
1146
1147@item error
1148Is an optional reporting function called when a parse error occurs.
1149It receives a message string to report. It defaults to the function
1150@code{wisent-message} (@pxref{Report errors}).
1151
1152@item start
1153Specify the start symbol (nonterminal) used by the parser as its goal.
1154It defaults to the start symbol defined in the grammar
1155(@pxref{Wisent Grammar}).
1156@end table
1157@end defun
1158
1159The following two normal hooks permit to do some useful processing
1160respectively before to start parsing, and after the parser terminated.
1161
1162@vindex wisent-pre-parse-hook
1163@defvar wisent-pre-parse-hook
1164Normal hook run just before entering the @var{LR} parser engine.
1165@end defvar
1166
1167@vindex wisent-post-parse-hook
1168@defvar wisent-post-parse-hook
1169Normal hook run just after the @var{LR} parser engine terminated.
1170@end defvar
1171
1172@menu
bfe6ffb6
GM
1173* Writing a lexer::
1174* Actions goodies::
1175* Report errors::
1176* Error recovery::
1177* Debugging actions::
9e7abd17
EL
1178@end menu
1179
1180@node Writing a lexer
1181@section What the parser must receive
1182
1183It is important to understand that the parser does not parse
1184characters, but lexical tokens, and does not know anything about
1185characters in text streams!
1186
1187@cindex lexical analysis
1188@cindex lexer
1189@cindex scanner
1190Reading input data to produce lexical tokens is performed by a lexer
1191(also called a scanner) in a lexical analysis step, before the syntax
1192analysis step performed by the parser. The parser automatically calls
1193the lexer when it needs the next token to parse.
1194
1195@cindex lexical tokens
1196A Wisent's lexer is an Emacs Lisp function with no argument. It must
1197return a valid lexical token of the form:
1198
1199@code{(@var{token-class value} [@var{start} . @var{end}])}
1200
1201@table @var
1202@item token-class
1203Is a category of lexical token identifying a terminal as specified in
1204the grammar (@pxref{Wisent Grammar}). It can be a symbol or a character
1205literal.
1206
1207@item value
1208Is the value of the lexical token. It can be of any valid Emacs Lisp
1209data type.
1210
1211@item start
1212@itemx end
cccaebd2 1213Are the optional beginning and ending positions of @var{value} in the
9e7abd17
EL
1214input stream.
1215@end table
1216
1217When there are no more tokens to read the lexer must return the token
1218@code{(list wisent-eoi-term)} to each request.
1219
1220@vindex wisent-eoi-term
1221@defvar wisent-eoi-term
1222Predefined constant, End-Of-Input terminal symbol.
1223@end defvar
1224
1225@code{wisent-lex} is an example of a lexer that reads lexical tokens
1226produced by a @semantic{} lexer, and translates them into lexical tokens
1227suitable to the Wisent parser. See also @ref{Wisent Lex}.
1228
1229To call the lexer in a semantic action use the function
1230@code{wisent-lexer}. See also @ref{Actions goodies}.
1231
1232@node Actions goodies
1233@section Variables and macros useful in grammar actions.
1234
1235@vindex wisent-input
1236@defvar wisent-input
1237The last token read.
1238This variable only has meaning in the scope of @code{wisent-parse}.
1239@end defvar
1240
1241@findex wisent-lexer
1242@defun wisent-lexer
1243Obtain the next terminal in input.
1244@end defun
1245
1246@findex wisent-region
1247@defun wisent-region &rest positions
1248Return the start/end positions of the region including
1249@var{positions}. Each element of @var{positions} is a pair
1250@w{@code{(@var{start-pos} . @var{end-pos})}} or @code{nil}. The
1251returned value is the pair @w{@code{(@var{min-start-pos} .
1252@var{max-end-pos})}} or @code{nil} if no @var{positions} are
1253available.
1254@end defun
1255
1256@node Report errors
1257@section The error reporting function
1258
1259@cindex error reporting
1260When the parser encounters a syntax error it calls a user-defined
1261function. It must be an Emacs Lisp function with one argument: a
1262string containing the message to report.
1263
1264By default the parser uses this function to report error messages:
1265
1266@findex wisent-message
1267@defun wisent-message string &rest args
1268Print a one-line message if @code{wisent-parse-verbose-flag} is set.
1269Pass @var{string} and @var{args} arguments to @dfn{message}.
1270@end defun
1271
1272@table @strong
1273@item Please Note:
1274@code{wisent-message} uses the following function to print lexical
1275tokens:
1276
1277@defun wisent-token-to-string token
1278Return a printed representation of lexical token @var{token}.
1279@end defun
1280
1281The general printed form of a lexical token is:
1282
1283@w{@code{@var{token}(@var{value})@@@var{location}}}
1284@end table
1285
1286To control the verbosity of the parser you can set to non-@code{nil}
1287this variable:
1288
1289@vindex wisent-parse-verbose-flag
1290@deffn Option wisent-parse-verbose-flag
1291non-@code{nil} means to issue more messages while parsing.
1292@end deffn
1293
1294Or interactively use the command:
1295
1296@findex wisent-parse-toggle-verbose-flag
1297@deffn Command wisent-parse-toggle-verbose-flag
1298Toggle whether to issue more messages while parsing.
1299@end deffn
1300
1301When the error reporting function is entered the variable
1302@code{wisent-input} contains the unexpected token as returned by the
1303lexer.
1304
1305The error reporting function can be called from a semantic action too
1306using the special macro @code{wisent-error}. When called from a
1307semantic action entered by error recovery (@pxref{Error recovery}) the
1308value of the variable @code{wisent-recovering} is non-@code{nil}.
1309
1310@node Error recovery
1311@section Error recovery
1312
1313@cindex error recovery
1314The error recovery mechanism of the Wisent's parser conforms to the
1315one Bison uses. See @ref{Error Recovery, , , bison}, in the Bison
1316manual for details.
1317
1318@cindex error token
1319To recover from a syntax error you must write rules to recognize the
1320special token @code{error}. This is a terminal symbol that is
1321automatically defined and reserved for error handling.
1322
1323When the parser encounters a syntax error, it pops the state stack
1324until it finds a state that allows shifting the @code{error} token.
1325After it has been shifted, if the old look-ahead token is not
1326acceptable to be shifted next, the parser reads tokens and discards
1327them until it finds a token which is acceptable.
1328
1329@cindex error recovery strategy
1330Strategies for error recovery depend on the choice of error rules in
1331the grammar. A simple and useful strategy is simply to skip the rest
1332of the current statement if an error is detected:
1333
1334@example
1335@group
cccaebd2
PE
1336(statement (( error ?; )) ;; on error, skip until ';' is read
1337 )
9e7abd17
EL
1338@end group
1339@end example
1340
1341It is also useful to recover to the matching close-delimiter of an
1342opening-delimiter that has already been parsed:
1343
1344@example
1345@group
1346(primary (( ?@{ expr ?@} ))
1347 (( ?@{ error ?@} ))
1348 @dots{}
1349 )
1350@end group
1351@end example
1352
1353@cindex error recovery actions
1354Note that error recovery rules may have actions, just as any other
1355rules can. Here are some predefined hooks, variables, functions or
1356macros, useful in such actions:
1357
1358@vindex wisent-nerrs
1359@defvar wisent-nerrs
1360The number of parse errors encountered so far.
1361@end defvar
1362
1363@vindex wisent-recovering
1364@defvar wisent-recovering
1365non-@code{nil} means that the parser is recovering.
1366This variable only has meaning in the scope of @code{wisent-parse}.
1367@end defvar
1368
1369@findex wisent-error
1370@defun wisent-error msg
1371Call the user supplied error reporting function with message
1372@var{msg} (@pxref{Report errors}).
1373
1374For an example of use, @xref{wisent-skip-token}.
1375@end defun
1376
1377@findex wisent-errok
1378@defun wisent-errok
1379Resume generating error messages immediately for subsequent syntax
1380errors.
1381
1382The parser suppress error message for syntax errors that happens
1383shortly after the first, until three consecutive input tokens have
1384been successfully shifted.
1385
1386Calling @code{wisent-errok} in an action, make error messages resume
1387immediately. No error messages will be suppressed if you call it in
1388an error rule's action.
1389
1390For an example of use, @xref{wisent-skip-token}.
1391@end defun
1392
1393@findex wisent-clearin
1394@defun wisent-clearin
1395Discard the current lookahead token.
1396This will cause a new lexical token to be read.
1397
1398In an error rule's action the previous lookahead token is reanalyzed
1399immediately. @code{wisent-clearin} may be called to clear this token.
1400
1401For example, suppose that on a parse error, an error handling routine
1402is called that advances the input stream to some point where parsing
1403should once again commence. The next symbol returned by the lexical
1404scanner is probably correct. The previous lookahead token ought to
1405be discarded with @code{wisent-clearin}.
1406
1407For an example of use, @xref{wisent-skip-token}.
1408@end defun
1409
1410@findex wisent-abort
1411@defun wisent-abort
1412Abort parsing and save the lookahead token.
1413@end defun
1414
1415@findex wisent-set-region
1416@defun wisent-set-region start end
1417Change the region of text matched by the current nonterminal.
1418@var{start} and @var{end} are respectively the beginning and end
1419positions of the region occupied by the group of components associated
1420to this nonterminal. If @var{start} or @var{end} values are not a
1421valid positions the region is set to @code{nil}.
1422
1423For an example of use, @xref{wisent-skip-token}.
1424@end defun
1425
1426@vindex wisent-discarding-token-functions
1427@defvar wisent-discarding-token-functions
1428List of functions to be called when discarding a lexical token.
1429These functions receive the lexical token discarded.
1430When the parser encounters unexpected tokens, it can discards them,
1431based on what directed by error recovery rules. Either when the
1432parser reads tokens until one is found that can be shifted, or when an
1433semantic action calls the function @code{wisent-skip-token} or
1434@code{wisent-skip-block}.
1435For language specific hooks, make sure you define this as a local
1436hook.
1437
1438For example, in @semantic{}, this hook is set to the function
1439@code{wisent-collect-unmatched-syntax} to collect unmatched lexical
1440tokens (@pxref{Useful functions}).
1441@end defvar
1442
1443@findex wisent-skip-token
1444@defun wisent-skip-token
1445@anchor{wisent-skip-token}
1446Skip the lookahead token in order to resume parsing.
1447Return nil.
1448Must be used in error recovery semantic actions.
1449
1450It typically looks like this:
1451
1452@lisp
1453@group
1454(wisent-message "%s: skip %s" $action
1455 (wisent-token-to-string wisent-input))
1456(run-hook-with-args
1457 'wisent-discarding-token-functions wisent-input)
1458(wisent-clearin)
1459(wisent-errok)))
1460@end group
1461@end lisp
1462@end defun
1463
1464@findex wisent-skip-block
1465@defun wisent-skip-block
1466Safely skip a block in order to resume parsing.
1467Return nil.
1468Must be used in error recovery semantic actions.
1469
1470A block is data between an open-delimiter (syntax class @code{(}) and
1471a matching close-delimiter (syntax class @code{)}):
1472
1473@example
1474@group
1475(a parenthesized block)
1476[a block between brackets]
1477@{a block between braces@}
1478@end group
1479@end example
1480
1481The following example uses @code{wisent-skip-block} to safely skip a
1482block delimited by @samp{LBRACE} (@code{@{}) and @samp{RBRACE}
1483(@code{@}}) tokens, when a syntax error occurs in
1484@samp{other-components}:
1485
1486@example
1487@group
1488(block ((LBRACE other-components RBRACE))
1489 ((LBRACE RBRACE))
1490 ((LBRACE error)
1491 (wisent-skip-block))
1492 )
1493@end group
1494@end example
1495@end defun
1496
1497@node Debugging actions
1498@section Debugging semantic actions
1499
1500@cindex semantic action symbols
1501Each semantic action is represented by a symbol interned in an
1502@dfn{obarray} that is part of the @acronym{LALR(1)} automaton
1503(@pxref{Compiling a grammar}). @code{symbol-function} on a semantic
1504action symbol return the semantic action lambda expression.
1505
1506A semantic action symbol name has the form
1507@code{@var{nonterminal}:@var{index}}, where @var{nonterminal} is the
1508name of the nonterminal symbol the action belongs to, and @var{index}
1509is an action sequence number within the scope of @var{nonterminal}.
1510For example, this nonterminal definition:
1511
1512@example
1513@group
1514input:
1515 line [@code{input:0}]
1516 | input line
1517 (format "%s %s" $1 $2) [@code{input:1}]
1518 ;
1519@end group
1520@end example
1521
1522Will produce two semantic actions, and associated symbols:
1523
1524@table @code
1525@item input:0
1526A default action that returns @code{$1}.
1527
1528@item input:1
1529That returns @code{(format "%s %s" $1 $2)}.
1530@end table
1531
1532@cindex debugging semantic actions
1533Debugging uses the Lisp debugger to investigate what is happening
1534during execution of semantic actions.
1535Three commands are available to debug semantic actions. They receive
1536two arguments:
1537
1538@itemize @bullet
1539@item The automaton that contains the semantic action.
1540
1541@item The semantic action symbol.
1542@end itemize
1543
1544@findex wisent-debug-on-entry
1545@deffn Command wisent-debug-on-entry automaton function
1546Request @var{automaton}'s @var{function} to invoke debugger each time it is called.
1547@var{function} must be a semantic action symbol that exists in @var{automaton}.
1548@end deffn
1549
1550@findex wisent-cancel-debug-on-entry
1551@deffn Command wisent-cancel-debug-on-entry automaton function
1552Undo effect of @code{wisent-debug-on-entry} on @var{automaton}'s @var{function}.
1553@var{function} must be a semantic action symbol that exists in @var{automaton}.
1554@end deffn
1555
1556@findex wisent-debug-show-entry
1557@deffn Command wisent-debug-show-entry automaton function
1558Show the source of @var{automaton}'s semantic action @var{function}.
1559@var{function} must be a semantic action symbol that exists in @var{automaton}.
1560@end deffn
1561
1562@node Wisent Semantic
1563@chapter How to use Wisent with Semantic
1564
1565@cindex tags
1566This section presents how the Wisent's parser can be used to produce
1567@dfn{tags} for the @semantic{} tool set.
1568
1569@semantic{} tags form a hierarchy of Emacs Lisp data structures that
1570describes a program in a way independent of programming languages.
1571Tags map program declarations, like functions, methods, variables,
1572data types, classes, includes, grammar rules, etc..
1573
1574@cindex WY grammar format
1575To use the Wisent parser with @semantic{} you have to define
1576your grammar in @dfn{WY} form, a grammar format very close
1577to the one used by Bison.
1578
1579Please @inforef{top, Semantic Grammar Framework Manual, grammar-fw}
1580for more information on @semantic{} grammars.
1581
1582@menu
bfe6ffb6
GM
1583* Grammar styles::
1584* Wisent Lex::
9e7abd17
EL
1585@end menu
1586
1587@node Grammar styles
1588@section Grammar styles
1589
1590@cindex grammar styles
1591@semantic{} parsing heavily depends on how you wrote the grammar.
1592There are mainly two styles to write a Wisent's grammar intended to be
1593used with the @semantic{} tool set: the @dfn{Iterative style} and the
1594@dfn{Bison style}. Each one has pros and cons, and in certain cases
1595it can be worth a mix of the two styles!
1596
1597@menu
bfe6ffb6
GM
1598* Iterative style::
1599* Bison style::
1600* Mixed style::
1601* Start nonterminals::
1602* Useful functions::
9e7abd17
EL
1603@end menu
1604
faeea569 1605@node Iterative style
9e7abd17
EL
1606@subsection Iterative style
1607
1608@cindex grammar iterative style
1609The @dfn{iterative style} is the preferred style to use with @semantic{}.
1610It relies on an iterative parser back-end mechanism which parses start
1611nonterminals one at a time and automagically skips unexpected lexical
1612tokens in input.
1613
1614Compared to rule-based iterative functions (@pxref{Bison style}),
1615iterative parsers are better in that they can handle obscure errors
1616more cleanly.
1617
1618@cindex raw tag
1619Each start nonterminal must produces a @dfn{raw tag} by calling a
1620@code{TAG}-like grammar macro with appropriate parameters. See also
1621@ref{Start nonterminals}.
1622
1623@cindex expanded tag
1624Then, each parsing iteration automatically translates a raw tag into
1625@dfn{expanded tags}, updating the raw tag structure with internal
1626properties and buffer related data.
1627
1628After parsing completes, it results in a tree of expanded tags.
1629
1630The following example is a snippet of the iterative style Java grammar
1631provided in the @semantic{} distribution in the file
1632@file{semantic/wisent/java-tags.wy}.
1633
1634@example
1635@group
1636@dots{}
1637;; Alternate entry points
1638;; - Needed by partial re-parse
1639%start formal_parameter
1640@dots{}
1641;; - Needed by EXPANDFULL clauses
1642%start formal_parameters
1643@dots{}
1644
1645formal_parameter_list
1646 : PAREN_BLOCK
1647 (EXPANDFULL $1 formal_parameters)
1648 ;
1649
1650formal_parameters
1651 : LPAREN
1652 ()
1653 | RPAREN
1654 ()
1655 | formal_parameter COMMA
1656 | formal_parameter RPAREN
1657 ;
1658
1659formal_parameter
1660 : formal_parameter_modifier_opt type variable_declarator_id
1661 (VARIABLE-TAG $3 $2 nil :typemodifiers $1)
1662 ;
1663@end group
1664@end example
1665
1666@findex EXPANDFULL
1667It shows the use of the @code{EXPANDFULL} grammar macro to parse a
1668@samp{PAREN_BLOCK} which contains a @samp{formal_parameter_list}.
1669@code{EXPANDFULL} tells to recursively parse @samp{formal_parameters}
1670inside @samp{PAREN_BLOCK}. The parser iterates until it digested all
1671available input data inside the @samp{PAREN_BLOCK}, trying to match
1672any of the @samp{formal_parameters} rules:
1673
1674@itemize
1675@item @samp{LPAREN}
1676
1677@item @samp{RPAREN}
1678
1679@item @samp{formal_parameter COMMA}
1680
1681@item @samp{formal_parameter RPAREN}
1682@end itemize
1683
1684At each iteration it will return a @samp{formal_parameter} raw tag,
1685or @code{nil} to skip unwanted (single @samp{LPAREN} or @samp{RPAREN}
1686for example) or unexpected input data. Those raw tags will be
1687automatically expanded by the iterative back-end parser.
1688
1689@node Bison style
1690@subsection Bison style
1691
1692@cindex grammar bison style
1693What we call the @dfn{Bison style} is the traditional style of Bison's
1694grammars. Compared to iterative style, it is not straightforward to
1695use grammars written in Bison style in @semantic{}. Mainly because such
1696grammars are designed to parse the whole input data in one pass, and
1697don't use the iterative parser back-end mechanism (@pxref{Iterative
1698style}). With Bison style the parser is called once to parse the
1699grammar start nonterminal.
1700
1701The following example is a snippet of the Bison style Java grammar
1702provided in the @semantic{} distribution in the file
1703@file{semantic/wisent/java.wy}.
1704
1705@example
1706@group
1707%start formal_parameter
1708@dots{}
1709
1710formal_parameter_list
1711 : formal_parameter_list COMMA formal_parameter
1712 (cons $3 $1)
1713 | formal_parameter
1714 (list $1)
1715 ;
1716
1717formal_parameter
1718 : formal_parameter_modifier_opt type variable_declarator_id
1719 (EXPANDTAG
1720 (VARIABLE-TAG $3 $2 :typemodifiers $1)
1721 )
1722 ;
1723@end group
1724@end example
1725
1726The first consequence is that syntax errors are not automatically
1727handled by @semantic{}. Thus, it is necessary to explicitly handle
1728them at the grammar level, providing error recovery rules to skip
1729unexpected input data.
1730
1731The second consequence is that the iterative parser can't do automatic
1732tag expansion, except for the start nonterminal value. It is
1733necessary to explicitly expand tags from concerned semantic actions by
1734calling the grammar macro @code{EXPANDTAG} with a raw tag as
1735parameter. See also @ref{Start nonterminals}, for incremental
1736re-parse considerations.
1737
1738@node Mixed style
1739@subsection Mixed style
1740
1741@cindex grammar mixed style
1742@example
1743@group
1744%start grammar
1745;; Reparse
1746%start prologue epilogue declaration nonterminal rule
1747@dots{}
1748
1749%%
1750
1751grammar:
1752 prologue
1753 | epilogue
1754 | declaration
1755 | nonterminal
1756 | PERCENT_PERCENT
1757 ;
1758@dots{}
1759
1760nonterminal:
1761 SYMBOL COLON rules SEMI
1762 (TAG $1 'nonterminal :children $3)
1763 ;
1764
1765rules:
1766 lifo_rules
1767 (apply 'nconc (nreverse $1))
1768 ;
1769
1770lifo_rules:
1771 lifo_rules OR rule
1772 (cons $3 $1)
1773 | rule
1774 (list $1)
1775 ;
1776
1777rule:
1778 rhs
1779 (let* ((rhs $1)
1780 name type comps prec action elt)
1781 @dots{}
1782 (EXPANDTAG
1783 (TAG name 'rule :type type :value comps :prec prec :expr action)
1784 ))
1785 ;
1786@end group
1787@end example
1788
1789This example shows how iterative and Bison styles can be combined in
1790the same grammar to obtain a good compromise between grammar
1791complexity and an efficient parsing strategy in an interactive
1792environment.
1793
1794@samp{nonterminal} is parsed using iterative style via the main
1795@samp{grammar} rule. The semantic action uses the @code{TAG} macro to
1796produce a raw tag, automagically expanded by @semantic{}.
1797
1798But @samp{rules} part is parsed in Bison style! Why?
1799
1800Rule delimiters are the colon (@code{:}), that follows the nonterminal
1801name, and a final semicolon (@code{;}). Unfortunately these
1802delimiters are not @code{open-paren}/@code{close-paren} type, and the
1803Emacs' syntactic analyzer can't easily isolate data between them to
1804produce a @samp{RULES_PART} parenthesis-block-like lexical token.
1805Consequently it is not possible to use @code{EXPANDFULL} to iterate in
1806@samp{RULES_PART}, like this:
1807
1808@example
1809@group
1810nonterminal:
1811 SYMBOL COLON rules SEMI
1812 (TAG $1 'nonterminal :children $3)
1813 ;
1814
1815rules:
1816 RULES_PART ;; @strong{Map a parenthesis-block-like lexical token}
1817 (EXPANDFULL $1 'rules)
1818 ;
1819
1820rules:
1821 COLON
1822 ()
1823 OR
1824 ()
1825 SEMI
1826 ()
1827 rhs
1828 rhs
1829 (let* ((rhs $1)
1830 name type comps prec action elt)
1831 @dots{}
1832 (TAG name 'rule :type type :value comps :prec prec :expr action)
1833 )
1834 ;
1835@end group
1836@end example
1837
1838In such cases, when it is difficult for Emacs to obtain
1839parenthesis-block-like lexical tokens, the best solution is to use the
1840traditional Bison style with error recovery!
1841
1842In some extreme cases, it can also be convenient to extend the lexer,
1843to deliver new lexical tokens, to simplify the grammar.
1844
1845@node Start nonterminals
1846@subsection Start nonterminals
1847
1848@cindex start nonterminals
1849@cindex @code{reparse-symbol} property
1850When you write a grammar for @semantic{}, it is important to carefully
1851indicate the start nonterminals. Each one defines an entry point in
1852the grammar, and after parsing its semantic value is returned to the
1853back-end iterative engine. Consequently:
1854
1855@strong{The semantic value of a start nonterminal must be a produced
1856by a TAG like grammar macro}.
1857
1858Start nonterminals are declared by @code{%start} statements. When
1859nothing is specified the first nonterminal that appears in the grammar
1860is the start nonterminal.
1861
1862Generally, the following nonterminals must be declared as start
1863symbols:
1864
1865@itemize @bullet
1866@item The main grammar entry point
1867@quotation
1868Of course!
1869@end quotation
1870
1871@item nonterminals passed to @code{EXPAND}/@code{EXPANDFULL}
1872@quotation
1873These grammar macros recursively parse a part of input data, based on
1874rules of the given nonterminal.
1875
1876For example, the following will parse @samp{PAREN_BLOCK} data using
1877the @samp{formal_parameters} rules:
1878
1879@example
1880@group
1881formal_parameter_list
1882 : PAREN_BLOCK
1883 (EXPANDFULL $1 formal_parameters)
1884 ;
1885@end group
1886@end example
1887
1888The semantic value of @samp{formal_parameters} becomes the value of
1889the @code{EXPANDFULL} expression. It is a list of @semantic{} tags
1890spliced in the tags tree.
1891
1892Because the automaton must know that @samp{formal_parameters} is a
1893start symbol, you must declare it like this:
1894
1895@example
1896@group
1897%start formal_parameters
1898@end group
1899@end example
1900@end quotation
1901@end itemize
1902
1903@cindex incremental re-parse
1904@cindex reparse-symbol
1905The @code{EXPANDFULL} macro has a side effect it is important to know,
1906related to the incremental re-parse mechanism of @semantic{}: the
1907nonterminal symbol parameter passed to @code{EXPANDFULL} also becomes
1908the @code{reparse-symbol} property of the tag returned by the
1909@code{EXPANDFULL} expression.
1910
1911When buffer's data mapped by a tag is modified, @semantic{}
1912schedules an incremental re-parse of that data, using the tag's
1913@code{reparse-symbol} property as start nonterminal.
1914
1915@strong{The rules associated to such start symbols must be carefully
1916reviewed to ensure that the incremental parser will work!}
1917
1918Things are a little bit different when the grammar is written in Bison
bfe6ffb6 1919style.
9e7abd17
EL
1920
1921@strong{The @code{reparse-symbol} property is set to the nonterminal
1922symbol the rule that explicitly uses @code{EXPANDTAG} belongs to.}
1923
1924For example:
1925
1926@example
1927@group
1928rule:
1929 rhs
1930 (let* ((rhs $1)
1931 name type comps prec action elt)
1932 @dots{}
1933 (EXPANDTAG
1934 (TAG name 'rule :type type :value comps :prec prec :expr action)
1935 ))
1936 ;
1937@end group
1938@end example
1939
1940Set the @code{reparse-symbol} property of the expanded tag to
1941@samp{rule}. A important consequence is that:
1942
1943@strong{Every nonterminal having any rule that calls @code{EXPANDTAG}
1944in a semantic action, should be declared as a start symbol!}
1945
1946@node Useful functions
1947@subsection Useful functions
1948
1949Here is a description of some predefined functions it might be useful
1950to know when writing new code to use Wisent in @semantic{}:
1951
1952@findex wisent-collect-unmatched-syntax
1953@defun wisent-collect-unmatched-syntax input
1954Add @var{input} lexical token to the cache of unmatched tokens, in
1955variable @code{semantic-unmatched-syntax-cache}.
1956
1957See implementation of the function @code{wisent-skip-token} in
1958@ref{Error recovery}, for an example of use.
1959@end defun
1960
1961@node Wisent Lex
1962@section The Wisent Lex lexer
1963
1964@findex semantic-lex
1965The lexical analysis step of @semantic{} is performed by the general
1966function @code{semantic-lex}. For more information, @inforef{Writing
1967Lexers, ,semantic-langdev}.
1968
1969@code{semantic-lex} produces lexical tokens of the form:
1970
1971@example
1972@group
1973@code{(@var{token-class start} . @var{end})}
1974@end group
1975@end example
1976
1977@table @var
1978@item token-class
1979Is a symbol that identifies a lexical token class, like @code{symbol},
1980@code{string}, @code{number}, or @code{PAREN_BLOCK}.
1981
1982@item start
1983@itemx end
1984Are the start and end positions of mapped data in the input buffer.
1985@end table
bfe6ffb6 1986
9e7abd17
EL
1987The Wisent's parser doesn't depend on the nature of analyzed input
1988stream (buffer, string, etc.), and requires that lexical tokens have a
1989different form (@pxref{Writing a lexer}):
1990
1991@example
1992@group
1993@code{(@var{token-class value} [@var{start} . @var{end}])}
1994@end group
1995@end example
1996
1997@cindex lexical token mapping
1998@code{wisent-lex} is the default Wisent's lexer used in @semantic{}.
1999
2000@vindex wisent-lex-istream
2001@findex wisent-lex
2002@defun wisent-lex
2003Return the next available lexical token in Wisent's form.
2004
2005The variable @code{wisent-lex-istream} contains the list of lexical
2006tokens produced by @code{semantic-lex}. Pop the next token available
2007and convert it to a form suitable for the Wisent's parser.
2008@end defun
2009
2010Mapping of lexical tokens as produced by @code{semantic-lex} into
2011equivalent Wisent lexical tokens is straightforward:
2012
2013@example
2014@group
2015(@var{token-class start} . @var{end})
2016 @result{} (@var{token-class value start} . @var{end})
2017@end group
2018@end example
2019
2020@var{value} is the input @code{buffer-substring} from @var{start} to
2021@var{end}.
2022
2023@node GNU Free Documentation License
2024@appendix GNU Free Documentation License
2025
bfe6ffb6 2026@include doclicense.texi
9e7abd17
EL
2027
2028@node Index
2029@unnumbered Index
2030@printindex cp
2031
2032@iftex
2033@contents
2034@summarycontents
2035@end iftex
2036
2037@bye
2038
2039@c Following comments are for the benefit of ispell.
2040
2041@c LocalWords: Wisent automagically wisent Wisent's LALR obarray