HCoop Git - bpt/emacs.git/blame_incremental

... / ...

Commit	Line	Data
	1	\input texinfo @c --texinfo--
	2	@c %**start of header
	3	@setfilename ../../info/wisent
	4	@set TITLE Wisent Parser Development
	5	@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
	6	@settitle @value{TITLE}
	7
	8	@c *************************************************************************
	9	@c @ Header
	10	@c *************************************************************************
	11
	12	@c Merge all indexes into a single index for now.
	13	@c We can always separate them later into two or more as needed.
	14	@syncodeindex vr cp
	15	@syncodeindex fn cp
	16	@syncodeindex ky cp
	17	@syncodeindex pg cp
	18	@syncodeindex tp cp
	19
	20	@c @footnotestyle separate
	21	@c @paragraphindent 2
	22	@c @@smallbook
	23	@documentencoding UTF-8
	24	@documentlanguage en
	25	@c %**end of header
	26
	27	@copying
	28	Copyright @copyright{} 1988--1993, 1995, 1998--2004, 2007, 2012--2013
	29	Free Software Foundation, Inc.
	30
	31	@c Since we are both GNU manuals, we do not need to ack each other here.
	32	@ignore
	33	Some texts are borrowed or adapted from the manual of Bison version
	34	1.35. The text in section entitled ``Understanding the automaton'' is
	35	adapted from the section ``Understanding Your Parser'' in the manual
	36	of Bison version 1.49.
	37	@end ignore
	38
	39	@quotation
	40	Permission is granted to copy, distribute and/or modify this document
	41	under the terms of the GNU Free Documentation License, Version 1.3 or
	42	any later version published by the Free Software Foundation; with no
	43	Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
	44	and with the Back-Cover Texts as in (a) below. A copy of the license
	45	is included in the section entitled ``GNU Free Documentation License''.
	46
	47	(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
	48	modify this GNU manual.''
	49	@end quotation
	50	@end copying
	51
	52	@dircategory Emacs misc features
	53	@direntry
	54	* Wisent: (wisent). Semantic Wisent parser development.
	55	@end direntry
	56
	57	@iftex
	58	@finalout
	59	@end iftex
	60
	61	@c @setchapternewpage odd
	62	@c @setchapternewpage off
	63
	64	@titlepage
	65	@sp 10
	66	@title @value{TITLE}
	67	@author by @value{AUTHOR}
	68	@page
	69	@vskip 0pt plus 1 fill
	70	@insertcopying
	71	@end titlepage
	72	@page
	73
	74	@macro semantic{}
	75	@i{Semantic}
	76	@end macro
	77
	78	@c *************************************************************************
	79	@c @ Document
	80	@c *************************************************************************
	81	@contents
	82
	83	@node top
	84	@top @value{TITLE}
	85
	86	Wisent (the European Bison ;-) is an Emacs Lisp implementation of the
	87	GNU Compiler Compiler Bison.
	88
	89	This manual describes how to use Wisent to develop grammars for
	90	programming languages, and how to use grammars to parse language
	91	source in Emacs buffers.
	92
	93	It also describes how Wisent is used with the @semantic{} tool set
	94	described in the @ref{Top, Semantic Manual, Semantic Manual, semantic}.
	95
	96	@ifnottex
	97	@insertcopying
	98	@end ifnottex
	99
	100	@menu
	101	* Wisent Overview::
	102	* Wisent Grammar::
	103	* Wisent Parsing::
	104	* Wisent Semantic::
	105	* GNU Free Documentation License::
	106	* Index::
	107	@end menu
	108
	109	@node Wisent Overview
	110	@chapter Wisent Overview
	111
	112	@dfn{Wisent} (the European Bison) is an implementation in Emacs Lisp
	113	of the GNU Compiler Compiler Bison. Its code is a port of the C code
	114	of GNU Bison 1.28 & 1.31.
	115
	116	For more details on the basic concepts for understanding Wisent, it is
	117	worthwhile to read the @ref{Top, Bison Manual, , bison}.
	118
	119	Wisent can generate compilers compatible with the @semantic{} tool set.
	120	See the @ref{Top, Semantic Manual, , semantic}.
	121
	122	It benefits from these Bison features:
	123
	124	@itemize @bullet
	125	@item
	126	It uses a fast but not so space-efficient encoding for the parse
	127	tables, described in Corbett's PhD thesis from Berkeley:
	128	@quotation
	129	@cite{Static Semantics in Compiler Error Recovery}@*
	130	June 1985, Report No. UCB/CSD 85/251.
	131	@end quotation
	132
	133	@item
	134	For generating the lookahead sets, Wisent uses the well-known
	135	technique of F. DeRemer and A. Pennello described in:
	136	@quotation
	137	@cite{Efficient Computation of LALR(1) Look-Ahead Sets}@*
	138	October 1982, ACM TOPLAS Vol 4 No 4, 615--49,
	139	@uref{http://dx.doi.org/10.1145/69622.357187}.
	140	@end quotation
	141
	142	@item
	143	Wisent resolves shift/reduce conflicts using operator precedence and
	144	associativity.
	145
	146	@item
	147	Parser error recovery is accomplished using rules which match the
	148	special token @code{error}.
	149	@end itemize
	150
	151	Nevertheless there are some fundamental differences between Bison and
	152	Wisent.
	153
	154	@itemize
	155	@item
	156	Wisent is intended to be used in Emacs. It reads and produces Emacs
	157	Lisp data structures. All the additional code used in grammars is
	158	Emacs Lisp code.
	159
	160	@item
	161	Contrary to Bison, Wisent does not generate a parser which combines
	162	Emacs Lisp code and grammar constructs. They exist separately.
	163	Wisent reads the grammar from a Lisp data structure and then generates
	164	grammar constructs as tables. Afterward, the derived tables can be
	165	included and byte-compiled in separate Emacs Lisp files, and be used
	166	at a later time by the Wisent's parser engine.
	167
	168	@item
	169	Wisent allows multiple start nonterminals and allows a call to the
	170	parsing function to be made for a particular start nonterminal. For
	171	example, this is particularly useful to parse a region of an Emacs
	172	buffer. @semantic{} heavily depends on the availability of this feature.
	173	@end itemize
	174
	175	@node Wisent Grammar
	176	@chapter Wisent Grammar
	177
	178	@cindex context-free grammar
	179	@cindex rule
	180	In order for Wisent to parse a language, it must be described by a
	181	@dfn{context-free grammar}. That is a grammar specified as rules that
	182	can be applied regardless of context. For more information, see
	183	@ref{Language and Grammar, , , bison}, in the Bison manual.
	184
	185	@cindex terminal
	186	@cindex nonterminal
	187	The formal grammar is formulated using @dfn{terminal} and
	188	@dfn{nonterminal} items. Terminals can be Emacs Lisp symbols or
	189	characters, and nonterminals are symbols only.
	190
	191	@cindex token
	192	Terminals (also known as @dfn{tokens}) represent the lexical
	193	elements of the language like numbers, strings, etc..
	194
	195	For example @samp{PLUS} can represent the operator @samp{+}.
	196
	197	Nonterminal symbols are described by rules:
	198
	199	@example
	200	@group
	201	RESULT @equiv{} COMPONENTS@dots{}
	202	@end group
	203	@end example
	204
	205	@samp{RESULT} is a nonterminal that this rule describes and
	206	@samp{COMPONENTS} are various terminals and nonterminals that are put
	207	together by this rule.
	208
	209	For example, this rule:
	210
	211	@example
	212	@group
	213	exp @equiv{} exp PLUS exp
	214	@end group
	215	@end example
	216
	217	Says that two groupings of type @samp{exp}, with a @samp{PLUS} token
	218	in between, can be combined into a larger grouping of type @samp{exp}.
	219
	220	@menu
	221	* Grammar format::
	222	* Example::
	223	* Compiling a grammar::
	224	* Conflicts::
	225	@end menu
	226
	227	@node Grammar format
	228	@section Grammar format
	229
	230	@cindex grammar format
	231	To be acceptable by Wisent a context-free grammar must respect a
	232	particular format. That is, must be represented as an Emacs Lisp list
	233	of the form:
	234
	235	@code{(@var{terminals} @var{assocs} . @var{non-terminals})}
	236
	237	@table @var
	238	@item terminals
	239	Is the list of terminal symbols used in the grammar.
	240
	241	@cindex associativity
	242	@item assocs
	243	Specify the associativity of @var{terminals}. It is @code{nil} when
	244	there is no associativity defined, or an alist of
	245	@w{@code{(@var{assoc-type} . @var{assoc-value})}} elements.
	246
	247	@var{assoc-type} must be one of the @code{default-prec},
	248	@code{nonassoc}, @code{left} or @code{right} symbols. When
	249	@var{assoc-type} is @code{default-prec}, @var{assoc-value} must be
	250	@code{nil} or @code{t} (the default). Otherwise it is a list of
	251	tokens which must have been previously declared in @var{terminals}.
	252
	253	For details, see @ref{Contextual Precedence, , , bison}, in the
	254	Bison manual.
	255
	256	@item non-terminals
	257	Is the list of nonterminal definitions. Each definition has the form:
	258
	259	@code{(@var{nonterm} . @var{rules})}
	260
	261	Where @var{nonterm} is the nonterminal symbol defined and
	262	@var{rules} the list of rules that describe this nonterminal. Each
	263	rule is a list:
	264
	265	@code{(@var{components} [@var{precedence}] [@var{action}])}
	266
	267	Where:
	268
	269	@table @var
	270	@item components
	271	Is a list of various terminals and nonterminals that are put together
	272	by this rule.
	273
	274	For example,
	275
	276	@example
	277	@group
	278	(exp ((exp ?+ exp)) ;; exp: exp '+' exp
	279	) ;; ;
	280	@end group
	281	@end example
	282
	283	Says that two groupings of type @samp{exp}, with a @samp{+} token in
	284	between, can be combined into a larger grouping of type @samp{exp}.
	285
	286	@cindex grammar coding conventions
	287	By convention, a nonterminal symbol should be in lower case, such as
	288	@samp{exp}, @samp{stmt} or @samp{declaration}. Terminal symbols
	289	should be upper case to distinguish them from nonterminals: for
	290	example, @samp{INTEGER}, @samp{IDENTIFIER}, @samp{IF} or
	291	@samp{RETURN}. A terminal symbol that represents a particular keyword
	292	in the language is conventionally the same as that keyword converted
	293	to upper case. The terminal symbol @code{error} is reserved for error
	294	recovery.
	295
	296	@cindex middle-rule actions
	297	Scattered among the components can be @dfn{middle-rule} actions.
	298	Usually only @var{action} is provided (@pxref{action}).
	299
	300	If @var{components} in a rule is @code{nil}, it means that the rule
	301	can match the empty string. For example, here is how to define a
	302	comma-separated sequence of zero or more @samp{exp} groupings:
	303
	304	@smallexample
	305	@group
	306	(expseq (nil) ;; expseq: ;; empty
	307	((expseq1)) ;; \| expseq1
	308	) ;; ;
	309
	310	(expseq1 ((exp)) ;; expseq1: exp
	311	((expseq1 ?, exp)) ;; \| expseq1 ',' exp
	312	) ;; ;
	313	@end group
	314	@end smallexample
	315
	316	@cindex precedence level
	317	@item precedence
	318	Assign the rule the precedence of the given terminal item, overriding
	319	the precedence that would be deduced for it, that is the one of the
	320	last terminal in it. Notice that only terminals declared in
	321	@var{assocs} have a precedence level. The altered rule precedence
	322	then affects how conflicts involving that rule are resolved.
	323
	324	@var{precedence} is an optional vector of one terminal item.
	325
	326	Here is how @var{precedence} solves the problem of unary minus.
	327	First, declare a precedence for a fictitious terminal symbol named
	328	@code{UMINUS}. There are no tokens of this type, but the symbol
	329	serves to stand for its precedence:
	330
	331	@example
	332	@dots{}
	333	((default-prec t) ;; This is the default
	334	(left '+' '-')
	335	(left '*')
	336	(left UMINUS))
	337	@end example
	338
	339	Now the precedence of @code{UMINUS} can be used in specific rules:
	340
	341	@smallexample
	342	@group
	343	(exp @dots{} ;; exp: @dots{}
	344	((exp ?- exp)) ;; \| exp '-' exp
	345	@dots{} ;; @dots{}
	346	((?- exp) [UMINUS]) ;; \| '-' exp %prec UMINUS
	347	@dots{} ;; @dots{}
	348	) ;; ;
	349	@end group
	350	@end smallexample
	351
	352	If you forget to append @code{[UMINUS]} to the rule for unary minus,
	353	Wisent silently assumes that minus has its usual precedence. This
	354	kind of problem can be tricky to debug, since one typically discovers
	355	the mistake only by testing the code.
	356
	357	Using @code{(default-prec nil)} declaration makes it easier to
	358	discover this kind of problem systematically. It causes rules that
	359	lack a @var{precedence} modifier to have no precedence, even if the
	360	last terminal symbol mentioned in their components has a declared
	361	precedence.
	362
	363	If @code{(default-prec nil)} is in effect, you must specify
	364	@var{precedence} for all rules that participate in precedence conflict
	365	resolution. Then you will see any shift/reduce conflict until you
	366	tell Wisent how to resolve it, either by changing your grammar or by
	367	adding an explicit precedence. This will probably add declarations to
	368	the grammar, but it helps to protect against incorrect rule
	369	precedences.
	370
	371	The effect of @code{(default-prec nil)} can be reversed by giving
	372	@code{(default-prec t)}, which is the default.
	373
	374	For more details, see @ref{Contextual Precedence, , , bison}, in the
	375	Bison manual.
	376
	377	It is important to understand that @var{assocs} declarations defines
	378	associativity but also assign a precedence level to terminals. All
	379	terminals declared in the same @code{left}, @code{right} or
	380	@code{nonassoc} association get the same precedence level. The
	381	precedence level is increased at each new association.
	382
	383	On the other hand, @var{precedence} explicitly assign the precedence
	384	level of the given terminal to a rule.
	385
	386	@cindex semantic actions
	387	@item @anchor{action}action
	388	An action is an optional Emacs Lisp function call, like this:
	389
	390	@code{(identity $1)}
	391
	392	The result of an action determines the semantic value of a rule.
	393
	394	From an implementation standpoint, the function call will be embedded
	395	in a lambda expression, and several useful local variables will be
	396	defined:
	397
	398	@table @code
	399	@vindex $N
	400	@item $@var{n}
	401	Where @var{n} is a positive integer. Like in Bison, the value of
	402	@code{$@var{n}} is the semantic value of the @var{n}th element of
	403	@var{components}, starting from 1. It can be of any Lisp data
	404	type.
	405
	406	@vindex $region@var{n}
	407	@item $regionN
	408	Where @var{n} is a positive integer. For each @code{$@var{n}}
	409	variable defined there is a corresponding @code{$region@var{n}}
	410	variable. Its value is a pair @code{(@var{start-pos} .
	411	@var{end-pos})} that represent the start and end positions (in the
	412	lexical input stream) of the @code{$@var{n}} value. It can be
	413	@code{nil} when the component positions are not available, like for an
	414	empty string component for example.
	415
	416	@vindex $region
	417	@item $region
	418	Its value is the leftmost and rightmost positions of input data
	419	matched by all @var{components} in the rule. This is a pair
	420	@code{(@var{leftmost-pos} . @var{rightmost-pos})}. It can be
	421	@code{nil} when components positions are not available.
	422
	423	@vindex $nterm
	424	@item $nterm
	425	This variable is initialized with the nonterminal symbol
	426	(@var{nonterm}) the rule belongs to. It could be useful to improve
	427	error reporting or debugging. It is also used to automatically
	428	provide incremental re-parse entry points for @semantic{} tags
	429	(@pxref{Wisent Semantic}).
	430
	431	@vindex $action
	432	@item $action
	433	The value of @code{$action} is the symbolic name of the current
	434	semantic action (@pxref{Debugging actions}).
	435	@end table
	436
	437	When an action is not specified a default value is supplied, it is
	438	@code{(identity $1)}. This means that the default semantic value of a
	439	rule is the value of its first component. Excepted for a rule
	440	matching the empty string, for which the default action is to return
	441	@code{nil}.
	442	@end table
	443	@end table
	444
	445	@node Example
	446	@section Example
	447
	448	@cindex grammar example
	449	Here is an example to parse simple infix arithmetic expressions. See
	450	@ref{Infix Calc, , , bison}, in the Bison manual for details.
	451
	452	@lisp
	453	@group
	454	'(
	455	;; Terminals
	456	(NUM)
	457
	458	;; Terminal associativity & precedence
	459	((nonassoc ?=)
	460	(left ?- ?+)
	461	(left ?* ?/)
	462	(left NEG)
	463	(right ?^))
	464
	465	;; Rules
	466	(input
	467	((line))
	468	((input line)
	469	(format "%s %s" $1 $2))
	470	)
	471
	472	(line
	473	((?;)
	474	(progn ";"))
	475	((exp ?;)
	476	(format "%s;" $1))
	477	((error ?;)
	478	(progn "Error;")))
	479	)
	480
	481	(exp
	482	((NUM)
	483	(string-to-number $1))
	484	((exp ?= exp)
	485	(= $1 $3))
	486	((exp ?+ exp)
	487	(+ $1 $3))
	488	((exp ?- exp)
	489	(- $1 $3))
	490	((exp ?* exp)
	491	(* $1 $3))
	492	((exp ?/ exp)
	493	(/ $1 $3))
	494	((?- exp) [NEG]
	495	(- $2))
	496	((exp ?^ exp)
	497	(expt $1 $3))
	498	((?$ exp ?$)
	499	(progn $2))
	500	)
	501	)
	502	@end group
	503	@end lisp
	504
	505	In the bison-like @dfn{WY} format (@pxref{Wisent Semantic}) the
	506	grammar looks like this:
	507
	508	@example
	509	@group
	510	%token <number> NUM
	511
	512	%nonassoc '=' ;; comparison
	513	%left '-' '+'
	514	%left '*' '/'
	515	%left NEG ;; negation--unary minus
	516	%right '^' ;; exponentiation
	517
	518	%%
	519
	520	input:
	521	line
	522	\| input line
	523	(format "%s %s" $1 $2)
	524	;
	525
	526	line:
	527	';'
	528	@{";"@}
	529	\| exp ';'
	530	(format "%s;" $1)
	531	\| error ';'
	532	@{"Error;"@}
	533	;
	534
	535	exp:
	536	NUM
	537	(string-to-number $1)
	538	\| exp '=' exp
	539	(= $1 $3)
	540	\| exp '+' exp
	541	(+ $1 $3)
	542	\| exp '-' exp
	543	(- $1 $3)
	544	\| exp '*' exp
	545	(* $1 $3)
	546	\| exp '/' exp
	547	(/ $1 $3)
	548	\| '-' exp %prec NEG
	549	(- $2)
	550	\| exp '^' exp
	551	(expt $1 $3)
	552	\| '(' exp ')'
	553	@{$2@}
	554	;
	555
	556	%%
	557	@end group
	558	@end example
	559
	560	@node Compiling a grammar
	561	@section Compiling a grammar
	562
	563	@cindex automaton
	564	After providing a context-free grammar in a suitable format, it must
	565	be translated into a set of tables (an @dfn{automaton}) that will be
	566	used to derive the parser. Like Bison, Wisent translates grammars that
	567	must be @dfn{LALR(1)}.
	568
	569	@cindex LALR(1) grammar
	570	@cindex look-ahead token
	571	A grammar is @acronym{LALR(1)} if it is possible to tell how to parse
	572	any portion of an input string with just a single token of look-ahead:
	573	the @dfn{look-ahead token}. See @ref{Language and Grammar, , ,
	574	bison}, in the Bison manual for more information.
	575
	576	@cindex grammar compilation
	577	Grammar translation (compilation) is achieved by the function:
	578
	579	@cindex compiling a grammar
	580	@vindex wisent-single-start-flag
	581	@findex wisent-compile-grammar
	582	@defun wisent-compile-grammar grammar &optional start-list
	583	Compile @var{grammar} and return an @acronym{LALR(1)} automaton.
	584
	585	Optional argument @var{start-list} is a list of start symbols
	586	(nonterminals). If @code{nil} the first nonterminal defined in the
	587	grammar is the default start symbol. If @var{start-list} contains
	588	only one element, it defines the start symbol. If @var{start-list}
	589	contains more than one element, all are defined as potential start
	590	symbols, unless @code{wisent-single-start-flag} is non-@code{nil}. In
	591	that case the first element of @var{start-list} defines the start
	592	symbol and others are ignored.
	593
	594	The @acronym{LALR(1)} automaton is a vector of the form:
	595
	596	@code{[@var{actions gotos starts functions}]}
	597
	598	@table @var
	599	@item actions
	600	A state/token matrix telling the parser what to do at every state
	601	based on the current look-ahead token. That is shift, reduce, accept
	602	or error. See also @ref{Wisent Parsing}.
	603
	604	@item gotos
	605	A state/nonterminal matrix telling the parser the next state to go to
	606	after reducing with each rule.
	607
	608	@item starts
	609	An alist which maps the allowed start symbols (nonterminals) to
	610	lexical tokens that will be first shifted into the parser stack.
	611
	612	@item functions
	613	An obarray of semantic action symbols. A semantic action is actually
	614	an Emacs Lisp function (lambda expression).
	615	@end table
	616	@end defun
	617
	618	@node Conflicts
	619	@section Conflicts
	620
	621	Normally, a grammar should produce an automaton where at each state
	622	the parser has only one action to do (@pxref{Wisent Parsing}).
	623
	624	@cindex ambiguous grammar
	625	In certain cases, a grammar can produce an automaton where, at some
	626	states, there are more than one action possible. Such a grammar is
	627	@dfn{ambiguous}, and generates @dfn{conflicts}.
	628
	629	@cindex deterministic automaton
	630	The parser can't be driven by an automaton which isn't completely
	631	@dfn{deterministic}, that is which contains conflicts. It is
	632	necessary to resolve the conflicts to eliminate them. Wisent resolves
	633	conflicts like Bison does.
	634
	635	@cindex grammar conflicts
	636	@cindex conflicts resolution
	637	There are two sorts of conflicts:
	638
	639	@table @dfn
	640	@cindex shift/reduce conflicts
	641	@item shift/reduce conflicts
	642	When either a shift or a reduction would be valid at the same state.
	643
	644	Such conflicts are resolved by choosing to shift, unless otherwise
	645	directed by operator precedence declarations.
	646	See @ref{Shift/Reduce , , , bison}, in the Bison manual for more
	647	information.
	648
	649	@cindex reduce/reduce conflicts
	650	@item reduce/reduce conflicts
	651	That occurs if there are two or more rules that apply to the same
	652	sequence of input. This usually indicates a serious error in the
	653	grammar.
	654
	655	Such conflicts are resolved by choosing to use the rule that appears
	656	first in the grammar, but it is very risky to rely on this. Every
	657	reduce/reduce conflict must be studied and usually eliminated. See
	658	@ref{Reduce/Reduce , , , bison}, in the Bison manual for more
	659	information.
	660	@end table
	661
	662	@menu
	663	* Grammar Debugging::
	664	* Understanding the automaton::
	665	@end menu
	666
	667	@node Grammar Debugging
	668	@subsection Grammar debugging
	669
	670	@cindex grammar debugging
	671	@cindex grammar verbose description
	672	To help writing a new grammar, @code{wisent-compile-grammar} can
	673	produce a verbose report containing a detailed description of the
	674	grammar and parser (equivalent to what Bison reports with the
	675	@option{--verbose} option).
	676
	677	To enable the verbose report you can set to non-@code{nil} the
	678	variable:
	679
	680	@vindex wisent-verbose-flag
	681	@deffn Option wisent-verbose-flag
	682	non-@code{nil} means to report verbose information on generated parser.
	683	@end deffn
	684
	685	Or interactively use the command:
	686
	687	@findex wisent-toggle-verbose-flag
	688	@deffn Command wisent-toggle-verbose-flag
	689	Toggle whether to report verbose information on generated parser.
	690	@end deffn
	691
	692	The verbose report is printed in the temporary buffer
	693	@code{wisent-log} when running interactively, or in file
	694	@file{wisent.output} when running in batch mode. Different
	695	reports are separated from each other by a line like this:
	696
	697	@example
	698	@group
	699	*** Wisent @var{source-file} - 2002-06-27 17:33
	700	@end group
	701	@end example
	702
	703	where @var{source-file} is the name of the Emacs Lisp file from which
	704	the grammar was read. See @ref{Understanding the automaton}, for
	705	details on the verbose report.
	706
	707	@table @strong
	708	@item Please Note
	709	To help debugging the grammar compiler itself, you can set this
	710	variable to print the content of some internal data structures:
	711
	712	@vindex wisent-debug-flag
	713	@defvar wisent-debug-flag
	714	non-@code{nil} means enable some debug stuff.
	715	@end defvar
	716	@end table
	717
	718	@node Understanding the automaton
	719	@subsection Understanding the automaton
	720
	721	@cindex understanding the automaton
	722	This section (took from the manual of Bison 1.49) describes how to use
	723	the verbose report printed by @code{wisent-compile-grammar} to
	724	understand the generated automaton, to tune or fix a grammar.
	725
	726	We will use the following example:
	727
	728	@example
	729	@group
	730	(let ((wisent-verbose-flag t)) ;; Print a verbose report!
	731	(wisent-compile-grammar
	732	'((NUM STR) ; %token NUM STR
	733
	734	((left ?+ ?-) ; %left '+' '-';
	735	(left ?)) ; %left ''
	736
	737	(exp ; exp:
	738	((exp ?+ exp)) ; exp '+' exp
	739	((exp ?- exp)) ; \| exp '-' exp
	740	((exp ?* exp)) ; \| exp '*' exp
	741	((exp ?/ exp)) ; \| exp '/' exp
	742	((NUM)) ; \| NUM
	743	) ; ;
	744
	745	(useless ; useless:
	746	((STR)) ; STR
	747	) ; ;
	748	)
	749	'nil) ; no %start declarations
	750	)
	751	@end group
	752	@end example
	753
	754	When evaluating the above expression, grammar compilation first issues
	755	the following two clear messages:
	756
	757	@example
	758	@group
	759	Grammar contains 1 useless nonterminals and 1 useless rules
	760	Grammar contains 7 shift/reduce conflicts
	761	@end group
	762	@end example
	763
	764	The @samp{wisent-log} buffer details things!
	765
	766	The first section reports conflicts that were solved using precedence
	767	and/or associativity:
	768
	769	@example
	770	@group
	771	Conflict in state 7 between rule 1 and token '+' resolved as reduce.
	772	Conflict in state 7 between rule 1 and token '-' resolved as reduce.
	773	Conflict in state 7 between rule 1 and token '*' resolved as shift.
	774	Conflict in state 8 between rule 2 and token '+' resolved as reduce.
	775	Conflict in state 8 between rule 2 and token '-' resolved as reduce.
	776	Conflict in state 8 between rule 2 and token '*' resolved as shift.
	777	Conflict in state 9 between rule 3 and token '+' resolved as reduce.
	778	Conflict in state 9 between rule 3 and token '-' resolved as reduce.
	779	Conflict in state 9 between rule 3 and token '*' resolved as reduce.
	780	@end group
	781	@end example
	782
	783	The next section reports useless tokens, nonterminal and rules (note
	784	that useless tokens might be used by the scanner):
	785
	786	@example
	787	@group
	788	Useless nonterminals:
	789
	790	useless
	791
	792
	793	Terminals which are not used:
	794
	795	STR
	796
	797
	798	Useless rules:
	799
	800	#6 useless: STR;
	801	@end group
	802	@end example
	803
	804	The next section lists states that still have conflicts:
	805
	806	@example
	807	@group
	808	State 7 contains 1 shift/reduce conflict.
	809	State 8 contains 1 shift/reduce conflict.
	810	State 9 contains 1 shift/reduce conflict.
	811	State 10 contains 4 shift/reduce conflicts.
	812	@end group
	813	@end example
	814
	815	The next section reproduces the grammar used:
	816
	817	@example
	818	@group
	819	Grammar
	820
	821	Number, Rule
	822	1 exp -> exp '+' exp
	823	2 exp -> exp '-' exp
	824	3 exp -> exp '*' exp
	825	4 exp -> exp '/' exp
	826	5 exp -> NUM
	827	@end group
	828	@end example
	829
	830	And reports the uses of the symbols:
	831
	832	@example
	833	@group
	834	Terminals, with rules where they appear
	835
	836	$EOI (-1)
	837	error (1)
	838	NUM (2) 5
	839	STR (3) 6
	840	'+' (4) 1
	841	'-' (5) 2
	842	'*' (6) 3
	843	'/' (7) 4
	844
	845
	846	Nonterminals, with rules where they appear
	847
	848	exp (8)
	849	on left: 1 2 3 4 5, on right: 1 2 3 4
	850	@end group
	851	@end example
	852
	853	The report then details the automaton itself, describing each state
	854	with it set of @dfn{items}, also known as @dfn{pointed rules}. Each
	855	item is a production rule together with a point (marked by @samp{.})
	856	that the input cursor.
	857
	858	@example
	859	@group
	860	state 0
	861
	862	NUM shift, and go to state 1
	863
	864	exp go to state 2
	865	@end group
	866	@end example
	867
	868	State 0 corresponds to being at the very beginning of the parsing, in
	869	the initial rule, right before the start symbol (@samp{exp}). When
	870	the parser returns to this state right after having reduced a rule
	871	that produced an @samp{exp}, it jumps to state 2. If there is no such
	872	transition on a nonterminal symbol, and the lookahead is a @samp{NUM},
	873	then this token is shifted on the parse stack, and the control flow
	874	jumps to state 1. Any other lookahead triggers a parse error.
	875
	876	In the state 1...
	877
	878	@example
	879	@group
	880	state 1
	881
	882	exp -> NUM . (rule 5)
	883
	884	$default reduce using rule 5 (exp)
	885	@end group
	886	@end example
	887
	888	the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead
	889	(@samp{$default}), the parser will reduce it. If it was coming from
	890	state 0, then, after this reduction it will return to state 0, and
	891	will jump to state 2 (@samp{exp: go to state 2}).
	892
	893	@example
	894	@group
	895	state 2
	896
	897	exp -> exp . '+' exp (rule 1)
	898	exp -> exp . '-' exp (rule 2)
	899	exp -> exp . '*' exp (rule 3)
	900	exp -> exp . '/' exp (rule 4)
	901
	902	$EOI shift, and go to state 11
	903	'+' shift, and go to state 3
	904	'-' shift, and go to state 4
	905	'*' shift, and go to state 5
	906	'/' shift, and go to state 6
	907	@end group
	908	@end example
	909
	910	In state 2, the automaton can only shift a symbol. For instance,
	911	because of the item @samp{exp -> exp . '+' exp}, if the lookahead if
	912	@samp{+}, it will be shifted on the parse stack, and the automaton
	913	control will jump to state 3, corresponding to the item
	914	@samp{exp -> exp . '+' exp}:
	915
	916	@example
	917	@group
	918	state 3
	919
	920	exp -> exp '+' . exp (rule 1)
	921
	922	NUM shift, and go to state 1
	923
	924	exp go to state 7
	925	@end group
	926	@end example
	927
	928	Since there is no default action, any other token than those listed
	929	above will trigger a parse error.
	930
	931	The interpretation of states 4 to 6 is straightforward:
	932
	933	@example
	934	@group
	935	state 4
	936
	937	exp -> exp '-' . exp (rule 2)
	938
	939	NUM shift, and go to state 1
	940
	941	exp go to state 8
	942
	943
	944
	945	state 5
	946
	947	exp -> exp '*' . exp (rule 3)
	948
	949	NUM shift, and go to state 1
	950
	951	exp go to state 9
	952
	953
	954
	955	state 6
	956
	957	exp -> exp '/' . exp (rule 4)
	958
	959	NUM shift, and go to state 1
	960
	961	exp go to state 10
	962	@end group
	963	@end example
	964
	965	As was announced in beginning of the report, @samp{State 7 contains 1
	966	shift/reduce conflict.}:
	967
	968	@example
	969	@group
	970	state 7
	971
	972	exp -> exp . '+' exp (rule 1)
	973	exp -> exp '+' exp . (rule 1)
	974	exp -> exp . '-' exp (rule 2)
	975	exp -> exp . '*' exp (rule 3)
	976	exp -> exp . '/' exp (rule 4)
	977
	978	'*' shift, and go to state 5
	979	'/' shift, and go to state 6
	980
	981	'/' [reduce using rule 1 (exp)]
	982	$default reduce using rule 1 (exp)
	983	@end group
	984	@end example
	985
	986	Indeed, there are two actions associated to the lookahead @samp{/}:
	987	either shifting (and going to state 6), or reducing rule 1. The
	988	conflict means that either the grammar is ambiguous, or the parser
	989	lacks information to make the right decision. Indeed the grammar is
	990	ambiguous, as, since we did not specify the precedence of @samp{/},
	991	the sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM
	992	/ NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM +
	993	NUM) / NUM}, which corresponds to reducing rule 1.
	994
	995	Because in @acronym{LALR(1)} parsing a single decision can be made,
	996	Wisent arbitrarily chose to disable the reduction, see
	997	@ref{Conflicts}. Discarded actions are reported in between square
	998	brackets.
	999
	1000	Note that all the previous states had a single possible action: either
	1001	shifting the next token and going to the corresponding state, or
	1002	reducing a single rule. In the other cases, i.e., when shifting
	1003	@emph{and} reducing is possible or when @emph{several} reductions are
	1004	possible, the lookahead is required to select the action. State 7 is
	1005	one such state: if the lookahead is @samp{*} or @samp{/} then the
	1006	action is shifting, otherwise the action is reducing rule 1. In other
	1007	words, the first two items, corresponding to rule 1, are not eligible
	1008	when the lookahead is @samp{}, since we specified that @samp{} has
	1009	higher precedence that @samp{+}. More generally, some items are
	1010	eligible only with some set of possible lookaheads.
	1011
	1012	States 8 to 10 are similar:
	1013
	1014	@example
	1015	@group
	1016	state 8
	1017
	1018	exp -> exp . '+' exp (rule 1)
	1019	exp -> exp . '-' exp (rule 2)
	1020	exp -> exp '-' exp . (rule 2)
	1021	exp -> exp . '*' exp (rule 3)
	1022	exp -> exp . '/' exp (rule 4)
	1023
	1024	'*' shift, and go to state 5
	1025	'/' shift, and go to state 6
	1026
	1027	'/' [reduce using rule 2 (exp)]
	1028	$default reduce using rule 2 (exp)
	1029
	1030
	1031	state 9
	1032
	1033	exp -> exp . '+' exp (rule 1)
	1034	exp -> exp . '-' exp (rule 2)
	1035	exp -> exp . '*' exp (rule 3)
	1036	exp -> exp '*' exp . (rule 3)
	1037	exp -> exp . '/' exp (rule 4)
	1038
	1039	'/' shift, and go to state 6
	1040
	1041	'/' [reduce using rule 3 (exp)]
	1042	$default reduce using rule 3 (exp)
	1043
	1044
	1045	state 10
	1046
	1047	exp -> exp . '+' exp (rule 1)
	1048	exp -> exp . '-' exp (rule 2)
	1049	exp -> exp . '*' exp (rule 3)
	1050	exp -> exp . '/' exp (rule 4)
	1051	exp -> exp '/' exp . (rule 4)
	1052
	1053	'+' shift, and go to state 3
	1054	'-' shift, and go to state 4
	1055	'*' shift, and go to state 5
	1056	'/' shift, and go to state 6
	1057
	1058	'+' [reduce using rule 4 (exp)]
	1059	'-' [reduce using rule 4 (exp)]
	1060	'*' [reduce using rule 4 (exp)]
	1061	'/' [reduce using rule 4 (exp)]
	1062	$default reduce using rule 4 (exp)
	1063	@end group
	1064	@end example
	1065
	1066	Observe that state 10 contains conflicts due to the lack of precedence
	1067	of @samp{/} wrt @samp{+}, @samp{-}, and @samp{*}, but also because the
	1068	associativity of @samp{/} is not specified.
	1069
	1070	Finally, the state 11 (plus 12) is named the @dfn{final state}, or the
	1071	@dfn{accepting state}:
	1072
	1073	@example
	1074	@group
	1075	state 11
	1076
	1077	$EOI shift, and go to state 12
	1078
	1079
	1080
	1081	state 12
	1082
	1083	$default accept
	1084	@end group
	1085	@end example
	1086
	1087	The end of input is shifted @samp{$EOI shift,} and the parser exits
	1088	successfully (@samp{go to state 12}, that terminates).
	1089
	1090	@node Wisent Parsing
	1091	@chapter Wisent Parsing
	1092
	1093	@cindex bottom-up parser
	1094	@cindex shift-reduce parser
	1095	The Wisent's parser is what is called a @dfn{bottom-up} or
	1096	@dfn{shift-reduce} parser which repeatedly:
	1097
	1098	@table @dfn
	1099	@cindex shift
	1100	@item shift
	1101	That is pushes the value of the last lexical token read (the
	1102	look-ahead token) into a value stack, and reads a new one.
	1103
	1104	@cindex reduce
	1105	@item reduce
	1106	That is replaces a nonterminal by its semantic value. The values of
	1107	the components which form the right hand side of a rule are popped
	1108	from the value stack and reduced by the semantic action of this rule.
	1109	The result is pushed back on top of value stack.
	1110	@end table
	1111
	1112	The parser will stop on:
	1113
	1114	@table @dfn
	1115	@cindex accept
	1116	@item accept
	1117	When all input has been successfully parsed. The semantic value of
	1118	the start nonterminal is on top of the value stack.
	1119
	1120	@cindex syntax error
	1121	@item error
	1122	When a syntax error (an unexpected token in input) has been detected.
	1123	At this point the parser issues an error message and either stops or
	1124	calls a recovery routine to try to resume parsing.
	1125	@end table
	1126
	1127	@cindex table-driven parser
	1128	The above elementary actions are driven by the @acronym{LALR(1)}
	1129	automaton built by @code{wisent-compile-grammar} from a context-free
	1130	grammar.
	1131
	1132	The Wisent's parser is entered by calling the function:
	1133
	1134	@findex wisent-parse
	1135	@defun wisent-parse automaton lexer &optional error start
	1136	Parse input using the automaton specified in @var{automaton}.
	1137
	1138	@table @var
	1139	@item automaton
	1140	Is an @acronym{LALR(1)} automaton generated by
	1141	@code{wisent-compile-grammar} (@pxref{Wisent Grammar}).
	1142
	1143	@item lexer
	1144	Is a function with no argument called by the parser to obtain the next
	1145	terminal (token) in input (@pxref{Writing a lexer}).
	1146
	1147	@item error
	1148	Is an optional reporting function called when a parse error occurs.
	1149	It receives a message string to report. It defaults to the function
	1150	@code{wisent-message} (@pxref{Report errors}).
	1151
	1152	@item start
	1153	Specify the start symbol (nonterminal) used by the parser as its goal.
	1154	It defaults to the start symbol defined in the grammar
	1155	(@pxref{Wisent Grammar}).
	1156	@end table
	1157	@end defun
	1158
	1159	The following two normal hooks permit to do some useful processing
	1160	respectively before to start parsing, and after the parser terminated.
	1161
	1162	@vindex wisent-pre-parse-hook
	1163	@defvar wisent-pre-parse-hook
	1164	Normal hook run just before entering the @var{LR} parser engine.
	1165	@end defvar
	1166
	1167	@vindex wisent-post-parse-hook
	1168	@defvar wisent-post-parse-hook
	1169	Normal hook run just after the @var{LR} parser engine terminated.
	1170	@end defvar
	1171
	1172	@menu
	1173	* Writing a lexer::
	1174	* Actions goodies::
	1175	* Report errors::
	1176	* Error recovery::
	1177	* Debugging actions::
	1178	@end menu
	1179
	1180	@node Writing a lexer
	1181	@section What the parser must receive
	1182
	1183	It is important to understand that the parser does not parse
	1184	characters, but lexical tokens, and does not know anything about
	1185	characters in text streams!
	1186
	1187	@cindex lexical analysis
	1188	@cindex lexer
	1189	@cindex scanner
	1190	Reading input data to produce lexical tokens is performed by a lexer
	1191	(also called a scanner) in a lexical analysis step, before the syntax
	1192	analysis step performed by the parser. The parser automatically calls
	1193	the lexer when it needs the next token to parse.
	1194
	1195	@cindex lexical tokens
	1196	A Wisent's lexer is an Emacs Lisp function with no argument. It must
	1197	return a valid lexical token of the form:
	1198
	1199	@code{(@var{token-class value} [@var{start} . @var{end}])}
	1200
	1201	@table @var
	1202	@item token-class
	1203	Is a category of lexical token identifying a terminal as specified in
	1204	the grammar (@pxref{Wisent Grammar}). It can be a symbol or a character
	1205	literal.
	1206
	1207	@item value
	1208	Is the value of the lexical token. It can be of any valid Emacs Lisp
	1209	data type.
	1210
	1211	@item start
	1212	@itemx end
	1213	Are the optional beginning and ending positions of @var{value} in the
	1214	input stream.
	1215	@end table
	1216
	1217	When there are no more tokens to read the lexer must return the token
	1218	@code{(list wisent-eoi-term)} to each request.
	1219
	1220	@vindex wisent-eoi-term
	1221	@defvar wisent-eoi-term
	1222	Predefined constant, End-Of-Input terminal symbol.
	1223	@end defvar
	1224
	1225	@code{wisent-lex} is an example of a lexer that reads lexical tokens
	1226	produced by a @semantic{} lexer, and translates them into lexical tokens
	1227	suitable to the Wisent parser. See also @ref{Wisent Lex}.
	1228
	1229	To call the lexer in a semantic action use the function
	1230	@code{wisent-lexer}. See also @ref{Actions goodies}.
	1231
	1232	@node Actions goodies
	1233	@section Variables and macros useful in grammar actions.
	1234
	1235	@vindex wisent-input
	1236	@defvar wisent-input
	1237	The last token read.
	1238	This variable only has meaning in the scope of @code{wisent-parse}.
	1239	@end defvar
	1240
	1241	@findex wisent-lexer
	1242	@defun wisent-lexer
	1243	Obtain the next terminal in input.
	1244	@end defun
	1245
	1246	@findex wisent-region
	1247	@defun wisent-region &rest positions
	1248	Return the start/end positions of the region including
	1249	@var{positions}. Each element of @var{positions} is a pair
	1250	@w{@code{(@var{start-pos} . @var{end-pos})}} or @code{nil}. The
	1251	returned value is the pair @w{@code{(@var{min-start-pos} .
	1252	@var{max-end-pos})}} or @code{nil} if no @var{positions} are
	1253	available.
	1254	@end defun
	1255
	1256	@node Report errors
	1257	@section The error reporting function
	1258
	1259	@cindex error reporting
	1260	When the parser encounters a syntax error it calls a user-defined
	1261	function. It must be an Emacs Lisp function with one argument: a
	1262	string containing the message to report.
	1263
	1264	By default the parser uses this function to report error messages:
	1265
	1266	@findex wisent-message
	1267	@defun wisent-message string &rest args
	1268	Print a one-line message if @code{wisent-parse-verbose-flag} is set.
	1269	Pass @var{string} and @var{args} arguments to @dfn{message}.
	1270	@end defun
	1271
	1272	@table @strong
	1273	@item Please Note:
	1274	@code{wisent-message} uses the following function to print lexical
	1275	tokens:
	1276
	1277	@defun wisent-token-to-string token
	1278	Return a printed representation of lexical token @var{token}.
	1279	@end defun
	1280
	1281	The general printed form of a lexical token is:
	1282
	1283	@w{@code{@var{token}(@var{value})@@@var{location}}}
	1284	@end table
	1285
	1286	To control the verbosity of the parser you can set to non-@code{nil}
	1287	this variable:
	1288
	1289	@vindex wisent-parse-verbose-flag
	1290	@deffn Option wisent-parse-verbose-flag
	1291	non-@code{nil} means to issue more messages while parsing.
	1292	@end deffn
	1293
	1294	Or interactively use the command:
	1295
	1296	@findex wisent-parse-toggle-verbose-flag
	1297	@deffn Command wisent-parse-toggle-verbose-flag
	1298	Toggle whether to issue more messages while parsing.
	1299	@end deffn
	1300
	1301	When the error reporting function is entered the variable
	1302	@code{wisent-input} contains the unexpected token as returned by the
	1303	lexer.
	1304
	1305	The error reporting function can be called from a semantic action too
	1306	using the special macro @code{wisent-error}. When called from a
	1307	semantic action entered by error recovery (@pxref{Error recovery}) the
	1308	value of the variable @code{wisent-recovering} is non-@code{nil}.
	1309
	1310	@node Error recovery
	1311	@section Error recovery
	1312
	1313	@cindex error recovery
	1314	The error recovery mechanism of the Wisent's parser conforms to the
	1315	one Bison uses. See @ref{Error Recovery, , , bison}, in the Bison
	1316	manual for details.
	1317
	1318	@cindex error token
	1319	To recover from a syntax error you must write rules to recognize the
	1320	special token @code{error}. This is a terminal symbol that is
	1321	automatically defined and reserved for error handling.
	1322
	1323	When the parser encounters a syntax error, it pops the state stack
	1324	until it finds a state that allows shifting the @code{error} token.
	1325	After it has been shifted, if the old look-ahead token is not
	1326	acceptable to be shifted next, the parser reads tokens and discards
	1327	them until it finds a token which is acceptable.
	1328
	1329	@cindex error recovery strategy
	1330	Strategies for error recovery depend on the choice of error rules in
	1331	the grammar. A simple and useful strategy is simply to skip the rest
	1332	of the current statement if an error is detected:
	1333
	1334	@example
	1335	@group
	1336	(statement (( error ?; )) ;; on error, skip until ';' is read
	1337	)
	1338	@end group
	1339	@end example
	1340
	1341	It is also useful to recover to the matching close-delimiter of an
	1342	opening-delimiter that has already been parsed:
	1343
	1344	@example
	1345	@group
	1346	(primary (( ?@{ expr ?@} ))
	1347	(( ?@{ error ?@} ))
	1348	@dots{}
	1349	)
	1350	@end group
	1351	@end example
	1352
	1353	@cindex error recovery actions
	1354	Note that error recovery rules may have actions, just as any other
	1355	rules can. Here are some predefined hooks, variables, functions or
	1356	macros, useful in such actions:
	1357
	1358	@vindex wisent-nerrs
	1359	@defvar wisent-nerrs
	1360	The number of parse errors encountered so far.
	1361	@end defvar
	1362
	1363	@vindex wisent-recovering
	1364	@defvar wisent-recovering
	1365	non-@code{nil} means that the parser is recovering.
	1366	This variable only has meaning in the scope of @code{wisent-parse}.
	1367	@end defvar
	1368
	1369	@findex wisent-error
	1370	@defun wisent-error msg
	1371	Call the user supplied error reporting function with message
	1372	@var{msg} (@pxref{Report errors}).
	1373
	1374	For an example of use, @xref{wisent-skip-token}.
	1375	@end defun
	1376
	1377	@findex wisent-errok
	1378	@defun wisent-errok
	1379	Resume generating error messages immediately for subsequent syntax
	1380	errors.
	1381
	1382	The parser suppress error message for syntax errors that happens
	1383	shortly after the first, until three consecutive input tokens have
	1384	been successfully shifted.
	1385
	1386	Calling @code{wisent-errok} in an action, make error messages resume
	1387	immediately. No error messages will be suppressed if you call it in
	1388	an error rule's action.
	1389
	1390	For an example of use, @xref{wisent-skip-token}.
	1391	@end defun
	1392
	1393	@findex wisent-clearin
	1394	@defun wisent-clearin
	1395	Discard the current lookahead token.
	1396	This will cause a new lexical token to be read.
	1397
	1398	In an error rule's action the previous lookahead token is reanalyzed
	1399	immediately. @code{wisent-clearin} may be called to clear this token.
	1400
	1401	For example, suppose that on a parse error, an error handling routine
	1402	is called that advances the input stream to some point where parsing
	1403	should once again commence. The next symbol returned by the lexical
	1404	scanner is probably correct. The previous lookahead token ought to
	1405	be discarded with @code{wisent-clearin}.
	1406
	1407	For an example of use, @xref{wisent-skip-token}.
	1408	@end defun
	1409
	1410	@findex wisent-abort
	1411	@defun wisent-abort
	1412	Abort parsing and save the lookahead token.
	1413	@end defun
	1414
	1415	@findex wisent-set-region
	1416	@defun wisent-set-region start end
	1417	Change the region of text matched by the current nonterminal.
	1418	@var{start} and @var{end} are respectively the beginning and end
	1419	positions of the region occupied by the group of components associated
	1420	to this nonterminal. If @var{start} or @var{end} values are not a
	1421	valid positions the region is set to @code{nil}.
	1422
	1423	For an example of use, @xref{wisent-skip-token}.
	1424	@end defun
	1425
	1426	@vindex wisent-discarding-token-functions
	1427	@defvar wisent-discarding-token-functions
	1428	List of functions to be called when discarding a lexical token.
	1429	These functions receive the lexical token discarded.
	1430	When the parser encounters unexpected tokens, it can discards them,
	1431	based on what directed by error recovery rules. Either when the
	1432	parser reads tokens until one is found that can be shifted, or when an
	1433	semantic action calls the function @code{wisent-skip-token} or
	1434	@code{wisent-skip-block}.
	1435	For language specific hooks, make sure you define this as a local
	1436	hook.
	1437
	1438	For example, in @semantic{}, this hook is set to the function
	1439	@code{wisent-collect-unmatched-syntax} to collect unmatched lexical
	1440	tokens (@pxref{Useful functions}).
	1441	@end defvar
	1442
	1443	@findex wisent-skip-token
	1444	@defun wisent-skip-token
	1445	@anchor{wisent-skip-token}
	1446	Skip the lookahead token in order to resume parsing.
	1447	Return nil.
	1448	Must be used in error recovery semantic actions.
	1449
	1450	It typically looks like this:
	1451
	1452	@lisp
	1453	@group
	1454	(wisent-message "%s: skip %s" $action
	1455	(wisent-token-to-string wisent-input))
	1456	(run-hook-with-args
	1457	'wisent-discarding-token-functions wisent-input)
	1458	(wisent-clearin)
	1459	(wisent-errok)))
	1460	@end group
	1461	@end lisp
	1462	@end defun
	1463
	1464	@findex wisent-skip-block
	1465	@defun wisent-skip-block
	1466	Safely skip a block in order to resume parsing.
	1467	Return nil.
	1468	Must be used in error recovery semantic actions.
	1469
	1470	A block is data between an open-delimiter (syntax class @code{(}) and
	1471	a matching close-delimiter (syntax class @code{)}):
	1472
	1473	@example
	1474	@group
	1475	(a parenthesized block)
	1476	[a block between brackets]
	1477	@{a block between braces@}
	1478	@end group
	1479	@end example
	1480
	1481	The following example uses @code{wisent-skip-block} to safely skip a
	1482	block delimited by @samp{LBRACE} (@code{@{}) and @samp{RBRACE}
	1483	(@code{@}}) tokens, when a syntax error occurs in
	1484	@samp{other-components}:
	1485
	1486	@example
	1487	@group
	1488	(block ((LBRACE other-components RBRACE))
	1489	((LBRACE RBRACE))
	1490	((LBRACE error)
	1491	(wisent-skip-block))
	1492	)
	1493	@end group
	1494	@end example
	1495	@end defun
	1496
	1497	@node Debugging actions
	1498	@section Debugging semantic actions
	1499
	1500	@cindex semantic action symbols
	1501	Each semantic action is represented by a symbol interned in an
	1502	@dfn{obarray} that is part of the @acronym{LALR(1)} automaton
	1503	(@pxref{Compiling a grammar}). @code{symbol-function} on a semantic
	1504	action symbol return the semantic action lambda expression.
	1505
	1506	A semantic action symbol name has the form
	1507	@code{@var{nonterminal}:@var{index}}, where @var{nonterminal} is the
	1508	name of the nonterminal symbol the action belongs to, and @var{index}
	1509	is an action sequence number within the scope of @var{nonterminal}.
	1510	For example, this nonterminal definition:
	1511
	1512	@example
	1513	@group
	1514	input:
	1515	line [@code{input:0}]
	1516	\| input line
	1517	(format "%s %s" $1 $2) [@code{input:1}]
	1518	;
	1519	@end group
	1520	@end example
	1521
	1522	Will produce two semantic actions, and associated symbols:
	1523
	1524	@table @code
	1525	@item input:0
	1526	A default action that returns @code{$1}.
	1527
	1528	@item input:1
	1529	That returns @code{(format "%s %s" $1 $2)}.
	1530	@end table
	1531
	1532	@cindex debugging semantic actions
	1533	Debugging uses the Lisp debugger to investigate what is happening
	1534	during execution of semantic actions.
	1535	Three commands are available to debug semantic actions. They receive
	1536	two arguments:
	1537
	1538	@itemize @bullet
	1539	@item The automaton that contains the semantic action.
	1540
	1541	@item The semantic action symbol.
	1542	@end itemize
	1543
	1544	@findex wisent-debug-on-entry
	1545	@deffn Command wisent-debug-on-entry automaton function
	1546	Request @var{automaton}'s @var{function} to invoke debugger each time it is called.
	1547	@var{function} must be a semantic action symbol that exists in @var{automaton}.
	1548	@end deffn
	1549
	1550	@findex wisent-cancel-debug-on-entry
	1551	@deffn Command wisent-cancel-debug-on-entry automaton function
	1552	Undo effect of @code{wisent-debug-on-entry} on @var{automaton}'s @var{function}.
	1553	@var{function} must be a semantic action symbol that exists in @var{automaton}.
	1554	@end deffn
	1555
	1556	@findex wisent-debug-show-entry
	1557	@deffn Command wisent-debug-show-entry automaton function
	1558	Show the source of @var{automaton}'s semantic action @var{function}.
	1559	@var{function} must be a semantic action symbol that exists in @var{automaton}.
	1560	@end deffn
	1561
	1562	@node Wisent Semantic
	1563	@chapter How to use Wisent with Semantic
	1564
	1565	@cindex tags
	1566	This section presents how the Wisent's parser can be used to produce
	1567	@dfn{tags} for the @semantic{} tool set.
	1568
	1569	@semantic{} tags form a hierarchy of Emacs Lisp data structures that
	1570	describes a program in a way independent of programming languages.
	1571	Tags map program declarations, like functions, methods, variables,
	1572	data types, classes, includes, grammar rules, etc..
	1573
	1574	@cindex WY grammar format
	1575	To use the Wisent parser with @semantic{} you have to define
	1576	your grammar in @dfn{WY} form, a grammar format very close
	1577	to the one used by Bison.
	1578
	1579	Please @inforef{top, Semantic Grammar Framework Manual, grammar-fw}
	1580	for more information on @semantic{} grammars.
	1581
	1582	@menu
	1583	* Grammar styles::
	1584	* Wisent Lex::
	1585	@end menu
	1586
	1587	@node Grammar styles
	1588	@section Grammar styles
	1589
	1590	@cindex grammar styles
	1591	@semantic{} parsing heavily depends on how you wrote the grammar.
	1592	There are mainly two styles to write a Wisent's grammar intended to be
	1593	used with the @semantic{} tool set: the @dfn{Iterative style} and the
	1594	@dfn{Bison style}. Each one has pros and cons, and in certain cases
	1595	it can be worth a mix of the two styles!
	1596
	1597	@menu
	1598	* Iterative style::
	1599	* Bison style::
	1600	* Mixed style::
	1601	* Start nonterminals::
	1602	* Useful functions::
	1603	@end menu
	1604
	1605	@node Iterative style
	1606	@subsection Iterative style
	1607
	1608	@cindex grammar iterative style
	1609	The @dfn{iterative style} is the preferred style to use with @semantic{}.
	1610	It relies on an iterative parser back-end mechanism which parses start
	1611	nonterminals one at a time and automagically skips unexpected lexical
	1612	tokens in input.
	1613
	1614	Compared to rule-based iterative functions (@pxref{Bison style}),
	1615	iterative parsers are better in that they can handle obscure errors
	1616	more cleanly.
	1617
	1618	@cindex raw tag
	1619	Each start nonterminal must produces a @dfn{raw tag} by calling a
	1620	@code{TAG}-like grammar macro with appropriate parameters. See also
	1621	@ref{Start nonterminals}.
	1622
	1623	@cindex expanded tag
	1624	Then, each parsing iteration automatically translates a raw tag into
	1625	@dfn{expanded tags}, updating the raw tag structure with internal
	1626	properties and buffer related data.
	1627
	1628	After parsing completes, it results in a tree of expanded tags.
	1629
	1630	The following example is a snippet of the iterative style Java grammar
	1631	provided in the @semantic{} distribution in the file
	1632	@file{semantic/wisent/java-tags.wy}.
	1633
	1634	@example
	1635	@group
	1636	@dots{}
	1637	;; Alternate entry points
	1638	;; - Needed by partial re-parse
	1639	%start formal_parameter
	1640	@dots{}
	1641	;; - Needed by EXPANDFULL clauses
	1642	%start formal_parameters
	1643	@dots{}
	1644
	1645	formal_parameter_list
	1646	: PAREN_BLOCK
	1647	(EXPANDFULL $1 formal_parameters)
	1648	;
	1649
	1650	formal_parameters
	1651	: LPAREN
	1652	()
	1653	\| RPAREN
	1654	()
	1655	\| formal_parameter COMMA
	1656	\| formal_parameter RPAREN
	1657	;
	1658
	1659	formal_parameter
	1660	: formal_parameter_modifier_opt type variable_declarator_id
	1661	(VARIABLE-TAG $3 $2 nil :typemodifiers $1)
	1662	;
	1663	@end group
	1664	@end example
	1665
	1666	@findex EXPANDFULL
	1667	It shows the use of the @code{EXPANDFULL} grammar macro to parse a
	1668	@samp{PAREN_BLOCK} which contains a @samp{formal_parameter_list}.
	1669	@code{EXPANDFULL} tells to recursively parse @samp{formal_parameters}
	1670	inside @samp{PAREN_BLOCK}. The parser iterates until it digested all
	1671	available input data inside the @samp{PAREN_BLOCK}, trying to match
	1672	any of the @samp{formal_parameters} rules:
	1673
	1674	@itemize
	1675	@item @samp{LPAREN}
	1676
	1677	@item @samp{RPAREN}
	1678
	1679	@item @samp{formal_parameter COMMA}
	1680
	1681	@item @samp{formal_parameter RPAREN}
	1682	@end itemize
	1683
	1684	At each iteration it will return a @samp{formal_parameter} raw tag,
	1685	or @code{nil} to skip unwanted (single @samp{LPAREN} or @samp{RPAREN}
	1686	for example) or unexpected input data. Those raw tags will be
	1687	automatically expanded by the iterative back-end parser.
	1688
	1689	@node Bison style
	1690	@subsection Bison style
	1691
	1692	@cindex grammar bison style
	1693	What we call the @dfn{Bison style} is the traditional style of Bison's
	1694	grammars. Compared to iterative style, it is not straightforward to
	1695	use grammars written in Bison style in @semantic{}. Mainly because such
	1696	grammars are designed to parse the whole input data in one pass, and
	1697	don't use the iterative parser back-end mechanism (@pxref{Iterative
	1698	style}). With Bison style the parser is called once to parse the
	1699	grammar start nonterminal.
	1700
	1701	The following example is a snippet of the Bison style Java grammar
	1702	provided in the @semantic{} distribution in the file
	1703	@file{semantic/wisent/java.wy}.
	1704
	1705	@example
	1706	@group
	1707	%start formal_parameter
	1708	@dots{}
	1709
	1710	formal_parameter_list
	1711	: formal_parameter_list COMMA formal_parameter
	1712	(cons $3 $1)
	1713	\| formal_parameter
	1714	(list $1)
	1715	;
	1716
	1717	formal_parameter
	1718	: formal_parameter_modifier_opt type variable_declarator_id
	1719	(EXPANDTAG
	1720	(VARIABLE-TAG $3 $2 :typemodifiers $1)
	1721	)
	1722	;
	1723	@end group
	1724	@end example
	1725
	1726	The first consequence is that syntax errors are not automatically
	1727	handled by @semantic{}. Thus, it is necessary to explicitly handle
	1728	them at the grammar level, providing error recovery rules to skip
	1729	unexpected input data.
	1730
	1731	The second consequence is that the iterative parser can't do automatic
	1732	tag expansion, except for the start nonterminal value. It is
	1733	necessary to explicitly expand tags from concerned semantic actions by
	1734	calling the grammar macro @code{EXPANDTAG} with a raw tag as
	1735	parameter. See also @ref{Start nonterminals}, for incremental
	1736	re-parse considerations.
	1737
	1738	@node Mixed style
	1739	@subsection Mixed style
	1740
	1741	@cindex grammar mixed style
	1742	@example
	1743	@group
	1744	%start grammar
	1745	;; Reparse
	1746	%start prologue epilogue declaration nonterminal rule
	1747	@dots{}
	1748
	1749	%%
	1750
	1751	grammar:
	1752	prologue
	1753	\| epilogue
	1754	\| declaration
	1755	\| nonterminal
	1756	\| PERCENT_PERCENT
	1757	;
	1758	@dots{}
	1759
	1760	nonterminal:
	1761	SYMBOL COLON rules SEMI
	1762	(TAG $1 'nonterminal :children $3)
	1763	;
	1764
	1765	rules:
	1766	lifo_rules
	1767	(apply 'nconc (nreverse $1))
	1768	;
	1769
	1770	lifo_rules:
	1771	lifo_rules OR rule
	1772	(cons $3 $1)
	1773	\| rule
	1774	(list $1)
	1775	;
	1776
	1777	rule:
	1778	rhs
	1779	(let* ((rhs $1)
	1780	name type comps prec action elt)
	1781	@dots{}
	1782	(EXPANDTAG
	1783	(TAG name 'rule :type type :value comps :prec prec :expr action)
	1784	))
	1785	;
	1786	@end group
	1787	@end example
	1788
	1789	This example shows how iterative and Bison styles can be combined in
	1790	the same grammar to obtain a good compromise between grammar
	1791	complexity and an efficient parsing strategy in an interactive
	1792	environment.
	1793
	1794	@samp{nonterminal} is parsed using iterative style via the main
	1795	@samp{grammar} rule. The semantic action uses the @code{TAG} macro to
	1796	produce a raw tag, automagically expanded by @semantic{}.
	1797
	1798	But @samp{rules} part is parsed in Bison style! Why?
	1799
	1800	Rule delimiters are the colon (@code{:}), that follows the nonterminal
	1801	name, and a final semicolon (@code{;}). Unfortunately these
	1802	delimiters are not @code{open-paren}/@code{close-paren} type, and the
	1803	Emacs' syntactic analyzer can't easily isolate data between them to
	1804	produce a @samp{RULES_PART} parenthesis-block-like lexical token.
	1805	Consequently it is not possible to use @code{EXPANDFULL} to iterate in
	1806	@samp{RULES_PART}, like this:
	1807
	1808	@example
	1809	@group
	1810	nonterminal:
	1811	SYMBOL COLON rules SEMI
	1812	(TAG $1 'nonterminal :children $3)
	1813	;
	1814
	1815	rules:
	1816	RULES_PART ;; @strong{Map a parenthesis-block-like lexical token}
	1817	(EXPANDFULL $1 'rules)
	1818	;
	1819
	1820	rules:
	1821	COLON
	1822	()
	1823	OR
	1824	()
	1825	SEMI
	1826	()
	1827	rhs
	1828	rhs
	1829	(let* ((rhs $1)
	1830	name type comps prec action elt)
	1831	@dots{}
	1832	(TAG name 'rule :type type :value comps :prec prec :expr action)
	1833	)
	1834	;
	1835	@end group
	1836	@end example
	1837
	1838	In such cases, when it is difficult for Emacs to obtain
	1839	parenthesis-block-like lexical tokens, the best solution is to use the
	1840	traditional Bison style with error recovery!
	1841
	1842	In some extreme cases, it can also be convenient to extend the lexer,
	1843	to deliver new lexical tokens, to simplify the grammar.
	1844
	1845	@node Start nonterminals
	1846	@subsection Start nonterminals
	1847
	1848	@cindex start nonterminals
	1849	@cindex @code{reparse-symbol} property
	1850	When you write a grammar for @semantic{}, it is important to carefully
	1851	indicate the start nonterminals. Each one defines an entry point in
	1852	the grammar, and after parsing its semantic value is returned to the
	1853	back-end iterative engine. Consequently:
	1854
	1855	@strong{The semantic value of a start nonterminal must be a produced
	1856	by a TAG like grammar macro}.
	1857
	1858	Start nonterminals are declared by @code{%start} statements. When
	1859	nothing is specified the first nonterminal that appears in the grammar
	1860	is the start nonterminal.
	1861
	1862	Generally, the following nonterminals must be declared as start
	1863	symbols:
	1864
	1865	@itemize @bullet
	1866	@item The main grammar entry point
	1867	@quotation
	1868	Of course!
	1869	@end quotation
	1870
	1871	@item nonterminals passed to @code{EXPAND}/@code{EXPANDFULL}
	1872	@quotation
	1873	These grammar macros recursively parse a part of input data, based on
	1874	rules of the given nonterminal.
	1875
	1876	For example, the following will parse @samp{PAREN_BLOCK} data using
	1877	the @samp{formal_parameters} rules:
	1878
	1879	@example
	1880	@group
	1881	formal_parameter_list
	1882	: PAREN_BLOCK
	1883	(EXPANDFULL $1 formal_parameters)
	1884	;
	1885	@end group
	1886	@end example
	1887
	1888	The semantic value of @samp{formal_parameters} becomes the value of
	1889	the @code{EXPANDFULL} expression. It is a list of @semantic{} tags
	1890	spliced in the tags tree.
	1891
	1892	Because the automaton must know that @samp{formal_parameters} is a
	1893	start symbol, you must declare it like this:
	1894
	1895	@example
	1896	@group
	1897	%start formal_parameters
	1898	@end group
	1899	@end example
	1900	@end quotation
	1901	@end itemize
	1902
	1903	@cindex incremental re-parse
	1904	@cindex reparse-symbol
	1905	The @code{EXPANDFULL} macro has a side effect it is important to know,
	1906	related to the incremental re-parse mechanism of @semantic{}: the
	1907	nonterminal symbol parameter passed to @code{EXPANDFULL} also becomes
	1908	the @code{reparse-symbol} property of the tag returned by the
	1909	@code{EXPANDFULL} expression.
	1910
	1911	When buffer's data mapped by a tag is modified, @semantic{}
	1912	schedules an incremental re-parse of that data, using the tag's
	1913	@code{reparse-symbol} property as start nonterminal.
	1914
	1915	@strong{The rules associated to such start symbols must be carefully
	1916	reviewed to ensure that the incremental parser will work!}
	1917
	1918	Things are a little bit different when the grammar is written in Bison
	1919	style.
	1920
	1921	@strong{The @code{reparse-symbol} property is set to the nonterminal
	1922	symbol the rule that explicitly uses @code{EXPANDTAG} belongs to.}
	1923
	1924	For example:
	1925
	1926	@example
	1927	@group
	1928	rule:
	1929	rhs
	1930	(let* ((rhs $1)
	1931	name type comps prec action elt)
	1932	@dots{}
	1933	(EXPANDTAG
	1934	(TAG name 'rule :type type :value comps :prec prec :expr action)
	1935	))
	1936	;
	1937	@end group
	1938	@end example
	1939
	1940	Set the @code{reparse-symbol} property of the expanded tag to
	1941	@samp{rule}. A important consequence is that:
	1942
	1943	@strong{Every nonterminal having any rule that calls @code{EXPANDTAG}
	1944	in a semantic action, should be declared as a start symbol!}
	1945
	1946	@node Useful functions
	1947	@subsection Useful functions
	1948
	1949	Here is a description of some predefined functions it might be useful
	1950	to know when writing new code to use Wisent in @semantic{}:
	1951
	1952	@findex wisent-collect-unmatched-syntax
	1953	@defun wisent-collect-unmatched-syntax input
	1954	Add @var{input} lexical token to the cache of unmatched tokens, in
	1955	variable @code{semantic-unmatched-syntax-cache}.
	1956
	1957	See implementation of the function @code{wisent-skip-token} in
	1958	@ref{Error recovery}, for an example of use.
	1959	@end defun
	1960
	1961	@node Wisent Lex
	1962	@section The Wisent Lex lexer
	1963
	1964	@findex semantic-lex
	1965	The lexical analysis step of @semantic{} is performed by the general
	1966	function @code{semantic-lex}. For more information, @inforef{Writing
	1967	Lexers, ,semantic-langdev}.
	1968
	1969	@code{semantic-lex} produces lexical tokens of the form:
	1970
	1971	@example
	1972	@group
	1973	@code{(@var{token-class start} . @var{end})}
	1974	@end group
	1975	@end example
	1976
	1977	@table @var
	1978	@item token-class
	1979	Is a symbol that identifies a lexical token class, like @code{symbol},
	1980	@code{string}, @code{number}, or @code{PAREN_BLOCK}.
	1981
	1982	@item start
	1983	@itemx end
	1984	Are the start and end positions of mapped data in the input buffer.
	1985	@end table
	1986
	1987	The Wisent's parser doesn't depend on the nature of analyzed input
	1988	stream (buffer, string, etc.), and requires that lexical tokens have a
	1989	different form (@pxref{Writing a lexer}):
	1990
	1991	@example
	1992	@group
	1993	@code{(@var{token-class value} [@var{start} . @var{end}])}
	1994	@end group
	1995	@end example
	1996
	1997	@cindex lexical token mapping
	1998	@code{wisent-lex} is the default Wisent's lexer used in @semantic{}.
	1999
	2000	@vindex wisent-lex-istream
	2001	@findex wisent-lex
	2002	@defun wisent-lex
	2003	Return the next available lexical token in Wisent's form.
	2004
	2005	The variable @code{wisent-lex-istream} contains the list of lexical
	2006	tokens produced by @code{semantic-lex}. Pop the next token available
	2007	and convert it to a form suitable for the Wisent's parser.
	2008	@end defun
	2009
	2010	Mapping of lexical tokens as produced by @code{semantic-lex} into
	2011	equivalent Wisent lexical tokens is straightforward:
	2012
	2013	@example
	2014	@group
	2015	(@var{token-class start} . @var{end})
	2016	@result{} (@var{token-class value start} . @var{end})
	2017	@end group
	2018	@end example
	2019
	2020	@var{value} is the input @code{buffer-substring} from @var{start} to
	2021	@var{end}.
	2022
	2023	@node GNU Free Documentation License
	2024	@appendix GNU Free Documentation License
	2025
	2026	@include doclicense.texi
	2027
	2028	@node Index
	2029	@unnumbered Index
	2030	@printindex cp
	2031
	2032	@iftex
	2033	@contents
	2034	@summarycontents
	2035	@end iftex
	2036
	2037	@bye
	2038
	2039	@c Following comments are for the benefit of ispell.
	2040
	2041	@c LocalWords: Wisent automagically wisent Wisent's LALR obarray