\documentstyle{article}
\title{ ML-Yacc User's Manual \\
Version 2.4
}
\author{ David R. Tarditi$^1$\\
Andrew W. Appel$^2$\\
\\
$^1$Microsoft Research \\
\\
$^2$Department of Computer Science \\
Princeton University \\
Princeton, NJ 08544
}
\date{April 24, 2000}

\begin{document}
\maketitle
\begin{center}
(c) 1989, 1990, 1991, 1994 Andrew W. Appel, David R. Tarditi
\end{center}

{\bf
This software comes with ABSOLUTELY NO WARRANTY. It is subject only to
the terms of the ML-Yacc NOTICE, LICENSE, and DISCLAIMER (in the
file COPYRIGHT distributed with this software).
}

New in this version: an improved error-correction directive \verb|%change|
that allows multi-token insertions, deletions, and substitutions.
The explanation of how to build a parser (Section 5) and the Calc example
(Section 7) have been revised for SML/NJ Version 110 and the use of CM.

\newpage
\tableofcontents
\newpage
\section{Introduction}
\subsection{General}
ML-Yacc is a parser generator for Standard ML modeled after the
Yacc parser generator. Like Yacc, it generates parsers for LALR
languages and has a similar syntax. The generated parsers, however,
use a different algorithm for recovering from syntax errors than
parsers generated by Yacc.
The algorithm is a partial implementation of an algorithm described in \cite{bf}.
A parser tries to recover from a syntax error
by making a single token insertion, deletion, or
substitution near the point in the input stream at which the error
was detected. The parsers delay the evaluation of semantic actions until
parses are completed successfully. This makes it possible for
parsers to recover from syntax errors that occur before the point
of error detection, but it does prevent the parsers from
affecting lexers in any significant way. The parsers
can insert tokens with values and substitute tokens with values
for other tokens. All symbols carry left and right position values,
which are available to semantic actions and are used in
syntactic error messages.

ML-Yacc uses context-free grammars to specify the syntax of languages to
be parsed. See \cite{ahu} for definitions and information on context-free
grammars and LR parsing. We briefly review some terminology here. A
context-free grammar is defined by a set of terminals $T$, a set of
nonterminals $NT$, a set of productions $P$, and a start
nonterminal $S$.
Terminals are interchangeably referred to as tokens. The terminal
and nonterminal sets are assumed to be disjoint. The set of symbols is the
union of the nonterminal and terminal sets. We use lower case
Greek letters to denote a string of symbols. We use upper case
Roman letters near the beginning of the alphabet to denote nonterminals.
Each production gives a
derivation of a string of symbols from a nonterminal, which we
write as $A \rightarrow \alpha$. We define a relation between strings of
symbols $\alpha$ and $\beta$, written $\alpha \vdash \beta$ and read
as ``$\alpha$ derives $\beta$'', if and only if $\alpha = \delta A \gamma$,
$\beta = \delta \phi \gamma$, and
there exists some production $A \rightarrow \phi$. We write the
transitive closure of this relation as
$\vdash_*$. We say that a string of terminals $\alpha$ is a valid sentence
of the language, {\em i.e.} it is derivable, if
$S \vdash_* \alpha$, where $S$ is the start symbol. The sequence of
derivations is often visualized as a parse tree.
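As a small illustration of these definitions (the grammar here is invented
for exposition and is not part of ML-Yacc), consider a grammar with
terminals $T = \{{\tt a}, {\tt +}\}$, a single nonterminal and start
symbol $E$, and productions $E \rightarrow E \; {\tt +} \; E$ and
$E \rightarrow {\tt a}$. Then
\[
E \vdash E \; {\tt +} \; E \vdash {\tt a} \; {\tt +} \; E
  \vdash {\tt a} \; {\tt +} \; {\tt a},
\]
so $E \vdash_* {\tt a} \; {\tt +} \; {\tt a}$ and the string
{\tt a + a} is a valid sentence of the language.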

ML-Yacc uses an attribute grammar scheme with synthesized attributes.
Each symbol in the grammar may have a value (i.e. attribute) associated
with it. Each production has a semantic action associated with it.
A production with a semantic action is called a rule.
Parsers perform bottom-up, left-to-right evaluations of parse trees using semantic
actions to compute values as they do so. Given a production
$P = A \rightarrow \alpha$, the corresponding semantic action is
used to compute a value for $A$ from the values of the symbols in $\alpha$.
If $A$ has no value, the semantic action is still evaluated but the value is ignored.
Each parse returns the value associated with the start symbol $S$ of the
grammar. A parse returns a nullary value if the start symbol does not carry a value.
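For example, in the rules section of a specification (the symbol names
here are illustrative), a rule might synthesize the value of an
arithmetic expression from the values of its subcomponents:
\begin{quote}
\begin{verbatim}
exp : exp PLUS exp      (exp1 + exp2)
\end{verbatim}
\end{quote}
The semantic action {\tt (exp1 + exp2)} computes the value of the
left-hand {\tt exp} from the values of the two right-hand occurrences;
the naming convention for these values is described in the Rules
section below.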

The synthesized attribute scheme can be adapted easily to inherited
attributes. An inherited attribute is a value which propagates from
a nonterminal to the symbols produced by the nonterminal according to
some rule. Since functions are values in ML,
the semantic actions for the derived symbols
can return functions which take the
inherited value as an argument.

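A minimal sketch of this technique, assuming a hypothetical {\tt decl}
rule and a helper function {\tt bind} (neither is part of ML-Yacc):
\begin{quote}
\begin{verbatim}
decl : ID EQUALS exp    (fn env => bind (env, ID, exp env))
\end{verbatim}
\end{quote}
Here the value of {\tt exp} is itself a function awaiting the inherited
environment {\tt env}, and the value of {\tt decl} is a function from
environments to environments.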
\subsection{Modules}
ML-Yacc uses the ML modules facility to specify the interface between
a parser that it generates and a lexical analyzer that must be supplied
by you. It also uses the ML modules facility to factor out
a set of modules that are common to every generated parser.
These common modules include a parsing structure, which contains
an error-correcting LR parser\footnote{A plain LR parser is also
available.}, an LR table structure, and a structure
which defines the representation of terminals. ML-Yacc produces
a functor for a particular parser parameterized by the LR table
structure and the representation of terminals. This functor
contains values specific to the parser, such as the
LR table for the parser\footnote{The LR table is a value. The
LR table structure defines an abstract LR table type.}, the
semantic actions for the parser, and a structure containing
the terminals for the parser. ML-Yacc produces a signature
for the structure produced by applying this functor
and another signature for the structure containing the terminals for
the parser. You must
supply a functor for the lexing module parameterized by this
structure.

Figure 1 is a dependency diagram of the modules that summarizes this
information. A module at the head of an arrow depends
on the module at the tail.

\begin{figure}
\begin{tabular}{|rcl|}
\hline
parsing structure & $\longrightarrow$ & values for a particular parser\\
values for a particular parser & $\longrightarrow$ & lexical analyzer\\
parsing structure, & $\longrightarrow$ & particular parser\\
values for a particular parser, & & \\
lexical analyzer & & \\
\hline
\end{tabular}
\caption{Module Dependencies}
\end{figure}
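As a sketch of how these modules fit together in practice (the structure
and functor names follow the conventions of the Calc example distributed
with ML-Yacc; the details are covered in later sections), the wiring for
a parser named {\tt Calc} looks roughly like:
\begin{quote}
\begin{verbatim}
(* apply the functor produced by ML-Yacc
   to the Token structure *)
structure CalcLrVals =
  CalcLrValsFun(structure Token = LrParser.Token)

(* apply your lexer functor to the Tokens structure *)
structure CalcLex =
  CalcLexFun(structure Tokens = CalcLrVals.Tokens)

(* join the pieces into a complete parser *)
structure CalcParser =
  Join(structure LrParser = LrParser
       structure ParserData = CalcLrVals.ParserData
       structure Lex = CalcLex)
\end{verbatim}
\end{quote}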

\subsection{Error Recovery}

The error recovery algorithm is able to accurately recover from many
single-token syntax errors. It tries to make a single-token
correction at the token in the input stream at which the syntax error
was detected and at any of the 15 tokens\footnote{An arbitrary number
chosen because numbers above this do not seem to improve error
correction much.} before that token. The algorithm checks corrections
before the point of error detection because a syntax error is often
not detected until several tokens beyond the token which caused the
error.\footnote{An LR parser detects a syntax error as soon as
possible, but this does not necessarily mean that the token at which
the error was detected caused the error.}

The algorithm works by trying corrections at each
of the 16 tokens up to and including the token at which the
error was detected. At each token in the input stream, it
will try deleting the token, substituting other tokens for the
token, or inserting some other token before the token.

The algorithm uses a parse check to evaluate corrections. A parse
check determines how far a correction allows a parser to
parse without encountering a syntax error.
You pass the parser an upper bound on how many tokens it
may read beyond the error point while doing a parse check.
This allows
you to control the amount of lookahead that a parser reads
for different kinds of systems. For an interactive system, you
should set the lookahead to zero; otherwise, a parser may hang
waiting for input in the case of a syntax error. If the lookahead
is zero, no syntax errors will be corrected. For a batch system,
you should set the lookahead to 15.

The algorithm selects the set of corrections which allows the parse
to proceed the farthest
and parse through at least the error token. It then removes those
corrections involving keywords which do not meet a longer minimum
parse check. If more than one correction is still possible,
it uses a simple heuristic priority scheme to order the corrections
and then arbitrarily chooses one of the corrections with the highest priority.
You have some control over the priority scheme: you may
name a set of preferred insertions and a set of preferred substitutions.
The priorities for corrections, ordered from highest to lowest
priority, are
preferred insertions, preferred substitutions, insertions, deletions,
and substitutions.

The error recovery algorithm is guaranteed to terminate since it always
selects fixes which parse through the
error token.

The error-correcting LR parser implements the algorithm by keeping
a queue of its state stacks before shifting tokens and by using
a lazy stream for the lexer.
This makes it possible to restart the
parse from before an error point and try various corrections. The
error-correcting LR parser does not defer semantic actions. Instead,
ML-Yacc creates semantic actions which are free of side-effects
and always terminate.
ML-Yacc uses higher-order functions to defer the
evaluation of all user semantic actions until the parse is successfully
completed, without constructing an explicit parse tree.
You may declare whether your semantic actions are free of side-effects
and always terminate, in which case ML-Yacc does not need to defer
the evaluation of your semantic actions.

\subsection{Precedence}
ML-Yacc uses the same precedence scheme as Yacc for resolving
shift/reduce conflicts. Each terminal may be assigned a precedence and
associativity. Each rule is then assigned the precedence of its rightmost
terminal. If a shift/reduce conflict occurs, the conflict is resolved
silently if the terminal and the rule in the conflict both have
precedences.
If the terminal has the higher precedence, the shift is chosen. If
the rule has the higher precedence, the reduction is chosen. If both
the terminal and the rule have the same precedence, then the associativity
of the terminal is used to resolve the conflict. If the terminal is
left associative, the reduction is chosen. If the terminal is
right associative, the shift is chosen. Terminals may also be declared
nonassociative, in which case an error message is produced
if the associativity is needed to resolve the parsing conflict.

If a terminal or a rule in a shift/reduce conflict does not have
a precedence, then an error message is produced and the shift
is chosen.

In reduce/reduce conflicts, an error message is always produced and
the first rule listed in the specification is chosen for reduction.
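For example, using illustrative terminal names and the {\tt \%left}
declaration described later, the declarations
\begin{quote}
\begin{verbatim}
%left PLUS MINUS
%left TIMES DIV
\end{verbatim}
\end{quote}
give {\tt TIMES} and {\tt DIV} higher precedence than {\tt PLUS} and
{\tt MINUS}, since precedence declarations listed later bind more
tightly. A shift/reduce conflict between reducing {\tt exp PLUS exp}
and shifting {\tt TIMES} is then resolved as a shift, while a conflict
between reducing {\tt exp PLUS exp} and shifting another {\tt PLUS} is
resolved as a reduction because {\tt PLUS} is left associative.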
\subsection{Notation}

Text surrounded by braces denotes meta-notation. If you see
something like \{parser name\}, you should substitute the actual
name of your parser for the meta-notation. Text in a
typewriter font ({\tt like this}) denotes text in a specification
or ML code.

\section{ML-Yacc specifications}

An ML-Yacc specification consists of three parts, each of which is
separated from the others by a {\tt \%\%} delimiter. The general format is:
\begin{quote}
\tt
\{user declarations\} \\
\%\% \\
\{ML-Yacc declarations\} \\
\%\% \\
\{rules\}
\end{quote}

You can define values available in the semantic
actions of the rules in the user declarations section.
It is recommended that you keep the size of this
section as small as possible and place large
blocks of code in other modules.

The ML-Yacc declarations section is used to make a set
of required declarations and a set of optional declarations.
You must declare the nonterminals and terminals and the
types of the values associated with them there. You must
also name the parser and declare the type of position values.
You should specify the set of terminals which can follow
the start symbol and the set of non-shiftable terminals.
You may optionally declare precedences for terminals,
make declarations that will
improve error recovery, and suppress the generation of
default reductions in the parser. You may
declare whether the parser generator should create
a verbose description of the parser in a ``.desc'' file. This is useful
for finding the causes of shift/reduce errors and other parsing conflicts.

You may also declare whether the semantic actions are
free of significant side-effects and always terminate. Normally, ML-Yacc
delays the evaluation of semantic actions until the completion of a
successful parse. This ensures that there will be no semantic actions
to ``undo'' if a syntactic error correction invalidates some semantic
actions. If, however, the semantic actions are free of significant
side-effects and always terminate, the results of semantic actions that
are invalidated by a syntactic error correction can always be safely
ignored.

Parsers run faster and need less memory when it is not
necessary to delay the evaluation of semantic actions. You are
encouraged to write semantic actions that are free of side-effects and
always terminate and to declare this information to ML-Yacc.

A semantic action is free of significant side-effects if it can be reexecuted
a reasonably small number of times without affecting the result of a
parse. (The reexecution occurs when the error-correcting parser is testing
possible corrections to fix a syntax error, and the number of times
reexecution occurs is roughly bounded, for each syntax error, by the number of
terminals times the amount of lookahead permitted for the error-correcting
parser.)

The rules section contains the context-free grammar productions and their
associated semantic actions.

\subsection{Lexical Definitions}

Comments have the same lexical definition as they do in Standard
ML and can be placed anywhere in a specification.

All characters up to the first occurrence of a delimiting
{\tt \%\%} outside of
a comment are placed in the user declarations section. After that, the
following words and symbols are reserved:
\begin{quote}

\verb'of for = { } , * -> : | ( )'

\end{quote}

The following classes of ML symbols are used:
\begin{quote}
\begin{description}
\item[identifiers:]
nonsymbolic ML identifiers, which consist
of an alphabetic character followed by zero or
more alphabetic characters, numeric characters,
primes ``{\tt '}'', or underscores ``{\tt \_}''.
\item[type variables:]
a nonsymbolic ML identifier starting with a prime ``{\tt '}''.
\item[integers:] one or more decimal digits.
\item[qualified identifiers:] an identifier followed by a period.

\end{description}
\end{quote}
The following classes of non-ML symbols are used:
\begin{quote}
\begin{description}
\item[\% identifiers:]
a percent sign followed by one or more lowercase
alphabetic letters. The valid \% identifiers
are:
\begin{quote}
\raggedright
\tt
\%arg \%change \%eop \%header \%keyword \%left \%name \%nodefault
\%nonassoc \%nonterm \%noshift \%pos \%prec \%prefer
\%pure \%right \%start \%subst \%term \%value \%verbose
\end{quote}
\item[code:]
This class is meant to hold ML code. The ML code is not
parsed for syntax errors. It consists of a left parenthesis
followed by all characters up to a balancing right
parenthesis. Parentheses in ML comments and ML strings
are excluded from the count of balancing parentheses.

\end{description}
\end{quote}

\subsection{Grammar}

This is the grammar for specifications:
\begin{eqnarray*}
\mbox{spec} & ::= & \mbox{user-declarations {\tt \%\%} cmd-list {\tt \%\%} rule-list} \\
\mbox{ML-type} & ::= & \mbox{nonpolymorphic ML types (see the Standard ML manual)} \\
\mbox{symbol} & ::= & \mbox{identifier} \\
\mbox{symbol-list} & ::= & \mbox{symbol-list symbol} \\
& | & \epsilon \\
\mbox{symbol-type-list} & ::= & \mbox{symbol-type-list {\tt |} symbol {\tt of} ML-type} \\
& | & \mbox{symbol-type-list {\tt |} symbol} \\
& | & \mbox{symbol {\tt of} ML-type} \\
& | & \mbox{symbol} \\
\mbox{subst-list} & ::= & \mbox{subst-list {\tt |} symbol {\tt for} symbol} \\
& | & \epsilon \\
\mbox{cmd} & ::= & \mbox{{\tt \%arg} (Any-ML-pattern) {\tt :} ML-type} \\
& | & \mbox{{\tt \%eop} symbol-list} \\
& | & \mbox{{\tt \%header} code} \\
& | & \mbox{{\tt \%keyword} symbol-list} \\
& | & \mbox{{\tt \%left} symbol-list} \\
& | & \mbox{{\tt \%name} identifier} \\
& | & \mbox{{\tt \%nodefault}} \\
& | & \mbox{{\tt \%nonassoc} symbol-list} \\
& | & \mbox{{\tt \%nonterm} symbol-type-list} \\
& | & \mbox{{\tt \%noshift} symbol-list } \\
& | & \mbox{{\tt \%pos} ML-type} \\
& | & \mbox{{\tt \%prefer} symbol-list} \\
& | & \mbox{\tt \%pure} \\
& | & \mbox{{\tt \%right} symbol-list} \\
& | & \mbox{{\tt \%start} symbol} \\
& | & \mbox{{\tt \%subst} subst-list} \\
& | & \mbox{{\tt \%term} symbol-type-list} \\
& | & \mbox{{\tt \%value} symbol code} \\
& | & \mbox{{\tt \%verbose}} \\
\mbox{cmd-list} & ::= &\mbox{ cmd-list cmd} \\
& | & \mbox{cmd} \\
\mbox{rule-prec} & ::= & \mbox{{\tt \%prec} symbol} \\
& | & \epsilon \\
\mbox{clause-list} & ::= & \mbox{symbol-list rule-prec code} \\
& | & \mbox{clause-list {\tt |} symbol-list rule-prec code} \\
\mbox{rule} & ::= & \mbox{symbol {\tt :} clause-list} \\
\mbox{rule-list} & ::= & \mbox{rule-list rule} \\
& | & \mbox{rule}
\end{eqnarray*}
\subsection{Required ML-Yacc Declarations}
\begin{description}
\item[{\tt \%name}]
You must specify the name of the parser with {\tt \%name} \{name\}.
\item[{\tt \%nonterm} and {\tt \%term}]
You must define the terminal and nonterminal sets using the
{\tt \%term} and {\tt \%nonterm}
declarations, respectively. These declarations are like an ML datatype
definition.
The type of the value that a symbol may carry is defined at the same time
that the symbol is defined. Each declaration consists of the keyword
({\tt \%term} or {\tt \%nonterm})
followed by a list of symbol entries separated by a bar (``{\tt |}'').
Each symbol entry is a symbol name followed by an optional
``of \/ $<$ML-type$>$''. The types cannot be polymorphic.
Those symbol entries without a type carry no value.
Nonterminal and terminal names must be disjoint, and no name may be declared
more than once in either declaration.

The symbol names and types are used to construct a datatype union for the
values on the semantic stack in the LR parser and to name the values
associated with subcomponents of a rule. The names and types of
terminals are also used to construct a signature for a structure that
may be passed to the lexer functor.

Because the types and names are used in these ways, do
not use ML keywords as symbol names. The programs produced by ML-Yacc
will not compile if ML keywords are used as symbol names.
Make sure that the types specified in the {\tt \%term} declaration are
fully qualified types or are available in the background
environment when the signatures produced by ML-Yacc are loaded. Do
not use any locally defined types from the user declarations section of
the specification.

These requirements on the types in the {\tt \%term} declaration are not
a burden.
They force the types to be defined in another module,
which is a good idea since these types will
be used in the lexer module.
\item[{\tt \%pos}]
You must declare the type of position values using the {\tt \%pos} declaration.
The syntax is {\tt \%pos} $<$ML-type$>$.
This type MUST be the same type as that which is actually found in the lexer.
It cannot be polymorphic.

\end{description}
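Putting the required declarations together, a minimal declarations
section for a hypothetical calculator parser (the names {\tt Calc},
{\tt NUM}, {\tt PLUS}, {\tt EOF}, and {\tt exp} are illustrative, not
prescribed by ML-Yacc) might look like:
\begin{quote}
\begin{verbatim}
%name Calc
%term NUM of int | PLUS | EOF
%nonterm exp of int
%pos int
\end{verbatim}
\end{quote}
Here {\tt NUM} carries an integer value, {\tt PLUS} and {\tt EOF} carry
no values, and positions are integers, so the lexer must produce
integer positions as well.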
\subsection{Optional ML-Yacc Declarations}
\label{optional-def}
\begin{description}
\item[{\tt \%arg}]
You may want each invocation of the entire parser to be parameterized
by a particular argument, such as the file name of the input
being parsed in an invocation of the parser. The {\tt \%arg} declaration
allows you to specify such an argument.
(This is often cleaner than using ``global'' reference variables.)
The declaration
\begin{quote}

{\tt \%arg} (Any-ML-pattern) {\tt :} $<$ML-type$>$

\end{quote}
specifies the argument to the parser, as well as its type. For example:
\begin{quote}

{\tt \%arg (filename) : string}

\end{quote}

If {\tt \%arg} is not specified, it defaults to {\tt () : unit}.
\item[{\tt \%eop} and {\tt \%noshift}]
You should specify the set of
terminals that may follow the start
symbol, also called end-of-parse symbols, using the {\tt \%eop}
declaration. The {\tt \%eop} keyword should be followed by the list of
terminals. This is useful, for example, in an interactive system
where you want to force the evaluation of a statement before an
end-of-file (remember, a parser delays the execution of semantic
actions until a parse is successful).

ML-Yacc has no concept of an end-of-file. You must
define an end-of-file terminal (EOF, perhaps) in the
{\tt \%term} declaration.
You must declare terminals which cannot be shifted, such as
end-of-file, in the {\tt \%noshift} declaration. The
{\tt \%noshift} keyword should be followed by the list of non-shiftable
terminals. An error message will be printed if a non-shiftable terminal
is found on the right hand side of any rule, but ML-Yacc will not prevent
you from using such grammars.

It is important to emphasize that
\begin{em}
non-shiftable terminals must be declared.
\end{em}
Otherwise, the error-correcting parser may attempt to read past such
terminals while evaluating a correction to a syntax error. This may
confuse the lexer.
\item[{\tt \%header}]
You may define code to head the functor \{parser name\}LrValsFun here. This
may be useful for adding additional parameter structures to the functor.
The functor must be parameterized by the Token structure, so
the declaration should always have the form:
\begin{quote}
\begin{verbatim}
%header (functor {parser name}LrValsFun(
             structure Token : TOKEN
             ...)
        )
\end{verbatim}
\end{quote}

\item[{\tt \%left},{\tt \%right},{\tt \%nonassoc}]
You should list the precedence declarations in order of increasing (tighter-binding)
precedence. Each precedence declaration consists
of a \% keyword specifying associativity followed by a list of terminals.
The keywords are {\tt \%left}, {\tt \%right}, and {\tt \%nonassoc},
standing for their respective associativities.
\item[{\tt \%nodefault}]
The {\tt \%nodefault} declaration suppresses the generation of default
reductions. If only one production can be reduced in a given state in
an LR table, it may be made the default action for the state. An incorrect
reduction will be caught later when the parser attempts to shift the lookahead
terminal which caused the reduction. ML-Yacc usually produces programs and
verbose files with default reductions. This saves a great deal of
space in representing the LR tables, but
sometimes it is useful for debugging and advanced
uses of the parser to suppress the generation of default reductions.
\item[{\tt \%pure}]
Include the {\tt \%pure} declaration if the semantic actions
are free of significant side-effects and always terminate.
\item[{\tt \%start}]
You may define the start symbol using
the {\tt \%start} declaration. Otherwise the nonterminal for the
first rule will be used as the start nonterminal.
The keyword {\tt \%start} should be followed by the name of the starting
nonterminal. This nonterminal should not be used on the right hand
side of any rules, to avoid conflicts between reducing to the start
symbol and shifting a terminal. ML-Yacc will not prevent you
from using such grammars, but it will print a warning message.
\item[{\tt \%verbose}]

Include the {\tt \%verbose} declaration to produce a verbose
description of the LALR parser. The name of this file is
the name of the specification file with ``.desc'' appended to it.

This file has the following format:
\begin{enumerate}

\item A summary of errors found while generating the LALR tables.
\item A detailed description of all errors.
\item A description of the states of the parser. Each state
is preceded by a list of conflicts in the state.

\end{enumerate}
\end{description}

\subsection{Declarations for improving error-recovery}

These optional declarations improve error recovery:

\begin{description}
\item[{\tt \%keyword}]
Specify all keywords in a grammar here. The {\tt \%keyword} declaration
should be followed by a list
of terminal names. Fixes involving keywords are generally dangerous;
they are prone to substantially altering the syntactic meaning
of the program. They are subject to a more rigorous parse check than
other fixes.

\item[{\tt \%prefer}]
List terminals to prefer for insertion after the {\tt \%prefer} keyword.
Corrections which insert a terminal on this list will be chosen over
other corrections, all other things being equal.
\item[{\tt \%subst}]
This declaration should be followed by a list of clauses of the
form \{terminal\} {\tt for} \{terminal\}, where items on the list are
separated using a {\tt |}. Substitution corrections on this list
will be chosen over all other corrections except preferred insertion
corrections (listed above), all other things being equal.
\item[{\tt \%change}]
This is a generalization of {\tt \%prefer} and {\tt \%subst}.
It has the following syntax:
\begin{quote}
${\it tokens}_{1a}$ \verb|->| ${\it tokens}_{1b}$ \verb+|+ ${\it tokens}_{2a}$ \verb|->| ${\it tokens}_{2b}$ {\it etc.}
\end{quote}
where each {\it tokens} is a (possibly empty) sequence of tokens. The
idea is that any instance of ${\it tokens}_{1a}$ can be ``corrected'' to
${\it tokens}_{1b}$, and so on. For example, to suggest that a good
error correction to try is \verb|IN ID END| (which is useful for the
ML parser), write:
\begin{verbatim}
%change -> IN ID END
\end{verbatim}
\item[{\tt \%value}]
The error-correction algorithm may also insert terminals with values.
You must supply a value for such a terminal. The {\tt \%value} keyword
should be followed by a terminal and a piece of
code (enclosed in parentheses) that when evaluated supplies the value.
There must be a separate {\tt \%value} declaration for each terminal with
a value that you wish may be inserted or substituted in an error correction.
The code for the value is not evaluated until the parse is
successful.

Do not specify a {\tt \%value} for terminals without
values. This will result in a type error in the program produced by
ML-Yacc.
\end{description}
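For instance, assuming an integer-carrying terminal {\tt NUM} (the
terminal name and the chosen default are illustrative), a value for
insertions could be supplied with:
\begin{quote}
\begin{verbatim}
%value NUM (1)
\end{verbatim}
\end{quote}
The error-correcting parser may then insert a {\tt NUM} token carrying
the value 1 when testing corrections.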
600 | ||
601 | \subsection{Rules} | |
602 | ||
603 | All rules are declared in the final section, after the last {\tt \%\%} | |
604 | delimiter. A rule consists of a left hand side nonterminal, followed by | |
605 | a colon, followed by a list of right hand side clauses. | |
606 | ||
607 | The right hand side clauses should be separated by bars (``{\tt |}''). Each | |
608 | clause consists of a list of nonterminal and terminal symbols, followed | |
609 | by an optional {\tt \%prec} declaration, and then followed by the code to be | |
610 | evaluated when the rule is reduced. | |
611 | ||
612 | The optional {\tt \%prec} consists of the keyword {\tt \%prec} followed by a | |
613 | terminal whose precedence should be used as the precedence of the | |
614 | rule. | |
615 | ||
616 | The values of those symbols on the right hand side which have values are | |
617 | available inside the code. Positions for all the symbols are also | |
618 | available. | |
619 | Each value has the general form \{symbol name\}\{n+1\}, where \{n\} is the | |
620 | number of occurrences of the symbol to the left of the symbol. If | |
621 | the symbol occurs only once in the rule, \{symbol name\} may also | |
622 | be used. | |
623 | The positions are given by \{symbol~name\}\{n+1\}left and | |
624 | \{symbol~name\}\{n+1\}right, where \{n\} is defined as before. | |
625 | The position for a null rhs of | |
626 | a production is assumed to be the leftmost position of the lookahead | |
627 | terminal which is causing the reduction. This position value is | |
628 | available in {\tt defaultPos}. | |
629 | ||
630 | The value to which the code evaluates is used as the value of the | |
631 | nonterminal. The type of the value and the nonterminal must match. | |
632 | The value is ignored if the nonterminal has no value, but is still | |
633 | evaluated for side-effects. | |
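As an illustration, in the rule below (taken from the calculator
example in Section 7), the two occurrences of {\tt EXP} on the right
hand side are referred to as {\tt EXP1} and {\tt EXP2} inside the
semantic action:
\begin{quote}
\begin{verbatim}
EXP : EXP PLUS EXP  (EXP1+EXP2)
\end{verbatim}
\end{quote}
The positions {\tt EXP1left} and {\tt EXP2right} span the whole
expression.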
634 | ||
635 | \section{Producing files with ML-Yacc} | |
636 | ||
637 | ML-Yacc may be used from the interactive system or built as a | |
638 | stand-alone program which may be run from the Unix command line. | |
639 | See the file {\bf README} in the mlyacc directory for directions | |
640 | on installing ML-Yacc. We recommend that ML-Yacc be installed as | |
641 | a stand-alone program. | |
642 | ||
643 | If you are using the stand-alone version of ML-Yacc, invoke the | |
644 | program ``sml-yacc'' with the name of the specification file. | |
645 | If you are using ML-Yacc in the interactive system, load the file | |
646 | ``smlyacc.sml''. The end result is a structure ParseGen, with one | |
647 | value parseGen in it. Apply parseGen to a string containing the | |
648 | name of the specification file. | |
649 | ||
650 | Two files will be created, one named by | |
651 | attaching ``.sig'' to the name of the specification, the other named by | |
652 | attaching ``.sml'' to the name of the specification. | |
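For example, running the stand-alone program on a specification named
{\tt calc.grm} (the file name here is only an illustration; any name
will do):
\begin{quote}
\begin{verbatim}
sml-yacc calc.grm
\end{verbatim}
\end{quote}
produces the files {\tt calc.grm.sig} and {\tt calc.grm.sml}.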
653 | ||
654 | \section{The lexical analyzer} | |
655 | ||
656 | Let the name for | |
657 | the parser given in the {\tt \%name} declaration be denoted by \{n\} and | |
658 | the specification file name be denoted by \{spec name\}. | |
659 | The parser generator creates a functor named \{n\}LrValsFun for | |
660 | the values needed for a particular parser. This functor is placed | |
661 | in \{spec name\}.sml. This | |
662 | functor contains a structure | |
663 | Tokens which allows you to construct terminals from the appropriate | |
664 | values. The structure has a function for each terminal that constructs | |
665 | the terminal from a tuple consisting of the value for the terminal | |
666 | (if there is one), a leftmost position for the terminal, and a | |
667 | rightmost position for the terminal. | |
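For instance, the lexer in Section 7 builds terminals with calls such
as these (the positions here are simply the current line number):
\begin{quote}
\begin{verbatim}
Tokens.NUM(5,!pos,!pos)
Tokens.PLUS(!pos,!pos)
\end{verbatim}
\end{quote}
{\tt NUM} carries an integer value, so its constructor takes the value
as well as the two positions; {\tt PLUS} has no value, so its
constructor takes positions only.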
668 | ||
669 | A signature for the structure Tokens is created and placed in the ``.sig'' | |
670 | file created by ML-Yacc. This signature is \{n\}\_TOKENS, | |
671 | where \{n\} is | |
672 | the name given in the parser specification. A signature | |
673 | \{n\}\_LRVALS is created for the structure produced by | |
674 | applying \{n\}LrValsFun. | |
675 | ||
676 | Use the signature \{n\}\_TOKENS to create a functor for the | |
677 | lexical analyzer which takes the structure Tokens as an argument. The | |
678 | signature \{n\}\_TOKENS | |
679 | will not change unless the {\tt \%term} declaration in a | |
680 | specification is altered by adding terminals or | |
681 | changing the types of terminals. You do not need to recompile | |
682 | the lexical analyzer functor each time the specification for | |
683 | the parser is changed if the | |
684 | signature \{n\}\_TOKENS does not change. | |
685 | ||
686 | If you are using ML-Lex to create the lexical analyzer, you | |
687 | can turn the lexer structure into a functor using the | |
688 | {\tt \%header} declaration. | |
689 | {\tt \%header} allows the user to define the header for a structure body. | |
690 | ||
691 | If the name of the parser in the specification were Calc, you | |
692 | would add this declaration to the specification for the lexical | |
693 | analyzer: | |
694 | \begin{quote} | |
695 | \tt | |
696 | \begin{verbatim} | |
697 | %header (functor CalcLexFun(structure Tokens : Calc_TOKENS)) | |
698 | \end{verbatim} | |
699 | \end{quote} | |
700 | ||
701 | You must define the following in the user definitions section: | |
702 | \begin{quote} | |
703 | \tt | |
704 | \begin{verbatim} | |
705 | type pos | |
706 | \end{verbatim} | |
707 | \end{quote} | |
708 | This is the type of position values for terminals. This type | |
709 | must be the same as the one declared in the specification for | |
710 | the grammar. Note, however, that this type is not available | |
711 | in the Tokens structure that parameterizes the lexer functor. | |
712 | ||
713 | You must include the following code in the user definitions section of | |
714 | the ML-Lex specification: | |
715 | \begin{quote} | |
716 | \tt | |
717 | \begin{verbatim} | |
718 | type svalue = Tokens.svalue | |
719 | type ('a,'b) token = ('a,'b) Tokens.token | |
720 | type lexresult = (svalue,pos) token | |
721 | \end{verbatim} | |
722 | \end{quote} | |
723 | ||
724 | These types are used to give lexers signatures. | |
725 | ||
726 | You may use a lexer constructed using ML-Lex with the {\tt \%arg} | |
727 | declaration, but you must follow special instructions for tying the parser | |
728 | and lexer together. | |
729 | ||
730 | \section{Creating the parser} | |
731 | \label{create-parser} | |
732 | Let the name of the grammar specification file be denoted by | |
733 | \{grammar\} and the name of the lexer specification file be | |
734 | denoted by \{lexer\} (e.g. in our calculator example these would | |
735 | stand for calc.grm and calc.lex, respectively). | |
736 | Let the parser name in the specification be represented by \{n\} | |
737 | (e.g. Calc in our calculator example). | |
738 | ||
739 | To construct a parser, do the following: | |
740 | \begin{enumerate} | |
741 | \item In the appropriate CM description file (e.g. for your main | |
742 | program or one of its subgroups or libraries), include the lines: | |
743 | \begin{quote} | |
744 | \begin{verbatim} | |
745 | ml-yacc-lib.cm | |
746 | {lexer} | |
747 | {grammar} | |
748 | \end{verbatim} | |
749 | \end{quote} | |
750 | This will cause ML-Yacc to be run on \{grammar\}, producing source files | |
751 | \{grammar\}.sig and \{grammar\}.sml, and ML-Lex to be run on | |
752 | \{lexer\}, producing a source file \{lexer\}.sml. Then these files | |
753 | will be compiled after loading the necessary signatures and modules | |
754 | from the ML-Yacc library as specified by {\tt ml-yacc-lib.cm}. | |
755 | \item Apply functors to create the parser: | |
756 | \begin{quote} | |
757 | \begin{verbatim} | |
758 | structure {n}LrVals = | |
759 | {n}LrValsFun(structure Token = LrParser.Token) | |
760 | structure {n}Lex = | |
761 | {n}LexFun(structure Tokens = {n}LrVals.Tokens) | |
762 | structure {n}Parser= | |
763 | Join(structure ParserData = {n}LrVals.ParserData | |
764 | structure Lex={n}Lex | |
765 | structure LrParser=LrParser) | |
766 | \end{verbatim} | |
767 | \end{quote} | |
768 | If the lexer was created using the {\tt \%arg} declaration in ML-Lex, | |
769 | the definition of \{n\}Parser must be changed to use another functor | |
770 | called JoinWithArg: | |
771 | \begin{quote} | |
772 | \begin{verbatim} | |
773 | structure {n}Parser= | |
774 | JoinWithArg | |
775 | (structure ParserData={n}LrVals.ParserData | |
776 | structure Lex={n}Lex | |
777 | structure LrParser=LrParser) | |
778 | \end{verbatim} | |
779 | \end{quote} | |
780 | \end{enumerate} | |
781 | ||
782 | The following outline summarizes this process: | |
783 | \begin{quote} | |
784 | \begin{verbatim} | |
785 | (* available at top level *) | |
786 | ||
787 | TOKEN | |
788 | LR_TABLE | |
789 | STREAM | |
790 | LR_PARSER | |
791 | PARSER_DATA | |
792 | structure LrParser : LR_PARSER | |
793 | ||
794 | (* printed out in .sig file created by parser generator: *) | |
795 | ||
796 | signature {n}_TOKENS = | |
797 | sig | |
798 | structure Token : TOKEN | |
799 | type svalue | |
800 | val PLUS : 'pos * 'pos -> | |
801 | (svalue,'pos) Token.token | |
802 | val INTLIT : int * 'pos * 'pos -> | |
803 | (svalue,'pos) Token.token | |
804 | ... | |
805 | end | |
806 | ||
807 | signature {n}_LRVALS = | |
808 | sig | |
809 | structure Tokens : {n}_TOKENS | |
810 | structure ParserData : PARSER_DATA | |
811 | sharing ParserData.Token = Tokens.Token | |
812 | sharing type ParserData.svalue = Tokens.svalue | |
813 | end | |
814 | ||
815 | (* printed out by lexer generator: *) | |
816 | ||
817 | functor {n}LexFun(structure Tokens : {n}_TOKENS)= | |
818 | struct | |
819 | ... | |
820 | end | |
821 | ||
822 | (* printed out in .sml file created by parser generator: *) | |
823 | ||
824 | functor {n}LrValsFun(structure Token : TOKEN) = | |
825 | struct | |
826 | ||
827 | structure ParserData = | |
828 | struct | |
829 | structure Token = Token | |
830 | ||
831 | (* code in header section of specification *) | |
832 | ||
833 | structure Header = ... | |
834 | type svalue = ... | |
835 | type result = ... | |
836 | type pos = ... | |
837 | structure Actions = ... | |
838 | structure EC = ... | |
839 | val table = ... | |
840 | end | |
841 | ||
842 | structure Tokens : {n}_TOKENS = | |
843 | struct | |
844 | structure Token = ParserData.Token | |
845 | type svalue = ... | |
846 | fun PLUS(p1,p2) = ... | |
847 | fun INTLIT(i,p1,p2) = ... | |
848 | end | |
849 | ||
850 | end | |
851 | ||
852 | (* to be done by the user: *) | |
853 | ||
854 | structure {n}LrVals = | |
855 | {n}LrValsFun(structure Token = LrParser.Token) | |
856 | ||
857 | structure {n}Lex = | |
858 | {n}LexFun(structure Tokens = {n}LrVals.Tokens) | |
859 | ||
860 | structure {n}Parser = | |
861 | Join(structure Lex = {n}Lex | |
862 |         structure ParserData = {n}LrVals.ParserData | |
863 | structure LrParser = LrParser) | |
864 | \end{verbatim} | |
865 | \end{quote} | |
866 | ||
867 | \section{Using the parser} | |
868 | \subsection{Parser Structure Signatures} | |
869 | The final structure created will have the signature PARSER: | |
870 | \begin{quote} | |
871 | \begin{verbatim} | |
872 | signature PARSER = | |
873 | sig | |
874 | structure Token : TOKEN | |
875 | structure Stream : STREAM | |
876 | exception ParseError | |
877 | ||
878 | type pos (* pos is the type of line numbers *) | |
879 | type result (* value returned by the parser *) | |
880 | type arg (* type of the user-supplied argument *) | |
881 | type svalue (* the types of semantic values *) | |
882 | ||
883 | val makeLexer : (int -> string) -> | |
884 | (svalue,pos) Token.token Stream.stream | |
885 | val parse : | |
886 | int * ((svalue,pos) Token.token Stream.stream) * | |
887 | (string * pos * pos -> unit) * arg -> | |
888 | result * (svalue,pos) Token.token Stream.stream | |
889 | val sameToken : | |
890 | (svalue,pos) Token.token * (svalue,pos) Token.token -> | |
891 | bool | |
892 | end | |
893 | \end{verbatim} | |
894 | \end{quote} | |
895 | or the signature ARG\_PARSER if you used {\tt \%arg} to create the lexer. | |
896 | The signature ARG\_PARSER differs from PARSER in that it | |
897 | has an additional type {\tt lexarg} and a different type | |
898 | for {\tt makeLexer}: | |
899 | \begin{quote} | |
900 | \begin{verbatim} | |
901 | type lexarg | |
902 | val makeLexer : (int -> string) -> lexarg -> | |
903 | (svalue,pos) token stream | |
904 | \end{verbatim} | |
905 | \end{quote} | |
906 | ||
907 | The signature STREAM (providing lazy streams) is: | |
908 | \begin{quote} | |
909 | \begin{verbatim} | |
910 | signature STREAM = | |
911 | sig | |
912 | type 'a stream | |
913 | val streamify : (unit -> 'a) -> 'a stream | |
914 | val cons : 'a * 'a stream -> 'a stream | |
915 | val get : 'a stream -> 'a * 'a stream | |
916 | end | |
917 | \end{verbatim} | |
918 | \end{quote} | |
919 | ||
920 | \subsection{Using the parser structure} | |
921 | ||
922 | The parser structure converts the lexing function produced by | |
923 | ML-Lex into a function which creates a lazy stream of tokens. The | |
924 | function {\tt makeLexer} takes the same values as the corresponding | |
925 | {\tt makeLexer} created by ML-Lex, but returns a stream of tokens | |
926 | instead of a function which yields tokens. | |
927 | ||
928 | The function {\tt parse} takes the token stream, along with other | |
929 | arguments described below, and parses it. It returns a pair composed | |
930 | of the value associated with the start symbol and the rest of | |
931 | the token stream. The rest of the token stream includes the | |
932 | end-of-parse symbol which caused the reduction of some rule | |
933 | to the start symbol. The function parse raises the | |
934 | exception ParseError if a syntax error occurs which it cannot fix. | |
935 | ||
936 | The lazy stream is implemented by the {\tt Stream} structure. | |
937 | The function {\tt streamify} converts a conventional implementation | |
938 | of a stream into a lazy stream. In a conventional implementation | |
939 | of a stream, a stream consists of a position in a list of | |
940 | values. Fetching a value from a stream returns the | |
941 | value associated with the position and updates the position to | |
942 | the next element in the list of values. The fetch is a side-effecting | |
943 | operation. In a lazy stream, a fetch returns a value and a new | |
944 | stream, without a side-effect which updates the position value. | |
945 | This means that a stream can be repeatedly re-evaluated without | |
946 | affecting the values that it returns. If $f$ is the function | |
947 | that is passed to {\tt streamify}, $f$ is called only as many | |
948 | times as necessary to construct the portion of the list of values | |
949 | that is actually used. | |
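The behavior described above can be sketched as follows (a
hypothetical fragment, assuming a structure {\tt Stream} matching the
STREAM signature):
\begin{quote}
\begin{verbatim}
val r = ref 0
val s = Stream.streamify (fn () => (r := !r + 1; !r))
val (a,s1) = Stream.get s    (* a = 1 *)
val (b,_)  = Stream.get s1   (* b = 2 *)
val (c,_)  = Stream.get s1   (* c = 2; re-evaluating s1 is harmless *)
\end{verbatim}
\end{quote}
The underlying function is called only once per position, so fetching
from {\tt s1} twice yields the same value both times.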
950 | ||
951 | The function {\tt parse} also takes an integer giving the maximum lookahead permitted | |
952 | for the error-correcting parse, a function to print error messages, | |
953 | and a value of type arg. The maximum amount of lookahead for interactive | |
954 | systems should be zero. In this case, no attempt is made to correct any | |
955 | syntax errors. For non-interactive systems, try 15. The | |
956 | function to print error messages takes a tuple of values consisting | |
957 | of the left and right positions of the terminal which caused the error | |
958 | and an error message. If the {\tt \%arg} declaration is not used, the | |
959 | value of type arg should be a value of type unit. | |
960 | ||
961 | The function sameToken can be used to see if two tokens | |
962 | denote the same terminal, regardless of any values that the | |
963 | tokens carry. It is useful if you have multiple end-of-parse | |
964 | symbols and must check which end-of-parse symbol has been left on the | |
965 | front of the token stream. | |
966 | ||
967 | The types have the following meanings. The type {\tt arg} is the type | |
968 | of the additional argument to the parser, which is specified by the | |
969 | {\tt \%arg} declaration in the ML-Yacc specification. The type | |
970 | {\tt lexarg} is the optional argument to lexers, and is specified by | |
971 | the {\tt \%arg} declaration in an ML-Lex specification. The type {\tt pos} | |
972 | is the type of line numbers, and is specified by the {\tt \%pos} declaration | |
973 | in an ML-Yacc specification and defined in the user declarations | |
974 | section of the ML-Lex specification. The type {\tt result} is | |
975 | the type associated with the start symbol in the ML-Yacc specification. | |
976 | ||
977 | \section{Examples} | |
978 | ||
979 | See the directory examples for examples of parsers constructed using | |
980 | ML-Yacc. Here is a small sample parser and lexer for an interactive | |
981 | calculator, from the directory examples/calc, along with code for | |
982 | creating a parsing function. The calculator reads one or more | |
983 | expressions from the standard input, evaluates the expressions, and | |
984 | prints their values. Expressions should be separated by semicolons, | |
985 | and may also be ended by using an end-of-file. This shows how to | |
986 | construct an interactive parser which reads a top-level declaration | |
987 | and processes the declaration before reading the next top-level | |
988 | declaration. | |
989 | ||
990 | \subsection{Sample Grammar} | |
991 | \begin{tt} | |
992 | \begin{verbatim} | |
993 | (* Sample interactive calculator for ML-Yacc *) | |
994 | ||
995 | fun lookup "bogus" = 10000 | |
996 | | lookup s = 0 | |
997 | ||
998 | %% | |
999 | ||
1000 | %eop EOF SEMI | |
1001 | ||
1002 | (* %pos declares the type of positions for terminals. | |
1003 | Each symbol has an associated left and right position. *) | |
1004 | ||
1005 | %pos int | |
1006 | ||
1007 | %left SUB PLUS | |
1008 | %left TIMES DIV | |
1009 | %right CARAT | |
1010 | ||
1011 | %term ID of string | NUM of int | PLUS | TIMES | PRINT | | |
1012 | SEMI | EOF | CARAT | DIV | SUB | |
1013 | %nonterm EXP of int | START of int option | |
1014 | ||
1015 | %name Calc | |
1016 | ||
1017 | %subst PRINT for ID | |
1018 | %prefer PLUS TIMES DIV SUB | |
1019 | %keyword PRINT SEMI | |
1020 | ||
1021 | %noshift EOF | |
1022 | %value ID ("bogus") | |
1023 | %nodefault | |
1024 | %verbose | |
1025 | %% | |
1026 | ||
1027 | (* the parser returns the value associated with the expression *) | |
1028 | ||
1029 | START : PRINT EXP (print EXP; | |
1030 | print "\n"; | |
1031 | flush_out std_out; SOME EXP) | |
1032 | | EXP (SOME EXP) | |
1033 | | (NONE) | |
1034 | EXP : NUM (NUM) | |
1035 | | ID (lookup ID) | |
1036 | | EXP PLUS EXP (EXP1+EXP2) | |
1037 | | EXP TIMES EXP (EXP1*EXP2) | |
1038 | | EXP DIV EXP (EXP1 div EXP2) | |
1039 | | EXP SUB EXP (EXP1-EXP2) | |
1040 | | EXP CARAT EXP (let fun e (m,0) = 1 | |
1041 | | e (m,l) = m*e(m,l-1) | |
1042 | in e (EXP1,EXP2) | |
1043 | end) | |
1044 | \end{verbatim} | |
1045 | \end{tt} | |
1046 | \subsection{Sample Lexer} | |
1047 | \begin{tt} | |
1048 | \begin{verbatim} | |
1049 | structure Tokens = Tokens | |
1050 | ||
1051 | type pos = int | |
1052 | type svalue = Tokens.svalue | |
1053 | type ('a,'b) token = ('a,'b) Tokens.token | |
1054 | type lexresult= (svalue,pos) token | |
1055 | ||
1056 | val pos = ref 0 | |
1057 | val eof = fn () => Tokens.EOF(!pos,!pos) | |
1058 | val error = fn (e,l : int,_) => | |
1059 | output(std_out,"line " ^ (makestring l) ^ | |
1060 | ": " ^ e ^ "\n") | |
1061 | %% | |
1062 | %header (functor CalcLexFun(structure Tokens: Calc_TOKENS)); | |
1063 | alpha=[A-Za-z]; | |
1064 | digit=[0-9]; | |
1065 | ws = [\ \t]; | |
1066 | %% | |
1067 | \n => (pos := (!pos) + 1; lex()); | |
1068 | {ws}+ => (lex()); | |
1069 | {digit}+ => (Tokens.NUM | |
1070 | (revfold (fn (a,r) => ord(a)-ord("0")+10*r) | |
1071 | (explode yytext) 0, | |
1072 | !pos,!pos)); | |
1073 | "+" => (Tokens.PLUS(!pos,!pos)); | |
1074 | "*" => (Tokens.TIMES(!pos,!pos)); | |
1075 | ";" => (Tokens.SEMI(!pos,!pos)); | |
1076 | {alpha}+ => (if yytext="print" | |
1077 | then Tokens.PRINT(!pos,!pos) | |
1078 | else Tokens.ID(yytext,!pos,!pos) | |
1079 | ); | |
1080 | "-" => (Tokens.SUB(!pos,!pos)); | |
1081 | "^" => (Tokens.CARAT(!pos,!pos)); | |
1082 | "/" => (Tokens.DIV(!pos,!pos)); | |
1083 | "." => (error ("ignoring bad character "^yytext,!pos,!pos); | |
1084 | lex()); | |
1085 | \end{verbatim} | |
1086 | \end{tt} | |
1087 | \subsection{Top-level code} | |
1088 | ||
1089 | You must follow the instructions in Section~\ref{create-parser} | |
1090 | to create the parser and lexer functors and load them. After you have | |
1091 | done this, you must then apply the functors to produce the {\tt CalcParser} | |
1092 | structure. The code for doing this is shown below. | |
1093 | \begin{quote} | |
1094 | \begin{verbatim} | |
1095 | structure CalcLrVals = | |
1096 | CalcLrValsFun(structure Token = LrParser.Token) | |
1097 | ||
1098 | structure CalcLex = | |
1099 | CalcLexFun(structure Tokens = CalcLrVals.Tokens); | |
1100 | ||
1101 | structure CalcParser = | |
1102 | Join(structure LrParser = LrParser | |
1103 | structure ParserData = CalcLrVals.ParserData | |
1104 | structure Lex = CalcLex) | |
1105 | \end{verbatim} | |
1106 | \end{quote} | |
1107 | ||
1108 | Now we need a function which given a lexer invokes the parser. The | |
1109 | function {\tt invoke} does this. | |
1110 | ||
1111 | \begin{quote} | |
1112 | \begin{verbatim} | |
1113 | fun invoke lexstream = | |
1114 | let fun print_error (s,i:int,_) = | |
1115 | TextIO.output(TextIO.stdOut, | |
1116 | "Error, line " ^ (Int.toString i) ^ ", " ^ s ^ "\n") | |
1117 | in CalcParser.parse(0,lexstream,print_error,()) | |
1118 | end | |
1119 | \end{verbatim} | |
1120 | \end{quote} | |
1121 | ||
1122 | Finally, we need a function which can read one or more expressions from | |
1123 | the standard input. The function {\tt parse}, shown below, does this. | |
1124 | It runs the calculator on the standard input and terminates | |
1125 | when an end-of-file is encountered. | |
1126 | ||
1127 | \begin{quote} | |
1128 | \begin{verbatim} | |
1129 | fun parse () = | |
1130 | let val lexer = CalcParser.makeLexer | |
1131 | (fn _ => TextIO.inputLine TextIO.stdIn) | |
1132 | val dummyEOF = CalcLrVals.Tokens.EOF(0,0) | |
1133 | val dummySEMI = CalcLrVals.Tokens.SEMI(0,0) | |
1134 | fun loop lexer = | |
1135 | let val (result,lexer) = invoke lexer | |
1136 | val (nextToken,lexer) = CalcParser.Stream.get lexer | |
1137 | in case result | |
1138 | of SOME r => | |
1139 | TextIO.output(TextIO.stdOut, | |
1140 | "result = " ^ (Int.toString r) ^ "\n") | |
1141 | | NONE => (); | |
1142 | if CalcParser.sameToken(nextToken,dummyEOF) then () | |
1143 | else loop lexer | |
1144 | end | |
1145 | in loop lexer | |
1146 | end | |
1147 | \end{verbatim} | |
1148 | \end{quote} | |
1149 | ||
1150 | \section{Signatures} | |
1151 | ||
1152 | This section contains signatures used by ML-Yacc for structures in | |
1153 | the file base.sml, functors and structures that it generates, and for | |
1154 | the signatures of lexer structures supplied by you. | |
1155 | ||
1156 | \subsection{Parsing structure signatures} | |
1157 | ||
1158 | \begin{quote} | |
1159 | \begin{verbatim} | |
1160 | (* STREAM: signature for a lazy stream.*) | |
1161 | ||
1162 | signature STREAM = | |
1163 | sig | |
1164 | type 'a stream | |
1165 | val streamify : (unit -> 'a) -> 'a stream | |
1166 | val cons : 'a * 'a stream -> 'a stream | |
1167 | val get : 'a stream -> 'a * 'a stream | |
1168 | end | |
1169 | ||
1170 | (* LR_TABLE: signature for an LR Table.*) | |
1171 | ||
1172 | signature LR_TABLE = | |
1173 | sig | |
1174 | datatype ('a,'b) pairlist | |
1175 | = EMPTY | |
1176 | | PAIR of 'a * 'b * ('a,'b) pairlist | |
1177 | datatype state = STATE of int | |
1178 | datatype term = T of int | |
1179 | datatype nonterm = NT of int | |
1180 | datatype action = SHIFT of state | |
1181 | | REDUCE of int | |
1182 | | ACCEPT | |
1183 | | ERROR | |
1184 | type table | |
1185 | ||
1186 | val numStates : table -> int | |
1187 | val numRules : table -> int | |
1188 | val describeActions : table -> state -> | |
1189 | (term,action) pairlist * action | |
1190 | val describeGoto : table -> state -> | |
1191 | (nonterm,state) pairlist | |
1192 | val action : table -> state * term -> action | |
1193 | val goto : table -> state * nonterm -> state | |
1194 | val initialState : table -> state | |
1195 | exception Goto of state * nonterm | |
1196 | ||
1197 | val mkLrTable : | |
1198 | {actions : ((term,action) pairlist * action) array, | |
1199 | gotos : (nonterm,state) pairlist array, | |
1200 | numStates : int, numRules : int, | |
1201 | initialState : state} -> table | |
1202 | end | |
1203 | ||
1204 | (* TOKEN: signature for the internal structure of a token.*) | |
1205 | ||
1206 | signature TOKEN = | |
1207 | sig | |
1208 | structure LrTable : LR_TABLE | |
1209 | datatype ('a,'b) token = TOKEN of LrTable.term * | |
1210 | ('a * 'b * 'b) | |
1211 | val sameToken : ('a,'b) token * ('a,'b) token -> bool | |
1212 | end | |
1213 | ||
1214 | (* LR_PARSER: signature for a polymorphic LR parser *) | |
1215 | ||
1216 | signature LR_PARSER = | |
1217 | sig | |
1218 | structure Stream: STREAM | |
1219 | structure LrTable : LR_TABLE | |
1220 | structure Token : TOKEN | |
1221 | ||
1222 | sharing LrTable = Token.LrTable | |
1223 | ||
1224 | exception ParseError | |
1225 | ||
1226 | val parse: | |
1227 | {table : LrTable.table, | |
1228 | lexer : ('b,'c) Token.token Stream.stream, | |
1229 | arg: 'arg, | |
1230 | saction : int * | |
1231 | 'c * | |
1232 | (LrTable.state * ('b * 'c * 'c)) list * | |
1233 | 'arg -> | |
1234 | LrTable.nonterm * | |
1235 | ('b * 'c * 'c) * | |
1236 | ((LrTable.state *('b * 'c * 'c)) list), | |
1237 | void : 'b, | |
1238 | ec: {is_keyword : LrTable.term -> bool, | |
1239 | noShift : LrTable.term -> bool, | |
1240 | preferred_subst:LrTable.term -> LrTable.term list, | |
1241 | preferred_insert : LrTable.term -> bool, | |
1242 | errtermvalue : LrTable.term -> 'b, | |
1243 | showTerminal : LrTable.term -> string, | |
1244 | terms: LrTable.term list, | |
1245 | error : string * 'c * 'c -> unit | |
1246 | }, | |
1247 | lookahead : int (* max amount of lookahead used in | |
1248 | * error correction *) | |
1249 | } -> 'b * (('b,'c) Token.token Stream.stream) | |
1250 | end | |
1251 | \end{verbatim} | |
1252 | \end{quote} | |
1253 | ||
1254 | \subsection{Lexers} | |
1255 | ||
1256 | Lexers for use with ML-Yacc's output must match one of these signatures. | |
1257 | ||
1258 | \begin{quote} | |
1259 | \begin{verbatim} | |
1260 | signature LEXER = | |
1261 | sig | |
1262 | structure UserDeclarations : | |
1263 | sig | |
1264 | type ('a,'b) token | |
1265 | type pos | |
1266 | type svalue | |
1267 | end | |
1268 | val makeLexer : (int -> string) -> unit -> | |
1269 | (UserDeclarations.svalue, UserDeclarations.pos) | |
1270 | UserDeclarations.token | |
1271 | end | |
1272 | ||
1273 | (* ARG_LEXER: the %arg option of ML-Lex allows users to | |
1274 | produce lexers which also take an argument before | |
1275 | yielding a function from unit to a token. | |
1276 | *) | |
1277 | ||
1278 | signature ARG_LEXER = | |
1279 | sig | |
1280 | structure UserDeclarations : | |
1281 | sig | |
1282 | type ('a,'b) token | |
1283 | type pos | |
1284 | type svalue | |
1285 | type arg | |
1286 | end | |
1287 | val makeLexer : | |
1288 | (int -> string) -> | |
1289 | UserDeclarations.arg -> | |
1290 | unit -> | |
1291 | (UserDeclarations.svalue, UserDeclarations.pos) | |
1292 | UserDeclarations.token | |
1293 | end | |
1294 | \end{verbatim} | |
1295 | \end{quote} | |
1296 | ||
1297 | \subsection{Signatures for the functor produced by ML-Yacc} | |
1298 | ||
1299 | The following signature is used in signatures generated by | |
1300 | ML-Yacc: | |
1301 | \begin{quote} | |
1302 | \begin{verbatim} | |
1303 | (* PARSER_DATA: the signature of ParserData structures in | |
1304 | {n}LrValsFun functor produced by ML-Yacc. All such | |
1305 | structures match this signature. *) | |
1306 | ||
1307 | signature PARSER_DATA = | |
1308 | sig | |
1309 | type pos (* the type of line numbers *) | |
1310 | type svalue (* the type of semantic values *) | |
1311 | type arg (* the type of the user-supplied *) | |
1312 | (* argument to the parser *) | |
1313 | type result | |
1314 | ||
1315 | structure LrTable : LR_TABLE | |
1316 | structure Token : TOKEN | |
1317 | sharing Token.LrTable = LrTable | |
1318 | ||
1319 | structure Actions : | |
1320 | sig | |
1321 | val actions : int * pos * | |
1322 | (LrTable.state * (svalue * pos * pos)) list * arg -> | |
1323 | LrTable.nonterm * (svalue * pos * pos) * | |
1324 | ((LrTable.state *(svalue * pos * pos)) list) | |
1325 | val void : svalue | |
1326 | val extract : svalue -> result | |
1327 | end | |
1328 | ||
1329 | (* structure EC contains information used to improve | |
1330 | error recovery in an error-correcting parser *) | |
1331 | ||
1332 | structure EC : | |
1333 | sig | |
1334 | val is_keyword : LrTable.term -> bool | |
1335 | val noShift : LrTable.term -> bool | |
1336 | val preferred_subst: LrTable.term -> LrTable.term list | |
1337 | val preferred_insert : LrTable.term -> bool | |
1338 | val errtermvalue : LrTable.term -> svalue | |
1339 | val showTerminal : LrTable.term -> string | |
1340 | val terms: LrTable.term list | |
1341 | end | |
1342 | ||
1343 | (* table is the LR table for the parser *) | |
1344 | ||
1345 | val table : LrTable.table | |
1346 | end | |
1347 | \end{verbatim} | |
1348 | \end{quote} | |
1349 | ||
1350 | ML-Yacc generates these two signatures: | |
1351 | \begin{quote} | |
1352 | \begin{verbatim} | |
1353 | (* printed out in .sig file created by parser generator: *) | |
1354 | ||
1355 | signature {n}_TOKENS = | |
1356 | sig | |
1357 | type ('a,'b) token | |
1358 | type svalue | |
1359 | ... | |
1360 | end | |
1361 | ||
1362 | signature {n}_LRVALS = | |
1363 | sig | |
1364 | structure Tokens : {n}_TOKENS | |
1365 | structure ParserData : PARSER_DATA | |
1366 | sharing type ParserData.Token.token = Tokens.token | |
1367 | sharing type ParserData.svalue = Tokens.svalue | |
1368 | end | |
1369 | \end{verbatim} | |
1370 | \end{quote} | |
1371 | \subsection{User parser signatures} | |
1372 | ||
1373 | Parsers created by applying the Join functor will match this signature: | |
1374 | \begin{quote} | |
1375 | \begin{verbatim} | |
1376 | signature PARSER = | |
1377 | sig | |
1378 | structure Token : TOKEN | |
1379 | structure Stream : STREAM | |
1380 | exception ParseError | |
1381 | ||
1382 | type pos (* pos is the type of line numbers *) | |
1383 | type result (* value returned by the parser *) | |
1384 | type arg (* type of the user-supplied argument *) | |
1385 | type svalue (* the types of semantic values *) | |
1386 | ||
1387 | val makeLexer : (int -> string) -> | |
1388 | (svalue,pos) Token.token Stream.stream | |
1389 | ||
1390 | val parse : | |
1391 | int * ((svalue,pos) Token.token Stream.stream) * | |
1392 | (string * pos * pos -> unit) * arg -> | |
1393 | result * (svalue,pos) Token.token Stream.stream | |
1394 | val sameToken : | |
1395 | (svalue,pos) Token.token * (svalue,pos) Token.token -> | |
1396 | bool | |
1397 | end | |
1398 | \end{verbatim} | |
1399 | \end{quote} | |
Parsers created by applying the JoinWithArg functor will match this
signature:
\begin{quote}
\begin{verbatim}
signature ARG_PARSER =
  sig
    structure Token : TOKEN
    structure Stream : STREAM
    exception ParseError

    type arg
    type lexarg
    type pos
    type result
    type svalue

    val makeLexer : (int -> string) -> lexarg ->
                      (svalue,pos) Token.token Stream.stream
    val parse : int *
                ((svalue,pos) Token.token Stream.stream) *
                (string * pos * pos -> unit) *
                arg ->
                  result * (svalue,pos) Token.token Stream.stream
    val sameToken :
          (svalue,pos) Token.token * (svalue,pos) Token.token ->
            bool
  end
\end{verbatim}
\end{quote}
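
As a sketch of how the values in these signatures fit together, the
following hypothetical driver parses a single string. The names
CalcParser and parseString are invented for illustration, the parser's
arg type is assumed to be unit, and the lookahead value 15 is merely a
typical choice:
\begin{quote}
\begin{verbatim}
(* a sketch only: CalcParser and parseString are
   hypothetical names *)
fun parseString (s : string) =
    let val consumed = ref false
        (* makeLexer calls this function to request more
           input; return "" to signal end-of-stream *)
        val lexer =
            CalcParser.makeLexer
              (fn _ => if !consumed then ""
                       else (consumed := true; s))
        fun error (msg, leftPos, rightPos) =
            TextIO.output (TextIO.stdErr, msg ^ "\n")
        (* 15 tokens of lookahead for error correction *)
        val (result, _) =
            CalcParser.parse (15, lexer, error, ())
    in result
    end
\end{verbatim}
\end{quote}
Note that parse also returns the rest of the token stream, which is
useful when several parses are taken from one stream; the sketch above
simply discards it.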

\section{Sharing constraints}

Let the name of the parser be denoted by \{n\}. If
you have not created a lexer which takes an argument, and
you have followed the directions given earlier for creating the parser, you
will have the following structures with the following signatures:
\begin{quote}
\begin{verbatim}
(* always present *)

signature TOKEN
signature LR_TABLE
signature STREAM
signature LR_PARSER
signature PARSER_DATA
structure LrParser : LR_PARSER

(* signatures generated by ML-Yacc *)

signature {n}_TOKENS
signature {n}_LRVALS

(* structures created by you *)

structure {n}LrVals : {n}_LRVALS
structure Lex : LEXER
structure {n}Parser : PARSER
\end{verbatim}
\end{quote}
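
For concreteness, these structures typically arise from functor
applications along the following lines, following the pattern of the
Calc example of Section 5 (the Calc names are illustrative):
\begin{quote}
\begin{verbatim}
structure CalcLrVals =
    CalcLrValsFun(structure Token = LrParser.Token)
structure CalcLex =
    CalcLexFun(structure Tokens = CalcLrVals.Tokens)
structure CalcParser =
    Join(structure ParserData = CalcLrVals.ParserData
         structure Lex = CalcLex
         structure LrParser = LrParser)
\end{verbatim}
\end{quote}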

The following sharing constraints will exist:
\begin{quote}
\begin{verbatim}
sharing {n}Parser.Token = LrParser.Token =
        {n}LrVals.ParserData.Token
sharing {n}Parser.Stream = LrParser.Stream

sharing type {n}Parser.arg = {n}LrVals.ParserData.arg
sharing type {n}Parser.result = {n}LrVals.ParserData.result
sharing type {n}Parser.pos = {n}LrVals.ParserData.pos =
             Lex.UserDeclarations.pos
sharing type {n}Parser.svalue = {n}LrVals.ParserData.svalue =
             {n}LrVals.Tokens.svalue = Lex.UserDeclarations.svalue
sharing type {n}Parser.Token.token =
             {n}LrVals.ParserData.Token.token =
             LrParser.Token.token =
             Lex.UserDeclarations.token

sharing {n}LrVals.LrTable = LrParser.LrTable
\end{verbatim}
\end{quote}

If you used a lexer which takes an argument, then you will
have:
\begin{quote}
\begin{verbatim}
signature ARG_LEXER
structure {n}Parser : ARG_PARSER

(* additional sharing constraint *)

sharing type {n}Parser.lexarg = Lex.UserDeclarations.arg
\end{verbatim}
\end{quote}

\section{Hints}
\subsection{Multiple start symbols}
To have multiple start symbols, define a dummy token for each
start symbol. Then define a start symbol which derives the
multiple start symbols with dummy tokens placed in front of
them. When you start the parser, you must place the appropriate
dummy token at the front of the lexer stream to select the start
symbol from which to begin parsing.
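
For example, a grammar with expression and declaration entry points
might be arranged as follows. This is only a sketch: the token,
nonterminal, and constructor names are invented for illustration.
\begin{quote}
\begin{verbatim}
(* in the declarations section of the grammar *)
%term START_EXP | START_DECL | NUM of int | ID of string
%nonterm top of result | exp of exp | decl of decl
%start top
%%
top : START_EXP exp    (EXP exp)
    | START_DECL decl  (DECL decl)
\end{verbatim}
\end{quote}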

Assuming that you have followed the naming conventions used before,
create the lexer using the makeLexer function in the \{n\}Parser structure.
Then place the dummy token at the front of the lexer stream:
\begin{quote}
\begin{verbatim}
val dummyLexer =
    {n}Parser.Stream.cons
        ({n}LrVals.Tokens.{dummy token name}
            ({dummy lineno},{dummy lineno}),
         lexer)
\end{verbatim}
\end{quote}
You have to pass a Tokens structure to the lexer. This Tokens structure
contains functions which construct tokens from values and line numbers.
So, to create your dummy token, just apply the appropriate token constructor
function from this Tokens structure to a value (if there is one) and the
line numbers. This is exactly what you do in the lexer to construct tokens.

Then you must place the dummy token at the front of your lex stream.
The structure \{n\}Parser contains a structure Stream which implements
lazy streams, so you just cons the dummy token onto the stream returned
by makeLexer.
\subsection{Functorizing things further}

You may wish to functorize things even further. Two possibilities
are turning the lexer and parser structures into closed functors,
that is, functors which do not refer to types or values defined
outside their body or outside their parameter structures (except
for pervasive types and values), and creating a functor which
encapsulates the code necessary to invoke the parser.

Use the {\tt \%header} declarations in ML-Lex and ML-Yacc to create
closed functors. See section~\ref{optional-def} of this manual
and section 4 of the manual for ML-Lex for complete descriptions of these
declarations. If you do this, you should also parameterize these
structures by the type of line numbers. The type will be an
abstract type, so you will also need to define all the valid
operations on the type. The signature INTERFACE, defined below,
shows one possible signature for a structure defining the line
number type and associated operations.

If you wish to encapsulate the code necessary to invoke the
parser, your functor will generally have the form:
\begin{quote}
\begin{verbatim}
functor Encapsulate(
    structure Parser : PARSER
    structure Interface : INTERFACE
    sharing type Parser.arg = Interface.arg
    sharing type Parser.pos = Interface.pos
    sharing type Parser.result = ...
    structure Tokens : {parser name}_TOKENS
    sharing type Tokens.token = Parser.Token.token
    sharing type Tokens.svalue = Parser.svalue) =
  struct
    ...
  end
\end{verbatim}
\end{quote}

The signature INTERFACE, defined below, is a possible signature for a
structure defining the types of line numbers and arguments (types pos
and arg, respectively), along with operations for them. You need this
structure because these types will be abstract types inside the body
of your functor.
\begin{quote}
\begin{verbatim}
signature INTERFACE =
  sig
    type pos
    val line : pos ref
    val reset : unit -> unit
    val next : unit -> unit
    val error : string * pos * pos -> unit

    type arg
    val nothing : arg
  end
\end{verbatim}
\end{quote}
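
A minimal implementation of this signature might look as follows,
assuming integer line numbers and a unit argument type. This
particular structure is an illustration, not part of ML-Yacc:
\begin{quote}
\begin{verbatim}
structure Interface : INTERFACE =
  struct
    type pos = int
    val line = ref 1
    fun reset () = line := 1
    fun next () = line := !line + 1
    fun error (msg, leftLine, _ : pos) =
        TextIO.output (TextIO.stdErr,
            "error, line " ^ Int.toString leftLine ^
            ": " ^ msg ^ "\n")

    type arg = unit
    val nothing = ()
  end
\end{verbatim}
\end{quote}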

The directory example/fol contains a sample parser in which
the code for tying together the lexer and parser has been
encapsulated in a functor.

\section{Acknowledgements}

Nick Rothwell wrote an SLR table generator in 1988 which inspired the
initial work on an ML parser generator. Bruce Duba and David
MacQueen made useful suggestions about the design of the error-correcting
parser. Thanks go to all the users at Carnegie Mellon who beta-tested
this version. Their comments and questions led to the creation of
this manual and helped improve it.

\section{Bugs}

There is a slight difference in syntax between ML-Lex and ML-Yacc:
in ML-Lex, semantic actions must be followed by a semicolon, but
in ML-Yacc semantic actions cannot be followed by a semicolon.
The syntax should be the same. ML-Lex also produces structures with
two different signatures, but it should produce structures with just
one signature. This would simplify some things.
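
To illustrate the semicolon discrepancy, compare a rule from each
tool. The token name, regular expression, and line-counter reference
here are invented for illustration:
\begin{quote}
\begin{verbatim}
(* ML-Lex: the action is followed by a semicolon *)
{digits} => (Tokens.NUM
               (valOf (Int.fromString yytext),
                !line, !line));

(* ML-Yacc: the action must not be followed
   by a semicolon *)
exp : NUM (NUM)
\end{verbatim}
\end{quote}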

\begin{thebibliography}{9}

\bibitem{bf} M. Burke and G. Fisher, ``A Practical Method for LR and
LL Syntactic Error Diagnosis and Recovery,'' {\em ACM Transactions on
Programming Languages and Systems}, Vol. 9, No. 2, April 1987,
pp. 164--197.
\bibitem{ahu} A. Aho, R. Sethi, and J. Ullman, {\em Compilers: Principles,
Techniques, and Tools}, Addison-Wesley, Reading, MA, 1986.

\end{thebibliography}

\end{document}