% Coccinelle release-1.0.0-rc11 (docs/manual/cocci_syntax.tex)
%\section{The SmPL Grammar}

% This section presents the SmPL grammar. This definition follows closely
% our implementation using the Menhir parser generator \cite{menhir}.

This document presents the grammar of the SmPL language used by the
\href{http://coccinelle.lip6.fr/}{Coccinelle tool}. For the most
part, the grammar is written using standard notation. In some rules,
however, the left-hand side is in all uppercase letters. These are
macros, which take one or more grammar rule right-hand-sides as
arguments. The grammar also uses some unspecified nonterminals, such
as \T{id}, \T{const}, etc. These refer to the sets suggested by
the name, {\em i.e.}, \T{id} refers to the set of possible
C-language identifiers, while \T{const} refers to the set of
possible C-language constants.
%
\ifhevea
A PDF version of this documentation is available at
\url{http://coccinelle.lip6.fr/docs/main_grammar.pdf}.
\else
An HTML version of this documentation is available online at
\url{http://coccinelle.lip6.fr/docs/main_grammar.html}.
\fi

\section{Program}

\begin{grammar}
  \RULE{\rt{program}}
  \CASE{\any{\NT{include\_cocci}} \some{\NT{changeset}}}

  \RULE{\rt{include\_cocci}}
  \CASE{include \NT{string}}
  \CASE{using \NT{string}}
  \CASE{using \NT{pathToIsoFile}}
  \CASE{virtual \T{id} \ANY{, \T{id}}}

  \RULE{\rt{changeset}}
  \CASE{\NT{metavariables} \NT{transformation}}
  \CASE{\NT{script\_metavariables} \T{script\_code}}
% \CASE{\NT{metavariables} \ANY{--- filename +++ filename} \NT{transformation}}
\end{grammar}

\noindent
\T{script\_code} is any code in the chosen scripting language. Parsing of
the semantic patch does not check the validity of this code; any errors are
first detected when the code is executed. Furthermore, \texttt{@} should
not be used in this code. Spatch scans the script code for the next
\texttt{@} and considers that to be the beginning of the next rule, even if
the \texttt{@} occurs within, {\em e.g.}, a comment.

The \texttt{virtual} keyword is used to declare virtual rules. Virtual
rules may be subsequently used as a dependency for the rules in the
SmPL file. Whether a virtual rule is defined or not is controlled by
the \texttt{-D} option on the command line.
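
As a small illustration of this mechanism, the following sketch declares a
virtual rule and uses it as a dependency. The names {\tt use\_new\_api},
{\tt old\_call}, and {\tt new\_call} are hypothetical and are not taken from
the Coccinelle distribution.

\begin{quote}
\begin{verbatim}
virtual use_new_api

@ depends on use_new_api @
expression E;
@@

- old_call(E);
+ new_call(E);
\end{verbatim}
\end{quote}

\noindent
Under these assumptions, the rule should be applied only when
{\tt use\_new\_api} is defined, for example by passing
{\tt -D use\_new\_api} on the command line; otherwise the dependency is not
satisfied and the rule is skipped.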

% Between the metavariables and the transformation rule, there can be a
% specification of constraints on the names of the old and new files,
% analogous to the filename specifications in the standard patch syntax.
% (see Figure \ref{scsiglue_patch}).

\section{Metavariables for transformations}

The \NT{rulename} portion of the metavariable declaration can specify
properties of a rule such as its name, the names of the rules that it
depends on, the isomorphisms to be used in processing the rule, and whether
quantification over paths should be universal or existential. The optional
annotation {\tt expression} indicates that the pattern is to be considered
as matching an expression, and thus can be used to avoid some parsing
problems.

The \NT{metadecl} portion of the metavariable declaration defines various
types of metavariables that will be used for matching in the transformation
section.

\begin{grammar}
  \RULE{\rt{metavariables}}
  \CASE{@@ \any{\NT{metadecl}} @@}
  \CASE{@ \NT{rulename} @ \any{\NT{metadecl}} @@}

  \RULE{\rt{rulename}}
  \CASE{\T{id} \OPT{extends \T{id}} \OPT{depends on \NT{dep}} \opt{\NT{iso}}
    \opt{\NT{disable-iso}} \opt{\NT{exists}} \opt{expression}}

  \RULE{\rt{dep}}
  \CASE{\T{id}}
  \CASE{!\T{id}}
  \CASE{!(\NT{dep})}
  \CASE{ever \T{id}}
  \CASE{never \T{id}}
  \CASE{\NT{dep} \&\& \NT{dep}}
  \CASE{\NT{dep} || \NT{dep}}
  \CASE{(\NT{dep})}

  \RULE{\rt{iso}}
  \CASE{using \NT{string} \ANY{, \NT{string}}}

  \RULE{\rt{disable-iso}}
  \CASE{disable \NT{COMMA\_LIST}\mth{(}\T{id}\mth{)}}

  \RULE{\rt{exists}}
  \CASE{exists}
  \CASE{forall}
% \CASE{\opt{reverse} forall}

  \RULE{\rt{COMMA\_LIST}\mth{(}\rt{elem}\mth{)}}
  \CASE{\NT{elem} \ANY{, \NT{elem}}}
\end{grammar}

The keyword \KW{disable} is normally used with the names of
isomorphisms defined in standard.iso or whatever isomorphism file has been
included. There are, however, some other isomorphisms that are built into
the implementation of Coccinelle and that can be disabled as well. Their
names are given below. In each case, the text describes the standard
behavior. Using \NT{disable-iso} with the given name disables this behavior.

\begin{itemize}
\item \KW{optional\_storage}: A SmPL function definition that does not
  specify any visibility (i.e., static or extern), or a SmPL variable
  declaration that does not specify any storage (i.e., auto, static,
  register, or extern), matches a function declaration or variable
  declaration with any visibility or storage, respectively.
\item \KW{optional\_qualifier}: This is similar to \KW{optional\_storage},
  except that here it is the qualifier (i.e., const or volatile) that does
  not have to be specified in the SmPL code, but may be present in the C code.
\item \KW{value\_format}: Integers in various formats, e.g., 1 and 0x1, are
  considered to be equivalent in the matching process.
\item \KW{optional\_declarer\_semicolon}: Some declarers (top-level terms
  that look like function calls but serve to declare some variable) don't
  require a semicolon. This isomorphism allows a SmPL declarer with a semicolon
  to match such a C declarer, if no transformation is specified on the SmPL
  semicolon.
\item \KW{comm\_assoc}: An expression of the form \NT{exp} \NT{bin\_op}
  \KW{...}, where \NT{bin\_op} is commutative and associative, is
  considered to match any top-level sequence of \NT{bin\_op} operators
  containing \NT{exp} as the top-level argument.
\end{itemize}
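
For example, a rule header may combine a dependency with the disabling of a
built-in isomorphism. The sketch below is only illustrative; the rule names
{\tt r} and {\tt s} and the function name {\tt foo} are hypothetical.

\begin{quote}
\begin{verbatim}
@ r @
@@

foo(...)

@ s depends on r disable optional_storage @
@@

foo() { ... }
\end{verbatim}
\end{quote}

\noindent
With \KW{optional\_storage} disabled, rule {\tt s} is expected to match only
a definition of {\tt foo} that has no storage specifier, and it is
considered only if rule {\tt r} has matched.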

The possible types of metavariable declarations are defined by the grammar
rule below. Metavariables should occur at least once in the transformation
immediately following their declaration. Fresh identifier metavariables
must only be used in {\tt +} code. These properties are not expressed in
the grammar, but are checked by a subsequent analysis. The metavariables
are designated according to the kind of terms they can match, such as a
statement, an identifier, or an expression. An expression metavariable can
be further constrained by its type. A declaration metavariable matches the
declaration of one or more variables, all sharing the same type
specification ({\em e.g.}, {\tt int a,b,c=3;}). A field metavariable does
the same, but for structure fields.

\begin{grammar}
  \RULE{\rt{metadecl}}
  \CASE{metavariable \NT{ids} ;}
  \CASE{fresh identifier \NT{ids} ;}
  \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
  \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_virt\_or\_not\_eq}\mth{)} ;}
  \CASE{parameter \opt{list} \NT{ids} ;}
  \CASE{parameter list [ \NT{id} ] \NT{ids} ;}
  \CASE{parameter list [ \NT{const} ] \NT{ids} ;}
  \CASE{type \NT{ids} ;}
  \CASE{statement \opt{list} \NT{ids} ;}
  \CASE{declaration \NT{ids} ;}
  \CASE{field \opt{list} \NT{ids} ;}
  \CASE{typedef \NT{ids} ;}
  \CASE{declarer name \NT{ids} ;}
% \CASE{\opt{local} function \NT{pmid\_with\_not\_eq\_list} ;}
  \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
  \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{iterator name \NT{ids} ;}
  \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
  \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
% \CASE{error \NT{pmid\_with\_not\_eq\_list} ; }
  \CASE{\opt{local} idexpression \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{\opt{local} idexpression \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{\opt{local} idexpression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{expression list \NT{ids} ;}
  \CASE{expression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{expression enum \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{expression struct \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{expression union \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{expression \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
  \CASE{expression list [ \NT{id} ] \NT{ids} ;}
  \CASE{expression list [ \NT{const} ] \NT{ids} ;}
  \CASE{\NT{ctype} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{\NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
  \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
  \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{constant \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{constant \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
  \CASE{position \opt{any} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq\_mid}\mth{)} ;}
  \CASE{symbol \NT{ids};}
\end{grammar}

A metavariable declaration {\tt local idexpression v} means that {\tt v} is
restricted to be a local variable. If it should just be a variable, but not
necessarily a local one, then drop {\tt local}. A more complex description
of a location, such as {\tt a->b}, is considered to be an expression, not an
idexpression.
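
A minimal sketch of the difference follows; the function name {\tt check} is
hypothetical.

\begin{quote}
\begin{verbatim}
@@
local idexpression int x;
expression E;
@@

- x = E;
+ x = check(E);
\end{verbatim}
\end{quote}

\noindent
As declared here, {\tt x} should match only a local variable of type
{\tt int}, so an assignment to a global variable, or to a location such as
{\tt a->b}, is not expected to be transformed.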

Constant is for constants, such as 27. But it also considers an identifier
that is all capital letters (possibly containing numbers) as a constant as
well, because the names given to macros in Linux usually have this form.

An identifier is the name of a structure field, a macro, a function, or a
variable. It is the name of something rather than an expression that has a
value. But an identifier can be used in the position of an expression as
well, where it represents a variable.

It is possible to specify that an expression list or a parameter list
metavariable should match a specific number of expressions or parameters.
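
For instance, assuming a hypothetical function {\tt foo}, the following
sketch is intended to match only calls to {\tt foo} that have exactly two
arguments:

\begin{quote}
\begin{verbatim}
@@
expression list[2] es;
@@

- foo(es)
+ foo(es, 0)
\end{verbatim}
\end{quote}

\noindent
Following the \NT{id} case of the grammar, the length may also be given by
an identifier rather than by a constant.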

It is possible to specify some information about the definition of a fresh
identifier. See the wiki.

A symbol declaration specifies that the provided identifiers should be
considered C identifiers when encountered in the body of the rule.
Identifiers in the body of the rule that are not declared explicitly are
by default considered symbols, thus symbol declarations are optional.

A position metavariable is used by attaching it using \texttt{@} to any
token, including another metavariable. Its value is the position (file,
line number, etc.) of the code matched by the token. It is also possible
to attach expression, declaration, type, initialiser, and statement
metavariables in this manner. In that case, the metavariable is bound to
the closest enclosing expression, declaration, etc. If such a metavariable
is itself followed by a position metavariable, the position metavariable
applies to the metavariable that it follows, and not to the attached token.
This makes it possible to get, {\em e.g.}, the starting and ending position
of {\tt f(...)}, by writing {\tt f(...)@E@p}, for expression metavariable
{\tt E} and position metavariable {\tt p}.
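
The example mentioned above can be written as a complete rule as follows.
This is only a sketch; the rule name {\tt r} is hypothetical.

\begin{quote}
\begin{verbatim}
@ r @
expression E;
position p;
@@

f(...)@E@p
\end{verbatim}
\end{quote}

\noindent
Here {\tt E} is bound to the call expression that encloses the closing
parenthesis, and {\tt p} records the position of {\tt E}, rather than the
position of the parenthesis alone.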

\begin{grammar}
  \RULE{\rt{ids}}
  \CASE{\NT{COMMA\_LIST}\mth{(}\NT{pmid}\mth{)}}

  \RULE{\rt{pmid}}
  \CASE{\T{id}}
  \CASE{\NT{mid}}
% \CASE{list}
% \CASE{error}
% \CASE{type}

  \RULE{\rt{mid}} \CASE{\T{rulename\_id}.\T{id}}

  \RULE{\rt{pmid\_with\_regexp}}
  \CASE{\NT{pmid} =\~{} \NT{regexp}}
  \CASE{\NT{pmid} !\~{} \NT{regexp}}

  \RULE{\rt{pmid\_with\_not\_eq}}
  \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_meta}}}
  \CASE{\NT{pmid}
    \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_meta}\mth{)} \ttrb}}

  \RULE{\rt{pmid\_with\_virt\_or\_not\_eq}}
  \CASE{virtual.\T{id}}
  \CASE{\NT{pmid\_with\_not\_eq}}

  \RULE{\rt{pmid\_with\_not\_ceq}}
  \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_cst}}}
  \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_cst}\mth{)} \ttrb}}

  \RULE{\rt{id\_or\_cst}}
  \CASE{\T{id}}
  \CASE{\T{integer}}

  \RULE{\rt{id\_or\_meta}}
  \CASE{\T{id}}
  \CASE{\T{rulename\_id}.\T{id}}

  \RULE{\rt{pmid\_with\_not\_eq\_mid}}
  \CASE{\NT{pmid} \OPT{!= \NT{mid}}}
  \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{mid}\mth{)} \ttrb}}
\end{grammar}
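
The constraint forms above might be used as in the following sketch, in
which the regular expression and the function names are hypothetical:

\begin{quote}
\begin{verbatim}
@@
identifier f =~ "alloc";
identifier g != {main, init};
expression E != 0;
@@

g(...) {
  <...
* f(E, ...)
  ...>
}
\end{verbatim}
\end{quote}

\noindent
Under these assumptions, {\tt f} matches only identifiers accepted by the
regular expression, {\tt g} matches any identifier other than {\tt main} and
{\tt init}, and {\tt E} matches any expression other than the constant
{\tt 0}.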

Subsequently, we refer to arbitrary metavariables as
\mth{\msf{metaid}^{\mbox{\scriptsize{\it{ty}}}}}, where {\it{ty}}
indicates the {\it metakind} used in the declaration of the variable.
For example, \mth{\msf{metaid}^{\ssf{Type}}} refers to a metavariable
that was declared using \texttt{type} and stands for any type.

{\tt metavariable} declares a metavariable for which the parser tries to
figure out the metavariable type based on the usage context. Such a
metavariable must be used consistently. These metavariables cannot be used
in all contexts; specifically, they cannot be used in contexts that would
make the parsing ambiguous. Some examples are the leftmost term of an
expression, such as the left-hand side of an assignment, or the type in a
variable declaration. These restrictions may seem somewhat arbitrary from
the user's point of view. Thus, it is better to use metavariables with
explicit metavariable types. If Coccinelle is given the argument {\tt
  -parse\_cocci}, it will print information about the type that is inferred
for each metavariable.

The \NT{ctype} and \NT{ctypes} nonterminals are used by both the grammar of
metavariable declarations and the grammar of transformations, and are
defined on page~\pageref{types}.

An identifier metavariable with {\tt virtual} as its ``rule name'' is given
a value on the command line. For example, if a semantic patch contains a
rule that declares an identifier metavariable with the name {\tt
  virtual.alloc}, then the command line could contain {\tt -D
  alloc=kmalloc}. There should be no space around the {\tt =}. An
example is in {\tt demos/vm.cocci} and {\tt demos/vm.c}.
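
A sketch of such a rule is shown below. It follows the {\tt virtual.alloc}
example in the text and is not copied from {\tt demos/vm.cocci}.

\begin{quote}
\begin{verbatim}
@@
identifier virtual.alloc;
expression E;
@@

* alloc(E, ...)
\end{verbatim}
\end{quote}

\noindent
With {\tt -D alloc=kmalloc} on the command line, this rule is expected to
highlight calls to {\tt kmalloc}.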


\paragraph*{Warning:} Each metavariable declaration causes the declared
metavariables to be immediately usable, without any inheritance
indication. Thus the following are correct:

\begin{quote}
\begin{verbatim}
@@
type r.T;
T x;
@@

[...] // some semantic patch code
\end{verbatim}
\end{quote}

\begin{quote}
\begin{verbatim}
@@
r.T x;
type r.T;
@@

[...] // some semantic patch code
\end{verbatim}
\end{quote}

\noindent
But the following is not correct:

\begin{quote}
\begin{verbatim}
@@
type r.T;
r.T x;
@@

[...] // some semantic patch code
\end{verbatim}
\end{quote}

This applies to position variables, type metavariables, identifier
metavariables that may be used in specifying a structure type, and
metavariables used in the initialization of a fresh identifier. In the
case of a structure type, any identifier metavariable indeed has to be
declared as an identifier metavariable in advance. The syntax does not
permit {\tt r.n} as the name of a structure or union type in such a
declaration.

\section{Metavariables for scripts}

Metavariables for scripts can only be inherited from transformation rules.
In the spirit of scripting languages such as Python that use dynamic
typing, metavariables for scripts do not include type declarations.
\begin{grammar}
  \RULE{\rt{script\_metavariables}}
  \CASE{@ script:\NT{language} \OPT{\NT{rulename}} \OPT{depends on \NT{dep}} @
        \any{\NT{script\_metadecl}} @@}
  \CASE{@ initialize:\NT{language} \OPT{depends on \NT{dep}} @}
  \CASE{@ finalize:\NT{language} \OPT{depends on \NT{dep}} @}

  \RULE{\rt{language}} \CASE{python} \CASE{ocaml}

  \RULE{\rt{script\_metadecl}}
  \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
  \CASE{\T{id} ;}
\end{grammar}

Currently, the only scripting languages that are supported are Python and
OCaml, indicated using {\tt python} and {\tt ocaml}, respectively. The
set of available scripting languages may be extended at some point.

Script rules declared with \KW{initialize} are run before the treatment of
any file. Script rules declared with \KW{finalize} are run when the
treatment of all of the files has completed. There can be at most one of
each per scripting language (thus currently at most one of each).
Initialize and finalize script rules do not have access to SmPL
metavariables. Nevertheless, a finalize script rule can access any
variables initialized by the other script rules, allowing information to be
transmitted from the matching process to the finalize rule.

A script metavariable that does not specify an origin, using \texttt{<<},
is newly declared by the script. This metavariable should be assigned to a
string and can be inherited by subsequent rules as an identifier. In
Python, the assignment of such a metavariable $x$ should refer to the
metavariable as {\tt coccinelle.\(x\)}. Examples are in the files
\texttt{demos/pythontococci.cocci} and \texttt{demos/camltococci.cocci}.
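
The following sketch, modeled loosely on the description above rather than
copied from the demos, shows a Python script rule that inherits one
metavariable and declares a new one. The rule names {\tt a}, {\tt b}, and
{\tt c} and the function names {\tt foo} and {\tt bar} are hypothetical.

\begin{quote}
\begin{verbatim}
@ a @
identifier x;
@@

foo(x);

@ script:python b @
x << a.x;
y;
@@

coccinelle.y = "new_%s" % (x)

@ c @
identifier a.x;
identifier b.y;
@@

- bar(x);
+ bar(y);
\end{verbatim}
\end{quote}

\noindent
Rule {\tt b} assigns a string to {\tt y} through {\tt coccinelle.y}, and
rule {\tt c} then inherits {\tt y} as an identifier. Note that the script
code contains no \texttt{@}.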

In an OCaml script, the following extended form of \textit{script\_metadecl}
may be used:

\begin{grammar}
  \RULE{\rt{script\_metadecl}}
  \CASE{(\T{id},\T{id}) <{}< \T{rulename\_id}.\T{id} ;}
  \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
  \CASE{\T{id} ;}
\end{grammar}

\noindent
In a declaration of the form \texttt{(\T{id},\T{id}) <{}<
  \T{rulename\_id}.\T{id} ;}, the left component of \texttt{(\T{id},\T{id})}
receives a string representation of the value of the inherited metavariable
while the right component receives its abstract syntax tree. The file
\texttt{parsing\_c/ast\_c.ml} in the Coccinelle implementation gives some
information about the structure of the abstract syntax tree. Either the
left or right component may be replaced by \verb+_+, indicating that the
string representation or abstract syntax tree representation is not
wanted, respectively.

The abstract syntax tree of a metavariable declared using {\tt
  metavariable} is not available.

\section{Transformation}

The transformation specification essentially has the form of C code, except
that lines to remove are annotated with \verb+-+ in the first column, and
lines to add are annotated with \verb-+-. A transformation specification
can also use {\em dots}, ``\verb-...-'', describing an arbitrary sequence
of function arguments or instructions within a control-flow path.
Implicitly, ``\verb-...-'' matches the shortest path between something that
matches the pattern before the dots (or the beginning of the function, if
there is nothing before the dots) and something that matches the pattern
after the dots (or the end of the function, if there is nothing after the
dots). Dots may be modified with a {\tt when} clause, indicating a pattern
that should not occur anywhere within the matched sequence. {\tt when any}
removes the aforementioned constraint that ``\verb-...-'' matches the
shortest path. Finally, a transformation can specify a disjunction of
patterns, of the form \mtt{( \mth{\mita{pat}_1} | \mita{\ldots} |
  \mth{\mita{pat}_n} )} where each \texttt{(}, \texttt{|} or \texttt{)} is
in column 0 or preceded by \texttt{\textbackslash}.
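
As an illustration, the following sketch combines a disjunction with dots
and a {\tt when} clause. All of the function names are hypothetical.

\begin{quote}
\begin{verbatim}
@@
expression x;
@@

(
- old_get(x);
+ new_get(x);
|
- old_fetch(x);
+ new_fetch(x);
)
  ... when != release(x)
  use(x);
\end{verbatim}
\end{quote}

\noindent
The rule is intended to rewrite a call to {\tt old\_get} or
{\tt old\_fetch} only when the control-flow paths from that call reach the
statement {\tt use(x);} without passing through a call to
{\tt release(x)}.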

The grammar that we present for the transformation is not actually the
grammar of the SmPL code that can be written by the programmer, but is
instead the grammar of the slice of this code consisting of the {\tt -}
annotated and the unannotated code (the context of the transformed lines),
or the {\tt +} annotated code and the unannotated code. For example, for
parsing purposes, the following transformation
%presented in Section \ref{sec:seq2}
is split into the two variants shown below and each is parsed
separately.

\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
  proc_info_func(...) {
    <...
@--    hostno
@++    hostptr->host_no
    ...>
  }
\end{lstlisting}\\
\end{tabular}
\end{center}

{%\sizecodebis
\begin{center}
\begin{tabular}{p{5cm}p{3cm}p{5cm}}
\begin{lstlisting}[language=Cocci]
  proc_info_func(...) {
    <...
@--    hostno
    ...>
  }
\end{lstlisting}
&&
\begin{lstlisting}[language=Cocci]
  proc_info_func(...) {
    <...
@++    hostptr->host_no
    ...>
  }
\end{lstlisting}
\end{tabular}
\end{center}
}

\noindent
Requiring that both slices parse correctly ensures that the rule matches
syntactically valid C code and that it produces syntactically valid C code.
The generated parse trees are then merged for use in the subsequent
matching and transformation process.

The grammar for the minus or plus slice of a transformation is as follows:

\begin{grammar}

  \RULE{\rt{transformation}}
  \CASE{\some{\NT{include}}}
  \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}}
  \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
  \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{fundecl}, \NT{when}\mth{)}}

  \RULE{\rt{include}}
  \CASE{\#include \T{include\_string}}

% \RULE{\rt{fun\_decl\_stmt}}
% \CASE{\NT{decl\_stmt}}
% \CASE{\NT{fundecl}}

% \CASE{\NT{ctype}}
% \CASE{\ttlb \NT{initialize\_list} \ttrb}
% \CASE{\NT{toplevel\_seq\_start\_after\_dots\_init}}
%
% \RULE{\rt{toplevel\_seq\_start\_after\_dots\_init}}
% \CASE{\NT{stmt\_dots} \NT{toplevel\_after\_dots}}
% \CASE{\NT{expr} \opt{\NT{toplevel\_after\_exp}}}
% \CASE{\NT{decl\_stmt\_expr} \opt{\NT{toplevel\_after\_stmt}}}
%
% \RULE{\rt{stmt\_dots}}
% \CASE{... \any{\NT{when}}}
% \CASE{<... \any{\NT{when}} \NT{nest\_after\_dots} ...>}
% \CASE{<+... \any{\NT{when}} \NT{nest\_after\_dots} ...+>}

  \RULE{\rt{when}}
  \CASE{when != \NT{when\_code}}
  \CASE{when = \NT{rule\_elem\_stmt}}
  \CASE{when \NT{COMMA\_LIST}\mth{(}\NT{any\_strict}\mth{)}}
  \CASE{when true != \NT{expr}}
  \CASE{when false != \NT{expr}}

  \RULE{\rt{when\_code}}
  \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
  \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}}

  \RULE{\rt{rule\_elem\_stmt}}
  \CASE{\NT{one\_decl}}
  \CASE{\NT{expr};}
  \CASE{return \opt{\NT{expr}};}
  \CASE{break;}
  \CASE{continue;}
  \CASE{\bs(\NT{rule\_elem\_stmt} \SOME{\bs| \NT{rule\_elem\_stmt}}\bs)}

  \RULE{\rt{any\_strict}}
  \CASE{any}
  \CASE{strict}
  \CASE{forall}
  \CASE{exists}

% \RULE{\rt{nest\_after\_dots}}
% \CASE{\NT{decl\_stmt\_exp} \opt{\NT{nest\_after\_stmt}}}
% \CASE{\opt{\NT{exp}} \opt{\NT{nest\_after\_exp}}}
%
% \RULE{\rt{nest\_after\_stmt}}
% \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}}
% \CASE{\NT{decl\_stmt} \opt{\NT{nest\_after\_stmt}}}
%
% \RULE{\rt{nest\_after\_exp}}
% \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}}
%
% \RULE{\rt{toplevel\_after\_dots}}
% \CASE{\opt{\NT{toplevel\_after\_exp}}}
% \CASE{\NT{exp} \opt{\NT{toplevel\_after\_exp}}}
% \CASE{\NT{decl\_stmt\_expr} \NT{toplevel\_after\_stmt}}
%
% \RULE{\rt{toplevel\_after\_exp}}
% \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}}
%
% \RULE{\rt{decl\_stmt\_expr}}
% \CASE{TMetaStmList$^\ddag$}
% \CASE{\NT{decl\_var}}
% \CASE{\NT{stmt}}
% \CASE{(\NT{stmt\_seq} \ANY{| \NT{stmt\_seq}})}
%
% \RULE{\rt{toplevel\_after\_stmt}}
% \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}}
% \CASE{\NT{decl\_stmt} \NT{toplevel\_after\_stmt}}

\end{grammar}

\begin{grammar}
  \RULE{\rt{OPTDOTSEQ}\mth{(}\rt{grammar\_ds}, \rt{when\_ds}\mth{)}}
  \CASE{}\multicolumn{3}{r}{\hspace{1cm}
  \KW{\opt{... \ANY{\NT{when\_ds}}} \NT{grammar\_ds}
    \ANY{... \ANY{\NT{when\_ds}} \NT{grammar\_ds}}
    \opt{... \ANY{\NT{when\_ds}}}}
  }

% \CASE{\opt{... \opt{\NT{when\_ds}}} \NT{grammar}
% \ANY{... \opt{\NT{when\_ds}} \NT{grammar}}
% \opt{... \opt{\NT{when\_ds}}}}
% \CASE{<... \any{\NT{when\_ds}} \NT{grammar} ...>}
% \CASE{<+... \any{\NT{when\_ds}} \NT{grammar} ...+>}

\end{grammar}

\noindent
Lines may be annotated with an element of the set $\{\mtt{-}, \mtt{+},
\mtt{*}\}$ or the singleton $\mtt{?}$, or one of each set. \mtt{?}
represents at most one match of the given pattern, {\em i.e.}, a match of
the pattern is optional. \mtt{*} is used for
semantic match, \emph{i.e.}, a pattern that highlights the fragments
annotated with \mtt{*}, but does not perform any modification of the
matched code. \mtt{*} cannot be mixed with \mtt{-} and \mtt{+}. There are
some constraints on the use of these annotations:
\begin{itemize}
\item Dots, {\em i.e.}, \texttt{...}, cannot occur on a line marked
  \texttt{+}.
\item Nested dots, {\em i.e.}, dots enclosed in {\tt <} and {\tt >}, cannot
  occur on a line with any marking.
\end{itemize}
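
For example, the following sketch uses \mtt{*} to report, without modifying
the code, calls to a hypothetical function {\tt foo} whose result is
discarded:

\begin{quote}
\begin{verbatim}
@@
@@

* foo(...);
\end{verbatim}
\end{quote}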

Each element of a disjunction must be a proper term like an
expression, a statement, an identifier or a declaration. Thus, the
rule on the left below is not a syntactically correct SmPL rule. One may
use the rule on the right instead.

\begin{center}
  \begin{tabular}{l@{\hspace{5cm}}r}
\begin{lstlisting}[language=Cocci]
@@
type T;
T b;
@@

(
 writeb(...,
|
 readb(...,
)
@--(T)
 b)
\end{lstlisting}
    &
\begin{lstlisting}[language=Cocci]
@@
type T;
T b;
@@

(
read
|
write
)
 (...,
@-- (T)
 b)
\end{lstlisting}
    \\
  \end{tabular}
\end{center}

Some kinds of terms can only appear in {\tt +} code. These include comments,
ifdefs, and attributes (\texttt{\_\_attribute\_\_((...))}).

\section{Types}
\label{types}

\begin{grammar}

  \RULE{\rt{ctypes}}
  \CASE{\NT{COMMA\_LIST}\mth{(}\NT{ctype}\mth{)}}

  \RULE{\rt{ctype}}
  \CASE{\opt{\NT{const\_vol}} \NT{generic\_ctype} \any{*}}
  \CASE{\opt{\NT{const\_vol}} void \some{*}}
  \CASE{(\NT{ctype} \ANY{| \NT{ctype}})}

  \RULE{\rt{const\_vol}}
  \CASE{const}
  \CASE{volatile}

  \RULE{\rt{generic\_ctype}}
  \CASE{\NT{ctype\_qualif}}
  \CASE{\opt{\NT{ctype\_qualif}} char}
  \CASE{\opt{\NT{ctype\_qualif}} short}
  \CASE{\opt{\NT{ctype\_qualif}} short int}
  \CASE{\opt{\NT{ctype\_qualif}} int}
  \CASE{\opt{\NT{ctype\_qualif}} long}
  \CASE{\opt{\NT{ctype\_qualif}} long int}
  \CASE{\opt{\NT{ctype\_qualif}} long long}
  \CASE{\opt{\NT{ctype\_qualif}} long long int}
  \CASE{double}
  \CASE{long double}
  \CASE{float}
  \CASE{size\_t} \CASE{ssize\_t} \CASE{ptrdiff\_t}
  \CASE{enum \NT{id} \{ \NT{PARAMSEQ}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)} \OPT{,} \}}
  \CASE{\OPT{struct\OR union} \T{id} \OPT{\{ \any{\NT{struct\_decl\_list}} \}}}

  \RULE{\rt{ctype\_qualif}}
  \CASE{unsigned}
  \CASE{signed}

  \RULE{\rt{struct\_decl\_list}}
  \CASE{\NT{struct\_decl\_list\_start}}

  \RULE{\rt{struct\_decl\_list\_start}}
  \CASE{\NT{struct\_decl}}
  \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}}
  \CASE{... \opt{when != \NT{struct\_decl}}$^\dag$ \opt{\NT{continue\_struct\_decl\_list}}}

  \RULE{\rt{continue\_struct\_decl\_list}}
  \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}}
  \CASE{\NT{struct\_decl}}

  \RULE{\rt{struct\_decl}}
  \CASE{\NT{ctype} \NT{d\_ident};}
  \CASE{\NT{fn\_ctype} (* \NT{d\_ident}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)});)}
  \CASE{\opt{\NT{const\_vol}} \T{id} \NT{d\_ident};}

  \RULE{\rt{d\_ident}}
  \CASE{\T{id} \any{[\opt{\NT{expr}}]}}

  \RULE{\rt{fn\_ctype}}
  \CASE{\NT{generic\_ctype} \any{*}}
  \CASE{void \any{*}}

  \RULE{\rt{name\_opt\_decl}}
  \CASE{\NT{decl}}
  \CASE{\NT{ctype}}
  \CASE{\NT{fn\_ctype}}
\end{grammar}

$^\dag$ The optional \texttt{when} construct ends at the end of the line.

\section{Function declarations}

\begin{grammar}

  \RULE{\rt{fundecl}}
  \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid}
    (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}})
    \ttlb~\opt{\NT{stmt\_seq}} \ttrb}

  \RULE{\rt{funproto}}
  \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid}
    (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}});}

  \RULE{\rt{funinfo}}
  \CASE{inline}
  \CASE{\NT{storage}}
% \CASE{\NT{attr}}

  \RULE{\rt{storage}}
  \CASE{static}
  \CASE{auto}
  \CASE{register}
  \CASE{extern}

  \RULE{\rt{funid}}
  \CASE{\T{id}}
  \CASE{\mth{\T{metaid}^{\ssf{Id}}}}
  \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}}
% \CASE{\mth{\T{metaid}^{\ssf{Func}}}}
% \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}}

  \RULE{\rt{param}}
  \CASE{\NT{type} \T{id}}
  \CASE{\mth{\T{metaid}^{\ssf{Param}}}}
  \CASE{\mth{\T{metaid}^{\ssf{ParamList}}}}

  \RULE{\rt{decl}}
  \CASE{\NT{ctype} \NT{id}}
  \CASE{\NT{fn\_ctype} (* \NT{id}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)})}
  \CASE{void}
  \CASE{\mth{\T{metaid}^{\ssf{Param}}}}
\end{grammar}

\begin{grammar}
  \RULE{\rt{PARAMSEQ}\mth{(}\rt{gram\_p}, \rt{when\_p}\mth{)}}
  \CASE{\NT{COMMA\_LIST}\mth{(}\NT{gram\_p} \OR \ldots \opt{\NT{when\_p}}\mth{)}}
\end{grammar}

To match a function it is not necessary to provide all of the annotations
that appear before the function name. For example, the following semantic
patch:

\begin{lstlisting}[language=Cocci]
@@
@@

foo() { ... }
\end{lstlisting}

\noindent
matches a function declared as follows:

\begin{lstlisting}[language=C]
static int foo() { return 12; }
\end{lstlisting}
786\noindent
787This behavior can be turned off by disabling the \KW{optional\_storage}
788isomorphism. If one adds code before a function declaration, then the
789effect depends on the kind of code that is added. If the added code is a
790function definition or CPP code, then the new code is placed before
791all information associated with the function definition, including any
comments preceding the function definition. On the other hand, if the new
793code is associated with the function, such as the addition of the keyword
794{\tt static}, the new code is placed exactly where it appears with respect
to the rest of the function definition in the semantic patch. For example,
796
797\begin{lstlisting}[language=Cocci]
798@@
799@@
800
801+ static
802foo() { ... }
803\end{lstlisting}
804
805\noindent
causes {\tt static} to be placed just before the function name. The following
causes it to be placed just before the type:
808
809\begin{lstlisting}[language=Cocci]
810@@
811type T;
812@@
813
814+ static
815T foo() { ... }
816\end{lstlisting}
817
818\noindent
It may be necessary to consider several cases to ensure that the added code
is placed in the right position. For example, one may need one pattern
that considers that the function is declared {\tt inline} and another that
considers that it is not.
823
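As a sketch of how the \KW{optional\_storage} isomorphism mentioned above
might be disabled, a rule header can list the isomorphisms that should not be
applied. The rule below is illustrative only and reuses the function name
\texttt{foo} from the earlier example. With the isomorphism disabled, the
pattern should no longer match \texttt{static int foo()}.

\begin{lstlisting}[language=Cocci]
@disable optional_storage@
@@

foo() { ... }
\end{lstlisting}
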
824%\newpage
825
826\section{Declarations}
827
828\begin{grammar}
829 \RULE{\rt{decl\_var}}
830% \CASE{\NT{type} \opt{\NT{id} \opt{[\opt{\NT{dot\_expr}}]}
831% \ANY{, \NT{id} \opt{[ \opt{\NT{dot\_expr}}]}}};}
832 \CASE{\NT{common\_decl}}
833 \CASE{\opt{\NT{storage}} \NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;}
834 \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;}
835 \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) = \NT{initialize} ;}
836 \CASE{typedef \NT{ctype} \NT{typedef\_ident} ;}
837
838 \RULE{\rt{one\_decl}}
839 \CASE{\NT{common\_decl}}
840 \CASE{\opt{\NT{storage}} \NT{ctype} \NT{id};}
841% \CASE{\NT{storage} \NT{ctype} \NT{id} \opt{[\opt{\NT{dot\\_expr}}]} = \NT{nest\\_expr};}
842 \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} ;}
843
844 \RULE{\rt{common\_decl}}
845 \CASE{\NT{ctype};}
846 \CASE{\NT{funproto}}
847 \CASE{\opt{\NT{storage}} \NT{ctype} \NT{d\_ident} = \NT{initialize} ;}
848 \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} = \NT{initialize} ;}
849 \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) ;}
850 \CASE{\NT{decl\_ident} ( \OPT{\NT{COMMA\_LIST}\mth{(}\NT{expr}\mth{)}} ) ;}
851
852 \RULE{\rt{initialize}}
853 \CASE{\NT{dot\_expr}}
  \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}}
  \CASE{\ttlb~\opt{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb}

857 \RULE{\rt{init\_list\_elem}}
858 \CASE{\NT{dot\_expr}}
  \CASE{\NT{designator} = \NT{initialize}}
860 \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}}
861 \CASE{\mth{\T{metaid}^{\ssf{InitialiserList}}}}
862 \CASE{\NT{id} : \NT{dot\_expr}}
863
864 \RULE{\rt{designator}}
865 \CASE{. \NT{id}}
866 \CASE{[ \NT{dot\_expr} ]}
867 \CASE{[ \NT{dot\_expr} ... \NT{dot\_expr} ]}
868
869 \RULE{\rt{decl\_ident}}
870 \CASE{\T{DeclarerId}}
871 \CASE{\mth{\T{metaid}^{\ssf{Declarer}}}}
872\end{grammar}
873
874An initializer for a structure can be ordered or unordered. It is
875considered to be unordered if there is at least one key-value pair
876initializer, e.g., \texttt{.x = e}.
877
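To illustrate the distinction, the following match-only sketch uses a
key-value pair, so its initializer is treated as unordered. The structure
name \texttt{foo}, the field \texttt{x} and the metavariable names are
hypothetical. Under the rule stated above, the pattern is expected to match
wherever \texttt{.x} appears among the fields of the C initializer.

\begin{lstlisting}[language=Cocci]
@@
identifier s;
expression e;
@@

struct foo s = {
  .x = e,
};
\end{lstlisting}
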
878\section{Statements}
879
The first rule, {\em stmt}, describes the various forms of a statement.
The remaining rules implement the constraints that are sensitive to the
context in which the statement occurs: {\em single\_stmt} for a
context in which only one statement is allowed, and {\em decl\_stmt}
for a context in which a declaration, statement, or sequence thereof is
allowed.
886
887\begin{grammar}
888 \RULE{\rt{stmt}}
889 \CASE{\NT{include}}
890 \CASE{\mth{\T{metaid}^{\ssf{Stmt}}}}
891 \CASE{\NT{expr};}
892 \CASE{if (\NT{dot\_expr}) \NT{single\_stmt} \opt{else \NT{single\_stmt}}}
893 \CASE{for (\opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}})
894 \NT{single\_stmt}}
895 \CASE{while (\NT{dot\_expr}) \NT{single\_stmt}}
896 \CASE{do \NT{single\_stmt} while (\NT{dot\_expr});}
897 \CASE{\NT{iter\_ident} (\any{\NT{dot\_expr}}) \NT{single\_stmt}}
898 \CASE{switch (\opt{\NT{dot\_expr}}) \ttlb \any{\NT{case\_line}} \ttrb}
899 \CASE{return \opt{\NT{dot\_expr}};}
900 \CASE{\ttlb~\opt{\NT{stmt\_seq}} \ttrb}
901 \CASE{\NT{NEST}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
902 \CASE{\NT{NEST}\mth{(}\NT{expr}, \NT{when}\mth{)}}
903 \CASE{break;}
904 \CASE{continue;}
905 \CASE{\NT{id}:}
906 \CASE{goto \NT{id};}
907 \CASE{\ttlb \NT{stmt\_seq} \ttrb}
908
909 \RULE{\rt{single\_stmt}}
910 \CASE{\NT{stmt}}
911 \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}}
912
913 \RULE{\rt{decl\_stmt}}
914 \CASE{\mth{\T{metaid}^{\ssf{StmtList}}}}
915 \CASE{\NT{decl\_var}}
916 \CASE{\NT{stmt}}
917 \CASE{\NT{OR}\mth{(}\NT{stmt\_seq}\mth{)}}
918
919 \RULE{\rt{stmt\_seq}}
920 \CASE{\any{\NT{decl\_stmt}}
921 \opt{\NT{DOTSEQ}\mth{(}\some{\NT{decl\_stmt}},
922 \NT{when}\mth{)} \any{\NT{decl\_stmt}}}}
923 \CASE{\any{\NT{decl\_stmt}}
924 \opt{\NT{DOTSEQ}\mth{(}\NT{expr},
925 \NT{when}\mth{)} \any{\NT{decl\_stmt}}}}
926
927 \RULE{\rt{case\_line}}
928 \CASE{default :~\NT{stmt\_seq}}
929 \CASE{case \NT{dot\_expr} :~\NT{stmt\_seq}}
930
931 \RULE{\rt{iter\_ident}}
932 \CASE{\T{IteratorId}}
933 \CASE{\mth{\T{metaid}^{\ssf{Iterator}}}}
934\end{grammar}
935
936\begin{grammar}
937 \RULE{\rt{OR}\mth{(}\rt{gram\_o}\mth{)}}
938 \CASE{( \NT{gram\_o} \ANY{\ttmid \NT{gram\_o}})}
939
940 \RULE{\rt{DOTSEQ}\mth{(}\rt{gram\_d}, \rt{when\_d}\mth{)}}
941 \CASE{\ldots \opt{\NT{when\_d}} \ANY{\NT{gram\_d} \ldots \opt{\NT{when\_d}}}}
942
943 \RULE{\rt{NEST}\mth{(}\rt{gram\_n}, \rt{when\_n}\mth{)}}
944 \CASE{<\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots>}
945 \CASE{<+\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots+>}
946\end{grammar}
947
948\noindent
OR is a macro that generates a disjunction of patterns. The three
tokens \T{(}, \T{\ttmid}, and \T{)} must appear in the leftmost
column, to differentiate them from the parentheses and bit-or tokens
that can appear within expressions (and cannot appear in the leftmost
column). These tokens may also be preceded by \texttt{\bs}
when they are used in another column. These tokens are furthermore
different from (, \(\mid\), and ), which are part of the grammar
metalanguage.
957
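As an illustration of the two notations, the first rule below writes the
disjunction tokens in the leftmost column. The function names \texttt{foo}
and \texttt{bar} are placeholders.

\begin{lstlisting}[language=Cocci]
@@
expression x;
@@

(
- foo(x);
|
- bar(x);
)
\end{lstlisting}

\noindent
An alternative sketch of the same choice uses the escaped forms, which can
appear inside a line:

\begin{lstlisting}[language=Cocci]
@@
expression x;
@@

- \(foo\|bar\)(x);
\end{lstlisting}
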
958\section{Expressions}
959
960A nest or a single ellipsis is allowed in some expression contexts, and
961causes ambiguity in others. For example, in a sequence \mtt{\ldots
962\mita{expr} \ldots}, the nonterminal \mita{expr} must be instantiated as an
963explicit C-language expression, while in an array reference,
964\mtt{\mth{\mita{expr}_1} \mtt{[} \mth{\mita{expr}_2} \mtt{]}}, the
nonterminal \mth{\mita{expr}_2}, because it is delimited by brackets, can
also be instantiated as \mtt{\ldots}, representing an arbitrary expression. To
967distinguish between the various possibilities, we define three nonterminals
968for expressions: {\em expr} does not allow either top-level nests or
969ellipses, {\em nest\_expr} allows a nest but not an ellipsis, and {\em
970dot\_expr} allows both. The EXPR macro is used to express these variants
971in a concise way.
972
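The following sketch shows two delimited contexts in which an ellipsis or a
nest is unambiguous. The metavariable names and the function name
\texttt{foo} are arbitrary. In the first rule the ellipsis stands for an
arbitrary index expression. In the second, a nest in the condition of an
\texttt{if} matches any condition that contains a call to \texttt{foo} at
least once.

\begin{lstlisting}[language=Cocci]
@@
expression a;
@@

* a[...]

@@
statement S;
@@

* if (<+... foo(...) ...+>) S
\end{lstlisting}
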
973\begin{grammar}
974 \RULE{\rt{expr}}
975 \CASE{\NT{EXPR}\mth{(}\NT{expr}\mth{)}}
976
977 \RULE{\rt{nest\_expr}}
978 \CASE{\NT{EXPR}\mth{(}\NT{nest\_expr}\mth{)}}
979 \CASE{\NT{NEST}\mth{(}\NT{nest\_expr}, \NT{exp\_whencode}\mth{)}}
980
981 \RULE{\rt{dot\_expr}}
982 \CASE{\NT{EXPR}\mth{(}\NT{dot\_expr}\mth{)}}
983 \CASE{\NT{NEST}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)}}
984 \CASE{...~\opt{\NT{exp\_whencode}}}
985
986 \RULE{\rt{EXPR}\mth{(}\rt{exp}\mth{)}}
987 \CASE{\NT{exp} \NT{assign\_op} \NT{exp}}
988 \CASE{\NT{exp}++}
989 \CASE{\NT{exp}--}
990 \CASE{\NT{unary\_op} \NT{exp}}
991 \CASE{\NT{exp} \NT{bin\_op} \NT{exp}}
992 \CASE{\NT{exp} ?~\NT{dot\_expr} :~\NT{exp}}
993 \CASE{(\NT{type}) \NT{exp}}
994 \CASE{\NT{exp} [\NT{dot\_expr}]}
995 \CASE{\NT{exp} .~\NT{id}}
996 \CASE{\NT{exp} -> \NT{id}}
997 \CASE{\NT{exp}(\opt{\NT{PARAMSEQ}\mth{(}\NT{arg}, \NT{exp\_whencode}\mth{)}})}
998 \CASE{\NT{id}}
  \CASE{(\NT{type}) \ttlb~{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb}
1000% \CASE{\mth{\T{metaid}^{\ssf{Func}}}}
1001% \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}}
1002 \CASE{\mth{\T{metaid}^{\ssf{Exp}}}}
1003% \CASE{\mth{\T{metaid}^{\ssf{Err}}}}
1004 \CASE{\mth{\T{metaid}^{\ssf{Const}}}}
1005 \CASE{\NT{const}}
1006 \CASE{(\NT{dot\_expr})}
1007 \CASE{\NT{OR}\mth{(}\NT{exp}\mth{)}}
1008
1009 \RULE{\rt{arg}}
1010 \CASE{\NT{nest\_expr}}
1011 \CASE{\mth{\T{metaid}^{\ssf{ExpList}}}}
1012
1013 \RULE{\rt{exp\_whencode}}
1014 \CASE{when != \NT{expr}}
1015
1016 \RULE{\rt{assign\_op}}
1017 \CASE{= \OR -= \OR += \OR *= \OR /= \OR \%=}
1018 \CASE{\&= \OR |= \OR \caret= \OR \lt\lt= \OR \gt\gt=}
1019
1020 \RULE{\rt{bin\_op}}
1021 \CASE{* \OR / \OR \% \OR + \OR -}
1022 \CASE{\lt\lt \OR \gt\gt \OR \caret\xspace \OR \& \OR \ttmid}
1023 \CASE{< \OR > \OR <= \OR >= \OR == \OR != \OR \&\& \OR \ttmid\ttmid}
1024
1025 \RULE{\rt{unary\_op}}
1026 \CASE{++ \OR -- \OR \& \OR * \OR + \OR - \OR !}
1027
1028\end{grammar}
1029
\section{Constants, Identifiers and Types for Transformations}
1031
1032\begin{grammar}
1033 \RULE{\rt{const}}
1034 \CASE{\NT{string}}
1035 \CASE{[0-9]+}
1036 \CASE{\mth{\cdots}}
1037
1038 \RULE{\rt{string}}
1039 \CASE{"\any{[\^{}"]}"}
1040
1041 \RULE{\rt{id}}
1042 \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Id}}}
1043 \OR {\NT{OR}\mth{(}\NT{stmt}\mth{)}}}
1044
1045 \RULE{\rt{typedef\_ident}}
1046 \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Type}}}}
1047
1048 \RULE{\rt{type}}
1049 \CASE{\NT{ctype} \OR \mth{\T{metaid}^{\ssf{Type}}}}
1050
1051 \RULE{\rt{pathToIsoFile}}
1052 \CASE{<.*>}
1053
1054 \RULE{\rt{regexp}}
1055 \CASE{"\any{[\^{}"]}"}
1056\end{grammar}
1057
\section{Comments and preprocessor directives}
1059
1060A \verb+//+ or \verb+/* */+ comment that is annotated with + in the
1061leftmost column is considered to be added code. A \verb+//+ or
\verb+/* */+ comment without such an annotation is considered to be a
comment about the SmPL code, and thus is not matched in the C code.

1065The following preprocessor directives can likewise be added. They cannot
1066be matched against. The entire line is added, but it is not parsed.
1067
1068\begin{itemize}
1069\item \verb+if+
1070\item \verb+ifdef+
1071\item \verb+ifndef+
1072\item \verb+else+
1073\item \verb+elif+
1074\item \verb+endif+
1075\item \verb+error+
1076\item \verb+pragma+
1077\item \verb+line+
1078\end{itemize}
1079
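A minimal sketch of both kinds of addition follows. The macro name
\texttt{CONFIG\_FOO} and the function \texttt{foo} are hypothetical. The
unannotated comment is a remark about the semantic patch itself, whereas the
lines annotated with \texttt{+} are intended to be inserted into the C code
as written.

\begin{lstlisting}[language=Cocci]
@@
@@

// SmPL comment: not matched against the C code
+ /* foo is only needed when CONFIG_FOO is set */
+#ifdef CONFIG_FOO
  foo();
+#endif
\end{lstlisting}
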
1080\section{Command-line semantic match}
1081
1082It is possible to specify a semantic match on the spatch command line,
1083using the argument {\tt -sp}. In such a semantic match, any token
1084beginning with a capital letter is assumed to be a metavariable of type
1085{\tt metavariable}. In this case, the parser must be able to figure out what
1086kind of metavariable it is. It is also possible to specify the type of a
1087metavariable by enclosing the type in :'s, concatenated directly to the
1088metavariable name.
1089
1090Some examples of semantic matches that can be given as an argument to {\tt
1091 -sp} are as follows:
1092
1093\begin{itemize}
1094\item \texttt{f(e)}: This only matches the expression \texttt{f(e)}.
1095\item \texttt{f(E)}: This matches a call to f with any argument.
1096\item \texttt{F(E)}: This gives a parse error; the semantic patch parser
1097 cannot figure out what kind of metavariable \texttt{F} is.
\item \texttt{F:identifier:(E)}: This matches any one-argument function
  call.
\item \texttt{f:identifier:(e:struct foo *:)}: This matches any
  one-argument function call where the argument has type
  \texttt{struct foo *}. Since the types of the metavariables are specified,
  it is not necessary for the metavariable names to begin with a capital letter.
\item \texttt{F:identifier:(F)}: This matches any one-argument function call
  where the argument is the name of the function itself. This example
  shows that it is not necessary to repeat the metavariable type name.
\item \texttt{F:identifier:(F:identifier:)}: This matches any one-argument
  function call
  where the argument is the name of the function itself. This example
  shows that it is possible to repeat the metavariable type name.
1111\end{itemize}
1112
1113\texttt{When} constraints, \textit{e.g.} \texttt{when != e}, are allowed
1114but the expression \texttt{e} must be represented as a single token.
1115
1116The generated semantic match behaves as though there were a \texttt{*} in front
1117of every token.
1118
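Under these rules, the command-line match \texttt{F:identifier:(E)} can be
read roughly as the semantic match below. This is only a sketch of the
correspondence: it assumes that \texttt{E} is inferred to be an expression
metavariable, and it writes a single \texttt{*} for the line rather than one
per token.

\begin{lstlisting}[language=Cocci]
@@
identifier F;
expression E;
@@

* F(E)
\end{lstlisting}
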
1119%%% Local Variables:
1120%%% mode: LaTeX
%%% TeX-master: "main_grammar"
%%% coding: utf-8
1123%%% TeX-PDF-mode: t
1124%%% ispell-local-dictionary: "american"
1125%%% End: