Commit | Line | Data |
---|---|---|
faf9a90c C |
1 | |
2 | %\section{The SmPL Grammar} | |
3 | ||
4 | % This section presents the SmPL grammar. This definition follows closely | |
5 | % our implementation using the Menhir parser generator \cite{menhir}. | |
6 | ||
7 | This document presents the grammar of the SmPL language used by the | |
7f004419 | 8 | \href{http://coccinelle.lip6.fr/}{Coccinelle tool}. For the most |
faf9a90c C |
9 | part, the grammar is written using standard notation. In some rules, |
10 | however, the left-hand side is in all uppercase letters. These are | |
11 | macros, which take one or more grammar rule right-hand-sides as | |
12 | arguments. The grammar also uses some unspecified nonterminals, such | |
b1b2de81 C |
13 | as \T{id}, \T{const}, etc. These refer to the sets suggested by |
14 | the name, {\em i.e.}, \T{id} refers to the set of possible | |
15 | C-language identifiers, while \T{const} refers to the set of | |
978fd7e5 | 16 | possible C-language constants. |
708f4980 | 17 | % |
978fd7e5 | 18 | \ifhevea |
708f4980 | 19 | A PDF version of this documentation is available at |
951c7801 | 20 | \url{http://coccinelle.lip6.fr/docs/main_grammar.pdf}. |
708f4980 | 21 | \else |
faf9a90c | 22 | An HTML version of this documentation is available online at |
951c7801 | 23 | \url{http://coccinelle.lip6.fr/docs/main_grammar.html}. |
708f4980 | 24 | \fi |
faf9a90c | 25 | |
faf9a90c C |
26 | \section{Program} |
27 | ||
28 | \begin{grammar} | |
29 | \RULE{\rt{program}} | |
30 | \CASE{\any{\NT{include\_cocci}} \some{\NT{changeset}}} | |
31 | ||
32 | \RULE{\rt{include\_cocci}} | |
97111a47 | 33 | \CASE{include \NT{string}} |
faf9a90c C |
34 | \CASE{using \NT{string}} |
35 | \CASE{using \NT{pathToIsoFile}} | |
5636bb2c | 36 | \CASE{virtual \T{id} \ANY{, \T{id}}} |
faf9a90c C |
37 | |
38 | \RULE{\rt{changeset}} | |
39 | \CASE{\NT{metavariables} \NT{transformation}} | |
b1b2de81 | 40 | \CASE{\NT{script\_metavariables} \T{script\_code}} |
faf9a90c | 41 | % \CASE{\NT{metavariables} \ANY{--- filename +++ filename} \NT{transformation}} |
faf9a90c C |
42 | \end{grammar} |
43 | ||
b1b2de81 C |
44 | \noindent |
45 | \T{script\_code} is any code in the chosen scripting language. Parsing of | |
46 | the semantic patch does not check the validity of this code; any errors are | |
978fd7e5 C |
47 | first detected when the code is executed. Furthermore, \texttt{@} should |
48 | not be used in this code. Spatch scans the script code for the next | |
49 | \texttt{@} and considers that to be the beginning of the next rule, even if | |
8babbc8f | 50 | \texttt{@} occurs within, e.g., a comment. |
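
The consequence of this scanning rule can be illustrated with a minimal Python sketch. It is not spatch's implementation: the helper \texttt{split\_at\_next\_rule} and the sample script text are hypothetical, and the sketch only assumes the behavior stated above, namely that the first \texttt{@} ends the script code regardless of where it occurs.

```python
def split_at_next_rule(text):
    """Return (script_code, rest), cutting at the first '@' in text.

    Hypothetical helper: it models the behavior described in the text,
    where the first '@' after a script header ends the script code,
    even if that '@' sits inside a comment or a string.
    """
    cut = text.find("@")
    if cut == -1:
        # No further rule header: everything is script code.
        return text, ""
    return text[:cut], text[cut:]


# Assumed sample script body with an '@' hidden in a comment.
body = 'print("done")  # mail me at user@example.org\nprint("more")\n'
code, rest = split_at_next_rule(body)
print(repr(code))
print(repr(rest))
```

Under this model the script is cut in the middle of the comment, and the remainder would be read as the start of a new rule, which is why \texttt{@} should be avoided throughout the script code.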
b1b2de81 | 51 | |
5636bb2c C |
52 | The \texttt{virtual} keyword is used to declare virtual rules. Virtual |
53 | rules may be subsequently used as a dependency for the rules in the | |
54 | SmPL file. Whether a virtual rule is defined or not is controlled by | |
55 | the \texttt{-D} option on the command line. | |
56 | ||
faf9a90c C |
57 | % Between the metavariables and the transformation rule, there can be a |
58 | % specification of constraints on the names of the old and new files, | |
59 | % analogous to the filename specifications in the standard patch syntax. | |
60 | % (see Figure \ref{scsiglue_patch}). | |
61 | ||
b1b2de81 | 62 | \section{Metavariables for transformations} |
faf9a90c C |
63 | |
64 | The \NT{rulename} portion of the metavariable declaration can specify | |
65 | properties of a rule such as its name, the names of the rules that it | |
66 | depends on, the isomorphisms to be used in processing the rule, and whether | |
67 | quantification over paths should be universal or existential. The optional | |
68 | annotation {\tt expression} indicates that the pattern is to be considered | |
69 | as matching an expression, and thus can be used to avoid some parsing | |
70 | problems. | |
71 | ||
72 | The \NT{metadecl} portion of the metavariable declaration defines various | |
73 | types of metavariables that will be used for matching in the transformation | |
74 | section. | |
75 | ||
76 | \begin{grammar} | |
77 | \RULE{\rt{metavariables}} | |
78 | \CASE{@@ \any{\NT{metadecl}} @@} | |
79 | \CASE{@ \NT{rulename} @ \any{\NT{metadecl}} @@} | |
80 | ||
81 | \RULE{\rt{rulename}} | |
82 | \CASE{\T{id} \OPT{extends \T{id}} \OPT{depends on \NT{dep}} \opt{\NT{iso}} | |
83 | \opt{\NT{disable-iso}} \opt{\NT{exists}} \opt{expression}} | |
b1b2de81 | 84 | |
faf9a90c | 85 | \RULE{\rt{dep}} |
faf9a90c C |
86 | \CASE{\T{id}} |
87 | \CASE{!\T{id}} | |
97111a47 | 88 | \CASE{!(\NT{dep})} |
faf9a90c C |
89 | \CASE{ever \T{id}} |
90 | \CASE{never \T{id}} | |
97111a47 C |
91 | \CASE{\NT{dep} \&\& \NT{dep}} |
92 | \CASE{\NT{dep} || \NT{dep}} | |
faf9a90c C |
93 | \CASE{(\NT{dep})} |
94 | ||
95 | \RULE{\rt{iso}} | |
96 | \CASE{using \NT{string} \ANY{, \NT{string}}} | |
97 | ||
98 | \RULE{\rt{disable-iso}} | |
99 | \CASE{disable \NT{COMMA\_LIST}\mth{(}\T{id}\mth{)}} | |
100 | ||
101 | \RULE{\rt{exists}} | |
102 | \CASE{exists} | |
103 | \CASE{forall} | |
104 | % \CASE{\opt{reverse} forall} | |
105 | ||
106 | \RULE{\rt{COMMA\_LIST}\mth{(}\rt{elem}\mth{)}} | |
107 | \CASE{\NT{elem} \ANY{, \NT{elem}}} | |
108 | \end{grammar} | |
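
To make the shape of \NT{dep} concrete, the following Python sketch evaluates a dependency expression against a set of rule names that are taken to have matched. It is an illustration under stated assumptions, not Coccinelle's code: the function names are hypothetical, \texttt{\&\&} is assumed to bind more tightly than \texttt{||} (the grammar above does not state a precedence), the \texttt{ever} and \texttt{never} forms are left out, and repeated negation is accepted although the grammar does not list it.

```python
import re

# One token: an operator, a parenthesis, or an identifier.
TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Split a dependency expression into a list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text, matched):
    """Evaluate a dependency expression; 'matched' is a set of rule names."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "||":
            advance()
            rhs = parse_and()  # always parse, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&&":
            advance()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = peek()
        if tok is None or tok in ("&&", "||", ")"):
            raise ValueError("expected a rule name, '!' or '('")
        advance()
        if tok == "!":
            return not parse_unary()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return value
        return tok in matched  # plain rule name

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


print(evaluate("a && !b", {"a"}))
print(evaluate("!(a || b)", {"b"}))
```

The third alternative of the grammar, \texttt{!(\NT{dep})}, is what makes the second call meaningful: the negation applies to the whole parenthesized expression, not only to its first rule name.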
109 | ||
b1b2de81 | 110 | The keyword \KW{disable} is normally used with the names of |
faf9a90c C |
111 | isomorphisms defined in standard.iso or whatever isomorphism file has been |
112 | included. There are, however, some other isomorphisms that are built into | |
113 | the implementation of Coccinelle and that can be disabled as well. Their | |
413ffc02 | 114 | names are given below. In each case, the text describes the standard |
faf9a90c C |
115 | behavior. Using \NT{disable-iso} with the given name disables this behavior. |
116 | ||
117 | \begin{itemize} | |
118 | \item \KW{optional\_storage}: A SmPL function definition that does not | |
119 | specify any visibility (i.e., static or extern), or a SmPL variable | |
120 | declaration that does not specify any storage (i.e., auto, static, | |
121 | register, or extern), matches a function declaration or variable | |
122 | declaration with any visibility or storage, respectively. | |
123 | \item \KW{optional\_qualifier}: This is similar to \KW{optional\_storage}, | |
124 | except that here it is the qualifier (i.e., const or volatile) that does | |
125 | not have to be specified in the SmPL code, but may be present in the C code. | |
126 | \item \KW{value\_format}: Integers in various formats, e.g., 1 and 0x1, are | |
127 | considered to be equivalent in the matching process. | |
5427db06 C |
128 | \item \KW{optional\_declarer\_semicolon}: Some declarers (top-level terms |
129 | that look like function calls but serve to declare some variable) don't | |
130 | require a semicolon. This isomorphism allows a SmPL declarer with a semicolon | |
131 | to match such a C declarer, if no transformation is specified on the SmPL | |
132 | semicolon. | |
faf9a90c C |
133 | \item \KW{comm\_assoc}: An expression of the form \NT{exp} \NT{bin\_op} |
134 | \KW{...}, where \NT{bin\_op} is commutative and associative, is | |
135 | considered to match any top-level sequence of \NT{bin\_op} operators | |
136 | containing \NT{exp} as the top-level argument. | |
137 | \end{itemize} | |
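
As an illustration of what \KW{value\_format} amounts to, the following Python sketch compares C integer literals by value rather than by spelling. It is only a model: the function names are hypothetical, and the treatment of octal literals and of the \texttt{u}/\texttt{l} suffixes is an assumption of the sketch, since the text above only mentions 1 and 0x1.

```python
def c_int_value(literal):
    """Return the numeric value of a C integer literal given as a string.

    Assumed scope of the sketch: decimal, octal (leading 0) and
    hexadecimal (leading 0x) literals; u/U and l/L suffixes are ignored.
    """
    text = literal.rstrip("uUlL")
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)


def same_value(smpl_literal, c_literal):
    """True if the two literals denote the same integer."""
    return c_int_value(smpl_literal) == c_int_value(c_literal)


print(same_value("1", "0x1"))
print(same_value("10", "010"))
```

Disabling \KW{value\_format} corresponds, in this model, to comparing the two spellings directly instead of their values.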
138 | ||
139 | The possible types of metavariable declarations are defined by the grammar | |
140 | rule below. Metavariables should occur at least once in the transformation | |
190f1acf C |
141 | immediately following their declaration. Fresh identifier metavariables |
142 | must only be used in {\tt +} code. These properties are not expressed in | |
143 | the grammar, but are checked by a subsequent analysis. The metavariables | |
144 | are designated according to the kind of terms they can match, such as a | |
145 | statement, an identifier, or an expression. An expression metavariable can | |
146 | be further constrained by its type. A declaration metavariable matches the | |
413ffc02 C |
147 | declaration of one or more variables, all sharing the same type |
148 | specification ({\em e.g.}, {\tt int a,b,c=3;}). A field metavariable does | |
149 | the same, but for structure fields. | |
faf9a90c C |
150 | |
151 | \begin{grammar} | |
152 | \RULE{\rt{metadecl}} | |
b23ff9c7 | 153 | \CASE{metavariable \NT{ids} ;} |
faf9a90c | 154 | \CASE{fresh identifier \NT{ids} ;} |
951c7801 | 155 | \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;} |
ae4735db | 156 | \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_virt\_or\_not\_eq}\mth{)} ;} |
faf9a90c C |
157 | \CASE{parameter \opt{list} \NT{ids} ;} |
158 | \CASE{parameter list [ \NT{id} ] \NT{ids} ;} | |
88e71198 | 159 | \CASE{parameter list [ \NT{const} ] \NT{ids} ;} |
faf9a90c C |
160 | \CASE{type \NT{ids} ;} |
161 | \CASE{statement \opt{list} \NT{ids} ;} | |
f537ebc4 | 162 | \CASE{declaration \NT{ids} ;} |
413ffc02 | 163 | \CASE{field \opt{list} \NT{ids} ;} |
faf9a90c C |
164 | \CASE{typedef \NT{ids} ;} |
165 | \CASE{declarer name \NT{ids} ;} | |
166 | % \CASE{\opt{local} function \NT{pmid\_with\_not\_eq\_list} ;} | |
951c7801 | 167 | \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;} |
faf9a90c C |
168 | \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} |
169 | \CASE{iterator name \NT{ids} ;} | |
951c7801 | 170 | \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;} |
faf9a90c C |
171 | \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} |
172 | % \CASE{error \NT{pmid\_with\_not\_eq\_list} ; } | |
173 | \CASE{\opt{local} idexpression \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
174 | \CASE{\opt{local} idexpression \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
175 | \CASE{\opt{local} idexpression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
176 | \CASE{expression list \NT{ids} ;} | |
177 | \CASE{expression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
e6509c05 C |
178 | \CASE{expression enum \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} |
179 | \CASE{expression struct \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
180 | \CASE{expression union \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
faf9a90c | 181 | \CASE{expression \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;} |
88e71198 C |
182 | \CASE{expression list [ \NT{id} ] \NT{ids} ;} |
183 | \CASE{expression list [ \NT{const} ] \NT{ids} ;} | |
faf9a90c C |
184 | \CASE{\NT{ctype} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} |
185 | \CASE{\NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;} | |
186 | \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;} | |
187 | \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
188 | \CASE{constant \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
189 | \CASE{constant \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;} | |
190 | \CASE{position \opt{any} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq\_mid}\mth{)} ;} | |
97111a47 | 191 | \CASE{symbol \NT{ids};} |
faf9a90c C |
192 | \end{grammar} |
193 | ||
190f1acf C |
194 | A metavariable declaration {\tt local idexpression v} means that {\tt v} is restricted |
195 | to be a local variable. If it should just be a variable, but not | |
196 | necessarily a local one, then drop {\tt local}. A more complex description of a | |
197 | location, such as {\tt a->b}, is considered to be an expression, not an | |
198 | idexpression. | |
199 | ||
200 | A {\tt constant} metavariable matches constants, such as 27. It also treats an identifier |
201 | that is all capital letters (possibly containing numbers) as a constant, | |
202 | because the names given to macros in Linux usually have this form. | |
203 | ||
204 | An identifier is the name of a structure field, a macro, a function, or a | |
205 | variable. It is the name of something rather than an expression that has a | |
206 | value. But an identifier can be used in the position of an expression as | |
207 | well, where it represents a variable. | |
208 | ||
88e71198 C |
209 | It is possible to specify that an expression list or a parameter list |
210 | metavariable should match a specific number of expressions or parameters. | |
211 | ||
8babbc8f C |
212 | It is possible to specify some information about the definition of a fresh |
213 | identifier. See the wiki. | |
214 | ||
97111a47 C |
215 | A symbol declaration specifies that the provided identifiers should be |
216 | considered C identifiers when encountered in the body of the rule. | |
217 | Identifiers in the body of the rule that are not declared explicitly are | |
218 | by default considered symbols, so symbol declarations are optional. | |
219 | ||
17ba0788 C |
220 | A position metavariable is used by attaching it using \texttt{@} to any |
221 | token, including another metavariable. Its value is the position (file, | |
222 | line number, etc.) of the code matched by the token. It is also possible | |
223 | to attach expression, declaration, type, initialiser, and statement | |
224 | metavariables in this manner. In that case, the metavariable is bound to | |
225 | the closest enclosing expression, declaration, etc. If such a metavariable | |
226 | is itself followed by a position metavariable, the position metavariable | |
227 | applies to the metavariable that it follows, and not to the attached token. | |
228 | This makes it possible to get, e.g., the starting and ending positions of {\tt | |
229 | f(...)}, by writing {\tt f(...)@E@p}, for expression metavariable {\tt E} | |
230 | and position metavariable {\tt p}. | |
8babbc8f | 231 | |
faf9a90c C |
232 | \begin{grammar} |
233 | \RULE{\rt{ids}} | |
234 | \CASE{\NT{COMMA\_LIST}\mth{(}\NT{pmid}\mth{)}} | |
235 | ||
236 | \RULE{\rt{pmid}} | |
237 | \CASE{\T{id}} | |
238 | \CASE{\NT{mid}} | |
239 | % \CASE{list} | |
240 | % \CASE{error} | |
241 | % \CASE{type} | |
242 | ||
243 | \RULE{\rt{mid}} \CASE{\T{rulename\_id}.\T{id}} | |
244 | ||
951c7801 | 245 | \RULE{\rt{pmid\_with\_regexp}} |
f3c4ece6 | 246 | \CASE{\NT{pmid} =\~{} \NT{regexp}} |
7fe62b65 | 247 | \CASE{\NT{pmid} !\~{} \NT{regexp}} |
951c7801 | 248 | |
faf9a90c | 249 | \RULE{\rt{pmid\_with\_not\_eq}} |
5636bb2c C |
250 | \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_meta}}} |
251 | \CASE{\NT{pmid} | |
252 | \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_meta}\mth{)} \ttrb}} | |
faf9a90c | 253 | |
55d38388 | 254 | \RULE{\rt{pmid\_with\_virt\_or\_not\_eq}} |
ae4735db | 255 | \CASE{virtual.\T{id}} |
55d38388 C |
256 | \CASE{\NT{pmid\_with\_not\_eq}} |
257 | ||
258 | \RULE{\rt{pmid\_with\_not\_ceq}} | |
faf9a90c C |
259 | \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_cst}}} |
260 | \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_cst}\mth{)} \ttrb}} | |
261 | ||
262 | \RULE{\rt{id\_or\_cst}} | |
263 | \CASE{\T{id}} | |
264 | \CASE{\T{integer}} | |
265 | ||
5636bb2c C |
266 | \RULE{\rt{id\_or\_meta}} |
267 | \CASE{\T{id}} | |
268 | \CASE{\T{rulename\_id}.\T{id}} | |
269 | ||
faf9a90c C |
270 | \RULE{\rt{pmid\_with\_not\_eq\_mid}} |
271 | \CASE{\NT{pmid} \OPT{!= \NT{mid}}} | |
272 | \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{mid}\mth{)} \ttrb}} | |
273 | \end{grammar} | |
274 | ||
275 | Subsequently, we refer to arbitrary metavariables as | |
276 | \mth{\msf{metaid}^{\mbox{\scriptsize{\it{ty}}}}}, where {\it{ty}} | |
277 | indicates the {\it metakind} used in the declaration of the variable. | |
278 | For example, \mth{\msf{metaid}^{\ssf{Type}}} refers to a metavariable | |
279 | that was declared using \texttt{type} and stands for any type. | |
280 | ||
b23ff9c7 C |
281 | {\tt metavariable} declares a metavariable for which the parser tries to |
282 | figure out the metavariable type based on the usage context. Such a | |
283 | metavariable must be used consistently. These metavariables cannot be used | |
284 | in all contexts; specifically, they cannot be used in contexts that would | |
285 | make the parsing ambiguous. Some examples are the leftmost term of an | |
286 | expression, such as the left-hand side of an assignment, or the type in a | |
287 | variable declaration. These restrictions may seem somewhat arbitrary from | |
288 | the user's point of view. Thus, it is better to declare metavariables with | |
289 | explicit metavariable types. If Coccinelle is given the argument {\tt | |
290 | -parse\_cocci}, it will print information about the type that is inferred | |
291 | for each metavariable. | |
292 | ||
faf9a90c C |
293 | The \NT{ctype} and \NT{ctypes} nonterminals are used by both the grammar of |
294 | metavariable declarations and the grammar of transformations, and are | |
295 | defined on page~\pageref{types}. | |
296 | ||
ae4735db C |
297 | An identifier metavariable with {\tt virtual} as its ``rule name'' is given |
298 | a value on the command line. For example, if a semantic patch contains a | |
299 | rule that declares an identifier metavariable with the name {\tt | |
300 | virtual.alloc}, then the command line could contain {\tt -D | |
301 | alloc=kmalloc}. There should be no space around the {\tt =}. An | |
302 | example is in {\tt demos/vm.cocci} and {\tt demos/vm.c}. | |
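
The two uses of {\tt -D} described in this document (defining a virtual rule, and giving a value to a {\tt virtual} identifier metavariable) can be modeled with a short Python sketch. This is not spatch's option parser: the function \texttt{parse\_defines} and the rule name \texttt{after\_start} are hypothetical, and the sketch only illustrates why no space should surround the {\tt =}.

```python
def parse_defines(argv):
    """Collect -D options from a command line given as a list of words.

    Returns (rules, bindings): 'rules' holds names given as '-D name',
    'bindings' maps names to values given as '-D name=value'.
    Hypothetical model of the convention described in the text.
    """
    rules, bindings = set(), {}
    i = 0
    while i < len(argv):
        if argv[i] == "-D" and i + 1 < len(argv):
            arg = argv[i + 1]
            i += 2
            if "=" in arg:
                name, value = arg.split("=", 1)
                bindings[name] = value
            else:
                rules.add(arg)
        else:
            i += 1
    return rules, bindings


# Written without spaces, the binding is recognized.
print(parse_defines(["-D", "alloc=kmalloc", "-D", "after_start"]))

# With spaces around '=', the shell produces three separate words,
# so in this model no value is attached to 'alloc'.
print(parse_defines("-D alloc = kmalloc".split()))
```

In the second call the word following {\tt -D} is just {\tt alloc}, so the intended value {\tt kmalloc} is never associated with the metavariable.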
303 | ||
8babbc8f C |
304 | |
305 | \paragraph*{Warning:} Each metavariable declaration causes the declared | |
306 | metavariables to be immediately usable, without any inheritance | |
307 | indication. Thus the following are correct: | |
308 | ||
309 | \begin{quote} | |
310 | \begin{verbatim} | |
311 | @@ | |
312 | type r.T; | |
313 | T x; | |
314 | @@ | |
315 | ||
316 | [...] // some semantic patch code | |
317 | \end{verbatim} | |
318 | \end{quote} | |
319 | ||
320 | \begin{quote} | |
321 | \begin{verbatim} | |
322 | @@ | |
323 | r.T x; | |
324 | type r.T; | |
325 | @@ | |
326 | ||
327 | [...] // some semantic patch code | |
328 | \end{verbatim} | |
329 | \end{quote} | |
330 | ||
331 | \noindent | |
332 | But the following is not correct: | |
333 | ||
334 | \begin{quote} | |
335 | \begin{verbatim} | |
336 | @@ | |
337 | type r.T; | |
338 | r.T x; | |
339 | @@ | |
340 | ||
341 | [...] // some semantic patch code | |
342 | \end{verbatim} | |
343 | \end{quote} | |
344 | ||
345 | This applies to position variables, type metavariables, identifier | |
346 | metavariables that may be used in specifying a structure type, and | |
347 | metavariables used in the initialization of a fresh identifier. In the | |
348 | case of a structure type, any identifier metavariable indeed has to be | |
349 | declared as an identifier metavariable in advance. The syntax does not | |
350 | permit {\tt r.n} as the name of a structure or union type in such a | |
351 | declaration. | |
352 | ||
b1b2de81 C |
353 | \section{Metavariables for scripts} |
354 | ||
355 | Metavariables for scripts can only be inherited from transformation rules. | |
356 | In the spirit of scripting languages such as Python that use dynamic | |
357 | typing, metavariables for scripts do not include type declarations. | |
358 | ||
359 | \begin{grammar} | |
360 | \RULE{\rt{script\_metavariables}} | |
413ffc02 | 361 | \CASE{@ script:\NT{language} \OPT{\NT{rulename}} \OPT{depends on \NT{dep}} @ |
b1b2de81 | 362 | \any{\NT{script\_metadecl}} @@} |
5636bb2c C |
363 | \CASE{@ initialize:\NT{language} \OPT{depends on \NT{dep}} @} |
364 | \CASE{@ finalize:\NT{language} \OPT{depends on \NT{dep}} @} | |
b1b2de81 | 365 | |
413ffc02 | 366 | \RULE{\rt{language}} \CASE{python} \CASE{ocaml} |
b1b2de81 | 367 | |
413ffc02 C |
368 | \RULE{\rt{script\_metadecl}} |
369 | \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;} | |
370 | \CASE{\T{id} ;} | |
b1b2de81 C |
371 | \end{grammar} |
372 | ||
174d1640 C |
373 | Currently, the only scripting languages that are supported are Python and |
374 | OCaml, indicated using {\tt python} and {\tt ocaml}, respectively. The | |
b1b2de81 C |
375 | set of available scripting languages may be extended at some point. |
376 | ||
377 | Script rules declared with \KW{initialize} are run before the treatment of | |
378 | any file. Script rules declared with \KW{finalize} are run when the | |
379 | treatment of all of the files has completed. There can be at most one of | |
380 | each per scripting language (thus currently at most two of each). | |
381 | Initialize and finalize script rules do not have access to SmPL | |
382 | metavariables. Nevertheless, a finalize script rule can access any | |
383 | variables initialized by the other script rules, allowing information to be | |
384 | transmitted from the matching process to the finalize rule. | |
385 | ||
413ffc02 C |
386 | A script metavariable that does not specify an origin, using \texttt{<<}, |
387 | is newly declared by the script. This metavariable should be assigned to a | |
388 | string and can be inherited by subsequent rules as an identifier. In | |
389 | Python, the assignment of such a metavariable $x$ should refer to the | |
390 | metavariable as {\tt coccinelle.\(x\)}. Examples are in the files | |
391 | \texttt{demos/pythontococci.cocci} and \texttt{demos/camltococci.cocci}. | |
392 | ||
393 | In an OCaml script, the following extended form of \textit{script\_metadecl} | |
394 | may be used: | |
395 | ||
396 | \begin{grammar} | |
397 | \RULE{\rt{script\_metadecl}} | |
398 | \CASE{(\T{id},\T{id}) <{}< \T{rulename\_id}.\T{id} ;} | |
399 | \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;} | |
400 | \CASE{\T{id} ;} | |
401 | \end{grammar} | |
402 | ||
403 | \noindent | |
404 | In a declaration of the form \texttt{(\T{id},\T{id}) <{}< | |
405 | \T{rulename\_id}.\T{id} ;}, the left component of \texttt{(\T{id},\T{id})} | |
406 | receives a string representation of the value of the inherited metavariable | |
407 | while the right component receives its abstract syntax tree. The file | |
408 | \texttt{parsing\_c/ast\_c.ml} in the Coccinelle implementation gives some | |
409 | information about the structure of the abstract syntax tree. Either the | |
410 | left or right component may be replaced by \verb+_+, indicating that the | |
411 | string representation or abstract syntax tree representation is not | |
412 | wanted, respectively. | |
413 | ||
b23ff9c7 C |
414 | The abstract syntax tree of a metavariable declared using {\tt |
415 | metavariable} is not available. | |
416 | ||
faf9a90c C |
417 | \section{Transformation} |
418 | ||
97111a47 C |
419 | The transformation specification essentially has the form of C code, except |
420 | that lines to remove are annotated with \verb+-+ in the first column, and | |
421 | lines to add are annotated with \verb-+-. A transformation specification | |
422 | can also use {\em dots}, ``\verb-...-'', describing an arbitrary sequence | |
423 | of function arguments or instructions within a control-flow path. | |
424 | Implicitly, ``\verb-...-'' matches the shortest path between something that | |
425 | matches the pattern before the dots (or the beginning of the function, if | |
426 | there is nothing before the dots) and something that matches the pattern | |
427 | after the dots (or the end of the function, if there is nothing after the | |
428 | dots). Dots may be modified with a {\tt when} clause, indicating a pattern | |
429 | that should not occur anywhere within the matched sequence. {\tt when any} | |
430 | removes the aforementioned constraint that ``\verb-...-'' matches the | |
431 | shortest path. Finally, a transformation can specify a disjunction of | |
432 | patterns, of the form \mtt{( \mth{\mita{pat}_1} | \mita{\ldots} | | |
433 | \mth{\mita{pat}_n} )} where each \texttt{(}, \texttt{|} or \texttt{)} is | |
434 | in column 0 or preceded by \texttt{\textbackslash}. | |
faf9a90c C |
435 | |
436 | The grammar that we present for the transformation is not actually the | |
437 | grammar of the SmPL code that can be written by the programmer, but is | |
438 | instead the grammar of the slice of this consisting of the {\tt -} | |
439 | annotated and the unannotated code (the context of the transformed lines), | |
440 | or the {\tt +} annotated code and the unannotated code. For example, for | |
441 | parsing purposes, the following transformation | |
442 | %presented in Section \ref{sec:seq2} | |
443 | is split into the two variants shown below and each is parsed | |
444 | separately. | |
445 | ||
446 | \begin{center} | |
447 | \begin{tabular}{c} | |
448 | \begin{lstlisting}[language=Cocci] | |
449 | proc_info_func(...) { | |
450 | <... | |
451 | @-- hostno | |
452 | @++ hostptr->host_no | |
453 | ...> | |
454 | } | |
455 | \end{lstlisting}\\ | |
456 | \end{tabular} | |
457 | \end{center} | |
458 | ||
459 | {%\sizecodebis | |
460 | \begin{center} | |
461 | \begin{tabular}{p{5cm}p{3cm}p{5cm}} | |
462 | \begin{lstlisting}[language=Cocci] | |
463 | proc_info_func(...) { | |
464 | <... | |
465 | @-- hostno | |
466 | ...> | |
467 | } | |
468 | \end{lstlisting} | |
469 | && | |
470 | \begin{lstlisting}[language=Cocci] | |
471 | proc_info_func(...) { | |
472 | <... | |
473 | @++ hostptr->host_no | |
474 | ...> | |
475 | } | |
476 | \end{lstlisting} | |
477 | \end{tabular} | |
478 | \end{center} | |
479 | } | |
480 | ||
481 | \noindent | |
482 | Requiring that both slices parse correctly ensures that the rule matches | |
483 | syntactically valid C code and that it produces syntactically valid C code. | |
484 | The generated parse trees are then merged for use in the subsequent | |
485 | matching and transformation process. | |
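
The slicing step described above can be sketched in a few lines of Python. This is a line-based approximation written for illustration, not the parser used by Coccinelle: the function \texttt{slices} is hypothetical, and the sketch assumes that the annotation is the first character of the line.

```python
def slices(body):
    """Split the body of a rule into its minus slice and its plus slice.

    Lines starting with '-' belong only to the minus slice, lines
    starting with '+' only to the plus slice, and unannotated lines
    (the context) belong to both.  The annotation character is replaced
    by a space so that the remaining text keeps its column.
    """
    minus, plus = [], []
    for line in body.splitlines():
        if line.startswith("-"):
            minus.append(" " + line[1:])
        elif line.startswith("+"):
            plus.append(" " + line[1:])
        else:
            minus.append(line)
            plus.append(line)
    return "\n".join(minus), "\n".join(plus)


RULE = """\
proc_info_func(...) {
  <...
- hostno
+ hostptr->host_no
  ...>
}"""

minus_slice, plus_slice = slices(RULE)
print(minus_slice)
print(plus_slice)
```

Each of the two results is then a candidate for parsing on its own, which is the property the text relies on: the minus slice describes the code to be matched, and the plus slice describes the code that results.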
486 | ||
487 | The grammar for the minus or plus slice of a transformation is as follows: | |
488 | ||
489 | \begin{grammar} | |
490 | ||
491 | \RULE{\rt{transformation}} | |
492 | \CASE{\some{\NT{include}}} | |
493 | \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}} | |
494 | \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}} | |
495 | \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{fundecl}, \NT{when}\mth{)}} | |
496 | ||
497 | \RULE{\rt{include}} | |
498 | \CASE{\#include \T{include\_string}} | |
499 | ||
500 | % \RULE{\rt{fun\_decl\_stmt}} | |
501 | % \CASE{\NT{decl\_stmt}} | |
502 | % \CASE{\NT{fundecl}} | |
503 | ||
504 | % \CASE{\NT{ctype}} | |
505 | % \CASE{\ttlb \NT{initialize\_list} \ttrb} | |
506 | % \CASE{\NT{toplevel\_seq\_start\_after\_dots\_init}} | |
507 | % | |
508 | % \RULE{\rt{toplevel\_seq\_start\_after\_dots\_init}} | |
509 | % \CASE{\NT{stmt\_dots} \NT{toplevel\_after\_dots}} | |
510 | % \CASE{\NT{expr} \opt{\NT{toplevel\_after\_exp}}} | |
511 | % \CASE{\NT{decl\_stmt\_expr} \opt{\NT{toplevel\_after\_stmt}}} | |
512 | % | |
513 | % \RULE{\rt{stmt\_dots}} | |
514 | % \CASE{... \any{\NT{when}}} | |
515 | % \CASE{<... \any{\NT{when}} \NT{nest\_after\_dots} ...>} | |
516 | % \CASE{<+... \any{\NT{when}} \NT{nest\_after\_dots} ...+>} | |
517 | ||
518 | \RULE{\rt{when}} | |
519 | \CASE{when != \NT{when\_code}} | |
520 | \CASE{when = \NT{rule\_elem\_stmt}} | |
521 | \CASE{when \NT{COMMA\_LIST}\mth{(}\NT{any\_strict}\mth{)}} | |
522 | \CASE{when true != \NT{expr}} | |
523 | \CASE{when false != \NT{expr}} | |
524 | ||
525 | \RULE{\rt{when\_code}} | |
526 | \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}} | |
527 | \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}} | |
528 | ||
529 | \RULE{\rt{rule\_elem\_stmt}} | |
530 | \CASE{\NT{one\_decl}} | |
531 | \CASE{\NT{expr};} | |
532 | \CASE{return \opt{\NT{expr}};} | |
533 | \CASE{break;} | |
534 | \CASE{continue;} | |
535 | \CASE{\bs(\NT{rule\_elem\_stmt} \SOME{\bs| \NT{rule\_elem\_stmt}}\bs)} | |
536 | ||
537 | \RULE{\rt{any\_strict}} | |
538 | \CASE{any} | |
539 | \CASE{strict} | |
540 | \CASE{forall} | |
541 | \CASE{exists} | |
542 | ||
543 | % \RULE{\rt{nest\_after\_dots}} | |
544 | % \CASE{\NT{decl\_stmt\_exp} \opt{\NT{nest\_after\_stmt}}} | |
545 | % \CASE{\opt{\NT{exp}} \opt{\NT{nest\_after\_exp}}} | |
546 | % | |
547 | % \RULE{\rt{nest\_after\_stmt}} | |
548 | % \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}} | |
549 | % \CASE{\NT{decl\_stmt} \opt{\NT{nest\_after\_stmt}}} | |
550 | % | |
551 | % \RULE{\rt{nest\_after\_exp}} | |
552 | % \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}} | |
553 | % | |
554 | % \RULE{\rt{toplevel\_after\_dots}} | |
555 | % \CASE{\opt{\NT{toplevel\_after\_exp}}} | |
556 | % \CASE{\NT{exp} \opt{\NT{toplevel\_after\_exp}}} | |
557 | % \CASE{\NT{decl\_stmt\_expr} \NT{toplevel\_after\_stmt}} | |
558 | % | |
559 | % \RULE{\rt{toplevel\_after\_exp}} | |
560 | % \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}} | |
561 | % | |
562 | % \RULE{\rt{decl\_stmt\_expr}} | |
563 | % \CASE{TMetaStmList$^\ddag$} | |
564 | % \CASE{\NT{decl\_var}} | |
565 | % \CASE{\NT{stmt}} | |
566 | % \CASE{(\NT{stmt\_seq} \ANY{| \NT{stmt\_seq}})} | |
567 | % | |
568 | % \RULE{\rt{toplevel\_after\_stmt}} | |
569 | % \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}} | |
570 | % \CASE{\NT{decl\_stmt} \NT{toplevel\_after\_stmt}} | |
571 | ||
572 | \end{grammar} | |
573 | ||
574 | \begin{grammar} | |
575 | \RULE{\rt{OPTDOTSEQ}\mth{(}\rt{grammar\_ds}, \rt{when\_ds}\mth{)}} | |
576 | \CASE{}\multicolumn{3}{r}{\hspace{1cm} | |
97111a47 C |
577 | \KW{\opt{... \ANY{\NT{when\_ds}}} \NT{grammar\_ds} |
578 | \ANY{... \ANY{\NT{when\_ds}} \NT{grammar\_ds}} | |
579 | \opt{... \ANY{\NT{when\_ds}}}} | |
faf9a90c C |
580 | } |
581 | ||
582 | % \CASE{\opt{... \opt{\NT{when\_ds}}} \NT{grammar} | |
583 | % \ANY{... \opt{\NT{when\_ds}} \NT{grammar}} | |
584 | % \opt{... \opt{\NT{when\_ds}}}} | |
585 | % \CASE{<... \any{\NT{when\_ds}} \NT{grammar} ...>} | |
586 | % \CASE{<+... \any{\NT{when\_ds}} \NT{grammar} ...+>} | |
587 | ||
588 | \end{grammar} | |
589 | ||
590 | \noindent | |
591 | Lines may be annotated with an element of the set $\{\mtt{-}, \mtt{+}, | |
592 | \mtt{*}\}$ or the singleton $\mtt{?}$, or one of each set. \mtt{?} | |
97111a47 C |
593 | represents at most one match of the given pattern, \emph{i.e.}, a match of the |
594 | pattern is optional. \mtt{*} is used for a | |
faf9a90c C |
595 | semantic match, \emph{i.e.}, a pattern that highlights the fragments |
596 | annotated with \mtt{*}, but does not perform any modification of the | |
597 | matched code. \mtt{*} cannot be mixed with \mtt{-} and \mtt{+}. There are | |
598 | some constraints on the use of these annotations: | |
599 | \begin{itemize} | |
600 | \item Dots, {\em i.e.}, \texttt{...}, cannot occur on a line marked | |
601 | \texttt{+}. | |
602 | \item Nested dots, {\em i.e.}, dots enclosed in {\tt <} and {\tt >}, cannot | |
603 | occur on a line with any marking. | |
604 | \end{itemize} | |
605 | ||
0708f913 C |
606 | Each element of a disjunction must be a proper term like an |
607 | expression, a statement, an identifier or a declaration. Thus, the | |
413ffc02 | 608 | rule on the left below is not a syntactically correct SmPL rule. One may |
0708f913 C |
609 | use the rule on the right instead. |
610 | ||
611 | \begin{center} | |
612 | \begin{tabular}{l@{\hspace{5cm}}r} | |
613 | \begin{lstlisting}[language=Cocci] | |
614 | @@ | |
615 | type T; | |
616 | T b; | |
617 | @@ | |
618 | ||
619 | ( | |
620 | writeb(..., | |
621 | | | |
f537ebc4 | 622 | readb(..., |
0708f913 C |
623 | ) |
624 | @--(T) | |
625 | b) | |
626 | \end{lstlisting} | |
627 | & | |
628 | \begin{lstlisting}[language=Cocci] | |
629 | @@ | |
630 | type T; | |
631 | T b; | |
632 | @@ | |
633 | ||
634 | ( | |
635 | readb | |
636 | | | |
637 | writeb | |
638 | ) | |
639 | (..., | |
640 | @-- (T) | |
641 | b) | |
642 | \end{lstlisting} | |
643 | \\ | |
644 | \end{tabular} | |
645 | \end{center} | |
646 | ||
f537ebc4 C |
647 | Some kinds of terms can only appear in {\tt +} code. These include comments, |
648 | ifdefs, and attributes (\texttt{\_\_attribute\_\_((...))}). | |
649 | ||
faf9a90c C |
650 | \section{Types} |
651 | \label{types} | |
652 | ||
653 | \begin{grammar} | |
654 | ||
655 | \RULE{\rt{ctypes}} | |
656 | \CASE{\NT{COMMA\_LIST}\mth{(}\NT{ctype}\mth{)}} | |
657 | ||
658 | \RULE{\rt{ctype}} | |
659 | \CASE{\opt{\NT{const\_vol}} \NT{generic\_ctype} \any{*}} | |
660 | \CASE{\opt{\NT{const\_vol}} void \some{*}} | |
661 | \CASE{(\NT{ctype} \ANY{| \NT{ctype}})} | |
662 | ||
663 | \RULE{\rt{const\_vol}} | |
664 | \CASE{const} | |
665 | \CASE{volatile} | |
666 | ||
667 | \RULE{\rt{generic\_ctype}} | |
668 | \CASE{\NT{ctype\_qualif}} | |
669 | \CASE{\opt{\NT{ctype\_qualif}} char} | |
670 | \CASE{\opt{\NT{ctype\_qualif}} short} | |
f3c4ece6 | 671 | \CASE{\opt{\NT{ctype\_qualif}} short int} |
faf9a90c C |
672 | \CASE{\opt{\NT{ctype\_qualif}} int} |
673 | \CASE{\opt{\NT{ctype\_qualif}} long} | |
f3c4ece6 | 674 | \CASE{\opt{\NT{ctype\_qualif}} long int} |
faf9a90c | 675 | \CASE{\opt{\NT{ctype\_qualif}} long long} |
f3c4ece6 | 676 | \CASE{\opt{\NT{ctype\_qualif}} long long int} |
faf9a90c | 677 | \CASE{double} |
f3c4ece6 | 678 | \CASE{long double} |
faf9a90c | 679 | \CASE{float} |
1eddfd50 | 680 | \CASE{size\_t} \CASE{ssize\_t} \CASE{ptrdiff\_t} |
c491d8ee | 681 | \CASE{enum \NT{id} \{ \NT{PARAMSEQ}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)} \OPT{,} \}} |
faf9a90c C |
682 | \CASE{\OPT{struct\OR union} \T{id} \OPT{\{ \any{\NT{struct\_decl\_list}} \}}} |
683 | ||
684 | \RULE{\rt{ctype\_qualif}} | |
685 | \CASE{unsigned} | |
686 | \CASE{signed} | |
687 | ||
688 | \RULE{\rt{struct\_decl\_list}} | |
689 | \CASE{\NT{struct\_decl\_list\_start}} | |
690 | ||
691 | \RULE{\rt{struct\_decl\_list\_start}} | |
692 | \CASE{\NT{struct\_decl}} | |
693 | \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}} | |
694 | \CASE{... \opt{when != \NT{struct\_decl}}$^\dag$ \opt{\NT{continue\_struct\_decl\_list}}} | |
695 | ||
696 | \RULE{\rt{continue\_struct\_decl\_list}} | |
697 | \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}} | |
698 | \CASE{\NT{struct\_decl}} | |
699 | ||
700 | \RULE{\rt{struct\_decl}} | |
701 | \CASE{\NT{ctype} \NT{d\_ident};} | |
702 | \CASE{\NT{fn\_ctype} (* \NT{d\_ident}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)});} | |
703 | \CASE{\opt{\NT{const\_vol}} \T{id} \NT{d\_ident};} | |
704 | ||
705 | \RULE{\rt{d\_ident}} | |
c491d8ee | 706 | \CASE{\T{id} \any{[\opt{\NT{expr}}]}} |
faf9a90c C |
707 | |
708 | \RULE{\rt{fn\_ctype}} | |
709 | \CASE{\NT{generic\_ctype} \any{*}} | |
710 | \CASE{void \any{*}} | |
711 | ||
712 | \RULE{\rt{name\_opt\_decl}} | |
713 | \CASE{\NT{decl}} | |
714 | \CASE{\NT{ctype}} | |
715 | \CASE{\NT{fn\_ctype}} | |
716 | \end{grammar} | |
717 | ||
718 | $^\dag$ The optional \texttt{when} construct ends at the end of the line. | |
719 | ||
720 | \section{Function declarations} | |
721 | ||
722 | \begin{grammar} | |
723 | ||
724 | \RULE{\rt{fundecl}} | |
725 | \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid} | |
726 | (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}}) | |
727 | \ttlb~\opt{\NT{stmt\_seq}} \ttrb} | |
728 | ||
729 | \RULE{\rt{funproto}} | |
730 | \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid} | |
731 | (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}});} | |
732 | ||
733 | \RULE{\rt{funinfo}} | |
734 | \CASE{inline} | |
735 | \CASE{\NT{storage}} | |
736 | % \CASE{\NT{attr}} | |
737 | ||
738 | \RULE{\rt{storage}} | |
739 | \CASE{static} | |
740 | \CASE{auto} | |
741 | \CASE{register} | |
742 | \CASE{extern} | |
743 | ||
744 | \RULE{\rt{funid}} | |
745 | \CASE{\T{id}} | |
746 | \CASE{\mth{\T{metaid}^{\ssf{Id}}}} | |
d3f655c6 | 747 | \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}} |
faf9a90c C |
748 | % \CASE{\mth{\T{metaid}^{\ssf{Func}}}} |
749 | % \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}} | |
750 | ||
751 | \RULE{\rt{param}} | |
752 | \CASE{\NT{type} \T{id}} | |
753 | \CASE{\mth{\T{metaid}^{\ssf{Param}}}} | |
754 | \CASE{\mth{\T{metaid}^{\ssf{ParamList}}}} | |
755 | ||
756 | \RULE{\rt{decl}} | |
757 | \CASE{\NT{ctype} \NT{id}} | |
758 | \CASE{\NT{fn\_ctype} (* \NT{id}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)})} | |
759 | \CASE{void} | |
760 | \CASE{\mth{\T{metaid}^{\ssf{Param}}}} | |
761 | \end{grammar} | |
762 | ||
763 | \begin{grammar} | |
764 | \RULE{\rt{PARAMSEQ}\mth{(}\rt{gram\_p}, \rt{when\_p}\mth{)}} | |
765 | \CASE{\NT{COMMA\_LIST}\mth{(}\NT{gram\_p} \OR \ldots \opt{\NT{when\_p}}\mth{)}} | |
766 | \end{grammar} | |
767 | ||
90aeb998 C |
768 | To match a function it is not necessary to provide all of the annotations |
769 | that appear before the function name. For example, the following semantic | |
770 | patch: | |
771 | ||
772 | \begin{lstlisting}[language=Cocci] | |
773 | @@ | |
774 | @@ | |
775 | ||
776 | foo() { ... } | |
777 | \end{lstlisting} | |
778 | ||
779 | \noindent | |
780 | matches a function declared as follows: | |
781 | ||
782 | \begin{lstlisting}[language=C] | |
783 | static int foo() { return 12; } | |
784 | \end{lstlisting} | |
785 | ||
786 | \noindent | |
787 | This behavior can be turned off by disabling the \KW{optional\_storage} | |
788 | isomorphism. If one adds code before a function declaration, then the | |
789 | effect depends on the kind of code that is added. If the added code is a | |
790 | function definition or CPP code, then the new code is placed before | |
791 | all information associated with the function definition, including any | |
97111a47 | 792 | comments preceding the function definition. On the other hand, if the new |
90aeb998 C |
793 | code is associated with the function, such as the addition of the keyword |
794 | {\tt static}, the new code is placed exactly where it appears with respect | |
97111a47 | 795 | to the rest of the function definition in the semantic patch. For example, |
90aeb998 C |
796 | |
797 | \begin{lstlisting}[language=Cocci] | |
798 | @@ | |
799 | @@ | |
800 | ||
801 | + static | |
802 | foo() { ... } | |
803 | \end{lstlisting} | |
804 | ||
805 | \noindent | |
806 | causes \texttt{static} to be placed just before the function name. The following | |
807 | causes it to be placed just before the type: | |
808 | ||
809 | \begin{lstlisting}[language=Cocci] | |
810 | @@ | |
811 | type T; | |
812 | @@ | |
813 | ||
814 | + static | |
815 | T foo() { ... } | |
816 | \end{lstlisting} | |
817 | ||
818 | \noindent | |
413ffc02 | 819 | It may be necessary to consider several cases to ensure that the added code |
90aeb998 C |
820 | is placed in the right position. For example, one may need one pattern |
821 | that considers that the function is declared {\tt inline} and another that | |
822 | considers that it is not. | |
823 | ||
faf9a90c C |
824 | %\newpage |
825 | ||
826 | \section{Declarations} | |
827 | ||
828 | \begin{grammar} | |
829 | \RULE{\rt{decl\_var}} | |
830 | % \CASE{\NT{type} \opt{\NT{id} \opt{[\opt{\NT{dot\_expr}}]} | |
831 | % \ANY{, \NT{id} \opt{[ \opt{\NT{dot\_expr}}]}}};} | |
832 | \CASE{\NT{common\_decl}} | |
833 | \CASE{\opt{\NT{storage}} \NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;} | |
834 | \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;} | |
835 | \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) = \NT{initialize} ;} | |
836 | \CASE{typedef \NT{ctype} \NT{typedef\_ident} ;} | |
837 | ||
838 | \RULE{\rt{one\_decl}} | |
839 | \CASE{\NT{common\_decl}} | |
840 | \CASE{\opt{\NT{storage}} \NT{ctype} \NT{id};} | |
841 | % \CASE{\NT{storage} \NT{ctype} \NT{id} \opt{[\opt{\NT{dot\\_expr}}]} = \NT{nest\\_expr};} | |
842 | \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} ;} | |
843 | ||
844 | \RULE{\rt{common\_decl}} | |
845 | \CASE{\NT{ctype};} | |
846 | \CASE{\NT{funproto}} | |
847 | \CASE{\opt{\NT{storage}} \NT{ctype} \NT{d\_ident} = \NT{initialize} ;} | |
848 | \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} = \NT{initialize} ;} | |
849 | \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) ;} | |
850 | \CASE{\NT{decl\_ident} ( \OPT{\NT{COMMA\_LIST}\mth{(}\NT{expr}\mth{)}} ) ;} | |
851 | ||
852 | \RULE{\rt{initialize}} | |
853 | \CASE{\NT{dot\_expr}} | |
c491d8ee | 854 | \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}} |
8f657093 | 855 | \CASE{\ttlb~\opt{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb} |
faf9a90c | 856 | |
c491d8ee C |
857 | \RULE{\rt{init\_list\_elem}} |
858 | \CASE{\NT{dot\_expr}} | |
97111a47 | 859 | \CASE{\NT{designator} = \NT{initialize}} |
8f657093 C |
860 | \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}} |
861 | \CASE{\mth{\T{metaid}^{\ssf{InitialiserList}}}} | |
c491d8ee C |
862 | \CASE{\NT{id} : \NT{dot\_expr}} |
863 | ||
864 | \RULE{\rt{designator}} | |
865 | \CASE{. \NT{id}} | |
866 | \CASE{[ \NT{dot\_expr} ]} | |
867 | \CASE{[ \NT{dot\_expr} ... \NT{dot\_expr} ]} | |
868 | ||
faf9a90c C |
869 | \RULE{\rt{decl\_ident}} |
870 | \CASE{\T{DeclarerId}} | |
871 | \CASE{\mth{\T{metaid}^{\ssf{Declarer}}}} | |
872 | \end{grammar} | |
873 | ||
8f657093 C |
874 | An initializer for a structure can be ordered or unordered. It is |
875 | considered to be unordered if there is at least one key-value pair | |
876 | initializer, e.g., \texttt{.x = e}. | |
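
For illustration, the two C declarations below initialize the same structure
in the ordered and in the unordered style. The structure \texttt{point} and
its fields are hypothetical names used only for this sketch.

\begin{lstlisting}[language=C]
struct point { int x; int y; };

/* ordered: the values are given positionally */
struct point a = { 1, 2 };

/* unordered: at least one key-value pair is present */
struct point b = { .y = 2, .x = 1 };
\end{lstlisting}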
877 | ||
faf9a90c C |
878 | \section{Statements} |
879 | ||
880 | The first rule {\em stmt} describes the various forms of a statement. | |
881 | The remaining rules implement the constraints that are sensitive to the | |
882 | context in which the statement occurs: {\em single\_stmt} for a | |
883 | context in which only one statement is allowed, and {\em decl\_stmt} | |
884 | for a context in which a declaration, statement, or sequence thereof is | |
885 | allowed. | |
886 | ||
887 | \begin{grammar} | |
888 | \RULE{\rt{stmt}} | |
889 | \CASE{\NT{include}} | |
890 | \CASE{\mth{\T{metaid}^{\ssf{Stmt}}}} | |
891 | \CASE{\NT{expr};} | |
892 | \CASE{if (\NT{dot\_expr}) \NT{single\_stmt} \opt{else \NT{single\_stmt}}} | |
893 | \CASE{for (\opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}}) | |
894 | \NT{single\_stmt}} | |
895 | \CASE{while (\NT{dot\_expr}) \NT{single\_stmt}} | |
896 | \CASE{do \NT{single\_stmt} while (\NT{dot\_expr});} | |
897 | \CASE{\NT{iter\_ident} (\any{\NT{dot\_expr}}) \NT{single\_stmt}} | |
898 | \CASE{switch (\opt{\NT{dot\_expr}}) \ttlb \any{\NT{case\_line}} \ttrb} | |
899 | \CASE{return \opt{\NT{dot\_expr}};} | |
900 | \CASE{\ttlb~\opt{\NT{stmt\_seq}} \ttrb} | |
901 | \CASE{\NT{NEST}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}} | |
902 | \CASE{\NT{NEST}\mth{(}\NT{expr}, \NT{when}\mth{)}} | |
903 | \CASE{break;} | |
904 | \CASE{continue;} | |
905 | \CASE{\NT{id}:} | |
906 | \CASE{goto \NT{id};} | |
907 | \CASE{\ttlb \NT{stmt\_seq} \ttrb} | |
908 | ||
909 | \RULE{\rt{single\_stmt}} | |
910 | \CASE{\NT{stmt}} | |
911 | \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}} | |
912 | ||
913 | \RULE{\rt{decl\_stmt}} | |
914 | \CASE{\mth{\T{metaid}^{\ssf{StmtList}}}} | |
915 | \CASE{\NT{decl\_var}} | |
916 | \CASE{\NT{stmt}} | |
917 | \CASE{\NT{OR}\mth{(}\NT{stmt\_seq}\mth{)}} | |
918 | ||
919 | \RULE{\rt{stmt\_seq}} | |
920 | \CASE{\any{\NT{decl\_stmt}} | |
921 | \opt{\NT{DOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, | |
922 | \NT{when}\mth{)} \any{\NT{decl\_stmt}}}} | |
923 | \CASE{\any{\NT{decl\_stmt}} | |
924 | \opt{\NT{DOTSEQ}\mth{(}\NT{expr}, | |
925 | \NT{when}\mth{)} \any{\NT{decl\_stmt}}}} | |
926 | ||
927 | \RULE{\rt{case\_line}} | |
928 | \CASE{default :~\NT{stmt\_seq}} | |
929 | \CASE{case \NT{dot\_expr} :~\NT{stmt\_seq}} | |
930 | ||
931 | \RULE{\rt{iter\_ident}} | |
932 | \CASE{\T{IteratorId}} | |
933 | \CASE{\mth{\T{metaid}^{\ssf{Iterator}}}} | |
934 | \end{grammar} | |
935 | ||
936 | \begin{grammar} | |
937 | \RULE{\rt{OR}\mth{(}\rt{gram\_o}\mth{)}} | |
938 | \CASE{( \NT{gram\_o} \ANY{\ttmid \NT{gram\_o}})} | |
939 | ||
940 | \RULE{\rt{DOTSEQ}\mth{(}\rt{gram\_d}, \rt{when\_d}\mth{)}} | |
941 | \CASE{\ldots \opt{\NT{when\_d}} \ANY{\NT{gram\_d} \ldots \opt{\NT{when\_d}}}} | |
942 | ||
943 | \RULE{\rt{NEST}\mth{(}\rt{gram\_n}, \rt{when\_n}\mth{)}} | |
944 | \CASE{<\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots>} | |
945 | \CASE{<+\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots+>} | |
946 | \end{grammar} | |
947 | ||
948 | \noindent | |
949 | OR is a macro that generates a disjunction of patterns. The three | |
950 | tokens \T{(}, \T{\ttmid}, and \T{)} must appear in the leftmost | |
951 | column, to differentiate them from the parentheses and bit-or tokens | |
952 | that can appear within expressions (and cannot appear in the leftmost | |
953 | column). These tokens may also be preceded by \texttt{\bs} | |
954 | when they are used in another column. These tokens are furthermore | |
955 | different from (, \(\mid\), and ), which are part of the grammar | |
956 | metalanguage. | |
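
The two notations can be sketched as follows, assuming hypothetical functions
\texttt{foo} and \texttt{bar}. The first rule writes the disjunction tokens in
the leftmost column; the second expresses the same disjunction inline, with
each token preceded by \texttt{\bs}. Either rule alone is intended to
highlight calls to both functions.

\begin{lstlisting}[language=Cocci]
@@
expression E;
@@

(
* foo(E);
|
* bar(E);
)

@@
expression E;
@@

* \(foo\|bar\)(E);
\end{lstlisting}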
957 | ||
958 | \section{Expressions} | |
959 | ||
960 | A nest or a single ellipsis is allowed in some expression contexts, and | |
961 | causes ambiguity in others. For example, in a sequence \mtt{\ldots | |
962 | \mita{expr} \ldots}, the nonterminal \mita{expr} must be instantiated as an | |
963 | explicit C-language expression, while in an array reference, | |
964 | \mtt{\mth{\mita{expr}_1} \mtt{[} \mth{\mita{expr}_2} \mtt{]}}, the | |
965 | nonterminal \mth{\mita{expr}_2}, because it is delimited by brackets, can | |
966 | also be instantiated as \mtt{\ldots}, representing an arbitrary expression. To | |
967 | distinguish between the various possibilities, we define three nonterminals | |
968 | for expressions: {\em expr} does not allow either top-level nests or | |
969 | ellipses, {\em nest\_expr} allows a nest but not an ellipsis, and {\em | |
970 | dot\_expr} allows both. The EXPR macro is used to express these variants | |
971 | in a concise way. | |
972 | ||
973 | \begin{grammar} | |
974 | \RULE{\rt{expr}} | |
975 | \CASE{\NT{EXPR}\mth{(}\NT{expr}\mth{)}} | |
976 | ||
977 | \RULE{\rt{nest\_expr}} | |
978 | \CASE{\NT{EXPR}\mth{(}\NT{nest\_expr}\mth{)}} | |
979 | \CASE{\NT{NEST}\mth{(}\NT{nest\_expr}, \NT{exp\_whencode}\mth{)}} | |
980 | ||
981 | \RULE{\rt{dot\_expr}} | |
982 | \CASE{\NT{EXPR}\mth{(}\NT{dot\_expr}\mth{)}} | |
983 | \CASE{\NT{NEST}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)}} | |
984 | \CASE{...~\opt{\NT{exp\_whencode}}} | |
985 | ||
986 | \RULE{\rt{EXPR}\mth{(}\rt{exp}\mth{)}} | |
987 | \CASE{\NT{exp} \NT{assign\_op} \NT{exp}} | |
988 | \CASE{\NT{exp}++} | |
989 | \CASE{\NT{exp}--} | |
990 | \CASE{\NT{unary\_op} \NT{exp}} | |
991 | \CASE{\NT{exp} \NT{bin\_op} \NT{exp}} | |
992 | \CASE{\NT{exp} ?~\NT{dot\_expr} :~\NT{exp}} | |
993 | \CASE{(\NT{type}) \NT{exp}} | |
994 | \CASE{\NT{exp} [\NT{dot\_expr}]} | |
995 | \CASE{\NT{exp} .~\NT{id}} | |
996 | \CASE{\NT{exp} -> \NT{id}} | |
997 | \CASE{\NT{exp}(\opt{\NT{PARAMSEQ}\mth{(}\NT{arg}, \NT{exp\_whencode}\mth{)}})} | |
998 | \CASE{\NT{id}} | |
7fe62b65 | 999 | \CASE{(\NT{type}) \ttlb~{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb} |
faf9a90c C |
1000 | % \CASE{\mth{\T{metaid}^{\ssf{Func}}}} |
1001 | % \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}} | |
1002 | \CASE{\mth{\T{metaid}^{\ssf{Exp}}}} | |
1003 | % \CASE{\mth{\T{metaid}^{\ssf{Err}}}} | |
1004 | \CASE{\mth{\T{metaid}^{\ssf{Const}}}} | |
1005 | \CASE{\NT{const}} | |
1006 | \CASE{(\NT{dot\_expr})} | |
1007 | \CASE{\NT{OR}\mth{(}\NT{exp}\mth{)}} | |
1008 | ||
1009 | \RULE{\rt{arg}} | |
1010 | \CASE{\NT{nest\_expr}} | |
1011 | \CASE{\mth{\T{metaid}^{\ssf{ExpList}}}} | |
1012 | ||
1013 | \RULE{\rt{exp\_whencode}} | |
1014 | \CASE{when != \NT{expr}} | |
1015 | ||
1016 | \RULE{\rt{assign\_op}} | |
1017 | \CASE{= \OR -= \OR += \OR *= \OR /= \OR \%=} | |
1018 | \CASE{\&= \OR |= \OR \caret= \OR \lt\lt= \OR \gt\gt=} | |
1019 | ||
1020 | \RULE{\rt{bin\_op}} | |
1021 | \CASE{* \OR / \OR \% \OR + \OR -} | |
1022 | \CASE{\lt\lt \OR \gt\gt \OR \caret\xspace \OR \& \OR \ttmid} | |
1023 | \CASE{< \OR > \OR <= \OR >= \OR == \OR != \OR \&\& \OR \ttmid\ttmid} | |
1024 | ||
1025 | \RULE{\rt{unary\_op}} | |
1026 | \CASE{++ \OR -- \OR \& \OR * \OR + \OR - \OR !} | |
1027 | ||
1028 | \end{grammar} | |
1029 | ||
d3f655c6 | 1030 | \section{Constants, Identifiers and Types for Transformations} |
faf9a90c C |
1031 | |
1032 | \begin{grammar} | |
1033 | \RULE{\rt{const}} | |
1034 | \CASE{\NT{string}} | |
1035 | \CASE{[0-9]+} | |
1036 | \CASE{\mth{\cdots}} | |
1037 | ||
1038 | \RULE{\rt{string}} | |
1039 | \CASE{"\any{[\^{}"]}"} | |
1040 | ||
1041 | \RULE{\rt{id}} | |
d3f655c6 C |
1042 | \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Id}}} |
1043 | \OR {\NT{OR}\mth{(}\NT{stmt}\mth{)}}} | |
faf9a90c C |
1044 | |
1045 | \RULE{\rt{typedef\_ident}} | |
1046 | \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Type}}}} | |
1047 | ||
1048 | \RULE{\rt{type}} | |
1049 | \CASE{\NT{ctype} \OR \mth{\T{metaid}^{\ssf{Type}}}} | |
1050 | ||
1051 | \RULE{\rt{pathToIsoFile}} | |
1052 | \CASE{<.*>} | |
951c7801 C |
1053 | |
1054 | \RULE{\rt{regexp}} | |
1055 | \CASE{"\any{[\^{}"]}"} | |
faf9a90c C |
1056 | \end{grammar} |
1057 | ||
97111a47 | 1058 | \section{Comments and preprocessor directives} |
8babbc8f C |
1059 | |
1060 | A \verb+//+ or \verb+/* */+ comment that is annotated with + in the | |
1061 | leftmost column is considered to be added code. A \verb+//+ or | |
97111a47 | 1062 | \verb+/* */+ comment without such an annotation is considered to be a |
8babbc8f | 1063 | comment about the SmPL code, and thus is not matched in the C code. |
faf9a90c | 1064 | |
97111a47 C |
1065 | The following preprocessor directives can likewise be added. They cannot |
1066 | be matched against. The entire line is added, but it is not parsed. | |
1067 | ||
1068 | \begin{itemize} | |
1069 | \item \verb+if+ | |
1070 | \item \verb+ifdef+ | |
1071 | \item \verb+ifndef+ | |
1072 | \item \verb+else+ | |
1073 | \item \verb+elif+ | |
1074 | \item \verb+endif+ | |
1075 | \item \verb+error+ | |
1076 | \item \verb+pragma+ | |
1077 | \item \verb+line+ | |
1078 | \end{itemize} | |
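
A minimal sketch of such additions follows, assuming a hypothetical function
\texttt{foo} and a hypothetical macro \texttt{CONFIG\_FOO}. The unannotated
\verb+//+ comment is a remark about the SmPL code, while the lines marked
with + are added to the C code.

\begin{lstlisting}[language=Cocci]
@@
@@

// this comment is about the semantic patch and is not matched
+ /* call foo only when CONFIG_FOO is defined */
+#ifdef CONFIG_FOO
  foo();
+#endif
\end{lstlisting}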
1079 | ||
993936c0 C |
1080 | \section{Command-line semantic match} |
1081 | ||
1082 | It is possible to specify a semantic match on the spatch command line, | |
1083 | using the argument {\tt -sp}. In such a semantic match, any token | |
1084 | beginning with a capital letter is assumed to be a metavariable of type | |
1085 | {\tt metavariable}. In this case, the parser must be able to figure out what | |
1086 | kind of metavariable it is. It is also possible to specify the type of a | |
1087 | metavariable by enclosing the type in colons, concatenated directly to the | |
1088 | metavariable name. | |
1089 | ||
1090 | Some examples of semantic matches that can be given as an argument to {\tt | |
1091 | -sp} are as follows: | |
1092 | ||
1093 | \begin{itemize} | |
1094 | \item \texttt{f(e)}: This only matches the expression \texttt{f(e)}. | |
1095 | \item \texttt{f(E)}: This matches a call to \texttt{f} with any argument. | |
1096 | \item \texttt{F(E)}: This gives a parse error; the semantic patch parser | |
1097 | cannot figure out what kind of metavariable \texttt{F} is. | |
1098 | \item \texttt{F:identifier:(E)}: This matches any one-argument function | |
1099 | call. | |
1100 | \item \texttt{f:identifier:(e:struct foo *:)}: This matches any one-argument | |
1101 | function call where the argument has type \texttt{struct foo | |
1102 | *}. Since the types of the metavariables are specified, it is not | |
1103 | necessary for the metavariable names to begin with a capital letter. | |
1104 | \item \texttt{F:identifier:(F)}: This matches any one-argument function call | |
1105 | where the argument is the name of the function itself. This example | |
1106 | shows that it is not necessary to repeat the metavariable type name. | |
1107 | \item \texttt{F:identifier:(F:identifier:)}: This matches any one-argument | |
97111a47 | 1108 | function call |
993936c0 C |
1109 | where the argument is the name of the function itself. This example |
1110 | shows that it is possible to repeat the metavariable type name. | |
1111 | \end{itemize} | |
1112 | ||
1113 | \texttt{When} constraints, \textit{e.g.} \texttt{when != e}, are allowed | |
1114 | but the expression \texttt{e} must be represented as a single token. | |
1115 | ||
1116 | The generated semantic match behaves as though there were a \texttt{*} in front | |
1117 | of every token. | |
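
As a sketch, one of the examples above might be passed to spatch as shown
below. The file name \texttt{file.c} is hypothetical, and giving the C file
as a trailing argument is an assumption based on the usual way of invoking
spatch. The quotes keep the shell from interpreting characters such as
\texttt{*} and the parentheses.

\begin{lstlisting}
spatch -sp "f:identifier:(e:struct foo *:)" file.c
\end{lstlisting}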
1118 | ||
faf9a90c C |
1119 | %%% Local Variables: |
1120 | %%% mode: LaTeX | |
708f4980 | 1121 | %%% TeX-master: "main_grammar" |
5636bb2c | 1122 | %%% coding: utf-8 |
faf9a90c C |
1123 | %%% TeX-PDF-mode: t |
1124 | %%% ispell-local-dictionary: "american" | |
1125 | %%% End: |