docs/manual/cocci_syntax.tex

   1
   2 %\section{The SmPL Grammar}
   3
   4 % This section presents the SmPL grammar.  This definition follows closely
   5 % our implementation using the Menhir parser generator \cite{menhir}.
   6
   7 This document presents the grammar of the SmPL language used by the
   8 \href{http://coccinelle.lip6.fr/}{Coccinelle tool}.  For the most
   9 part, the grammar is written using standard notation.  In some rules,
  10 however, the left-hand side is in all uppercase letters.  These are
   11 macros, which take one or more grammar rule right-hand sides as
  12 arguments.  The grammar also uses some unspecified nonterminals, such
  13 as \T{id}, \T{const}, etc.  These refer to the sets suggested by
  14 the name, {\em i.e.}, \T{id} refers to the set of possible
  15 C-language identifiers, while \T{const} refers to the set of
  16 possible C-language constants.
  17 %
  18 \ifhevea
  19 A PDF version of this documentation is available at
  20 \url{http://coccinelle.lip6.fr/docs/main_grammar.pdf}.
  21 \else
   22 An HTML version of this documentation is available online at
  23 \url{http://coccinelle.lip6.fr/docs/main_grammar.html}.
  24 \fi
  25
  26 \section{Program}
  27
  28 \begin{grammar}
  29   \RULE{\rt{program}}
  30   \CASE{\any{\NT{include\_cocci}} \some{\NT{changeset}}}
  31
  32   \RULE{\rt{include\_cocci}}
  33   \CASE{include \NT{string}}
  34   \CASE{using \NT{string}}
  35   \CASE{using \NT{pathToIsoFile}}
  36   \CASE{virtual \T{id} \ANY{, \T{id}}}
  37
  38   \RULE{\rt{changeset}}
  39   \CASE{\NT{metavariables} \NT{transformation}}
  40   \CASE{\NT{script\_metavariables} \T{script\_code}}
  41 %  \CASE{\NT{metavariables} \ANY{--- filename +++ filename} \NT{transformation}}
  42 \end{grammar}
  43
  44 \noindent
  45 \T{script\_code} is any code in the chosen scripting language.  Parsing of
  46 the semantic patch does not check the validity of this code; any errors are
  47 first detected when the code is executed.  Furthermore, \texttt{@} should
   48 not be used in this code.  Spatch scans the script code for the next
   49 \texttt{@} and considers that to be the beginning of the next rule, even if
   50 the \texttt{@} occurs within, e.g., a comment.
  51
   52 The \texttt{virtual} keyword is used to declare virtual rules. Virtual
  53 rules may be subsequently used as a dependency for the rules in the
  54 SmPL file. Whether a virtual rule is defined or not is controlled by
  55 the \texttt{-D} option on the command line.
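
As a rough illustration of a virtual rule used as a dependency, consider the
following sketch.  The names {\tt patch}, {\tt r}, {\tt foo} and {\tt bar}
are hypothetical and chosen only for this example.

\begin{quote}
\begin{verbatim}
virtual patch

@r depends on patch@
expression E;
@@

- foo(E)
+ bar(E)
\end{verbatim}
\end{quote}

\noindent
Under these assumptions, rule {\tt r} is expected to be applied only when
{\tt -D patch} is given on the command line.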
  56
  57 % Between the metavariables and the transformation rule, there can be a
  58 % specification of constraints on the names of the old and new files,
  59 % analogous to the filename specifications in the standard patch syntax.
  60 % (see Figure \ref{scsiglue_patch}).
  61
  62 \section{Metavariables for transformations}
  63
  64 The \NT{rulename} portion of the metavariable declaration can specify
  65 properties of a rule such as its name, the names of the rules that it
  66 depends on, the isomorphisms to be used in processing the rule, and whether
  67 quantification over paths should be universal or existential.  The optional
  68 annotation {\tt expression} indicates that the pattern is to be considered
  69 as matching an expression, and thus can be used to avoid some parsing
  70 problems.
  71
  72 The \NT{metadecl} portion of the metavariable declaration defines various
  73 types of metavariables that will be used for matching in the transformation
  74 section.
  75
  76 \begin{grammar}
  77   \RULE{\rt{metavariables}}
  78   \CASE{@@ \any{\NT{metadecl}} @@}
  79   \CASE{@ \NT{rulename} @ \any{\NT{metadecl}} @@}
  80
  81   \RULE{\rt{rulename}}
  82   \CASE{\T{id} \OPT{extends \T{id}} \OPT{depends on \NT{dep}} \opt{\NT{iso}}
  83     \opt{\NT{disable-iso}} \opt{\NT{exists}} \opt{expression}}
  84
  85   \RULE{\rt{dep}}
  86   \CASE{\T{id}}
  87   \CASE{!\T{id}}
  88   \CASE{!(\NT{dep})}
  89   \CASE{ever \T{id}}
  90   \CASE{never \T{id}}
  91   \CASE{\NT{dep} \&\& \NT{dep}}
  92   \CASE{\NT{dep} || \NT{dep}}
  93   \CASE{(\NT{dep})}
  94
  95   \RULE{\rt{iso}}
  96   \CASE{using \NT{string} \ANY{, \NT{string}}}
  97
  98   \RULE{\rt{disable-iso}}
  99   \CASE{disable \NT{COMMA\_LIST}\mth{(}\T{id}\mth{)}}
 100
 101   \RULE{\rt{exists}}
 102   \CASE{exists}
 103   \CASE{forall}
 104 %  \CASE{\opt{reverse} forall}
 105
 106   \RULE{\rt{COMMA\_LIST}\mth{(}\rt{elem}\mth{)}}
 107   \CASE{\NT{elem} \ANY{, \NT{elem}}}
 108 \end{grammar}
 109
 110 The keyword \KW{disable} is normally used with the names of
 111 isomorphisms defined in standard.iso or whatever isomorphism file has been
 112 included.  There are, however, some other isomorphisms that are built into
 113 the implementation of Coccinelle and that can be disabled as well.  Their
 114 names are given below.  In each case, the text describes the standard
 115 behavior.  Using \NT{disable-iso} with the given name disables this behavior.
 116
 117 \begin{itemize}
 118 \item \KW{optional\_storage}: A SmPL function definition that does not
 119   specify any visibility (i.e., static or extern), or a SmPL variable
 120   declaration that does not specify any storage (i.e., auto, static,
 121   register, or extern), matches a function declaration or variable
 122   declaration with any visibility or storage, respectively.
 123 \item \KW{optional\_qualifier}: This is similar to \KW{optional\_storage},
  124   except that here it is the qualifier (i.e., const or volatile) that does
 125   not have to be specified in the SmPL code, but may be present in the C code.
 126 \item \KW{value\_format}: Integers in various formats, e.g., 1 and 0x1, are
 127   considered to be equivalent in the matching process.
 128 \item \KW{optional\_declarer\_semicolon}: Some declarers (top-level terms
 129   that look like function calls but serve to declare some variable) don't
 130   require a semicolon.  This isomorphism allows a SmPL declarer with a semicolon
 131   to match such a C declarer, if no transformation is specified on the SmPL
 132   semicolon.
 133 \item \KW{comm\_assoc}: An expression of the form \NT{exp} \NT{bin\_op}
 134   \KW{...}, where \NT{bin\_op} is commutative and associative, is
 135   considered to match any top-level sequence of \NT{bin\_op} operators
 136   containing \NT{exp} as the top-level argument.
 137 \end{itemize}
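
A minimal sketch of disabling one of these built-in isomorphisms is shown
below.  The rule name {\tt r} and the function name {\tt foo} are
hypothetical.

\begin{quote}
\begin{verbatim}
@r disable optional_storage@
@@

foo(...) { ... }
\end{verbatim}
\end{quote}

\noindent
With \KW{optional\_storage} disabled, this rule should match a definition of
{\tt foo} only when the C code carries no storage specifier, such as
{\tt static}, because the SmPL code specifies none.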
 138
 139 The possible types of metavariable declarations are defined by the grammar
 140 rule below.  Metavariables should occur at least once in the transformation
 141 immediately following their declaration.  Fresh identifier metavariables
 142 must only be used in {\tt +} code.  These properties are not expressed in
 143 the grammar, but are checked by a subsequent analysis.  The metavariables
 144 are designated according to the kind of terms they can match, such as a
 145 statement, an identifier, or an expression.  An expression metavariable can
 146 be further constrained by its type.  A declaration metavariable matches the
 147 declaration of one or more variables, all sharing the same type
 148 specification ({\em e.g.}, {\tt int a,b,c=3;}).  A field metavariable does
 149 the same, but for structure fields.
 150
 151 \begin{grammar}
 152   \RULE{\rt{metadecl}}
 153   \CASE{metavariable \NT{ids} ;}
 154   \CASE{fresh identifier \NT{ids} ;}
 155   \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
 156   \CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_virt\_or\_not\_eq}\mth{)} ;}
 157   \CASE{parameter \opt{list} \NT{ids} ;}
 158   \CASE{parameter list [ \NT{id} ] \NT{ids} ;}
 159   \CASE{parameter list [ \NT{const} ] \NT{ids} ;}
 160   \CASE{type \NT{ids} ;}
 161   \CASE{statement \opt{list} \NT{ids} ;}
 162   \CASE{declaration \NT{ids} ;}
 163   \CASE{field \opt{list} \NT{ids} ;}
 164   \CASE{typedef \NT{ids} ;}
 165   \CASE{declarer name \NT{ids} ;}
 166 %  \CASE{\opt{local} function \NT{pmid\_with\_not\_eq\_list} ;}
 167   \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
 168   \CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 169   \CASE{iterator name \NT{ids} ;}
 170   \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
 171   \CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 172 %  \CASE{error \NT{pmid\_with\_not\_eq\_list} ; }
 173   \CASE{\opt{local} idexpression \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 174   \CASE{\opt{local} idexpression \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 175   \CASE{\opt{local} idexpression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 176   \CASE{expression list \NT{ids} ;}
 177   \CASE{expression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 178   \CASE{expression enum \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 179   \CASE{expression struct \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 180   \CASE{expression union \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 181   \CASE{expression \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
 182   \CASE{expression list [ \NT{id} ] \NT{ids} ;}
 183   \CASE{expression list [ \NT{const} ] \NT{ids} ;}
 184   \CASE{\NT{ctype} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 185   \CASE{\NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
 186   \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
 187   \CASE{\ttlb \NT{ctypes}\ttrb~\any{*} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 188   \CASE{constant \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 189   \CASE{constant \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
 190   \CASE{position \opt{any} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq\_mid}\mth{)} ;}
  191   \CASE{symbol \NT{ids} ;}
 192 \end{grammar}
 193
  194 A metavariable declaration {\tt local idexpression v} means that {\tt v} is
  195 restricted to be a local variable.  If it should just be a variable, but not
  196 necessarily a local one, then drop {\tt local}.  A more complex description
  197 of a location, such as {\tt a->b}, is considered to be an expression, not an
  198 idexpression.
 199
  200 A {\tt constant} metavariable matches constants, such as 27.  It also treats
  201 an identifier that is all capital letters (possibly containing numbers) as a
  202 constant, because the names given to macros in Linux usually have this form.
 203
 204 An identifier is the name of a structure field, a macro, a function, or a
  205 variable.  It is the name of something rather than an expression that has a
 206 value.  But an identifier can be used in the position of an expression as
 207 well, where it represents a variable.
 208
 209 It is possible to specify that an expression list or a parameter list
 210 metavariable should match a specific number of expressions or parameters.
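
For instance, the following sketch, which uses the hypothetical function
names {\tt f} and {\tt g}, is intended to rewrite only those calls to
{\tt f} that have exactly two arguments.

\begin{quote}
\begin{verbatim}
@@
expression list[2] es;
@@

- f(es)
+ g(es)
\end{verbatim}
\end{quote}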
 211
 212 It is possible to specify some information about the definition of a fresh
 213 identifier.  See the wiki.
 214
 215 A symbol declaration specifies that the provided identifiers should be
 216 considered C identifiers when encountered in the body of the rule.
 217 Identifiers in the body of the rule that are not declared explicitly are
 218 by default considered symbols, thus symbol declarations are optional.
 219
 220 A position metavariable is used by attaching it using \texttt{@} to any
 221 token, including another metavariable.  Its value is the position (file,
 222 line number, etc.) of the code matched by the token.  It is also possible
 223 to attach expression, declaration, type, initialiser, and statement
 224 metavariables in this manner.  In that case, the metavariable is bound to
 225 the closest enclosing expression, declaration, etc.  If such a metavariable
 226 is itself followed by a position metavariable, the position metavariable
 227 applies to the metavariable that it follows, and not to the attached token.
  228 This makes it possible to get, e.g., the starting and ending position of {\tt
 229   f(...)}, by writing {\tt f(...)@E@p}, for expression metavariable {\tt E}
 230 and position metavariable {\tt p}.
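
The declarations that such a pattern relies on can be sketched as follows.
The rule name {\tt r} is hypothetical, and {\tt f} stands for any function
of interest.

\begin{quote}
\begin{verbatim}
@r@
expression E;
position p;
@@

f(...)@E@p
\end{verbatim}
\end{quote}

\noindent
Here {\tt E} is expected to be bound to the complete call expression, and
{\tt p} to the position of that expression rather than to the position of
the closing parenthesis.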
 231
 232 \begin{grammar}
 233   \RULE{\rt{ids}}
 234   \CASE{\NT{COMMA\_LIST}\mth{(}\NT{pmid}\mth{)}}
 235
 236   \RULE{\rt{pmid}}
 237   \CASE{\T{id}}
 238   \CASE{\NT{mid}}
 239 %   \CASE{list}
 240 %   \CASE{error}
 241 %   \CASE{type}
 242
 243   \RULE{\rt{mid}}  \CASE{\T{rulename\_id}.\T{id}}
 244
 245   \RULE{\rt{pmid\_with\_regexp}}
 246   \CASE{\NT{pmid} =\~{} \NT{regexp}}
 247   \CASE{\NT{pmid} !\~{} \NT{regexp}}
 248
 249   \RULE{\rt{pmid\_with\_not\_eq}}
 250   \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_meta}}}
 251   \CASE{\NT{pmid}
 252      \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_meta}\mth{)} \ttrb}}
 253
 254   \RULE{\rt{pmid\_with\_virt\_or\_not\_eq}}
 255   \CASE{virtual.\T{id}}
 256   \CASE{\NT{pmid\_with\_not\_eq}}
 257
 258   \RULE{\rt{pmid\_with\_not\_ceq}}
 259   \CASE{\NT{pmid} \OPT{!= \NT{id\_or\_cst}}}
 260   \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_cst}\mth{)} \ttrb}}
 261
 262   \RULE{\rt{id\_or\_cst}}
 263   \CASE{\T{id}}
 264   \CASE{\T{integer}}
 265
 266   \RULE{\rt{id\_or\_meta}}
 267   \CASE{\T{id}}
 268   \CASE{\T{rulename\_id}.\T{id}}
 269
 270   \RULE{\rt{pmid\_with\_not\_eq\_mid}}
 271   \CASE{\NT{pmid} \OPT{!= \NT{mid}}}
 272   \CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{mid}\mth{)} \ttrb}}
 273 \end{grammar}
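
The following fragment sketches how these constraint forms might appear in
a metavariable declaration.  All of the names and the regular expression
are hypothetical, and the rule body is elided.

\begin{quote}
\begin{verbatim}
@@
identifier f =~ "^get_";
identifier x != {tmp, ret};
expression E != 0;
@@

[...] // some semantic patch code
\end{verbatim}
\end{quote}

\noindent
Read against the grammar above, {\tt f} is restricted to identifiers
matching the regular expression, {\tt x} may be any identifier other than
{\tt tmp} and {\tt ret}, and {\tt E} may be any expression other than the
constant 0.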
 274
 275 Subsequently, we refer to arbitrary metavariables as
 276 \mth{\msf{metaid}^{\mbox{\scriptsize{\it{ty}}}}}, where {\it{ty}}
 277 indicates the {\it metakind} used in the declaration of the variable.
 278 For example, \mth{\msf{metaid}^{\ssf{Type}}} refers to a metavariable
 279 that was declared using \texttt{type} and stands for any type.
 280
  281 {\tt metavariable} declares a metavariable for which the parser tries to
  282 infer the metavariable type from the usage context.  Such a
  283 metavariable must be used consistently.  These metavariables cannot be used
  284 in all contexts; specifically, they cannot be used in contexts that would
  285 make the parsing ambiguous.  Some examples are the leftmost term of an
  286 expression, such as the left-hand side of an assignment, or the type in a
  287 variable declaration.  These restrictions may seem somewhat arbitrary from
  288 the user's point of view.  Thus, it is better to declare metavariables with
  289 specific metavariable types.  If Coccinelle is given the argument {\tt
 290   -parse\_cocci}, it will print information about the type that is inferred
 291 for each metavariable.
 292
 293 The \NT{ctype} and \NT{ctypes} nonterminals are used by both the grammar of
 294 metavariable declarations and the grammar of transformations, and are
 295 defined on page~\pageref{types}.
 296
 297 An identifier metavariable with {\tt virtual} as its ``rule name'' is given
 298 a value on the command line.  For example, if a semantic patch contains a
 299 rule that declares an identifier metavariable with the name {\tt
 300   virtual.alloc}, then the command line could contain {\tt -D
  301   alloc=kmalloc}.  There should be no space around the {\tt =}.  An
 302 example is in {\tt demos/vm.cocci} and {\tt demos/vm.c}.
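
A minimal sketch of such a rule is given below.  It is not taken from the
demo files, and the replacement function name is hypothetical.

\begin{quote}
\begin{verbatim}
@@
identifier virtual.alloc;
expression list es;
@@

- alloc(es)
+ checked_alloc(es)
\end{verbatim}
\end{quote}

\noindent
Assuming that the command line contains {\tt -D alloc=kmalloc}, this rule
is expected to rewrite calls to {\tt kmalloc}.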
 303
 304
 305 \paragraph*{Warning:} Each metavariable declaration causes the declared
 306 metavariables to be immediately usable, without any inheritance
 307 indication.  Thus the following are correct:
 308
 309 \begin{quote}
 310 \begin{verbatim}
 311 @@
 312 type r.T;
 313 T x;
 314 @@
 315
 316 [...] // some semantic patch code
 317 \end{verbatim}
 318 \end{quote}
 319
 320 \begin{quote}
 321 \begin{verbatim}
 322 @@
 323 r.T x;
 324 type r.T;
 325 @@
 326
 327 [...] // some semantic patch code
 328 \end{verbatim}
 329 \end{quote}
 330
 331 \noindent
 332 But the following is not correct:
 333
 334 \begin{quote}
 335 \begin{verbatim}
 336 @@
 337 type r.T;
 338 r.T x;
 339 @@
 340
 341 [...] // some semantic patch code
 342 \end{verbatim}
 343 \end{quote}
 344
 345 This applies to position variables, type metavariables, identifier
 346 metavariables that may be used in specifying a structure type, and
 347 metavariables used in the initialization of a fresh identifier.  In the
 348 case of a structure type, any identifier metavariable indeed has to be
 349 declared as an identifier metavariable in advance.  The syntax does not
 350 permit {\tt r.n} as the name of a structure or union type in such a
 351 declaration.
 352
 353 \section{Metavariables for scripts}
 354
 355 Metavariables for scripts can only be inherited from transformation rules.
 356 In the spirit of scripting languages such as Python that use dynamic
 357 typing, metavariables for scripts do not include type declarations.
 358
 359 \begin{grammar}
 360   \RULE{\rt{script\_metavariables}}
 361   \CASE{@ script:\NT{language} \OPT{\NT{rulename}} \OPT{depends on \NT{dep}} @
 362         \any{\NT{script\_metadecl}} @@}
 363   \CASE{@ initialize:\NT{language} \OPT{depends on \NT{dep}} @}
 364   \CASE{@ finalize:\NT{language} \OPT{depends on \NT{dep}} @}
 365
 366   \RULE{\rt{language}} \CASE{python} \CASE{ocaml}
 367
 368   \RULE{\rt{script\_metadecl}}
 369   \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
 370   \CASE{\T{id} ;}
 371 \end{grammar}
 372
 373 Currently, the only scripting languages that are supported are Python and
 374 OCaml, indicated using {\tt python} and {\tt ocaml}, respectively.  The
 375 set of available scripting languages may be extended at some point.
 376
 377 Script rules declared with \KW{initialize} are run before the treatment of
 378 any file.  Script rules declared with \KW{finalize} are run when the
 379 treatment of all of the files has completed.  There can be at most one of
 380 each per scripting language (thus currently at most one of each).
 381 Initialize and finalize script rules do not have access to SmPL
 382 metavariables.  Nevertheless, a finalize script rule can access any
 383 variables initialized by the other script rules, allowing information to be
 384 transmitted from the matching process to the finalize rule.
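
The following sketch, written against the grammar given above, illustrates
how information might flow from the matching process to a finalize rule.
The rule name {\tt r}, the function name {\tt foo} and the Python variable
{\tt hits} are hypothetical.  The sketch also assumes that a position
metavariable is seen from Python as a list whose elements have {\tt file}
and {\tt line} fields.

\begin{quote}
\begin{verbatim}
@initialize:python@

hits = []

@r@
position p;
@@

foo@p(...)

@script:python@
p << r.p;
@@

hits.append((p[0].file, p[0].line))

@finalize:python@

print("matches found: %d" % len(hits))
\end{verbatim}
\end{quote}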
 385
  386 A script metavariable that does not specify an origin, using \texttt{<{}<},
 387 is newly declared by the script.  This metavariable should be assigned to a
 388 string and can be inherited by subsequent rules as an identifier.  In
 389 Python, the assignment of such a metavariable $x$ should refer to the
 390 metavariable as {\tt coccinelle.\(x\)}.  Examples are in the files
 391 \texttt{demos/pythontococci.cocci} and \texttt{demos/camltococci.cocci}.
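
In the same spirit as those demo files, a script rule that declares a new
metavariable and a later rule that inherits it might be sketched as
follows.  The rule names and the function names are hypothetical.

\begin{quote}
\begin{verbatim}
@a@
identifier x;
@@

foo(x);

@script:python b@
x << a.x;
y;
@@

coccinelle.y = "new_name"

@c@
identifier b.y;
identifier a.x;
@@

- bar(x);
+ baz(x, y);
\end{verbatim}
\end{quote}

\noindent
Here {\tt y} is declared by the script rule {\tt b}, is assigned a string,
and is then inherited by rule {\tt c} as an identifier.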
 392
  393 In an OCaml script, the following extended form of \textit{script\_metadecl}
 394 may be used:
 395
 396 \begin{grammar}
 397   \RULE{\rt{script\_metadecl}}
 398   \CASE{(\T{id},\T{id}) <{}< \T{rulename\_id}.\T{id} ;}
 399   \CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
 400   \CASE{\T{id} ;}
 401 \end{grammar}
 402
 403 \noindent
 404 In a declaration of the form \texttt{(\T{id},\T{id}) <{}<
 405   \T{rulename\_id}.\T{id} ;}, the left component of \texttt{(\T{id},\T{id})}
 406 receives a string representation of the value of the inherited metavariable
 407 while the right component receives its abstract syntax tree.  The file
 408 \texttt{parsing\_c/ast\_c.ml} in the Coccinelle implementation gives some
 409 information about the structure of the abstract syntax tree.  Either the
 410 left or right component may be replaced by \verb+_+, indicating that the
  411 string representation or abstract syntax tree representation is not
 412 wanted, respectively.
 413
 414 The abstract syntax tree of a metavariable declared using {\tt
 415   metavariable} is not available.
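
As a small sketch of the pair form, the following OCaml script rule keeps
only the string representation of an inherited expression.  The rule name
{\tt r} and the function name {\tt foo} are hypothetical, and the sketch
assumes that the string component can be passed directly to
{\tt Printf.printf}.

\begin{quote}
\begin{verbatim}
@r@
expression E;
@@

foo(E)

@script:ocaml@
(s,_) << r.E;
@@

Printf.printf "foo called with %s\n" s
\end{verbatim}
\end{quote}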
 416
 417 \section{Transformation}
 418
 419 The transformation specification essentially has the form of C code, except
 420 that lines to remove are annotated with \verb+-+ in the first column, and
 421 lines to add are annotated with \verb-+-.  A transformation specification
 422 can also use {\em dots}, ``\verb-...-'', describing an arbitrary sequence
 423 of function arguments or instructions within a control-flow path.
 424 Implicitly, ``\verb-...-'' matches the shortest path between something that
 425 matches the pattern before the dots (or the beginning of the function, if
 426 there is nothing before the dots) and something that matches the pattern
 427 after the dots (or the end of the function, if there is nothing after the
 428 dots).  Dots may be modified with a {\tt when} clause, indicating a pattern
 429 that should not occur anywhere within the matched sequence.  {\tt when any}
 430 removes the aforementioned constraint that ``\verb-...-'' matches the
 431 shortest path.  Finally, a transformation can specify a disjunction of
 432 patterns, of the form \mtt{( \mth{\mita{pat}_1} | \mita{\ldots} |
 433   \mth{\mita{pat}_n} )} where each \texttt{(}, \texttt{|} or \texttt{)} is
 434 in column 0 or preceded by \texttt{\textbackslash}.
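
The use of dots with a {\tt when} clause can be sketched as follows; the
function names are hypothetical.

\begin{quote}
\begin{verbatim}
@@
expression E;
@@

  foo(E);
  ... when != bar(E)
- baz(E);
+ qux(E);
\end{verbatim}
\end{quote}

\noindent
This rule is intended to replace a call to {\tt baz} by a call to
{\tt qux} when it is reached from a call to {\tt foo} on the same
argument along a control-flow path that contains no call to {\tt bar} on
that argument.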
 435
  436 The grammar that we present for the transformation is not actually the
  437 grammar of the SmPL code that can be written by the programmer.  It is
  438 instead the grammar of a slice of that code, consisting of either the {\tt -}
  439 annotated and the unannotated code (the context of the transformed lines),
  440 or the {\tt +} annotated code and the unannotated code.  For example, for
 441 parsing purposes, the following transformation
 442 %presented in Section \ref{sec:seq2}
 443 is split into the two variants shown below and each is parsed
 444 separately.
 445
 446 \begin{center}
 447 \begin{tabular}{c}
 448 \begin{lstlisting}[language=Cocci]
 449   proc_info_func(...) {
 450     <...
 451 @--    hostno
 452 @++    hostptr->host_no
 453     ...>
 454  }
 455 \end{lstlisting}\\
 456 \end{tabular}
 457 \end{center}
 458
 459 {%\sizecodebis
 460 \begin{center}
 461 \begin{tabular}{p{5cm}p{3cm}p{5cm}}
 462 \begin{lstlisting}[language=Cocci]
 463   proc_info_func(...) {
 464     <...
 465 @--    hostno
 466     ...>
 467  }
 468 \end{lstlisting}
 469 &&
 470 \begin{lstlisting}[language=Cocci]
 471   proc_info_func(...) {
 472     <...
 473 @++    hostptr->host_no
 474     ...>
 475  }
 476 \end{lstlisting}
 477 \end{tabular}
 478 \end{center}
 479 }
 480
 481 \noindent
 482 Requiring that both slices parse correctly ensures that the rule matches
 483 syntactically valid C code and that it produces syntactically valid C code.
 484 The generated parse trees are then merged for use in the subsequent
 485 matching and transformation process.
 486
 487 The grammar for the minus or plus slice of a transformation is as follows:
 488
 489 \begin{grammar}
 490
 491   \RULE{\rt{transformation}}
 492   \CASE{\some{\NT{include}}}
 493   \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}}
 494   \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
 495   \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{fundecl}, \NT{when}\mth{)}}
 496
 497   \RULE{\rt{include}}
 498   \CASE{\#include \T{include\_string}}
 499
 500 %  \RULE{\rt{fun\_decl\_stmt}}
 501 %  \CASE{\NT{decl\_stmt}}
 502 %  \CASE{\NT{fundecl}}
 503
 504 %  \CASE{\NT{ctype}}
 505 %  \CASE{\ttlb \NT{initialize\_list} \ttrb}
 506 %  \CASE{\NT{toplevel\_seq\_start\_after\_dots\_init}}
 507 %
 508 %  \RULE{\rt{toplevel\_seq\_start\_after\_dots\_init}}
 509 %  \CASE{\NT{stmt\_dots} \NT{toplevel\_after\_dots}}
 510 %  \CASE{\NT{expr} \opt{\NT{toplevel\_after\_exp}}}
 511 %  \CASE{\NT{decl\_stmt\_expr} \opt{\NT{toplevel\_after\_stmt}}}
 512 %
 513 %  \RULE{\rt{stmt\_dots}}
 514 %  \CASE{... \any{\NT{when}}}
 515 %  \CASE{<... \any{\NT{when}} \NT{nest\_after\_dots} ...>}
 516 %  \CASE{<+... \any{\NT{when}} \NT{nest\_after\_dots} ...+>}
 517
 518   \RULE{\rt{when}}
 519   \CASE{when != \NT{when\_code}}
 520   \CASE{when = \NT{rule\_elem\_stmt}}
 521   \CASE{when \NT{COMMA\_LIST}\mth{(}\NT{any\_strict}\mth{)}}
 522   \CASE{when true != \NT{expr}}
 523   \CASE{when false != \NT{expr}}
 524
 525   \RULE{\rt{when\_code}}
 526   \CASE{\NT{OPTDOTSEQ}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
 527   \CASE{\NT{OPTDOTSEQ}\mth{(}\NT{expr}, \NT{when}\mth{)}}
 528
 529   \RULE{\rt{rule\_elem\_stmt}}
 530   \CASE{\NT{one\_decl}}
 531   \CASE{\NT{expr};}
 532   \CASE{return \opt{\NT{expr}};}
 533   \CASE{break;}
 534   \CASE{continue;}
 535   \CASE{\bs(\NT{rule\_elem\_stmt} \SOME{\bs| \NT{rule\_elem\_stmt}}\bs)}
 536
 537   \RULE{\rt{any\_strict}}
 538   \CASE{any}
 539   \CASE{strict}
 540   \CASE{forall}
 541   \CASE{exists}
 542
 543 %  \RULE{\rt{nest\_after\_dots}}
 544 %  \CASE{\NT{decl\_stmt\_exp} \opt{\NT{nest\_after\_stmt}}}
 545 %  \CASE{\opt{\NT{exp}} \opt{\NT{nest\_after\_exp}}}
 546 %
 547 %  \RULE{\rt{nest\_after\_stmt}}
 548 %  \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}}
 549 %  \CASE{\NT{decl\_stmt} \opt{\NT{nest\_after\_stmt}}}
 550 %
 551 %  \RULE{\rt{nest\_after\_exp}}
 552 %  \CASE{\NT{stmt\_dots} \NT{nest\_after\_dots}}
 553 %
 554 %  \RULE{\rt{toplevel\_after\_dots}}
 555 %  \CASE{\opt{\NT{toplevel\_after\_exp}}}
 556 %  \CASE{\NT{exp} \opt{\NT{toplevel\_after\_exp}}}
 557 %  \CASE{\NT{decl\_stmt\_expr} \NT{toplevel\_after\_stmt}}
 558 %
 559 %  \RULE{\rt{toplevel\_after\_exp}}
 560 %  \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}}
 561 %
 562 %  \RULE{\rt{decl\_stmt\_expr}}
 563 %  \CASE{TMetaStmList$^\ddag$}
 564 %  \CASE{\NT{decl\_var}}
 565 %  \CASE{\NT{stmt}}
 566 %  \CASE{(\NT{stmt\_seq} \ANY{| \NT{stmt\_seq}})}
 567 %
 568 %  \RULE{\rt{toplevel\_after\_stmt}}
 569 %  \CASE{\NT{stmt\_dots} \opt{\NT{toplevel\_after\_dots}}}
 570 %  \CASE{\NT{decl\_stmt} \NT{toplevel\_after\_stmt}}
 571
 572 \end{grammar}
 573
 574 \begin{grammar}
 575   \RULE{\rt{OPTDOTSEQ}\mth{(}\rt{grammar\_ds}, \rt{when\_ds}\mth{)}}
 576   \CASE{}\multicolumn{3}{r}{\hspace{1cm}
 577   \KW{\opt{... \ANY{\NT{when\_ds}}} \NT{grammar\_ds}
 578     \ANY{... \ANY{\NT{when\_ds}} \NT{grammar\_ds}}
 579     \opt{... \ANY{\NT{when\_ds}}}}
 580   }
 581
 582 %  \CASE{\opt{... \opt{\NT{when\_ds}}} \NT{grammar}
 583 %    \ANY{... \opt{\NT{when\_ds}} \NT{grammar}}
 584 %    \opt{... \opt{\NT{when\_ds}}}}
 585 %  \CASE{<... \any{\NT{when\_ds}} \NT{grammar} ...>}
 586 %  \CASE{<+... \any{\NT{when\_ds}} \NT{grammar} ...+>}
 587
 588 \end{grammar}
 589
 590 \noindent
 591 Lines may be annotated with an element of the set $\{\mtt{-}, \mtt{+},
 592 \mtt{*}\}$ or the singleton $\mtt{?}$, or one of each set. \mtt{?}
  593 represents at most one match of the given pattern, {\em i.e.}, a match of the
  594 pattern is optional. \mtt{*} is used for a
 595 semantic match, \emph{i.e.}, a pattern that highlights the fragments
 596 annotated with \mtt{*}, but does not perform any modification of the
 597 matched code. \mtt{*} cannot be mixed with \mtt{-} and \mtt{+}.  There are
 598 some constraints on the use of these annotations:
 599 \begin{itemize}
  600 \item Dots, {\em i.e.}, \texttt{...}, cannot occur on a line marked
 601   \texttt{+}.
 602 \item Nested dots, {\em i.e.}, dots enclosed in {\tt <} and {\tt >}, cannot
 603   occur on a line with any marking.
 604 \end{itemize}
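
As a small sketch of a semantic match, the following rule, which uses the
hypothetical function name {\tt foo}, is intended only to highlight the
matching statements and to leave the C code unchanged.

\begin{quote}
\begin{verbatim}
@@
expression E;
@@

* foo(E);
\end{verbatim}
\end{quote}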
 605
 606 Each element of a disjunction must be a proper term like an
 607 expression, a statement, an identifier or a declaration. Thus, the
 608 rule on the left below is not a syntactically correct SmPL rule. One may
 609 use the rule on the right instead.
 610
 611 \begin{center}
 612   \begin{tabular}{l@{\hspace{5cm}}r}
 613 \begin{lstlisting}[language=Cocci]
 614 @@
 615 type T;
 616 T b;
 617 @@
 618
 619 (
 620  writeb(...,
 621 |
 622  readb(...,
 623 )
 624 @--(T)
 625  b)
 626 \end{lstlisting}
 627     &
 628 \begin{lstlisting}[language=Cocci]
 629 @@
 630 type T;
 631 T b;
 632 @@
 633
 634 (
  635 readb
  636 |
  637 writeb
 638 )
 639  (...,
 640 @-- (T)
 641   b)
 642 \end{lstlisting}
 643     \\
 644   \end{tabular}
 645 \end{center}
 646
  647 Some kinds of terms can only appear in {\tt +} code.  These include comments,
 648 ifdefs, and attributes (\texttt{\_\_attribute\_\_((...))}).
 649
 650 \section{Types}
 651 \label{types}
 652
 653 \begin{grammar}
 654
 655   \RULE{\rt{ctypes}}
 656   \CASE{\NT{COMMA\_LIST}\mth{(}\NT{ctype}\mth{)}}
 657
 658   \RULE{\rt{ctype}}
 659   \CASE{\opt{\NT{const\_vol}} \NT{generic\_ctype} \any{*}}
 660   \CASE{\opt{\NT{const\_vol}} void \some{*}}
 661   \CASE{(\NT{ctype} \ANY{| \NT{ctype}})}
 662
 663   \RULE{\rt{const\_vol}}
 664   \CASE{const}
 665   \CASE{volatile}
 666
 667   \RULE{\rt{generic\_ctype}}
 668   \CASE{\NT{ctype\_qualif}}
 669   \CASE{\opt{\NT{ctype\_qualif}} char}
 670   \CASE{\opt{\NT{ctype\_qualif}} short}
 671   \CASE{\opt{\NT{ctype\_qualif}} short int}
 672   \CASE{\opt{\NT{ctype\_qualif}} int}
 673   \CASE{\opt{\NT{ctype\_qualif}} long}
 674   \CASE{\opt{\NT{ctype\_qualif}} long int}
 675   \CASE{\opt{\NT{ctype\_qualif}} long long}
 676   \CASE{\opt{\NT{ctype\_qualif}} long long int}
 677   \CASE{double}
 678   \CASE{long double}
 679   \CASE{float}
 680   \CASE{size\_t} \CASE{ssize\_t} \CASE{ptrdiff\_t}
 681   \CASE{enum \NT{id} \{ \NT{PARAMSEQ}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)} \OPT{,} \}}
 682   \CASE{\OPT{struct\OR union} \T{id} \OPT{\{ \any{\NT{struct\_decl\_list}} \}}}
 683
 684   \RULE{\rt{ctype\_qualif}}
 685   \CASE{unsigned}
 686   \CASE{signed}
 687
 688   \RULE{\rt{struct\_decl\_list}}
 689   \CASE{\NT{struct\_decl\_list\_start}}
 690
 691   \RULE{\rt{struct\_decl\_list\_start}}
 692   \CASE{\NT{struct\_decl}}
 693   \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}}
 694   \CASE{... \opt{when != \NT{struct\_decl}}$^\dag$ \opt{\NT{continue\_struct\_decl\_list}}}
 695
 696   \RULE{\rt{continue\_struct\_decl\_list}}
 697   \CASE{\NT{struct\_decl} \NT{struct\_decl\_list\_start}}
 698   \CASE{\NT{struct\_decl}}
 699
 700   \RULE{\rt{struct\_decl}}
 701   \CASE{\NT{ctype} \NT{d\_ident};}
  702   \CASE{\NT{fn\_ctype} (* \NT{d\_ident}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)});}
 703   \CASE{\opt{\NT{const\_vol}} \T{id} \NT{d\_ident};}
 704
 705   \RULE{\rt{d\_ident}}
 706   \CASE{\T{id} \any{[\opt{\NT{expr}}]}}
 707
 708   \RULE{\rt{fn\_ctype}}
 709   \CASE{\NT{generic\_ctype} \any{*}}
 710   \CASE{void \any{*}}
 711
 712   \RULE{\rt{name\_opt\_decl}}
 713   \CASE{\NT{decl}}
 714   \CASE{\NT{ctype}}
 715   \CASE{\NT{fn\_ctype}}
 716 \end{grammar}
 717
 718 $^\dag$ The optional \texttt{when} construct ends at the end of the line.
 719
 720 \section{Function declarations}
 721
 722 \begin{grammar}
 723
 724   \RULE{\rt{fundecl}}
 725   \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid}
 726     (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}})
 727     \ttlb~\opt{\NT{stmt\_seq}} \ttrb}
 728
 729   \RULE{\rt{funproto}}
 730   \CASE{\opt{\NT{fn\_ctype}} \any{\NT{funinfo}} \NT{funid}
 731     (\opt{\NT{PARAMSEQ}\mth{(}\NT{param}, \mth{\varepsilon)}});}
 732
 733   \RULE{\rt{funinfo}}
 734   \CASE{inline}
 735   \CASE{\NT{storage}}
 736 %   \CASE{\NT{attr}}
 737
 738   \RULE{\rt{storage}}
 739   \CASE{static}
 740   \CASE{auto}
 741   \CASE{register}
 742   \CASE{extern}
 743
 744   \RULE{\rt{funid}}
 745   \CASE{\T{id}}
 746   \CASE{\mth{\T{metaid}^{\ssf{Id}}}}
 747   \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}}
 748 %   \CASE{\mth{\T{metaid}^{\ssf{Func}}}}
 749 %   \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}}
 750
 751   \RULE{\rt{param}}
 752   \CASE{\NT{type} \T{id}}
 753   \CASE{\mth{\T{metaid}^{\ssf{Param}}}}
 754   \CASE{\mth{\T{metaid}^{\ssf{ParamList}}}}
 755
 756   \RULE{\rt{decl}}
 757   \CASE{\NT{ctype} \NT{id}}
 758   \CASE{\NT{fn\_ctype} (* \NT{id}) (\NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)})}
 759   \CASE{void}
 760   \CASE{\mth{\T{metaid}^{\ssf{Param}}}}
 761 \end{grammar}
 762
 763 \begin{grammar}
 764   \RULE{\rt{PARAMSEQ}\mth{(}\rt{gram\_p}, \rt{when\_p}\mth{)}}
 765   \CASE{\NT{COMMA\_LIST}\mth{(}\NT{gram\_p} \OR \ldots \opt{\NT{when\_p}}\mth{)}}
 766 \end{grammar}
 767
 768 To match a function it is not necessary to provide all of the annotations
 769 that appear before the function name.  For example, the following semantic
 770 patch:
 771
 772 \begin{lstlisting}[language=Cocci]
 773 @@
 774 @@
 775
 776 foo() { ... }
 777 \end{lstlisting}
 778
 779 \noindent
 780 matches a function declared as follows:
 781
 782 \begin{lstlisting}[language=C]
 783 static int foo() { return 12; }
 784 \end{lstlisting}
 785
 786 \noindent
 787 This behavior can be turned off by disabling the \KW{optional\_storage}
 788 isomorphism.  If one adds code before a function declaration, then the
 789 effect depends on the kind of code that is added.  If the added code is a
 790 function definition or CPP code, then the new code is placed before
 791 all information associated with the function definition, including any
 792 comments preceding the function definition.  On the other hand, if the new
 793 code is associated with the function, such as the addition of the keyword
 794 {\tt static}, the new code is placed exactly where it appears with respect
 795 to the rest of the function definition in the semantic patch.  For example,
 796
 797 \begin{lstlisting}[language=Cocci]
 798 @@
 799 @@
 800
 801 + static
 802 foo() { ... }
 803 \end{lstlisting}
 804
 805 \noindent
 806 causes {\tt static} to be placed just before the function name.  The following
 807 causes it to be placed just before the type:
 808
 809 \begin{lstlisting}[language=Cocci]
 810 @@
 811 type T;
 812 @@
 813
 814 + static
 815 T foo() { ... }
 816 \end{lstlisting}
 817
 818 \noindent
 819 It may be necessary to consider several cases to ensure that the added code
 820 is placed in the right position.  For example, one may need one pattern
 821 that considers that the function is declared {\tt inline} and another that
 822 considers that it is not.
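
\noindent
As a hedged sketch of this case analysis, the following semantic patch uses
two rules for a hypothetical function {\tt foo}.  The first rule covers a
definition declared {\tt inline}; the second, which applies only when the
first did not match, covers the remaining definitions.  The rule names
{\tt r1} and {\tt r2} are illustrative, the sketch assumes that {\tt foo} is
not already declared {\tt static}, and the interaction with isomorphisms has
not been checked here.

\begin{lstlisting}[language=Cocci]
@r1@
type T;
@@

+ static
inline T foo(...) { ... }

@r2 depends on !r1@
type T;
@@

+ static
T foo(...) { ... }
\end{lstlisting}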
 823
 824 %\newpage
 825
 826 \section{Declarations}
 827
 828 \begin{grammar}
 829   \RULE{\rt{decl\_var}}
 830 %  \CASE{\NT{type} \opt{\NT{id} \opt{[\opt{\NT{dot\_expr}}]}
 831 %      \ANY{, \NT{id} \opt{[ \opt{\NT{dot\_expr}}]}}};}
 832   \CASE{\NT{common\_decl}}
 833   \CASE{\opt{\NT{storage}} \NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;}
 834   \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{COMMA\_LIST}\mth{(}\NT{d\_ident}\mth{)} ;}
 835   \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) = \NT{initialize} ;}
 836   \CASE{typedef \NT{ctype} \NT{typedef\_ident} ;}
 837
 838   \RULE{\rt{one\_decl}}
 839   \CASE{\NT{common\_decl}}
 840   \CASE{\opt{\NT{storage}} \NT{ctype} \NT{id};}
 841 %  \CASE{\NT{storage} \NT{ctype} \NT{id} \opt{[\opt{\NT{dot\\_expr}}]} = \NT{nest\\_expr};}
 842   \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} ;}
 843
 844   \RULE{\rt{common\_decl}}
 845   \CASE{\NT{ctype};}
 846   \CASE{\NT{funproto}}
 847   \CASE{\opt{\NT{storage}} \NT{ctype} \NT{d\_ident} = \NT{initialize} ;}
 848   \CASE{\opt{\NT{storage}} \opt{\NT{const\_vol}} \T{id} \NT{d\_ident} = \NT{initialize} ;}
 849   \CASE{\opt{\NT{storage}} \NT{fn\_ctype} ( * \NT{d\_ident} ) ( \NT{PARAMSEQ}\mth{(}\NT{name\_opt\_decl}, \mth{\varepsilon)} ) ;}
 850   \CASE{\NT{decl\_ident} ( \OPT{\NT{COMMA\_LIST}\mth{(}\NT{expr}\mth{)}} ) ;}
 851
 852   \RULE{\rt{initialize}}
 853   \CASE{\NT{dot\_expr}}
 854   \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}}
 855   \CASE{\ttlb~\opt{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb}
 856
 857   \RULE{\rt{init\_list\_elem}}
 858   \CASE{\NT{dot\_expr}}
 859   \CASE{\NT{designator} = \NT{initialize}}
 860   \CASE{\mth{\T{metaid}^{\ssf{Initialiser}}}}
 861   \CASE{\mth{\T{metaid}^{\ssf{InitialiserList}}}}
 862   \CASE{\NT{id} : \NT{dot\_expr}}
 863
 864   \RULE{\rt{designator}}
 865   \CASE{. \NT{id}}
 866   \CASE{[ \NT{dot\_expr} ]}
 867   \CASE{[ \NT{dot\_expr} ... \NT{dot\_expr} ]}
 868
 869   \RULE{\rt{decl\_ident}}
 870   \CASE{\T{DeclarerId}}
 871   \CASE{\mth{\T{metaid}^{\ssf{Declarer}}}}
 872 \end{grammar}
 873
 874 An initializer for a structure can be ordered or unordered.  It is
 875 considered to be unordered if there is at least one key-value pair
 876 initializer, e.g., \texttt{.x = e}.
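
\noindent
For illustration, the following sketch removes one field from a structure
initializer.  Because the pattern contains a key-value pair, it is treated
as unordered, so the field is expected to be found regardless of its
position among the other fields.  The structure name {\tt foo} and the field
name {\tt x} are hypothetical.

\begin{lstlisting}[language=Cocci]
@@
identifier I;
expression E;
@@

struct foo I = {
-  .x = E,
};
\end{lstlisting}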
 877
 878 \section{Statements}
 879
 880 The first rule {\em stmt} describes the various forms of a statement.
 881 The remaining rules implement the constraints that are sensitive to the
 882 context in which the statement occurs: {\em single\_stmt} for a
 883 context in which only one statement is allowed, and {\em decl\_stmt}
 884 for a context in which a declaration, statement, or sequence thereof is
 885 allowed.
 886
 887 \begin{grammar}
 888   \RULE{\rt{stmt}}
 889   \CASE{\NT{include}}
 890   \CASE{\mth{\T{metaid}^{\ssf{Stmt}}}}
 891   \CASE{\NT{expr};}
 892   \CASE{if (\NT{dot\_expr}) \NT{single\_stmt} \opt{else \NT{single\_stmt}}}
 893   \CASE{for (\opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}}; \opt{\NT{dot\_expr}})
 894     \NT{single\_stmt}}
 895   \CASE{while (\NT{dot\_expr}) \NT{single\_stmt}}
 896   \CASE{do \NT{single\_stmt} while (\NT{dot\_expr});}
 897   \CASE{\NT{iter\_ident} (\any{\NT{dot\_expr}}) \NT{single\_stmt}}
 898   \CASE{switch (\opt{\NT{dot\_expr}}) \ttlb \any{\NT{case\_line}} \ttrb}
 899   \CASE{return \opt{\NT{dot\_expr}};}
 900   \CASE{\ttlb~\opt{\NT{stmt\_seq}} \ttrb}
 901   \CASE{\NT{NEST}\mth{(}\some{\NT{decl\_stmt}}, \NT{when}\mth{)}}
 902   \CASE{\NT{NEST}\mth{(}\NT{expr}, \NT{when}\mth{)}}
 903   \CASE{break;}
 904   \CASE{continue;}
 905   \CASE{\NT{id}:}
 906   \CASE{goto \NT{id};}
 907   \CASE{\ttlb \NT{stmt\_seq} \ttrb}
 908
 909   \RULE{\rt{single\_stmt}}
 910   \CASE{\NT{stmt}}
 911   \CASE{\NT{OR}\mth{(}\NT{stmt}\mth{)}}
 912
 913   \RULE{\rt{decl\_stmt}}
 914   \CASE{\mth{\T{metaid}^{\ssf{StmtList}}}}
 915   \CASE{\NT{decl\_var}}
 916   \CASE{\NT{stmt}}
 917   \CASE{\NT{OR}\mth{(}\NT{stmt\_seq}\mth{)}}
 918
 919   \RULE{\rt{stmt\_seq}}
 920   \CASE{\any{\NT{decl\_stmt}}
 921     \opt{\NT{DOTSEQ}\mth{(}\some{\NT{decl\_stmt}},
 922       \NT{when}\mth{)} \any{\NT{decl\_stmt}}}}
 923   \CASE{\any{\NT{decl\_stmt}}
 924     \opt{\NT{DOTSEQ}\mth{(}\NT{expr},
 925       \NT{when}\mth{)} \any{\NT{decl\_stmt}}}}
 926
 927   \RULE{\rt{case\_line}}
 928   \CASE{default :~\NT{stmt\_seq}}
 929   \CASE{case \NT{dot\_expr} :~\NT{stmt\_seq}}
 930
 931   \RULE{\rt{iter\_ident}}
 932   \CASE{\T{IteratorId}}
 933   \CASE{\mth{\T{metaid}^{\ssf{Iterator}}}}
 934 \end{grammar}
 935
 936 \begin{grammar}
 937   \RULE{\rt{OR}\mth{(}\rt{gram\_o}\mth{)}}
 938   \CASE{( \NT{gram\_o} \ANY{\ttmid \NT{gram\_o}})}
 939
 940   \RULE{\rt{DOTSEQ}\mth{(}\rt{gram\_d}, \rt{when\_d}\mth{)}}
 941   \CASE{\ldots \opt{\NT{when\_d}} \ANY{\NT{gram\_d} \ldots \opt{\NT{when\_d}}}}
 942
 943   \RULE{\rt{NEST}\mth{(}\rt{gram\_n}, \rt{when\_n}\mth{)}}
 944   \CASE{<\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots>}
 945   \CASE{<+\ldots \opt{\NT{when\_n}} \NT{gram\_n} \ANY{\ldots \opt{\NT{when\_n}} \NT{gram\_n}} \ldots+>}
 946 \end{grammar}
 947
 948 \noindent
 949 OR is a macro that generates a disjunction of patterns.  The three
 950 tokens \T{(}, \T{\ttmid}, and \T{)} must appear in the leftmost
 951 column, to differentiate them from the parentheses and bit-or tokens
 952 that can appear within expressions (and cannot appear in the leftmost
 953 column). These tokens may also be preceded by \texttt{\bs}
 954 when they are used in another column.  These tokens are furthermore
 955 different from (, \(\mid\), and ), which are part of the grammar
 956 metalanguage.
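
\noindent
As a minimal sketch, either of the following rules expresses a disjunction
over two hypothetical functions {\tt foo} and {\tt bar}.  The first places
the disjunction tokens in the leftmost column:

\begin{lstlisting}[language=Cocci]
@@
expression E;
@@

(
* foo(E);
|
* bar(E);
)
\end{lstlisting}

\noindent
The second uses the escaped tokens, which can appear in any column:

\begin{lstlisting}[language=Cocci]
@@
expression E;
@@

* \(foo\|bar\)(E);
\end{lstlisting}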
 957
 958 \section{Expressions}
 959
 960 A nest or a single ellipsis is allowed in some expression contexts, and
 961 causes ambiguity in others.  For example, in a sequence \mtt{\ldots
 962 \mita{expr} \ldots}, the nonterminal \mita{expr} must be instantiated as an
 963 explicit C-language expression, while in an array reference,
 964 \mtt{\mth{\mita{expr}_1} \mtt{[} \mth{\mita{expr}_2} \mtt{]}}, the
 965 nonterminal \mth{\mita{expr}_2}, because it is delimited by brackets, can
 966 also be instantiated as \mtt{\ldots}, representing an arbitrary expression.  To
 967 distinguish between the various possibilities, we define three nonterminals
 968 for expressions: {\em expr} does not allow either top-level nests or
 969 ellipses, {\em nest\_expr} allows a nest but not an ellipsis, and {\em
 970 dot\_expr} allows both.  The EXPR macro is used to express these variants
 971 in a concise way.
 972
 973 \begin{grammar}
 974   \RULE{\rt{expr}}
 975   \CASE{\NT{EXPR}\mth{(}\NT{expr}\mth{)}}
 976
 977   \RULE{\rt{nest\_expr}}
 978   \CASE{\NT{EXPR}\mth{(}\NT{nest\_expr}\mth{)}}
 979   \CASE{\NT{NEST}\mth{(}\NT{nest\_expr}, \NT{exp\_whencode}\mth{)}}
 980
 981   \RULE{\rt{dot\_expr}}
 982   \CASE{\NT{EXPR}\mth{(}\NT{dot\_expr}\mth{)}}
 983   \CASE{\NT{NEST}\mth{(}\NT{dot\_expr}, \NT{exp\_whencode}\mth{)}}
 984   \CASE{...~\opt{\NT{exp\_whencode}}}
 985
 986   \RULE{\rt{EXPR}\mth{(}\rt{exp}\mth{)}}
 987   \CASE{\NT{exp} \NT{assign\_op} \NT{exp}}
 988   \CASE{\NT{exp}++}
 989   \CASE{\NT{exp}--}
 990   \CASE{\NT{unary\_op} \NT{exp}}
 991   \CASE{\NT{exp} \NT{bin\_op} \NT{exp}}
 992   \CASE{\NT{exp} ?~\NT{dot\_expr} :~\NT{exp}}
 993   \CASE{(\NT{type}) \NT{exp}}
 994   \CASE{\NT{exp} [\NT{dot\_expr}]}
 995   \CASE{\NT{exp} .~\NT{id}}
 996   \CASE{\NT{exp} -> \NT{id}}
 997   \CASE{\NT{exp}(\opt{\NT{PARAMSEQ}\mth{(}\NT{arg}, \NT{exp\_whencode}\mth{)}})}
 998   \CASE{\NT{id}}
 999   \CASE{(\NT{type}) \ttlb~{\NT{COMMA\_LIST}\mth{(}\NT{init\_list\_elem}\mth{)}}~\ttrb}
1000 %   \CASE{\mth{\T{metaid}^{\ssf{Func}}}}
1001 %   \CASE{\mth{\T{metaid}^{\ssf{LocalFunc}}}}
1002   \CASE{\mth{\T{metaid}^{\ssf{Exp}}}}
1003 %   \CASE{\mth{\T{metaid}^{\ssf{Err}}}}
1004   \CASE{\mth{\T{metaid}^{\ssf{Const}}}}
1005   \CASE{\NT{const}}
1006   \CASE{(\NT{dot\_expr})}
1007   \CASE{\NT{OR}\mth{(}\NT{exp}\mth{)}}
1008
1009   \RULE{\rt{arg}}
1010   \CASE{\NT{nest\_expr}}
1011   \CASE{\mth{\T{metaid}^{\ssf{ExpList}}}}
1012
1013   \RULE{\rt{exp\_whencode}}
1014   \CASE{when != \NT{expr}}
1015
1016   \RULE{\rt{assign\_op}}
1017   \CASE{= \OR -= \OR += \OR *= \OR /= \OR \%=}
1018   \CASE{\&= \OR |= \OR \caret= \OR \lt\lt= \OR \gt\gt=}
1019
1020   \RULE{\rt{bin\_op}}
1021   \CASE{* \OR / \OR \% \OR + \OR -}
1022   \CASE{\lt\lt \OR \gt\gt \OR \caret\xspace \OR \& \OR \ttmid}
1023   \CASE{< \OR > \OR <= \OR >= \OR == \OR != \OR \&\& \OR \ttmid\ttmid}
1024
1025   \RULE{\rt{unary\_op}}
1026   \CASE{++ \OR -- \OR \& \OR * \OR + \OR - \OR !}
1027
1028 \end{grammar}
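
\noindent
To illustrate the array-reference case discussed at the start of this
section, the following sketch matches an assignment to any element of any
array, with the index left unspecified.  The metavariable names {\tt a} and
{\tt E} are illustrative, and the rule is meant as an example of the grammar
rather than as a tested semantic match.

\begin{lstlisting}[language=Cocci]
@@
identifier a;
expression E;
@@

* a[...] = E;
\end{lstlisting}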
1029
1030 \section{Constants, Identifiers and Types for Transformations}
1031
1032 \begin{grammar}
1033   \RULE{\rt{const}}
1034   \CASE{\NT{string}}
1035   \CASE{[0-9]+}
1036   \CASE{\mth{\cdots}}
1037
1038   \RULE{\rt{string}}
1039   \CASE{"\any{[\^{}"]}"}
1040
1041   \RULE{\rt{id}}
1042   \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Id}}}
1043         \OR {\NT{OR}\mth{(}\NT{stmt}\mth{)}}}
1044
1045   \RULE{\rt{typedef\_ident}}
1046   \CASE{\T{id} \OR \mth{\T{metaid}^{\ssf{Type}}}}
1047
1048   \RULE{\rt{type}}
1049   \CASE{\NT{ctype} \OR \mth{\T{metaid}^{\ssf{Type}}}}
1050
1051   \RULE{\rt{pathToIsoFile}}
1052   \CASE{<.*>}
1053
1054   \RULE{\rt{regexp}}
1055   \CASE{"\any{[\^{}"]}"}
1056 \end{grammar}
1057
1058 \section{Comments and preprocessor directives}
1059
1060 A \verb+//+ or \verb+/* */+ comment that is annotated with + in the
1061 leftmost column is considered to be added code.  A \verb+//+ or
1062 \verb+/* */+ comment without such an annotation is considered to be a
1063 comment about the SmPL code, and thus is not matched in the C code.
1064
1065 The following preprocessor directives can likewise be added.  They cannot
1066 be matched against.  The entire line is added, but it is not parsed.
1067
1068 \begin{itemize}
1069 \item \verb+if+
1070 \item \verb+ifdef+
1071 \item \verb+ifndef+
1072 \item \verb+else+
1073 \item \verb+elif+
1074 \item \verb+endif+
1075 \item \verb+error+
1076 \item \verb+pragma+
1077 \item \verb+line+
1078 \end{itemize}
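
\noindent
As a small sketch, the following rule adds a comment and a pair of
preprocessor directives around an existing call.  The function name
{\tt foo\_debug} and the macro name {\tt CONFIG\_FOO\_DEBUG} are
hypothetical.  Each added line carries a + in the leftmost column, and the
directive lines are copied as they are, without being parsed.

\begin{lstlisting}[language=Cocci]
@@
expression E;
@@

+ /* hypothetical explanatory comment */
+ #ifdef CONFIG_FOO_DEBUG
  foo_debug(E);
+ #endif
\end{lstlisting}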
1079
1080 \section{Command-line semantic match}
1081
1082 It is possible to specify a semantic match on the spatch command line,
1083 using the argument {\tt -sp}.  In such a semantic match, any token
1084 beginning with a capital letter is assumed to be a metavariable of type
1085 {\tt metavariable}.  In this case, the parser must be able to figure out what
1086 kind of metavariable it is.  It is also possible to specify the type of a
1087 metavariable by enclosing the type in :'s, concatenated directly to the
1088 metavariable name.
1089
1090 Some examples of semantic matches that can be given as an argument to {\tt
1091   -sp} are as follows:
1092
1093 \begin{itemize}
1094 \item \texttt{f(e)}: This only matches the expression \texttt{f(e)}.
1095 \item \texttt{f(E)}: This matches a call to f with any argument.
1096 \item \texttt{F(E)}: This gives a parse error; the semantic patch parser
1097   cannot figure out what kind of metavariable \texttt{F} is.
1098 \item \texttt{F:identifier:(E)}: This matches any one-argument function
1099   call.
1100 \item \texttt{f:identifier:(e:struct foo *:)}: This matches any one-argument
1101   function call where the argument has type \texttt{struct foo
1102     *}.  Since the types of the metavariables are specified, it is not
1103   necessary for the metavariable names to begin with a capital letter.
1104 \item \texttt{F:identifier:(F)}: This matches any one-argument function call
1105   where the argument is the name of the function itself.  This example
1106   shows that it is not necessary to repeat the metavariable type name.
1107 \item \texttt{F:identifier:(F:identifier:)}: This matches any one-argument
1108   function call
1109   where the argument is the name of the function itself.  This example
1110   shows that it is possible to repeat the metavariable type name.
1111 \end{itemize}
1111 \end{itemize}
1112
1113 Constraints using \texttt{when}, \textit{e.g.}, \texttt{when != e}, are allowed,
1114 but the expression \texttt{e} must be represented as a single token.
1115
1116 The generated semantic match behaves as though there were a \texttt{*} in front
1117 of every token.
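
\noindent
As a rough illustration, the command-line semantic match
\texttt{F:identifier:(E)} should behave much like the following semantic
match written in a file.  This sketch assumes that the parser infers
\texttt{E} to be an expression metavariable.

\begin{lstlisting}[language=Cocci]
@@
identifier F;
expression E;
@@

* F(E)
\end{lstlisting}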
1118
1119 %%% Local Variables:
1120 %%% mode: LaTeX
1121 %%% TeX-master: "main_grammar"
1122 %%% coding: utf-8
1123 %%% TeX-PDF-mode: t
1124 %%% ispell-local-dictionary: "american"
1125 %%% End: