Sgrep
The goal of sgrep is to allow programmers to express complex code patterns while using a syntax they are already familiar with. For instance, to find all comparisons of strstr to false, one can simply write:
$ sgrep -e 'strstr(...) == false' foo.php
or:
$ sgrep -e 'strstr(...) == false' <directory>
to process all PHP files recursively under <directory>.
This will work even if the expression is split across multiple lines in the PHP files or has extra spaces between == and false, because sgrep works at the abstract syntax tree level, not at the token or string level like grep.
See also Spatch to not only match but also transform code patterns.
See https://github.com/facebook/pfff/blob/master/main_sgrep.ml
A common solution when one wants to find code is to use grep. It is fine when the pattern is simple, such as the name of a function, but very tedious to use when one wants to find certain kinds of calls. For instance, to find all calls to foo where the second argument is 1, one could write the regexp foo(.*, 1, .*), but this would not handle function calls split across multiple lines, or using a different amount of whitespace, or having nested function calls as arguments. The string level is not the right level to work at. With sgrep one can simply do:
$ sgrep -e 'foo(X, 1, ...)' *.php
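To see concretely why the string level falls short, here is a small Python sketch (not part of sgrep) that runs the regexp from the text over a few PHP snippets. The snippets and their names are invented for this illustration.

```python
import re

# The regexp from the text, with parentheses escaped so they match literally.
pattern = re.compile(r"foo\(.*, 1, .*\)")

# Hypothetical PHP snippets, invented for this illustration.
snippets = {
    "one_line": "foo($a, 1, $b);",
    "multi_line": "foo($a,\n    1,\n    $b);",
    "no_spaces": "foo($a,1,$b);",
    "nested_call": "foo(bar(2, 1, 3), 0, $b);",
}

# True when the regexp finds something in the snippet.
results = {name: bool(pattern.search(src)) for name, src in snippets.items()}

for name, matched in results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```

Only one_line is handled correctly. The multi_line and no_spaces calls are missed, and nested_call is reported even though the second argument of foo is 0, because the regexp latches onto the arguments of the inner call to bar.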
Another solution would be to use a compiler frontend and write a visitor on the abstract syntax tree that recognizes the complex pattern. Unfortunately this is also tedious to write, as a compiler frontend is usually a large piece of software and the abstract syntax tree is a complex structure.
The idea of sgrep is to mix the convenience of grep with the correctness and precision of a compiler frontend.
Note that sgrep is for very precise matching. Most of the time you would be fine with grep, but for the few occasions where you need precise matching, sgrep can be a useful "refiner".
The synopsis is:
$ sgrep [-lang <lang>] [ -pvar <var> ] -e <pattern> <files_or_dirs>
For instance, to find certain patterns of use of strstr, do:
$ sgrep -e 'strstr(...) == false' *.php
There is support for a few programming languages. See Matrix to check for your favourite programming language.
You can use metavariables that match any expression:
$ sgrep -e 'foo(X)' *.php
This will match code such as foo(1+1).
The metavariable has to be a single upper case letter (so that you can also match regular constants if you want, hoping nobody uses constants with a single letter), optionally followed by a number and an optional _ suffix of your choice, e.g. X, X1, Y_ILOVEPUPPIES.
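The naming rule above can be written down as a regular expression. The Python sketch below is one possible reading of that rule, not the check sgrep itself performs; the exact grammar accepted by sgrep may differ, and the names METAVAR and is_metavariable are hypothetical.

```python
import re

# Assumed grammar: one uppercase letter, optional digits, optional "_" suffix.
# This is a reading of the prose rule, not sgrep's actual implementation.
METAVAR = re.compile(r"^[A-Z][0-9]*(_[A-Za-z0-9_]*)?$")


def is_metavariable(name: str) -> bool:
    """Return True if name looks like a metavariable under the assumed grammar."""
    return METAVAR.match(name) is not None


for candidate in ["X", "X1", "Y_ILOVEPUPPIES", "FOO", "x"]:
    print(candidate, is_metavariable(candidate))
```

Under this reading, X, X1 and Y_ILOVEPUPPIES are metavariables, while FOO and x are treated as ordinary identifiers, so a pattern can still mention a regular constant such as FOO.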
NEW: You can use the same metavariable multiple times, in which case the pattern will match only if all the occurrences of the metavariable have the same value. For instance:
$ sgrep -e 'X && X' *.php
will find all binary And operations where both operands are the same (which is usually buggy code).
NEW: You can also use the -pvar flag of sgrep to print not the matched code but the matched metavariables. For instance:
$ sgrep -pvar X -e 'X && X' *.php
will print the content of the matched metavariable X.
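The behaviour of a repeated metavariable can be illustrated with a toy matcher. The Python sketch below is not how sgrep is implemented (sgrep is written in OCaml and works on the real PHP AST); it assumes a made-up AST encoding as nested tuples, and the names match and METAVAR are hypothetical. It returns the metavariable bindings on success, which is roughly the information that -pvar prints.

```python
import re

# Assumed metavariable shape: uppercase letter, optional digits, optional "_" suffix.
METAVAR = re.compile(r"^[A-Z][0-9]*(_\w*)?$")


def match(pattern, node, env):
    """Return the bindings if pattern matches node, else None."""
    if isinstance(pattern, str) and METAVAR.match(pattern):
        if pattern in env:
            # Already bound: every occurrence must have the same value.
            return env if env[pattern] == node else None
        return {**env, pattern: node}
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        if len(pattern) != len(node):
            return None
        for sub_pattern, sub_node in zip(pattern, node):
            env = match(sub_pattern, sub_node, env)
            if env is None:
                return None
        return env
    return env if pattern == node else None


# Toy encoding of the pattern 'X && X' and of two PHP expressions.
and_pattern = ("&&", "X", "X")
same_operands = ("&&", ("var", "$a"), ("var", "$a"))   # $a && $a
other_operands = ("&&", ("var", "$a"), ("var", "$b"))  # $a && $b

print(match(and_pattern, same_operands, {}))
print(match(and_pattern, other_operands, {}))
```

The first call binds X to the node for $a and succeeds; the second fails because the second occurrence of X would have to be bound to a different value.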
If you want to match function calls whose argument is a PHP variable, use $X, $Y, or $ followed by any uppercase letter as in the previous section, as in:
$ sgrep -e 'foo($X)' *.php
This will match foo($a), foo($b), but not foo(1)! Use foo(X) instead to match any expression.
You can also use metavariables for XHP attribute values as in:
$ sgrep -e '<ui:section-header border=X></ui:section-header>' *.php
You can also use '...' at the end of an argument list to say you don't care about the remaining arguments, as in:
$ sgrep -e 'foo(1, ...)' *.php
This will match foo(1), foo(1,2), etc.
NEW: This also works in array expressions, as in array(...).
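The trailing '...' rule amounts to a prefix comparison on the argument list. The Python sketch below models only that rule, under the assumption that arguments are plain values compared for equality; args_match is a hypothetical helper, not an sgrep function.

```python
ELLIPSIS = "..."


def args_match(pattern_args, actual_args):
    """Compare argument lists, letting a trailing '...' absorb the rest."""
    pattern_args = list(pattern_args)
    actual_args = list(actual_args)
    if pattern_args and pattern_args[-1] == ELLIPSIS:
        prefix = pattern_args[:-1]
        # Only the arguments before '...' have to agree.
        return actual_args[:len(prefix)] == prefix
    return pattern_args == actual_args


# Pattern foo(1, ...) against the argument lists of a few calls.
print(args_match([1, ELLIPSIS], [1]))
print(args_match([1, ELLIPSIS], [1, 2]))
print(args_match([1, ELLIPSIS], [2, 1]))
```

With the pattern foo(1, ...), the argument lists of foo(1) and foo(1,2) are accepted, while foo(2,1) is rejected because its first argument differs.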
You can use "..." in a pattern to say you want to match only constant strings as in:
$ sgrep -e 'foo("...")' *.php
This will match foo("foo"), foo(""), but not foo(1).
NEW: You can also bind a metavariable to string content, as in foo("X"). Because PHP has no first-class functions or classes, it's quite common to pass around function or class names via strings. The metavariable X above is then bound to the content of the string without the quotes, so it can be used another time to match a class name.
NEW: You can use "=~/.../" in a pattern to say you want regexp matching (using the Perl regexp syntax), as in:
$ sgrep -e 'foo("=~/^cst/")' *.php
The principle of sgrep is to take a pattern and match it over a source file. By using metavariables we get a more flexible pattern that can accommodate more source files. In the same way, even if the pattern contains extra spaces between tokens, or if an expression is split over multiple lines, it will still match source files using a different indentation style because sgrep works at the AST level.
Here are a few other tricks done by sgrep, called isomorphisms, which allow the pattern to accommodate more source files:
NEW: People abuse assignments in PHP to mimic keyword argument passing as in Smalltalk. sgrep can handle such equivalence/isomorphism:
$ sgrep -e 'foo(true)' *.php
will match foo(true) as well as foo($x=true).
In XHP, attributes can be given in any order but we actually don't care about that order. When we write a pattern like:
<x:frag border="1" foo="2"></x:frag>
we want it to match even code like:
<x:frag foo="2" border="1"></x:frag>
Actually we also want it by default to match code like:
<x:frag foo="2" bar="3" border="1"></x:frag>
or code like:
<x:frag foo="2" border="1" foobar="3">this is a body</x:frag>
To accommodate those needs, the sgrep code matching engine has hardcoded a few equivalences (isomorphisms) regarding XHP.
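The XHP equivalences shown in the examples above behave like a subset test on attributes. The Python sketch below is a toy model under that assumption; it encodes an element as a hypothetical (tag, attributes, body) triple and ignores the body entirely, which is consistent with the last example but may not cover every case sgrep handles.

```python
def xhp_matches(pattern, node):
    """Toy check: same tag, and every pattern attribute present in the node."""
    p_tag, p_attrs, _p_body = pattern
    n_tag, n_attrs, _n_body = node
    if p_tag != n_tag:
        return False
    # Attribute order is irrelevant and extra attributes in the code are allowed.
    return all(key in n_attrs and n_attrs[key] == value
               for key, value in p_attrs.items())


# <x:frag border="1" foo="2"></x:frag>
xhp_pattern = ("x:frag", {"border": "1", "foo": "2"}, "")

# The three code examples from the text, in the same toy encoding.
reordered = ("x:frag", {"foo": "2", "border": "1"}, "")
extra_attr = ("x:frag", {"foo": "2", "bar": "3", "border": "1"}, "")
with_body = ("x:frag", {"foo": "2", "border": "1", "foobar": "3"}, "this is a body")

# A node that should not match: border has a different value.
wrong_value = ("x:frag", {"foo": "2", "border": "0"}, "")

for candidate in (reordered, extra_attr, with_body, wrong_value):
    print(xhp_matches(xhp_pattern, candidate))
```

The three elements taken from the text are accepted, and the last one is rejected because its border attribute does not have the value required by the pattern.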
You can write any expression as a pattern e.g.:
$ sgrep -e '1+X' *.php
$ sgrep -e 'foo(bar(foobar(X, ...), ..., 2, "large", $X, $Y))' *.php
Here are examples that find bugs:
$ sgrep -e 'strstr(...) == false'
$ sgrep -e 'fbt($X)'
$ sgrep -e 'fbt(X . $Y)'
$ sgrep -e 'fbt($X . Y)'
See also https://github.com/facebook/pfff/blob/master/lang_php/matcher/unit_matcher_php.ml for some unit tests showing the capabilities of sgrep.
sgrep is significantly slower than grep because it works on a more complex structure than a stream of characters: the abstract syntax tree. Nevertheless you can combine it with git grep piped to xargs to speed things up:
$ git grep -l foo |xargs sgrep -e 'foo(X, "large", ...)'
For Emacs support, look at pfff/editor/emacs/sgrep.el.
If the syntactical grep notation is not expressive enough for your search needs, you can try to express your match by using the internal pfff API that works on the ASTs of the source code.
One difficulty is to find which OCaml constructor corresponds to which PHP construct (which was one of the main motivations behind sgrep). To alleviate the problem, the pfff command line tool has a flag, -dump_php, that outputs on stdout the internal representation of a program. This output can then be copy-pasted directly into a .ml file; it is a valid OCaml pattern. Here is an example:
$ pfff -dump_php demos/foo.php
[FuncDef( {f_tok: i_2; f_ref: None; f_name: Name(("foo", i_3)); f_params: (i_4, [Left( ...
See https://github.com/facebook/pfff/blob/master/demos/simple_code_search.ml for a full example.
See also https://github.com/facebook/pfff/blob/master/lang_php/analyze/foundation/include_require_php.ml for complex code patterns copy-pasted from pfff -dump_php.
Use an expression metavariable, as in:
$ sgrep -e 'X->addPreparable(...)'
You cannot find method calls with sgrep -e 'addPreparable(...)' because this will be parsed as a function call, not a method call. To match methods you need to use the -> syntax, and so you need to put something on the left of the arrow.
Because sgrep works at the abstract syntax tree level, where a function call is considered something different from an object instantiation.
Allow more complex patterns; allow matching over statements, functions, or classes, not just expressions.