multiline search for simple cases #360

Closed
timotheecour opened this issue Feb 12, 2017 · 4 comments
Labels
question An issue that is lacking clarity on one or more points.

Comments

@timotheecour

timotheecour commented Feb 12, 2017

Even though ripgrep is line-oriented and full multiline search is sadly out of the question (after reading #176), supporting simple cases would be very useful, e.g.:

full description

  • --multiline=[unordered|ordered] --window_size=N [--pattern $regex_i ...]

will return a list of matches file:line such that:

with --multiline=unordered:

  • file:line contains one of the K requested patterns
  • the K-1 other patterns can be found in file at lines [line...line+N)

with --multiline=ordered:

  • file:line contains the 1st of the K requested patterns
  • the K-1 other patterns can be found in file at lines [line...line+N) in the order in which they are given on command line

--window_size=N is the number of lines, starting at line, in which all K patterns must be found. When unspecified, N defaults to infinity (i.e. all patterns must occur in file between line and the end of the file).

When only a single pattern is given, the behavior is the same as standard rg.

Examples

foo.d:

// first line
auto getFoo(string[] bar){
  // some comment
  return bar.sort.uniq;
}

$ rg --multiline=ordered --window_size=3 --pattern 'string\[\]' --pattern 'sort\b'
foo.d:2:auto getFoo(string[] bar){
foo.d:4%return bar.sort.uniq;

$ rg --multiline=ordered --window_size=2 --pattern 'string\[\]' --pattern 'sort\b'
# no match, `foo.d:4` is beyond `foo.d:[2...2+window_size)`

$ rg --multiline=ordered --window_size=3 --pattern 'sort\b' --pattern 'string\[\]'
# no match, the first pattern matches at `foo.d:4` and we can't find the 2nd pattern in `foo.d:[4...4+3)`

$ rg --multiline=unordered --window_size=3 --pattern 'sort\b' --pattern 'string\[\]'
# now there's a match since we provided `unordered`
foo.d:2:auto getFoo(string[] bar){
foo.d:4%return bar.sort.uniq;
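
For readers trying to pin down the proposed semantics, here is a minimal Python sketch of the ordered and unordered window rules as described above. It is not ripgrep code. The function names and the `FOO_D` sample are made up for illustration, and two points the proposal leaves open are assumed: several patterns may match on the same line, and "in order" means each pattern's line is at or after the previous pattern's line. The brackets in the pattern are escaped so it compiles as a regex.

```python
import re

# Hypothetical sample mirroring foo.d from the examples.
FOO_D = [
    "// first line",
    "auto getFoo(string[] bar){",
    "  // some comment",
    "  return bar.sort.uniq;",
    "}",
]


def _ordered(regexes, window):
    # The anchor line (window[0]) must contain the first pattern.
    if not regexes[0].search(window[0]):
        return False
    pos = 0
    for rx in regexes[1:]:
        # Assumption: a later pattern may match on the same line or any later one.
        pos = next((i for i in range(pos, len(window)) if rx.search(window[i])), None)
        if pos is None:
            return False
    return True


def _unordered(regexes, window):
    # The anchor line must contain at least one pattern,
    # and every pattern must match somewhere in the window.
    if not any(rx.search(window[0]) for rx in regexes):
        return False
    return all(any(rx.search(line) for line in window) for rx in regexes)


def window_matches(lines, patterns, window_size=None, ordered=True):
    """Return the 1-based line numbers that anchor a match of every pattern."""
    regexes = [re.compile(p) for p in patterns]
    anchors = []
    for start in range(len(lines)):
        # The window is [start, start + window_size); None means "to end of file".
        end = len(lines) if window_size is None else min(len(lines), start + window_size)
        window = lines[start:end]
        if not window:
            continue
        check = _ordered if ordered else _unordered
        if check(regexes, window):
            anchors.append(start + 1)
    return anchors
```

On the foo.d sample, `window_matches(FOO_D, [r"string\[\]", r"sort\b"], window_size=3)` reports line 2, and the same call with `window_size=2` reports nothing, in line with the examples.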
@BurntSushi
Owner

I don't understand the feature request. I don't know what window_size means and I don't know what --multiline does and I don't know the difference between unordered and ordered. It sounds like you have a lot of context in your head that you need to put into writing as a specification.

Tip: consider reusing the -A/-B/-C flags.

BurntSushi added the question label Feb 12, 2017
@BurntSushi
Owner

FWIW, I am personally sympathetic to the idea of running regexes against the context of a match (but where each individual regex can still only match in a single line). That is certainly feasible to do in a way that true multiline search is not. However, it is still a significant addition in terms of implementation.

Please also keep in mind that I am very against a complicated UX. A complicated UX is one with lots of knobs that only implementors understand.
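
As a rough illustration of "running regexes against the context of a match", here is a small Python sketch. It is not how ripgrep works internally. The function name, the `before`/`after` parameters (loosely modelled on -B/-A), and the choice to count the matched line itself as part of its context are all assumptions made for the example. Each regex is still only ever tested against one line at a time.

```python
import re


def match_with_context(lines, primary, secondary, before=0, after=0):
    """Return 1-based numbers of lines matching `primary` whose context
    (the line itself plus `before`/`after` neighbours) satisfies every
    regex in `secondary`."""
    prim = re.compile(primary)
    secs = [re.compile(s) for s in secondary]
    results = []
    for i, line in enumerate(lines):
        if not prim.search(line):
            continue
        # Clamp the context to the bounds of the file.
        lo = max(0, i - before)
        hi = min(len(lines), i + after + 1)
        context = lines[lo:hi]
        # Every secondary regex must match at least one single context line.
        if all(any(rx.search(c) for c in context) for rx in secs):
            results.append(i + 1)
    return results


# Hypothetical sample input, mirroring foo.d from the first comment.
sample = [
    "// first line",
    "auto getFoo(string[] bar){",
    "  // some comment",
    "  return bar.sort.uniq;",
    "}",
]
```

With this sample, `match_with_context(sample, r"string\[\]", [r"sort\b"], after=2)` keeps line 2, while `after=1` drops it because the `sort` line falls outside the context.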

@timotheecour
Author

timotheecour commented Feb 13, 2017

  • @BurntSushi I've rewritten the proposal, let me know what you think!
    This would be super useful as a generic advanced code search among other things. Happy to discuss implementation details if needed

  • "each individual regex can still only match in a single line"

Yes, that's the case.

  • To implement this (efficiently) using an external tool built on top of (lib)ripgrep, here's what I currently have to do to emulate a multiline search. The search I want:
rg --multiline=unordered --window_size=4 --pattern 'green' --pattern '\?old'
What I actually run:
rg --column --no-heading -A 4 '((green)|(\?old))' --replace '@P1{{$2}P2{$3}}@'

and then parse output looking for @P1{{...}}@, deducing each match from the regex expansions. That's of course no fun and super brittle.

  • Could we, for starters, have a machine-readable output (json output option to allow scripting #359) that reports each match with no information loss for a disjunctive regex of the form (pattern1|pattern2|...):
    file:offset:length:regex_index
    where regex_index indicates which pattern matched (e.g. 0 for green, 1 for \?old in this example) and length is the size of the corresponding match in bytes
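
To make the requested offset:length:regex_index records concrete, here is a minimal Python sketch using the standard `re` module rather than ripgrep. The function name and the `p0`, `p1`, ... group names are invented for the example, and it assumes the individual patterns do not define named groups that clash with them. When two alternatives could match at the same position, the earlier pattern wins, which is how alternation behaves in Python's engine.

```python
import re


def tagged_matches(data, patterns):
    """Return (offset, length, regex_index) for each match of any pattern.

    `data` and `patterns` are bytes, so offsets and lengths are in bytes.
    """
    # Wrap each pattern in its own named group: (?P<p0>...)|(?P<p1>...)|...
    combined = re.compile(
        b"|".join(b"(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns))
    )
    records = []
    for m in combined.finditer(data):
        # Exactly one top-level alternative took part in the match.
        index = next(
            i for i in range(len(patterns)) if m.group("p%d" % i) is not None
        )
        records.append((m.start(), m.end() - m.start(), index))
    return records


# Hypothetical input reusing the two patterns from the example above.
records = tagged_matches(b"the green door\nis it ?old\n", [rb"green", rb"\?old"])
```

Here `records` holds one tuple per match, with index 0 for `green` and 1 for `\?old`.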

@BurntSushi
Owner

Parseable output is here: #244. Having that return the index of the regex that matched is not going to happen, since, in the current implementation, that would have a big performance hit.

Overall, I find the feature proposed here to be way too complicated. It doesn't reuse existing flags (like -A/-B/-C) to search contexts and instead invents a new "window size" concept. The --multiline=ordered|unordered modes don't really make sense to me and seem very unintuitive.

Your needs would be better met by writing code.
