multiline search for simple cases #360

timotheecour · 2017-02-12T07:11:47Z

Even though ripgrep is line-oriented and full multiline search is sadly out of the question (after reading #176), supporting simple cases would be very useful, e.g.:

Full description

--multiline=[unordered|ordered] --window_size=N [--pattern $regex_i ...]

This would return a list of matches file:line such that:

With --multiline=unordered:

file:line contains one of the K requested patterns, and
the K-1 other patterns can all be found in the same file at lines [line...line+N)

With --multiline=ordered:

file:line contains the 1st of the K requested patterns, and
the K-1 other patterns can be found in the same file at lines [line...line+N), in the order in which they are given on the command line

--window_size=N is the number of lines, starting at the matching line, in which all K patterns must be found. When unspecified, N defaults to infinity (i.e. all patterns must occur in the file between that line and the end of the file).

When only a single pattern is given, the behavior is the same as standard rg.

Examples

foo.d:

// first line
auto getFoo(string[] bar){
  // some comment
  return bar.sort.uniq;
}

$ rg --multiline=ordered --window_size=3 --pattern 'string\[\]' --pattern 'sort\b'
foo.d:2:auto getFoo(string[] bar){
foo.d:4%return bar.sort.uniq;

$ rg --multiline=ordered --window_size=2 --pattern 'string\[\]' --pattern 'sort\b'
# no match: `foo.d:4` is beyond `foo.d:[2...2+window_size)`

$ rg --multiline=ordered --window_size=3 --pattern 'sort\b' --pattern 'string\[\]'
# no match: the first pattern matches at `foo.d:4`, and the 2nd pattern can't be found in `foo.d:[4...4+3)`

$ rg --multiline=unordered --window_size=3 --pattern 'sort\b' --pattern 'string\[\]'
# now there's a match, since `unordered` was given
foo.d:2:auto getFoo(string[] bar){
foo.d:4%return bar.sort.uniq;
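To make the proposed semantics concrete, here is a minimal Python sketch of both modes as described above. It is only an illustration under stated assumptions: `window_search` and `first_hit` are hypothetical names (ripgrep has no such modes or flags), `window=None` stands for the unspecified, infinite window, and in ordered mode a later pattern is assumed to be allowed on the same line as the previous one. The regexes escape the brackets as `string\[\]`, since a bare `[]` would start a character class.

```python
import re
from typing import List, Optional


def first_hit(regex: "re.Pattern[str]", lines: List[str], lo: int, hi: int) -> Optional[int]:
    """Return the index of the first line in lines[lo:hi] that regex matches, or None."""
    for i in range(lo, hi):
        if regex.search(lines[i]):
            return i
    return None


def window_search(lines: List[str], patterns: List[str], ordered: bool,
                  window: Optional[int] = None) -> List[List[int]]:
    """Hypothetical sketch of the proposed --multiline=ordered|unordered modes.

    Returns, for each qualifying starting line, the sorted 1-based line
    numbers where the patterns were found. window=None means "to end of file".
    """
    regexes = [re.compile(p) for p in patterns]
    results = []
    for start in range(len(lines)):
        # The window covers lines [start, start + window), clipped to the file.
        end = len(lines) if window is None else min(len(lines), start + window)
        if ordered:
            # The first pattern must match the starting line itself.
            if not regexes[0].search(lines[start]):
                continue
            hits, pos = [start], start
            # Assumption: the next pattern may match on the same line as the previous one.
            for regex in regexes[1:]:
                pos = first_hit(regex, lines, pos, end)
                if pos is None:
                    break
                hits.append(pos)
            else:
                results.append(sorted({h + 1 for h in hits}))
        else:
            # Any pattern may match the starting line; all must occur in the window.
            if not any(r.search(lines[start]) for r in regexes):
                continue
            hits = [first_hit(r, lines, start, end) for r in regexes]
            if None not in hits:
                results.append(sorted({h + 1 for h in hits}))
    return results
```

With the five lines of foo.d as a Python list, the four example invocations correspond to `window_search` returning `[[2, 4]]`, `[]`, `[]` and `[[2, 4]]`, i.e. lines 2 and 4 exactly where the examples report a match. Greedily taking the earliest hit for each successive pattern is enough for the ordered mode, because an earlier hit never rules out a later pattern that a later hit would allow.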


BurntSushi · 2017-02-12T15:28:30Z

I don't understand the feature request. I don't know what window_size means and I don't know what --multiline does and I don't know the difference between unordered and ordered. It sounds like you have a lot of context in your head that you need to put into writing as a specification.

Tip: consider reusing the -A/-B/-C flags.

BurntSushi · 2017-02-12T15:31:30Z

FWIW, I am personally sympathetic to the idea of running regexes against the context of a match (but where each individual regex can still only match in a single line). That is certainly feasible to do in a way that true multiline search is not. However, it is still a significant addition in terms of implementation.

Please also keep in mind that I am very against a complicated UX. A complicated UX is one with lots of knobs that only implementors understand.
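The idea of running regexes against the context of a match, reusing -A/-B/-C, can be sketched the same way: a line matching a primary regex is kept only if each secondary regex matches some line in its before/after context, while every regex is still applied to a single line at a time. This is a hypothetical Python sketch (`context_matches` is not a ripgrep API); it assumes the matching line itself is excluded from its own context, and `-C N` would correspond to `before=after=N`.

```python
import re
from typing import List


def context_matches(lines: List[str], primary: str, secondaries: List[str],
                    before: int = 0, after: int = 0) -> List[int]:
    """Hypothetical sketch: 1-based numbers of lines matching `primary` whose
    context (`before` lines above, `after` lines below, like -B/-A) contains
    a line matching each regex in `secondaries`."""
    main = re.compile(primary)
    others = [re.compile(s) for s in secondaries]
    found = []
    for i, text in enumerate(lines):
        if not main.search(text):
            continue
        # Context lines only; the matching line itself is excluded (an assumption).
        context = lines[max(0, i - before):i] + lines[i + 1:i + 1 + after]
        # Every regex is still applied to one line at a time.
        if all(any(r.search(c) for c in context) for r in others):
            found.append(i + 1)
    return found
```

On the foo.d sample, `context_matches(lines, r"sort\b", [r"string\[\]"], before=2)` keeps line 4, while `before=1` keeps nothing, because line 2 falls outside the context.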

timotheecour · 2017-02-13T00:51:25Z

@BurntSushi I've rewritten the proposal; let me know what you think!
This would be super useful as a generic advanced code search tool, among other things. Happy to discuss implementation details if needed.

Regarding "each individual regex can still only match in a single line": yes, that's the case.

To implement this (efficiently) with an external tool built on top of (lib)ripgrep, here's what I currently do to emulate a multiline search:

# proposed:
rg --multiline=unordered --window_size=4 --pattern 'green' --pattern '\?old'
# current emulation:
rg --column --no-heading -A 4 '((green)|(\?old))' --replace '@P1{{$2}P2{$3}}@'

and then parse the output looking for @P1{{...}}@, deducing which pattern matched from the capture-group expansions. That's of course no fun and super brittle.
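The parsing step of that workaround might look like the following minimal Python sketch. It assumes rg prints match lines as `path:line:column:text` (as with `--no-heading --column`) and context lines with `-` separators, and that each matched text was replaced by the `@P1{{$2}P2{$3}}@` marker; `parse_hits` and the sample output in the usage note are hypothetical. It also shows why the approach is brittle: a path containing something like `:12:3:`, or matched text containing `}`, would confuse it.

```python
import re
from typing import List, Tuple

# With --no-heading --column, rg prints match lines as path:line:column:text;
# context lines (from -A) use '-' separators and are skipped here.
MATCH_LINE = re.compile(r"^(?P<path>.*?):(?P<line>\d+):(?P<col>\d+):(?P<text>.*)$")
# Marker left by --replace '@P1{{$2}P2{$3}}@': group 2 is green, group 3 is ?old.
MARKER = re.compile(r"@P1\{\{(?P<p0>.*?)\}P2\{(?P<p1>.*?)\}\}@")


def parse_hits(output: str) -> List[Tuple[str, int, int]]:
    """Return (path, line, pattern_index) for every marker in rg's match lines."""
    hits = []
    for raw in output.splitlines():
        m = MATCH_LINE.match(raw)
        if m is None:
            continue  # context line or '--' group separator
        for marker in MARKER.finditer(m.group("text")):
            # A non-empty first slot means 'green' matched; otherwise '?old' did.
            index = 0 if marker.group("p0") else 1
            hits.append((m.group("path"), int(m.group("line")), index))
    return hits
```

For a hypothetical output with the match lines `notes.txt:3:1:@P1{{green}P2{}}@ tea` and `notes.txt:5:5:the @P1{{}P2{?old}}@ one` around a context line, this yields `("notes.txt", 3, 0)` and `("notes.txt", 5, 1)`; the window logic then still has to run on top of these tuples.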

Could we, for starters, have a machine-readable output (json output option to allow scripting #359) that reports each match without information loss for a disjunctive regex of the form (pattern1|pattern2|...), e.g.:
file:offset:length:regex_index
where regex_index indicates which pattern matched (e.g. 0 for green, 1 for ?old in this example) and length is the size of the match in bytes?
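For illustration, records in the requested `file:offset:length:regex_index` shape can be produced outside ripgrep by wrapping each pattern in its own named group and checking which group took part in the match. This is a minimal Python sketch using the standard `re` module, not ripgrep's regex engine, and says nothing about ripgrep's performance; `match_records` is a hypothetical name, the patterns are assumed not to define groups named `p0`, `p1`, ..., offsets and lengths are in bytes because the input is searched as `bytes`, and when two patterns could match at the same position, `re` prefers the one listed first.

```python
import re
from typing import Iterator, List


def match_records(path: str, data: bytes, patterns: List[bytes]) -> Iterator[str]:
    """Yield hypothetical file:offset:length:regex_index records for (p0|p1|...)."""
    # Wrap pattern i in a named group p<i> so we can tell which alternative matched.
    alternation = b"|".join(b"(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns))
    regex = re.compile(alternation)
    for m in regex.finditer(data):
        # Exactly one wrapper group participates in each match of the alternation.
        index = next(i for i in range(len(patterns)) if m.group(f"p{i}") is not None)
        # Offsets and lengths are byte counts because data is bytes.
        yield f"{path}:{m.start()}:{m.end() - m.start()}:{index}"
```

For example, searching `b"green apple\nthe ?old one\n"` with the patterns `green` and `\?old` yields `notes.txt:0:5:0` and `notes.txt:16:4:1` when `path` is `"notes.txt"`.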

BurntSushi · 2017-02-18T17:49:06Z

Parseable output is tracked in #244. Having that return the index of the regex that matched is not going to happen, since, in the current implementation, that would have a big performance hit.

Overall, I find the feature proposed here to be way too complicated. It doesn't reuse existing flags (like -A/-B/-C) to search contexts and instead invents a new "window size" concept. The --multiline=ordered|unordered modes don't really make sense to me and seem very unintuitive.

Your needs would be better met by writing code.

BurntSushi added the "question" label (an issue that is lacking clarity on one or more points) on Feb 12, 2017

timotheecour mentioned this issue on Feb 17, 2017 in #346, Feature idea: "context matches" (closed)

BurntSushi closed this as completed Feb 18, 2017

BurntSushi mentioned this issue on Mar 17, 2017 in #176, support searching across multiple lines (closed)

This was referenced on Feb 14, 2018 in:

#795, --line-number-width doesn't work well with --no-heading (closed)

#802, fix issue #359 --machine-readable (closed)
