-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slower than grep: test case on 10M-line text file #617
Comments
Verified with rg 0.6.0 (-AVX -SIMD) against GNU grep 3.1. |
The reason is likely because the corpus is using random data. More analysis would be a good idea though of course. |
On second thought, this actually appears to be related to the length of the query string, which probably in turn means that ripgrep is slower because it's not actually using Boyer Moore. On short strings, the benefit of Boyer Moore isn't that great (and is indeed one reason why ripgrep is faster in a large number of common cases), but on longer strings, Boyer Moore means we can skip more of the input, which adds up over time. Here's an example that adds some indirect evidence of this:
This reproduces the OP's result: ripgrep is slower. Let's start shrinking the length of the query pattern. The gap narrows:
And shorten it some more:
This definitely smells like Boyer Moore to me. Thanks for the report! This bug is more properly fixed in the regex engine, so I've file a ticket: rust-lang/regex#408 --- I'm going to close this one. |
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
Add an implimentation of Tuned Boyer-Moore. While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: #408 See: [ripgrep-617](BurntSushi/ripgrep#617)
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
Add an implimentation of Tuned Boyer-Moore. While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: #408 See: [ripgrep-617](BurntSushi/ripgrep#617)
The regex update fixes the Rust nightly build failure by in turn updating its simd dependency to 2.x. The regex update also includes a literal optimization that uses Tuned Boyer Moore. Fixes #617
The regex crate now has an implementation of TBM, and applies to the OP's case. It will be included in the next release of ripgrep. Note though that inputs can be crafted such that TBM won't be used, and there will be an opportunity for GNU grep to be faster. There is nothing to be done here. It is a consequence of using different literal searching algorithms that are built on heuristics. |
The regex update fixes the Rust nightly build failure by in turn updating its simd dependency to 2.x. The regex update also includes a literal optimization that uses Tuned Boyer Moore. Fixes #617
On your blog you're asking to file an issue with something you can reproduce, so here we go.
Ripgrep 0.60 -AVX -SIMD is slower than GNU grep 2.27 when searching for the last line in a tmpfs-mounted 620MiB file containing 64-character lines, as shown in this log:
Subsequent executions show similar timings.
The text was updated successfully, but these errors were encountered: