-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regexes appear to be significantly slower than re2 in some cases #16989
Comments
Indeed! RE2's C++ implementation is extremely sophisticated. It has a DFA for very quickly finding matches and the location of an overall match. A full NFA simulation is used when you need submatches (and this is made more performant, I think, by running a DFA first to limit the range of text searched by the NFA). This includes doing wonderfully crazy things like reversing the regex itself. Also, I believe RE2/C++ uses parallelism over DFA states, but I'm not entirely sure how that works. The Rust implementation uses a full NFA simulation. Full stop. The With that said, the main NFA simulation (not the native code specialization) should be roughly on par with Go's RE2 for Go and C++ were written by the same person (Russ Cox). The sophistication of Rust's So yes, Rust's regexes are going to be slower than RE2/C++'s regexes until its implementation becomes more sophisticated. (I've always tried to be very careful to say "RE2/Go" or "RE2/C++", but I'm sure that got lost in the kerfuffle. RE2 is more readily identified with the C++ library, and I don't think I've ever expected my implementation to be on par with it. RE2/Go? Yeah. On par, or close to it. When you add native regexes to the mix, Rust will outperform Go's implementation I think.) |
Thanks for the explanation, @BurntSushi. |
IIRC those regexes are pretty simple. Maybe we can add some hacks to make them (and only them) fast? |
@BurntSushi do you have a sense of the scope of a project to add the advanced optimizations from RE2/C++? For example, could a student do it in a month? |
@brson I think it's possible. Another idea might be to start them with the "one pass" NFA optimization. Russ Cox briefly describes it here: http://swtch.com/~rsc/regexp/regexp3.html (Do a CTRL-F for Also, in general, Russ's regexp articles are required reading: http://swtch.com/~rsc/regexp/ --- Rust's regex implementation is based on them. |
I meant to move this to the regex crate :( |
I'm pulling a massive triage effort to get us ready for 1.0. As part of this, I'm moving stuff that's wishlist-like to the RFCs repo, as that's where major new things should get discussed/prioritized. This issue has been moved to the RFCs repo: rust-lang/regex#24 |
…m-associated-types, r=Veykril fix: ensure implied bounds from associated types are considered in autocomplete closes: rust-lang#16989 rust-analyzer needs to consider implied bounds from associated types in order to get all methods suggestions people expect. A pretty easy way to do that is to keep the `candidate_trait_id`'s receiver if it matches `TyFingerprint::Unnameable`. When benchmarking this change, I didn't notice a meaningful difference in autocomplete latency. (`TyFingerprint::Unnameable` corresponds to `TyKind::AssociatedType`, `TyKind::OpaqueType`, `TyKind::FnDef`, `TyKind::Closure`, `TyKind::Coroutine`, and `TyKind::CoroutineWitness`.)
My understanding is that we believe them to be competitive, but two benchmarks I've seen were not really in the ballpark. The easiest one to test is the shootout's regexdna, which you can see from the upstream shootout is drastically slower than the re2-based C++ implementation.
cc @BurntSushi
The text was updated successfully, but these errors were encountered: