False mismatch for most of the file #1188

Closed
ghost opened this issue Feb 7, 2019 · 3 comments
Labels
duplicate An issue that is duplicative of another.

ghost commented Feb 7, 2019

What version of ripgrep are you using?

with cargo install ripgrep:

ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

with github release binary:

ripgrep 0.10.0 (rev 8a7db1a918)
-SIMD -AVX (compiled)

How did you install ripgrep?

  1. with cargo install ripgrep
  2. with github release binary

What operating system are you using ripgrep on?

Windows 10 Enterprise
Version: 1709
OS Build: 16299.904

Describe your question, feature request, or bug.

rg exits before matching all there is or, alternatively, thinks those matches failed. I'm running:

rg -N "^[^ ]+ (2018|2019)-" file.log >output.log

The input file size is 7735mb, the output file is 1229mb.

When I'm running:

grep "^[^ ]\+ \(2018\|2019\)-" file.log >output.log

The output file is 5948mb.

When I look into rg's result it just looks like file.log got abandoned somewhere along the way and rg just quit without completing the job. No error messages.

If this is a bug, what are the steps to reproduce the behavior?

See above, reproduces every time.

If this is a bug, what is the actual behavior?

DEBUG|grep_regex::literal|grep-regex\src\literal.rs:110: required literal found: " 201"
DEBUG|globset|globset\src\lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|grep_searcher::searcher::mmap|grep-searcher\src\searcher\mmap.rs:86: gm.log: failed to open memory map: file length overflows usize

If this is a bug, what is the expected behavior?

To match all the input.

@BurntSushi
Owner

Can you show the output with the --trace flag? It's going to be hard to debug this without the input file. The current file is probably too big to share, so if you can find a way to shrink the test case such that the problem still occurs, that would be helpful.

My guess is that your log file contains binary data (a NUL byte), which is causing ripgrep to give up. If so, then passing the -a flag to ripgrep should fix it. With that said, I'd probably expect grep to have the same behavior here, so something isn't adding up.
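One way to check the NUL-byte guess without sharing the file is to scan it for the first 0x00 byte. The following is a minimal sketch, not part of ripgrep: the function name `first_nul_offset` and the sample log lines are made up for illustration, and the file is read in chunks so a multi-gigabyte log does not need to fit in memory.

```python
import io
from typing import BinaryIO, Optional


def first_nul_offset(stream: BinaryIO, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first NUL byte in stream, or None if there is none."""
    offset = 0  # number of bytes consumed before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached end of stream without seeing a NUL byte
        idx = chunk.find(b"\x00")
        if idx != -1:
            return offset + idx
        offset += len(chunk)


# Hypothetical sample data standing in for a real log file.
sample = io.BytesIO(b"INFO 2019-02-07 ok\nINFO 2019-02-07 bad\x00byte\n")
print(first_nul_offset(sample))
```

For a real file, open it in binary mode (`open(path, "rb")`) and pass the handle in place of `sample`. A non-None result would be consistent with the binary-data explanation.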

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Feb 7, 2019
@ghost
Author

ghost commented Feb 8, 2019

@BurntSushi You are right. Completely bizarrely, our Java app produced log messages that contained NUL characters. I'm not even sure how that's technically possible, because all log messages should be java.lang.String, which is pure Unicode. -a solves the problem, thanks. But still, GNU grep doesn't seem to care about it and rg didn't give any stderr warning, so it looked fishy.
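On the "pure Unicode" point: U+0000 is itself a valid Unicode code point, so a text string can carry it, and standard UTF-8 encodes it as a single 0x00 byte. A small sketch of that, in Python rather than Java, with a made-up log message and a hypothetical helper name `contains_nul_byte`:

```python
def contains_nul_byte(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains a 0x00 byte."""
    return b"\x00" in text.encode("utf-8")


# Hypothetical log message with an embedded U+0000 character.
message = "user=alice\u0000 action=login"

print(contains_nul_byte(message))
print(contains_nul_byte("user=alice action=login"))
```

So a string being valid Unicode does not rule out NUL bytes in the written log file.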

@BurntSushi
Owner

All righty. I'm going to call this a duplicate of #306 then. See my comment in that issue for a deeper explanation of the issue. Thanks for the report!

@BurntSushi BurntSushi added duplicate An issue that is duplicative of another. and removed question An issue that is lacking clarity on one or more points. labels Feb 8, 2019