-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor datetime searching and file processing to support "forward seek" mode or "random seek" mode #283
Labels
code improvement
enhancement not seen by the user
difficult
A difficult problem; a major coding effort or difficult algorithm to perfect
file parser
Comments
jtmoon79
added
code improvement
enhancement not seen by the user
difficult
A difficult problem; a major coding effort or difficult algorithm to perfect
file parser
labels
Apr 20, 2024
jtmoon79
changed the title
refactor datetime searching and general file processing to support "forward seek" mode or "random seek" mode
refactor datetime searching and file processing to support "forward seek" mode or "random seek" mode
Apr 20, 2024
This was referenced Apr 20, 2024
jtmoon79
added a commit
that referenced
this issue
May 6, 2024
jtmoon79
added a commit
that referenced
this issue
May 6, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 6, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 7, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 7, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 7, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. README.md mention lz4 and update comparison table Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 7, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. README.md mention lz4 and update comparison table Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
jtmoon79
added a commit
that referenced
this issue
May 7, 2024
support parsing lz4 compressed files `.lz4` refactor processing sequential-only syslogs (compressed logs that can only be read linearly; no binary searching) add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those add tests in compare-current-and-expected add lz4 flamegraph in flamegraphs.sh Fix Issue #201 with tests that had unknown panics for blockreader processing gzip files. README.md mention lz4 and update comparison table Issue PSeitz/lz4_flex#159 Issue #201 Issue #128 Issue #291 Issue #283
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
code improvement
enhancement not seen by the user
difficult
A difficult problem; a major coding effort or difficult algorithm to perfect
file parser
Current Behavior
This project's original design used a binary search algorithm to quickly find log messages within a user-passed datetime window.
The main benefit of this complication was to avoid unnecessary disk reads. i.e. instead of reading the entire file from disk, only a subset of Blocks were read from disk (defaulting to 64KB-sized Blocks). This supported the author's originating scenario where very large log files (GB sized) were read over a low-bandwidth high-latency network connection from an overburdened SMB share.
This binary search design affects structs
SyslogProcessor
,SyslineReader
, andBlockReader
.However...
Having random file seeks (seeks that may go backwards) does not dovetail with compressed files; compressed files must always be read from the beginning of the file up to the requested file offset. (see #12 (comment)).
This creates a problem when performing a binary search: any "jumps backwards" must either
and
Suggested behavior
Some entity should decide if a file can be read with "random seeks" or "sequential seeks".
The overlying users of
BlockReader
, (SyslogProcessor
and it'sSyslineReader
) must know the datetime search strategy to use. So some entity must decide on the "search mode" ahead of time, most likely a function infilepreprocessor.rs
.Then those entities that do the search for the datetime nearest the beginning of the user-passed datetime window, those entities would perform the search strategy that is appropriate, i.e. "sequential read mode" implies a linear search from the file beginning, "random read mode" implies a binary search.
Other
Given a sequential read mode, this is one step toward fixing the problem highlighted by Issue #12, as the file size of the uncompressed file would not need to be known. If the file size does not need to be known then this is one step toward implementing Chained Block Reads (Issue #14).
Also, reading in a "forward only" linear read mode has an incremental approach will be necessary for Issue #7.
Also, a sequential/linear read mode means Blocks can be progressively dropped as the file is processed. This would improve the problem of high memory usage for compressed files (Issue #182).
This search strategy difference is also hinted-at in various code comments;
syslogprocessor.rs#L822-L828
,syslinereader.rs#L2474-L2520
.The text was updated successfully, but these errors were encountered: