src support lz4, refactor processing linear syslogs
support parsing lz4 compressed files `.lz4`

refactor processing sequential-only syslogs (compressed logs that can only
be read linearly; no binary searching)
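Here is a plain Rust sketch, using only the standard library, of why sequential-only logs cannot be binary-searched. It illustrates the idea under stated assumptions and is not s4's implementation. The helper names `filter_linear` and `first_at_or_after` are hypothetical. Each log line is assumed to start with a fixed-width, zero-padded timestamp such as `20240507T120000`, so string order matches time order. A decompressing reader only implements `Read`, so every earlier line must be decoded. A seekable or in-memory source can jump straight to the midpoint instead.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical: keep lines whose timestamp prefix is >= `after`.
/// Works on any `Read`, such as a decompressed `.gz`, `.lz4`, or `.xz`
/// stream, but must visit every line in order because it cannot seek.
fn filter_linear<R: Read>(reader: R, after: &str) -> io::Result<Vec<String>> {
    let mut kept = Vec::new();
    for line in BufReader::new(reader).lines() {
        let line = line?;
        // `get` returns None instead of panicking on short or non-ASCII lines.
        if let Some(prefix) = line.get(..after.len()) {
            if prefix >= after {
                kept.push(line);
            }
        }
    }
    Ok(kept)
}

/// Hypothetical: with random access to lines sorted by timestamp, a binary
/// search finds the first line at or after `after` in O(log n) comparisons
/// instead of the O(n) scan above.
fn first_at_or_after(lines: &[&str], after: &str) -> usize {
    lines.partition_point(|l| l.get(..after.len()).map_or(true, |p| p < after))
}
```

The linear version gives the same answer for a compressed stream. It just cannot skip the prefix of the file that precedes the requested datetime.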

add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those

add tests in compare-current-and-expected

add lz4 flamegraph in flamegraphs.sh

Fix Issue #201: some tests had unknown panics when the blockreader processed
gzip files.

README.md mention lz4 and update comparison table

Issue PSeitz/lz4_flex#159
Issue #201
Issue #128
Issue #291
Issue #283
jtmoon79 committed May 7, 2024
1 parent 56b3552 commit c017bc2
Showing 39 changed files with 1,259 additions and 217 deletions.
46 changes: 40 additions & 6 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -54,6 +54,7 @@ kinded = "0.3.0"
lazy_static = "1.4.0"
lru = "0.12.3"
lzma-rs = "0.3.0"
lz4_flex = "0.11"
memoffset = "0.9.1"
mime_guess = "2.0.4"
min-max = "0.1.8"
18 changes: 10 additions & 8 deletions README.md
@@ -31,7 +31,7 @@ formats, including multi-line log messages.
It also parses binary accounting records acct, lastlog, and utmp
(`acct`, `pacct`, `lastlog`, `utmp`, `utmpx`, `wtmp`),
systemd journal logs (`.journal`), and Microsoft Event Logs (`.evtx`).
-`s4` can read compressed logs (`.gz`, `.xz`), or archived logs (`.tar`).<sup><a href="#f3">\[3\]</a></sup>
+`s4` can read compressed logs (`.gz`, `.lz4`, `.xz`), or archived logs (`.tar`).<sup><a href="#f3">\[3\]</a></sup>

The first goal of `s4` is speedy searching and printing.

@@ -466,11 +466,13 @@ See the real-world example rationale in the section below,

### Hacks

- Entire `.lz4` files are read once before processing ([Issue #293])
- Entire `.xz` files are read into memory before printing ([Issue #12])
- Entire `.evtx` files are read into memory before printing ([Issue #86])
- Entire [user accounting record files are read into memory] before printing

[user accounting record files are read into memory]: https://docs.rs/super_speedy_syslog_searcher/0.6.70/s4lib/readers/fixedstructreader/struct.FixedStructReader.html#summary-of-operation
[Issue #293]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/293
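The README does not say why `.lz4` files get a full pre-read. One plausible motivation, offered here only as an assumption, is that the content-size field in an LZ4 frame header is optional. If it is absent, the uncompressed length may only be known after the whole stream has been decoded. Under that assumption, a single counting pass over any `Read` could look like the sketch below. The name `uncompressed_len` is hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical: read a sequential-only stream to the end once, discarding
/// the bytes, and return how many bytes it produced.
fn uncompressed_len<R: Read>(mut reader: R) -> io::Result<u64> {
    // `io::copy` loops until EOF and returns the total number of bytes copied.
    io::copy(&mut reader, &mut io::sink())
}
```

This sketch consumes the reader. Any later processing would therefore need to reopen and re-decode the file, which is the cost this hack accepts.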

<br/>

@@ -550,13 +552,13 @@ XXX: I could not get `logdissect.py` to work for any "parser" for any standard R

#### Archive Formats Supported

-|Program |`.gz` |`.bz`/`.bz2` |`.xz` |`.tar`|`.zip`|
-|- |- |- |- |- |- |
-|`grep \| sort` |`zgrep`|`bzip2` |`xz` |||
-|`s4` ||[](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/40)|||[](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/39)|
-|`logmerger` ||||||
-|`tl` ||||||
-|`logdissect.py`||||||
+|Program |`.gz` |`.lz4` |`.bz`/`.bz2` |`.xz` |`.tar`|`.zip`|
+|- |- |- |- |- |- |- |
+|`grep \| sort` |`zgrep`|`lz4`|`bzip2` |`xz` |||
+|`s4` |||[](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/40)|||[](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/39)|
+|`logmerger` ||| ||||
+|`tl` |||||||
+|`logdissect.py`||| ||||

---

Binary file added logs/Debian11/aarch64_ARM64/wtmp.lz4
Binary file modified logs/other/tests/dtf2-2.log.lz
Binary file added logs/other/tests/dtf2-2.log.lz4
Binary file added logs/other/tests/dtf2-2.log.tar.lz4
Binary file modified logs/other/tests/dtf2-2.lz
Binary file added logs/other/tests/dtf2-2.lz4
Binary file modified logs/other/tests/dtf2-2.tar.lz
Binary file added logs/other/tests/dtf2-2.tar.lz4
Binary file added logs/other/tests/gen-1000-3-foobar.log.lz4
Binary file modified logs/programs/utmp/host-entry6.wtmp.lz
Binary file added logs/programs/utmp/host-entry6.wtmp.lz4
8 changes: 8 additions & 0 deletions src/bin/s4.rs
@@ -404,6 +404,11 @@ const CLI_OPT_PREPEND_FMT: &str = "%Y%m%dT%H%M%S%.3f%z";
/// `--help` _afterword_ message.
const CLI_HELP_AFTER: &str = concatcp!(
"\
Given a file path, the file will be processed based on a best guess of the file
name. If the format is not guessed then it will be parsed as a UTF8 text file.
If a file path is a directory then file names that have well known non-log file
extensions will be skipped.
DateTime Filters may be strftime specifier patterns:
\"",
CLI_FILTER_PATTERNS[0].0,
@@ -487,6 +492,9 @@ DateTimes supported are only of the Gregorian calendar.
DateTimes supported language is English.
Further background and examples are at the project website:
https://github.com/jtmoon79/super-speedy-syslog-searcher/
Is s4 failing to parse a log file? Report an Issue at
https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/new/choose
"#
11 changes: 11 additions & 0 deletions src/common.rs
@@ -573,6 +573,8 @@ pub enum FileTypeArchive {
///
/// Presumed to contain one regular file; see Issue #8
Gz,
/// a compressed LZ4 file, e.g. `log.lz4`
Lz4,
/// a file within a `.tar` archive file
Tar,
/// a compressed "xz'd" file, e.g. `log.xz`
@@ -589,6 +591,7 @@ impl std::fmt::Display for FileTypeArchive {
match self {
FileTypeArchive::Normal => write!(f, "Normal"),
FileTypeArchive::Gz => write!(f, "gzip"),
FileTypeArchive::Lz4 => write!(f, "lz4"),
FileTypeArchive::Tar => write!(f, "tar"),
FileTypeArchive::Xz => write!(f, "xz"),
}
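The new `Lz4` variant and its `Display` arm can be exercised on their own. The sketch below also reflects the CLI help's rule that a file is processed "based on a best guess of the file name". It is a standalone mirror of only the five variants visible in this diff. `guess_from_name` is a hypothetical helper, not s4's real detection code. It looks only at the last extension, so in this sketch `dtf2-2.log.tar.lz4` maps to `Lz4`.

```rust
use std::fmt;
use std::path::Path;

/// Standalone mirror of the `FileTypeArchive` variants shown in this diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FileTypeArchive {
    Normal,
    Gz,
    Lz4,
    Tar,
    Xz,
}

impl fmt::Display for FileTypeArchive {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FileTypeArchive::Normal => write!(f, "Normal"),
            FileTypeArchive::Gz => write!(f, "gzip"),
            FileTypeArchive::Lz4 => write!(f, "lz4"),
            FileTypeArchive::Tar => write!(f, "tar"),
            FileTypeArchive::Xz => write!(f, "xz"),
        }
    }
}

/// Hypothetical: guess the archive type from the file name's last extension,
/// case-insensitively; anything unrecognized is treated as a plain file.
fn guess_from_name(name: &str) -> FileTypeArchive {
    let ext = Path::new(name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("gz") => FileTypeArchive::Gz,
        Some("lz4") => FileTypeArchive::Lz4,
        Some("tar") => FileTypeArchive::Tar,
        Some("xz") => FileTypeArchive::Xz,
        _ => FileTypeArchive::Normal,
    }
}
```

For example, `guess_from_name("logs/other/tests/dtf2-2.log.lz4").to_string()` yields `"lz4"`.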
@@ -728,18 +731,22 @@ impl FileType {
match self {
FileType::Evtx{ archival_type: FileTypeArchive::Normal } => false,
FileType::Evtx{ archival_type: FileTypeArchive::Gz } => true,
FileType::Evtx{ archival_type: FileTypeArchive::Lz4 } => true,
FileType::Evtx{ archival_type: FileTypeArchive::Tar } => false,
FileType::Evtx{ archival_type: FileTypeArchive::Xz } => true,
FileType::FixedStruct{ archival_type: FileTypeArchive::Normal, .. } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Gz, .. } => true,
FileType::FixedStruct{ archival_type: FileTypeArchive::Lz4, .. } => true,
FileType::FixedStruct{ archival_type: FileTypeArchive::Tar, .. } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Xz, .. } => true,
FileType::Journal{ archival_type: FileTypeArchive::Normal } => false,
FileType::Journal{ archival_type: FileTypeArchive::Gz } => true,
FileType::Journal{ archival_type: FileTypeArchive::Lz4 } => true,
FileType::Journal{ archival_type: FileTypeArchive::Tar } => false,
FileType::Journal{ archival_type: FileTypeArchive::Xz } => true,
FileType::Text{ archival_type: FileTypeArchive::Normal, .. } => false,
FileType::Text{ archival_type: FileTypeArchive::Gz, .. } => true,
FileType::Text{ archival_type: FileTypeArchive::Lz4, .. } => true,
FileType::Text{ archival_type: FileTypeArchive::Tar, .. } => false,
FileType::Text{ archival_type: FileTypeArchive::Xz, .. } => true,
FileType::Unparsable => false,
@@ -751,18 +758,22 @@ impl FileType {
match self {
FileType::Evtx{ archival_type: FileTypeArchive::Normal } => false,
FileType::Evtx{ archival_type: FileTypeArchive::Gz } => false,
FileType::Evtx{ archival_type: FileTypeArchive::Lz4 } => false,
FileType::Evtx{ archival_type: FileTypeArchive::Tar } => true,
FileType::Evtx{ archival_type: FileTypeArchive::Xz } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Normal, .. } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Gz, .. } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Lz4, .. } => false,
FileType::FixedStruct{ archival_type: FileTypeArchive::Tar, .. } => true,
FileType::FixedStruct{ archival_type: FileTypeArchive::Xz, .. } => false,
FileType::Journal{ archival_type: FileTypeArchive::Normal } => false,
FileType::Journal{ archival_type: FileTypeArchive::Gz } => false,
FileType::Journal{ archival_type: FileTypeArchive::Lz4 } => false,
FileType::Journal{ archival_type: FileTypeArchive::Tar } => true,
FileType::Journal{ archival_type: FileTypeArchive::Xz } => false,
FileType::Text{ archival_type: FileTypeArchive::Normal, ..} => false,
FileType::Text{ archival_type: FileTypeArchive::Gz, .. } => false,
FileType::Text{ archival_type: FileTypeArchive::Lz4, .. } => false,
FileType::Text{ archival_type: FileTypeArchive::Tar, .. } => true,
FileType::Text{ archival_type: FileTypeArchive::Xz, .. } => false,
FileType::Unparsable => false,
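The two truth tables above sort archive kinds into two groups: single compressed streams (`Gz`, `Lz4`, `Xz`) and container archives (`Tar`). A condensed sketch of the same classification follows. It uses a simplified stand-in for `FileType` that drops the extra fields (`fixedstruct_type`, `encoding_type`). The method names `is_compressed` and `is_archived` are hypothetical, because the real function names fall outside the shown hunks.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FileTypeArchive { Normal, Gz, Lz4, Tar, Xz }

/// Simplified stand-in for s4's `FileType`; extra fields are omitted.
#[derive(Clone, Copy, Debug)]
enum FileType {
    Evtx { archival_type: FileTypeArchive },
    FixedStruct { archival_type: FileTypeArchive },
    Journal { archival_type: FileTypeArchive },
    Text { archival_type: FileTypeArchive },
    Unparsable,
}

impl FileType {
    /// The archive kind, if this file type carries one.
    fn archival_type(&self) -> Option<FileTypeArchive> {
        match *self {
            FileType::Evtx { archival_type }
            | FileType::FixedStruct { archival_type }
            | FileType::Journal { archival_type }
            | FileType::Text { archival_type } => Some(archival_type),
            FileType::Unparsable => None,
        }
    }

    /// Hypothetical name: true for single-stream compression (gz, lz4, xz).
    fn is_compressed(&self) -> bool {
        matches!(
            self.archival_type(),
            Some(FileTypeArchive::Gz | FileTypeArchive::Lz4 | FileTypeArchive::Xz)
        )
    }

    /// Hypothetical name: true for files inside a `.tar` archive.
    fn is_archived(&self) -> bool {
        matches!(self.archival_type(), Some(FileTypeArchive::Tar))
    }
}
```

The diff spells out every variant/archive pair. Assuming the real `match` has no wildcard arm, that makes the compiler report each non-exhaustive table when a variant such as `Lz4` is added. The shorter `matches!` form trades that safety away: it would silently classify a future variant as neither compressed nor archived.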
2 changes: 2 additions & 0 deletions src/printer/summary.rs
@@ -1285,8 +1285,10 @@ fn print_summary_opt_processed_summaryblockreader(
);
}
FileType::FixedStruct{ archival_type: FileTypeArchive::Gz, fixedstruct_type: _ }
| FileType::FixedStruct{ archival_type: FileTypeArchive::Lz4, fixedstruct_type: _ }
| FileType::FixedStruct{ archival_type: FileTypeArchive::Xz, fixedstruct_type: _ }
| FileType::Text{ archival_type: FileTypeArchive::Gz, encoding_type: _ }
| FileType::Text{ archival_type: FileTypeArchive::Lz4, encoding_type: _ }
| FileType::Text{ archival_type: FileTypeArchive::Xz, encoding_type: _ }
=> {
eprintln!(