HFTrader
Contributor

There was a lot of criticism on Reddit about the Boost version being ancient, so I added an ExternalProject to compile Boost from source. We neither download nor compile all of Boost, only libs/regex and a handful of its dependencies, so download and build time are kept to a minimum.

Added a note on what to install on Ubuntu 20.04 as guidance.

CMake supports multi-line commands, so I wrapped the vendor file at 80 columns for better readability without changing the commands themselves.

I got a top-end Intel AWS C6i (Ice Lake) machine for a couple of hours to run the benchmarks and added the results at the end as well.

HFTrader and others added 21 commits January 4, 2022 10:50
Added more info about the newly added libraries.
- Fix typo in project name (RegexPeformance -> RegexPerformance)
- Fix inconsistent C++ standard (changed -std=c++11 to -std=c++20)
- Add Excel temporary files to .gitignore (.~lock.*#, *.tmp)
- Enhanced CMakeLists.txt with -mtune=native alongside existing -march=native
- Updated build_deps_simple.sh with -march=native -mtune=native for all dependencies
- Added comprehensive documentation for modern Clang 19.1.6 toolchain build process
- All 11 regex engines now built with CPU-specific optimizations
- Performance improvements observed across all regex engines
- Rust components automatically use -C target-cpu=native via Cargo
src/CMakeLists.txt:
- Fix RE2 linking with proper CMake target-based approach
- Use find_package(re2) and re2::re2 target instead of manual linking
- Resolve complex RE2/Abseil dependency issues

src/main.c:
- Add robust file loading with proper error handling and bounds checking
- Implement memory usage tracking and reporting via /proc/self/status
- Add comprehensive statistical analysis with outlier detection
- Include 95% confidence intervals and measurement stability indicators
- Add cross-engine result validation to detect discrepancies
- Implement JIT engine warmup cycles for better performance accuracy
- Enhanced CSV export with memory usage data
- Add memory vs speed analysis with trade-off calculations
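As a rough illustration of the /proc/self/status memory tracking mentioned above, here is a minimal sketch. The helper names `parse_status_line` and `read_status_kb` are hypothetical and not taken from src/main.c; the sketch assumes Linux, where fields such as `VmRSS` (current resident set) and `VmHWM` (peak resident set) are reported in kB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one line of /proc/self/status, such as
 * "VmRSS:\t    5120 kB". Returns 1 and stores the number in *kb_out
 * when the line belongs to `field`, otherwise returns 0. */
int parse_status_line(const char *line, const char *field, long *kb_out)
{
    assert(line != NULL && field != NULL && kb_out != NULL);
    size_t flen = strlen(field);
    /* The field name must be followed directly by ':' so that
     * "VmRSS" does not match a longer name with the same prefix. */
    if (strncmp(line, field, flen) != 0 || line[flen] != ':')
        return 0;
    return sscanf(line + flen + 1, "%ld", kb_out) == 1;
}

/* Hypothetical helper: returns the value of `field` in kB, or -1 when
 * the file or the field is unavailable (for example on non-Linux hosts). */
long read_status_kb(const char *field)
{
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_status_line(line, field, &kb))
            break;
    }
    fclose(fp);
    return kb;
}
```

Sampling `read_status_kb("VmRSS")` before and after an engine run would give an approximate per-engine delta. Because all engines share one process, such a delta is only a rough indicator.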

src/main.h:
- Extend result structure with memory tracking fields
- Add statistical confidence interval fields
- Include function declarations for new utility functions

These changes significantly improve the reliability, accuracy, and
analytical capabilities of the regex performance benchmarking tool.
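The commit notes do not say which outlier detection method is used. The sketch below assumes Tukey fences (1.5 times the interquartile range), which is one common choice for timing samples; the function name `mean_without_outliers` is hypothetical and the actual code in src/main.c may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort over doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quantile q in [0, 1] of an ascending array, by linear interpolation. */
static double quantile_sorted(const double *s, size_t n, double q)
{
    double pos = q * (double)(n - 1);
    size_t lo = (size_t)pos;
    size_t hi = lo + 1 < n ? lo + 1 : lo;
    return s[lo] + (pos - (double)lo) * (s[hi] - s[lo]);
}

/* Hypothetical helper: sorts `samples` in place, rejects values outside
 * [Q1 - 1.5*IQR, Q3 + 1.5*IQR], and returns the mean of the rest.
 * The number of rejected samples is stored in *n_outliers. */
double mean_without_outliers(double *samples, size_t n, size_t *n_outliers)
{
    assert(samples != NULL && n > 0 && n_outliers != NULL);
    qsort(samples, n, sizeof *samples, cmp_double);

    double q1 = quantile_sorted(samples, n, 0.25);
    double q3 = quantile_sorted(samples, n, 0.75);
    double iqr = q3 - q1;
    double low = q1 - 1.5 * iqr;
    double high = q3 + 1.5 * iqr;

    double sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] >= low && samples[i] <= high) {
            sum += samples[i];
            kept++;
        }
    }
    *n_outliers = n - kept;
    return sum / (double)kept; /* at least one sample always lies within the fences */
}
```

A rank-based filter like this tends to be less sensitive to a single very slow iteration (for example one hit by a context switch) than a filter based on the standard deviation.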
- Change default library inclusion from "local" to "system"
- Replace complex ExternalProject configurations with simple find_library calls
- Simplify build process to use pre-built dependencies from vendor/local/
- Add explicit git executable specification for better toolchain compatibility
- Remove legacy Boost, Hyperscan, Oniguruma, RE2, TRE, PCRE2, CTRE, and YARA build scripts
- Use modern CMake approach with locally built static libraries

This refactoring aligns with the new build_deps_simple.sh approach where
dependencies are pre-built with native optimizations and discovered via
CMake's find_library mechanism.
- Implement configurable timeout mechanism with 1-second default
- Add timeout checking to all regex engine timing loops
- Integrate Hyperscan build support in dependency script
- Fix YARA CMake configuration for proper linking
- Add command line option for timeout configuration (-t flag)
- Remove hyperscan, oniguruma, re2, and tre from Git tracking
- Add vendor dependency directories to .gitignore
- Dependencies will be rebuilt by build scripts as needed
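The timeout mechanism described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: the names `parse_timeout`, `run_with_timeout`, and `count_call` are hypothetical, and the value passed to `-t` is assumed to be in seconds (the notes only state the 1-second default).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define DEFAULT_TIMEOUT_SEC 1.0 /* 1-second default, per the commit notes */

/* Monotonic clock in seconds; unaffected by wall-clock adjustments. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical parser for the argument of -t (assumed to be seconds).
 * Falls back to the default on missing, malformed, or non-positive input. */
double parse_timeout(const char *arg)
{
    if (arg == NULL)
        return DEFAULT_TIMEOUT_SEC;
    char *end = NULL;
    double v = strtod(arg, &end);
    if (end == arg || *end != '\0' || v <= 0.0)
        return DEFAULT_TIMEOUT_SEC;
    return v;
}

typedef void (*match_fn)(void *ctx);

/* Calls `fn` up to `max_iters` times and stops early once `timeout_sec`
 * has elapsed. Returns the number of completed iterations. The check runs
 * after each call, so a single long match can still overrun the limit. */
int run_with_timeout(match_fn fn, void *ctx, int max_iters, double timeout_sec)
{
    assert(fn != NULL && max_iters >= 0);
    double start = now_sec();
    int done = 0;
    while (done < max_iters) {
        fn(ctx);
        done++;
        if (now_sec() - start >= timeout_sec)
            break;
    }
    return done;
}

/* Stand-in for one regex match: just counts how often it was called. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

In a real timing loop, `fn` would wrap one engine's match call, and the returned iteration count would be used to normalize the elapsed time when the loop ends early.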