A high-performance data streaming system using DuckDB and Apache Arrow Flight.
- DuckDB Flight Server (`duckdb_flight_server.py`): A server that exposes DuckDB through the Arrow Flight protocol
- Data Loader (`load_data.py`): Continuously generates and loads random data into DuckDB
- Query Client (`query_data.py`): Executes continuous queries against the loaded data
- Install dependencies: `pip install duckdb pyarrow`
- Start the server: `python duckdb_flight_server.py`
- Start the data loader: `python load_data.py`
- Start the query client: `python query_data.py`
The system creates a table called `concurrent_test` with the following schema:
- `batch_id`: BIGINT
- `timestamp`: VARCHAR
- `value`: DOUBLE
- `category`: VARCHAR
- Persistent storage using DuckDB
- High-performance data transfer using Arrow Flight
- Continuous data loading and querying
- Memory-efficient batch processing
- Aligned Arrow buffers for optimal performance