Datafuse is a real-time data processing and analytics DBMS with a cloud-native architecture, written in Rust. It is inspired by ClickHouse, powered by arrow-rs, and built to make it easy to power the Data Cloud.
- Fearless
  - No data races, no unsafe, minimize unhandled errors
- High Performance
  - Everything is parallel
- High Scalability
  - Everything is distributed
- High Reliability
  - Datafuse's primary design goal is reliability
- Benchmarks measure in-memory SIMD vector processing performance only
- Dataset: 100,000,000,000 (100 Billion)
- Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
- Rust: rustc 1.53.0-nightly (673d0db5e 2021-03-23)
- Built with link-time optimization and CPU-specific instructions
- ClickHouse server version 21.4.6 revision 54447
Query | FuseQuery (v0.4.1) | ClickHouse (v21.4.6) |
---|---|---|
SELECT avg(number) FROM numbers_mt(100000000000) | 3.87 s. (25.83 billion rows/s., 206.79 GB/s.) | ×1.6 slow, (6.04 s.) (16.57 billion rows/s., 132.52 GB/s.) |
SELECT sum(number) FROM numbers_mt(100000000000) | 4.86 s. (20.57 billion rows/s., 164.70 GB/s.) | ×1.2 slow, (5.90 s.) (16.95 billion rows/s., 135.62 GB/s.) |
SELECT min(number) FROM numbers_mt(100000000000) | 5.61 s. (17.82 billion rows/s., 142.65 GB/s.) | ×2.3 slow, (13.05 s.) (7.66 billion rows/s., 61.26 GB/s.) |
SELECT max(number) FROM numbers_mt(100000000000) | 5.61 s. (17.82 billion rows/s., 142.67 GB/s.) | ×2.5 slow, (14.07 s.) (7.11 billion rows/s., 56.86 GB/s.) |
SELECT count(number) FROM numbers_mt(100000000000) | 3.12 s. (32.03 billion rows/s., 256.48 GB/s.) | ×1.2 slow, (3.71 s.) (26.93 billion rows/s., 215.43 GB/s.) |
SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 17.85 s. (5.60 billion rows/s., 44.85 GB/s.) | ×16.9 slow, (233.71 s.) (427.87 million rows/s., 3.42 GB/s.) |
SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 4.02 s. (24.86 billion rows/s., 199.10 GB/s.) | ×2.4 slow, (9.70 s.) (10.31 billion rows/s., 82.52 GB/s.) |
SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 9.60 s. (10.41 billion rows/s., 83.38 GB/s.) | ×3.4 slow, (32.87 s.) (3.04 billion rows/s., 24.34 GB/s.) |
SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | 5.34 s. (1.87 billion rows/s., 14.99 GB/s.) | ×2.6 slow, (13.95 s.) (716.62 million rows/s., 5.73 GB/s.) |
SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 9.03 s. (110.71 million rows/s., 886.50 MB/s.) | ×3.5 fast, (2.60 s.) (385.28 million rows/s., 3.08 GB/s.) |
Note:
- ClickHouse system.numbers_mt uses 16-way parallel processing (gist)
- FuseQuery system.numbers_mt uses 16-way parallel processing (gist)
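
The throughput figures in the table can be approximated from the row count and the elapsed time. The sketch below is illustrative and not part of Datafuse: the helper names `rows_per_sec`, `bytes_per_sec`, and `slowdown` are hypothetical, and it assumes each row of `numbers_mt` is a single 8-byte unsigned integer, which matches the ratio between the reported GB/s and rows/s values.

```rust
/// Rows processed per second for a query that scanned `rows` rows in `seconds`.
fn rows_per_sec(rows: u64, seconds: f64) -> f64 {
    rows as f64 / seconds
}

/// Bytes processed per second.
/// Assumption: every row is one 8-byte unsigned integer.
fn bytes_per_sec(rows: u64, seconds: f64) -> f64 {
    const BYTES_PER_ROW: f64 = 8.0;
    rows_per_sec(rows, seconds) * BYTES_PER_ROW
}

/// How many times longer `other_secs` is than `baseline_secs`.
fn slowdown(baseline_secs: f64, other_secs: f64) -> f64 {
    other_secs / baseline_secs
}

fn main() {
    let rows: u64 = 100_000_000_000;

    // First table row: avg(number), FuseQuery 3.87 s vs ClickHouse 6.04 s.
    println!("{:.2} billion rows/s", rows_per_sec(rows, 3.87) / 1e9);
    println!("{:.2} GB/s", bytes_per_sec(rows, 3.87) / 1e9);
    println!("x{:.1} slow", slowdown(3.87, 6.04));
}
```

Because the elapsed times in the table are themselves rounded, recomputed values may differ from the reported ones in the last digit.
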
- SQL Parser
- Query Planner
- Query Optimizer
- Predicate Push Down
- Limit Push Down
- Projection Push Down
- Type coercion
- Parallel Query Execution
- Distributed Query Execution
- Hash GroupBy
- Merge-Sort OrderBy
- Joins (WIP)
- Projection
- Filter (WHERE)
- Limit
- Aggregate Functions
- Scalar Functions
- User-Defined Functions (UDFs)
- SubQueries
- Sorting
- Joins (WIP)
- Window (TODO)
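
To illustrate what parallel query execution means for an aggregate such as `SELECT avg(number) FROM numbers_mt(N)`, here is a minimal sketch using only the Rust standard library. It is not Datafuse's implementation: `AvgState` and `parallel_avg` are hypothetical names, and the approach shown (each worker builds a partial state over its slice of the range, then the partial states are merged) is just one common way to parallelize aggregation.

```rust
use std::thread;

/// Partial aggregation state for `avg`: a running sum and a row count.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct AvgState {
    sum: u128,
    count: u64,
}

impl AvgState {
    /// Aggregate the half-open range `start..end` into a fresh state.
    fn from_range(start: u64, end: u64) -> Self {
        let mut state = AvgState::default();
        for n in start..end {
            state.sum += n as u128;
            state.count += 1;
        }
        state
    }

    /// Combine two partial states; the order of merging does not matter.
    fn merge(self, other: AvgState) -> AvgState {
        AvgState {
            sum: self.sum + other.sum,
            count: self.count + other.count,
        }
    }

    /// Produce the final average, or `None` if no rows were seen.
    fn finish(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum as f64 / self.count as f64)
        }
    }
}

/// Average of the numbers `0..total`, computed by `workers` threads.
fn parallel_avg(total: u64, workers: u64) -> Option<f64> {
    let workers = workers.max(1);
    let chunk = (total + workers - 1) / workers; // ceiling division

    // Spawn one worker per chunk; each returns its partial state.
    let handles: Vec<thread::JoinHandle<AvgState>> = (0..workers)
        .map(|w| {
            let start = (w * chunk).min(total);
            let end = (start + chunk).min(total);
            thread::spawn(move || AvgState::from_range(start, end))
        })
        .collect();

    // Merge the partial states into the final result.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .fold(AvgState::default(), AvgState::merge)
        .finish()
}

fn main() {
    // 16 workers mirrors the 16-way parallelism mentioned in the benchmark notes.
    println!("{:?}", parallel_avg(1_000_000, 16));
}
```

Keeping the partial state separate from the final value is what makes the merge step possible: averages cannot be combined directly, but sums and counts can.
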
- 0.1 Support aggregation select (2021.02)
- 0.2 Support distributed query (2021.03)
- 0.3 Support group by (2021.04)
- 0.4 Support order by (2021.04)
- 0.5 Support join
- 1.0 Support TPC-H benchmark
Datafuse is currently in Alpha and is not ready to be used in production.
We are doing our best to release R1.
Datafuse is licensed under Apache 2.0.