Buffered compression column by column for native protocol #808

Merged
merged 16 commits into main from compress_by_column
Dec 16, 2022


Collaborator

@gingerwizard gingerwizard commented Nov 2, 2022

closes #755

This is probably sufficient for now. Ultimately, we should handle this better as part of #809.

Collaborator Author

gingerwizard commented Nov 3, 2022

A few pending things:

  • We need to decide what size to flush on, i.e. MaxCompressionBuffer. The original issue, "Compress insertion data column by column to reduce memory usage" (#755), used 1 MB. This feels too low IMO. @genzgd thoughts?
  • Needs docs, including a recommendation on how to reduce memory usage (we should document the block buffer in relation to this as well). Document MaxCompressionBuffer.
  • We need to benchmark and confirm the reduced memory usage and the impact on performance. Initial tests suggest this is actually faster.
  • Add some more tests around this that force compression.

@mshustov mshustov requested a review from jkaflik December 2, 2022 09:18
@jkaflik jkaflik self-assigned this Dec 6, 2022
Contributor

jkaflik commented Dec 12, 2022

@gingerwizard @genzgd

> We need to decide what size we should flush on i.e. MaxCompressionBuffer. The original issue #755 used 1Mb. This feels too low IMO @genzgd thoughts?

I found that max_insert_block_size defaults to 1048449 bytes, so IMO it makes sense to keep this value, since we flush every time MaxCompressionBuffer is reached.

@jkaflik jkaflik changed the title compress column by column over native Buffered compression column by column for native protocol Dec 13, 2022
Contributor

jkaflik commented Dec 13, 2022

Results of benchmark/v2/write-compress-buffer-limit/write_test.go against main, without MaxCompressionBuffer:

BenchmarkWrite1KB
Alloc = 925 MiB	TotalAlloc = 4024 MiB	Sys = 1920 MiB	NumGC = 38
BenchmarkWrite1KB-10      	       1	2201197625 ns/op
BenchmarkWrite16KB
Alloc = 1153 MiB	TotalAlloc = 8047 MiB	Sys = 1920 MiB	NumGC = 78
BenchmarkWrite16KB-10     	       1	2093274875 ns/op
BenchmarkWrite64KB
Alloc = 925 MiB	TotalAlloc = 12069 MiB	Sys = 2202 MiB	NumGC = 118
BenchmarkWrite64KB-10     	       1	2067872083 ns/op
BenchmarkWrite256KB
Alloc = 1153 MiB	TotalAlloc = 16092 MiB	Sys = 2203 MiB	NumGC = 158
BenchmarkWrite256KB-10    	       1	2066669625 ns/op
BenchmarkWrite512KB
Alloc = 925 MiB	TotalAlloc = 20115 MiB	Sys = 2203 MiB	NumGC = 198
BenchmarkWrite512KB-10    	       1	2065087792 ns/op
BenchmarkWrite1MB
Alloc = 1153 MiB	TotalAlloc = 24138 MiB	Sys = 2203 MiB	NumGC = 238
BenchmarkWrite1MB-10      	       1	2047471958 ns/op
BenchmarkWrite5MB
Alloc = 925 MiB	TotalAlloc = 28161 MiB	Sys = 2203 MiB	NumGC = 278
BenchmarkWrite5MB-10      	       1	2059622667 ns/op
BenchmarkWrite10MB
Alloc = 1153 MiB	TotalAlloc = 32184 MiB	Sys = 2203 MiB	NumGC = 319
BenchmarkWrite10MB-10     	       1	2039583250 ns/op

Results of benchmark/v2/write-compress-buffer-limit/write_test.go against compress_by_column, with various MaxCompressionBuffer values:

BenchmarkWrite1KB
Alloc = 686 MiB	TotalAlloc = 3229 MiB	Sys = 1459 MiB	NumGC = 37
BenchmarkWrite1KB-10      	       1	1973350542 ns/op
BenchmarkWrite16KB
Alloc = 802 MiB	TotalAlloc = 6458 MiB	Sys = 1459 MiB	NumGC = 77
BenchmarkWrite16KB-10     	       1	1883714208 ns/op
BenchmarkWrite64KB
Alloc = 686 MiB	TotalAlloc = 9687 MiB	Sys = 1459 MiB	NumGC = 118
BenchmarkWrite64KB-10     	       1	1920653250 ns/op
BenchmarkWrite256KB
Alloc = 895 MiB	TotalAlloc = 12915 MiB	Sys = 1459 MiB	NumGC = 158
BenchmarkWrite256KB-10    	       1	1867119792 ns/op
BenchmarkWrite512KB
Alloc = 686 MiB	TotalAlloc = 16144 MiB	Sys = 1459 MiB	NumGC = 198
BenchmarkWrite512KB-10    	       1	1871656875 ns/op
BenchmarkWrite1MB
Alloc = 686 MiB	TotalAlloc = 19372 MiB	Sys = 1460 MiB	NumGC = 237
BenchmarkWrite1MB-10      	       1	1864227708 ns/op
BenchmarkWrite5MB
Alloc = 895 MiB	TotalAlloc = 22601 MiB	Sys = 1460 MiB	NumGC = 277
BenchmarkWrite5MB-10      	       1	1879189750 ns/op
BenchmarkWrite10MB
Alloc = 686 MiB	TotalAlloc = 25829 MiB	Sys = 1460 MiB	NumGC = 317
BenchmarkWrite10MB-10     	       1	1894366416 ns/op

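There are visible throughput and memory allocation improvements: each run is roughly 7-10% faster (about 1.86 to 1.97 s/op versus 2.04 to 2.20 s/op on main), cumulative TotalAlloc is about 20% lower, and Sys drops from 1920-2203 MiB to about 1460 MiB. Note that each benchmark ran for a single iteration.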
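The Alloc / TotalAlloc / Sys / NumGC lines in the results above have the same names as fields of Go's `runtime.MemStats`. The benchmark source is not shown in this thread, so the following is only a sketch of how such a line could be produced; `memUsageLine` and `toMiB` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// toMiB converts a byte count to whole mebibytes.
func toMiB(b uint64) uint64 {
	return b / 1024 / 1024
}

// memUsageLine snapshots runtime.MemStats and formats it in the same shape
// as the lines in the benchmark output (hypothetical helper).
func memUsageLine() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("Alloc = %d MiB\tTotalAlloc = %d MiB\tSys = %d MiB\tNumGC = %d",
		toMiB(m.Alloc), toMiB(m.TotalAlloc), toMiB(m.Sys), m.NumGC)
}

func main() {
	// Allocate and touch some memory so the numbers are non-trivial.
	data := make([]byte, 64<<20)
	for i := range data {
		data[i] = byte(i)
	}
	fmt.Println(memUsageLine())
	runtime.KeepAlive(data)
}
```

`TotalAlloc` and `NumGC` are cumulative over the life of the process, which is consistent with those columns growing from one benchmark to the next in the listings, while `Alloc` reflects the live heap at the moment of the snapshot.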

@gingerwizard gingerwizard marked this pull request as ready for review December 16, 2022 09:39
@jkaflik jkaflik merged commit 4d33629 into main Dec 16, 2022
@jkaflik jkaflik deleted the compress_by_column branch May 5, 2023 16:18
Development

Successfully merging this pull request may close these issues.

Compress insertion data column by column to reduce memory usage
3 participants