Buffered compression column by column for native protocol #808

gingerwizard · 2022-11-02T17:13:58Z

closes #755

This is probably sufficient for now. Ultimately, we should handle this better as part of #809.

gingerwizard · 2022-11-03T12:15:50Z

A few things are still pending:

We need to decide what size to flush on, i.e. MaxCompressionBuffer. The original issue, Compress insertion data column by column to reduce memory usage #755, used 1 MB. This feels too low IMO. @genzgd, thoughts?
Docs are needed, including a recommendation on how to reduce memory usage (we should document the block buffer with respect to this as well). Document MaxCompressionBuffer.
We need to benchmark and confirm the reduced memory usage and the impact on performance. Initial tests suggest this is actually faster.
Add some more tests around this that force compression.
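
To make the flush threshold discussed above concrete, here is a minimal stand-alone sketch. It is not the driver's implementation: `columnCompressor`, `writeColumn`, and `maxBuffer` are hypothetical names, and gzip from the standard library stands in for the codecs the native protocol actually uses. The sketch checks the threshold after each column, so a single column larger than the limit is still compressed in one piece.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// columnCompressor buffers encoded columns and compresses them in chunks
// instead of holding the whole block in memory before compressing.
// All names here are hypothetical and exist only for illustration.
type columnCompressor struct {
	maxBuffer int          // flush threshold in bytes (stand-in for MaxCompressionBuffer)
	pending   []byte       // uncompressed bytes waiting to be flushed
	out       bytes.Buffer // compressed frames, as they would go to the connection
	flushes   int          // number of compressed frames emitted so far
}

// writeColumn appends one encoded column and flushes once the pending
// buffer has reached the threshold.
func (c *columnCompressor) writeColumn(data []byte) error {
	c.pending = append(c.pending, data...)
	if len(c.pending) >= c.maxBuffer {
		return c.flush()
	}
	return nil
}

// flush compresses the pending bytes into one frame and resets the buffer.
// gzip is used only because it ships with the standard library.
func (c *columnCompressor) flush() error {
	if len(c.pending) == 0 {
		return nil
	}
	zw := gzip.NewWriter(&c.out)
	if _, err := zw.Write(c.pending); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	c.pending = c.pending[:0]
	c.flushes++
	return nil
}

func main() {
	c := &columnCompressor{maxBuffer: 1 << 20} // 1 MiB, the size used in #755
	columns := [][]byte{
		bytes.Repeat([]byte("a"), 1536*1024), // larger than the threshold
		bytes.Repeat([]byte("b"), 512*1024),  // stays buffered
		bytes.Repeat([]byte("c"), 100),       // stays buffered
	}
	for _, col := range columns {
		if err := c.writeColumn(col); err != nil {
			panic(err)
		}
	}
	// Whatever is still pending at the end of the block is flushed last.
	if err := c.flush(); err != nil {
		panic(err)
	}
	fmt.Printf("frames=%d compressed=%d bytes\n", c.flushes, c.out.Len())
}
```

A smaller threshold means more, smaller frames and a lower peak buffer size; a larger one means fewer frames but more memory held before each flush, which is the trade-off the benchmark item above is meant to quantify.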

jkaflik · 2022-12-12T13:09:39Z

> We need to decide what size we should flush on, i.e. MaxCompressionBuffer. The original issue #755 used 1 MB. This feels too low IMO. @genzgd, thoughts?

I found that max_insert_block_size defaults to 1048449, so IMO it makes sense to keep this value, since we flush every time MaxCompressionBuffer is reached.
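
Whatever default is kept, callers need a way to override it. The sketch below shows one way such a limit could be read from a DSN with a fallback default. It is an illustration only: `bufferLimitFromDSN` is a hypothetical helper, and the query parameter name `max_compression_buffer` is an assumption that should be checked against the README added in this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// bufferLimitFromDSN returns the compression buffer limit in bytes, or def
// when the DSN does not set one. The parameter name below is an assumption
// made for this sketch, not taken from the driver's source.
func bufferLimitFromDSN(dsn string, def int) (int, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return 0, fmt.Errorf("parse dsn: %w", err)
	}
	raw := u.Query().Get("max_compression_buffer")
	if raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("max_compression_buffer: %w", err)
	}
	if n <= 0 {
		return 0, fmt.Errorf("max_compression_buffer must be positive, got %d", n)
	}
	return n, nil
}

func main() {
	const oneMiB = 1 << 20 // the flush size discussed in this thread
	dsns := []string{
		"clickhouse://127.0.0.1:9000/default",
		"clickhouse://127.0.0.1:9000/default?max_compression_buffer=10485760",
	}
	for _, dsn := range dsns {
		n, err := bufferLimitFromDSN(dsn, oneMiB)
		fmt.Println(n, err)
	}
}
```

Rejecting non-positive values up front avoids a limit that would trigger a flush on every column.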

jkaflik · 2022-12-13T12:40:03Z

Results of benchmark/v2/write-compress-buffer-limit/write_test.go run against main (without MaxCompressionBuffer):

BenchmarkWrite1KB
Alloc = 925 MiB	TotalAlloc = 4024 MiB	Sys = 1920 MiB	NumGC = 38
BenchmarkWrite1KB-10      	       1	2201197625 ns/op
BenchmarkWrite16KB
Alloc = 1153 MiB	TotalAlloc = 8047 MiB	Sys = 1920 MiB	NumGC = 78
BenchmarkWrite16KB-10     	       1	2093274875 ns/op
BenchmarkWrite64KB
Alloc = 925 MiB	TotalAlloc = 12069 MiB	Sys = 2202 MiB	NumGC = 118
BenchmarkWrite64KB-10     	       1	2067872083 ns/op
BenchmarkWrite256KB
Alloc = 1153 MiB	TotalAlloc = 16092 MiB	Sys = 2203 MiB	NumGC = 158
BenchmarkWrite256KB-10    	       1	2066669625 ns/op
BenchmarkWrite512KB
Alloc = 925 MiB	TotalAlloc = 20115 MiB	Sys = 2203 MiB	NumGC = 198
BenchmarkWrite512KB-10    	       1	2065087792 ns/op
BenchmarkWrite1MB
Alloc = 1153 MiB	TotalAlloc = 24138 MiB	Sys = 2203 MiB	NumGC = 238
BenchmarkWrite1MB-10      	       1	2047471958 ns/op
BenchmarkWrite5MB
Alloc = 925 MiB	TotalAlloc = 28161 MiB	Sys = 2203 MiB	NumGC = 278
BenchmarkWrite5MB-10      	       1	2059622667 ns/op
BenchmarkWrite10MB
Alloc = 1153 MiB	TotalAlloc = 32184 MiB	Sys = 2203 MiB	NumGC = 319
BenchmarkWrite10MB-10     	       1	2039583250 ns/op

Results of benchmark/v2/write-compress-buffer-limit/write_test.go run against compress_by_column with various MaxCompressionBuffer values:

BenchmarkWrite1KB
Alloc = 686 MiB	TotalAlloc = 3229 MiB	Sys = 1459 MiB	NumGC = 37
BenchmarkWrite1KB-10      	       1	1973350542 ns/op
BenchmarkWrite16KB
Alloc = 802 MiB	TotalAlloc = 6458 MiB	Sys = 1459 MiB	NumGC = 77
BenchmarkWrite16KB-10     	       1	1883714208 ns/op
BenchmarkWrite64KB
Alloc = 686 MiB	TotalAlloc = 9687 MiB	Sys = 1459 MiB	NumGC = 118
BenchmarkWrite64KB-10     	       1	1920653250 ns/op
BenchmarkWrite256KB
Alloc = 895 MiB	TotalAlloc = 12915 MiB	Sys = 1459 MiB	NumGC = 158
BenchmarkWrite256KB-10    	       1	1867119792 ns/op
BenchmarkWrite512KB
Alloc = 686 MiB	TotalAlloc = 16144 MiB	Sys = 1459 MiB	NumGC = 198
BenchmarkWrite512KB-10    	       1	1871656875 ns/op
BenchmarkWrite1MB
Alloc = 686 MiB	TotalAlloc = 19372 MiB	Sys = 1460 MiB	NumGC = 237
BenchmarkWrite1MB-10      	       1	1864227708 ns/op
BenchmarkWrite5MB
Alloc = 895 MiB	TotalAlloc = 22601 MiB	Sys = 1460 MiB	NumGC = 277
BenchmarkWrite5MB-10      	       1	1879189750 ns/op
BenchmarkWrite10MB
Alloc = 686 MiB	TotalAlloc = 25829 MiB	Sys = 1460 MiB	NumGC = 317
BenchmarkWrite10MB-10     	       1	1894366416 ns/op

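There are visible throughput and memory allocation improvements: across all buffer sizes, time per operation drops by roughly 7-10%, cumulative allocation (TotalAlloc) is about 20% lower, and Sys falls from 1920-2203 MiB to about 1460 MiB. Note that each benchmark ran for a single iteration.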
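
For readers who want to produce comparable memory lines in their own runs, the following is a minimal sketch of a common Go idiom built on `runtime.ReadMemStats`. The helper names `toMiB` and `memUsageLine` are hypothetical, and the benchmark's own helper (see the "print mem usage" commit) may differ in detail.

```go
package main

import (
	"fmt"
	"runtime"
)

// toMiB converts a byte count to whole mebibytes, truncating any remainder.
func toMiB(b uint64) uint64 {
	return b / 1024 / 1024
}

// memUsageLine formats memory counters in the same layout as the benchmark
// output in this thread. Inputs are raw byte counts plus the GC cycle count.
func memUsageLine(alloc, totalAlloc, sys uint64, numGC uint32) string {
	return fmt.Sprintf("Alloc = %v MiB\tTotalAlloc = %v MiB\tSys = %v MiB\tNumGC = %v",
		toMiB(alloc), toMiB(totalAlloc), toMiB(sys), numGC)
}

func main() {
	// Allocate some memory so the counters have something to report.
	data := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		data = append(data, make([]byte, 1<<20))
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println(memUsageLine(m.Alloc, m.TotalAlloc, m.Sys, m.NumGC))

	// Keep the slices reachable until after the measurement.
	runtime.KeepAlive(data)
}
```

Alloc is the live heap at the moment of measurement, TotalAlloc is cumulative over the life of the process (which is why it grows from one benchmark to the next), and Sys is the total memory obtained from the operating system.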

compress column by column over native

6fc8af9

gingerwizard added the enhancement label Nov 2, 2022

gingerwizard assigned gingerwizard and genzgd Nov 2, 2022

gingerwizard added 3 commits November 3, 2022 09:52

refer to Buffer

1f6699d

simple test

7290aec

Better implementation

bf76e6d

gingerwizard requested review from genzgd and ernado November 3, 2022 12:17

gingerwizard unassigned genzgd Nov 3, 2022

gingerwizard mentioned this pull request Nov 3, 2022

Move to auto flushing API based on memory usage #809

Open

rename

68d6869

mshustov requested a review from jkaflik December 2, 2022 09:18

jkaflik self-assigned this Dec 6, 2022

Merge remote-tracking branch 'origin/main' into compress_by_column

da74584

# Conflicts: # clickhouse_options.go

jkaflik unassigned gingerwizard Dec 12, 2022

jkaflik added 5 commits December 12, 2022 23:22

do not overflow compression buffer

bd6cca0

benchmark

b6674c0

print mem usage

9904808

remove buffer compression overflow

e6f97fd

add buffor size debugf

c5c9499

jkaflik changed the title ~~compress column by column over native~~ Buffered compression column by column for native protocol Dec 13, 2022

Merge branch 'main' into compress_by_column

62d41f3

gingerwizard marked this pull request as ready for review December 16, 2022 09:39

jkaflik added 2 commits December 16, 2022 20:50

add DSN tests and README

956f096

Merge remote-tracking branch 'origin/compress_by_column' into compress_by_column

81c9d49

jkaflik approved these changes Dec 16, 2022

jkaflik added 2 commits December 16, 2022 21:49

fix non trivial DSN parse issue

a89e197

license

e065e76

jkaflik merged commit 4d33629 into main Dec 16, 2022

jkaflik deleted the compress_by_column branch May 5, 2023 16:18
