Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improvement] Some performance related changes to evaluate #1426

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mdonkers
Copy link
Contributor

Summary

Based on some CPU and memory analysis we ran on our production systems, we noticed a few things that could be improved.

I'd like to present these and discuss whether it makes sense to merge these into the project (otherwise we'll likely keep these on a local fork). Also some changes perhaps need a bit of work or could be done in another way. All feedback welcome!

Allow String column provider to take the name, so that it can use different settings

For String column types its possible to define a ColStrProvider, but this can only be defined once and globally. While ColStrProvider is great for pre-sizing buffers when creating batches, you can imagine not every column needs as much space. Some may be limited to a few characters, while others (e.g. log bodies) can be hundreds of characters.
Adding the name of the column as input to the ColStrProvider allows the internal logic to size buffers differently, or maybe include other logic.
This change does not affect users of the API.

Prevent hashing enums just for checking validity

We are creating quite big batches, and have quite a few enums in our records. In CPU profiles we noticed a lot of time was spent in map lookups, just for adding the enum to the batch. This is done to check whether the enum is valid, since they don't need to be a continuous range in ClickHouse.

The optimization takes a bit of overhead up-front, by checking if the Enum definition is a continuous range and capturing the lower and upper bound. Because if continuous, then for validating we only need to check if the number falls between lower and upper bound. This simple boolean logic is much faster overall than the map lookup.

Allow tuple type to prevent array creation overhead

This is probably the change with the least impact, though perhaps could be improved. Because normally when using Tuple type, the values need to be inserted as Array / Slice type. But depending on how data is internally presented before inserting, it might mean new slices need to be allocated.
So I'm wondering if a specific Tuple type makes sense, with specific types for the most common lengths (like Tuple2, Tuple3 and Tuple4).

Usage would look something like this:

type ValueWithTypeTuple struct {
	V string
	T int8
}

var EmptyValueTuple = ValueWithTypeTuple{V: "", T: ValueTypeEmpty}

// Get implements the column.Tuple2 interface from ClickHouse. It returns two values that can be inserted as Tuple.
func (avt ValueWithTypeTuple) Get() (any, any) {
	return &avt.V, int8(avt.T)
}

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@mdonkers
Copy link
Contributor Author

@jkaflik by the way I don't expect the PR to be ready for merging in its current state.
But wanted to discuss on the approach etc before spending a lot of time. Everything works and compiles though.

@jkaflik
Copy link
Contributor

jkaflik commented Nov 4, 2024

@SpencerTorres could you take a look please?

@jkaflik jkaflik requested review from SpencerTorres and removed request for jkaflik November 4, 2024 08:44
lib/column/enum.go Outdated Show resolved Hide resolved
@@ -47,6 +48,9 @@ func Enum(chType Type, name string) (Interface, error) {
enum.iv[values[i]] = proto.Enum8(v)
enum.vi[proto.Enum8(v)] = values[i]
}
enum.minEnum = int8(slices.Min(maps.Keys(enum.vi)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to create slice of keys (& bring in exp/maps dependency)

Instead you can have one loop over keys (either using plain range enum.vi or range maps.Key(enum.vi) from golang stdlib, not sure if the latter is still too recent in golang for us to use in this library), & calculate min/max at the same time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I can make those changes if the solution in general is accepted (something I wasn't sure about).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed range values loop above, can calc min/max in there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this for Enum16, but will wait a bit for the Enum8 case for a response here: https://github.com/ClickHouse/clickhouse-go/pull/1426/files#r1905195485

@@ -47,6 +48,9 @@ func Enum(chType Type, name string) (Interface, error) {
enum.iv[values[i]] = proto.Enum8(v)
enum.vi[proto.Enum8(v)] = values[i]
}
enum.minEnum = int8(slices.Min(maps.Keys(enum.vi)))
enum.maxEnum = int8(slices.Max(maps.Keys(enum.vi)))
enum.continuous = (enum.maxEnum-enum.minEnum)+1 == int8(len(enum.vi))
Copy link
Member

@serprex serprex Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Enum8 non-continuous & continous could both use a 256 bit bitmap (only 32 bytes, could skip heap by being in struct as [4]uint64)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand your suggestion. Do you have some (semi-code) example of how you would use that, and which library? (Since I don't know of any bitmap implementations in the Golang standard library itself.)

Copy link
Member

@serprex serprex Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use library, can be done in loop above, something like

var bitset [4]uint64
for i := range values {
  bitIndex := uint8(indexes[i])
  bitset[bitIndex>>6] |= 1 << (bitIndex&63)
}

then when checking existence bitset[bitIndex>>6] & (1 << (bitIndex&63)) != 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this solution some thought, but I think its defeating the purpose. The whole idea was to skip the map lookup (col.vi[v]) which is the thing causing lots of overhead.
Using the bitmap as you proposed, it still needs to do a map lookup: bitset[bitIndex>>6]. I think whether the map appears on the heap or not doesn't matter.

If you think it does, I can try creating a benchmark and compare the different solutions. But would need to find a bit of time to put in that effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the expense on the CPU would be different between a standard map type and the bitmap. It should be similarly as fast as the integer bounds check.

If the solution works as-is then I'm okay with keeping it, if it's invalid then the server will throw an error. Most clients are already fairly in sync with their data model, especially with enums. I wouldn't want to hold the PR back due to this, we can iterate on it later

Copy link
Member

@serprex serprex Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[4]uint64 isn't a map. Array access will be about as fast as field access on continuous/min/max

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ of course. Somehow I had a map in my mind due to the bitmap

I'll make the changes accordingly and verify

lib/column/tuple.go Outdated Show resolved Hide resolved
lib/column/tuple.go Outdated Show resolved Hide resolved
lib/column/tuple.go Outdated Show resolved Hide resolved
Copy link
Member

@SpencerTorres SpencerTorres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your effort in finding these optimizations along with a good explanation for each.

@serprex found most of the major concerns I would say. Here's my summary after reviewing each file:

  • The String column provider change is great, very convenient for supplying your own function and switching on column name.
  • The enum changes are good too, but I agree with @serprex on the bitmap solution. It's somewhat obscure, but once they're wrapped in functions it should be clear what is happening. This seems like it would be the fastest and most accurate approach for checking enum validity. The implementation would need to be adjusted for each enum size too (8 vs 16).
  • Let's remove the Tuple change for now. I like the idea of having optimized definitons for common sizes, but I don't want to hold back the other optimizations in this PR while we decide on that implementation. Maybe we can move it to an issue for discussion?
  • And then finally once everything is updated/finalized, some good test coverage (especially for the enums.)

Thanks!

- Allow tuple type to prevent array creation overhead
- Prevent hashing enums just for checking validity
- Allow String column provider to take the name, so that it can use
  different settings
@mdonkers mdonkers force-pushed the insert-performance-improvements branch from 362c740 to 4cd4d3c Compare January 7, 2025 09:01
@mdonkers mdonkers force-pushed the insert-performance-improvements branch from f46e9a5 to 73d9f63 Compare January 9, 2025 08:48
@mdonkers
Copy link
Contributor Author

mdonkers commented Jan 9, 2025

@SpencerTorres @serprex Could you have another look please? I think I addressed all comments and also added some tests to check the enum8 usage with the bitset for checking existence.

(if all is good I'll update the description and split out the Tuple suggestion into its own issue)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants