[exporter/clickhouse] Update default logs table schema (2) #34203

Merged 2 commits into open-telemetry:main on Aug 6, 2024


SpencerTorres
Member

Description:

The default logs table was previously updated in #33611. I am opening this PR to start a discussion on further improvements to the table.

Notable changes:

  • Changed from monthly partitions to daily. With ttl_only_drop_parts=1, a part is dropped only once every row in it has expired, so daily partitions let data be dropped on time for TTLs shorter than 1 month (such as when your log retention is only 7 days).
  • Changed idx_body granularity to 8, which should reduce the index size (especially beneficial for cloud services with separate storage).
  • Removed the TimestampDate column.
  • Simplified the primary key to only use TimestampTime. The performance difference is negligible, if not an improvement. It also makes queries easier to write: the current version requires that you provide both TimestampDate and TimestampTime for optimal sorting performance.
  • Separated the ORDER BY from the primary key and updated it. It now matches the primary key with Timestamp appended, so that nanosecond-precision sorting is preserved by default.
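
To make the partitioning change concrete, here is a rough worked example of when a row written on Aug 1 can actually be dropped under a 7-day TTL. It is a simplified sketch, not a model of ClickHouse internals. It assumes day-level granularity, and for the monthly case it assumes merges have combined the month's rows into a single part that keeps receiving rows until the last day of the month. The function names are made up for illustration.

```python
import calendar
from datetime import date, timedelta


def drop_date_daily(row_day: date, ttl_days: int) -> date:
    """Day on which a daily partition containing row_day is fully expired."""
    # All rows in a daily partition share one day, so they expire together.
    return row_day + timedelta(days=ttl_days)


def drop_date_monthly(row_day: date, ttl_days: int) -> date:
    """Worst-case day on which a monthly partition containing row_day is fully expired."""
    # With ttl_only_drop_parts=1 the part waits for its newest row, which
    # (by assumption) may have been written on the last day of the month.
    last_day = calendar.monthrange(row_day.year, row_day.month)[1]
    newest_possible_row = row_day.replace(day=last_day)
    return newest_possible_row + timedelta(days=ttl_days)


row = date(2024, 8, 1)
print(drop_date_daily(row, 7))    # 2024-08-08
print(drop_date_monthly(row, 7))  # 2024-09-07
```

Under these assumptions the Aug 1 row lingers about 30 days longer with monthly partitions than with daily ones.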

Let me know if you have any more suggestions.

Link to tracking Issue:

Testing:

Documentation:

@crobert-1
Member

Question out of ignorance: How does changing the DB schema impact upgrades? Will this exporter still work on a DB that was created with the old schema? Are there any tests to ensure functionality works as expected with this?

@SpencerTorres
Member Author

> How does changing DB schema impact upgrade? Will this receiver still work on a DB that was created with the old schema? Are there any tests to ensure functionality works as expected with this?

Yep, it's compatible with the old schema as none of the column names have changed. We have some integration tests built in. I believe these are disabled for now due to inconsistencies in the CI, but they can still be run manually to confirm everything works the same.
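
One way to reason about the compatibility claim: an INSERT keeps working against either table as long as every column it names exists there and any column it omits has a default. The sketch below assumes the exporter's INSERT does not name the derived TimestampDate or TimestampTime columns, and that both are filled by DEFAULT expressions. The column sets are abbreviated and illustrative, not copied from the exporter.

```python
# Columns the exporter is ASSUMED to name in its INSERT statement (abbreviated).
INSERT_COLUMNS = {"Timestamp", "TraceId", "SpanId", "SeverityText", "ServiceName", "Body"}

# Abbreviated, illustrative column sets for the two table versions.
OLD_TABLE_COLUMNS = {
    "Timestamp", "TimestampDate", "TimestampTime",
    "TraceId", "SpanId", "SeverityText", "ServiceName", "Body",
}
NEW_TABLE_COLUMNS = OLD_TABLE_COLUMNS - {"TimestampDate"}


def missing_columns(insert_columns, table_columns):
    """Return the columns an INSERT names that the table does not have."""
    return sorted(set(insert_columns) - set(table_columns))


for name, table in (("old", OLD_TABLE_COLUMNS), ("new", NEW_TABLE_COLUMNS)):
    # An empty list means the INSERT is accepted by that table version.
    print(name, "schema, missing columns:", missing_columns(INSERT_COLUMNS, table))
```

A query that explicitly selects or filters on TimestampDate would be a different story on a table created with the new schema, which is why this check only covers the write path.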

The intent of changing the schema in this way is to improve performance. In production deployments, it's recommended that users tune the schema to match their query patterns. The logs table we provide here is intended to be a good default.
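
For anyone tuning the table, the changes in this PR can be sketched as a small DDL builder. This is an abbreviated approximation, not the exporter's actual template. The table name, the tokenbf_v1 parameters, the TTL clause, and ServiceName leading both keys are assumptions for illustration (the description here only mentions TimestampTime and Timestamp), and most columns are omitted.

```python
def build_logs_ddl(table: str = "otel_logs", ttl_days: int = 7) -> str:
    """Return an abbreviated CREATE TABLE statement reflecting this PR's changes."""
    return f"""
CREATE TABLE IF NOT EXISTS {table} (
    Timestamp DateTime64(9),
    TimestampTime DateTime DEFAULT toDateTime(Timestamp),
    ServiceName LowCardinality(String),
    Body String,
    -- Index parameters are assumed; only GRANULARITY 8 comes from this PR.
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
) ENGINE = MergeTree
-- Daily partitions instead of monthly.
PARTITION BY toDate(TimestampTime)
-- ServiceName in the keys is an assumption of this sketch.
PRIMARY KEY (ServiceName, TimestampTime)
-- ORDER BY extends the primary key with the nanosecond Timestamp.
ORDER BY (ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay({ttl_days})
SETTINGS ttl_only_drop_parts = 1
""".strip()


print(build_logs_ddl(ttl_days=7))
```

Users with different query patterns could change the partition expression or the key columns here; the defaults above aim to be a reasonable starting point rather than a recommendation for every workload.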

@crobert-1 added the "ready to merge" label (Code review completed; ready to merge by maintainers) on Jul 29, 2024
@mx-psi merged commit 44f6861 into open-telemetry:main on Aug 6, 2024
163 checks passed
@github-actions bot added this to the "next release" milestone on Aug 6, 2024
Labels: exporter/clickhouse, ready to merge
4 participants