Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(replays): Add materialized tags column and index #6308

Merged
merged 5 commits into from
Sep 18, 2024

Conversation

cmanallen
Copy link
Member

@cmanallen cmanallen commented Sep 16, 2024

Adds materialized _tags_hash_map column and indexes it. Follow the prior art established by the errors table.

Addresses the following problems:

  1. The number of bytes scanned searching for a tag.
  2. The amount of time it takes to search by tag.

Related: getsentry/sentry#76289

Copy link

github-actions bot commented Sep 16, 2024

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration replays : 0021_index_tags
Local op: ALTER TABLE replays_local ADD COLUMN IF NOT EXISTS _tags_hash_map Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(k, '(\\=|\\\\)', '\\\\\\1'), '=', v)), tags.key, tags.value) AFTER tags.value;
Distributed op: ALTER TABLE replays_dist ADD COLUMN IF NOT EXISTS _tags_hash_map Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(k, '(\\=|\\\\)', '\\\\\\1'), '=', v)), tags.key, tags.value) AFTER tags.value;
Local op: ALTER TABLE replays_local ADD INDEX IF NOT EXISTS bf_tags_hash_map _tags_hash_map TYPE bloom_filter GRANULARITY 1;
-- end forward migration replays : 0021_index_tags




-- backward migration replays : 0021_index_tags
Local op: ALTER TABLE replays_local DROP INDEX IF EXISTS bf_tags_hash_map;
Distributed op: ALTER TABLE replays_dist DROP COLUMN IF EXISTS _tags_hash_map;
Local op: ALTER TABLE replays_local DROP COLUMN IF EXISTS _tags_hash_map;
-- end backward migration replays : 0021_index_tags

@cmanallen cmanallen marked this pull request as ready for review September 16, 2024 17:30
@cmanallen cmanallen requested a review from a team as a code owner September 16, 2024 17:30
Copy link

codecov bot commented Sep 16, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1241 1 1240 2
View the top 1 failed tests by shortest run time
tests.migrations.test_runner test_reverse_all
Stack Traces | 9.31s run time
Traceback (most recent call last):
  File ".../snuba/clickhouse/native.py", line 200, in execute
    result_data = query_execute()
                  ^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 183, in query_execute
    return conn.execute(  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 373, in execute
    rv = self.process_ordinary_query(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 571, in process_ordinary_query
    return self.receive_result(with_column_types=with_column_types,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 204, in receive_result
    return result.get_result()
           ^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../site-packages/clickhouse_driver/result.py", line 50, in get_result
    for packet in self.packet_generator:
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 220, in packet_generator
    packet = self.receive_packet()
             ^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 237, in receive_packet
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 47.
DB::Exception: Missing columns: '_tags_hash_map' while processing query: '_tags_hash_map', required columns: '_tags_hash_map' '_tags_hash_map': Cannot apply mutation because it breaks skip index bf_tags_hash_map. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe295a55 in ....................................................................................................../usr/bin/clickhouse
1. ? @ 0x8ac772c in ....................................................................................................../usr/bin/clickhouse
2. DB::TreeRewriterResult::collectUsedColumns(std::shared_ptr<DB::IAST> const&, bool, bool) @ 0x13cfb859 in ....................................................................................................../usr/bin/clickhouse
3. DB::TreeRewriter::analyze(std::shared_ptr<DB::IAST>&, DB::NamesAndTypesList const&, std::shared_ptr<DB::IStorage const>, std::shared_ptr<DB::StorageSnapshot> const&, bool, bool, bool, bool) const @ 0x13d04b78 in ....................................................................................................../usr/bin/clickhouse
4. DB::IndexDescription::getIndexFromAST(std::shared_ptr<DB::IAST> const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x140d8eb4 in ....................................................................................................../usr/bin/clickhouse
5. DB::AlterCommands::apply(DB::StorageInMemoryMetadata&, std::shared_ptr<DB::Context const>) const @ 0x14093f00 in ....................................................................................................../usr/bin/clickhouse
6. DB::MergeTreeData::checkAlterIsPossible(DB::AlterCommands const&, std::shared_ptr<DB::Context const>) const @ 0x146fcf3a in ....................................................................................................../usr/bin/clickhouse
7. DB::InterpreterAlterQuery::executeToTable(DB::ASTAlterQuery const&) @ 0x138b51c4 in ....................................................................................................../usr/bin/clickhouse
8. DB::InterpreterAlterQuery::execute() @ 0x138b2570 in ....................................................................................................../usr/bin/clickhouse
9. ? @ 0x13da444b in ....................................................................................................../usr/bin/clickhouse
10. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x13da170d in ....................................................................................................../usr/bin/clickhouse
11. DB::TCPHandler::runImpl() @ 0x14bb32c4 in ....................................................................................................../usr/bin/clickhouse
12. DB::TCPHandler::run() @ 0x14bc9699 in ....................................................................................................../usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() @ 0x17b1ec54 in ....................................................................................................../usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() @ 0x17b1fe7b in ....................................................................................................../usr/bin/clickhouse
15. Poco::PooledThread::run() @ 0x17ca8e27 in ....................................................................................................../usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) @ 0x17ca689d in ....................................................................................................../usr/bin/clickhouse
17. ? @ 0x7f3ff0308609 in ?
18. __clone @ 0x7f3ff022d353 in ?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../tests/migrations/test_runner.py", line 328, in test_reverse_all
    runner.reverse_migration(migration, force=True)
  File ".../snuba/migrations/runner.py", line 382, in reverse_migration
    self._reverse_migration_impl(migration_key)
  File ".../snuba/migrations/runner.py", line 439, in _reverse_migration_impl
    migration.backwards(context, dry_run)
  File ".../snuba/migrations/migration.py", line 193, in backwards
    op.execute()
  File ".../snuba/migrations/operations.py", line 81, in execute
    connection.execute(self.format_sql(), settings=self._settings)
  File ".../snuba/clickhouse/native.py", line 285, in execute
    raise ClickhouseError(e.message, code=e.code) from e
snuba.clickhouse.errors.ClickhouseError: DB::Exception: Missing columns: '_tags_hash_map' while processing query: '_tags_hash_map', required columns: '_tags_hash_map' '_tags_hash_map': Cannot apply mutation because it breaks skip index bf_tags_hash_map. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe295a55 in ....................................................................................................../usr/bin/clickhouse
1. ? @ 0x8ac772c in ....................................................................................................../usr/bin/clickhouse
2. DB::TreeRewriterResult::collectUsedColumns(std::shared_ptr<DB::IAST> const&, bool, bool) @ 0x13cfb859 in ....................................................................................................../usr/bin/clickhouse
3. DB::TreeRewriter::analyze(std::shared_ptr<DB::IAST>&, DB::NamesAndTypesList const&, std::shared_ptr<DB::IStorage const>, std::shared_ptr<DB::StorageSnapshot> const&, bool, bool, bool, bool) const @ 0x13d04b78 in ....................................................................................................../usr/bin/clickhouse
4. DB::IndexDescription::getIndexFromAST(std::shared_ptr<DB::IAST> const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x140d8eb4 in ....................................................................................................../usr/bin/clickhouse
5. DB::AlterCommands::apply(DB::StorageInMemoryMetadata&, std::shared_ptr<DB::Context const>) const @ 0x14093f00 in ....................................................................................................../usr/bin/clickhouse
6. DB::MergeTreeData::checkAlterIsPossible(DB::AlterCommands const&, std::shared_ptr<DB::Context const>) const @ 0x146fcf3a in ....................................................................................................../usr/bin/clickhouse
7. DB::InterpreterAlterQuery::executeToTable(DB::ASTAlterQuery const&) @ 0x138b51c4 in ....................................................................................................../usr/bin/clickhouse
8. DB::InterpreterAlterQuery::execute() @ 0x138b2570 in ....................................................................................................../usr/bin/clickhouse
9. ? @ 0x13da444b in ....................................................................................................../usr/bin/clickhouse
10. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x13da170d in ....................................................................................................../usr/bin/clickhouse
11. DB::TCPHandler::runImpl() @ 0x14bb32c4 in ....................................................................................................../usr/bin/clickhouse
12. DB::TCPHandler::run() @ 0x14bc9699 in ....................................................................................................../usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() @ 0x17b1ec54 in ....................................................................................................../usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() @ 0x17b1fe7b in ....................................................................................................../usr/bin/clickhouse
15. Poco::PooledThread::run() @ 0x17ca8e27 in ....................................................................................................../usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) @ 0x17ca689d in ....................................................................................................../usr/bin/clickhouse
17. ? @ 0x7f3ff0308609 in ?
18. __clone @ 0x7f3ff022d353 in ?

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

@cmanallen cmanallen merged commit 5bdae7f into master Sep 18, 2024
30 checks passed
@cmanallen cmanallen deleted the cmanallen/replays-add-indexed-tags-column branch September 18, 2024 14:09
cmanallen added a commit that referenced this pull request Sep 19, 2024
Exposes _tags_hash_map column. Follow the prior art established by the
errors table.

Related issue: getsentry/sentry#76289
Migration PR: #6308
cmanallen added a commit to getsentry/sentry that referenced this pull request Sep 20, 2024
Tags search is expensive from a bytes and rows scanned perspective. This
optimization makes it so that single tag searches can be run quickly and
efficiently.

Depends on: getsentry/snuba#6308 and
getsentry/snuba#6309
Related: #76289
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants