BTreeIndex Split Flush method, use bigger resolution #3182

kunga · 2024-03-26T18:13:31Z

Changelog entry

...

Changelog category

Not for changelog (changelog entry is not required)

Additional information

BTreeIndex Split Flush method, use bigger resolution

Before: https://nda.ya.ru/t/G8iQ0UAh759rCW

After: https://nda.ya.ru/t/G8iQ0UAh759rCW

github-actions · 2024-03-26T18:16:58Z

⚪ 2024-03-26 18:16:58 UTC Pre-commit check for 07a3ac8 has started.
⚪ 2024-03-26 18:17:00 UTC Build linux-x86_64-release-clang14 is running...
⚫ 2024-03-26 18:27:57 UTC Check cancelled

github-actions · 2024-03-26T18:17:08Z

⚪ 2024-03-26 18:17:08 UTC Pre-commit check for 07a3ac8 has started.
⚪ 2024-03-26 18:17:11 UTC Build linux-x86_64-relwithdebinfo is running...
⚫ 2024-03-26 18:27:57 UTC Check cancelled

github-actions · 2024-03-26T18:17:11Z

⚪ 2024-03-26 18:17:11 UTC Pre-commit check for 07a3ac8 has started.
⚪ 2024-03-26 18:17:13 UTC Build linux-x86_64-release-asan is running...
⚫ 2024-03-26 18:27:57 UTC Check cancelled

github-actions · 2024-03-26T18:32:33Z

⚪ 2024-03-26 18:32:33 UTC Pre-commit check for e2eabd4 has started.
⚪ 2024-03-26 18:32:35 UTC Build linux-x86_64-relwithdebinfo is running...
🔴 2024-03-26 18:44:34 UTC Build failed. see the build logs.
🔴 2024-03-26 18:46:07 UTC Tests run skipped.

github-actions · 2024-03-26T18:32:54Z

⚪ 2024-03-26 18:32:54 UTC Pre-commit check for e2eabd4 has started.
⚪ 2024-03-26 18:32:57 UTC Build linux-x86_64-release-asan is running...
🔴 2024-03-26 18:46:00 UTC Build failed. see the build logs.
🔴 2024-03-26 18:47:31 UTC Tests run skipped.

github-actions · 2024-03-26T18:51:46Z

⚪ 2024-03-26 18:51:46 UTC Pre-commit check for 1ad04e5 has started.
⚪ 2024-03-26 18:51:49 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-03-26 18:54:32 UTC Build successful.

github-actions · 2024-03-26T18:52:40Z

⚪ 2024-03-26 18:52:40 UTC Pre-commit check for 1ad04e5 has started.
⚪ 2024-03-26 18:52:42 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-26 18:54:56 UTC Build successful.
⚪ 2024-03-26 18:56:42 UTC Tests are running...
🔴 2024-03-26 20:04:42 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
10162	10055	0	3	30	74

github-actions · 2024-03-26T18:54:16Z

⚪ 2024-03-26 18:54:15 UTC Pre-commit check for 1ad04e5 has started.
⚪ 2024-03-26 18:54:18 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-26 18:56:55 UTC Build successful.
⚪ 2024-03-26 18:58:45 UTC Tests are running...
🔴 2024-03-26 20:26:15 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
9016	8827	0	23	139	27

github-actions · 2024-03-26T20:37:38Z

⚪ 2024-03-26 20:37:38 UTC Pre-commit check for f136d65 has started.
⚪ 2024-03-26 20:37:40 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-03-26 20:48:20 UTC Build successful.

github-actions · 2024-03-26T20:37:40Z

⚪ 2024-03-26 20:37:40 UTC Pre-commit check for f136d65 has started.
⚪ 2024-03-26 20:37:43 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-26 20:48:58 UTC Build successful.
⚪ 2024-03-26 20:50:47 UTC Tests are running...
🔴 2024-03-26 22:21:02 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
9005	8814	0	32	136	23

github-actions · 2024-03-26T20:37:52Z

⚪ 2024-03-26 20:37:51 UTC Pre-commit check for f136d65 has started.
⚪ 2024-03-26 20:37:53 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-26 20:48:12 UTC Build successful.
⚪ 2024-03-26 20:49:58 UTC Tests are running...
🟢 2024-03-26 21:53:12 UTC Tests successful.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
10057	10007	0	0	30	20

snaury · 2024-03-28T15:05:48Z

ydb/core/tablet_flat/benchmark/b_part.cpp

@@ -226,7 +226,7 @@ BENCHMARK_DEFINE_F(TPartFixture, DoCharge)(benchmark::State& state) {
 BENCHMARK_DEFINE_F(TPartFixture, BuildStats)(benchmark::State& state) {
    for (auto _ : state) {
        TStats stats;
-        BuildStats(*Subset, stats, NDataShard::gDbStatsRowCountResolution, NDataShard::gDbStatsDataSizeResolution, &Env);
+        BuildStats(*Subset, stats, NDataShard::gDbStatsRowCountResolution, NDataShard::gDbStatsDataSizeResolution, NDataShard::gDbStatsResolutionMultiplier, &Env);


It is a bit odd, of course, that tablet_flat code uses variables from datashards. Maybe specify concrete values here; why pull them in from datashards?

snaury · 2024-03-28T15:12:34Z

ydb/core/tablet_flat/flat_page_conf.h

+        ui32 BTreeIndexNodeTargetSize = 7 * 1024;     /* 1 GB of (up to) 140B keys leads to 3-level B-Tree index */
+        ui32 BTreeIndexNodeKeysMin = 6;               /* 1 GB of 7KB keys leads to 6-level B-Tree index (node size - ~42KB) */
+        ui32 BTreeIndexNodeKeysMax = Max<ui32>();     /* for UTs */
+        ui32 BTreeIndexLeafDataSizeMax = 1024 * 1024; /* gDbStatsDataSizeResolution / gDbStatsResolutionMultiplier */


Honestly, I don't understand why. Why would we make index pages this small? Let's imagine I store uint64 key -> bytes data in a table, and the rows are 4MB each. Even a single data page would exceed this limit; as I understand it, 6 keys (uint64) would accumulate in the end and we would get this kind of tiny leaf page? What is the point of a page that small? We could just as well scan them at the leaf level.
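To make the 4MB-row case concrete, here is a minimal sketch of a leaf that closes once any limit is exceeded and it holds at least the minimum number of keys. It is a simplified model and not the actual writer code: `TLeafLimits` and `KeysPerLeaf` are hypothetical names, the defaults are copied from the quoted `flat_page_conf.h` lines, and the per-key cost in the index node is assumed to be just the key bytes with no per-child overhead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the leaf limits quoted in the diff.
// Field names mirror flat_page_conf.h only for readability.
struct TLeafLimits {
    uint32_t NodeTargetSize = 7 * 1024;      // target index node size, bytes
    uint32_t NodeKeysMin = 6;                // a node is never closed with fewer keys
    uint64_t LeafDataSizeMax = 1024 * 1024;  // data bytes covered by one leaf
};

// Counts how many keys a leaf collects before it is closed, assuming every
// data page has the same size and every key costs keyBytes in the index node.
uint32_t KeysPerLeaf(const TLeafLimits& limits, uint64_t dataPageBytes, uint32_t keyBytes) {
    if (dataPageBytes == 0 && keyBytes == 0) {
        return 0;  // nothing would ever hit a limit in this model
    }
    uint32_t keys = 0;
    uint64_t nodeBytes = 0;
    uint64_t dataBytes = 0;
    for (;;) {
        ++keys;
        nodeBytes += keyBytes;
        dataBytes += dataPageBytes;
        const bool overLimit =
            nodeBytes > limits.NodeTargetSize || dataBytes > limits.LeafDataSizeMax;
        if (overLimit && keys >= limits.NodeKeysMin) {
            return keys;
        }
    }
}
```

Under these assumptions, 4MB data pages with 8-byte keys give a leaf of 6 keys (about 48 bytes of keys), because the data size limit is already exceeded by the first page and only the minimum key count keeps the leaf open.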

snaury · 2024-03-28T15:31:11Z

ydb/core/tablet_flat/flat_page_btree_index_writer.h

+                Levels[levelIndex].GetKeysCount() > waitFullNodes * NodeKeysMax ||
+                CalcPageSize(Levels[levelIndex]) > waitFullNodes * NodeTargetSize || 
+                levelIndex == 0 && Levels[levelIndex].GetDataSize() > waitFullNodes * LeafDataSizeMax ||
+                levelIndex == 0 && Levels[levelIndex].GetRowCount() > waitFullNodes * LeafRowsCountMax;


It seems to me that for tables with uint64 key -> uint64 value, where hundreds of rows fit on one page (or even more, if the data page size has been increased), this condition will always produce index pages of 25-50 keys instead of a couple hundred. They will always be substantially smaller than 7KB, and that is still too small. I'm not sure how useful this is.

I think we shouldn't specially tailor the index to the statistics here; the statistics should be computed from what is already there. First very coarsely and with large uncertainty, but fast, and then gradually reducing the uncertainty (where it is really needed) until we reach the required resolution or number of keys.

The 10MB resolution (let alone 1MB!) is not sacred either. The datashard, on the contrary, lowers the resolution if it turns out there would be too many keys (I think the histogram has at most 100 keys, and for 2GB that is a 20MB resolution). At a very high level, I think we could merge the SSTs by index root, which would give N keys. But since we merge SSTs of different sizes and the keys don't line up perfectly, the possible spread of data sizes between keys can be large, and that is where we try to refine it by descending one level and choosing another, more suitable key. Since we need at most 100 keys in the end, in large SSTs we would most likely have to descend another couple of levels (if the keys are large rather than uint64) and that's it. In general, it is better to start not from the 10MB resolution (that is a hack derived from 2GB/100=20MB!) but from the number of keys we need. So we divide 2GB by 100 and get a desired resolution of 20MB, and if there is ±5-10MB between a pair of keys, that is already good enough. But if the shard is 10MB, we actually still want 100 keys; we just can't get them right now because of the hack.

Also, we most likely don't even want 100 keys. 10 would be enough for us. SchemeShard currently doesn't need anything beyond the median anyway.
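The arithmetic behind "start from the number of keys" can be sketched as follows. Both helpers are hypothetical and not part of this PR; the stopping rule (stop refining once the uncertainty between two adjacent keys is at most half of the wanted resolution) is an assumption chosen to match the "±5-10MB at 20MB is good enough" remark, and sizes are in decimal bytes to keep the numbers round.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the PR: derive the histogram resolution
// from the number of keys we want rather than from a fixed byte constant.
uint64_t ResolutionForKeyCount(uint64_t shardBytes, uint32_t wantedKeys) {
    return wantedKeys == 0 ? shardBytes : shardBytes / wantedKeys;
}

// Assumed stopping rule: a gap between two adjacent keys is "good enough"
// once its size uncertainty is at most half of the wanted resolution.
bool NeedsRefinement(uint64_t uncertaintyBytes, uint64_t resolutionBytes) {
    return uncertaintyBytes > resolutionBytes / 2;
}
```

With these definitions a 2GB shard and 100 keys give a 20MB resolution, while a 10MB shard and the same 100 keys give 100KB, which is the case a fixed 10MB constant cannot express.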

github-actions · 2024-03-28T19:07:12Z

⚪ 2024-03-28 19:07:12 UTC Pre-commit check for d3ae3fb has started.
⚪ 2024-03-28 19:07:14 UTC Build linux-x86_64-relwithdebinfo is running...
🔴 2024-03-28 19:21:13 UTC Build failed. see the build logs.
🔴 2024-03-28 19:22:42 UTC Tests run skipped.

github-actions · 2024-03-28T19:07:14Z

⚪ 2024-03-28 19:07:14 UTC Pre-commit check for d3ae3fb has started.
⚪ 2024-03-28 19:07:17 UTC Build linux-x86_64-release-clang14 is running...
🔴 2024-03-28 19:21:46 UTC Build failed. see the build logs.

github-actions · 2024-03-28T19:07:25Z

⚪ 2024-03-28 19:07:25 UTC Pre-commit check for d3ae3fb has started.
⚪ 2024-03-28 19:07:27 UTC Build linux-x86_64-release-asan is running...
🔴 2024-03-28 19:23:09 UTC Build failed. see the build logs.
🔴 2024-03-28 19:24:42 UTC Tests run skipped.

github-actions · 2024-03-28T19:45:53Z

⚪ 2024-03-28 19:45:52 UTC Pre-commit check for 20bae0a has started.
⚪ 2024-03-28 19:45:54 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-03-28 19:47:55 UTC Build successful.

github-actions · 2024-03-28T19:47:27Z

⚪ 2024-03-28 19:47:27 UTC Pre-commit check for 20bae0a has started.
⚪ 2024-03-28 19:47:29 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-28 19:49:30 UTC Build successful.
⚪ 2024-03-28 19:51:18 UTC Tests are running...
🔴 2024-03-28 20:06:35 UTC Test run completed, no test results found for commit 8628d09. Please check build logs.
⚫ 2024-03-28 20:06:38 UTC Check cancelled

github-actions · 2024-03-28T19:47:32Z

⚪ 2024-03-28 19:47:31 UTC Pre-commit check for 20bae0a has started.
⚪ 2024-03-28 19:47:34 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-28 19:49:21 UTC Build successful.
⚪ 2024-03-28 19:51:09 UTC Tests are running...
🔴 2024-03-28 20:06:35 UTC Test run completed, no test results found for commit 8628d09. Please check build logs.
⚫ 2024-03-28 20:06:38 UTC Check cancelled

github-actions · 2024-03-28T20:08:02Z

⚪ 2024-03-28 20:08:02 UTC Pre-commit check for 234c4c4 has started.
⚪ 2024-03-28 20:08:04 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-28 20:10:03 UTC Build successful.
⚪ 2024-03-28 20:11:43 UTC Tests are running...
🔴 2024-03-28 21:39:48 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
9083	8879	0	35	140	29

github-actions · 2024-03-28T20:10:07Z

⚪ 2024-03-28 20:10:07 UTC Pre-commit check for 234c4c4 has started.
⚪ 2024-03-28 20:10:09 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-03-28 20:11:36 UTC Build successful.

github-actions · 2024-03-28T20:10:10Z

⚪ 2024-03-28 20:10:09 UTC Pre-commit check for 234c4c4 has started.
⚪ 2024-03-28 20:10:12 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-28 20:11:41 UTC Build successful.
⚪ 2024-03-28 20:13:29 UTC Tests are running...
🔴 2024-03-28 21:18:56 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
10231	10157	0	2	46	26

github-actions · 2024-03-28T21:52:21Z

⚪ 2024-03-28 21:52:21 UTC Pre-commit check for c8c4cb1 has started.
⚪ 2024-03-28 21:52:23 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-28 21:54:06 UTC Build successful.
⚪ 2024-03-28 21:55:58 UTC Tests are running...
🔴 2024-03-28 23:22:11 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
9082	8843	0	38	172	29

github-actions · 2024-03-28T21:52:21Z

⚪ 2024-03-28 21:52:21 UTC Pre-commit check for c8c4cb1 has started.
⚪ 2024-03-28 21:52:23 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-03-28 21:53:59 UTC Build successful.

github-actions · 2024-03-28T21:52:37Z

⚪ 2024-03-28 21:52:37 UTC Pre-commit check for c8c4cb1 has started.
⚪ 2024-03-28 21:52:40 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-28 21:54:21 UTC Build successful.
⚪ 2024-03-28 21:56:07 UTC Tests are running...
🔴 2024-03-28 23:00:07 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
10231	10155	0	1	46	29

github-actions bot added the not-for-changelog label Mar 26, 2024

kunga self-assigned this Mar 26, 2024

kunga changed the title ~~BTreeIndex Keep leaf nodes small for stats~~ BTreeIndex Keep leaf nodes small for BuildStats Mar 26, 2024

kunga requested a review from snaury March 27, 2024 09:53

kunga changed the title ~~BTreeIndex Keep leaf nodes small for BuildStats~~ BTreeIndex Limit leaf nodes small enough for skipping them in BuildStats Mar 27, 2024

kunga mentioned this pull request Mar 27, 2024

👑 Readonly B-Tree SST index #1483

Open

61 tasks

snaury reviewed Mar 28, 2024

BTreeIndex Split Flush method, use bigger resolution

7219232

kunga force-pushed the btree-build-stats-leaves branch from f95c2ef to 7219232 Compare March 28, 2024 19:03

fix build

8628d09

don't divide

bab1f0b

fix tests

a83dddc

kunga changed the title ~~BTreeIndex Limit leaf nodes small enough for skipping them in BuildStats~~ BTreeIndex Split Flush method, use bigger resolution Mar 28, 2024

snaury approved these changes Mar 29, 2024

kunga merged commit 95416a7 into ydb-platform:main Mar 29, 2024
6 of 8 checks passed

This was referenced Apr 4, 2024

hotkeys crush on describe #3462

Merged

added QuotaStorage to tenantinfo response #3463

Merged

query stats full #3464

Merged

shnikd mentioned this pull request Apr 4, 2024

Support create sequence in pg parser #3465

Merged

StekPerepolnen mentioned this pull request Apr 4, 2024

hc api: updated possible problems #3482

Open

This was referenced Apr 6, 2024

Fix kafka read session verify #3522

Merged

Fix kafka read session partitions releases #3528

Merged

StekPerepolnen mentioned this pull request Apr 10, 2024

hc fallback when static group has unknown status #3620

Merged

shnikd mentioned this pull request Apr 11, 2024

Support create sequence #3662

Merged

This was referenced Apr 16, 2024

json_autocomplete error reports #3784

Merged

json_storage added pdiskid filter #4037

Merged
