Add local kmeans actor scan #8756

Merged
merged 12 commits into from
Sep 9, 2024

Conversation

MBkkt
Collaborator

@MBkkt MBkkt commented Sep 4, 2024

This PR adds a scan that runs k-means locally for a datashard with embeddings.

The scan has 3 phases:

  1. The first iteration collects a sample of initial clusters.
  2. The next N iterations recompute the clusters (the main loop of batched k-means).
  3. The last iteration uploads the clusters to the level table and the postings to the corresponding posting table.

Important note: this code doesn't contain a slow path (one that could store intermediate results in LocalDB).
In general, the slow path would run only if this code fails, and datashards are small enough that we don't need to handle this in the first implementation of the vector index. In other words, the granularity of work that has to be redone after a datashard restart is at most the datashard size.
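
The three phases above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: all function names, the `(key, embedding)` row layout, reservoir sampling for the initial sample, and squared Euclidean distance are assumptions made for the example. Each phase is written as a full pass over `rows`, which stands in for a scan of the datashard.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical row layout: (primary key, embedding vector).
Row = Tuple[int, Sequence[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (an assumed metric for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Sequence[float]) -> int:
    # Index of the centroid closest to v.
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def sample_clusters(rows: List[Row], k: int, rng: random.Random) -> List[List[float]]:
    # Phase 1: one pass that reservoir-samples up to k embeddings.
    sample: List[List[float]] = []
    for seen, (_, v) in enumerate(rows):
        if len(sample) < k:
            sample.append(list(v))
        else:
            j = rng.randint(0, seen)  # inclusive bounds
            if j < k:
                sample[j] = list(v)
    return sample


def recompute_clusters(rows: List[Row], centroids: List[List[float]]) -> List[List[float]]:
    # Phase 2 (one iteration): assign each row to its nearest centroid,
    # accumulate per-cluster sums, then replace centroids with the means.
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for _, v in rows:
        c = nearest(centroids, v)
        counts[c] += 1
        for d, x in enumerate(v):
            sums[c][d] += x
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else centroids[i]
        for i in range(len(centroids))
    ]


def upload(rows: List[Row], centroids: List[List[float]]):
    # Phase 3: produce "level" rows (cluster id -> centroid) and
    # "posting" rows (cluster id, row key). Here they are just returned.
    level: Dict[int, List[float]] = dict(enumerate(centroids))
    postings = [(nearest(centroids, v), key) for key, v in rows]
    return level, postings


def local_kmeans(rows: List[Row], k: int, iterations: int, seed: int = 0):
    centroids = sample_clusters(rows, k, random.Random(seed))
    if not centroids:  # nothing to cluster
        return {}, []
    for _ in range(iterations):
        centroids = recompute_clusters(rows, centroids)
    return upload(rows, centroids)
```

Calling `local_kmeans(rows, k=2, iterations=5)` on a list of rows returns the level mapping and the posting list. The sketch keeps everything in memory and restarts from scratch on failure, which mirrors the "no slow path" note above only in spirit.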

This comment was marked as outdated.


@MBkkt MBkkt force-pushed the mbkkt/kmeans-compute branch from aaf632b to ae2247d Compare September 6, 2024 05:08
@MBkkt MBkkt marked this pull request as ready for review September 6, 2024 05:09
@MBkkt MBkkt requested a review from azevaykin September 6, 2024 05:09


@MBkkt MBkkt self-assigned this Sep 6, 2024


@MBkkt MBkkt requested review from snaury and azevaykin September 6, 2024 14:26
@MBkkt MBkkt force-pushed the mbkkt/kmeans-compute branch from e2e6649 to 7225fcc Compare September 6, 2024 14:41
azevaykin previously approved these changes Sep 6, 2024


@MBkkt MBkkt requested a review from azevaykin September 9, 2024 09:53


@MBkkt MBkkt requested a review from snaury September 9, 2024 09:53
@MBkkt MBkkt requested a review from snaury September 9, 2024 10:42

github-actions bot commented Sep 9, 2024

2024-09-09 10:44:20 UTC Pre-commit check linux-x86_64-relwithdebinfo for 7a9d92c has started.
2024-09-09 10:47:45 UTC ya make is running...
🟡 2024-09-09 12:08:44 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 76454 | 62617 | 0 | 45 | 13768 | 24 |

2024-09-09 12:16:44 UTC ya make is running... (failed tests rerun, try 2)
🟡 2024-09-09 12:24:29 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 79 (only retried tests) | 56 | 0 | 6 | 0 | 17 |

2024-09-09 12:24:37 UTC ya make is running... (failed tests rerun, try 3)
🔴 2024-09-09 12:32:40 UTC Some tests failed, follow the links below.

Test history | Ya make output

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 28 (only retried tests) | 6 | 0 | 5 | 0 | 17 |

🟢 2024-09-09 12:32:46 UTC Build successful.
🔴 2024-09-09 12:33:29 UTC ydbd size 8.4 GiB changed* by +3.7 MiB, which is >= 2.0 MiB vs main: Alert

| ydbd size dash | main: b440d31 | merge: 7a9d92c | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 9 017 352 288 Bytes | 9 021 234 472 Bytes | +3.7 MiB | +0.043% |
| ydbd stripped size | 487 429 128 Bytes | 487 647 240 Bytes | +213.0 KiB | +0.045% |

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.


github-actions bot commented Sep 9, 2024

2024-09-09 10:45:14 UTC Pre-commit check linux-x86_64-release-asan for 7a9d92c has started.
2024-09-09 10:48:36 UTC ya make is running...
🔴 2024-09-09 12:38:31 UTC Some tests failed, follow the links below.

Test history | Ya make output

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 14144 | 13966 | 0 | 51 | 95 | 32 |

🟢 2024-09-09 12:39:51 UTC Build successful.
🔴 2024-09-09 12:40:27 UTC ydbd size 5.6 GiB changed* by +2.8 MiB, which is >= 2.0 MiB vs main: Alert

| ydbd size dash | main: b440d31 | merge: 7a9d92c | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 6 032 161 600 Bytes | 6 035 051 528 Bytes | +2.8 MiB | +0.048% |
| ydbd stripped size | 1 510 169 648 Bytes | 1 510 821 552 Bytes | +636.6 KiB | +0.043% |

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.


github-actions bot commented Sep 9, 2024

2024-09-09 10:45:17 UTC Pre-commit check linux-x86_64-release-clang14 for 7a9d92c has started.
2024-09-09 10:48:35 UTC ya make is running...
🟢 2024-09-09 11:02:16 UTC Build successful.

@MBkkt MBkkt mentioned this pull request Sep 9, 2024
17 tasks
@maximyurchuk maximyurchuk merged commit 0787510 into ydb-platform:main Sep 9, 2024
7 of 12 checks passed
@shnikd shnikd mentioned this pull request Sep 11, 2024