v1.16: Shares accounts hash cache data between full and incremental (backport of #33164) #33212
Conversation
```
(cherry picked from commit 6298c6c)

# Conflicts:
#	runtime/src/accounts_db.rs
#	runtime/src/cache_hash_data.rs
```
Branch updated: 790f82f to 946fb0e, then to d542486.
```diff
@@ -2370,7 +2368,7 @@ impl AccountsDb {
             (accounts_hash_cache_path, None)
         } else {
             let temp_dir = TempDir::new().expect("new tempdir");
-            let cache_path = temp_dir.path().join(Self::DEFAULT_ACCOUNTS_HASH_CACHE_DIR);
+            let cache_path = temp_dir.path().to_path_buf();
```
This temp dir is only used by the accounts hash cache. By using it directly, we don't need to create a subdir within it.
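To make that ownership point concrete, here is a small standalone sketch. It is not the accounts-db code itself; the `tempfile` crate usage and the `main` wrapper are assumptions added purely for illustration.

```rust
use tempfile::TempDir;

fn main() {
    // Sketch only: this temp dir exists solely to hold accounts hash cache
    // files, so its root can serve as the cache directory directly -- no
    // extra subdirectory needs to be created inside it.
    let temp_dir = TempDir::new().expect("new tempdir");
    let cache_path = temp_dir.path().to_path_buf();
    println!("accounts hash cache at {}", cache_path.display());
    // `temp_dir` must be kept alive for as long as `cache_path` is in use:
    // dropping the TempDir deletes the directory and everything in it.
}
```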
Codecov Report
```
@@            Coverage Diff            @@
##            v1.16   #33212     +/-   ##
=========================================
- Coverage    81.9%    81.9%    -0.1%
=========================================
  Files         762      762
  Lines      207822   207824       +2
=========================================
  Hits       170355   170355
- Misses      37467    37469       +2
```
@HaoranYi I had to resolve merge conflicts in this backport. Would you be able to review this one again?
lgtm
@t-nelson Requesting your review since you've been fielding some of the validator questions/concerns on this. This will help clean up some files on disk for validators. The subsequent PRs #33181 and #33183 will also make their way to v1.16 and will remove all redundant/slightly-confusing files/directories.
@CriesofCarrots Requesting your review for 🦅 👀
this will orphan any old dirs lying around, no? we should hold this until a pr that cleans them up is ready if so
```rust
    pre_existing_cache_files: Arc<Mutex<HashSet<PathBuf>>>,
    should_delete_old_cache_files_on_drop: bool,
```
the inconsistent terminology here isn't great "old" vs. "pre-existing"
> the inconsistent terminology here isn't great "old" vs. "pre-existing"
Yes, it's non-intuitive.

When an instance of `CacheHashData` is created, we call `get_cache_files()` on it. That method populates `pre_existing_cache_files` with all the cache files it finds on disk. As this instance gets used, it is queried for cache files via `get_file_reference_to_map_later()`. As part of that function, we remove the path from `pre_existing_cache_files`.

So, on drop, the remaining paths left in `pre_existing_cache_files` are all the ones that have not been used. This indicates they are now old, since they were not part of the accounts hash calculation, and can be deleted.

So the tl;dr is that on drop, "pre-existing" means "old", which is why both the method (`delete_old_cache_files()`) and the flag (`should_delete_old_cache_files_on_drop`) have "old" in the name.

We could tweak the names to reflect this, or add documentation, or use a second HashSet/container for the pre-existing vs. the old files. I would imagine that those changes should go to master first.
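For reference, a minimal sketch of that lifecycle is below. The struct fields match the snippet quoted earlier, but the method bodies are illustrative assumptions, not the actual implementation in runtime/src/cache_hash_data.rs (which tracks more state and actually maps the files):

```rust
// Illustrative sketch of the pre-existing/old cache file tracking; not the real code.
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

struct CacheHashData {
    cache_dir: PathBuf,
    pre_existing_cache_files: Arc<Mutex<HashSet<PathBuf>>>,
    should_delete_old_cache_files_on_drop: bool,
}

impl CacheHashData {
    fn new(cache_dir: PathBuf, should_delete_old_cache_files_on_drop: bool) -> Self {
        let cache = Self {
            cache_dir,
            pre_existing_cache_files: Arc::new(Mutex::new(HashSet::new())),
            should_delete_old_cache_files_on_drop,
        };
        cache.get_cache_files(); // record everything already on disk
        cache
    }

    // Populate `pre_existing_cache_files` with all cache files found on disk.
    fn get_cache_files(&self) {
        let mut pre_existing = self.pre_existing_cache_files.lock().unwrap();
        if let Ok(entries) = fs::read_dir(&self.cache_dir) {
            for entry in entries.flatten() {
                pre_existing.insert(PathBuf::from(entry.file_name()));
            }
        }
    }

    // When a cache file is requested for this calculation, it is no longer
    // "pre-existing": remove it from the set so it survives the drop below.
    fn get_file_reference_to_map_later(&self, file_name: &Path) {
        self.pre_existing_cache_files
            .lock()
            .unwrap()
            .remove(file_name);
        // ...map the file and hand back a reference (elided)...
    }
}

impl Drop for CacheHashData {
    fn drop(&mut self) {
        if self.should_delete_old_cache_files_on_drop {
            // Anything still in the set was never requested during this
            // accounts hash calculation, i.e. it is now "old", so delete it.
            let old_files = self.pre_existing_cache_files.lock().unwrap();
            for file_name in old_files.iter() {
                let _ = fs::remove_file(self.cache_dir.join(file_name));
            }
        }
    }
}
```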
That v1.16 backport PR is here: #33224. We need to merge this one first, otherwise we'll purge the in-use directory on every restart.

Edit: Ideally we land all these backport PRs in the same release, and then there will be no issue.
@brooksprumo, can you make the actual case for backporting? i.e. why is it important these changes go in v1.16 instead of waiting for v1.17?
Once we enable Incremental Accounts Hash, we'll start doing a different accounts hash for incremental snapshots. This will greatly reduce the amount of work each incremental snapshot needs to do for its accompanying (now-incremental) accounts hash. As part of that, the accounts hash cache files, which are the input to the calculation, are somewhat duplicated. IAH uses a smaller range of slots, so the amount of duplication is smaller. But the two main issues are:

(1) The cache files from IAH are not reused with a normal, full accounts hash. This negatively impacts full accounts hash calculations, as it'll need to redo the work to create the input cache files that IAH already did. Since IAH is not enabled on any cluster, we'll see a perf degradation for full accounts hash calculations once IAH is turned on.

(2) Since the full accounts hash calculation will have to redo the same work as IAH, we'll see an unnecessary increase in disk usage. The amount of extra disk usage should be small (I'm guessing single-digit GBs), but why waste the space unnecessarily?

I don't expect the perf degradation from (1) to be large either, and since the accounts hash calculation is in the background, it likely will not impact much. Both issues (1) and (2) get worse as (a) the number of accounts grows, and (b) the snapshot intervals grow. (b) is unlikely to change, but we will continue to see the number of accounts grow.

Edit: So it would be advantageous to get these fixes in before IAH is turned on, so no issues are even experienced. Additionally, correctness is not an issue, just some cache and its subsequent performance implications. All of this is resolved by changing paths, which feels like a low-risk resolution to me.
@brooksprumo, thanks for the detailed reply! This feels like a bug fix for a v1.16-timeline feature, so I am on board with the backport.
yes this. please be sure that #releng is aware |
This is an automatic backport of pull request #33164 done by Mergify.
Cherry-pick of 6298c6c has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Mergify commands and options
More conditions and actions can be found in the documentation.
You can also trigger Mergify actions by commenting on this pull request:
- `@Mergifyio refresh` will re-evaluate the rules
- `@Mergifyio rebase` will rebase this PR on its base branch
- `@Mergifyio update` will merge the base branch into this PR
- `@Mergifyio backport <destination>` will backport this PR on the `<destination>` branch

Additionally, on the Mergify dashboard you can:
Finally, you can contact us on https://mergify.com