Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

rework accounts hash calc dedup to avoid kernel file error #33195

Merged
merged 5 commits into from
Sep 11, 2023

Conversation

jeffwashington
Copy link
Contributor

@jeffwashington jeffwashington commented Sep 8, 2023

Problem

#33174

Summary of Changes

use mmap file instead of BufWriter. The temp file is mmapped for reading anyway.
This avoids the BufWriter api completely.

Fixes #

@jeffwashington jeffwashington added the v1.16 PRs that should be backported to v1.16 label Sep 8, 2023
@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Merging #33195 (34e37e2) into master (dc6b1eb) will increase coverage by 0.0%.
Report is 3 commits behind head on master.
The diff coverage is 91.8%.

@@           Coverage Diff           @@
##           master   #33195   +/-   ##
=======================================
  Coverage    82.1%    82.1%           
=======================================
  Files         786      786           
  Lines      211596   211608   +12     
=======================================
+ Hits       173777   173843   +66     
+ Misses      37819    37765   -54     

@jeffwashington jeffwashington force-pushed the redo_dedup_file branch 3 times, most recently from 150471e to 8103560 Compare September 8, 2023 20:38
@jeffwashington
Copy link
Contributor Author

Perf is about .5s slower in calculating hash.
This is background process, so the difference is not important.
image
The startup hash calculation actually looks to be faster by around 2-3s.

@jeffwashington jeffwashington marked this pull request as ready for review September 11, 2023 13:21
Comment on lines +107 to +115
let map = unsafe { MmapMut::map_mut(&data) };
let map = map.unwrap_or_else(|e| {
error!(
"Failed to map the data file (size: {}): {}.\n
Please increase sysctl vm.max_map_count or equivalent for your platform.",
self.capacity, e
);
std::process::exit(1);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that in append_vec.rs if creating the Mmap fails we'll call exit(), but that feels strange to me. Seems like a panic or expect would work here too. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks to be copied from append vec, where stephen wrote this 3 years ago.
Some thoughts:
error messages get logged immediately.
Panics get delayed sometimes and eaten until a thread is joined (IIRC).
Either way it is an error condition. I'm not sure what is best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, likely won't matter much, since if we cannot create a new mmap file then it's likely the whole system is borked.

I tend to think a library function shouldn't call exit, but instead assert its invariants. We also get a backtrace with panics, but I doubt it would contain useful information. I'd lean towards doing an expect with a message like "create accounts hashes mmap file".

Not a dealbreaker for me, so I defer to you on this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12213
Here is the source of the copied code.
@sakridge what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some inconsistency.

Here's an info I added a few years ago: #17727

Sometimes we just unwrap the mmap.

Ok(unsafe { MmapMut::map_mut(file).unwrap() })

accounts-db/src/accounts_hash.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brooksprumo
Copy link
Contributor

brooksprumo commented Sep 11, 2023

Can the title of this PR be updated before merging? (Mainly to remove the 'wip')

@jeffwashington jeffwashington changed the title wip: rework accounts hash calc dedup to avoid kernel file error rework accounts hash calc dedup to avoid kernel file error Sep 11, 2023
@jeffwashington jeffwashington merged commit 4dfe62a into solana-labs:master Sep 11, 2023
mergify bot pushed a commit that referenced this pull request Sep 11, 2023
* in hash calc, calculate max_inclusive_num_pubkeys

* in hash calc, dedup uses mmap files to avoid os panic

* as_mut_ptr

* remove unsafe code

* refactor count in hash files

---------

Co-authored-by: HaoranYi <haoran.yi@solana.com>
(cherry picked from commit 4dfe62a)

# Conflicts:
#	runtime/src/accounts_hash.rs
jeffwashington added a commit that referenced this pull request Sep 18, 2023
…ckport of #33195) (#33204)

* rework accounts hash calc dedup to avoid kernel file error (#33195)

* in hash calc, calculate max_inclusive_num_pubkeys

* in hash calc, dedup uses mmap files to avoid os panic

* as_mut_ptr

* remove unsafe code

* refactor count in hash files

---------

Co-authored-by: HaoranYi <haoran.yi@solana.com>
(cherry picked from commit 4dfe62a)

# Conflicts:
#	runtime/src/accounts_hash.rs

* fix merge conflicts

---------

Co-authored-by: Jeff Washington (jwash) <jeff.washington@solana.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants