-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
Cool. I didn't review the code in-depth yet, but from the description and scanning it, it seems like a really good direction. |
52ef243
to
4f8fa22
Compare
Codecov Report
@@ Coverage Diff @@
## master #9527 +/- ##
========================================
+ Coverage 80.4% 80.5% +0.1%
========================================
Files 283 283
Lines 65868 66382 +514
========================================
+ Hits 52995 53498 +503
- Misses 12873 12884 +11 |
7ff9f55
to
18d14c0
Compare
assert_eq!( | ||
range, | ||
Pubkey::new_from_array([ | ||
0x02, 0x82, 0x5a, 0x89, 0xd1, 0xac, 0x58, 0x9c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my math is correct:
[5] pry(main)> step = (0xffff_ffff_ffff_ffff / 12345)
=> 1494268454735484
[10] pry(main)> "%016x" % (step * 121)
=> "02825a89d1ac589c"
0x00, 0x00, 0x00, 0x00 | ||
]) | ||
..=Pubkey::new_from_array([ | ||
0x15, 0x3c, 0x1d, 0xf1, 0xc6, 0x39, 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[9] pry(main)> "%016x" % ((step * 1024) - 1)
=> "153c1df1c639efff"
[12] pry(main)> 1023 - 120 # as index (in test)
=> 903
[13] pry(main)> 1024 - 121 # as raw multiple (in this pry)
=> 903
29c10f6
to
4cfe6a7
Compare
19efeaa
to
4fa5e4e
Compare
I couldn't successfully launch a validator to devnet patched with #10099.... Too large for now. Anyway, devnet will slowly reduce its snapshot size. |
@ryoqun - that store count graph looks pretty, can you add it to the main dashboard? |
@mvines Sure! :) |
@jstarry Thanks for reviewing!
Well, this isn't case. Hint: parallelization. :)
Nice point! Yeah, I didn't explicitly mention about that attack scenario but subsequent leaders are free to parallelize the eager collection as much as they could. Though, I haven't implemented it. ;) EDIT: Strictly speaking, it's a bit hard to scan subrange of BTreeMap in parallel. But, grouped (this happens when previous leaders couldn't handle previous eager rent collections) subrange scan of b-trees in general should be quite fast. :)
Well, such activity is easily observable because alive account pubkeys are always public information. However monitoring for such thing isn't implemented... |
Oops, I just noticed AccountsDB::frozen_accounts would trigger panic with eager rent collection... Need to add an exception for that. FYI: @mvines |
Yikes, ok. Thanks for the heads up. This'll block upgrading mainnet-beta to 1.1 at the end of the month |
Ah 😄
Under the new design, keys at the end of the range will be slightly more valuable since they will be valid for (currently) 2 more days than those at the beginning of the range. So I think system load will skew towards the end of the epoch. For instance, a simple client can always generate two accounts and choose the one towards the end of the range. Could we add a deterministic offset to avoid this? EDIT: The above comment assumes that lazy rent collection is eventually removed |
Yeah, this concern is real.
Well, lazy rent collection won't be removed due to the very reason you just mentioned. Then, problem solved. :) So, even after the eager rent collection is fully enabled, we always collect rent when accessed for the given epoch. |
Cool, agreed! So |
@jstarry Not necessarily. We could remove it because the slot at which the old state of any given account was updated is computable easily. Otherwise, it would mean we don't have deterministic set of all alive accounts at any moment... ;) Moreover, for program reentrant thing, it could be morphed into more granular |
So it means we could infer whether we already collected the rent or not for the current epoch easily. |
@mvines I haven't tested but I'm sure my code reading is right: When frozen account is eagerly collected (this always happens regardless of rent-exempt or not to avoid #8931), we store the account via Bank much like other usual updates, so it changes the account hash due to the incremented solana/runtime/src/accounts_db.rs Line 1934 in 5e43304
We could add a special condition there to white-list eager rent collection update. Otherwise it means there is a bug where we don't collect all of accounts or frozen account doesn't catch account updates. ;) |
solana/runtime/src/accounts_db.rs Lines 1380 to 1393 in 5e43304
|
@mvines Lucky! Then, we're ok. I just didn't notice there is different account hashing for frozen account and you even have a test for solana/runtime/src/accounts_db.rs Line 3376 in 5e43304
Just cool. :) |
Status update (devnet): eager rent collection is working fine. |
🎊 🎊 🎊 |
Finally, this will be enabled on the mainnet-beta on epoch 34 in 18 hours. I'm planning to watch the moment, personally. :) |
Finally, this went alive on mainnet-beta!!! Also it seems that we managed to avoid #10206, so as for the master branch & mainet-beta compatibility #10163 , this should be resolved after exactly this single epoch 34, otherwise it's bug.. ;)
|
Problem
rent-due (!= rent-exempt) accounts are never garbage-collected unless updated (i.e. lazy collection), which accumulates over time and can be a DoS attack vector.
Also, the current rent collection mechanism has some gaming factor due to being lazy. (noted below)
Even more, there is some UX issue for the lazy rent collection.
Summary of Changes
not implemented yetimplemented) for each slot progress, while updating all accounts.AccountIndex
was switched fromHashMap
toBTreeMap
for range queries to realize thatAppendVec
is now bound to the number of slots of given epoch, combined with the recently-introduced per-slot shrinking Introduce background stale AppendVec shrink mechanism #9219.AccountsIndex.roots
.account.rent_epoch
as a bonus. (ACCOUNT_STORAGE_OVERHEAD
: 128 -> 120, -6.25% for IO bandwidth)if
guard is needed.minor problem of current rent colletion
Also I found that the lazy rent collection is problematic because delayed rent collection doesn't account for past rent fees were different or not.
That means the current calculation is based on the latest rent fee multiple of epochs. it's not the integral of past actual (dynamic) rent fees over epochs.
This means it introduces some gaming much like recently removed
redeem-vote-credits
(#7916):A validator or an account owner might send 0 lamports to rent-due accounts to collect/save rent when the at-the-time dynamic rent is more profitable.
Although, the dynamic rent isn't still implemented...
(cons) new dos vector
An attacker can send lamports to arbitrary address if prepared to burn them so we're vulnerable to a new attack vector of targeted DoS (send so many pubkeys under some subrange corresponding to victim leader's slot), given publicly available deterministic next epoch's leader schedule.
But, I think that's tolerable given that there is already easier attack with similar threat profile: just flood the victim leader by burst TXes for some specific slot.
Still, If forced to mitigate, I think cloning the whole
Pubkey
set is required and then.iter().chunk(.len() / SLOT_COUNT)
-ing on it.Alternatives:
account_index
and collect rent when new epoch beginsclone()
ofaccount_index.keys()
at each epoch boundary and process.chunk()
of it for each slot like this PR.BTreeMap
.AppendVec
for several slots showed performance degradation in the past. Also, this solution still doesn't solve stale rent-due accounts and gaming problem.HashSet<Pubkey>
for rent-due accounts, separately:(Also, I've lightly researched the rent mechanism of other blockchain projects and found there is no better solution. ;)
todo (once the direction is confirmed)
[ ] update docs for rents(let's do with another future pr, this pr is already big). Update docs for eager rent collection #10348BTreeMap
overHashMap
(doing right now)Fixes #9383