Precise per-component ETH-denominated rewards tracking in ncli_db validatorDb (#3107)
```diff
@@ -120,16 +120,36 @@ func initiate_validator_exit*(cfg: RuntimeConfig, state: var ForkyBeaconState,
   validator.withdrawable_epoch =
     validator.exit_epoch + cfg.MIN_VALIDATOR_WITHDRAWABILITY_DELAY

+# https://github.com/ethereum/consensus-specs/blob/v1.1.5/specs/phase0/beacon-chain.md#slash_validator
+# https://github.com/ethereum/consensus-specs/blob/v1.1.5/specs/altair/beacon-chain.md#modified-slash_validator
+# https://github.com/ethereum/consensus-specs/blob/v1.1.4/specs/merge/beacon-chain.md#modified-slash_validator
+# Extracted from `slash_validator` in order to reuse it in ncli_db for detailed rewards tracking
+proc get_slashing_penalty*(
```
Reviewer: this could simply use overloading, which is what most of the state transition uses - it's easier to recognise, generally.

Author: The code this was extracted from was not using overloading. In general, I disagree that overloading is superior to a series of `when` branches.

Reviewer: my point was mostly about conventions, and how we can make this new function stand out less - the conventions guide the reader with less effort than a custom-written `when` jungle that breaks the convention (for example because it's easier for tooling to understand). In this particular case it's trivial either way - in the larger functions, making sense of a `when` jungle with the extra indentation it brings is difficult, so overloading was chosen as the dominant strategy, and it has worked pretty well so far as far as I can tell, including when deciphering error messages. The error messages are perhaps more relevant when you've created complex and deep type machinery with many layers of generics and a significant risk of the wrong functions being called accidentally, as happens with nim-serialization - this code however is as simple as it gets, and it pays off, error messages included. These style questions are a bit like colors: everyone has a favorite, but mixing them all in one place rarely ends up well (unless you like brown).

Author: If the convention is wrong, let's fix the convention. Also, the convention you are describing is hardly followed in the spec code. After all, I've just copy-pasted the exact body of this function, which was inlined in another spec function.

Reviewer: well, yes, you copied an exception - I mentioned in the review how it can be brought in line with the rest, but also noted it's a minor point. If you want to change the convention, the way to go about it would be a separate PR that changes the convention across the board - I suspect it will be ugly, mainly because it's not enforced by a language rule but rather by the whims of whoever writes the code, which is hard to maintain at a good level. Anyway.

Author: I've come up with a few more arguments against unnecessary use of overloading in general. My primary way of navigating the project is through the "Jump to function name" functionality provided by Vim's CTRL+P plugin and its ctags integration: you get to jump directly to function names. Needless to say, this workflow works much better when no overloading is present. Your assertion that the error messages are just fine also doesn't match my experience: during the development of this feature I made some trivial typos and had to face some pretty bad error messages due to the heavy overloading involved. In general, if we follow the philosophy that the least powerful tool should be used to solve any particular problem, I think there is a clear gradient of power/complexity between the available approaches.

Author: Hrm, GitHub doesn't seem to like uploading gifs - here is the Ctrl+P animation I wanted to share.

Reviewer: well, with a `when` jungle, you're jumping to the beginning of a potentially long tree of complex statements that you manually have to disambiguate - first you have to remember what type you were jumping from (which likely was deduced as well), then find the right `when` clause by executing the type comparison in your head, and so on; there is no hope that a tool will ever understand this. With overloading, a tool like nimsuggest can understand that it's supposed to make a contextual jump and gets you to the right spot without the extra processing that you, the developer, have to do. `when` statements are free-form - they provide no constraining structure that you can learn; at best you can write a style guide with conventions, but that's generally not good enough for tooling or for reading by experience. For anything larger than a few lines of code, the extra level of indent also makes a significant dent in readability - you don't know where the phase0 logic ends and the altair logic begins. Soon you end up with a fairly convoluted structure: a manually written `when` jungle that forwards calls to an actual implementation with non-overloaded names. All that said, more important than the particular style is the fact that having two or more styles for this problem brings out all the negatives of either approach and none of the benefits - you end up having to deal with both styles and nobody is happy. This is what my comment primarily was about: most of the spec uses overloading to solve this problem, and this new function is an opportunity to align the code with that convention simply based on a majority decision. Whether the convention should be changed is a completely different argument, but no matter your opinion on that subject, I think we can agree that this PR is not the time and place for making that change.

Reviewer: The function previously used a `when` jungle because it had a "shared" section - the new function has no shared code, and this difference gives the overload argument more weight. If we really wanted to make this piece of code clean, we'd simply select the constant based on type using a much smaller helper function, then make a generic function with the division logic. Ah, the joy of arguing code style philosophy over morning ☕ :)

Reviewer: very ambitious - however, I'd recommend picking a language that has overloading - free choice makes for a much more compelling argument 😛
```diff
+    StateType: type ForkyBeaconState, validator: Validator): Gwei =
+  when StateType is phase0.BeaconState:
+    validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT
+  elif StateType is altair.BeaconState:
+    validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT_ALTAIR
+  elif StateType is merge.BeaconState:
+    validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT_MERGE
+  else:
+    {.fatal: "Implement this for new forks".}
```
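To make the fork dispatch above concrete, here is a small Python sketch (not the Nim source) of the same penalty selection. The quotient values are the ones named by the consensus-spec constants referenced in the comments (phase0: 128, Altair: 64, merge: 32); the dictionary layout and function name are ours.

```python
# Fork -> MIN_SLASHING_PENALTY_QUOTIENT value, per the linked consensus specs.
MIN_SLASHING_PENALTY_QUOTIENT = {
    "phase0": 128,  # MIN_SLASHING_PENALTY_QUOTIENT
    "altair": 64,   # MIN_SLASHING_PENALTY_QUOTIENT_ALTAIR
    "merge": 32,    # MIN_SLASHING_PENALTY_QUOTIENT_MERGE
}

def get_slashing_penalty(fork: str, effective_balance_gwei: int) -> int:
    """Initial slashing penalty: effective balance divided by the fork's quotient."""
    if fork not in MIN_SLASHING_PENALTY_QUOTIENT:
        # Mirrors the {.fatal.} branch: unknown forks must be handled explicitly.
        raise NotImplementedError(f"Implement this for new fork: {fork}")
    return effective_balance_gwei // MIN_SLASHING_PENALTY_QUOTIENT[fork]

# A 32 ETH (32_000_000_000 Gwei) validator slashed under each fork:
print(get_slashing_penalty("phase0", 32_000_000_000))  # 250000000
print(get_slashing_penalty("altair", 32_000_000_000))  # 500000000
print(get_slashing_penalty("merge", 32_000_000_000))   # 1000000000
```

Note how each fork upgrade halves the quotient, doubling the immediate penalty.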
```diff
 # https://github.com/ethereum/consensus-specs/blob/v1.1.5/specs/phase0/beacon-chain.md#slash_validator
 # https://github.com/ethereum/consensus-specs/blob/v1.1.5/specs/altair/beacon-chain.md#modified-slash_validator
 # https://github.com/ethereum/consensus-specs/blob/v1.1.4/specs/merge/beacon-chain.md#modified-slash_validator
 proc slash_validator*(
-    cfg: RuntimeConfig, state: var ForkyBeaconState,
-    slashed_index: ValidatorIndex, cache: var StateCache) =
+    cfg: RuntimeConfig,
+    state: var ForkyBeaconState,
+    slashed_index: ValidatorIndex,
+    cache: var StateCache,
+    outSlotRewards: SlotRewards | type DontTrackRewards = DontTrackRewards) =
   ## Slash the validator with index ``index``.
   let epoch = get_current_epoch(state)
   initiate_validator_exit(cfg, state, slashed_index, cache)
-  let validator = addr state.validators[slashed_index]
+
+  template validator: untyped =
+    state.validators[slashed_index]

   trace "slash_validator: ejecting validator via slashing (validator_leaving)",
     index = slashed_index,
```
```diff
@@ -146,19 +166,10 @@ proc slash_validator*(
   state.slashings[int(epoch mod EPOCHS_PER_SLASHINGS_VECTOR)] +=
     validator.effective_balance

-  # TODO Consider whether this is better than splitting the functions apart; in
-  # each case, tradeoffs. Here, it's just changing a couple of constants.
-  when state is phase0.BeaconState:
-    decrease_balance(state, slashed_index,
-      validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT)
-  elif state is altair.BeaconState:
-    decrease_balance(state, slashed_index,
-      validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT_ALTAIR)
-  elif state is merge.BeaconState:
-    decrease_balance(state, slashed_index,
-      validator.effective_balance div MIN_SLASHING_PENALTY_QUOTIENT_MERGE)
-  else:
-    raiseAssert "invalid BeaconState type"
+  let slashing_penalty = get_slashing_penalty(typeof(state), validator)
+  decrease_balance(state, slashed_index, slashing_penalty)
+  when outSlotRewards is SlotRewards:
+    outSlotRewards[].data[slashed_index].slashing_outcome -= slashing_penalty.int64

   # The rest doesn't make sense without there being any proposer index, so skip
   let proposer_index = get_beacon_proposer_index(state, cache)
```
```diff
@@ -186,6 +197,9 @@ proc slash_validator*(
   increase_balance(
     state, whistleblower_index, whistleblower_reward - proposer_reward)

+  when outSlotRewards is SlotRewards:
+    outSlotRewards[].data[whistleblower_index].slashing_outcome += whistleblower_reward.int64
+
 func genesis_time_from_eth1_timestamp*(cfg: RuntimeConfig, eth1_timestamp: uint64): uint64 =
   eth1_timestamp + cfg.GENESIS_DELAY
```
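The hunk above credits the whistleblower with `whistleblower_reward - proposer_reward`. A hedged Python sketch of how that split is typically computed, using the Altair spec constants (`WHISTLEBLOWER_REWARD_QUOTIENT` = 512, `PROPOSER_WEIGHT` = 8, `WEIGHT_DENOMINATOR` = 64); the function name and return shape are ours, not the Nim source's.

```python
WHISTLEBLOWER_REWARD_QUOTIENT = 512
PROPOSER_WEIGHT = 8
WEIGHT_DENOMINATOR = 64

def slashing_rewards(effective_balance_gwei: int) -> tuple[int, int]:
    """Return (proposer_reward, whistleblower_share) when a validator is slashed."""
    whistleblower_reward = effective_balance_gwei // WHISTLEBLOWER_REWARD_QUOTIENT
    # Altair carves the proposer's cut out of the whistleblower reward.
    proposer_reward = whistleblower_reward * PROPOSER_WEIGHT // WEIGHT_DENOMINATOR
    # The whistleblower address receives the remainder, matching
    # increase_balance(state, whistleblower_index, whistleblower_reward - proposer_reward).
    return proposer_reward, whistleblower_reward - proposer_reward

p, w = slashing_rewards(32_000_000_000)
print(p, w)  # 7812500 54687500
```

The tracking code records the full `whistleblower_reward` as the whistleblower's `slashing_outcome`, which is one of the details the per-component assertions in ncli_db are meant to keep honest.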
```diff
@@ -621,9 +635,13 @@ proc check_attestation*(
   ok()

 proc process_attestation*(
-    state: var ForkyBeaconState, attestation: SomeAttestation, flags: UpdateFlags,
-    base_reward_per_increment: Gwei, cache: var StateCache):
-    Result[void, cstring] {.nbench.} =
+    state: var ForkyBeaconState,
+    attestation: SomeAttestation,
+    flags: UpdateFlags,
+    base_reward_per_increment: Gwei,
+    cache: var StateCache,
+    outSlotRewards: SlotRewards | type DontTrackRewards = DontTrackRewards
+  ): Result[void, cstring] {.nbench.} =
   # In the spec, attestation validation is mixed with state mutation, so here
   # we've split it into two functions so that the validation logic can be
   # reused when looking for suitable blocks to include in attestations.
```
```diff
@@ -679,6 +697,9 @@ proc process_attestation*(
     proposer_reward_denominator = (WEIGHT_DENOMINATOR.uint64 - PROPOSER_WEIGHT.uint64) * WEIGHT_DENOMINATOR.uint64 div PROPOSER_WEIGHT.uint64
     proposer_reward = Gwei(proposer_reward_numerator div proposer_reward_denominator)
   increase_balance(state, proposer_index.get, proposer_reward)
+  when outSlotRewards is SlotRewards:
+    outSlotRewards[].data[proposer_index.get].proposer_outcome += proposer_reward.int64
+    outSlotRewards[].data[proposer_index.get].inclusion_delay = some(int64(state.slot - attestation.data.slot))

   when state is phase0.BeaconState:
     doAssert base_reward_per_increment == 0.Gwei
```
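A worked example of the proposer-reward arithmetic in the hunk above. With the Altair weights (`PROPOSER_WEIGHT` = 8, `WEIGHT_DENOMINATOR` = 64) the denominator evaluates to (64 - 8) * 64 / 8 = 448; the numerator value below is made up purely for illustration.

```python
WEIGHT_DENOMINATOR = 64
PROPOSER_WEIGHT = 8

def proposer_reward(proposer_reward_numerator: int) -> int:
    # Same expression as the Nim line above, in integer arithmetic.
    denominator = ((WEIGHT_DENOMINATOR - PROPOSER_WEIGHT)
                   * WEIGHT_DENOMINATOR // PROPOSER_WEIGHT)  # = 448
    return proposer_reward_numerator // denominator

print(proposer_reward(448_000))  # 1000
```

So for every 448 Gwei of weighted attester reward accrued in the numerator, the block proposer earns 1 Gwei; the tracking code additionally records the attestation's inclusion delay (`state.slot - attestation.data.slot`) for the same proposer.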
```diff
@@ -468,7 +468,7 @@ type
     # block root known to the state.
     isPreviousEpochHeadAttester

-  RewardStatus* = object
+  RewardStatus* {.inheritable.} = object
     ## Data detailing the status of a single validator with respect to the
     ## reward processing
```
```diff
@@ -483,6 +483,66 @@ type

     flags*: set[RewardFlags]

+  DetailedRewardsAndPenalties* = object
```
Reviewer: from what I can tell, most of this information is already present in …

Author: The compression in …

Reviewer: we can fix both problems by exporting building blocks instead of computed data ;)

Author: My opinion is that this is too complicated and can be explored in future PRs. I can perhaps put online the alternative code that I abandoned as a start of such an effort. A defining feature of the current approach is that we currently have assertions that the component-wise balances match the actual balance changes that end up in the `BeaconState`.

Reviewer: let's start there - an actual example will make it easier to judge. That said, from the point of view of ncli_db, of course it makes sense to put the code here - the sticking point is that ncli_db is not a priority here, and that as far as possible, code for it should be kept out of here. This was a distinctive and clear priority in the architecture of the original solution and division of labour, both when writing the spec code and when creating ncli_db and its validator performance analysis features.

Author: The alternative code is not finished though, and I'm personally not convinced that spending more time on it is beneficial. If you strongly disagree with this and you are willing to take over the maintenance of the more complicated version of …
```diff
+    ## This record is used in `ncli_db validatorDb` to keep track of all
+    ## sources of changes to the validator balances. At the end of each
+    ## transition, `ncli_db` asserts that the detailed book-keeping agrees
+    ## with the final balance recorded in the `BeaconState`, so we must
+    ## really record every possible change (such as a new deposit from
+    ## the same validator which has already occurred on mainnet).
+    ##
+    ## The `outcome` fields record what actually happened.
+    ##
+    ## The `max_reward` fields record what could have happened if the
+    ## validator had performed its duties correctly (please note that
+    ## this is still different from the theoretical maximum where all
+    ## validators have performed their duties correctly).
+    ##
+    ## The difference between the two could be considered an opportunity
+    ## cost (or loss) for the validator and this is what we report in the
+    ## `ncli_db validatorDb` Jupyter notebook.
+    source_outcome*: int64
+    max_source_reward*: Gwei
+    target_outcome*: int64
+    max_target_reward*: Gwei
+    head_outcome*: int64
+    max_head_reward*: Gwei
+    inclusion_delay_outcome*: int64
+    max_inclusion_delay_reward*: Gwei
+    sync_committee_outcome*: int64
+    max_sync_committee_reward*: Gwei
+    proposer_outcome*: int64
+    inactivity_penalty*: Gwei
+    slashing_outcome*: int64
+    deposits*: Gwei
+    inclusion_delay*: Option[int64]
+
+  DetailedRewardStatus* = object of RewardStatus
+    ## `RewardStatus` augmented with detailed `RewardInfo`.
+    detailed_info*: DetailedRewardsAndPenalties
```
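The doc comment above pairs each `outcome` field with a `max_reward` field and reads their difference as the validator's opportunity cost. A minimal Python mirror of that idea, assuming only what the doc comment states; the field subset and the `opportunity_cost` helper are ours, not part of the PR.

```python
from dataclasses import dataclass

@dataclass
class DetailedRewardsAndPenalties:
    # "outcome" = what actually happened (signed Gwei);
    # "max_reward" = what a perfectly performing validator could have earned.
    source_outcome: int = 0
    max_source_reward: int = 0
    target_outcome: int = 0
    max_target_reward: int = 0
    head_outcome: int = 0
    max_head_reward: int = 0

    def opportunity_cost(self) -> int:
        """Gwei the validator left on the table across the tracked components."""
        return ((self.max_source_reward - self.source_outcome)
                + (self.max_target_reward - self.target_outcome)
                + (self.max_head_reward - self.head_outcome))

r = DetailedRewardsAndPenalties(
    source_outcome=9_000, max_source_reward=10_000,
    target_outcome=18_000, max_target_reward=20_000,
    head_outcome=0, max_head_reward=10_000)
print(r.opportunity_cost())  # 13000
```

This is the per-validator loss figure the `ncli_db validatorDb` notebook reports.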
```diff
+  SlotRewards* = ref object
```

Reviewer: why is this ref?

Author: Nim has an issue with …

Reviewer: seems like the lesser evil still is to do …

On the `outSlotRewards: SlotRewards | type DontTrackRewards` parameter:

Reviewer: this is the kind of trickery that I very much like to avoid in spec code. The feature / corner case was too complicated for the person who introduced it in the compiler to implement well - it's not a good sign in general; people reading the code won't understand what's going on without relying on complex and rarely used language details.

Author: What's complex to understand, precisely? You can claim that it's unfamiliar, sure, but claiming that it is complex is wildly exaggerated. Also, in your accusations you often confuse "complicated for the person who introduced it in the compiler to implement" with "the person who implemented it in the compiler had very limited development time and wasn't able to follow a perfect Q/A process". By paying the price of using a less familiar trick, you get to follow the zero-overhead principle: you pay nothing for a feature that you don't use.

Reviewer: That's the point I'm trying to make: it's not zero-overhead - there's a significant cognitive overhead of sorting out the cleverness and filtering out the noise when reading the code, which is something we do a lot when reverse engineering problems.

Reviewer: The approach of making … The approach of not making these clever things at all and simply checking the performance with a benchmark goes even further: it makes the argument that if the overhead cannot be detected in real life, it's not there, and the complexity is actually unwarranted, no matter what theory you had in mind when writing the code.

Author: I find this to be an exaggerated claim. Benchmarking the code, measuring the effects on compilation time, making sure that there won't be any regression in your results in the future (e.g. due to changes in the used compilers), etc. is more development overhead and technical debt than learning a new Nim trick that everybody can benefit from in other situations.
```diff
+    # Used in ncli_db for detailed rewards and penalties tracking
+    # in the Altair world where balances can change on every slot
+    #
+    # TODO
+    # This is a ref type only because Nim has problems with proc
+    # signatures such as:
+    #
+    #   proc foo(x: (var Foo) | Bar)
+    #
+    # Even when the `var Foo` branch is instantiated, the `x` param
+    # will be treated as read-only within the body of the function.
+    # For the specific use case of `SlotRewards` in `ncli_db`, the
+    # ref type is considered acceptable and it solves the mutability
+    # problem.
+    data*: seq[DetailedRewardsAndPenalties]
+
+  DontTrackRewards* = object
+    # Passed as a compile-time symbol to the functions tracking
+    # slot rewards in order to disable the tracking (i.e. this
+    # is what we use by default in the regular client).
```
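The `DontTrackRewards` default can be read as a null-object tracker: the state-transition code always "calls" the tracker, and the default does nothing, so the regular client pays nothing for the feature. In Nim this dispatch is resolved at compile time via the `SlotRewards | type DontTrackRewards` parameter; the Python sketch below only approximates it with runtime dispatch, and all names other than the two type names are ours.

```python
class DontTrackRewards:
    """No-op tracker: the default in the regular client."""
    def add_slashing_outcome(self, index: int, delta: int) -> None:
        pass  # tracking disabled

class SlotRewards:
    """Real tracker used by ncli_db: per-validator signed Gwei deltas."""
    def __init__(self, validator_count: int):
        self.slashing_outcome = [0] * validator_count

    def add_slashing_outcome(self, index: int, delta: int) -> None:
        self.slashing_outcome[index] += delta

def slash(tracker, index: int, penalty: int) -> None:
    # ... the actual balance mutation would happen here in both cases ...
    tracker.add_slashing_outcome(index, -penalty)

slash(DontTrackRewards(), 0, 1_000)  # no bookkeeping happens
rewards = SlotRewards(4)
slash(rewards, 2, 1_000)
print(rewards.slashing_outcome)  # [0, 0, -1000, 0]
```

In the Nim version the `when outSlotRewards is SlotRewards:` branches are compiled out entirely when the default is used, which is the zero-overhead property debated in the thread above.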
```diff
 # https://github.com/ethereum/consensus-specs/blob/v1.1.5/specs/phase0/beacon-chain.md#get_total_balance
 TotalBalances* = object
   # The total effective balance of all active validators during the _current_

@@ -903,3 +963,27 @@ template isomorphicCast*[T, U](x: U): T =
   doAssert sizeof(T) == sizeof(U)
   doAssert getSizeofSig(T()) == getSizeofSig(U())
   cast[ptr T](unsafeAddr x)[]
```
```diff
+
+proc sum*(lhs, rhs: DetailedRewardsAndPenalties): DetailedRewardsAndPenalties =
+  DetailedRewardsAndPenalties(
+    source_outcome: lhs.source_outcome + rhs.source_outcome,
+    max_source_reward: lhs.max_source_reward + rhs.max_source_reward,
+    target_outcome: lhs.target_outcome + rhs.target_outcome,
+    max_target_reward: lhs.max_target_reward + rhs.max_target_reward,
+    head_outcome: lhs.head_outcome + rhs.head_outcome,
+    max_head_reward: lhs.max_head_reward + rhs.max_head_reward,
+    inclusion_delay_outcome: lhs.inclusion_delay_outcome +
+      rhs.inclusion_delay_outcome,
+    max_inclusion_delay_reward: lhs.max_inclusion_delay_reward +
+      rhs.max_inclusion_delay_reward,
+    sync_committee_outcome: lhs.sync_committee_outcome +
+      rhs.sync_committee_outcome,
+    max_sync_committee_reward: lhs.max_sync_committee_reward +
+      rhs.max_sync_committee_reward,
+    proposer_outcome: lhs.proposer_outcome + rhs.proposer_outcome,
+    inactivity_penalty: lhs.inactivity_penalty + rhs.inactivity_penalty,
+    slashing_outcome: lhs.slashing_outcome + rhs.slashing_outcome,
+    deposits: lhs.deposits + rhs.deposits,
+    inclusion_delay: if lhs.inclusion_delay.isSome: lhs.inclusion_delay
+                     elif rhs.inclusion_delay.isSome: rhs.inclusion_delay
+                     else: none(int64))
```
Reviewer: this should not be exposed - expose a `readOnly` function instead.

Reviewer: ...and add a test - the db has fairly extensive coverage and should have coverage for this as well.
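The `sum` proc above adds every numeric field pairwise but merges `inclusion_delay` asymmetrically: keep the left-hand value when present, otherwise fall back to the right-hand one. A dict-based Python sketch of that merge rule (the record here carries only a representative subset of fields):

```python
def sum_records(lhs: dict, rhs: dict) -> dict:
    """Field-wise sum, except inclusion_delay, which keeps the first Some-like value."""
    out = {k: lhs[k] + rhs[k] for k in lhs if k != "inclusion_delay"}
    # Mirrors: if lhs.isSome: lhs elif rhs.isSome: rhs else: none
    out["inclusion_delay"] = (lhs["inclusion_delay"]
                              if lhs["inclusion_delay"] is not None
                              else rhs["inclusion_delay"])
    return out

a = {"source_outcome": 5, "slashing_outcome": -10, "inclusion_delay": None}
b = {"source_outcome": 3, "slashing_outcome": 0, "inclusion_delay": 2}
print(sum_records(a, b))
# {'source_outcome': 8, 'slashing_outcome': -10, 'inclusion_delay': 2}
```

Note that this merge is not commutative in `inclusion_delay`: `sum_records(a, b)` and `sum_records(b, a)` can differ, which is worth covering in the test the reviewer asks for.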