Snapshot in latest storer fix #5381
Conversation
Codecov Report

Patch coverage: additional details and impacted files:

@@            Coverage Diff             @@
##           rc/v1.6.0    #5381   +/- ##
==========================================
  Coverage      80.09%   80.10%
==========================================
  Files            706      708      +2
  Lines          93355    93408     +53
==========================================
+ Hits           74770    74821     +51
- Misses         13269    13275      +6
+ Partials        5316     5312      -4
==========================================

☔ View full report in Codecov by Sentry.
state/accountsDB.go
Outdated
}

select {
case <-time.After(args.WaitTimeForSnapshotEpochCheck):
we should use time.Timer
Done.
state/accountsDB.go
Outdated
return nil
}

if storageManager.IsClosed() {
I think this would be better moved to L1167, immediately after getting the storageManager pointer
Done.
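The guard-clause placement discussed above can be sketched like this; the interface and functions below are simplified assumptions for illustration, not the real state package API:

```go
package main

import (
	"errors"
	"fmt"
)

// StorageManager mirrors only the method this sketch needs.
type StorageManager interface {
	IsClosed() bool
}

type stubStorageManager struct{ closed bool }

func (s *stubStorageManager) IsClosed() bool { return s.closed }

var errClosedStorageManager = errors.New("storage manager is closed")

// snapshotState checks IsClosed immediately after obtaining the storage
// manager, so no snapshot work starts against a closed storer.
func snapshotState(getStorageManager func() StorageManager) error {
	storageManager := getStorageManager()
	if storageManager.IsClosed() {
		return errClosedStorageManager
	}

	// ...actual snapshot work would continue here...
	return nil
}

func main() {
	open := func() StorageManager { return &stubStorageManager{closed: false} }
	closed := func() StorageManager { return &stubStorageManager{closed: true} }

	fmt.Println(snapshotState(open))   // <nil>
	fmt.Println(snapshotState(closed)) // storage manager is closed
}
```

Hoisting the check to right after the pointer is obtained keeps the failure path short and avoids doing partial work before discovering the storer is unusable.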
process/block/metablock.go
Outdated
@@ -1442,8 +1442,8 @@ func (mp *metaProcessor) updateState(lastMetaBlock data.MetaHeaderHandler, lastM
 		"rootHash", lastMetaBlock.GetRootHash(),
 		"prevRootHash", prevMetaBlock.GetRootHash(),
 		"validatorStatsRootHash", lastMetaBlock.GetValidatorStatsRootHash())
-	mp.accountsDB[state.UserAccountsState].SnapshotState(lastMetaBlock.GetRootHash())
-	mp.accountsDB[state.PeerAccountsState].SnapshotState(lastMetaBlock.GetValidatorStatsRootHash())
+	go mp.accountsDB[state.UserAccountsState].SnapshotState(lastMetaBlock.GetRootHash(), lastMetaBlock.GetEpoch())
this won't work in import-db mode. Please remove the 2 go calls + go call on shardblock.go L1261.
Let's refactor the solution.
Right, refactored.
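One way to read the refactor agreed on above: instead of callers issuing bare `go` statements (which break import-db mode, where block processing must not outrun the snapshot), the snapshot entry point itself can decide whether to run in the background. This is a hypothetical sketch of that shape, not the actual refactor from the PR:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// accountsDB is a toy stand-in for the real state.AccountsDB.
type accountsDB struct {
	importDBMode bool
	wg           sync.WaitGroup
	snapshots    atomic.Int64
}

// SnapshotState hides the async decision from callers: block processors
// always call it synchronously, and only here do we choose whether to
// spawn a goroutine.
func (adb *accountsDB) SnapshotState(rootHash []byte, epoch uint32) {
	if adb.importDBMode {
		// import-db must not race ahead of an in-progress snapshot
		adb.doSnapshot(rootHash, epoch)
		return
	}

	adb.wg.Add(1)
	go func() {
		defer adb.wg.Done()
		adb.doSnapshot(rootHash, epoch)
	}()
}

func (adb *accountsDB) doSnapshot(rootHash []byte, epoch uint32) {
	adb.snapshots.Add(1)
	fmt.Printf("snapshotting root hash %x for epoch %d\n", rootHash, epoch)
}

func main() {
	adb := &accountsDB{importDBMode: true}
	adb.SnapshotState([]byte{0xaa}, 7) // runs synchronously

	adb = &accountsDB{importDBMode: false}
	adb.SnapshotState([]byte{0xbb}, 8) // runs in the background
	adb.wg.Wait()                      // wait so the program does not exit early
}
```

Centralizing the choice also means every call site behaves consistently, instead of each of the meta/shard processors deciding on its own whether to fire a goroutine.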
…st-storer-fix

# Conflicts:
#	process/transactionEvaluator/simulationAccountsDB_test.go
#	process/txsimulator/wrappedAccountsDB.go
#	state/accountsDB.go
#	state/export_test.go
#	state/interface.go
state/snapshotsManager.go
Outdated
mutex sync.RWMutex
}

// NewSnapshotsManager creates a new snapshots manager
// NewSnapshotsManager creates a new snapshots manager
Done.
state/accountsDB.go
Outdated
func (adb *AccountsDB) waitForStorageEpochChange(args StorageEpochChangeWaitArgs) error {
	if args.SnapshotWaitTimeout < args.WaitTimeForSnapshotEpochCheck {
this func does not seem to be used, i think it was actually moved to snapshotsManager
Indeed, it was moved. Removed this and moved the tests.
state/snapshotsManager.go
Outdated
// StorageEpochChangeWaitArgs are the args needed for calling the WaitForStorageEpochChange function
type StorageEpochChangeWaitArgs struct {
If waitForStorageEpochChange will be removed from accountsDB, as it seems it's not being used, I think we can unexport this struct.
Done.
state/snapshotsManager.go
Outdated
rootHash, epoch, err := sm.getSnapshotRootHashAndEpoch(trieStorageManager)
if err != nil {
	log.Debug("could not retrieve snapshot info", "error", err)
	return nil
so we are not starting the snapshot if there is an error getting the values here?
should we also consider import db processing mode, when starting snapshot after restart?
Yes, we do not start the snapshot because we do not know which rootHash to snapshot or for which epoch. When the node reaches the next epoch, it will start the snapshot for that epoch.
Regarding import-db, the process is the same: if the import-db is stopped while a snapshot is in progress, that snapshot will restart when the node restarts.
Changed the log to WARN.
state/snapshotsManager.go
Outdated
func (sm *snapshotsManager) getSnapshotRootHashAndEpoch(trieStorageManager common.StorageManager) ([]byte, uint32, error) {
	sm.mutex.RLock()
	defer sm.mutex.RUnlock()
do we need mutex lock here?
No, removed.
state/snapshotsManager.go
Outdated
}

func (sm *snapshotsManager) setStateCheckpoint(rootHash []byte, trieStorageManager common.StorageManager) {
	log.Trace("accountsDB.SetStateCheckpoint", "root hash", rootHash)
Suggested change:
-	log.Trace("accountsDB.SetStateCheckpoint", "root hash", rootHash)
+	log.Trace("snapshotsManager.SetStateCheckpoint", "root hash", rootHash)
Done
state/peerAccountsDB_test.go
Outdated
@@ -4,6 +4,7 @@ import (
 	"bytes"
 	"errors"
 	"fmt"
+	"github.com/multiversx/mx-chain-core-go/core/atomic"
goimports (this third-party import should not be in the standard library group)
Done.
state/snapshotsManager.go
Outdated
select {
case <-timer.C:
	continue
no need for this continue
Removed.
state/snapshotsManager.go
Outdated
	sm.waitForCompletionIfAppropriate(stats)
}

func (sm *snapshotsManager) verifyAndStartSnapshot(rootHash []byte, epoch uint32, trieStorageManager common.StorageManager) (*snapshotStatistics, bool) {
is this a new function?
No, it is a rename of prepareSnapshot() with some small changes. Reverted the name to prepareSnapshot, as it is more accurate to what the func actually does.
state/snapshotsManager_test.go
Outdated
sm, _ := state.NewSnapshotsManager(getDefaultSnapshotManagerArgs())
assert.Equal(t, state.ErrNilTrieSyncer, sm.StartSnapshotAfterRestartIfNeeded(&storageManager.StorageManagerStub{}))
})
remove the line between sub-tests?
Done.
	return isActiveDB(stsm)
}

func isActiveDB(stsm *snapshotTrieStorageManager) bool {
where is this logic moved?
This was removed. The reason is that when the node starts, it opens as many dbs as necessary, and at epoch change the node also manages the opening/closing of epoch storers, so these checks are redundant. As an added bonus, even if something happens and a certain snapshot cannot be completed fully from the node's own storage, the data will be synced from other nodes (this would not happen if these checks were still in place, because the snapshot would not start at all).
great! 👍
@@ Log scanner @@
snapshot-in-latest-storer-fix
================================================================================
- Known Warnings 13
- New Warnings 7
- Known Errors 0
- New Errors 1
- Panics 0
================================================================================
- block hash does not match 22167
- wrong nonce in block 7366
- miniblocks does not match 0
- num miniblocks does not match 0
- miniblock hash does not match 0
- block bodies does not match 0
- receipts hash missmatch 1
================================================================================
- No jailed nodes on the testnet
================================================================================
Reasoning behind the pull request
Proposed changes
Testing procedure
Pre-requisites
Based on the Contributing Guidelines the PR author and the reviewers must check the following requirements are met:
- feat branch created?
- When merging the feat branch, do all satellite projects have a proper tag inside go.mod?