-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address rare corruption of NFTokenPage linked list #4945
Conversation
It was discovered that under rare circumstances the links between NFTokenPages could be removed. If this happens, then the account_objects and account_nfts RPC commands under-report the NFTokens owned by an account. The fixNFTokenPageLinks amendment does the following to address the problem: - It fixes the underlying problem so no further broken links should be created. - It adds Invariants so, if such damage were introduced in the future, an invariant would stop it. - It adds a new FixLedgerState transaction that repairs directories that were damaged in this fashion. - It adds unit tests for all of it.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4945 +/- ##
========================================
Coverage 74.8% 74.9%
========================================
Files 766 768 +2
Lines 63004 63134 +130
Branches 8842 8848 +6
========================================
+ Hits 47139 47264 +125
- Misses 15865 15870 +5
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice job on this!
You're probably going to be unhappy with the number of comments that I tagged as "consider removing", but in almost all cases they are comments where the comment mirrors a single line of code and doesn't add any benefit (and we don't want things in the code that don't pull their weight).
But 👍 no matter how you decide to resolve the comments, it's your call.
@@ -633,6 +663,22 @@ ValidNFTokenPage::finalize( | |||
return false; | |||
} | |||
|
|||
// Only enforce these invariants if fixNFTokenPageLinks is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the code without the comment. The if statement below is not confusing.
switch (ctx.tx[sfLedgerFixType]) | ||
{ | ||
case FixType::nfTokenPageLink: | ||
// FixType::nfTokenPageLink requires an Owner field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the code without the comment. We can see from the case statement immediately above that this is "FixType::nfTokenPageLink" and the code immediately below that it requires an owner field.
if (prev) | ||
{ | ||
// Make our previous page point to our next page: | ||
// With fixNFTokenPageLinks... | ||
// The page is empty and there is a prev. If the last page of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is double spaced (I noticed the others were too). I don't think double spacing is very common, but I don't object too strongly if you prefer it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that there are two spaces after the period? In the days of typewriters (yes, I'm that old) that was the standard. I still think it improves readability with a monospaced font, which I hope most folks who read code with indentation are using. So, since you don't object strongly, I'll leave it.
bool | ||
repairNFTokenDirectoryLinks(ApplyView& view, AccountID const& owner) | ||
{ | ||
// Assume that no repair is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing lots of comments of like this - a comment and then a line of code immediately below it that doesn't need the comment. I'll tag the ones I'd consider removing, but I'll stop saying "because it mirrors the line of code below it and the code is better without things that don't pull their weight".
// Assume that no repair is required. | ||
bool didRepair = false; | ||
|
||
// Walk from the first NFTokenPage owned by owner to the last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing.
view.update(page); | ||
} | ||
|
||
// nextPage should have a link to page; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing.
if (!nextPage->isFieldPresent(sfPreviousPageMin) || | ||
nextPage->getFieldH256(sfPreviousPageMin) != page->key()) | ||
{ | ||
// Repair the link to page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing.
// We need special handling for the last page. | ||
break; | ||
|
||
// Continue to walk the links. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing.
|
||
// The lastPage is present in the ledger and has the right index. There's | ||
// also a previous page. Make sure that the last page does not have a | ||
// next link. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing.
src/ripple/protocol/Feature.h
Outdated
@@ -74,7 +74,7 @@ namespace detail { | |||
// Feature.cpp. Because it's only used to reserve storage, and determine how | |||
// large to make the FeatureBitset, it MAY be larger. It MUST NOT be less than | |||
// the actual number of amendments. A LogicError on startup will verify this. | |||
static constexpr std::size_t numFeatures = 68; | |||
static constexpr std::size_t numFeatures = 69; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder that this needs to be incremented when the other amendment is merged. We won't get a merge conflict, since they change to the same number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good other than a minor typo 😃
Why do you use the custom transaction method instead of doing this automatically which is what happens in |
@exunda asked:
That's a fair question. It was answered in the (admittedly long) description of the pull request. Here's the quote:
Does that answer your question? |
Yes thank you for explaining more |
Due to changes in |
@intelliot, I'll work on that merge. |
Once the merge is done, will this PR be ready to merge? |
…-page-delete-merge4
@ximinez, the merge is almost done -- I need to fix some clang formatting. Gahh, merges across the restructure are painful. Once clang-format is fixed (soon) I think it could be merged to develop. It has two approvals. But, given that it's an amendment, it would be nice if there were one more review from a senior developer. I had not marked the branch as passed because I was waiting for that additional review. |
BEAST_EXPECT( | ||
bobLastNFTokenPage->at(sfPreviousPageMin) != | ||
bobMiddleNFTokenPageIndex); | ||
BEAST_EXPECT(!bobLastNFTokenPage->isFieldPresent(sfNextPageMin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth checking that the sfPreviousPageMin
page points to the first page, and that the first page's sfNextPageMin
points to this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea. I ran out of time so I didn't make that change. But someone else could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and pushed this change. Someone else will need to reverse review it.
|
||
BEAST_EXPECT( | ||
carolLastNFTokenPage->isFieldPresent(sfPreviousPageMin)); | ||
BEAST_EXPECT(!carolLastNFTokenPage->isFieldPresent(sfNextPageMin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Should we confirm that the pages all point to each other correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Same response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and pushed this change. Someone else will need to reverse review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @ximinez! They were all good suggestions. I got to most of them, but I ran out of time on a couple that I noted. Maybe someone else can do those? I think it's also not the end of the world if those changes to the tests don't happen.
Thanks again!
BEAST_EXPECT( | ||
bobLastNFTokenPage->at(sfPreviousPageMin) != | ||
bobMiddleNFTokenPageIndex); | ||
BEAST_EXPECT(!bobLastNFTokenPage->isFieldPresent(sfNextPageMin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea. I ran out of time so I didn't make that change. But someone else could.
|
||
BEAST_EXPECT( | ||
carolLastNFTokenPage->isFieldPresent(sfPreviousPageMin)); | ||
BEAST_EXPECT(!carolLastNFTokenPage->isFieldPresent(sfNextPageMin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Same response.
* This is from suggestions I made on the review, so someone else will have to review these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since @scottschurr is unavailable to give final sign off, I'll do it: This is ready to merge.
* Add fixNFTokenPageLinks amendment: It was discovered that under rare circumstances the links between NFTokenPages could be removed. If this happens, then the account_objects and account_nfts RPC commands under-report the NFTokens owned by an account. The fixNFTokenPageLinks amendment does the following to address the problem: - It fixes the underlying problem so no further broken links should be created. - It adds Invariants so, if such damage were introduced in the future, an invariant would stop it. - It adds a new FixLedgerState transaction that repairs directories that were damaged in this fashion. - It adds unit tests for all of it.
High Level Overview of Change
It was discovered that there are a couple of NFT directories on the MainNet that are missing links in the middle of the chain. This pull request does several things to address the corruption:
LedgerStateFix
, is added which allows any account on ledger to pay the fee to repair the links of any single account.A new amendment,
fixNFTokenPageLinks
, includes all of these changes.Details follow.
The Source of the Corruption
There are two different transactions that introduced the corruption into the two directories. In both cases the following conditions were met:
When these conditions were met, the last NFToken page was removed and the next-to-last page was left as the final page in the directory.
That would be fine except the NFToken directory has a an expectation that the last page has a specific index. The page with that index was just deleted. So when an NFToken is added to the directory, and that token has a high enough value that it does not belong on the new last page, then a new last page is created that has no links to the previous page.
Now there's a linked list with a hole in the middle of the list.
The
fixNFTokenPageLinks
amendment modifies the NFToken page coalescing code to notice when the very last page of the directory would be removed. In that case it simply moves all of the contents of the next lower page into the last page and deletes the next-to-last page (and fixes up the links).New invariant checks also validate aspects of the links on pages, so a similar corruption should cause a
tecINVARIANT_FAILED
transaction result. That should prevent similar corruptions in the future.The New Transaction:
LedgerStateFix
Once the amendment goes live then that prevents any similar NFToken directory corruption in the future. But we are still left with two damaged NFToken directories. And potentially more that could be created between now and when the amendment goes live.
We could leave the corruptions. I don't like that idea for two reasons:
We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:
We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.
But we would also rather not introduce a new single-use transaction for this very tiny corner case.
So this approach creates a new transaction aimed at cleaning up messes in the ledger:
LedgerStateFix
. We'll use it for this situation. If there are other repairable corruptions in the future we can re-use this transaction to repair those future corruptions.LedgerStateFix
LedgerStateFix
fields (in addition to common fields) are:TransactionType
uint16
LedgerStateFix
Account
STAccount
Fee
STAmount
Flags
uint32
LedgerFixType
uint16
1
is currently supportedOwner
STAccount
LedgerFixType
==1
TransactionType
— identifies this as aLedgerStateFix
transaction.Account
— identifies the account signing and submitting the transaction as well as paying theFee
.Fee
— this transaction is rare and potentially compute intensive. We want to discourage folks from spamming the ledger with these. So the minimum fee is the same as the fee for anAccountDelete
transaction. And, yes, if the transaction fails with atec
code the fee is still charged.Flags
— not needed forLedgerFixType == 1
. May be used for some unknown future ledger fix.LedgerFixType
— for now the only valid value is1
, which fixes the NFToken directory for a single account.Owner
— the account that owns the NFToken directory that needs fixing. Need not have any relationship toAccount
.Fixing NFTokenPage Links
Once the amendment goes live, any corrupted NFTokenPage directory can be fixed by submitting a transaction like this:
If a repair is actually completed, then
tesSUCCESS
is returned. If there is an error or if the directory did not need repair then atec
code is returned.Future Fixes
Unfortunately this is not the first time ledger state has needed tidying up. Amendment
fixTrustLinesToSelf
removed invalid state from the ledger. Since programmers are humans we can expect similar kinds of problems to show up in the future.If there are future ledger state problems, they can be attached to the
LedgerStateFix
transaction by picking aLedgerFixType
value other than1
. Any additional parameters can be added as optional values.Context of Change
While I was research how efficient NFTokenPages actually are in use, I stumbled across two NFToken directories that were missing links in the middle of the chain. This is a form of ledger corruption. But its consequences do not seem too dire, even for the owners of those directories. Known consequences are:
account_objects
RPC command returns only those objects at the beginning of the NFToken directory up to the point where a link is missing. So, for those accounts with missing links, the NFTokens returned byaccount_objects
is incomplete.account_nfts
RPC command has a problem similar toaccount_objects
.account_nfts
returns NFTokens only up to the point where a link is missing.There may be other consequences, but I've looked for and not found any. Particularly, I've not seen any evidence of lost NFTokens. NFTokens still get added to the correct owner's page. They simply are not reported by the RPC commands.
Type of Change
API Impact
Consideration
This NFT-related amendment arrives at about the same time as a different NFT-related amendment: #4946
Should these two pull requests be merged into a single amendment?