Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address rare corruption of NFTokenPage linked list #4945

Merged
merged 17 commits into from
Aug 7, 2024

Conversation

scottschurr
Copy link
Collaborator

@scottschurr scottschurr commented Mar 9, 2024

High Level Overview of Change

It was discovered that there are a couple of NFT directories on the MainNet that are missing links in the middle of the chain. This pull request does several things to address the corruption:

  1. The source of the corruption was identified and fixed, which should prevent the problem from happening in the future.
  2. A new transaction, LedgerStateFix, is added which allows any account on ledger to pay the fee to repair the links of any single account.
  3. More invariants are added which should prevent similar problems from happening in the future.
  4. And unit tests are added to exercise all of the above changes.

A new amendment, fixNFTokenPageLinks, includes all of these changes.

Details follow.

The Source of the Corruption

There are two different transactions that introduced the corruption into the two directories. In both cases the following conditions were met:

  1. There were at least two NFToken pages in the directory.
  2. The next-to-last page was completely full, holding 32 NFTokens.
  3. The very last page of the directory contained only one NFToken, and the transaction removed that last remaining token from the last page.

When these conditions were met, the last NFToken page was removed and the next-to-last page was left as the final page in the directory.

That would be fine except the NFToken directory has a an expectation that the last page has a specific index. The page with that index was just deleted. So when an NFToken is added to the directory, and that token has a high enough value that it does not belong on the new last page, then a new last page is created that has no links to the previous page.

Now there's a linked list with a hole in the middle of the list.

The fixNFTokenPageLinks amendment modifies the NFToken page coalescing code to notice when the very last page of the directory would be removed. In that case it simply moves all of the contents of the next lower page into the last page and deletes the next-to-last page (and fixes up the links).

New invariant checks also validate aspects of the links on pages, so a similar corruption should cause a tecINVARIANT_FAILED transaction result. That should prevent similar corruptions in the future.

The New Transaction: LedgerStateFix

Once the amendment goes live then that prevents any similar NFToken directory corruption in the future. But we are still left with two damaged NFToken directories. And potentially more that could be created between now and when the amendment goes live.

We could leave the corruptions. I don't like that idea for two reasons:

  1. These corruptions seriously inconvenience those accounts that suffered the corruption, through no fault of their own.
  2. When you drop food on the kitchen floor you clean it up. You don't put up a sign that says, please don't step here. You clean it up.

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

But we would also rather not introduce a new single-use transaction for this very tiny corner case.

So this approach creates a new transaction aimed at cleaning up messes in the ledger: LedgerStateFix. We'll use it for this situation. If there are other repairable corruptions in the future we can re-use this transaction to repair those future corruptions.

LedgerStateFix

LedgerStateFix fields (in addition to common fields) are:

Field Name Data Type Required/Optional Description
TransactionType uint16 Required LedgerStateFix
Account STAccount Required Account that signs and pays fee
Fee STAmount Required Fee paid for transaction
Flags uint32 Optional Available for future fixes
LedgerFixType uint16 Required Only 1 is currently supported
Owner STAccount Optional Required if LedgerFixType == 1

TransactionType — identifies this as a LedgerStateFix transaction.

Account — identifies the account signing and submitting the transaction as well as paying the Fee.

Fee — this transaction is rare and potentially compute intensive. We want to discourage folks from spamming the ledger with these. So the minimum fee is the same as the fee for an AccountDelete transaction. And, yes, if the transaction fails with a tec code the fee is still charged.

Flags — not needed for LedgerFixType == 1. May be used for some unknown future ledger fix.

LedgerFixType — for now the only valid value is 1, which fixes the NFToken directory for a single account.

Owner — the account that owns the NFToken directory that needs fixing. Need not have any relationship to Account.

Fixing NFTokenPage Links

Once the amendment goes live, any corrupted NFTokenPage directory can be fixed by submitting a transaction like this:

{
   "Account" : "<Signer and fee payer>",
   "Fee" : "2000000",
   "LedgerFixType" : 1,
   "Owner" : "<Account with corrupted NFTokenPage directory",
   "Sequence" : <n>,
   "SigningPubKey" : "<Account's public key>",
   "TransactionType" : "LedgerStateFix",
   "TxnSignature" : "<Signature>",
}

If a repair is actually completed, then tesSUCCESS is returned. If there is an error or if the directory did not need repair then a tec code is returned.

Future Fixes

Unfortunately this is not the first time ledger state has needed tidying up. Amendment fixTrustLinesToSelf removed invalid state from the ledger. Since programmers are humans we can expect similar kinds of problems to show up in the future.

If there are future ledger state problems, they can be attached to the LedgerStateFix transaction by picking a LedgerFixType value other than 1. Any additional parameters can be added as optional values.

Context of Change

While I was research how efficient NFTokenPages actually are in use, I stumbled across two NFToken directories that were missing links in the middle of the chain. This is a form of ledger corruption. But its consequences do not seem too dire, even for the owners of those directories. Known consequences are:

  1. The account_objects RPC command returns only those objects at the beginning of the NFToken directory up to the point where a link is missing. So, for those accounts with missing links, the NFTokens returned by account_objects is incomplete.
  2. The account_nfts RPC command has a problem similar to account_objects. account_nfts returns NFTokens only up to the point where a link is missing.

There may be other consequences, but I've looked for and not found any. Particularly, I've not seen any evidence of lost NFTokens. NFTokens still get added to the correct owner's page. They simply are not reported by the RPC commands.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Tests (there are tests for the new code)
  • Documentation will need to be added to cover the new transaction.

API Impact

  • Public API: New feature (a new transaction type is added)

Consideration

This NFT-related amendment arrives at about the same time as a different NFT-related amendment: #4946

Should these two pull requests be merged into a single amendment?

It was discovered that under rare circumstances the links between
NFTokenPages could be removed.  If this happens, then the
account_objects and account_nfts RPC commands under-report the
NFTokens owned by an account.

The fixNFTokenPageLinks amendment does the following to address
the problem:

- It fixes the underlying problem so no further broken links
  should be created.
- It adds Invariants so, if such damage were introduced in the
  future, an invariant would stop it.
- It adds a new FixLedgerState transaction that repairs
  directories that were damaged in this fashion.
- It adds unit tests for all of it.
@scottschurr scottschurr added Amendment Perf impact not expected Change is not expected to improve nor harm performance. labels Mar 9, 2024
@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2024

Codecov Report

Attention: Patch coverage is 87.87879% with 16 lines in your changes missing coverage. Please review.

Project coverage is 74.9%. Comparing base (0a331ea) to head (a6ae5b5).

Files Patch % Lines
src/xrpld/app/tx/detail/NFTokenUtils.cpp 81.6% 14 Missing ⚠️
src/xrpld/app/tx/detail/LedgerStateFix.cpp 93.8% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop   #4945    +/-   ##
========================================
  Coverage     74.8%   74.9%            
========================================
  Files          766     768     +2     
  Lines        63004   63134   +130     
  Branches      8842    8848     +6     
========================================
+ Hits         47139   47264   +125     
- Misses       15865   15870     +5     
Files Coverage Δ
include/xrpl/protocol/Feature.h 100.0% <ø> (ø)
include/xrpl/protocol/SField.h 100.0% <ø> (ø)
include/xrpl/protocol/TER.h 100.0% <ø> (ø)
src/libxrpl/protocol/Feature.cpp 94.6% <ø> (ø)
src/libxrpl/protocol/SField.cpp 77.5% <ø> (ø)
src/libxrpl/protocol/TER.cpp 100.0% <ø> (ø)
src/libxrpl/protocol/TxFormats.cpp 100.0% <100.0%> (ø)
src/xrpld/app/tx/detail/InvariantCheck.cpp 91.1% <100.0%> (+4.1%) ⬆️
src/xrpld/app/tx/detail/InvariantCheck.h 100.0% <ø> (ø)
src/xrpld/app/tx/detail/LedgerStateFix.h 100.0% <100.0%> (ø)
... and 4 more

... and 2 files with indirect coverage changes

Impacted file tree graph

Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice job on this!

You're probably going to be unhappy with the number of comments that I tagged as "consider removing", but in almost all cases they are comments where the comment mirrors a single line of code and doesn't add any benefit (and we don't want things in the code that don't pull their weight).

But 👍 no matter how you decide to resolve the comments, it's your call.

@@ -633,6 +663,22 @@ ValidNFTokenPage::finalize(
return false;
}

// Only enforce these invariants if fixNFTokenPageLinks is enabled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the code without the comment. The if statement below is not confusing.

switch (ctx.tx[sfLedgerFixType])
{
case FixType::nfTokenPageLink:
// FixType::nfTokenPageLink requires an Owner field.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the code without the comment. We can see from the case statement immediately above that this is "FixType::nfTokenPageLink" and the code immediately below that it requires an owner field.

if (prev)
{
// Make our previous page point to our next page:
// With fixNFTokenPageLinks...
// The page is empty and there is a prev. If the last page of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is double spaced (I noticed the others were too). I don't think double spacing is very common, but I don't object too strongly if you prefer it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that there are two spaces after the period? In the days of typewriters (yes, I'm that old) that was the standard. I still think it improves readability with a monospaced font, which I hope most folks who read code with indentation are using. So, since you don't object strongly, I'll leave it.

bool
repairNFTokenDirectoryLinks(ApplyView& view, AccountID const& owner)
{
// Assume that no repair is required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing lots of comments of like this - a comment and then a line of code immediately below it that doesn't need the comment. I'll tag the ones I'd consider removing, but I'll stop saying "because it mirrors the line of code below it and the code is better without things that don't pull their weight".

// Assume that no repair is required.
bool didRepair = false;

// Walk from the first NFTokenPage owned by owner to the last
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing.

view.update(page);
}

// nextPage should have a link to page;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing.

if (!nextPage->isFieldPresent(sfPreviousPageMin) ||
nextPage->getFieldH256(sfPreviousPageMin) != page->key())
{
// Repair the link to page.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing.

// We need special handling for the last page.
break;

// Continue to walk the links.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing.


// The lastPage is present in the ledger and has the right index. There's
// also a previous page. Make sure that the last page does not have a
// next link.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing.

@@ -74,7 +74,7 @@ namespace detail {
// Feature.cpp. Because it's only used to reserve storage, and determine how
// large to make the FeatureBitset, it MAY be larger. It MUST NOT be less than
// the actual number of amendments. A LogicError on startup will verify this.
static constexpr std::size_t numFeatures = 68;
static constexpr std::size_t numFeatures = 69;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder that this needs to be incremented when the other amendment is merged. We won't get a merge conflict, since they change to the same number.

Copy link
Collaborator

@shawnxie999 shawnxie999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good other than a minor typo 😃

@exunda
Copy link

exunda commented Jun 2, 2024

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated? Are you worried that more corrupted pages will create before the amendment activates?

@scottschurr
Copy link
Collaborator Author

@exunda asked:

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated?

That's a fair question. It was answered in the (admittedly long) description of the pull request. Here's the quote:

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

Does that answer your question?

@exunda
Copy link

exunda commented Jun 11, 2024

Yes thank you for explaining more

@intelliot
Copy link
Collaborator

Due to changes in develop, there are now merge conflicts. @scottschurr would it make sense for you to bring this branch up-to-date?

@scottschurr
Copy link
Collaborator Author

@intelliot, I'll work on that merge.

@ximinez
Copy link
Collaborator

ximinez commented Jul 31, 2024

Once the merge is done, will this PR be ready to merge?

@scottschurr
Copy link
Collaborator Author

@ximinez, the merge is almost done -- I need to fix some clang formatting. Gahh, merges across the restructure are painful.

Once clang-format is fixed (soon) I think it could be merged to develop. It has two approvals. But, given that it's an amendment, it would be nice if there were one more review from a senior developer. I had not marked the branch as passed because I was waiting for that additional review.

src/test/app/FixNFTokenPageLinks_test.cpp Outdated Show resolved Hide resolved
src/test/app/FixNFTokenPageLinks_test.cpp Outdated Show resolved Hide resolved
src/test/app/FixNFTokenPageLinks_test.cpp Outdated Show resolved Hide resolved
src/test/app/FixNFTokenPageLinks_test.cpp Outdated Show resolved Hide resolved
BEAST_EXPECT(
bobLastNFTokenPage->at(sfPreviousPageMin) !=
bobMiddleNFTokenPageIndex);
BEAST_EXPECT(!bobLastNFTokenPage->isFieldPresent(sfNextPageMin));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth checking that the sfPreviousPageMin page points to the first page, and that the first page's sfNextPageMin points to this one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea. I ran out of time so I didn't make that change. But someone else could.

Copy link
Collaborator

@ximinez ximinez Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and pushed this change. Someone else will need to reverse review it.


BEAST_EXPECT(
carolLastNFTokenPage->isFieldPresent(sfPreviousPageMin));
BEAST_EXPECT(!carolLastNFTokenPage->isFieldPresent(sfNextPageMin));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Should we confirm that the pages all point to each other correctly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Same response.

Copy link
Collaborator

@ximinez ximinez Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and pushed this change. Someone else will need to reverse review it.

src/xrpld/app/tx/detail/LedgerStateFix.cpp Show resolved Hide resolved
src/xrpld/app/tx/detail/NFTokenUtils.cpp Show resolved Hide resolved
Copy link
Collaborator Author

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @ximinez! They were all good suggestions. I got to most of them, but I ran out of time on a couple that I noted. Maybe someone else can do those? I think it's also not the end of the world if those changes to the tests don't happen.

Thanks again!

BEAST_EXPECT(
bobLastNFTokenPage->at(sfPreviousPageMin) !=
bobMiddleNFTokenPageIndex);
BEAST_EXPECT(!bobLastNFTokenPage->isFieldPresent(sfNextPageMin));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea. I ran out of time so I didn't make that change. But someone else could.


BEAST_EXPECT(
carolLastNFTokenPage->isFieldPresent(sfPreviousPageMin));
BEAST_EXPECT(!carolLastNFTokenPage->isFieldPresent(sfNextPageMin));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Same response.

* This is from suggestions I made on the review, so someone else will
  have to review these changes.
Copy link
Collaborator

@shawnxie999 shawnxie999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still looks good

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @scottschurr is unavailable to give final sign off, I'll do it: This is ready to merge.

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Aug 7, 2024
@ximinez ximinez merged commit c19a88f into XRPLF:develop Aug 7, 2024
19 of 20 checks passed
vlntb pushed a commit to vlntb/rippled that referenced this pull request Aug 23, 2024
* Add fixNFTokenPageLinks amendment:

It was discovered that under rare circumstances the links between
NFTokenPages could be removed.  If this happens, then the
account_objects and account_nfts RPC commands under-report the
NFTokens owned by an account.

The fixNFTokenPageLinks amendment does the following to address
the problem:

- It fixes the underlying problem so no further broken links
  should be created.
- It adds Invariants so, if such damage were introduced in the
  future, an invariant would stop it.
- It adds a new FixLedgerState transaction that repairs
  directories that were damaged in this fashion.
- It adds unit tests for all of it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Amendment Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. Perf impact not expected Change is not expected to improve nor harm performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants