ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

tamirms · 2023-10-03T08:24:58Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

In stellar/stellar-core#3952 we discovered that the tx meta emitted by captive core can appear in a random order. This is problematic for Horizon because we would like ingestion to always be deterministic and reproducible. For some releases we test our ingestion system by ingesting history twice, once with the release candidate and the second time with the previous stable version and then we assert that content produced by both runs are the same. If tx meta can appear in a random order this can result in horizon effects being produced in a different order.

This commit is an attempt to make the ordering of Tx Meta consistent in Horizon ingestion.

Known limitations

stellar/stellar-core#3954 is an attempt to fix this issue in core. Once that change is included in a core stable release we can revert this PR.

MonsieurNicolas · 2023-10-03T18:40:49Z

This change is surprising to me: the key order in meta has never been deterministic (since 2015), so what changed in Horizon and/or clients?

If this is it to help with testing, it seems odd to take a dependency on ordering in production?

Also, meta already stored basically everywhere (Horizon DB, core-DB in "legacy mode") is already non-canonical so this change may not be enough.

MonsieurNicolas · 2023-10-03T18:48:01Z

I mean: if it makes the logic simpler, you can try to normalize meta in some way across all data sources, but I am not sure it's something you really want.

tamirms · 2023-10-03T21:28:45Z

@MonsieurNicolas

This change is surprising to me: the key order in meta has never been deterministic (since 2015), so what changed in Horizon and/or clients?
If this is it to help with testing, it seems odd to take a dependency on ordering in production?

Having horizon ingestion be deterministic and reproducible makes it a lot easier for us catch bugs in release candidates. The most straightforward way to guarantee that horizon ingestion is deterministic is to ensure that the order of tx meta is deterministic. What would be the alternative approach?

Also, meta already stored basically everywhere (Horizon DB, core-DB in "legacy mode") is already non-canonical so this change may not be enough.

This change will re-sort tx meta before Horizon consumes it in ingestion. The sorting will be applied regardless if the source of the tx meta is captive core or the core-db in legacy mode.

MonsieurNicolas · 2023-10-04T02:19:48Z

Having horizon ingestion be deterministic and reproducible makes it a lot easier for us catch bugs in release candidates. The most straightforward way to guarantee that horizon ingestion is deterministic is to ensure that the order of tx meta is deterministic. What would be the alternative approach?

Maybe it depends at which layer you test, which is why I was asking about what changed in Horizon: I thought you would just compare ingested state with expected state (so set comparisons) and not really worry about the format of the diff.

tamirms · 2023-10-05T08:47:29Z

@MonsieurNicolas Horizon stores some ledger entries in its database and those are periodically compared against the ledger entries from history archive snapshots. That comparison is a set comparison like you described (where ordering doesn't matter).

But we also compare horizon effects between the release candidate and the last stable release. The ordering of meta does affect the ordering of horizon effects in some cases.

Here is an example comparing the effects response for a specific operation in horizon staging and horizon production

In staging the effects response looks like:

[
      {
        "id": "0207289144773263361-0000000001",
        "paging_token": "207289144773263361-1",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "trustline_deauthorized",
        "type_i": 24,
        "created_at": "2023-09-24T22:46:03Z",
        "trustor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4",
        "asset_type": "credit_alphanum12",
        "asset_code": "NICKEL"
      },
     ..., // other effects omitted for brevity  
      {
        "id": "0207289144773263361-0000000008",
        "paging_token": "207289144773263361-8",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000009",
        "paging_token": "207289144773263361-9",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000010",
        "paging_token": "207289144773263361-10",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "liquidity_pool_removed",
        "type_i": 94,
        "created_at": "2023-09-24T22:46:03Z",
        "liquidity_pool_id": "3e1337116f38a6ff6b4161cb4455ea4f3f2777562e52dbeffebaa0582cc7e561"
      }
    ]

In production the effects response looks like:

[
      {
        "id": "0207289144773263361-0000000001",
        "paging_token": "207289144773263361-1",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "trustline_deauthorized",
        "type_i": 24,
        "created_at": "2023-09-24T22:46:03Z",
        "trustor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4",
        "asset_type": "credit_alphanum12",
        "asset_code": "NICKEL"
      },
      {
        "id": "0207289144773263361-0000000008",
        "paging_token": "207289144773263361-8",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
     ..., // other effects omitted for brevity  
      {
        "id": "0207289144773263361-0000000009",
        "paging_token": "207289144773263361-9",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "claimable_balance_sponsorship_created",
        "type_i": 69,
        "created_at": "2023-09-24T22:46:03Z",
        "balance_id": "00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199",
        "sponsor": "GDWAC25VFDIIX442O34FEKFPEALTRDUN5LU3ANNIDWO25JJMBAI2JXC4"
      },
      {
        "id": "0207289144773263361-0000000010",
        "paging_token": "207289144773263361-10",
        "account": "GDPCYRFMTUIGB37GJ76LNJP7NXUENBACS5QTBLMICFIOYAJNGMGHBJIU",
        "type": "liquidity_pool_removed",
        "type_i": 94,
        "created_at": "2023-09-24T22:46:03Z",
        "liquidity_pool_id": "3e1337116f38a6ff6b4161cb4455ea4f3f2777562e52dbeffebaa0582cc7e561"
      }
    ]

The responses are almost identical. The only difference is in the claimable_balance_sponsorship_created effects. In horizon staging 00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199 is first and is followed by 00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee. In horizon production the order is reversed: 00000000ba96b670dfbd77fe9cf207336439806022dd77dfd1fcdda0f3a8c48559e11dee is first and is followed by 00000000362c689c6734eca6de952a0cbb4eaf28545885307e66710747aa03935f446199.

After doing some digging, I found that this issue of random tx meta ordering affecting the effects response has occurred before in #3942 . If you look at the conversation in the issue, @2opremio argues that core should provide a deterministic ordering and he was met with objections from Jon Jove who was concerned about the impact of core's performance.

The way we fixed this issue is by adding sorting logic locally in the ingestion processor:

883ccb3

We could fix this new issue in the same way by adding sorting logic locally in this part of the code:

go/services/horizon/internal/ingest/processors/effects_processor.go

Lines 262 to 267 in facabfc

    
           for _, change := range changes { 
        
           	if err = wrapper.addLedgerEntrySponsorshipEffects(change); err != nil { 
        
           		return nil, err 
        
           	} 
        
           	wrapper.addSignerSponsorshipEffects(change) 
        
           }

However, I believe a more robust solution would be sorting tx meta at the source either in core (the ideal solution) or from the ingestion library which reads tx meta from core (which is what this PR does).

sreuland · 2023-10-05T18:07:07Z

ingest/change.go

+// but if that is the case, we fall back to the original ordering of the changes
+// by using a stable sorting algorithm.
+func sortChanges(changes []Change) []Change {
+	sort.Stable(newSortableChanges(changes))


since sortableChanges can mutate the underlying array in changes passed from caller as part of Swap method, should a copy of changes be made to avoid side effect?

the intent was to mutate changes and avoid making a copy of the slice. I can make sortChanges() have a void return value to make that intent more clear

ok, that works also, thanks!

sreuland · 2023-10-05T18:24:53Z

ingest/ledger_change_reader.go

@@ -139,6 +139,7 @@ func (r *LedgerChangeReader) Read() (Change, error) {
 				Post: nil,
 			}
 		}
+		changes = sortChanges(changes)


this now results in all changes being (re)sorted, which is great for the goal stated, do you think it may have potential to trigger false warnings from verify-range as that compares history table dumps, at least in the interim while the production dumps are from a horizon release prior to the one having this change, maybe it's just an aspect worth mentioning in the changelog notes?

do you think it may have potential to trigger false warnings from verify-range ?

yes, that's true but we actually already have false positives from verify-range due to random ordering of tx meta from core. In fact, I discovered this issue by debugging verify-range failures on the soroban release candidate.

The aim of this PR is to fix this issue for all releases going forward. If we want to compare a release candidate with an older version then we would need to cherry-pick this commit which sorts the ingestion changes deterministically on top of the older version. Then we could run verify-range on the release candidate and compare it against the older version + cherry-picked commit.

Good call about adding this to the changelog notes. I can do that

MonsieurNicolas · 2023-10-05T20:55:27Z

Thanks @tamirms for the background. I didn't realize this was needed for effects that don't have good global "IDs".
Now that I understand what this is, I think this change looks fine as Horizon may emit effects from basically anything, right?

Soroban-RPC / other ingesters may not need this though (but maybe we have a lot of spare CPU there)?

As far as doing the sort at the core layer: as we'd be looking at some pretty bad migration (to eventually remove this from Horizon), it does not seem to be worth doing at this time (we may revisit this if we do SPVs/parallel execution).

tamirms · 2023-10-06T08:08:03Z

Now that I understand what this is, I think this change looks fine as Horizon may emit effects from basically anything, right?

Yes, horizon effects try to correlate a meta change to a specific operation. In the cases where there is more than 1 meta change of the same ledger entry type associated to an operation we can ran into ordering issues unless we perform a sort.

Soroban-RPC / other ingesters may not need this though (but maybe we have a lot of spare CPU there)?

Yes, this is true, soroban-rpc does not do anything similar to effects so it does not need tx meta to be ordered consistently

…order (stellar#5070)

Extract ledger entry changes from Tx Meta in a deterministic order

8854037

tamirms requested a review from a team October 3, 2023 08:26

fix test

2b2048b

MonsieurNicolas mentioned this pull request Oct 3, 2023

Fix bug 3952 by normalizing LedgerCloseMeta on meta stream stellar/stellar-core#3954

Closed

tamirms added 2 commits October 5, 2023 09:51

Remove cb sorting now that tx meta order is deterministic

111f22d

fix TestLiquidityPoolRevoke

4a94d3b

sreuland reviewed Oct 5, 2023

View reviewed changes

sreuland approved these changes Oct 5, 2023

View reviewed changes

code review fixes

48fc0e8

tamirms merged commit 4ecb643 into stellar:release-horizon-v2.27.0 Oct 6, 2023

tamirms added a commit to tamirms/go that referenced this pull request Dec 11, 2023

ingest: Extract ledger entry changes from Tx Meta in a deterministic …

136d71b

…order (stellar#5070)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

tamirms commented Oct 3, 2023

MonsieurNicolas commented Oct 3, 2023

MonsieurNicolas commented Oct 3, 2023

tamirms commented Oct 3, 2023

MonsieurNicolas commented Oct 4, 2023

tamirms commented Oct 5, 2023

sreuland Oct 5, 2023

tamirms Oct 5, 2023

sreuland Oct 5, 2023

sreuland Oct 5, 2023

tamirms Oct 5, 2023

MonsieurNicolas commented Oct 5, 2023

tamirms commented Oct 6, 2023

ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

ingest: Extract ledger entry changes from Tx Meta in a deterministic order #5070

Conversation

tamirms commented Oct 3, 2023

PR Structure

Thoroughness

Release planning

What

Known limitations

MonsieurNicolas commented Oct 3, 2023

MonsieurNicolas commented Oct 3, 2023

tamirms commented Oct 3, 2023

MonsieurNicolas commented Oct 4, 2023

tamirms commented Oct 5, 2023

sreuland Oct 5, 2023

Choose a reason for hiding this comment

tamirms Oct 5, 2023

Choose a reason for hiding this comment

sreuland Oct 5, 2023

Choose a reason for hiding this comment

sreuland Oct 5, 2023

Choose a reason for hiding this comment

tamirms Oct 5, 2023

Choose a reason for hiding this comment

MonsieurNicolas commented Oct 5, 2023

tamirms commented Oct 6, 2023