fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

wacban · 2024-09-02T16:48:30Z

For now just a request for comments. There are still a few missing pieces like adding tests and gating this feature by a protocol feature. Please let me know if this approach makes sense on a high level.

This PR addresses the issue described in #11923. Please have a look at that first for context.

This is a hacky implementation of option 1:
"Store the receipts together with the gas and size when added and use that when removing the receipt from buffer."

The basic idea is that instead of storing Receipt directly in state, it can be wrapped in a StateStoredReceipt together with the metadata that is needed to correctly handle protocol upgrades.

In order to migrate from the old way of storing receipts I'm using a rather hacky solution. I implemented a custom serialization and deserialization for the StateStoredReceipt. Using that customization it's possible to discriminate between serialized Receipt and serialized StateStoredReceipt. I added a helper struct ReceiptOrStateStoredReceipt and I made it so that both Receipt and StateStoredReceipt can be deserialized directly to it.

How to differentiate between Receipt and StateStoredReceipt ?

Receipt::V0 has first two bytes in the form of [x, 0] - the second byte is zero.
Receipt::V1 has first two bytes in the form of [1, x] - where x != 0 - the first byte is one and second is non-zero.
Receipt::Vn (future) has first two bytes in the form of [n, x] - where x != 0
StateStoredReceipt has first two bytes in the for of [T, T] where T is a magic tag == u8::MAX.

The StateStoredReceipt can be told apart from Receipt::V0 by the second byte.
The StateStoredReceipt can be told apart from Receipt::Vn by the first byte.
Receipt::V0 and Receipt::V1 can be told apart by the second byte as described in

nearcore/core/primitives/src/receipt.rs

Lines 105 to 118 in 0e5e7c7

    
           impl BorshDeserialize for Receipt { 
        
               /// Deserialize based on the first and second bytes of the stream. For V0, we do backward compatible deserialization by deserializing 
        
               /// the entire stream into V0. For V1, we consume the first byte and then deserialize the rest. 
        
               fn deserialize_reader<R: Read>(reader: &mut R) -> std::io::Result<Self> { 
        
                   let u1 = u8::deserialize_reader(reader)?; 
        
                   let u2 = u8::deserialize_reader(reader)?; 
        
                   let u3 = u8::deserialize_reader(reader)?; 
        
                   let u4 = u8::deserialize_reader(reader)?; 
        
                   // This is a ridiculous hackery: because the first field in `ReceiptV0` is an `AccountId` 
        
                   // and an account id is at most 64 bytes, for all valid `ReceiptV0` the second byte must be 0 
        
                   // because of the littel endian encoding of the length of the account id. 
        
                   // On the other hand, for `ReceiptV0`, since the first byte is 1 and an account id must have nonzero 
        
                   // length, so the second byte must not be zero. Therefore, we can distinguish between the two versions 
        
                   // by looking at the second byte.

How will the migration from Receipt to StateStoredReceipt happen?

In the old protocol version receipts will be stored as Receipt
In the new protocol version receipts will be stored as StateStoredReceipt
In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

wacban · 2024-09-02T16:49:43Z

@jakmeier and @jancionear Can you have a look?

jakmeier · 2024-09-04T15:51:13Z

Please let me know if this approach makes sense on a high level.

Thanks for preparing this. I think the approach can work out but I have a few thougts on it.

In order to migrate from the old way of storing receipts I'm using a rather hacky solution.

Indeed it's quite hacky, the protocol specification becomes a real nightmare as we add layers of serialization hacks upon each ohter.

How will the migration from Receipt to StateStoredReceipt happen?

In the old protocol version receipts will be stored as Receipt

In the new protocol version receipts will be stored as StateStoredReceipt

In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

I never found a written down explanation for why exactly a bijective de-serialization is needed. My guess is it has to do with consensus, where a malicious actor could provide 2 different state roots to peers, where both represent the same data but serialized differently. Proofing which of the two is wrong would be impossible without fully recomputing it, since they contain the exact same data when deserisalized.

I don't know for sure if we still care about this property, you guys know much better than I how consensus under stateless validation works. I just wanted to bring it to your attention. I imagine you could change the implementation to become bijective again with a bit of extra effort.

That's my thoughts so far on your implementation. I hope it helps.

wacban · 2024-09-04T16:49:42Z

@jakmeier Thanks!

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

Yeah it's true that the same byte array may be deserialized differently but in the implementation we always deserialize as ReceiptOrStateStoredReceipt. It's not possible to serialize the same data in two different ways. The Receipt and ReceiptOrStateStoredReceipt will be serialized exactly the same. Same for StateStoredReceipt.

Still it is very hacky.

I imagine you could change the implementation to become bijective again with a bit of extra effort.

I'm not quite sure how to do that. The constraint is that the new implementation needs to support the old data format. I guess we could attempt something along the lines of "try parse as Receipt, otherwise parse as StateStoredReceipt but that's just another way of expressing the double deserialization.

jakmeier · 2024-09-04T20:23:40Z

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

Yeah it's true that the same byte array may be deserialized differently but in the implementation we always deserialize as ReceiptOrStateStoredReceipt. It's not possible to serialize the same data in two different ways. The Receipt and ReceiptOrStateStoredReceipt will be serialized exactly the same. Same for StateStoredReceipt.

Yeah, but it's still possible to construct two different state roots that both work perfectly fine as the previous state root, if you are trying to do something malicious that requires it. As you are describing it, it seems we only need it to be a surjective serialization, not bijective. But borsh.io specifically says it's bijecive, so I always thought that is relevant.

Before SV, I believe challanges were a major concern, that would require that a malicious state root can be proven to be invalid without downloading all the state and doing the full computation for a state transition. This assumption would be broken if you can have two state roots that contain the exact same state, thus it needs to be bijective.

Honestly, I'm not sure how much of problem that is with SV. As I understand it, chunk producers, as well as stateless validators, always start with a state root that is verified by a block hash, which itself has been verified using headersync from genesis. Therefore, there is no way to trick them into using the wrong state root, even if it would work to produce the same outcome. A bijection seems good enough.

The problem would be if you had some code that just checks that the transition from state root A to B is correct, without verifiying that the previous state root is correct. I think there were discussions about a design like that, where transitions are verified and signed independent of where exactly they are in the block chain. But it hasn't been implemented like this, right?

I imagine you could change the implementation to become bijective again with a bit of extra effort.

I'm not quite sure how to do that. The constraint is that the new implementation needs to support the old data format. I guess we could attempt something along the lines of "try parse as Receipt, otherwise parse as StateStoredReceipt but that's just another way of expressing the double deserialization.

Sorry I didn't mean it as, just try a bit harder :P
I meant, if we are willing to make the code quite a bit more complex, it could work in a bijective way.
Re-reading that, I've expressed it very poorly.

What I had in mind was some way to always know which version to expect when deserializing. That requires extra house-keeping until the protocol upgrade has happened. Like, when the protocol upgrade happens, take a note how many receipts there are in each queue and use the old deserialization for those. This would be quite a bit of extra code but it wouldn't be too complicated and it could be removed again once the upgrade is over.
Or maybe a data migration on upgrade could also be viable, assuming there aren't too many receipts in the queues? Not sure.

wacban · 2024-09-05T10:17:43Z

Yeah, but it's still possible to construct two different state roots that both work perfectly fine as the previous state root, if you are trying to do something malicious that requires it. As you are describing it, it seems we only need it to be a surjective serialization, not bijective. But borsh.io specifically says it's bijecive, so I always thought that is relevant.

Ah I think I understand what you mean. We can have the same Receipt placed in state as either Receipt or StateStoredReceipt. I think it is the case with every versioned struct that we have. Not to search too far a Receipt can be either ReceiptV0 or ReceiptV1 and given priorities are not finished it wouldn't make any semantic difference.

However in both cases only one of the possible representations is a result of a correct state transition. What structure should be used is determined by the protocol version. A malicious actor can't just pick and choose which one to use. Worth mentioning that this isn't implemented yet in this PR, I have TODOs in place for that where use_state_stored_receipt should be obtained from the runtime config.

That being said I'm no longer that confident that it's ok to break the bijectivity. @bowenwang1996 Can you share your wisdom on this PR? Is this hackery fine?

Or maybe a data migration on upgrade could also be viable, assuming there aren't too many receipts in the queues? Not sure.

I like that idea although I'm still leaning towards this PR since it is almost ready :)

jakmeier · 2024-09-05T12:15:44Z

I think it is the case with every versioned struct that we have. Not to search too far a Receipt can be either ReceiptV0 or ReceiptV1 and given priorities are not finished it wouldn't make any semantic difference.

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

How will the migration from Receipt to StateStoredReceipt happen?

In the old protocol version receipts will be stored as Receipt

In the new protocol version receipts will be stored as StateStoredReceipt

In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

Or maybe I misunderstood, as the protocol version specific behaviour is not implemented, yet. As things are implemented right now, you preserve the version on serialization, which means serialization is still bijective.

Thinking about it again, I don't see why you even need to store as StateStoredReceipt based on the protocol version. Just store as ReceiptOrStateStoredReceipt instead of as StateStoredReceipt in the new version and it should be fine.
Or did you have a reason why you wanted to store as StateStoredReceipt in the new protocol version?

wacban · 2024-09-05T12:48:01Z

I added the protocol feature for this change and the handling for protocol upgrade.

abbr:
R = Receipt
SSR = StateStoredReceipt
RoSSR = ReceiptOrStateStoredReceipt

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

What I proposed is the following:

Serialization depends on protocol version:
- Before this feature is enabled receipts are serialized and stored as Receipt
- After this feature is enabled receipts are serialized and stored as SSR / RoSSR
  - SSR and RoSSR serialize to exactly the same bytes so either interpretation is correct
Deserialization is always to RoSSR.
- When deserializing R it will become RoSSR::Receipt
- When deserializing SSR it will become RoSSR::StateStoredReceipt

Other way that I think about it is that in State it looks like either R or SSR and RoSSR is just an implementation detail helper for conveniently handling this union.

core/primitives/src/receipt.rs

jakmeier · 2024-09-09T20:28:03Z

abbr: R = Receipt SSR = StateStoredReceipt RoSSR = ReceiptOrStateStoredReceipt

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

What I proposed is the following:
* Serialization depends on protocol version:
  
  * Before this feature is enabled receipts are serialized and stored as Receipt
  * After this feature is enabled receipts are serialized and stored as `SSR` / `RoSSR`
    
    * `SSR` and `RoSSR` serialize to exactly the same bytes so either interpretation is correct

* Deserialization is always to `RoSSR`.
  
  * When deserializing `R` it will become `RoSSR::Receipt`
  * When deserializing `SSR` it will become `RoSSR::StateStoredReceipt`
Other way that I think about it is that in State it looks like either R or SSR and RoSSR is just an implementation detail helper for conveniently handling this union.

Thanks for writing it down so clearly and also for implementing the protocol specific behaviour, as well as the tests that show the behaviour exactly.

I guess you could say R and RoSSR::Receipt are really the same value, for all we care. They serialize to the same bytes and they are treated exactly the same in code. The same goes for SSR and RoSSR::StateStoredReceipt. I thought you wanted to serialize R and RoSSR::Receipt to different bytes, which is why I thought bijection is somehow broken.

This makes sense. Sorry for the derailed discussion, your first response should have cleared this up but I didn't understand it.

Still, I'm not a fan of the code complexity. The manual tagging, even if well-justified, is unintuitive in my opinion. Sadly, I couldn't find a better alternative... :( As I see it, if we want to store the metadata alongside the receipts, I think we are stuck with this solution. Right now, the code doesn't worry me, only the bit rotting makes me scratch my head.

(For a worse alternative, if you are curious, I thought about putting the metadata at the end of the struct. Then you check the length to figure out if you should deserialize as (Receipt,()) or (Receipt, metadata). At first glance, it seemed more intuitive than 3 magic bytes intertwined. But to be honest, it's even worse to maintain. Especially if new receipt versions are added with different lengths, suddenly it might be impossible to to distinguish all cases.)

All of this makes me super glad that you thought of making StateStoredReceipt versioned! 😁 👍

All said, I have nothing against this PR to block it but only reviewed it on a high level. If you want a detailed review for the +1 from me, ping me when it's ready and I'll take another look.

wacban · 2024-09-09T22:01:11Z

@jakmeier Thanks! Yeah I'm not too happy about it either but I think it's the best we can do given circumstances. I'm just glad it's localized to the serialization and doesn't leak into apply. We discussed it today in the chain sync meeting and agreed to proceed with this solution.

@jancionear Can you do a full review now please? The tests seem to be failing because I forgot to regen some .snap files, I'll fix it tomorrow.

jancionear

Looks nice 👍
Left some comments, I think we could include it in protocol 72.

core/parameters/src/config_store.rs

core/primitives/src/receipt.rs

runtime/runtime/src/congestion_control.rs

runtime/runtime/src/lib.rs

tools/state-viewer/src/congestion_control.rs

codecov · 2024-09-10T14:39:41Z

Codecov Report

Attention: Patch coverage is 82.62295% with 53 lines in your changes missing coverage. Please review.

Project coverage is 71.57%. Comparing base (e516989) to head (4c909b8).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
core/primitives/src/receipt.rs	89.32%	7 Missing and 12 partials ⚠️
runtime/runtime/src/congestion_control.rs	80.55%	1 Missing and 13 partials ⚠️
runtime/runtime/src/balance_checker.rs	42.85%	4 Missing and 4 partials ⚠️
core/store/src/lib.rs	0.00%	3 Missing ⚠️
runtime/runtime/src/lib.rs	50.00%	3 Missing ⚠️
tools/state-viewer/src/congestion_control.rs	0.00%	2 Missing ⚠️
chain/client/src/test_utils/test_env.rs	80.00%	0 Missing and 1 partial ⚠️
core/parameters/src/parameter_table.rs	0.00%	0 Missing and 1 partial ⚠️
core/store/src/trie/receipts_column_helper.rs	93.33%	0 Missing and 1 partial ⚠️
...me-params-estimator/src/costs_to_runtime_config.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #12031      +/-   ##
==========================================
+ Coverage   71.55%   71.57%   +0.01%     
==========================================
  Files         821      821              
  Lines      165172   165425     +253     
  Branches   165172   165425     +253     
==========================================
+ Hits       118186   118400     +214     
- Misses      41843    41870      +27     
- Partials     5143     5155      +12

Flag	Coverage Δ
backward-compatibility	`0.17% <0.00%> (-0.01%)`	⬇️
db-migration	`0.17% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.25% <0.00%> (-0.01%)`	⬇️
integration-tests	`38.73% <51.80%> (+<0.01%)`	⬆️
linux	`71.35% <82.62%> (+0.02%)`	⬆️
linux-nightly	`71.14% <82.62%> (+0.01%)`	⬆️
macos	`54.05% <54.93%> (+<0.01%)`	⬆️
pytests	`1.51% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.32% <0.00%> (-0.01%)`	⬇️
unittests	`65.30% <78.68%> (+0.03%)`	⬆️
upgradability	`0.21% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

core/parameters/src/snapshots/near_parameters__config_store__tests__146.json.snap

core/primitives/src/receipt.rs

integration-tests/src/tests/client/features/congestion_control.rs

jancionear

Sorry for the late review, I had to go debug issues with 2.2.0 and it consumed a lot of my time.

core/primitives-core/src/version.rs

integration-tests/src/tests/client/features/congestion_control.rs

core/parameters/res/runtime_configs/72.yaml

core/store/src/trie/receipts_column_helper.rs

Remove lifetimes from types generated by ProtocolSchema tooling. Needed for #12031.

jancionear

lgtm 👍, left some nits

core/store/src/trie/mem/mem_tries.rs

integration-tests/src/tests/client/features/congestion_control.rs

runtime/runtime/src/congestion_control.rs

runtime/runtime/src/lib.rs

tools/state-viewer/src/congestion_control.rs

wacban requested a review from a team as a code owner September 2, 2024 16:48

wacban requested review from saketh-are, jakmeier and jancionear and removed request for saketh-are September 2, 2024 16:48

jancionear reviewed Sep 5, 2024

View reviewed changes

core/primitives/src/receipt.rs Outdated Show resolved Hide resolved

wacban force-pushed the waclaw-cc branch from 52d7ba6 to cc9ecc5 Compare September 6, 2024 12:56

jancionear reviewed Sep 10, 2024

View reviewed changes

wacban requested a review from jancionear September 10, 2024 15:26

jancionear reviewed Sep 10, 2024

View reviewed changes

core/parameters/src/snapshots/near_parameters__config_store__tests__146.json.snap Outdated Show resolved Hide resolved

core/primitives/src/receipt.rs Show resolved Hide resolved

wacban force-pushed the waclaw-cc branch from c188622 to f8abec0 Compare September 11, 2024 13:56

wacban commented Sep 11, 2024

View reviewed changes

integration-tests/src/tests/client/features/congestion_control.rs Show resolved Hide resolved

wacban requested a review from jancionear September 11, 2024 14:03

jancionear reviewed Sep 16, 2024

View reviewed changes

wacban force-pushed the waclaw-cc branch 2 times, most recently from 1c56359 to 8742fbc Compare September 20, 2024 11:59

Longarithm mentioned this pull request Sep 20, 2024

feat: lifetimes in ProtocolSchema types #12120

Merged

github-merge-queue bot pushed a commit that referenced this pull request Sep 20, 2024

feat: lifetimes in ProtocolSchema types (#12120)

08065ff

Remove lifetimes from types generated by ProtocolSchema tooling. Needed for #12031.

wacban added 2 commits September 20, 2024 15:29

first draft

f59e897

serialization & deserialization

1938064

wacban added 14 commits September 20, 2024 15:29

uncomment tests

03b347f

nits

a900c1c

added config to handle the protocol upgrade to StateStoredReceipt

a731dbd

finishing the

eb34fd1

disable ChunkEndorsementsInBlockHeader

dbd8b21

added unit tests

55fa1d8

address some comments, fix tests

bb82dcf

cow

f914070

nits

9f3c83d

test and more comments

ed6774d

undo nightly protocol version change

591cd23

fix some tests

cbbf878

fix more tests, add comments

49bf9bf

fix protocol schema check

d1e0656

wacban force-pushed the waclaw-cc branch from 8742fbc to d1e0656 Compare September 20, 2024 15:14

wacban requested a review from jancionear September 20, 2024 15:35

jancionear approved these changes Sep 23, 2024

View reviewed changes

wacban and others added 4 commits September 23, 2024 14:35

split setup_runtime to setup_real_runtime and setup_test_runtime

cfca0e3

address other comments

f1923a7

Merge branch 'master' into waclaw-cc

39d3ff7

post merge fix

4c909b8

wacban added this pull request to the merge queue Sep 23, 2024

Merged via the queue into master with commit 7ad85a6 Sep 23, 2024
27 of 29 checks passed

wacban deleted the waclaw-cc branch September 23, 2024 16:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

wacban commented Sep 2, 2024

wacban commented Sep 2, 2024

jakmeier commented Sep 4, 2024

wacban commented Sep 4, 2024

jakmeier commented Sep 4, 2024

wacban commented Sep 5, 2024

jakmeier commented Sep 5, 2024 •

edited

Loading

wacban commented Sep 5, 2024

jakmeier commented Sep 9, 2024

wacban commented Sep 9, 2024

jancionear left a comment

codecov bot commented Sep 10, 2024 •

edited

Loading

jancionear left a comment

jancionear left a comment

	impl BorshDeserialize for Receipt {
	/// Deserialize based on the first and second bytes of the stream. For V0, we do backward compatible deserialization by deserializing
	/// the entire stream into V0. For V1, we consume the first byte and then deserialize the rest.
	fn deserialize_reader<R: Read>(reader: &mut R) -> std::io::Result<Self> {
	let u1 = u8::deserialize_reader(reader)?;
	let u2 = u8::deserialize_reader(reader)?;
	let u3 = u8::deserialize_reader(reader)?;
	let u4 = u8::deserialize_reader(reader)?;
	// This is a ridiculous hackery: because the first field in `ReceiptV0` is an `AccountId`
	// and an account id is at most 64 bytes, for all valid `ReceiptV0` the second byte must be 0
	// because of the littel endian encoding of the length of the account id.
	// On the other hand, for `ReceiptV0`, since the first byte is 1 and an account id must have nonzero
	// length, so the second byte must not be zero. Therefore, we can distinguish between the two versions
	// by looking at the second byte.

fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

Conversation

wacban commented Sep 2, 2024

wacban commented Sep 2, 2024

jakmeier commented Sep 4, 2024

wacban commented Sep 4, 2024

jakmeier commented Sep 4, 2024

wacban commented Sep 5, 2024

jakmeier commented Sep 5, 2024 • edited Loading

wacban commented Sep 5, 2024

jakmeier commented Sep 9, 2024

wacban commented Sep 9, 2024

jancionear left a comment

Choose a reason for hiding this comment

codecov bot commented Sep 10, 2024 • edited Loading

Codecov Report

jancionear left a comment

Choose a reason for hiding this comment

jancionear left a comment

Choose a reason for hiding this comment

jakmeier commented Sep 5, 2024 •

edited

Loading

codecov bot commented Sep 10, 2024 •

edited

Loading