Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: implemented a test for handling incoming receipts during resharding #9467

Merged
merged 8 commits into from
Sep 6, 2023

Conversation

wacban
Copy link
Contributor

@wacban wacban commented Aug 31, 2023

Implemented an integration test checking if incoming receipts are correctly handled during resharding. They are not. The test is ignored and I will work on a solution independently. (This is an attempt at TDD, let's see how that goes).

The current test setup allows for skipping all chunks in a block but in that case there wouldn't be any incoming receipts generated in that block so it doesn't quite cut it. Instead I added a DropChunkCondition that allows dropping individual chunks from a block.

In the 1->4 resharding it doesn't make much sense because before resharding there is always one shard so dropping doesn't trigger the error. In the 4->5 resharding the error is triggered e.g. when only one chunk is dropped at the last block before switching to new shard layout.

I also added some other minor changes and refactorings that came up when debugging this case.

The test fails with the following error:

thread 'tests::client::sharding_upgrade::test_shard_layout_upgrade_incoming_receipts_impl_v2' panicked at 'assertion failed: `Failure(ActionError(ActionError { index: Some(0), kind: AccountDoesNotExist { account_id: AccountId("test1") } }))` does not match `ExecutionStatusView::SuccessValue(_)`', integration-tests/src/tests/client/sharding_upgrade.rs:512:21

And here are some other relevant logs:

4.802s DEBUG {client=AId("test1")}:produce_block{next_height=9}: client: AId("test1") Producing block at height 9, parent 8 @ AXDd, 4 new chunks                                                                                                                       [77/149]
 4.849s DEBUG process block{client=0}:on_block_accepted{prev_block_hash=ZQod shard_id=0}:produce_chunk{next_height=10 shard_id=0 epoch_id=EpochId(J233)}: client: Producing chunk at height 10 for shard 0, I'm test0
 4.899s DEBUG process block{client=1}:on_block_accepted{prev_block_hash=ZQod shard_id=1}:produce_chunk{next_height=10 shard_id=1 epoch_id=EpochId(J233)}: client: Producing chunk at height 10 for shard 1, I'm test1
 4.949s DEBUG process block{client=2}:on_block_accepted{prev_block_hash=ZQod shard_id=2}:produce_chunk{next_height=10 shard_id=2 epoch_id=EpochId(J233)}: client: Producing chunk at height 10 for shard 2, I'm test2
 5.147s DEBUG {client=AId("test1")}:produce_block{next_height=10}: client: AId("test1") Producing block at height 10, parent 9 @ ZQod, 3 new chunks
 5.161s DEBUG process block{client=0}:on_block_accepted{prev_block_hash=AFny shard_id=0}:produce_chunk{next_height=11 shard_id=0 epoch_id=EpochId(HACN)}: client: Producing chunk at height 11 for shard 0, I'm test0
 5.166s DEBUG process block{client=0}:on_block_accepted{prev_block_hash=AFny shard_id=4}:produce_chunk{next_height=11 shard_id=4 epoch_id=EpochId(HACN)}: client: Producing chunk at height 11 for shard 4, I'm test0
 5.188s DEBUG process block{client=1}:on_block_accepted{prev_block_hash=AFny shard_id=1}:produce_chunk{next_height=11 shard_id=1 epoch_id=EpochId(HACN)}: client: Producing chunk at height 11 for shard 1, I'm test1
 5.207s DEBUG process block{client=2}:on_block_accepted{prev_block_hash=AFny shard_id=2}:produce_chunk{next_height=11 shard_id=2 epoch_id=EpochId(HACN)}: client: Producing chunk at height 11 for shard 2, I'm test2
 5.226s DEBUG process block{client=3}:on_block_accepted{prev_block_hash=AFny shard_id=3}:produce_chunk{next_height=11 shard_id=3 epoch_id=EpochId(HACN)}: client: Producing chunk at height 11 for shard 3, I'm test3

 5.450s DEBUG {client=AId("test0")}:produce_block{next_height=11}: client: AId("test0") Producing block at height 11, parent 10 @ AFny, 5 new chunks
 5.460s DEBUG process block{client=0}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard found receipts from block with missing chunk shard_id=3 last_chunk_height_included=9 block_hash=AFny
 5.461s DEBUG process block{client=0}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard could not find receipts from block with missing chunk shard_id=4 last_chunk_height_included=9 block_hash=AFny err=DBNot
FoundErr("INCOMING RECEIPT: AFny 4")
 5.511s DEBUG process block{client=0}:on_block_accepted{prev_block_hash=8Tsb shard_id=0}:produce_chunk{next_height=12 shard_id=0 epoch_id=EpochId(HACN)}: client: Producing chunk at height 12 for shard 0, I'm test0
 5.516s DEBUG process block{client=0}:on_block_accepted{prev_block_hash=8Tsb shard_id=4}:produce_chunk{next_height=12 shard_id=4 epoch_id=EpochId(HACN)}: client: Producing chunk at height 12 for shard 4, I'm test0
 5.534s DEBUG process block{client=1}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard found receipts from block with missing chunk shard_id=3 last_chunk_height_included=9 block_hash=AFny
 5.534s DEBUG process block{client=1}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard could not find receipts from block with missing chunk shard_id=4 last_chunk_height_included=9 block_hash=AFny err=DBNot
FoundErr("INCOMING RECEIPT: AFny 4")
 5.585s DEBUG process block{client=1}:on_block_accepted{prev_block_hash=8Tsb shard_id=1}:produce_chunk{next_height=12 shard_id=1 epoch_id=EpochId(HACN)}: client: Producing chunk at height 12 for shard 1, I'm test1
 5.600s DEBUG process block{client=2}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard found receipts from block with missing chunk shard_id=3 last_chunk_height_included=9 block_hash=AFny
 5.600s DEBUG process block{client=2}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard could not find receipts from block with missing chunk shard_id=4 last_chunk_height_included=9 block_hash=AFny err=DBNot
FoundErr("INCOMING RECEIPT: AFny 4")
 5.650s DEBUG process block{client=2}:on_block_accepted{prev_block_hash=8Tsb shard_id=2}:produce_chunk{next_height=12 shard_id=2 epoch_id=EpochId(HACN)}: client: Producing chunk at height 12 for shard 2, I'm test2
 5.666s DEBUG process block{client=3}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard found receipts from block with missing chunk shard_id=3 last_chunk_height_included=9 block_hash=AFny
 5.667s DEBUG process block{client=3}:start_process_block_impl{height=11}:apply_chunks_preprocessing: near_chain::store: get_incoming_receipts_for_shard could not find receipts from block with missing chunk shard_id=4 last_chunk_height_included=9 block_hash=AFny err=DBNot
FoundErr("INCOMING RECEIPT: AFny 4")
 5.708s DEBUG process block{client=3}:on_block_accepted{prev_block_hash=8Tsb shard_id=3}:produce_chunk{next_height=12 shard_id=3 epoch_id=EpochId(HACN)}: client: Producing chunk at height 12 for shard 3, I'm test3

@wacban wacban marked this pull request as ready for review August 31, 2023 12:03
@wacban wacban requested a review from a team as a code owner August 31, 2023 12:03
@wacban wacban requested a review from akhi3030 August 31, 2023 12:03
@wacban wacban changed the title Waclaw resharding incoming test: implemented a test for handling incoming receipts during resharding Aug 31, 2023
@wacban wacban requested a review from nikurt August 31, 2023 12:43
@pugachAG pugachAG self-requested a review September 4, 2023 09:27
let mut receipts = collect_receipts(incoming_receipts.get(&shard_id).unwrap());
let receipt_proof_response = &self.store().get_incoming_receipts_for_shard(
let new_receipts = collect_receipts(incoming_receipts.get(&shard_id).unwrap());
let old_receipts = &self.store().get_incoming_receipts_for_shard(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why reassigning old_receipts is needed here, maybe let's just do

let old_receipts = collect_receipts_from_response(&self.store().get_incoming_receipts_for_shard(...));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, just personal preference for not having too long lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far I haven't seen that pattern in our code, so I suggest to keep it consistent and not introduce that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is idiomatic rust and personally I'm using it a lot so there is at least some of it :)
I don't think it has any disavantages and it does make code more readable. Some other usages are to change the type of a variable e.g. when parsing or removing mutability after it's no longer needed.

let number = "123";
let number = parse(number); // changes the type of number to return type of parse e.g. u32

let mut x = 0;
for ... { x += 1; }
let x = x; 

core/primitives/src/epoch_manager.rs Outdated Show resolved Hide resolved
core/primitives/src/epoch_manager.rs Outdated Show resolved Hide resolved
core/primitives/src/epoch_manager.rs Outdated Show resolved Hide resolved
integration-tests/src/tests/client/sharding_upgrade.rs Outdated Show resolved Hide resolved
@@ -166,21 +241,21 @@ impl TestShardUpgradeEnv {
/// layout V2 to be used once the appropriate protocol version is reached
fn step_impl(
&mut self,
p_drop_chunk: f64,
drop_chunk_condition: &DropChunkCondition,
protocol_version: ProtocolVersion,
resharding_type: &ReshardingType,
) {
let env = &mut self.env;
let mut rng = thread_rng();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest switching to a deterministic rand generator to make the test reproducible, you can use something like let mut rng = rand::rngs::StdRng::seed_from_u64(42)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, let me follow up and implement this in a separate PR. FYI this test is randomized on many different levels and I don't think it's possible to fix all of them but I'll try to cover most of it. I also think there is value in having the test run on different random seeds every time. Rather than making it deterministic I would aim to make it reproducible by picking the seed at random and printing it to stdout. This way we can reproduce the issue when debugging by hardcoding the seed to a known bad value but we still get the randomized coverage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think there is value in having the test run on different random seeds every time.

I would argue that there is not much value in that. If you want more coverage then just run the test with more data/samples. Having reproducible tests helps to avoid flakiness, which is a real issue in our codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, let me get the randomness to be seeded from a single place first and then we can continue the discussion.

integration-tests/src/tests/client/sharding_upgrade.rs Outdated Show resolved Hide resolved
integration-tests/src/tests/client/sharding_upgrade.rs Outdated Show resolved Hide resolved
@@ -98,6 +112,62 @@ struct TestShardUpgradeEnv {
num_clients: usize,
}

/// The condition that determines if a chunk should be produced of dropped.
/// Can be probabilistic, predefined based on height and shard id or both.
struct DropChunkCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this abstraction to be a bit weird for the following reasons:

  • it doesn't change state, by_height_shard_id has to be maintained from outside
  • the api has only 1 function

So in oder to use it we need to create a new instance every time just to call should_drop_chunk.
I suggest replacing it with a single function should_drop_chunk that takes probability and by_height_shard_id as a parameter. That would be much simpler and easier to understand.
Alternatively you can make it stateful and update by_height_shard_id every time should_drop_chunk is called (probably it makes sense to rename it something like fn maybe_produce_chunk(..) -> bool).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it should be used is to create one instance of it in a unit test and use it throughout. It is used like that in the new test but you're right that in the other ones I did a bad job of refactoring it - I'll fix that. I could replace it with a function but then I would need to pass both probability and by_height_shard_id which is why I prefer to put those in a struct.

@wacban wacban requested a review from pugachAG September 6, 2023 07:08
@wacban wacban added this pull request to the merge queue Sep 6, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 6, 2023
@wacban wacban added this pull request to the merge queue Sep 6, 2023
Merged via the queue into master with commit d747432 Sep 6, 2023
1 of 2 checks passed
@wacban wacban deleted the waclaw-resharding-incoming branch September 6, 2023 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants