Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Tests fail with --test-threads 1 #10479

Closed
ggwpez opened this issue Dec 13, 2021 · 13 comments · Fixed by #12036
Closed

Tests fail with --test-threads 1 #10479

ggwpez opened this issue Dec 13, 2021 · 13 comments · Fixed by #12036
Assignees
Labels
T1-runtime This PR/Issue is related to the topic “runtime”.

Comments

@ggwpez
Copy link
Member

ggwpez commented Dec 13, 2021

Observation

Many tests fail when run with --test-threads 1. I only tried to run the pallet tests, eg:

cargo test -p "pallet-*" --no-fail-fast -- --test-threads 1

It also seems to happen with --features=runtime-benchmarks.
Maybe someone can check it for tests outside of pallet-*. Polkadot and Cumulus took too long on my machine to compile.

Assumed problem(s)

Thread local vars

The first test that fails (assets/freezer_should_work) should be fixed after #10473 by line which resets a dirty thread local var.
I first spotted it by running Tarpaulin on Substrate, but Tarpaulin was not the problem.
The fact that the test used thread local vars and tarpaulin implicitly forces --test-threads 1 seems to cause at least some of the failures.
The following tests only fails with --test-threads 1:

use std::cell::RefCell;

thread_local! {
    static STICKY: RefCell<bool> = RefCell::new(false);
}

#[test]
fn tarpauline() {
    // STICKY must be false since we just started the test.
    assert!(!STICKY.with(|s| *s.borrow()));
    // Set to true.
    STICKY.with(|s| s.replace(true));
}

#[test]
fn tarpauline2() {
    // STICKY must be false since we just started the test.
    assert!(!STICKY.with(|s| *s.borrow())); // <- FAILS HERE
    // Set to true.
    STICKY.with(|s| s.replace(true));
}

As far as I understand it: cargo test normally schedules the tests on multiple threads, which means that the STICKY var is false for each #[test].
When --test-threads 1 is used, the tests run consecutively and the second test that is executed gets the sticky/dirty value of the first test function; causing the failure.

Other causes

I did not have the time to look into the other failures to see if they could all be failing because of dirty thread locals.
Maybe there is more amiss.

Next

Please confirm and decide if this should be fixed.
Theoretically this could also be changed in cargo test, such that it resets thread locals before each #[test] 🤔

@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Dec 13, 2021
@shawntabrizi shawntabrizi self-assigned this Jan 3, 2022
@wigy-opensource-developer
Copy link
Contributor

Rust does not have built-in setup/teardown support for test fixtures. But that does not mean we cannot use something like rstest. We seem to be using thread-local-storage for just easier fixture setup in these mocks and we want to use a fresh fixture for every test case, whatever thread they are running in.

@ggwpez ggwpez removed the J2-unconfirmed Issue might be valid, but it’s not yet known. label Jan 12, 2022
@ggwpez
Copy link
Member Author

ggwpez commented Jan 12, 2022

We seem to be using thread-local-storage for just easier fixture setup in these mocks and we want to use a fresh fixture for every test case, whatever thread they are running in.

Some of these locals are accessed in callbacks and not just for closure I think, so they need to be accessible in the scope surrounding the #[test] functions.
AFAIU the rstest only passes arguments to the test functions, right? @wigy-opensource-developer

One solution that I would not be particularly proud of:
Create a define_thread_locals! macro to define all thread locals and creates a clear_thread_locals! macro to clear all RefCell with RefCell::new(Default::default()). This could then be called in new_test_ext().execute_with.
PS: Would be the same construction as used in #10592

@shawntabrizi
Copy link
Member

shawntabrizi commented Jan 12, 2022

@ggwpez but what would be the final outcome we achieved here for additional complexity and work? Does not seem that really anyone executes tests like this anyway, which is why this is the first time it is being reported.

@wigy-opensource-developer
Copy link
Contributor

wigy-opensource-developer commented Jan 12, 2022

Some of these locals are accessed in callbacks and not just for closure I think, so they need to be accessible in the scope surrounding the #[test] functions. AFAIU the rstest only passes arguments to the test functions, right?

If we want independent tests, we need to create a fresh fixture for each of them. Because the Rust test runner starts each #[test] on a fresh stack, it is only static and thread local state from mock.rs that can make the tests interact with each other making them erratic. (I try to use terms correctly from http://xunitpatterns.com/ )

@ggwpez
Copy link
Member Author

ggwpez commented Jan 13, 2022

but what would be the final outcome we achieved here for additional complexity and work? Does not seem that really anyone executes tests like this anyway, which is why this is the first time it is being reported.

Well, this is kind of what I meant with

Please confirm and decide if this should be fixed.

I'm not sure if it is worth it. Might be worth testing on a single threaded system, to see it it happens there as well @shawntabrizi.
PS: It does not compile with the current Tarpaulin version, but if it turns out not working this could be a reason to fix it.

For example when I look at the tests for pallet-assets, I do not see how this could be done with test closures since they would need to be accessible for the FrozenBalance balance impl here. @wigy-opensource-developer

@wigy-opensource-developer
Copy link
Contributor

wigy-opensource-developer commented Jan 13, 2022

Yeah. So because the FrozenBalance trait does not take a proper &self or &mut self parameter on their methods, what can we do but use static state, right? Well, wrong!

The only reasonable way forward is to not treat every type definition in a #[pallet::config] a global variable. We hire Rust developers, but this is a completely different, limited language we can use in these macros, causing all kind of gotchas like this.

This should be completely viable for implementing FrozenBalance:

#[derive(Debug, Default)]
pub struct TestFreezer {
	frozen: HashMap<(u32, u64), u64>,
	hooks: Vec<Hook>,
}

impl FrozenBalance<u32, u64, u64> for TestFreezer {
	fn frozen_balance(&self, asset: u32, who: &u64) -> Option<u64> {
		self.frozen.get(&(asset, who.clone())).cloned()
	}

	fn died(&mut self, asset: u32, who: &u64) {
		self.hooks.push(Hook::Died(asset, who.clone()));
	}
}

impl TestFreezer {
	pub(crate) fn set_frozen_balance(&mut self, asset: u32, who: u64, amount: u64) {
		self.frozen.insert((asset, who), amount);
	}
	pub(crate) fn clear_frozen_balance(&mut self, asset: u32, who: u64) {
		self.frozen.remove(&(asset, who));
	}
	pub(crate) fn hooks(&self) -> Vec<Hook> {
		self.hooks.clone()
	}
}

@ggwpez
Copy link
Member Author

ggwpez commented Jan 13, 2022

I do not want to hold you back from fixing it. Just for me personally the priority of working on it is not be the highest.
You can open a MR if you want to fix it.
@wigy-opensource-developer

@wigy-opensource-developer
Copy link
Contributor

@shawntabrizi What I understand from the FRAME design is that all state is in the storage, and in order to emphasize the "statelessness" (purity) of the block function composed out of all this code, self is missing from all these things intentionally. The missing piece is testability, which requires dependency injection. Although this is an article about F#, but writer on several books on the topic, Mark Seemann describes why just for the sake of testability we need to allow impurity in the final assembled runtime: https://blog.ploeh.dk/2017/01/30/partial-application-is-dependency-injection/

@arkpar
Copy link
Member

arkpar commented Jan 13, 2022

Using TLS for tests looks like an anti-pattern to me. Why not just use the storage?
Something like this should work:

pub struct TestFreezer;
impl FrozenBalance<u32, u64, u64> for TestFreezer {
	fn frozen_balance(asset: u32, who: &u64) -> Option<u64> {
		let key: u128 = (asset as u128) << 64 | *who as u128;
		sp_io::storage::get(&key.to_le_bytes()).map(|x| u64::from_le_bytes(x.as_slice().try_into().unwrap()))
	}
}

pub(crate) fn set_frozen_balance(asset: u32, who: u64, amount: u64) {
	let key: u128 = (asset as u128) << 64 | who as u128;
	sp_io::storage::set(&key.to_le_bytes(), &amount.to_le_bytes());
}
pub(crate) fn clear_frozen_balance(asset: u32, who: u64) {
	let key: u128 = (asset as u128) << 64 | who as u128;
	sp_io::storage::clear(&key.to_le_bytes());
}

Alternatively let key = (asset, who).encode()

@wigy-opensource-developer
Copy link
Contributor

Using TLS for tests looks like an anti-pattern to me. Why not just use the storage? Something like this should work

Will you create a fresh storage for each and every test function? Otherwise you end up with the same interacting tests.

@arkpar
Copy link
Member

arkpar commented Jan 13, 2022

Each test already uses a unique instance of TestExternalities which holds the storage. There should be no interference.

@ggwpez
Copy link
Member Author

ggwpez commented Sep 8, 2022

Yea there are still crates failing. Anyway we can do this cleanup lean step by step.

@ggwpez
Copy link
Member Author

ggwpez commented Jul 3, 2023

Does not reproduce on rust 1.70 anymore. Neither the pallet tests not the example code fail on one test thread 🎉

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T1-runtime This PR/Issue is related to the topic “runtime”.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants