🐛 Potential side effects with multiprocessing mode #3473

Merged 1 commit into ethereum:dev on Aug 15, 2023

Conversation

@wenceslas-sanchez (Contributor) commented Aug 6, 2023

@hwwhww said there was an issue when the multiprocessing mode was activated:


For example, the test
`minimal/capella/fork_choice/ex_ante/pyspec_tests/ex_ante_attestations_is_greater_than_proposer_boost_with_boost`
should NOT exist, because the test is decorated with `@with_presets([MAINNET], reason="to create non-duplicate committee")`:

```python
@with_altair_and_later
@with_presets([MAINNET], reason="to create non-duplicate committee")
@spec_state_test
def test_ex_ante_attestations_is_greater_than_proposer_boost_with_boost(spec, state):
    ...
```

_But if you run `make gen_fork_choice` with `MODE_MULTIPROCESSING`, you will see both folders:_

`mainnet/capella/fork_choice/ex_ante/pyspec_tests/ex_ante_attestations_is_greater_than_proposer_boost_with_boost`
`minimal/capella/fork_choice/ex_ante/pyspec_tests/ex_ante_attestations_is_greater_than_proposer_boost_with_boost`

It's like the decorator `@with_presets([MAINNET], reason="to create non-duplicate committee")` was ignored.
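For context, `@with_presets` is a preset filter: when the active preset is not in the allow-list, the wrapped test is skipped and no vectors should be written for it. It works roughly along the lines of this sketch (simplified and illustrative; the real implementation in the spec repo differs in details, e.g. it logs a skip message):

```python
# Simplified sketch of a preset-filtering decorator; not the spec repo's
# actual code.
def with_presets(preset_bases, reason=None):
    allowed = set(preset_bases)

    def decorator(fn):
        def wrapper(*args, spec, **kwargs):
            if spec.config.PRESET_BASE not in allowed:
                return None  # skipped: no test vector is emitted for this preset
            return fn(*args, spec=spec, **kwargs)
        return wrapper
    return decorator
```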

To investigate, I kept only two tests in `eth2spec/test/phase0/fork_choice/test_ex_ante.py`: `test_ex_ante_attestations_is_greater_than_proposer_boost_with_boost` and `test_ex_ante_vanilla`. What I discovered is that `generate_test_vector` was running the test method of `test_ex_ante_vanilla` both times. See the code below:

```python
for tprov in test_providers:
    if not collect_only:
        # runs anything that we don't want to repeat for every test case.
        tprov.prepare()

    for test_case in tprov.make_cases():
        # If preset list is assigned, filter by presets.
        if len(presets) != 0 and test_case.preset_name not in presets:
            continue

        # If fork list is assigned, filter by forks.
        if len(forks) != 0 and test_case.fork_name not in forks:
            continue

        case_dir = get_test_case_dir(test_case, output_dir)
        print(f"Collected test at: {case_dir}")
        diagnostics_obj.collected_test_count += 1

        is_skip, diagnostics_obj = should_skip_case_dir(case_dir, args.force, diagnostics_obj)
        if is_skip:
            continue

        if GENERATOR_MODE == MODE_SINGLE_PROCESS:
            result = generate_test_vector(test_case, case_dir, log_file, file_mode)
            write_result_into_diagnostics_obj(result, diagnostics_obj)
        elif GENERATOR_MODE == MODE_MULTIPROCESSING:
            item = TestCaseParams(test_case, case_dir, log_file, file_mode)
            all_test_case_params.append(item)

if GENERATOR_MODE == MODE_MULTIPROCESSING:
    with Pool(processes=NUM_PROCESS) as pool:
        results = pool.map(worker_function, iter(all_test_case_params))
```

`test_ex_ante_attestations_is_greater_than_proposer_boost_with_boost` was the first test appended to the list `all_test_case_params`, but when the second iteration of `for tprov in test_providers:` started, the item already in `all_test_case_params` was affected: calling `all_test_case_params[0].test_case.case_fn()` now ran the method `test_ex_ante_vanilla` instead of `test_ex_ante_attestations_is_greater_than_proposer_boost_with_boost`. (To verify this, I changed both methods so that one raised an `AssertionError` and the other a `ValueError`, then called `test_case.case_fn()` to check which error was raised.)

It seems that the method `generate_from_tests` in `eth2spec/gen_helpers/gen_from_tests/gen.py` has a side effect on the lambda function provided to `TestCase.case_fn`: it is as if `tfn` was modified when the `TestCase` of `test_ex_ante_vanilla` was generated. That's why the decorator seems to be ignored.

Honestly, I don't know exactly why the lambda function is impacted; my opinion is that this succession of yields may generate the side effect.
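The behaviour is consistent with Python's late-binding closures: a lambda created inside a loop captures the loop variable itself, not the value it had at creation time. Here is a minimal standalone reproduction (an illustrative sketch, not the spec repo's code), reusing the `AssertionError`/`ValueError` trick from above:

```python
def test_a(): raise AssertionError
def test_b(): raise ValueError

def gen_cases():
    for tfn in (test_a, test_b):
        # The lambda closes over the variable `tfn`, not its current value.
        yield lambda: tfn()

def run(case):
    try:
        case()
    except Exception as e:
        return type(e).__name__

# Single-process style: each case runs while the generator is suspended
# at its yield, so `tfn` still points at the intended function.
print([run(case) for case in gen_cases()])
# ['AssertionError', 'ValueError']

# Multiprocessing style: all cases are collected first. By then the
# generator is exhausted, `tfn` is left bound to test_b, and every
# stored lambda calls test_b.
collected = list(gen_cases())
print([run(case) for case in collected])
# ['ValueError', 'ValueError']
```

This matches the symptom: only `MODE_MULTIPROCESSING`, which collects all `TestCaseParams` before running any of them, exposes the stale binding.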

To fix this, I generate the lambda function in a separate method, outside the scope of `generate_from_tests`, and it seems to work.
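In terms of the sketch above (reusing its `test_a`, `test_b`, and `run`), the fix boils down to creating the lambda in a helper function, so that each closure binds its own `tfn` parameter rather than the shared loop variable (names here are illustrative, not necessarily those used in this PR):

```python
def make_case_fn(tfn):
    # `tfn` is a parameter here, so each returned lambda closes over its
    # own binding instead of the generator's loop variable.
    return lambda: tfn()

def gen_cases_fixed():
    for tfn in (test_a, test_b):
        yield make_case_fn(tfn)

collected = list(gen_cases_fixed())
print([run(case) for case in collected])
# ['AssertionError', 'ValueError']
```

An equivalent idiom is a default argument (`lambda tfn=tfn: tfn()`), which also snapshots the current value at definition time.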

@wenceslas-sanchez wenceslas-sanchez marked this pull request as draft August 7, 2023 20:44
@hwwhww (Contributor) commented Aug 8, 2023

(For context, the issue was described here: https://hackmd.io/nYh_-IykT2K4jeVwd87iaw)

Holy 🤯

It looks promising! I ran the testgen with your fix against the current master branch. The full git status diff: https://gist.githubusercontent.com/hwwhww/9a5b4eecf34f756437c4f3f5a327ff3a/raw/d5bc62e2645f8e3f8ed07f6bcc198c89989aa678/multiprocessing_diff.txt

Testgen total time: 57m 24s ✨

It looks like some random test vectors were modified, but at least no test cases were deleted this time, which suggests that issue 2 (#3385) was also resolved!

I'm checking why some test vectors were modified, and whether this fix changes the MODE_SINGLE_PROCESS results too.

Well done! 👏 🚀

@hwwhww (Contributor) commented Aug 8, 2023

Update: confirmed that this fix doesn't change MODE_SINGLE_PROCESS results.

@hwwhww hwwhww marked this pull request as ready for review August 15, 2023 12:39
@hwwhww (Contributor) left a comment

Thank you so much @wenceslas-sanchez!

I'm going to fix it and then fix the tests in another PR.

@hwwhww hwwhww merged commit ef434e8 into ethereum:dev Aug 15, 2023