Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: intermittent failure in feature_llmq_simplepose.py #5859

Merged
merged 2 commits into from
Feb 7, 2024

Conversation

knst
Copy link
Collaborator

@knst knst commented Feb 2, 2024

Issue being fixed or feature implemented

kudos to kwvg to report issue: he pointed out that functional tests randomly fail lately.
Bisect pointed out an exact commit: bitcoin#20027 - mockable time everywhere

What was done?

Added call mn.node.mockscheduler as it is done in 20027 for other functional tests

How Has This Been Tested?

Run 20 times. Without this patch 20% failures; with this patch - zero failures.

test/functional/test_runner.py -j20 feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py feature_llmq_simplepose.py

Breaking Changes

N/a

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone

@knst knst added this to the 20.1 milestone Feb 2, 2024
@knst knst marked this pull request as draft February 2, 2024 19:21
@knst knst force-pushed the fix-llmq-simplepose branch from 878bd28 to 9b6eb25 Compare February 2, 2024 19:38
@knst knst marked this pull request as ready for review February 2, 2024 20:48
@knst
Copy link
Collaborator Author

knst commented Feb 3, 2024

@kwvg claims that with this patch it still fails randomly on his machine, but only 1/3 of failures compare to without patch.

<with patch>
feature_llmq_simplepose.py | � Passed  | 376 s
feature_llmq_simplepose.py | � Passed  | 379 s
feature_llmq_simplepose.py | � Passed  | 382 s
feature_llmq_simplepose.py | � Passed  | 383 s
feature_llmq_simplepose.py | � Passed  | 384 s
feature_llmq_simplepose.py | � Passed  | 384 s
feature_llmq_simplepose.py | � Passed  | 386 s
feature_llmq_simplepose.py | � Passed  | 387 s
feature_llmq_simplepose.py | � Passed  | 387 s
feature_llmq_simplepose.py | � Passed  | 388 s
feature_llmq_simplepose.py | � Passed  | 388 s
feature_llmq_simplepose.py | � Passed  | 389 s
feature_llmq_simplepose.py | � Passed  | 389 s
feature_llmq_simplepose.py | � Passed  | 390 s
feature_llmq_simplepose.py | � Passed  | 390 s
feature_llmq_simplepose.py | � Passed  | 391 s
feature_llmq_simplepose.py | � Passed  | 391 s
feature_llmq_simplepose.py | � Failed  | 69 s
feature_llmq_simplepose.py | � Failed  | 69 s
feature_llmq_simplepose.py | � Failed  | 71 s

vs

<without patch>
feature_llmq_simplepose.py | � Passed  | 353 s
feature_llmq_simplepose.py | � Passed  | 356 s
feature_llmq_simplepose.py | � Passed  | 351 s
feature_llmq_simplepose.py | � Passed  | 352 s
feature_llmq_simplepose.py | � Passed  | 354 s
feature_llmq_simplepose.py | � Passed  | 350 s
feature_llmq_simplepose.py | � Passed  | 353 s
feature_llmq_simplepose.py | � Passed  | 357 s
feature_llmq_simplepose.py | � Passed  | 346 s
feature_llmq_simplepose.py | � Passed  | 356 s
feature_llmq_simplepose.py | � Passed  | 346 s
feature_llmq_simplepose.py | � Failed  | 62 s
feature_llmq_simplepose.py | � Failed  | 63 s
feature_llmq_simplepose.py | � Failed  | 59 s
feature_llmq_simplepose.py | � Failed  | 61 s
feature_llmq_simplepose.py | � Failed  | 63 s
feature_llmq_simplepose.py | � Failed  | 59 s
feature_llmq_simplepose.py | � Failed  | 61 s
feature_llmq_simplepose.py | � Failed  | 60 s
feature_llmq_simplepose.py | � Failed  | 60 s

any ideas how we can improve this simplepose test futhermore?

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any improvement on my machine using this PR. Nodes drop connections on develop for no clear reason (no errors in logs). Simply waiting for connections did not help so I tried moving block generation out of the loop and suddenly d89f41c fixed the issue for me. But I don't understand how/why exactly... 🤷‍♂️ It would be nice if someone could explain why it worked 😄 Anyway, pls try this patch instead and share your thoughts.

test/functional/feature_llmq_simplepose.py Outdated Show resolved Hide resolved
@knst
Copy link
Collaborator Author

knst commented Feb 3, 2024

Simply waiting for connections did not help so I tried moving block generation out of the loop and suddenly d89f41c fixed the issue for me

almost fixed for me, still one failure out of 20 which is pretty good

@kwvg , can you test it, please?

feature_llmq_simplepose.py | ✓ Passed  | 355 s
feature_llmq_simplepose.py | ✓ Passed  | 355 s
feature_llmq_simplepose.py | ✓ Passed  | 358 s
feature_llmq_simplepose.py | ✓ Passed  | 359 s
feature_llmq_simplepose.py | ✓ Passed  | 360 s
feature_llmq_simplepose.py | ✓ Passed  | 360 s
feature_llmq_simplepose.py | ✓ Passed  | 361 s
feature_llmq_simplepose.py | ✓ Passed  | 361 s
feature_llmq_simplepose.py | ✓ Passed  | 362 s
feature_llmq_simplepose.py | ✓ Passed  | 366 s
feature_llmq_simplepose.py | ✓ Passed  | 366 s
feature_llmq_simplepose.py | ✓ Passed  | 367 s
feature_llmq_simplepose.py | ✓ Passed  | 367 s
feature_llmq_simplepose.py | ✓ Passed  | 368 s
feature_llmq_simplepose.py | ✓ Passed  | 369 s
feature_llmq_simplepose.py | ✓ Passed  | 369 s
feature_llmq_simplepose.py | ✓ Passed  | 370 s
feature_llmq_simplepose.py | ✓ Passed  | 370 s
feature_llmq_simplepose.py | ✓ Passed  | 371 s
feature_llmq_simplepose.py | ✖ Failed  | 126 s

@knst knst force-pushed the fix-llmq-simplepose branch from 9b6eb25 to ea0d064 Compare February 3, 2024 17:03
@knst knst requested a review from UdjinM6 February 3, 2024 17:03
@kwvg
Copy link
Collaborator

kwvg commented Feb 3, 2024

dash@228219c1225b:/src/dash$ ./test/functional/test_runner.py -j20 feature_llmq_simplepose.py [...]
[...]
TEST                       | STATUS    | DURATION

feature_llmq_simplepose.py | � Passed  | 381 s
feature_llmq_simplepose.py | � Passed  | 385 s
feature_llmq_simplepose.py | � Passed  | 386 s
feature_llmq_simplepose.py | � Passed  | 386 s
feature_llmq_simplepose.py | � Passed  | 387 s
feature_llmq_simplepose.py | � Passed  | 387 s
feature_llmq_simplepose.py | � Passed  | 388 s
feature_llmq_simplepose.py | � Passed  | 389 s
feature_llmq_simplepose.py | � Passed  | 390 s
feature_llmq_simplepose.py | � Passed  | 392 s
feature_llmq_simplepose.py | � Passed  | 392 s
feature_llmq_simplepose.py | � Passed  | 393 s
feature_llmq_simplepose.py | � Passed  | 393 s
feature_llmq_simplepose.py | � Passed  | 394 s
feature_llmq_simplepose.py | � Passed  | 394 s
feature_llmq_simplepose.py | � Passed  | 395 s
feature_llmq_simplepose.py | � Passed  | 395 s
feature_llmq_simplepose.py | � Passed  | 396 s
feature_llmq_simplepose.py | � Passed  | 396 s
feature_llmq_simplepose.py | � Passed  | 397 s

ALL                        | � Passed  | 7816 s (accumulated)
Runtime: 397 s
dash@228219c1225b:/src/dash$ git rev-parse HEAD
d89f41caeec33b9a4957c95b1a747e26e2e915b9

@UdjinM6
Copy link

UdjinM6 commented Feb 3, 2024

I think I figured it out. Previously isolated nodes are trying to fetch blocks they missed as soon as they have a new connection open. When we bump mocktime 10+ minutes in the middle of that we trigger Timeout downloading block event https://github.com/dashpay/dash/blob/develop/src/net_processing.cpp#L5171. So these nodes drop "slow" peers (node0, the only connection they have) and everything falls apart. Bumping mocktime only when nodes are synced already should solve the issue. Pls check f8581d8.

@kwvg
Copy link
Collaborator

kwvg commented Feb 3, 2024

dash@228219c1225b:/src/dash$ ./test/functional/test_runner.py -j20 feature_llmq_simplepose.py [...]
[...]
TEST                       | STATUS    | DURATION

feature_llmq_simplepose.py | � Passed  | 374 s
feature_llmq_simplepose.py | � Passed  | 379 s
feature_llmq_simplepose.py | � Passed  | 384 s
feature_llmq_simplepose.py | � Passed  | 385 s
feature_llmq_simplepose.py | � Passed  | 386 s
feature_llmq_simplepose.py | � Passed  | 387 s
feature_llmq_simplepose.py | � Passed  | 389 s
feature_llmq_simplepose.py | � Passed  | 389 s
feature_llmq_simplepose.py | � Passed  | 390 s
feature_llmq_simplepose.py | � Passed  | 390 s
feature_llmq_simplepose.py | � Passed  | 391 s
feature_llmq_simplepose.py | � Passed  | 393 s
feature_llmq_simplepose.py | � Passed  | 394 s
feature_llmq_simplepose.py | � Passed  | 394 s
feature_llmq_simplepose.py | � Passed  | 395 s
feature_llmq_simplepose.py | � Passed  | 396 s
feature_llmq_simplepose.py | � Passed  | 396 s
feature_llmq_simplepose.py | � Passed  | 397 s
feature_llmq_simplepose.py | � Passed  | 397 s
feature_llmq_simplepose.py | � Passed  | 398 s

ALL                        | � Passed  | 7804 s (accumulated)
Runtime: 398 s
dash@228219c1225b:/src/dash$ git rev-parse HEAD
f8581d89975f78ebde1a66de2bd98788c4422e18

@UdjinM6
Copy link

UdjinM6 commented Feb 6, 2024

Pls check f8581d8

ping @knst

@knst knst force-pushed the fix-llmq-simplepose branch from ea0d064 to 623c30d Compare February 7, 2024 12:03
Copy link
Collaborator Author

@knst knst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls check f8581d8

this one works perfectly for me.

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK :)

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK for squash merge

@PastaPastaPasta PastaPastaPasta merged commit 83ec7f2 into dashpay:develop Feb 7, 2024
9 checks passed
Copy link
Collaborator

@kwvg kwvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge utACK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants