pvf validation timeout #4969

Closed
xlc opened this issue Jul 8, 2024 · 8 comments

Comments

@xlc (Contributor) commented Jul 8, 2024

Getting this error on our internal testnet:

2024-07-08 05:38:35.865 WARN tokio-runtime-worker parachain::pvf: execution worker concluded, error occurred: candidate validation: invalid: hard timeout artifact_id=ArtifactId { code_hash: 0xb14e5edf8ef54349c887a9ee7cdaddb39156aac9ca53d3a1c434697628c4feff, executor_params_hash: 0x03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314 } worker=Worker(1v267) worker_rip=true
2024-07-08 05:38:35.865 INFO tokio-runtime-worker parachain::candidate-validation: Failed to validate candidate para_id=Id(2000) error=Invalid(HardTimeout)
2024-07-08 05:38:35.866 WARN tokio-runtime-worker parachain::candidate-backing: Validation yielded an invalid candidate candidate_hash=0xb149d18e0796830b5c03f699b6002d4a42cf7b81c24b29a1b5b46c22f0617f56 reason=Timeout traceID=235656643200957744926756583968484633930

The para block is empty and was produced in 3 ms:

2024-07-08 09:32:00.019 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 2253304 (3 ms) [hash: 0x3713a36193696554d697035c9788c1e3912e2d1c5fd736798d04e919c2c0b992; parent_hash: 0x1732…c11d; extrinsics (8): [0x7866…5f7d, 0x66c3…5293, 0x9a3d…0e44, 0xadb6…0f1b, 0x495c…19cf, 0x1061…b5dd, 0x121e…e936, 0x9209…8606] 

The relay chain is running a 1.10.0 node with the rococo 9420 runtime.
The parachain is Acala 2.25.0, which uses polkadot-sdk 1.9.0.
It has been running fine for a few months and suddenly started getting this error; I don't see any special transaction/action that may have triggered it.

I will give #4640 a try to see if I can dig out more info

@alexggh (Contributor) commented Jul 8, 2024

The error says the candidate took too long to validate; most of the time I have seen it, it was because of a mismatch in computing power between the collator and the validator.

Are you sure the block took just 3ms to build? Because the timestamps you provided don't seem to match.
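For context, the PVF host races candidate execution against a deadline, and an execution that does not finish within it is reported as a hard timeout. A minimal sketch of that general pattern with tokio (not the actual PVF host code; the function name and the 2-second budget are made up for illustration):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical stand-in for the real PVF execution; the actual host code in
// polkadot-sdk runs this in a separate worker process.
async fn execute_candidate() -> Result<(), String> {
    // ... run the PVF against the candidate ...
    Ok(())
}

#[tokio::main]
async fn main() {
    // Illustrative budget; in practice the execution timeout comes from the
    // executor parameters, not a hard-coded constant.
    let budget = Duration::from_secs(2);

    match timeout(budget, execute_candidate()).await {
        Ok(Ok(())) => println!("candidate valid"),
        Ok(Err(e)) => println!("candidate invalid: {e}"),
        // The future did not complete within the budget, which corresponds to
        // the hard timeout error in the log above.
        Err(_) => println!("invalid: hard timeout"),
    }
}
```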

@xlc (Contributor, Author) commented Jul 8, 2024

The validator log and collator log were gathered at different times and reference different blocks, but the parachain is stuck, so it just keeps reproducing the same block, and I am pretty sure it is a small one: I can still see the pending block and it is very empty, no runtime migration, no scheduled tasks, etc.

The validator and collator are running on machines with the same spec, and it had been working fine for many months until a few days ago.

@bkchr (Member) commented Jul 8, 2024

> Are you sure the block took just 3ms to build? Because the timestamps you provided don't seem to match.

Yeah, these 3 ms in the log reference the build time of the entire block.

@xlc and I already discussed this on Element. I don't really get why this block runs into performance issues.

@alexggh (Contributor) commented Jul 8, 2024

Yes, it is weird. Can you provide the full logs for both collators and validators over a period of time? Maybe something comes up.

@xlc (Contributor, Author) commented Jul 8, 2024

I think it is related to an IO issue. I'm not sure whether the node somehow suddenly started making an unreasonable amount of IO requests or something in AWS triggered it.
[Screenshot 2024-07-09 at 11:55:40 AM]

@xlc (Contributor, Author) commented Jul 9, 2024

Yeah, I can confirm it is indeed an IO issue. The validator benchmark that runs at startup actually highlights that the IO performance is significantly lower than expected.
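To cross-check a finding like this independently of the node, a crude sequential-write test against the database volume is usually enough to spot a heavily throttled disk. A minimal standard-library sketch (the path and sizes below are arbitrary and chosen only for illustration):

```rust
use std::fs::{remove_file, File};
use std::io::Write;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // Point this at the volume the node's database lives on.
    let path = "/tmp/io-bench.tmp";
    let chunk = vec![0u8; 1024 * 1024]; // 1 MiB
    let chunks = 256;                   // 256 MiB total

    let start = Instant::now();
    let mut file = File::create(path)?;
    for _ in 0..chunks {
        file.write_all(&chunk)?;
    }
    // Force the data to disk so the timing reflects real IO, not page cache.
    file.sync_all()?;
    let elapsed = start.elapsed().as_secs_f64();
    remove_file(path)?;

    println!("sequential write: {:.1} MiB/s", chunks as f64 / elapsed);
    Ok(())
}
```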

@xlc closed this as completed on Jul 9, 2024
@bkchr (Member) commented Jul 9, 2024

@xlc how did the IO degrade that much? :D Any ideas?

@xlc (Contributor, Author) commented Jul 9, 2024

Not sure what triggered it, but this is using AWS EFS in burst mode, i.e. it can only sustain peak IO for a short period of time before getting throttled, so maybe it just crossed the default throughput line and is now getting heavily throttled.
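For reference, EFS bursting mode ties baseline throughput to the amount of data stored (roughly 50 KiB/s per GiB according to the AWS documentation) and maintains a burst-credit balance that drains whenever IO runs above that baseline; once the credits are exhausted, throughput is clamped back down to the baseline. A back-of-the-envelope sketch of that arithmetic, with made-up numbers:

```rust
fn main() {
    // All figures below are illustrative assumptions, not measurements.
    let stored_gib = 100.0_f64;         // hypothetical amount of data on the filesystem
    let credit_balance_gib = 500.0_f64; // hypothetical remaining burst credits
    let actual_mib_s = 60.0_f64;        // hypothetical sustained node IO

    // Bursting mode baseline: ~50 KiB/s per GiB stored.
    let baseline_mib_s = stored_gib * 50.0 / 1024.0;
    let drain_mib_s = (actual_mib_s - baseline_mib_s).max(0.0);
    println!("baseline {:.1} MiB/s, credits draining at {:.1} MiB/s", baseline_mib_s, drain_mib_s);

    if drain_mib_s > 0.0 {
        // Hours until the credit balance runs out and IO is throttled to baseline.
        let hours = credit_balance_gib * 1024.0 / drain_mib_s / 3600.0;
        println!("credits exhausted in ~{:.1} h", hours);
    }
}
```

With a small filesystem the baseline can be only a few MiB/s, which would explain a node that runs fine for months and then abruptly hits hard throttling once the credits are gone.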
