Got "Essential task overseer failed" error after upgrading Kusama and Polkadot validators to v1.5.0 #2728
Comments
Please provide more logs, and paste them as text rather than as a screenshot.
Thanks for the response @bkchr.
And here are the parameters I used when running the validator:
This is the root cause:
I think you are hitting the same problem as here: #2662
Yes, it looks like some new security features couldn't be enabled. "Operation not permitted" is interesting. Can you share details of your setup? Is there anything unusual about it? Is your database path on a mount, or does it have any special restrictions? By the way, upgrading to Linux 5.13+ would make this part of the error go away, and the other part (unshare) would become optional:
If upgrading is not possible, you can pass the CLI flag specified in the error.
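For readers unsure which side of the 5.13 cutoff they are on, the check can be scripted. The following is a minimal Python sketch (not part of polkadot; the function names are hypothetical) that parses a kernel release string as printed by uname -r, such as 5.15.0-88-generic or 6.7.5-060705-generic from this thread, and compares its major/minor version against 5.13, the release in which Landlock landed. It only looks at the version number; whether Landlock is actually enabled in the running kernel's configuration is not checked.

```python
import platform
import re

# Landlock was merged in Linux 5.13; older kernels cannot provide it at all.
LANDLOCK_MIN_KERNEL = (5, 13)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-88-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_landlock(release: str) -> bool:
    """True if the kernel version is new enough that Landlock *can* be available.

    Only the version number is checked; Landlock must also be enabled in the
    kernel build and its active LSM list, which this sketch does not verify.
    """
    return parse_kernel_release(release) >= LANDLOCK_MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    verdict = "5.13 or newer" if kernel_supports_landlock(release) else "older than 5.13"
    print(f"{release}: {verdict}")
```

A version check like this only rules out the "kernel too old" case; a new enough kernel can still fail if the container or service manager blocks the relevant syscalls.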
Thanks for the reply, @mrcnski! I am running the node as a pod in a Kubernetes (k8s) cluster, and the database is on a PVC mounted into the pod. The relevant settings for the pod:
I am completely unfamiliar with Kubernetes, but I presume the node is running in a container. That is probably why certain operations are not allowed, and it may depend on the container settings. For example, if there is a seccomp sandbox, it could be blocking the syscall, but I think this can be turned off. What is your Linux kernel version?
Thanks @mrcnski. Yeah, I think you are right. Here is my Linux kernel version:
Thanks @AlexZhenWang! If possible, you can upgrade to Linux 5.13+; you'll still get a warning due to running in a container, but it won't be a hard error. Otherwise, you'll need to pass
Reposting #2486 (comment) here, as I think it belongs in this issue.
@matthewmarcus Hi! Could you please provide the output of the following commands and answers to the following questions:
Thank you!
Landlock is optional anyway. The main problem, to me, is that
Hey, thanks for the reply. Here are the results:
When attempting to run
@matthewmarcus what Linux distribution is that, and what architecture are you running on?
Ubuntu 20.04 on an Intel NUC with a Core i7-8559U processor.
@matthewmarcus Canonical hasn't officially released a 6.7.5 kernel for 20.04, AFAIK. Do you use Mainline or some other kernel manager?
@matthewmarcus I haven't used Ubuntu for quite some time now, but from a quick search, the version supported in the HWE stack for 20.04 is 5.15, and 6.7.5 most probably comes from mainline builds. Mainline builds are not supported, not guaranteed to work, and not recommended for production use. I'm not saying that's definitely the problem, but if you could try running your node after booting the officially supported 5.15 kernel from the Ubuntu distro, you could probably save us a lot of debugging time :) I personally run 6.7.0 from the Manjaro distro and don't have any issues with secure validator mode, but that's not exactly the same as the mainline builds.
Yeah, I've spawned a test Ubuntu 20.04 amd64 machine, and the latest available through the
@maksimryndin how about
@s0me0ne-unkn0wn yeah, you're right :) 5.15. Exactly!
I used the instructions at https://askubuntu.com/questions/1388115/how-do-i-update-my-kernel-to-the-latest-one
Well, the kernel we were using prior to 6.7.5 was 5.15.0-88-generic, and that was giving us the same errors (see #2486 (comment)). So unless one of the minor builds after 5.15.0-88 fixed the issue, the 5.15 kernel isn't working either.
That's interesting, because we've never manually updated the kernel on this machine since originally installing Ubuntu 20.04, and the kernel it chose (?) to use was 5.15.0-88-generic. I did notice there were a boatload of other kernels on the box as well, but I removed them in an attempt to free up some disk space. Removing them, though, did not free up any disk space. :) @maksimryndin
Just ran the
Sharing in case it helps with debugging. @s0me0ne-unkn0wn Also ran this command:
Did I scare everyone away? @s0me0ne-unkn0wn @maksimryndin
@matthewmarcus what does the OS-suggested Honestly, I'd try to install Ubuntu from scratch, without mainline builds and using the supported HWE stack. AFAIK there's nothing special about NUC hardware that might prevent proper sandboxing (@koute?), so it's most probably a kernel issue. If you're able to sort it out with
Well, there can be a few reasons why this doesn't work, but AFAIK the usual reason is that the environment is configured to disallow unprivileged users from creating user namespaces. Some Linux distributions might be configured this way by default, and some containerization software (Docker/Podman/Kubernetes/insert your trendy alternative of the week) might also disallow it. So the fundamental question here is: does this happen because of how the environment is configured, or because our code doesn't handle some corner case? If it's the former, that's an unsupported configuration, and we should document it and tell users how to fix it. (Ideally, we could detect this exact situation and have the node print a helpful error message.) If it's the latter, we need to fix our code. Either way, the fastest way to investigate and fix it is probably something like this:
(At least that's what I would do.)
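As a rough first check of the "environment disallows unprivileged user namespaces" hypothesis, one can read the relevant sysctls. The sketch below is a Python illustration, not a polkadot feature, and its helper names are hypothetical. It assumes the two sysctls listed are the ones that matter: user.max_user_namespaces exists on mainline kernels, while kernel.unprivileged_userns_clone is a Debian-derived kernel addition and may be absent. It cannot see restrictions imposed by seccomp profiles, container runtimes, or systemd unit settings, so an empty result does not prove that namespaces are available.

```python
from pathlib import Path
from typing import Optional

# Sysctls that commonly gate unprivileged user-namespace creation.
# Assumption: kernel.unprivileged_userns_clone only exists on kernels carrying
# the Debian/Ubuntu patch; user.max_user_namespaces exists on mainline kernels.
SYSCTLS = {
    "user.max_user_namespaces": Path("/proc/sys/user/max_user_namespaces"),
    "kernel.unprivileged_userns_clone": Path("/proc/sys/kernel/unprivileged_userns_clone"),
}


def read_sysctl(path: Path) -> Optional[int]:
    """Return the integer value of a /proc/sys entry, or None if absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def userns_blockers(values: dict) -> list:
    """List the sysctl settings that would stop an unprivileged user namespace.

    `values` maps sysctl names to integer values, or None when the sysctl does
    not exist on this system (a missing sysctl is not treated as a blocker).
    """
    blockers = []
    if values.get("user.max_user_namespaces") == 0:
        blockers.append("user.max_user_namespaces=0")
    if values.get("kernel.unprivileged_userns_clone") == 0:
        blockers.append("kernel.unprivileged_userns_clone=0")
    return blockers


if __name__ == "__main__":
    current = {name: read_sysctl(path) for name, path in SYSCTLS.items()}
    found = userns_blockers(current)
    print("blocked by: " + ", ".join(found) if found else "no sysctl blocker found")
```

Keeping the decision logic in a pure function (userns_blockers) separate from the /proc reads makes it easy to check against values collected from another machine, for example values pasted into an issue.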
@s0me0ne-unkn0wn I'm out of town at the moment and don't want to issue that command until I return (this Wed), just in case it breaks the entire system. I'll let you know when I try it and report back. As for reinstalling Ubuntu, I have several nodes/systems running on this platform, so doing that would be a real undertaking and result in significant downtime. I would only want to do that as a last resort. @maksimryndin has reached out, and we're going to look at the problem together in the coming days. If we figure anything out, we'll be sure to let you know.
Together with @matthewmarcus, we have figured out the actual reason: polkadot was running under an overly restrictive systemd unit configuration, which, among other things, turned off the ability to create namespaces. There is nothing special about the system itself:
Linux Good-KarMa 6.7.5-060705-generic #202402161836 SMP PREEMPT_DYNAMIC Fri Feb 16 19:10:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Matthew had created that restrictive systemd service long before the PVF security features were introduced, so when he updated to the release that added them, they couldn't be enabled. We turned off the systemd restrictions in favor of polkadot's native security mechanisms, and everything works as expected.
@maksimryndin oh wow, thanks a lot for investigating that! Can you please elaborate on which restrictions were in force? It's probably worth mentioning in the documentation so that other users don't hit the same problem.
Yes! Many, many thanks to @maksimryndin for his excellent guidance and support today. We spent several hours trying to figure out the issue, only to find, as mentioned, that my systemd config for the service was much too restrictive. Here is a portion of the config file. As you can see, once we commented out all of the unnecessary parameters, the service worked perfectly.
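The actual unit file is not reproduced in this thread, so the following is only a hypothetical helper in the spirit of "check your systemd security-related settings": a Python sketch that scans a unit file's text for a few sandboxing directives. The directive list is an assumption, not what was confirmed here: RestrictNamespaces= limits namespace creation, SystemCallFilter= can block syscalls the PVF sandbox may rely on, and MemoryDenyWriteExecute= restricts executable memory mappings. Which of these (if any) caused the failure above was not stated, so flagged lines are candidates to review, not confirmed culprits.

```python
import sys
from pathlib import Path

# Directives assumed (not confirmed in this thread) to be able to interfere with
# PVF sandboxing: namespace creation, syscall filtering, and executable memory.
SUSPECT_DIRECTIVES = ("RestrictNamespaces", "SystemCallFilter", "MemoryDenyWriteExecute")
# Values that leave the directive effectively disabled (empty resets the setting).
FALSE_VALUES = {"no", "false", "off", "0", ""}


def suspect_settings(unit_text: str) -> list:
    """Return (directive, value) pairs from a systemd unit that deserve a second look."""
    found = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments (# or ;), and section headers such as [Service].
        if not line or line[0] in "#;[":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key in SUSPECT_DIRECTIVES and value.lower() not in FALSE_VALUES:
            found.append((key, value))
    return found


if __name__ == "__main__":
    # Each argument is a unit file path, e.g. /etc/systemd/system/polkadot.service
    # (hypothetical path); with no arguments, nothing is scanned.
    for unit_path in sys.argv[1:]:
        for key, value in suspect_settings(Path(unit_path).read_text()):
            print(f"{unit_path}: {key}={value}")
```

The script name and unit path are placeholders; drop-in overrides under the unit's .d/ directory would need to be scanned as well, since they can add directives the main file does not show.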
So I would advise users who hit a similar issue to check for, and turn off, security-related systemd settings in favor of polkadot's native security features. I also believe we should come up with a standard troubleshooting template for this kind of problem (I can try to prepare a testing script and suggest a GitHub issue template). By the way, during our experiments (we tried running zombienet first to avoid touching a running validator), we encountered an issue (filed here: paritytech/zombienet#1737). So,
If we create a verification script, I would recommend the following order:
Tested on Ubuntu 20.04.6, 22.04.4, and 23.10 with polkadot version 1.8.0-ec7817e5ad.
closing as stale
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
Hi, I am trying to upgrade both our Polkadot and Kusama validators to v1.5.0, but I got an "Essential task overseer failed" error after upgrading. After downgrading back to v1.4.0, the issue went away. This error happened on both the Polkadot and Kusama validators.
The logs:
Update:
logs
parameters:
--chain=polkadot --base-path=/chain-data --rpc-cors=all --port=30333 --unsafe-rpc-external --node-key=<xxx> --rpc-methods=Unsafe --name=<name> --telemetry-url="wss://telemetry-backend.w3f.community/submit 1" --public-addr=/dns4/<xxx>/tcp/23739 --in-peers=100 --in-peers-light=0 --db-cache=512