A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

jakubadamw · 2022-04-09T08:47:30Z

When a Rust-based worker panics, it leaves the worker dangling and unable to process subsequent requests from the same client (client A), which manifests itself in consistent 500s returned by Cloudflare. Requests from another client (client B) work fine, which suggests the sticky routing mechanism causes the requests sent by the client A to be routed to a dangling non-functional instance of the worker.

I created a minimal test case in this repository with the exact steps to reproduce.

Here's the relevant worker exhibiting this behaviour.

use worker::*;

#[event(fetch)]
pub async fn main(req: Request, env: Env, _context: Context) -> Result<Response> {
    let router = Router::new();
    router
        .get("/helloworld", |_, _ctx| Response::ok("Hello World!"))
        .get("/crashtheworker", |_, _ctx| panic!("CRASH"))
        .run(req, env)
        .await
}

I am not sure if this is a bug in workers-rs – perhaps not – but hopefully it can be routed to the appropriate team.

Thank you!

The text was updated successfully, but these errors were encountered:

zebp · 2022-04-22T20:16:50Z

Hmm, this is certainly a problem we should fix. After a quick look it seems that re-creating the entire wasm instance and re-setting the JS glue isn't really possible with wasm-pack because of it's inability to use reference-types (related PR), meaning we have an "object heap" that we can't reset. But I was able to accomplish this by using wasm-bindgen manually, so if worker-build switches to that and uses reference-types we should be able to fix this.

Moving away from wasm-pack was discussed in the past but this might be the straw that broke the camel's back. I'll work on a fix for this soon.

jakubadamw · 2022-05-20T16:55:05Z

@zebp thanks so much for taking a look at this! This has been one of the significant issues we've been hitting with Workers in building our product.

Do you know if a potential fix for this issue is considered to be included in the team's nearest roadmap?

Thanks again.

zebp · 2022-07-14T18:14:01Z

Thankfully, this seems to have been fixed when we updated wasm-bindgen in this commit which is now released in 0.0.10. I'm not sure which commit to wasm-bindgen fixed this, so we'll need to be careful in the future to make sure we don't get a regression.

I have a production worker with a panic! here:
https://panicked.zeb.workers.dev/
https://panicked.zeb.workers.dev/panic

JorritSalverda · 2023-09-04T12:34:24Z

I'm still seeng this issue. After a queue event leading to the following exception all executions following this one no longer log anything:

panicked at 'js error: JsValue(Error: Too many API requests by single worker invocation.
Error: Too many API requests by single worker invocation.
    at entry.js:830:18
    at l (entry.js:185:14)
    at be (entry.js:829:10)
    at wasm://wasm/00acdd9e:wasm-function[1009]:0x1a9471
    at wasm://wasm/00acdd9e:wasm-function[104]:0x53223
    at wasm://wasm/00acdd9e:wasm-function[102]:0x29a73
    at wasm://wasm/00acdd9e:wasm-function[112]:0xc52c2
    at wasm://wasm/00acdd9e:wasm-function[284]:0x160c43
    at wasm://wasm/00acdd9e:wasm-function[1436]:0x1b7fb7
    at rt (entry.js:171:17))', src/lib.rs:23:1

Stack:

Error
    at entry.js:672:13
    at i (entry.js:155:14)
    at Ut (entry.js:671:10)
    at wasm://wasm/00acdd9e:wasm-function[205]:0x145733
    at wasm://wasm/00acdd9e:wasm-function[532]:0x188500
    at wasm://wasm/00acdd9e:wasm-function[616]:0x191269
    at wasm://wasm/00acdd9e:wasm-function[1687]:0x1bc100
    at wasm://wasm/00acdd9e:wasm-function[1075]:0x1ac3ed
    at wasm://wasm/00acdd9e:wasm-function[1125]:0x1ae237
    at wasm://wasm/00acdd9e:wasm-function[1028]:0x1aa365

I'm using at least the following dependencies:

chrono = { version = "0.4", features = ["wasmbind", "serde"] }
console_error_panic_hook = { version = "0.1" }
getrandom = { version = "0.2", features = ["js"] }
worker = { version = "0.0.18", features = ["queue"] }

According to my Cargo.lock file I have the following versions:

name = "wasm-bindgen"
version = "0.2.86"

name = "wasm-bindgen-backend"
version = "0.2.86"

name = "wasm-bindgen-futures"
version = "0.4.36"

name = "wasm-bindgen-macro"
version = "0.2.86"

name = "wasm-bindgen-macro-support"
version = "0.2.86"

name = "wasm-bindgen-shared"
version = "0.2.86"

Any thoughts on why my worker still seems to turn non-functional? It led to issue #374 but after instrumenting the worker in line with https://blog.cloudflare.com/wasm-coredumps/ it seems to have the behaviour as described in this issue.

zebp mentioned this issue May 9, 2022

Add call_unwind to to prevent panics from crashing worker #178

Closed

zebp closed this as completed Jul 14, 2022

ppedziwiatr mentioned this issue Jun 14, 2023

[BUG] Uncaught Exception when reading some Rust contracts warp-contracts/warp#426

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

jakubadamw commented Apr 9, 2022 •

edited

Loading

zebp commented Apr 22, 2022

jakubadamw commented May 20, 2022 •

edited

Loading

zebp commented Jul 14, 2022

JorritSalverda commented Sep 4, 2023

A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

Comments

jakubadamw commented Apr 9, 2022 • edited Loading

zebp commented Apr 22, 2022

jakubadamw commented May 20, 2022 • edited Loading

zebp commented Jul 14, 2022

JorritSalverda commented Sep 4, 2023

jakubadamw commented Apr 9, 2022 •

edited

Loading

jakubadamw commented May 20, 2022 •

edited

Loading