Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A panic in the worker leaves it unable to process any subsequent requests for a period of time #166

Closed
jakubadamw opened this issue Apr 9, 2022 · 4 comments

Comments

@jakubadamw
Copy link
Contributor

jakubadamw commented Apr 9, 2022

When a Rust-based worker panics, it leaves the worker dangling and unable to process subsequent requests from the same client (client A), which manifests itself in consistent 500s returned by Cloudflare. Requests from another client (client B) work fine, which suggests the sticky routing mechanism causes the requests sent by the client A to be routed to a dangling non-functional instance of the worker.

I created a minimal test case in this repository with the exact steps to reproduce.

Here's the relevant worker exhibiting this behaviour.

use worker::*;

#[event(fetch)]
pub async fn main(req: Request, env: Env, _context: Context) -> Result<Response> {
    let router = Router::new();
    router
        .get("/helloworld", |_, _ctx| Response::ok("Hello World!"))
        .get("/crashtheworker", |_, _ctx| panic!("CRASH"))
        .run(req, env)
        .await
}

I am not sure if this is a bug in workers-rs – perhaps not – but hopefully it can be routed to the appropriate team.

Thank you!

@zebp
Copy link
Collaborator

zebp commented Apr 22, 2022

Hmm, this is certainly a problem we should fix. After a quick look it seems that re-creating the entire wasm instance and re-setting the JS glue isn't really possible with wasm-pack because of it's inability to use reference-types (related PR), meaning we have an "object heap" that we can't reset. But I was able to accomplish this by using wasm-bindgen manually, so if worker-build switches to that and uses reference-types we should be able to fix this.

Moving away from wasm-pack was discussed in the past but this might be the straw that broke the camel's back. I'll work on a fix for this soon.

@jakubadamw
Copy link
Contributor Author

jakubadamw commented May 20, 2022

@zebp thanks so much for taking a look at this! This has been one of the significant issues we've been hitting with Workers in building our product.

Do you know if a potential fix for this issue is considered to be included in the team's nearest roadmap?

Thanks again.

@zebp
Copy link
Collaborator

zebp commented Jul 14, 2022

Thankfully, this seems to have been fixed when we updated wasm-bindgen in this commit which is now released in 0.0.10. I'm not sure which commit to wasm-bindgen fixed this, so we'll need to be careful in the future to make sure we don't get a regression.

I have a production worker with a panic! here:
https://panicked.zeb.workers.dev/
https://panicked.zeb.workers.dev/panic

@JorritSalverda
Copy link

I'm still seeng this issue. After a queue event leading to the following exception all executions following this one no longer log anything:

panicked at 'js error: JsValue(Error: Too many API requests by single worker invocation.
Error: Too many API requests by single worker invocation.
    at entry.js:830:18
    at l (entry.js:185:14)
    at be (entry.js:829:10)
    at wasm://wasm/00acdd9e:wasm-function[1009]:0x1a9471
    at wasm://wasm/00acdd9e:wasm-function[104]:0x53223
    at wasm://wasm/00acdd9e:wasm-function[102]:0x29a73
    at wasm://wasm/00acdd9e:wasm-function[112]:0xc52c2
    at wasm://wasm/00acdd9e:wasm-function[284]:0x160c43
    at wasm://wasm/00acdd9e:wasm-function[1436]:0x1b7fb7
    at rt (entry.js:171:17))', src/lib.rs:23:1

Stack:

Error
    at entry.js:672:13
    at i (entry.js:155:14)
    at Ut (entry.js:671:10)
    at wasm://wasm/00acdd9e:wasm-function[205]:0x145733
    at wasm://wasm/00acdd9e:wasm-function[532]:0x188500
    at wasm://wasm/00acdd9e:wasm-function[616]:0x191269
    at wasm://wasm/00acdd9e:wasm-function[1687]:0x1bc100
    at wasm://wasm/00acdd9e:wasm-function[1075]:0x1ac3ed
    at wasm://wasm/00acdd9e:wasm-function[1125]:0x1ae237
    at wasm://wasm/00acdd9e:wasm-function[1028]:0x1aa365

I'm using at least the following dependencies:

chrono = { version = "0.4", features = ["wasmbind", "serde"] }
console_error_panic_hook = { version = "0.1" }
getrandom = { version = "0.2", features = ["js"] }
worker = { version = "0.0.18", features = ["queue"] }

According to my Cargo.lock file I have the following versions:

name = "wasm-bindgen"
version = "0.2.86"

name = "wasm-bindgen-backend"
version = "0.2.86"

name = "wasm-bindgen-futures"
version = "0.4.36"

name = "wasm-bindgen-macro"
version = "0.2.86"

name = "wasm-bindgen-macro-support"
version = "0.2.86"

name = "wasm-bindgen-shared"
version = "0.2.86"

Any thoughts on why my worker still seems to turn non-functional? It led to issue #374 but after instrumenting the worker in line with https://blog.cloudflare.com/wasm-coredumps/ it seems to have the behaviour as described in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants