The Golden Rule - Don’t Block the Event Loop #73

Open
4 tasks done
punkeel opened this issue Mar 20, 2021 · 4 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed), status-quo-story-ideas ("Status quo" user story ideas)

Comments

@punkeel
Contributor

punkeel commented Mar 20, 2021

Brief summary

As a user with a Java background, I'm used to the Netty/Vert.x approach to async, where blocking the event loop is very bad and almost certainly results in a performance hit or in the application getting stuck.
This is explained in detail in the Vert.x documentation, in the section "The Golden Rule - Don’t Block the Event Loop":

We already know that the Vert.x APIs are non blocking and won’t block the event loop, but that’s not much help if you block the event loop yourself in a handler.

If you do that, then that event loop will not be able to do anything else while it’s blocked. If you block all of the event loops in Vertx instance then your application will grind to a complete halt!

So don’t do it! You have been warned.

Examples of blocking include:

  • Thread.sleep()

  • Waiting on a lock

  • Waiting on a mutex or monitor (e.g. synchronized section)

  • Doing a long lived database operation and waiting for a result

  • Doing a complex calculation that takes some significant time.

  • Spinning in a loop

If any of the above stop the event loop from doing anything else for a significant amount of time then you should go immediately to the naughty step, and await further instructions.

So… what is a significant amount of time?

How long is a piece of string? It really depends on your application and the amount of concurrency you require.

If you have a single event loop, and you want to handle 10000 http requests per second, then it’s clear that each request can’t take more than 0.1 ms to process, so you can’t block for any more time than that.

The maths is not hard and shall be left as an exercise for the reader.
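The arithmetic behind the 0.1 ms figure can be written out as a small Rust helper. This is only a sketch of the division the quoted passage describes; `per_request_budget` and its parameters are hypothetical names introduced here, not part of Vert.x or of any Rust runtime.

```rust
use std::time::Duration;

/// Average time one request may occupy an event loop if `event_loops` loops
/// together must sustain `requests_per_second`.
/// Hypothetical helper for illustration; panics if `requests_per_second` is 0.
fn per_request_budget(requests_per_second: u32, event_loops: u32) -> Duration {
    // Each loop sees requests_per_second / event_loops requests per second,
    // so a single request gets event_loops / requests_per_second seconds.
    Duration::from_secs(1) * event_loops / requests_per_second
}
```

With one loop and 10000 requests per second, `per_request_budget(10_000, 1)` is 100 microseconds, which matches the 0.1 ms in the passage. The budget is an average, so any handler that blocks for longer eats into the time available to every request queued behind it.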

If your application is not responsive it might be a sign that you are blocking an event loop somewhere. To help you diagnose such issues, Vert.x will automatically log warnings if it detects an event loop hasn’t returned for some time. If you see warnings like these in your logs, then you should investigate.

Thread vertx-eventloop-thread-3 has been blocked for 20458 ms

Vert.x will also provide stack traces to pinpoint exactly where the blocking is occurring.
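A blocked-thread check of the kind described above could be approximated in Rust with the standard library alone. The sketch below is an assumption about how one might build it, not how Vert.x or any Rust runtime implements it: `LoopMonitor` and its methods are hypothetical names, and the event loop is expected to call `task_started` and `task_finished` around every task it runs.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Sentinel meaning "the loop is idle, no task is running".
const IDLE: u64 = u64::MAX;

/// Hypothetical monitor shared between one event-loop thread and a watchdog.
struct LoopMonitor {
    epoch: Instant,
    /// Milliseconds since `epoch` at which the current task started, or IDLE.
    busy_since_ms: AtomicU64,
}

impl LoopMonitor {
    fn new() -> Self {
        LoopMonitor { epoch: Instant::now(), busy_since_ms: AtomicU64::new(IDLE) }
    }

    fn now_ms(&self) -> u64 {
        self.epoch.elapsed().as_millis() as u64
    }

    /// The event loop calls this just before it runs a task.
    fn task_started(&self) {
        self.busy_since_ms.store(self.now_ms(), Ordering::Relaxed);
    }

    /// The event loop calls this as soon as the task returns.
    fn task_finished(&self) {
        self.busy_since_ms.store(IDLE, Ordering::Relaxed);
    }

    /// The watchdog calls this periodically from another thread. Returns how
    /// long the current task has been running if that exceeds `limit`.
    fn blocked_for(&self, limit: Duration) -> Option<Duration> {
        let since = self.busy_since_ms.load(Ordering::Relaxed);
        if since == IDLE {
            return None;
        }
        let running = Duration::from_millis(self.now_ms().saturating_sub(since));
        if running > limit { Some(running) } else { None }
    }
}
```

A watchdog thread holding the monitor in an `Arc` could poll `blocked_for` once a second or so and log a warning in the style of the Vert.x message. Unlike Vert.x, this sketch cannot capture the stack trace of the blocked thread; it only reports that a task has overrun.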

I've seen this message pop up in several Vert.x projects, always for good reasons. In other projects, not necessarily using Vert.x, I've found it to be a good habit to add checks along those lines:

  • When defining blocking functions, add assertions on the current thread (Thread.currentThread()) to panic early.
  • Add timers, and log warnings when tasks are running for more than a few ms.
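Both habits from the list above translate to Rust fairly directly. The following is a minimal sketch using only the standard library; every name in it (`ON_EVENT_LOOP`, `mark_event_loop_thread`, `assert_may_block`, `run_timed`) is hypothetical, and it assumes the application itself marks its event-loop threads, since nothing in the language does so.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

thread_local! {
    /// True on threads that the application has marked as event-loop threads.
    static ON_EVENT_LOOP: Cell<bool> = Cell::new(false);
}

/// Mark the current thread as an event loop (call once when the loop starts).
fn mark_event_loop_thread() {
    ON_EVENT_LOOP.with(|flag| flag.set(true));
}

/// Put this at the top of any function that may block, to panic early
/// when it is called from a marked event-loop thread.
fn assert_may_block(what: &str) {
    let on_loop = ON_EVENT_LOOP.with(|flag| flag.get());
    assert!(!on_loop, "blocking call `{}` on an event-loop thread", what);
}

/// Run `task`, log a warning if it took longer than `budget`, and return the
/// task's result together with the elapsed time when the budget was exceeded.
fn run_timed<T, F: FnOnce() -> T>(name: &str, budget: Duration, task: F) -> (T, Option<Duration>) {
    let started = Instant::now();
    let output = task();
    let elapsed = started.elapsed();
    if elapsed > budget {
        eprintln!("task `{}` ran for {:?} (budget {:?})", name, elapsed, budget);
        (output, Some(elapsed))
    } else {
        (output, None)
    }
}
```

The assertion catches a blocking call on the first execution, wherever it is reached, while the timer only reports after the fact. The two therefore complement each other: the first needs every blocking function to opt in, the second needs every task to be wrapped.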

I've never seen this kind of error message in Rust, and I'm not confident in my ability to use the right APIs in the right context. How can I make sure this doesn't come back to bite me?

Optional details

  • (Optional) Which character(s) would be the best fit and why?
    • Alan: the experienced "GC'd language" developer, new to Rust
    • Grace: the systems programming expert, new to Rust
    • Niklaus: new programmer from an unconventional background
    • Barbara: the experienced Rust developer
  • (Optional) Which project(s) would be the best fit and why?
    • All, potentially?
    • As a library, this may serve as a safety guard when invoking user-provided methods.
  • (Optional) What are the key points or morals to emphasize?
    • Nothing prevents async code from calling sync code. That is fine in itself, but it will hurt performance in some cases.
    • How do we make it easier for users to detect when this happens, and to understand why their whole app is slow?
    • How do we help libraries integrate and surface these warnings?
punkeel added the good first issue, help wanted, and status-quo-story-ideas labels on Mar 20, 2021
@nikomatsakis
Contributor

@punkeel it'd be helpful, I think, if you could frame this as a story about what happens today. Or maybe write about what happens in Java! That'd be an interesting twist.

If I were to guess, it might be something like:

  • Alan is accustomed to warnings from Jetty.
  • Alan is writing Rust code and can't really tell if his tasks are running for too long or not.
  • Maybe he is testing on a multicore machine at home and things work ok, but then he deploys in a single core image and suddenly performance is terrible. Why?
  • Turns out there is a long-running loop that blocks the event thread, but he has a heck of a time tracking it down. Eventually he gives up and watches Game of Thrones.
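The multicore versus single-core difference in this story can be reproduced without any async runtime. The sketch below is a stand-in under stated assumptions, not a model of a specific executor: plain threads pull closures from one shared queue, and `start_delays` and `Task` are hypothetical names. It reports how long each task waited before it started.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

type Task = Box<dyn FnOnce() + Send>;

/// Runs `tasks` on `workers` threads that pull from one shared queue and
/// returns, per task in submission order, how long it waited before starting.
fn start_delays(workers: usize, tasks: Vec<Task>) -> Vec<Duration> {
    let epoch = Instant::now();
    let delays = Mutex::new(vec![Duration::ZERO; tasks.len()]);
    let queue: Mutex<VecDeque<(usize, Task)>> =
        Mutex::new(tasks.into_iter().enumerate().collect());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Take the next task; the lock is released at the end of this statement.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some((index, task)) => {
                        delays.lock().unwrap()[index] = epoch.elapsed();
                        // A slow task holds up everything queued behind it on this worker.
                        task();
                    }
                    None => break,
                }
            });
        }
    });

    delays.into_inner().unwrap()
}
```

Submitting a task that sleeps for 200 ms followed by an empty task shows the effect: with two workers the empty task starts almost immediately, whereas with one worker it cannot start until the sleep has finished. The code is identical in both runs, which is why the problem is easy to miss on a multicore development machine.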

@estebank

For traceability, CC #19 & rust-lang/rust-clippy#4377

@gilescope

Some systems kill the process if the event loop blocks more than a certain length of time. Pragmatically that works, but it would be even better to flag certain threads as event-loops and have compile failures if they might block.

@punkeel
Contributor Author

punkeel commented Mar 28, 2021
