The Golden Rule - Don’t Block the Event Loop #73

punkeel · 2021-03-20T03:05:24Z

Brief summary

As a user with a Java background, I'm used to the Netty/Vert.x approach to async, where blocking the event loop is very bad and almost certainly results in a performance hit or a stuck application.
This is explained well in the Vert.x documentation, in the paragraph "The Golden Rule - Don’t Block the Event Loop":

We already know that the Vert.x APIs are non blocking and won’t block the event loop, but that’s not much help if you block the event loop yourself in a handler.

If you do that, then that event loop will not be able to do anything else while it’s blocked. If you block all of the event loops in Vertx instance then your application will grind to a complete halt!

So don’t do it! You have been warned.

Examples of blocking include:

- Thread.sleep()
- Waiting on a lock
- Waiting on a mutex or monitor (e.g. synchronized section)
- Doing a long lived database operation and waiting for a result
- Doing a complex calculation that takes some significant time.
- Spinning in a loop

If any of the above stop the event loop from doing anything else for a significant amount of time then you should go immediately to the naughty step, and await further instructions.

So… what is a significant amount of time?

How long is a piece of string? It really depends on your application and the amount of concurrency you require.

If you have a single event loop, and you want to handle 10000 http requests per second, then it’s clear that each request can’t take more than 0.1 ms to process, so you can’t block for any more time than that.

The maths is not hard and shall be left as an exercise for the reader.

If your application is not responsive it might be a sign that you are blocking an event loop somewhere. To help you diagnose such issues, Vert.x will automatically log warnings if it detects an event loop hasn’t returned for some time. If you see warnings like these in your logs, then you should investigate.

Thread vertx-eventloop-thread-3 has been blocked for 20458 ms

Vert.x will also provide stack traces to pinpoint exactly where the blocking is occurring.
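Rust's standard library has no event loop, so nothing like this checker ships with it. A rough analogue can be sketched with std alone, assuming a hand-rolled loop that calls a hypothetical `Heartbeat::beat()` at the top of each iteration while a watchdog thread checks how long ago the last beat was. The names (`Heartbeat`, `spawn_watchdog`, `event-loop-0`) are made up for illustration, the message only imitates the Vert.x one, and unlike Vert.x this sketch cannot print the blocked thread's stack trace. The threshold is an application choice: by the arithmetic quoted above, one loop serving 10 000 requests per second has a budget of 1 s / 10 000 = 0.1 ms per request; the 100 ms used here only keeps the demo readable.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Heartbeat shared by an event-loop thread and its watchdog.
/// Stores the time of the last tick as microseconds since `start`.
struct Heartbeat {
    start: Instant,
    last_beat_us: AtomicU64,
}

impl Heartbeat {
    fn new() -> Self {
        Heartbeat { start: Instant::now(), last_beat_us: AtomicU64::new(0) }
    }

    /// Called by the event loop every time it gets back to the top of its loop.
    fn beat(&self) {
        let now = self.start.elapsed().as_micros() as u64;
        self.last_beat_us.store(now, Ordering::Relaxed);
    }

    /// Time since the event loop last checked in.
    fn since_last_beat(&self) -> Duration {
        let now = self.start.elapsed().as_micros() as u64;
        let last = self.last_beat_us.load(Ordering::Relaxed);
        Duration::from_micros(now.saturating_sub(last))
    }
}

/// Wakes every `check_every` and warns if the loop has gone more than
/// `max_block` without a beat. Returns the warnings once `stop` is set.
fn spawn_watchdog(
    name: &'static str,
    hb: Arc<Heartbeat>,
    max_block: Duration,
    check_every: Duration,
    stop: Arc<AtomicBool>,
) -> thread::JoinHandle<Vec<String>> {
    thread::spawn(move || {
        let mut warnings = Vec::new();
        while !stop.load(Ordering::Relaxed) {
            thread::sleep(check_every);
            let blocked = hb.since_last_beat();
            if blocked > max_block {
                let msg = format!("Thread {} has been blocked for {} ms", name, blocked.as_millis());
                eprintln!("WARN: {}", msg);
                warnings.push(msg);
            }
        }
        warnings
    })
}

/// Runs a stand-in event loop in which one iteration blocks for 300 ms.
fn run_demo() -> Vec<String> {
    let hb = Arc::new(Heartbeat::new());
    let stop = Arc::new(AtomicBool::new(false));
    let watchdog = spawn_watchdog(
        "event-loop-0",
        Arc::clone(&hb),
        Duration::from_millis(100), // demo threshold; derive yours from your latency budget
        Duration::from_millis(20),
        Arc::clone(&stop),
    );

    let loop_hb = Arc::clone(&hb);
    let event_loop = thread::spawn(move || {
        for i in 0..20 {
            loop_hb.beat();
            let work = if i == 10 { 300 } else { 5 }; // iteration 10 makes a "blocking call"
            thread::sleep(Duration::from_millis(work));
        }
        loop_hb.beat();
    });

    event_loop.join().unwrap();
    stop.store(true, Ordering::Relaxed);
    watchdog.join().unwrap()
}

fn main() {
    let warnings = run_demo();
    println!("watchdog reported {} warning(s)", warnings.len());
}
```

Replacing the `eprintln!` with `std::process::abort()` would give the harsher policy of killing the process when the loop stalls, which some systems use.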

I've seen this message pop up in several Vert.x projects, always for good reasons. In other projects, not necessarily using Vert.x, I've found it a good habit to add checks along these lines:

- When defining blocking functions, add assertions on the current thread (Thread.currentThread()) so they panic early if called from an event loop.
- Add timers, and log warnings when tasks run for more than a few ms.
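The first habit translates to Rust fairly directly if each event-loop thread marks itself on startup. The sketch below uses a `thread_local!` flag; `mark_current_thread_as_event_loop`, `assert_not_on_event_loop`, and `read_config_blocking` are hypothetical names, not an existing crate's API. It knows nothing about any particular runtime's worker threads: whoever starts the loop has to do the marking.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical convention: true only on threads that run an event loop.
    static IS_EVENT_LOOP: Cell<bool> = Cell::new(false);
}

/// Call once at the start of every event-loop thread.
fn mark_current_thread_as_event_loop() {
    IS_EVENT_LOOP.with(|flag| flag.set(true));
}

fn on_event_loop() -> bool {
    IS_EVENT_LOOP.with(|flag| flag.get())
}

/// Put this first in any function that may block; it panics so the
/// mistake shows up in testing instead of as a stalled event loop.
fn assert_not_on_event_loop(op: &str) {
    if on_event_loop() {
        let name = thread::current().name().unwrap_or("<unnamed>").to_string();
        panic!("blocking call `{}` made on event-loop thread `{}`", op, name);
    }
}

/// Hypothetical blocking helper: a synchronous file read.
fn read_config_blocking(path: &str) -> std::io::Result<String> {
    assert_not_on_event_loop("read_config_blocking");
    std::fs::read_to_string(path)
}

fn main() {
    // Off the event loop the call is allowed (a missing file is just an io::Error).
    let _ = read_config_blocking("app.toml");

    // On a thread marked as an event loop, the same call panics immediately.
    let result = thread::Builder::new()
        .name("event-loop-0".into())
        .spawn(|| {
            mark_current_thread_as_event_loop();
            let _ = read_config_blocking("app.toml");
        })
        .unwrap()
        .join();
    assert!(result.is_err());
}
```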

I've never seen this kind of error message in Rust, and I'm not confident in my ability to use the right APIs in the right context. How can I make sure this doesn't come back to bite me?

Optional details

(Optional) Which character(s) would be the best fit and why?
- Alan: the experienced "GC'd language" developer, new to Rust
- Grace: the systems programming expert, new to Rust
- Niklaus: new programmer from an unconventional background
- Barbara: the experienced Rust developer
(Optional) Which project(s) would be the best fit and why?
- All, potentially?
- For libraries, this could serve as a safety guard when invoking user-provided methods.
(Optional) What are the key points or morals to emphasize?
- Nothing prevents async code from calling sync code, which is fine in general, but it will hurt performance in some cases.
- How do we make it easier for users to detect when this happens, and why their whole app is slow?
- How do we help libraries integrate and surface these warnings?
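For the detection and library questions, one option is a wrapper that times each `poll` of a future: a poll that runs long means the task held its executor thread without yielding, which is the blocking described above. A library could wrap user-provided futures this way. The sketch assumes nothing beyond std; `WarnSlowPoll` and the tiny `block_on` executor are illustrative, and a real runtime would supply its own executor and logging.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

/// Wraps a future and warns whenever one `poll` takes longer than `threshold`,
/// i.e. the task held its executor thread without yielding.
struct WarnSlowPoll<F> {
    inner: Pin<Box<F>>,
    label: &'static str,
    threshold: Duration,
    slow_polls: usize,
}

impl<F: Future> WarnSlowPoll<F> {
    fn new(label: &'static str, threshold: Duration, fut: F) -> Self {
        WarnSlowPoll { inner: Box::pin(fut), label, threshold, slow_polls: 0 }
    }
}

impl<F: Future> Future for WarnSlowPoll<F> {
    // The slow-poll count is returned next to the output so the example can be checked.
    type Output = (F::Output, usize);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let started = Instant::now();
        let result = self.inner.as_mut().poll(cx);
        let took = started.elapsed();
        if took > self.threshold {
            self.slow_polls += 1;
            eprintln!(
                "WARN: task `{}` held its thread for {:?} in a single poll (limit {:?})",
                self.label, took, self.threshold
            );
        }
        match result {
            Poll::Ready(out) => Poll::Ready((out, self.slow_polls)),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Wakes the thread that is running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor, just enough to run the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let limit = Duration::from_millis(20);

    // Well-behaved task: finishes in microseconds, no warning.
    let (value, slow) = block_on(WarnSlowPoll::new("quick", limit, async { 1 + 1 }));
    println!("quick: value={} slow_polls={}", value, slow);

    // Task calling blocking code directly inside `async`: flagged once.
    let (_, slow) = block_on(WarnSlowPoll::new("sleepy", limit, async {
        thread::sleep(Duration::from_millis(100)); // blocking call inside async code
    }));
    println!("sleepy: slow_polls={}", slow);
}
```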


nikomatsakis · 2021-03-20T09:35:34Z

@punkeel it'd be helpful, I think, if you could frame this as a story about what happens today. Or maybe write about what happens in Java! That'd be an interesting twist.

If I were to guess, it might be something like:

- Alan is accustomed to warnings from Jetty.
- Alan is writing Rust code and can't really tell if his tasks are running for too long or not.
- Maybe he is testing on a multicore machine at home and things work ok, but then he deploys in a single core image and suddenly performance is terrible. Why?
- Turns out there is a long-running loop that blocks the event thread, but he has a heck of a time tracking it down. Eventually he gives up and watches Game of Thrones.
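In Alan's case, one common remedy for a CPU-bound loop on the event thread is to break it into chunks and yield between them, so that even a single-core deployment keeps serving other tasks. The std-only sketch below hand-writes a `yield_now` future to show the mechanism; in practice you would use the runtime's helper (Tokio, for example, provides `tokio::task::yield_now`, and `tokio::task::spawn_blocking` for work that should leave the event loop entirely). `sum_in_chunks` and `block_on_counting` are hypothetical, and with a one-task executor there is nothing else to run, so the example only counts how often the loop hands control back.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Returns `Pending` once (after scheduling its own wake-up), then `Ready`,
/// so the executor gets a chance to run other tasks in between.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again right away
            Poll::Pending
        }
    }
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// CPU-bound loop that hands control back every `chunk` iterations
/// instead of holding the event-loop thread for the whole computation.
async fn sum_in_chunks(n: u64, chunk: u64) -> u64 {
    let mut total = 0;
    for i in 0..n {
        total += i;
        if (i + 1) % chunk == 0 {
            yield_now().await;
        }
    }
    total
}

/// Wakes the thread that is running `block_on_counting`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// One-task executor that also counts how many times it polled the future.
fn block_on_counting<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, polls),
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (total, polls) = block_on_counting(sum_in_chunks(10_000, 1_000));
    println!("total={} polls={}", total, polls);
}
```

For `sum_in_chunks(10_000, 1_000)` the loop yields 10 times, so the executor polls it 11 times (10 `Pending`s and a final `Ready`) and the sum is 49 995 000.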

estebank · 2021-03-22T17:10:43Z

For traceability, CC #19 & rust-lang/rust-clippy#4377

gilescope · 2021-03-25T07:44:35Z

Some systems kill the process if the event loop blocks for longer than a certain length of time. Pragmatically that works, but it would be even better to flag certain threads as event loops and have compile failures if they might block.

punkeel · 2021-03-28T02:28:47Z

Related piece by @fasterthanlime: https://fasterthanli.me/articles/pin-and-suffering

punkeel added the good first issue, help wanted, and status-quo-story-ideas labels on Mar 20, 2021

ericseppanen mentioned this issue Apr 22, 2021

Async functions and blocking I/O don't mix neondatabase/neon#58 (closed)
