Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Footgun with futures unordered #131

Closed
3 of 4 tasks
nikomatsakis opened this issue Apr 7, 2021 · 1 comment · Fixed by #172
Closed
3 of 4 tasks

Footgun with futures unordered #131

nikomatsakis opened this issue Apr 7, 2021 · 1 comment · Fixed by #172
Labels
good first issue Good for newcomers help wanted Extra attention is needed status-quo-story-ideas "Status quo" user story ideas

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Apr 7, 2021

Brief summary

@farnz opened a really interesting issue rust-lang/futures-rs#2387:

This would make a great status quo story!

We've found a nasty footgun when we use FuturesUnordered (or buffered etc) to get concurrency from a set of futures.

Because FuturesUnordered only polls its contents when it is polled, it is possible for futures lurking in the queue to be surprised by a long poll, even though no individual future spends a long time in poll(). This causes issues in two cases:

  1. When interfacing with an external system via the network; if you take a result from the stream with while let Some(res) = stream.next().await and then do significant wall-clock time inside the loop (even if very little CPU time is involved because you're awaiting another network service), you can hit the external system's timeouts and fail unexpectedly.

  2. When using an async friendly semaphore (like Tokio provides), you can deadlock yourself by having the tasks that are waiting in the FuturesUnordered owning all the semaphores, while having an item in a .for_each() block after buffer_unordered() requiring a semaphore.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f58e77ba077b40eba40636a4e32b5710 shows the effect. Naïvely, you'd expect all the 10 to 20ms sleep futures to complete in under 40ms, and the 100ms sleep futures to take 100ms to 200ms. However, you can see that the sleep threads complete in the timescale expected and send the wakeup to the future that spawned them, but some of the short async sleep_for futures take over 100ms to complete, because while the thread signals them to wake up, the loop is .awaiting a long sleep future and does not get round to polling the stream again for some time.

We've found this in practice with things where the loop body is "nice" in the sense that it doesn't run for very long inside its poll function, but the total time spent in the loop body is large. The futures being polled by FuturesUnordered do:

async fn do_select<T>(database: &Database, query: Query) -> Result<Vec<T>> {
    let conn = database.get_conn().await?;
    conn.select_query(query).await
}

and the main work looks like:

async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY)?;
    stream::iter(
        work
            .into_iter()
            .map(|item| do_select(database, work_from_item(item)).await)
            .buffered(5)
            .for_each(|work_item| do_giant_work(work_item)).await;
}

do_giant_work can take 20 seconds wall clock time for big work items. It's possible for get_conn to open the connection (which has a 10 second idle timeout) for each Future in the buffered set, send the first handshake packet, and then return Poll::Pending as it waits for the reply. When the first of the 5 in the buffered set returns Poll::Ready(item), the code then runs do_giant_work which takes 20 seconds. While do_giant_work is in control, nothing re-polls the buffered set of Futures, and so the idle timeout kicks in server-side, and all of the 4 open connections get dropped because we've opened a connection and then not completed the handshake.

We can mitigate the problem by using spawn_with_handle to ensure that the do_select work happens whenever the do_giant_work Future awaits something, but this behaviour has surprised my team more than once (despite enough experience to diagnose this after the fact).

I'm not sure that a perfect technical solution is possible; the issue is that FuturesUnordered is a sub-executor driven by the main executor, and if not polled, it can't poll its set of pending futures. Meanwhile, the external code is under no obligation to poll the FuturesUnordered in a timely fashion. Spawning the futures before putting them in the sub-executor works because the main executor then drives them, and the sub-executor is merely picking up final results, but futures have to be 'static lifetime to be spawned.

Optional details

  • (Optional) Which character(s) would be the best fit and why?
    • Alan: the experienced "GC'd language" developer, new to Rust
    • Grace: the systems programming expert, new to Rust
    • Niklaus: new programmer from an unconventional background
    • Barbara: the experienced Rust developer
  • (Optional) Which project(s) would be the best fit and why?
    • List some projects here.
  • (Optional) What are the key points or morals to emphasize?
    • Write some morals here.
@farnz
Copy link
Contributor

farnz commented Apr 7, 2021

If it helps, the team that's hit this repeatedly without realising until the footgun strikes in production has people who fit with Alan, Grace, and Barbara, but not Niklaus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed status-quo-story-ideas "Status quo" user story ideas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants