
Security: Spawn each initial peer handshake in a separate task, Credit: Equilibrium #2163

Closed
teor2345 opened this issue May 18, 2021 · 2 comments · Fixed by #3189
Labels
A-network Area: Network protocol updates or fixes A-rust Area: Updates to Rust code C-security Category: Security issues I-hang A Zebra component stops responding to requests I-integration-fail Continuous integration fails, including build and test failures I-slow Problems with performance or responsiveness

Comments

@teor2345
Contributor

teor2345 commented May 18, 2021

Is your feature request related to a problem? Please describe.

Zebra can hang if there is only a small number of initial peers. (It eventually continues after 3 × 75 = 225 seconds, but that's a long time to wait.)

There might also be potential deadlocks between the initial peer handshakes, the candidate set, and the peer set.

Reported by Niklas Long of Equilibrium.
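One way a single task can stall is sketched here, assuming handshake results are sent over a bounded channel to a consumer (such as the peer set) that is busy and not reading. This is an illustrative model, not Zebra's code: standard-library threads and `sync_channel` stand in for tokio tasks and Zebra's channels, and `handshake_then_send` and `handshakes_completed_while_blocked` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical handshake that completes immediately, records that it
/// finished, then reports its peer index. The send blocks while the bounded
/// channel is full, and returns an error once the receiver is dropped.
fn handshake_then_send(peer: usize, completed: &AtomicUsize, tx: &mpsc::SyncSender<usize>) -> bool {
    completed.fetch_add(1, Ordering::SeqCst);
    tx.send(peer).is_ok()
}

/// Counts how many handshakes finish while the consumer never reads.
/// `spawn_each == false`: one worker must finish each send before starting
/// the next handshake. `spawn_each == true`: one worker per handshake, so
/// only the sends wait, not the handshakes themselves.
fn handshakes_completed_while_blocked(spawn_each: bool, peers: usize, wait: Duration) -> usize {
    // Room for one buffered result; further sends block until the consumer reads.
    let (tx, rx) = mpsc::sync_channel::<usize>(1);
    let completed = Arc::new(AtomicUsize::new(0));

    if spawn_each {
        for peer in 0..peers {
            let (tx, completed) = (tx.clone(), Arc::clone(&completed));
            thread::spawn(move || {
                let _ = handshake_then_send(peer, &completed, &tx);
            });
        }
    } else {
        let (tx, completed) = (tx.clone(), Arc::clone(&completed));
        thread::spawn(move || {
            for peer in 0..peers {
                // Stop once the consumer has gone away.
                if !handshake_then_send(peer, &completed, &tx) {
                    break;
                }
            }
        });
    }
    drop(tx);

    // The consumer is "busy" and never reads; give the workers time to run.
    thread::sleep(wait);
    let count = completed.load(Ordering::SeqCst);
    // Dropping the receiver makes every blocked send return an error, so the workers exit.
    drop(rx);
    count
}
```

With five peers and nobody reading, the single worker finishes two handshakes (one result is buffered, and the next send blocks), while separate workers finish all five and only their sends wait.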

Describe the solution you'd like

Spawn each initial peer handshake in a separate task, so they can make progress independently.
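A minimal sketch of what "make progress independently" means, using standard-library threads as a stand-in for tokio tasks. `fake_handshake` is hypothetical and only sleeps instead of doing network I/O. The one-worker version is an exaggerated model that runs handshakes strictly in sequence, which is stricter than polling several futures from one task, but it shows the same failure mode: one slow peer holds up the rest.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a peer handshake: it sleeps for `delay`
/// instead of doing network I/O, then returns the peer's index.
fn fake_handshake(peer: usize, delay: Duration) -> usize {
    thread::sleep(delay);
    peer
}

/// Exaggerated model of one task driving every handshake: each handshake
/// runs to completion before the next one starts, so a slow peer delays
/// all the peers after it.
fn handshake_in_one_worker(delays: &[Duration]) -> Vec<usize> {
    delays
        .iter()
        .enumerate()
        .map(|(peer, &delay)| fake_handshake(peer, delay))
        .collect()
}

/// One worker per handshake (threads stand in for tokio tasks): results
/// arrive in completion order, so fast peers are not held up by slow ones.
fn handshake_in_separate_workers(delays: &[Duration]) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (peer, &delay) in delays.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver outlives every worker here, so this send succeeds.
            let _ = tx.send(fake_handshake(peer, delay));
        });
    }
    // Drop the original sender so the iterator below ends after the last worker reports.
    drop(tx);
    rx.into_iter().collect()
}
```

With delays of 300 ms, 10 ms and 10 ms, the one-worker version reports the peers in order `[0, 1, 2]` only after all three have finished, while the separate workers report the slow peer 0 last.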

Additional context

This is the same fix as #1950, but for the initial peer handshakes.

This is a follow-up to #2154, which fixed part of this security issue.

@teor2345 teor2345 added A-rust Area: Updates to Rust code P-Medium C-security Category: Security issues I-hang A Zebra component stops responding to requests I-slow Problems with performance or responsiveness I-integration-fail Continuous integration fails, including build and test failures A-network Area: Network protocol updates or fixes labels May 18, 2021
@teor2345 teor2345 added this to the 2021 Sprint 9 milestone May 18, 2021
@teor2345 teor2345 self-assigned this May 18, 2021
@teor2345
Contributor Author

This is a high-risk change, so we'll leave it until after NU5 testnet activation.

@teor2345
Contributor Author

teor2345 commented Dec 9, 2021

Each initial handshake is a future, and all of these futures run in the same task:

```rust
async move {
    // Rate-limit the connection, sleeping for an interval according
    // to its index in the list.
    sleep(constants::MIN_PEER_CONNECTION_INTERVAL.saturating_mul(i as u32)).await;
    outbound_connector
        .oneshot(req)
        .map_err(move |e| (addr, e))
        .await
}
```

Instead, we need to spawn a new task to run each handshake future, like this code:

```rust
// Construct a handshake future but do not drive it yet...
let handshake = handshaker.call(HandshakeRequest {
    tcp_stream,
    connected_addr,
    connection_tracker,
});
// ... instead, spawn a new task to handle this connection
{
    let mut peerset_tx = peerset_tx.clone();
    tokio::spawn(
        async move {
            if let Ok(client) = handshake.await {
                let _ = peerset_tx.send(Ok(Change::Insert(addr, client))).await;
            }
        }
        .instrument(handshaker_span),
    );
}
```
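The spawn-per-handshake pattern can be applied to the initial-peer excerpt above. This is a minimal sketch, not Zebra's implementation: standard-library threads and an `mpsc` channel stand in for tokio tasks and the peer set channel, `connect` and `spawn_initial_handshakes` are hypothetical names, and the fake connector treats port 0 as an unreachable peer.

```rust
use std::net::SocketAddr;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Result of one hypothetical handshake: a client description or an error message.
type HandshakeResult = Result<String, String>;

/// Hypothetical outbound connector: succeeds for every address except
/// port 0, which models an unreachable peer.
fn connect(addr: SocketAddr) -> HandshakeResult {
    if addr.port() == 0 {
        Err(format!("connection to {} refused", addr))
    } else {
        Ok(format!("client for {}", addr))
    }
}

/// Starts one worker per initial peer, so each handshake makes progress on
/// its own. Results are reported as `(addr, result)` in completion order.
fn spawn_initial_handshakes(
    peers: Vec<SocketAddr>,
    interval: Duration,
) -> mpsc::Receiver<(SocketAddr, HandshakeResult)> {
    let (tx, rx) = mpsc::channel();
    for (i, addr) in peers.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Rate-limit: worker `i` waits `interval * i` before connecting.
            // (`saturating_mul` caps at Duration::MAX instead of panicking.)
            thread::sleep(interval.saturating_mul(i as u32));
            // Report success or failure; ignore the error if the caller stopped listening.
            let _ = tx.send((addr, connect(addr)));
        });
    }
    // The original `tx` is dropped when this function returns, so the
    // receiver's iterator ends once every worker has reported.
    rx
}
```

Because each worker owns its own staggered sleep, a slow or failing peer only delays its own result, and the caller can insert clients into the peer set as they arrive instead of waiting for the whole batch.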
