-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: fix banning and disallow sending messages before receiving verack #9720
Conversation
I see that @TheBlueMatt and I managed to make almost the exact same change: Edit: I should also mention that this and #9715 are complementary. |
What is the risk by not merging this? |
Concept ACK, will give this a more full review after breakfast.
…On February 8, 2017 1:46:00 AM EST, Cory Fields ***@***.***> wrote:
This is the last of my net issues for 0.14. As discussed with
@TheBlueMatt and @gmaxwell.
Fixes for a few problems discovered while running a network
stress/fuzzer:
- Remote nodes weren't always banned when they hadn't yet sent a
verack. Regression from 7a8c251.
- Require a verack before sending any non-handshake messages. This is
much more straightforward behavior, and allows for tests to be easily
written
- Now that there's a sane model for testing, add checks for leaky
messages sent out before the handshake is complete, as well as for
banning in those cases.
You can view, comment on, or merge this pull request online at:
#9720
-- Commit Summary --
* net: correctly ban before the handshake is complete
* net: parse reject earlier
* net: require a verack before responding to anything else
* qa: allow for a node that does not send an initial version message
* qa: add a test to detect leaky p2p messages
-- File Changes --
M qa/pull-tester/rpc-tests.py (1)
A qa/rpc-tests/p2p-leaktests.py (135)
M qa/rpc-tests/test_framework/mininode.py (21)
M src/net_processing.cpp (115)
-- Patch Links --
https://github.com/bitcoin/bitcoin/pull/9720.patch
https://github.com/bitcoin/bitcoin/pull/9720.diff
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9720
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 50ce6755e024bc849ef8681f9aaa81ff3f3bca57
I'd also like the commit referred to in 0abd2ccfaa24fdd1e5d24271e2360c4d9d64b662 's commit message to include its commit hash for easier understanding.
src/net_processing.cpp
Outdated
AssertLockHeld(cs_main); | ||
CNodeState &state = *State(pnode->GetId()); | ||
|
||
BOOST_FOREACH(const CBlockReject& reject, state.rejects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind putting braces around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: avoid BOOST_FOREACH
if possible (though this was moved code, so perhaps not applicable, but..).
for (const CBlockReject& reject : state.rejects) [...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, for moved code this is not applicable. To avoid confusion, let's keep code style changes, moves and bugfixes separate where possible.
src/net_processing.cpp
Outdated
@@ -2594,6 +2594,35 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr | |||
return true; | |||
} | |||
|
|||
static bool SendRejectsAndCheckBan(CNode* pnode, CConnman& connman) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The name makes me think true
means fShouldBan
was set to true. Invert the boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, i waffled back and forth on this. Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual fix for waffling is comment and clarify name :p
I got a few questions about these changes, so I've updated the commit messages to provide (I hope) better back-story/context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK, did not review tests
src/net_processing.cpp
Outdated
AssertLockHeld(cs_main); | ||
CNodeState &state = *State(pnode->GetId()); | ||
|
||
BOOST_FOREACH(const CBlockReject& reject, state.rejects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: avoid BOOST_FOREACH
if possible (though this was moved code, so perhaps not applicable, but..).
for (const CBlockReject& reject : state.rejects) [...]
src/net_processing.cpp
Outdated
LogPrintf("%s(%s, %u bytes) FAILED peer=%d\n", __func__, SanitizeString(strCommand), nMessageSize, pfrom->id); | ||
LOCK(cs_main); | ||
SendRejectsAndCheckBan(pfrom, connman); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few cases it looks like we return true after setting misbehaving, so this probably needs to be checked either way (or those cases need updating - but that might result in double-logging).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to avoid the cs_main lock for each message received, but you're right that it means that we rely on a return value that's not very well-defined.
I'll just make it unconditional. Until we have parallel processing, there's not much harm in that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
utACK df1a323, did not review tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK (only tested lightly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test looks good @theuni ! I've added a bunch of style comments.
qa/rpc-tests/p2p-leaktests.py
Outdated
from test_framework.mininode import * | ||
from test_framework.test_framework import BitcoinTestFramework | ||
from test_framework.util import * | ||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this. logging isn't being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
#!/usr/bin/env python3 | ||
# Copyright (c) 2017 The Bitcoin Core developers | ||
# Distributed under the MIT software license, see the accompanying | ||
# file COPYING or http://www.opensource.org/licenses/mit-license.php. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring describing what this test case is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
qa/rpc-tests/p2p-leaktests.py
Outdated
def check(self): | ||
def done(): | ||
return self.done | ||
return wait_until(done, timeout=10) and not self.unrequested_msg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what this is achieving. I think you can remove check()
and self.done
entirely.
You can check directly that no unexpected messages have been received at the end of the test (see my comment below).
qa/rpc-tests/p2p-leaktests.py
Outdated
self.connection.send_message(message) | ||
|
||
def bad_message(self, message): | ||
self.unrequested_msg = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is unexpected_message
a better name than unrequested_message
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems about the same to me, but sure
qa/rpc-tests/p2p-leaktests.py
Outdated
|
||
def bad_message(self, message): | ||
self.unrequested_msg = True | ||
raise Exception("should not have received message: %s" % message.command) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should be raising an exception in the network thread. Just set unexpected_message
to True (and optionally print some debug info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
qa/rpc-tests/p2p-leaktests.py
Outdated
def __init__(self): | ||
super().__init__() | ||
self.num_nodes = 1 | ||
self.banscore = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since banscore isn't changing I'd prefer to make it a global constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i'd rather keep this local so that we can use different scores for different tests.
qa/rpc-tests/p2p-leaktests.py
Outdated
def setup_network(self): | ||
extra_args = [['-debug', '-banscore='+str(self.banscore)] | ||
for i in range(self.num_nodes)] | ||
self.nodes = start_nodes(self.num_nodes, self.options.tmpdir, extra_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only one node here. You can replace the above three lines with:
self.nodes = [start_node(0, self.options.tmpdir, ['-debug', '-banscore='+str(self.banscore)])]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave it as is for now. My goal was to generalize this logic and get rid of most setup_network
methods in test as the only thing they commonly do is fire up the requested number of nodes and nothing else. So there would be no need to overwrite this function if you want a simple network set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be easier to adapt if the number of nodes changes.
qa/rpc-tests/p2p-leaktests.py
Outdated
no_verack_idlenode.add_connection(connections[2]) | ||
|
||
NetworkThread().start() # Start up network handling in another thread | ||
self.nodes[0].generate(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very needed, but needs a comment. We generate a block, and wait a few secs to make sure that it's not relayed to the nodes who haven't versioned/veracked yet.
qa/rpc-tests/p2p-leaktests.py
Outdated
|
||
assert(no_version_bannode.check()) | ||
assert(no_version_idlenode.check()) | ||
assert(no_verack_idlenode.check()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well just check the variable directly:
assert(not no_version_bannode.unexpected_message)
assert(not no_version_idlenode.unexpected_message)
assert(not no_verack_idlenode.unexpected_message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
qa/rpc-tests/p2p-leaktests.py
Outdated
assert(no_version_bannode.check()) | ||
assert(no_version_idlenode.check()) | ||
assert(no_verack_idlenode.check()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend you disconnect all the connections to the node at the end of the test:
# Disconnect all peers
[conn.disconnect_node() for conn in connections]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@jnewbery Thanks for the great test review. I'm not sure I'll have time to get to this before Sunday, so let's not let it hold back merge if a few more ACKs come in. I'll for sure fix up the tests post-merge if that's the case. |
7a8c251 made a change to avoid getting into SendMessages() until the version handshake (VERSION + VERACK) is complete. That was done to avoid leaking out messages to nodes who could connect, but never bothered sending us their version/verack. Unfortunately, the ban tally and possible disconnect are done as part of SendMessages(). So after 7a8c251, if a peer managed to do something bannable before completing the handshake (say send 100 non-version messages before their version), they wouldn't actually end up getting disconnected/banned. That's fixed here by checking the banscore as part of ProcessMessages() in addition to SendMessages().
Prior to this change, all messages were ignored until a VERSION message was received, as well as possibly incurring a ban score. Since REJECT messages can be sent at any time (including as a response to a bad VERSION message), make sure to always parse them. Moving this parsing up keeps it from being caught in the if (pfrom->nVersion == 0) check below.
7a8c251 made this logic hard to follow. After that change, messages would not be sent to a peer via SendMessages() before the handshake was complete, but messages could still be sent as a response to an incoming message. For example, if a peer had not yet sent a verack, we wouldn't notify it about new blocks, but we would respond to a PING with a PONG. This change makes the behavior straightforward: until we've received a verack, never send any message other than version/verack/reject. The behavior until a VERACK is received has always been undefined, this change just tightens our policy. This also makes testing much easier, because we can now connect but not send version/verack, and anything sent to us is an error.
This is certainly not exhaustive, but it's better than nothing. Adds checks for: - Any message received before sending a version - Any message received other than version/reject before sending a verack It also tries to goad the remote into sending a pong, address, or block announcement.
Fixed up the tests and squashed. Only the tests changed, the bitcoin code is exactly the same as before squash. I've archived the old branch here: https://github.com/theuni/bitcoin/commits/fix-ban2 in case anyone wants to compare. I replaced my mininode changes with @TheBlueMatt's commits from #9715, so that these won't conflict with eachother. |
…ceiving verack d943491 qa: add a test to detect leaky p2p messages (Cory Fields) 8650bbb qa: Expose on-connection to mininode listeners (Matt Corallo) 5b5e4f8 qa: mininode learns when a socket connects, not its first action (Matt Corallo) cbfc5a6 net: require a verack before responding to anything else (Cory Fields) 8502e7a net: parse reject earlier (Cory Fields) c45b9fb net: correctly ban before the handshake is complete (Cory Fields)
…fore receiving verack d943491 qa: add a test to detect leaky p2p messages (Cory Fields) 8650bbb qa: Expose on-connection to mininode listeners (Matt Corallo) 5b5e4f8 qa: mininode learns when a socket connects, not its first action (Matt Corallo) cbfc5a6 net: require a verack before responding to anything else (Cory Fields) 8502e7a net: parse reject earlier (Cory Fields) c45b9fb net: correctly ban before the handshake is complete (Cory Fields)
…fore receiving verack d943491 qa: add a test to detect leaky p2p messages (Cory Fields) 8650bbb qa: Expose on-connection to mininode listeners (Matt Corallo) 5b5e4f8 qa: mininode learns when a socket connects, not its first action (Matt Corallo) cbfc5a6 net: require a verack before responding to anything else (Cory Fields) 8502e7a net: parse reject earlier (Cory Fields) c45b9fb net: correctly ban before the handshake is complete (Cory Fields)
…fore receiving verack d943491 qa: add a test to detect leaky p2p messages (Cory Fields) 8650bbb qa: Expose on-connection to mininode listeners (Matt Corallo) 5b5e4f8 qa: mininode learns when a socket connects, not its first action (Matt Corallo) cbfc5a6 net: require a verack before responding to anything else (Cory Fields) 8502e7a net: parse reject earlier (Cory Fields) c45b9fb net: correctly ban before the handshake is complete (Cory Fields)
This is the last of my net issues for 0.14. As discussed with @TheBlueMatt and @gmaxwell.
Fixes for a few problems discovered while running a network stress/fuzzer: