This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Track when the pulled event signature fails #13815
Merged
MadLittleMods
merged 11 commits into
develop
from
madlittlemods/13700-track-when-event-signature-fails
Oct 3, 2022
Merged
Changes from 1 commit
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
924ae2b
Track when the pulled event signature fails
MadLittleMods d240aeb
Add changelog
MadLittleMods cfb4e88
Fix reference
MadLittleMods 3d8507d
Merge branch 'develop' into madlittlemods/13700-track-when-event-sign…
MadLittleMods 88a75cf
Use callback pattern to record signature failures
MadLittleMods d29ac0b
Add docstring
MadLittleMods 14e39ee
Record failure from get_pdu and add test
MadLittleMods 7898371
Be more selective about which errors to care about
MadLittleMods 43f1d1a
Merge branch 'develop' into madlittlemods/13700-track-when-event-sign…
MadLittleMods 83feb1b
Merge branch 'develop' into madlittlemods/13700-track-when-event-sign…
MadLittleMods 7d102e8
Merge branch 'develop' into madlittlemods/13700-track-when-event-sign…
MadLittleMods File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,7 +80,13 @@ async def _check_sigs_and_hash( | |
InvalidEventSignatureError if the signature check failed. Nothing | ||
will be logged in this case. | ||
""" | ||
await _check_sigs_on_pdu(self.keyring, room_version, pdu) | ||
try: | ||
await _check_sigs_on_pdu(self.keyring, room_version, pdu) | ||
except Exception as exc: | ||
await self._store.record_event_failed_pull_attempt( | ||
pdu.room_id, pdu.event_id, str(exc) | ||
MadLittleMods marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
raise exc | ||
|
||
if not check_event_content_hash(pdu): | ||
# let's try to distinguish between failures because the event was | ||
|
@@ -116,6 +122,9 @@ async def _check_sigs_and_hash( | |
"event_id": pdu.event_id, | ||
} | ||
) | ||
await self._store.record_event_failed_pull_attempt( | ||
pdu.room_id, pdu.event_id, "Event content has been tampered with" | ||
) | ||
return redacted_event | ||
|
||
spam_check = await self.spam_checker.check_event_for_spam(pdu) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the future, we will potentially record a failed attempt when the spam checker soft-fails an event. Part of #13700 |
||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This except clause seems too broad; I'd worry it might swallow errors arising from bugs rather than solely invalid signatures.
Can we narrow down the interesting types of Exceptions that we care about here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. this will catch
CancelledError
s which sounds dodgy (I don't know if cancellation is enabled for backfill, but it doesn't sound like a good reason to record a failed pull attempt)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we can just worry about
InvalidEventSignatureError
here 👍But all of the downstream places in
_check_sigs_on_pdu
do the sameexcept Exception as e:
🤷There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we decide here, we should also apply to #13814
Is there a way we can make the errors we care about opt-out instead of opt-in? It seems like it's going to suck to keep some of these error lists up to date whenever the downstream functions change and raise different errors.
Maybe it's better to be opt-in to handle just what you know but as an example at the following spot, we probably need to look at
FederationError
,RuntimeError
,InvalidResponseError
from a quick glance at some of the downstream code and this list seems like it will morph under us since there is so much going on in that block. Is there a good way to get a general sense of all exception types that a code path can raise (doesn't need to be comprehensive, maybe info just from docstrings)?synapse/synapse/handlers/federation_event.py
Lines 916 to 920 in 6f0c3e6
(created a draft PR to track this point, #13969, and will update with what we think about here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine opt-out is easier — I don't think
RuntimeError
,ValueError
,AssertionError
,KeyError
etc should be included, for example. These represent programming errors rather than the remote not acting according to the way we expect.Tracking the set of exceptions that some code can throw is difficult in Python, unfortunately.
I would suggest trying some cases of where you want to track a failed pull attempt, perhaps even adding tests that reproduce these cases. I can think of at least these classes of problems:
KeyError
(IMO this should be fixed so it sounds more like a validation error)I realise this sounds a bit painful but we ought to figure out what we should do for these cases. Should a network problem actually be tracked against the event at all; won't the existing backoff behaviour be sufficient there? ....
Perhaps, rather than accepting that the code underneath is always going to be messy, try and improve the downstream situation — can we define superclasses for some of these interesting groups of exceptions a e.g.
TemporaryNetworkError
exception class and make relevant exceptions derived from those or get re-raised as those?We can then add to the docstrings of the downstream functions what exceptions they're allowed to raise for these kinds of circumstances.
Just some thoughts, realise it may not necessarily be easy..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough on
RuntimeError
and friends, I saw this case and during my quick scan and seemed relevant (it's not) although if it was, would be a good candidate to make it throw the proper error type 👍Having these tests would be great. We'd want to apply them to
/event/{eventId}
,/event_auth/{roomId}/{eventId}
,/state/{roomId}?event_id=$xxx
,/state_ids/{roomId}?event_id=$xxx
(although related and probably covered by the existing backoff andNotRetryingDestination
in the next point below). For/backfill
and/get_missing_events
, we can't track server/network problems because there is no specific event we're requesting but we can track whether the events in their responses are rejected, etc.In general, if anything goes wrong with trying to pull an event from another server, we should backoff and do it in the background next time (#13623).
The existing backoff underlying all of the federation requests (
matrixfederationclient.py
andNotRetryingDestination
) seems like it could be sufficient for network related errors. I haven't experienced network problems being the cause of/messages
being slow so I haven't put as much thought in this area.Seems like we shouldn't be asking homeservers if they aren't on our whitelist in the first place. I guess we just let
FederationDeniedError
handle this andcontinue
to the next destination which accomplishes this though.Thanks for the thoughtful response on this @reivilibre! Really helping me form the right thoughts around all of this!