AssertionError: No forward extremities left! #5090
Comments
(this is happening fairly regularly in the matrix.org logs) |
I'm also seeing this now, and it appears to be preventing federation with matrix.org |
Quoting from a Matrix conversation with @turt2live, who has evidence that this bug is preventing larger homeserver instances from federating with matrix.org.
fwiw the trace is a bit smaller in my environment for some reason: Master process:
Lining it up by the txnId, here's the federation reader:
This was the federation reader trying to persist 80 "newly-received auth/state events" by its own admission. Synapse appears to think it was missing 1 prev event and went out hunting for it non-recursively, and somehow that blew up into 80 requests for individual events - likely due to the call to get_missing_events which revealed more that were missing? |
the above is also a smaller instance of this happening. For a bit of context, a transaction containing just 4 PDUs is taking minutes, resulting in a POST to the master with 3788 events - this kinda tells me that it is probably not a line length problem and more something around the state resolution algorithm itself? Possibly the federation reader and master aren't doing the same thing or the reader isn't disclosing enough events to the master? Edit: for my reference so I can find it in logs:
Skipping the federation reader and throwing transactions directly at the master process doesn't reduce the errors, so the theory that the two processes do resolution differently is probably wrong. |
When considering the candidates to be forward-extremities, we must exclude soft failures. Hopefully fixes #5090.
The problem seems to be related to soft-failed events: in particular, it occurs when all of the forward extremities in a room end up being soft-failed. #5146 is an (as yet totally untested) patch. |
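For anyone following along, here is a rough sketch of the rule described above: soft-failed events are left out of the candidate set before the forward extremities are recomputed. This is not Synapse's actual code. The `Event` class and `calculate_new_extremities` function are hypothetical names for illustration, and the real logic also has to consult the database for existing events. Only the assertion message is taken from this issue's title.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified stand-in for a room event."""
    event_id: str
    prev_event_ids: Tuple[str, ...] = ()
    soft_failed: bool = False


def calculate_new_extremities(
    current: Iterable[str], new_events: Iterable[Event]
) -> Set[str]:
    """Illustrative recomputation of a room's forward extremities."""
    # Soft-failed events must not be candidates, and (in this sketch)
    # they must not displace the events they point at either.
    candidates = [e for e in new_events if not e.soft_failed]

    # Start from the existing extremities and add the new candidates.
    result = set(current)
    result.update(e.event_id for e in candidates)

    # Anything referenced as a prev_event of a candidate is no longer
    # an extremity.
    for e in candidates:
        result.difference_update(e.prev_event_ids)

    # Same message as the assertion reported in this issue.
    assert result, "No forward extremities left!"
    return result


# A soft-failed event leaves the existing extremity in place.
only_soft_failed = calculate_new_extremities(
    {"$A"}, [Event("$B", ("$A",), soft_failed=True)]
)

# A normal event replaces the extremity it references.
mixed = calculate_new_extremities(
    {"$A"},
    [Event("$B", ("$A",), soft_failed=True), Event("$C", ("$A",))],
)
```

In this sketch `only_soft_failed` is `{"$A"}` and `mixed` is `{"$C"}`, so the set never empties just because the incoming events were soft-failed.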
for the historical record: I've deployed #5146 (t2bot@aa32643) to t2bot.io as of roughly 21:10 UTC today - will see how it goes/recovers (if at all). |
Still getting this after a few days of catch-up, though less often. This appears to be preventing t2bot.io from catching up on matrix.org transactions.
That translates to this transaction from matrix.org:
Specifically a membership event of someone else on a different homeserver which got pulled in by state resolution. |
Joining the club of people having this issue, I don't believe I have any new information to add. |
fixed by #5146, I hope |
This can't be good:
(holy stacktrace, batman)
No clues as to which event/room is causing the problem