Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Inbound federation fails, "synapse.federation.transport.server - 428 - ERROR - PUT-XXXXXX - on_incoming_transaction failed" #8832

Closed
PC-Admin opened this issue Nov 27, 2020 · 5 comments

Comments

@PC-Admin
Copy link

PC-Admin commented Nov 27, 2020

Description

Inbound federation to perthchat.org has been either very delayed or simply not working for over a month now. Outbound federation is responsive still.

Here is the synapse log error that seems to be related to it:

Nov 27 10:45:55 pagrus matrix-synapse[1070]: 2020-11-27 02:45:55,338 - synapse.federation.transport.server - 428 - ERROR - PUT-600246 - on_incoming_transaction failed
Nov 27 10:45:55 pagrus matrix-synapse[1070]: Traceback (most recent call last):
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/transport/server.py", line 424, in on_PUT
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     code, response = await self.handler.on_incoming_transaction(
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 170, in on_incoming_transaction
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     return await self._transaction_resqp_cache.wrap(
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     result = result.throwExceptionIntoGenerator(g)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     return g.throw(self.type, self.value, self.tb)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 196, in _on_incoming_transaction_inner
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     result = await self._handle_incoming_transaction(
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 242, in _handle_incoming_transaction
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     pdu_results, _ = await make_deferred_yieldable(
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     result = result.throwExceptionIntoGenerator(g)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     return g.throw(self.type, self.value, self.tb)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 363, in _handle_pdus_in_txn
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     await concurrently_execute(
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     result = g.send(result)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 172, in _concurrently_execute_inner
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     await maybe_awaitable(func(next(it)))
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 336, in process_pdus_for_room
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     await self.check_server_matches_acl(origin_host, room_id)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/federation/federation_server.py", line 775, in check_server_matches_acl
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     acl_event = await self.store.get_event(acl_event_id)
Nov 27 10:45:55 pagrus matrix-synapse[1070]:   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events_worker.py", line 265, in get_event
Nov 27 10:45:55 pagrus matrix-synapse[1070]:     raise NotFoundError("Could not find event %s" % (event_id,))
Nov 27 10:45:55 pagrus matrix-synapse[1070]: synapse.api.errors.NotFoundError: 404: Could not find event $<redacted>

I tried rejoining the room with the matrix.org account, but that didn't change anything.

I'm happy to dig up more information if you need or try any fixes you can imagine. Unfortunately this is quite detrimental to our users experience. :(

Steps to reproduce

  • create a perthchat.org account and navigate to any of our rooms, for example: #perth:perthchat.org.
  • have a matrix.org account join the same room, send a message from it.
  • witness it not arriving on the perthchat.org side.

Version information

  • Homeserver:

If not matrix.org:

  • Version:

    "python_version": "3.8.6",
    "server_version": "1.23.0"

  • Install method:

Spantaleevs deploy script.

  • Platform:

Docker container on Debian 10.

@anoadragon453
Copy link
Member

Is it always failing on the same event, or different events each time? Is only one room affected? Also which workers are you currently using if any?

await self.store.get_event(acl_event_id)

Given this, it looks like your server is unable to find the server ACL event for this room. Though I'm not sure whether this just happens to be the one event your server has lost, or it can't find a number of different events pointing to a larger problem.

@PC-Admin
Copy link
Author

PC-Admin commented Nov 28, 2020

Using no workers at the moment.
It happens to all events from matrix.org, libre.bzh and flobob.ovh, to any rooms on my server. Interestingly i spun up a new server for testing (cheesedomain.xyz) and inbound federation from that new server is quick and responsive.

Outbound federation is always quick and responsive, but all inbound federation seems ends up being lagged for an hour or two before events arrive.

I can DM you a copy of my Synapse logs if you wish.

Edit: 2 days later and it's still repeatedly having the above error about that same event. :S

@PC-Admin
Copy link
Author

Closing this since the lag seems to have vanished, yet Synapse is still screaming constantly about that event. :S

Bless this mess.

@anoadragon453
Copy link
Member

Synapse is likely wasting resources attempting to validate that event over and over. The event seems to exist in the current state of the room, but not in your database, which is either the symptom of a bug or some database surgery gone wibbly.

To clear the error I'd suggest leaving the room, purging it from your database and then rejoining again.

@richvdh
Copy link
Member

richvdh commented Nov 30, 2020

if the room in question was !CxOubzqfmBgVbsQPxy:matrix.org, it was due to some broken events being sent to that room back in July 2019. The bug that allowed those events was addressed in #5701, but you may still have the broken room in your database. The best course of action is indeed to purge it (though you might need the "force_purge" functionality in #8843 to do so.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants