Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make _get_state_map_for_room not break when room state events don't contain an event id. #13174

Merged
merged 4 commits into from
Jul 7, 2022

Conversation

Yoric
Copy link
Contributor

@Yoric Yoric commented Jul 4, 2022

When adding a on_new_event handler on a federated homeserver, I keep hitting:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 285, in _unsafe_process_queue
    await self._process_command(cmd, conn, stream_name)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 300, in _process_command
    await self._process_rdata(stream_name, conn, cmd)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 516, in _process_rdata
    await self.on_rdata(stream_name, cmd.instance_name, cmd.token, rows)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 531, in on_rdata
    await self._replication_data_handler.on_rdata(
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/client.py", line 212, in on_rdata
    await self.notifier.on_new_room_event_args(
  File "/usr/local/lib/python3.9/site-packages/synapse/notifier.py", line 338, in on_new_room_event_args
    await self._third_party_rules.on_new_event(event_id)
  File "/usr/local/lib/python3.9/site-packages/synapse/events/third_party_rules.py", line 399, in on_new_event
    state_events = await self._get_state_map_for_room(event.room_id)
  File "/usr/local/lib/python3.9/site-packages/synapse/events/third_party_rules.py", line 471, in _get_state_map_for_room
    state_events[key] = room_state_events[event_id]
KeyError: (some event id)

Apparently, we have some invalid state in the database of this homeserver (the event was already in the logs last May) and any attempt to call _get_state_map_for_room will break. The fact that _get_state_map_for_room is solely due to the fact that there was no on_new_event callback on this homeserver until now.

This PR uses the more robust get_current_state to implement _get_state_map_for_room, to avoid such breakages.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

When adding a `on_new_event` handler on a federated homeserver, I keep hitting:

```python
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 285, in _unsafe_process_queue
    await self._process_command(cmd, conn, stream_name)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 300, in _process_command
    await self._process_rdata(stream_name, conn, cmd)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 516, in _process_rdata
    await self.on_rdata(stream_name, cmd.instance_name, cmd.token, rows)
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/handler.py", line 531, in on_rdata
    await self._replication_data_handler.on_rdata(
  File "/usr/local/lib/python3.9/site-packages/synapse/replication/tcp/client.py", line 212, in on_rdata
    await self.notifier.on_new_room_event_args(
  File "/usr/local/lib/python3.9/site-packages/synapse/notifier.py", line 338, in on_new_room_event_args
    await self._third_party_rules.on_new_event(event_id)
  File "/usr/local/lib/python3.9/site-packages/synapse/events/third_party_rules.py", line 399, in on_new_event
    state_events = await self._get_state_map_for_room(event.room_id)
  File "/usr/local/lib/python3.9/site-packages/synapse/events/third_party_rules.py", line 471, in _get_state_map_for_room
    state_events[key] = room_state_events[event_id]
KeyError: (some event id)
```

I do not understand how we can reach such a state. However, a few clues:

- I can reproduce on a live server with an event was sent back in last May;
- according to my logs, the code never actually calls my `on_new_event` callback, so it's probably not guilty;
- this exception is risen on a codepath that is deactivated when there are no `on_new_event` callbacks, so there is a possibility that the error is simply activated by the existence of the `on_new_event` handler.

This PR simply plasters around the error by ignoring events that do not show up in the state.
@Yoric Yoric requested a review from a team as a code owner July 4, 2022 18:58
@erikjohnston
Copy link
Member

FTR I think this is because a rejected event got into the state, a bit like #6867. We correctly handle this case in other places, so the fix here is just to reuse that code rather than implementing it again separately.

WIP: Add Changelog.

Signed-off-by: David Teller <davidt@element.io>
@clokep
Copy link
Member

clokep commented Jul 5, 2022

@Yoric please update the title / description of this to match the current fix! Thank you! 😄

@Yoric Yoric changed the title Plaster around KeyError in _get_state_map_for_room. Make _get_state_map_for_room not break when room state events don't contain an event id. Jul 5, 2022
changelog.d/13174.bugfix Outdated Show resolved Hide resolved
@Yoric Yoric enabled auto-merge (squash) July 7, 2022 05:48
@Yoric Yoric merged commit 57f6f59 into develop Jul 7, 2022
@Yoric Yoric deleted the yoric/on-new-event branch July 7, 2022 08:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants