-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Faster room joins: fix race in recalculation of current room state #13151
Faster room joins: fix race in recalculation of current room state #13151
Conversation
This moves us closer to fixing the race between recalculation of a room's current state and event persistence. The next step is to move recalculation of current state into the event persistence queue. Signed-off-by: Sean Quah <seanq@matrix.org>
Avoid races between event persistence and recalculation of a room's current state by putting them in the same queue. Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are failing alas
async def _serialize_payload(room_id: str) -> JsonDict: # type: ignore[override] | ||
return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop this since returning an empty dict is the default implementation in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python complains because the base implementation is an abstractmethod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh boo
synapse/state/__init__.py
Outdated
self._events_shard_config = hs.config.worker.events_shard_config | ||
self._instance_name = hs.get_instance_name() | ||
|
||
self._update_current_state = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth naming this _client
to make it obvious what the difference is between it and update_current_state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed!
end_item = queue[-1] | ||
existing_task = queue[-1].task | ||
# add our events to the existing queue item | ||
existing_task.events_and_contexts.extend(task.events_and_contexts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it'd be cleaner to have a def try_update(..) -> bool
function in _EventPersistQueueTask
that encapsulates this logic? For _UpdateCurrentStateTask
it'd simply always return false (i.e. update failed), and in _PersistEventsTask
have this check? Though that might complicate things unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a go at this
|
…_recalculation_race
When we finish un-partial stating all events in a room, we recalculate
the current state using forward extremities. This can race with
persistence of another event, which could result in an invalid current
room state in the database.
To avoid the race, we recalculate current room state in the same
queue as event persistence. The event persistence queue may be on
another worker, so a new replication endpoint is required as well.
Fixes #13007.
May be easiest to review commit by commit.