Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add experimental support for sharding event persister. #8170

Merged
merged 6 commits into from
Sep 2, 2020

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Aug 26, 2020

This is not ready for production yet. Caveats:

  1. We should write some tests...
  2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.

Probably worth looking at this commit by commit. There is a FIXME in Implement config... that is dealt with by the federation handler refactor,

@erikjohnston erikjohnston force-pushed the erikj/add_stream_token_type branch 7 times, most recently from 0df5060 to e151ff2 Compare September 1, 2020 12:56
@erikjohnston erikjohnston requested a review from a team September 1, 2020 13:12
@erikjohnston erikjohnston force-pushed the erikj/add_stream_token_type branch 2 times, most recently from 9c8eb15 to 609bc17 Compare September 1, 2020 14:45
@erikjohnston erikjohnston force-pushed the erikj/add_stream_token_type branch from 609bc17 to ac494a8 Compare September 1, 2020 15:40
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything glaringly incorrect, but did leave quite a few comments. I'm guessing documentation won't be updated until this is not experimental?

synapse/config/_base.py Outdated Show resolved Hide resolved
synapse/config/workers.py Outdated Show resolved Hide resolved
synapse/config/workers.py Outdated Show resolved Hide resolved
@@ -923,7 +923,8 @@ async def backfill(self, dest, room_id, limit, extremities):
)
)

await self._handle_new_events(dest, ev_infos, backfilled=True)
if ev_infos:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if-statement just an optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh err, somewhere along the lines I think something got very unhappy about empty lists. I forget the details now or if its even necessary 😕

synapse/handlers/federation.py Outdated Show resolved Hide resolved
Comment on lines +2931 to +2932
instance = self.config.worker.events_shard_config.get_instance(room_id)
if instance != self._instance_name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I wrote the below before realizing that instance is also used in the _send_events call below, I thought it was worth leaving though in case it jiggles a good idea loose:

It looks like this pattern is used quite a bit, in the comments for ShardedWorkerHandlingConfig it says to prefer should_handle, which seems like it could be used here:

if self.config.worker.events_shard_config.should_handle(self._instance_name, room_id):

Although I think some of this could be simplified more if ShardedWorkerHandlingConfig knew what the current instance was (maybe should_handle wouldn't need the instance passed in?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nods. The problem with giving ShardedWorkerHandlingConfig the current instance name is that I don't think we know what the instance name is during config parsing (outside of the worker config parsing).

Copy link
Member Author

@erikjohnston erikjohnston Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we do need to use should_handle first to technically conform to the docs of ShardedWorkerHandlingConfig

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I guess the fact that we go on to do a HTTP replication hit means that get_instance has to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK to use get_instance here, that and should_handle should have matching logic after all!

synapse/handlers/room.py Outdated Show resolved Hide resolved
synapse/storage/util/id_generators.py Show resolved Hide resolved
synapse/storage/databases/main/events_worker.py Outdated Show resolved Hide resolved
erikjohnston and others added 2 commits September 2, 2020 10:29
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
@erikjohnston erikjohnston requested a review from clokep September 2, 2020 10:46
@erikjohnston
Copy link
Member Author

I'm guessing documentation won't be updated until this is not experimental?

Yeah, I really really don't want people using this at all. Not even in tests (I have a branch with working sytests but they go so slowly)

@clokep
Copy link
Member

clokep commented Sep 2, 2020

@erikjohnston So I think this looks good! Not sure if you're looking for a ✅ or want to add tests?

@clokep
Copy link
Member

clokep commented Sep 2, 2020

I think this is related to #7986.

@erikjohnston
Copy link
Member Author

@erikjohnston So I think this looks good! Not sure if you're looking for a or want to add tests?

So I was waiting for SyTests to land, but actually it turns out that with this current implementation they take forever. This is because of the get_persisted_position not always advancing due to gaps in the events stream, which can be fixed by making each worker periodically send out updated positions, but a) that makes the tests slow and b) not really something I necessarily want to add.

Given this is completely undocumented I'm tempted to merge as is, and then require tests for the next phase (which is augmenting the get_persisted_position stuff with something a bit smarter)

@erikjohnston erikjohnston merged commit 82c1ee1 into develop Sep 2, 2020
@erikjohnston erikjohnston deleted the erikj/add_stream_token_type branch September 2, 2020 14:48
babolivier added a commit that referenced this pull request Sep 3, 2020
babolivier added a commit that referenced this pull request Sep 4, 2020
…" (#8242)

* Revert "Add experimental support for sharding event persister. (#8170)"

This reverts commit 82c1ee1.

* Changelog
erikjohnston added a commit that referenced this pull request Sep 11, 2020
This is *not* ready for production yet. Caveats:

1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '0d4f614fd':
  Refactor `_get_e2e_device_keys_for_federation_query_txn` (#8225)
  Add experimental support for sharding event persister. (#8170)
  Add /user/{user_id}/shared_rooms/ api (#7785)
  Do not try to store invalid data in the stats table (#8226)
  Convert the main methods run by the reactor to async. (#8213)
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '9f8abdcc3':
  Revert "Add experimental support for sharding event persister. (#8170)" (#8242)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants